Re: [Moses-support] Fwd: Binarization fails with the Segmentation Fault error

2016-06-28 Thread Sašo Kuntaric
Well, I installed Moses only a few months ago, so it should be the latest
version.

I find it really strange. I have tried everything: binarizing the tables
(which finishes with no problems), using the --no-filter-phrase-table
parameter, and adding language models for all the factors I have (this one
gave me a segmentation fault), and I always get the same result. Tuning stops
after two runs and all the weights get set to zero, with the message:

(2) BEST at 2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 => 0 at Tue Jun 28
17:38:43 CEST 2016
None of the weights changed more than 1e-05. Stopping.

The translation models themselves are created with no issues. If I have one
translation table, I can tune it with an unfactored corpus, but as soon
as I use a factored one, everything goes south. If I have two translation
tables, I cannot tune with an unfactored file, since the decoder expects the
stated number of factors.
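In case it helps anyone reproduce this, the first sanity check I would run is that every line of the tuning corpus carries the declared number of factors per token. A minimal sketch of such a check (the `|` factor separator is standard in Moses factored corpora; the factor count of 2 below is just an example):

```python
def check_factors(line, expected_factors, sep="|"):
    """Return the positions of tokens whose factor count differs
    from the expected number of factors."""
    bad = []
    for i, token in enumerate(line.split()):
        if len(token.split(sep)) != expected_factors:
            bad.append(i)
    return bad

# A well-formed line with word|pos factors (2 factors per token):
assert check_factors("the|DT cat|NN sat|VBD", 2) == []
# A line where the second token is missing its factor:
assert check_factors("the|DT cat sat|VBD", 2) == [1]
```

Running this over the whole tuning file would at least rule out a malformed corpus as the cause of the zeroed weights.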

I would really appreciate it if someone had an idea of what to do.

Best regards,

Saso

2016-06-27 14:45 GMT+02:00 Rajen Chatterjee :

> Hi, in the past I had a similar problem: the weights after one iteration of
> tuning were getting set to 0. I do not know the cause of this, but if I
> remember correctly, when I used another version of Moses (I think
> Release-3.0) I didn't have this problem.
>
> On Sun, Jun 26, 2016 at 1:40 PM, Sašo Kuntaric wrote:
>
>> Hi all again,
>>
>> A little more info, if someone has any ideas as I still haven't been able
>> to figure it out.
>>
>> When I do tuning with models that have only one translation table, it
>> works fine, but only with a non-factored tuning corpus. If I use a factored
>> tuning corpus, Moses does one run and sets all weights to zero. If I have
>> two translation tables, Moses doesn't do the tuning at all with a
>> non-factored corpus, as it is missing factors. If I use the factored
>> corpus, I get a similar result as above: tuning stops after one run and
>> sets all weights to zero. A similar error was mentioned a few months back
>> and the solution was to turn off MBR decoding, but I am not using it. I
>> just use the command:
>>
>> ~/mosesdecoder/scripts/training/mert-moses.pl
>> ~/working/IT_corpus/TMX/txt/tuning_corpus/tuning_corpus.tagged.en
>> ~/working/IT_corpus/TMX/txt/tuning_corpus/tuning_corpus.tagged.sl
>> ~/mosesdecoder/bin/moses
>> ~/working/IT_corpus/TMX/txt/factored_corpus/complex/model/moses.ini
>> --mertdir ~/mosesdecoder/bin/ --decoder-flags="-threads 32"
>>
>> Is there something I am missing? Do I have to add anything else for
>> tuning a factored model?
>>
>> Any help will be greatly appreciated.
>>
>> Best regards,
>>
>> Saso
>>
>> -- Forwarded message --
>> From: Sašo Kuntaric 
>> Date: 2016-06-20 19:36 GMT+02:00
>> Subject: Binarization fails with the Segmentation Fault error
>> To: moses-support 
>>
>>
>> Hi all,
>>
>> Me again (last time I hope). I have successfully trained and tuned my
>> factored model. Here are both moses.ini files:
>>
>> #
>> ### MOSES CONFIG FILE ###
>> #
>>
>> # input factors
>> [input-factors]
>> 0
>> 1
>>
>> # mapping steps
>> [mapping]
>> 0 T 0
>> 0 G 0
>> 0 T 1
>>
>> [distortion-limit]
>> 6
>>
>> # feature functions
>> [feature]
>> UnknownWordPenalty
>> WordPenalty
>> PhrasePenalty
>> PhraseDictionaryMemory name=TranslationModel0 num-features=4
>> path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/morphgen/model/phrase-table.0-1.gz
>> input-factor=0 output-factor=1
>> PhraseDictionaryMemory name=TranslationModel1 num-features=4
>> path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/morphgen/model/phrase-table.1-2.gz
>> input-factor=1 output-factor=2
>> Generation name=GenerationModel0 num-features=2
>> path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/morphgen/model/generation.1-0,3.gz
>> input-factor=1 output-factor=0,3
>> Distortion
>> KENLM name=LM0 factor=0
>> path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/language_model/IT_corpus_surface.blm.sl
>> order=3
>> KENLM name=LM1 factor=2
>> path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/language_model/IT_corpus_parts.blm.sl
>> order=3
>>
>> # dense weights for feature functions
>> [weight]
>> # The default weights are NOT optimized for translation quality. You MUST
>> tune the weights.
>> # Documentation for tuning is here:
>> http://www.statmt.org/moses/?n=FactoredTraining.Tuning
>> UnknownWordPenalty0= 1
>> WordPenalty0= -1
>> PhrasePenalty0= 0.2
>> TranslationModel0= 0.2 0.2 0.2 0.2
>> TranslationModel1= 0.2 0.2 0.2 0.2
>> GenerationModel0= 0.3 0
>> Distortion0= 0.3
>> LM0= 0.5
>> LM1= 0.5
>>
>> # MERT optimized configuration
>> # decoder /home/ksaso/mosesdecoder/bin/moses
>> # BLEU 0 on dev
>> /home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/tuning/tuning-corpus.tagged.en
>> # We were before running iteration 2
>> # finished Mon Jun 20 16:19:08 CEST 2016
>> ### MOSES CONFIG FILE ###
>> #
>>
>> # input factors
>> [input-factors]
>> 0
>> 1
>>
>> # mapping steps
>> [mapping]
>> 0

Re: [Moses-support] Language model interpolation without SRILM

2016-06-28 Thread Kenneth Heafield
Oh also, use a small -S argument to the interpolate program because it
doesn't quite budget memory properly yet.

On 06/28/2016 05:08 PM, Kenneth Heafield wrote:
> Log-linear interpolation is in KenLM in the lm/interpolate directory.
> You'll want to get KenLM from github.com/kpu/kenlm and compile with Eigen.
> 
> Tuning log-linear weights is super slow, but applying them is reasonably
> fast.  In total the tuning + applying weights time is comparable to SRILM.
> 
> https://kheafield.com/professional/edinburgh/interpolate_paper.pdf
> 
> Kenneth
> 
> On 06/28/2016 03:27 PM, Philipp Koehn wrote:
>> Hi,
>>
>> unfortunately, the interpolation of language models requires two pieces
>> of code that only exist in SRILM: The EM training method to find weights
>> for the language models, and the linear interpolation of the language
>> models.
>>
>> Maybe Ken and Lane can weigh in, if/when a replacement in KENLM will be
>> available.
>>
>> -phi
>>
>> On Tue, Jun 28, 2016 at 10:10 AM, Mathias Müller wrote:
>>
>> Hi all
>>
>> I have trained several language models and would like to combine
>> them with interpolate-lm.perl:
>>
>> 
>> https://github.com/moses-smt/mosesdecoder/blob/master/scripts/ems/support/interpolate-lm.perl
>>
>> As the language model tool, I always use KenLM, but looking at the
>> code of interpolate-lm.perl, it seems that the use of SRILM is
>> hard-coded in the script. I would like to avoid SRILM because, if I
>> understand correctly, its license does not permit use in commercial
>> products.
>>
>> My question is:
>>
>> Can I simply replace the call to SRILM with KenLM in my copy of
>> interpolate-lm.perl? Does KenLM have the functionality necessary for
>> language model combination, e.g. a substitute for SRILM's
>> "compute-best-mix"?
>>
>> Thanks for your help.
>> Mathias
>>
>> —
>>
>> Mathias Müller
>> AND-2-20
>> Institute of Computational Linguistics
>> University of Zurich
>> +41 44 635 75 81 
>> mathias.muel...@uzh.ch 
>>
>> ___
>> Moses-support mailing list
>> Moses-support@mit.edu 
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>


Re: [Moses-support] Language model interpolation without SRILM

2016-06-28 Thread Kenneth Heafield
Log-linear interpolation is in KenLM in the lm/interpolate directory.
You'll want to get KenLM from github.com/kpu/kenlm and compile with Eigen.

Tuning log-linear weights is super slow, but applying them is reasonably
fast.  In total the tuning + applying weights time is comparable to SRILM.
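For readers unsure how the log-linear interpolation in KenLM differs from SRILM's linear interpolation, here is a toy sketch of the two combinations for a single n-gram probability (the probabilities and weights are made up purely for illustration, and the log-linear mix is shown without the normalization term a real model needs):

```python
import math

def linear_mix(probs, weights):
    # Linear interpolation (SRILM-style): weighted sum of probabilities.
    return sum(w * p for w, p in zip(weights, probs))

def loglinear_mix(probs, weights):
    # Log-linear interpolation: weighted sum of log-probabilities,
    # i.e. a product of powered probabilities. Unnormalized as written.
    return math.exp(sum(w * math.log(p) for w, p in zip(weights, probs)))

probs = [0.2, 0.05]   # P(word | history) under two LMs (made up)
weights = [0.7, 0.3]

assert abs(linear_mix(probs, weights) - 0.155) < 1e-9
assert abs(loglinear_mix(probs, weights) - 0.1319) < 1e-3
```

The unnormalized scores are why tuning log-linear weights is expensive: the normalizer has to be accounted for over the whole vocabulary.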

https://kheafield.com/professional/edinburgh/interpolate_paper.pdf

Kenneth

On 06/28/2016 03:27 PM, Philipp Koehn wrote:
> Hi,
> 
> unfortunately, the interpolation of language models requires two pieces
> of code that only exist in SRILM: The EM training method to find weights
> for the language models, and the linear interpolation of the language
> models.
> 
> Maybe Ken and Lane can weigh in, if/when a replacement in KENLM will be
> available.
> 
> -phi
> 
> On Tue, Jun 28, 2016 at 10:10 AM, Mathias Müller wrote:
> 
> Hi all
> 
> I have trained several language models and would like to combine
> them with interpolate-lm.perl:
> 
> 
> https://github.com/moses-smt/mosesdecoder/blob/master/scripts/ems/support/interpolate-lm.perl
> 
> As the language model tool, I always use KenLM, but looking at the
> code of interpolate-lm.perl, it seems that the use of SRILM is
> hard-coded in the script. I would like to avoid SRILM because, if I
> understand correctly, its license does not permit use in commercial
> products.
> 
> My question is:
> 
> Can I simply replace the call to SRILM with KenLM in my copy of
> interpolate-lm.perl? Does KenLM have the functionality necessary for
> language model combination, e.g. a substitute for SRILM's
> "compute-best-mix"?
> 
> Thanks for your help.
> Mathias
> 
> —
> 
> Mathias Müller
> AND-2-20
> Institute of Computational Linguistics
> University of Zurich
> +41 44 635 75 81 
> mathias.muel...@uzh.ch 
> 


Re: [Moses-support] Language model interpolation without SRILM

2016-06-28 Thread Philipp Koehn
Hi,

unfortunately, the interpolation of language models requires two pieces of
code that only exist in SRILM: The EM training method to find weights for
the language models, and the linear interpolation of the language models.
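For reference, the EM procedure that SRILM's compute-best-mix implements is simple enough to sketch: given each model's per-token probabilities on a held-out set, it alternates between computing each model's posterior responsibility for each token and re-estimating the mixture weights. A toy illustration (this is my own sketch, not SRILM's actual code):

```python
def em_mixture_weights(prob_streams, iters=50):
    """prob_streams[m][t] is model m's probability of held-out token t.
    Returns EM-estimated linear-interpolation weights."""
    m = len(prob_streams)
    n = len(prob_streams[0])
    weights = [1.0 / m] * m
    for _ in range(iters):
        counts = [0.0] * m
        for t in range(n):
            # Mixture probability of token t under current weights.
            mix = sum(weights[j] * prob_streams[j][t] for j in range(m))
            # E-step: accumulate each model's posterior responsibility.
            for j in range(m):
                counts[j] += weights[j] * prob_streams[j][t] / mix
        # M-step: weights are normalized responsibilities.
        weights = [c / n for c in counts]
    return weights

# Model 0 fits the held-out tokens better, so it receives the larger weight:
w = em_mixture_weights([[0.3, 0.4, 0.2], [0.1, 0.1, 0.1]])
assert w[0] > w[1]
```

The resulting weights would then be used for a plain linear interpolation of the models' probabilities.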

Maybe Ken and Lane can weigh in, if/when a replacement in KENLM will be
available.

-phi

On Tue, Jun 28, 2016 at 10:10 AM, Mathias Müller wrote:

> Hi all
>
> I have trained several language models and would like to combine them with
> interpolate-lm.perl:
>
>
> https://github.com/moses-smt/mosesdecoder/blob/master/scripts/ems/support/interpolate-lm.perl
>
> As the language model tool, I always use KenLM, but looking at the code of
> interpolate-lm.perl, it seems that the use of SRILM is hard-coded in the
> script. I would like to avoid SRILM because, if I understand correctly, its
> license does not permit use in commercial products.
>
> My question is:
>
> Can I simply replace the call to SRILM with KenLM in my copy of
> interpolate-lm.perl? Does KenLM have the functionality necessary for
> language model combination, e.g. a substitute for SRILM's
> "compute-best-mix"?
>
> Thanks for your help.
> Mathias
>
> —
>
> Mathias Müller
> AND-2-20
> Institute of Computational Linguistics
> University of Zurich
> +41 44 635 75 81
> mathias.muel...@uzh.ch
>


[Moses-support] Language model interpolation without SRILM

2016-06-28 Thread Mathias Müller
Hi all

I have trained several language models and would like to combine them with
interpolate-lm.perl:

https://github.com/moses-smt/mosesdecoder/blob/master/scripts/ems/support/interpolate-lm.perl

As the language model tool, I always use KenLM, but looking at the code of
interpolate-lm.perl, it seems that the use of SRILM is hard-coded in the
script. I would like to avoid SRILM because, if I understand correctly, its
license does not permit use in commercial products.

My question is:

Can I simply replace the call to SRILM with KenLM in my copy of
interpolate-lm.perl? Does KenLM have the functionality necessary for
language model combination, e.g. a substitute for SRILM's
"compute-best-mix"?

Thanks for your help.
Mathias

—

Mathias Müller
AND-2-20
Institute of Computational Linguistics
University of Zurich
+41 44 635 75 81
mathias.muel...@uzh.ch


Re: [Moses-support] mosesserver and placeholder compatibility with tree-based models

2016-06-28 Thread Vito Mandorino
I have managed to track down more precisely where the segfault occurs, but I
do not yet understand why. It happens at some point during the for loop in
the function

void ChartHypothesis::GetOutputPhrase(Phrase &outPhrase) const

in the file moses/ChartHypothesis.cpp

Strangely, if the decoder is called in normal (not mosesserver) mode no
such error occurs.

If you have the chance, could you please have a look at it?

Thanks,
Vito


2016-06-24 17:16 GMT+02:00 Vito Mandorino :

> Thank you, I changed [inputtype] in the moses.ini from 3 --> 0 and now the
> command
>
> echo 'the' | bin/moses -f moses.ini
> -xml-input inclusive -placeholder-factor 1
>
> works fine but in server mode
>
> bin/mosesserver -f moses.ini -xml-input inclusive -placeholder-factor 1
>
> still yields a segfault at a later stage of the process, namely after
> ending the ChartManager::Decode function from the file
> moses/ChartManager.cpp
>
> Best regards,
> Vito
>
> 2016-06-24 12:35 GMT+02:00 Hieu Hoang :
>
>> it doesn't need to be tree input, just normal strings are fine for hiero
>> models. Change [inputtype] in the moses.ini file from 3 -> 0.
>> if that doesn't work, let me know and i'll debug it
>>
>> On 24/06/2016 11:31, Vito Mandorino wrote:
>>
>> Hi,
>>
>> I am trying to debug by running the decoder with high verbosity.
>> I have found that the segfault happens in the function
>> TreeInput::ProcessAndStripXMLTags in the file TreeInput.cpp, and more
>> precisely at the line
>>
>> Phrase sourcePhrase = this->GetSubString(Range(startPos,endPos-1));
>>
>> The phrase-based decoder uses instead the ProcessAndStripXMLTags function
>> from the XmlOption.cpp. Everything works fine of course in this function. I
>> also noticed that the line
>>
>> string entity = ParseXmlTagAttribute(tagContent,"entity");
>>
>> in XmlOption.cpp does not have an analogue in TreeInput.cpp .
>>
>> Do you think that a similar line should be added also in the
>> TreeInput.cpp in order to parse placeholder xml tags?
>>
>> Thank you and best regards,
>>
>> Vito
>>
>>
>>
>>
>>
>> 2016-06-23 16:34 GMT+02:00 Hieu Hoang :
>>
>>> it's very easy, i had thought i'd done it already but maybe it dropped
>>> out.
>>>
>>> The placeholder feature just tacks the translation on at the very end,
>>> just before outputting the translation.
>>>
>>> Run it in a debugger to see where it segfaults and change that part of
>>> the code
>>>
>>> Hieu Hoang
>>> http://www.hoang.co.uk/hieu
>>>
>>> On 23 June 2016 at 14:29, Vito Mandorino
>>> <vito.mandor...@linguacustodia.com> wrote:
>>>
 Dear all,

 to follow up on the previous messages, I have tried the current version of
 Moses from Github and found that hierarchical models do work fine with
 mosesserver now. Placeholders are still an issue though: the commands

 /bin/mosesserver -f moses.ini -xml-input inclusive -placeholder-factor 1
 /bin/moses -f moses.ini -xml-input inclusive -placeholder-factor 1

 both yield a Segmentation fault upon translation of a segment with
 placeholder xml tags in it.

 Similar variations on the commands, such as changing the index of the
 placeholder-factor or the 'inclusive' xml option into 'exclusive', yield
 similar problems (even though -placeholder-factor 3 or -placeholder-factor
 4 result in different error messages).

 Do you know whether and how this could be fixed? If it is a fix which
 does not require a deep understanding of the Moses code, I wouldn't mind
 doing it myself.

 Thank you and best regards,

 Vito


 2016-06-03 11:43 GMT+02:00 Vito Mandorino
 <vito.mandor...@linguacustodia.com>:

> Hi Hieu,
>
> There is also the mosesserver issue, so I cannot say that's the only
> difficulty with using tree-based models. If you can have a look it would be
> great anyway. Speed is also an issue, even though much less so for
> hierarchical than for syntactic models in our tests.
>
> Thank you and best regards,
> Vito
>
> 2016-06-02 14:44 GMT+02:00 Hieu Hoang <hieuho...@gmail.com>:
>
>> I believe placeholders can be used but no-one has really tried. If
>> that's the only thing holding you back from using hiero/syntax models, I
>> can look into it for you.
>>
>> However, there's likely to be other issues, such as speed and memory
>> consumption which may make these models unsuitable for commercial use.
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 2 June 2016 at 11:48, Vito Mandorino
>> <vito.mandor...@linguacustodia.com> wrote:
>>
>>> Dear all,
>>>
>>> I am exploring hierarchical and syntactic models and I have two
>>> questions:
>>>
>>> 1. Is it possible to decode using mosesserver instead of moses_chart?
>>> According to "/mosesdecoder/bin/mosesserver --help", one can choose
>>> different

[Moses-support] EMS: pruning question

2016-06-28 Thread Tomasz Gawryl
Hi!

 

I have one question about pruning the translation table during EMS training.
Which method is better: SALM, or pruning based on low scores (described here:
http://www.statmt.org/moses/?n=Advanced.RuleTables#ntoc5)?

SALM filtering takes considerably more time than score-based pruning during
model creation, and I'm not sure what to do:

 

1. Keep both,

2. Choose one of them (which one, and why? :) )

3. Don't prune (why?)
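For what it's worth, score-based pruning is cheap to experiment with outside EMS too. A rough sketch that drops phrase-table entries whose chosen translation score falls below a threshold (the score column index and the toy table below are my assumptions; check the score order of your own phrase table before relying on this):

```python
def prune_phrase_table(lines, threshold=1e-4, score_index=2):
    """Keep phrase-table lines whose score at score_index is >= threshold.
    Fields are separated by ' ||| '; the scores sit in field 2."""
    kept = []
    for line in lines:
        fields = line.split(" ||| ")
        scores = [float(s) for s in fields[2].split()]
        if scores[score_index] >= threshold:
            kept.append(line)
    return kept

# Toy 4-score table; the second entry's third score is below the threshold:
table = [
    "the house ||| das haus ||| 0.5 0.3 0.4 0.2",
    "the house ||| der haus ||| 0.5 0.3 0.00001 0.2",
]
assert prune_phrase_table(table) == [table[0]]
```

That makes it easy to measure on a dev set what low-score pruning alone costs before deciding whether SALM filtering on top of it is worth the extra time.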

 

Thank you in advance for any suggestions :)

 

Regards,

Thomas

 
