Re: [Moses-support] Fwd: Binarization fails with the Segmentation Fault error

Sašo Kuntaric Thu, 30 Jun 2016 13:49:11 -0700

Hi all,

I would like to ask one more question. When you say that my reference only
has the surface form, are you talking about the "tuning corpus", which in
the case of my command


~/mosesdecoder/scripts/training/mert-moses.pl \
  ~/working/IT_corpus/TMX/txt/factored_corpus/singles/tuning_corpus.tagged.clean.en \
  ~/working/IT_corpus/TMX/txt/factored_corpus/singles/tuning_corpus.tagged.clean.sl \
  ~/mosesdecoder/bin/moses \
  ~/working/IT_corpus/TMX/txt/factored_corpus/singles/test/model/moses.ini \
  --mertdir ~/mosesdecoder/bin/ --decoder-flags="-threads all"

are tuning_corpus.tagged.clean.en and tuning_corpus.tagged.clean.sl? Can
tuning be done with files that only contain surface forms? Will the
results be comparable to those of tuning with a factored tuning corpus?
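A quick way to answer this for a given file is to count how many factors each token carries. The sketch below is a minimal Python helper, assuming the usual Moses convention of separating factors inside a token with "|" (as in the input use|NN of|IN light|JJ quoted further down). The names factor_counts and surface_only are hypothetical, not part of Moses; surface_only keeps only factor 0, which is one way to derive a surface-only reference from a factored file.

```python
from collections import Counter

FACTOR_SEP = "|"  # assumed Moses convention: factors inside a token are separated by "|"


def factor_counts(lines):
    """Histogram: number of factors per token -> how many tokens have it."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            counts[token.count(FACTOR_SEP) + 1] += 1
    return counts


def surface_only(line):
    """Keep only factor 0 (the surface form) of every token in a line."""
    return " ".join(token.split(FACTOR_SEP)[0] for token in line.split())
```

For example, opening tuning_corpus.tagged.clean.sl and passing the file object to factor_counts shows at a glance whether the reference has one factor per token or several, and whether the count is consistent across the file.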

Models with one translation table work fine with corpora that contain only
surface forms, while models with two tables do not. Is that expected behavior?

I checked all my files and everything seems fine: the phrase table and
language model files look OK, there is almost 400 GB of free space, and my
tuning set contains aligned source and target files.

The only strange thing I could find in the tuning folder was the line
"UnknownWordPenalty0 UNTUNEABLE" in the features.list file; everything
else has values, although some of them are zero.

Best regards and thanks again for all the help,

Saso


2016-06-30 10:00 GMT+02:00 Hieu Hoang <hieuho...@gmail.com>:

>
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 30 June 2016 at 08:11, Sašo Kuntaric <saso.kunta...@gmail.com> wrote:
>
>> Hi Hieu,
>>
>> Thanks for the tip; unfortunately, it didn't solve my problem. I tried
>> creating a very simple model with the command:
>>
>> ~/mosesdecoder/scripts/training/train-model.perl -root-dir test \
>>   -corpus ~/working/IT_corpus/TMX/txt/factored_corpus/singles/corpus.tagged.clean \
>>   -f en -e sl \
>>   -lm 0:3:$HOME/working/IT_corpus/TMX/txt/factored_corpus/language_model/IT_corpus_surface.blm.sl \
>>   -lm 2:3:$HOME/working/IT_corpus/TMX/txt/factored_corpus/language_model/IT_corpus_parts.blm.sl \
>>   --translation-factors 0-0,2 -external-bin-dir ~/mosesdecoder/tools --cores 32
>>
>> However, the results of the tuning are still the same: all zeros after
>> the second run.
>>
>> Do I have to use a factored or unfactored corpus for tuning?
>>
>> There was one suggestion I found online, namely to add something like
>>
>> [output-factors]
>> 0
>> 1
>> 2
>>
>> to moses.ini. I tried it, but it made no difference. Should I explore it 
>> further?
>>
>
> No, this will output all the other factors as well as the surface form.
> I'm sure your reference only has the surface form.
>
> Are you sure your phrase table and language models contain data? And does
> your tuning set contain data for the input and reference? Is there plenty
> of space on your hard disk?
>
> I would suggest you look at the files the tuning process creates and debug
> it. It's likely to be a data problem.
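Along the lines suggested here, a first data check is that the tuning input and reference are non-empty, have the same number of lines, and contain no blank lines on either side. A minimal Python sketch with hypothetical helper names; it covers only these basic properties, not the contents of the files that mert-moses.pl writes to its working directory.

```python
from pathlib import Path


def check_parallel_lines(src, ref):
    """Return a list of basic problems found in a parallel tuning set."""
    problems = []
    if not src:
        problems.append("source is empty")
    if not ref:
        problems.append("reference is empty")
    if len(src) != len(ref):
        problems.append(f"line count mismatch: {len(src)} vs {len(ref)}")
    # zip stops at the shorter side; the mismatch itself is reported above
    for i, (s, r) in enumerate(zip(src, ref), start=1):
        if not s.strip() or not r.strip():
            problems.append(f"line {i}: blank source or reference")
    return problems


def check_parallel_files(src_path, ref_path):
    """Read two UTF-8 text files and run check_parallel_lines on them."""
    def read(path):
        return Path(path).read_text(encoding="utf-8").splitlines()

    return check_parallel_lines(read(src_path), read(ref_path))
```

An empty list means the files passed these checks; anything else points to the data problem before the decoder is involved at all.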
>
>
>> If anyone has another suggestion, please let me know.
>>
>> Best regards,
>>
>> Sašo
>>
>>
>> 2016-06-29 15:44 GMT+02:00 Hieu Hoang <hieuho...@gmail.com>:
>>
>>> I don't know the exact problem, but your factored model looks too
>>> complicated, so the tuning algorithm kinda just gives up.
>>> I would try a very simple model first, e.g.
>>>    translate 0 -> 0,1,2,3
>>> or
>>>    translate 0,1 -> 0,1,2,3
>>> Once you see that working correctly, add a generation model.
>>>
>>> You have to do this bit by bit and see what happens.
>>>
>>>
>>> On 28/06/2016 20:44, Sašo Kuntaric wrote:
>>>
>>> Well, I installed Moses only a few months ago, so it should be the
>>> latest version.
>>>
>>> I find it really strange. I have tried everything (binarizing tables,
>>> which finishes with no problems; using the --no-filter-phrase-table
>>> parameter; adding language models for all the factors I have, which gave
>>> me a segmentation fault), and I always get the same result. Tuning stops
>>> after two runs and all the weights get set to zero with the message
>>> (2) BEST at 2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 => 0 at Tue Jun 28 17:38:43 CEST 2016
>>> None of the weights changed more than 1e-05. Stopping.
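The stopping message suggests a convergence test of roughly the following shape: compare each weight with its value from the previous iteration and stop if none moved by more than 1e-05. This is a simplified illustration under that assumption, not the actual mert-moses.pl code; it shows why two successive iterations that return the same weights (for example, all zeros) end the run immediately.

```python
def weights_converged(previous, current, tolerance=1e-05):
    """Return True if no weight changed by more than `tolerance`.

    Simplified illustration of the stopping rule reported in the log
    ("None of the weights changed more than 1e-05"); not Moses code.
    """
    if len(previous) != len(current):
        raise ValueError("weight vectors must have the same length")
    return all(abs(p - c) <= tolerance for p, c in zip(previous, current))
```

With an all-zero vector coming back twice, weights_converged([0.0, 0.0], [0.0, 0.0]) is True, so the loop stops at once.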
>>>
>>> The translation models themselves are created with no issues. If I have
>>> one translation table, I can tune them with an unfactored corpus, but as
>>> soon as I use a factored one, everything goes south. If I have two
>>> translation tables, I cannot tune with an unfactored file, since it wants
>>> the stated number of factors.
>>>
>>> I would really appreciate it if someone has an idea of what to do.
>>>
>>> Best regards,
>>>
>>> Saso
>>>
>>> 2016-06-27 14:45 GMT+02:00 Rajen Chatterjee <rajen.k.chatter...@gmail.com>:
>>>
>>>> Hi, in the past I had a similar problem: the weights were set to 0 after
>>>> one iteration of tuning. I do not know the cause of this, but if I
>>>> remember correctly, when I used another version of Moses (I think
>>>> Release-3.0), I didn't have this problem.
>>>>
>>>> On Sun, Jun 26, 2016 at 1:40 PM, Sašo Kuntaric <saso.kunta...@gmail.com> wrote:
>>>>
>>>>> Hi all again,
>>>>>
>>>>> A little more info, in case someone has any ideas, as I still haven't
>>>>> been able to figure it out.
>>>>>
>>>>> When I tune models that only have one translation table, it works
>>>>> fine, but only with a non-factored tuning corpus. If I use a factored
>>>>> tuning corpus, Moses does one run and sets all weights to zero. If I have
>>>>> two translation tables, Moses doesn't do the tuning with the non-factored
>>>>> corpus, as it is missing factors. If I use the factored corpus, I get a
>>>>> similar result as above: tuning stops after one run and sets all weights
>>>>> to zero. A similar error was mentioned a few months back, and the
>>>>> solution was to turn off MBR decoding; however, I am not using it. I
>>>>> just use the command:
>>>>>
>>>>> ~/mosesdecoder/scripts/training/mert-moses.pl \
>>>>>   ~/working/IT_corpus/TMX/txt/tuning_corpus/tuning_corpus.tagged.en \
>>>>>   ~/working/IT_corpus/TMX/txt/tuning_corpus/tuning_corpus.tagged.sl \
>>>>>   ~/mosesdecoder/bin/moses \
>>>>>   ~/working/IT_corpus/TMX/txt/factored_corpus/complex/model/moses.ini \
>>>>>   --mertdir ~/mosesdecoder/bin/ --decoder-flags="-threads 32"
>>>>>
>>>>> Is there something I am missing? Do I have to add anything else for
>>>>> tuning a factored model?
>>>>>
>>>>> Any help will be greatly appreciated.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Saso
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Sašo Kuntaric <saso.kunta...@gmail.com>
>>>>> Date: 2016-06-20 19:36 GMT+02:00
>>>>> Subject: Binarization fails with the Segmentation Fault error
>>>>> To: moses-support <moses-support@mit.edu>
>>>>>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Me again (for the last time, I hope). I have successfully trained and
>>>>> tuned my factored model. Here are both moses.ini files:
>>>>>
>>>>> #########################
>>>>> ### MOSES CONFIG FILE ###
>>>>> #########################
>>>>>
>>>>> # input factors
>>>>> [input-factors]
>>>>> 0
>>>>> 1
>>>>>
>>>>> # mapping steps
>>>>> [mapping]
>>>>> 0 T 0
>>>>> 0 G 0
>>>>> 0 T 1
>>>>>
>>>>> [distortion-limit]
>>>>> 6
>>>>>
>>>>> # feature functions
>>>>> [feature]
>>>>> UnknownWordPenalty
>>>>> WordPenalty
>>>>> PhrasePenalty
>>>>> PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/morphgen/model/phrase-table.0-1.gz input-factor=0 output-factor=1
>>>>> PhraseDictionaryMemory name=TranslationModel1 num-features=4 path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/morphgen/model/phrase-table.1-2.gz input-factor=1 output-factor=2
>>>>> Generation name=GenerationModel0 num-features=2 path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/morphgen/model/generation.1-0,3.gz input-factor=1 output-factor=0,3
>>>>> Distortion
>>>>> KENLM name=LM0 factor=0 path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/language_model/IT_corpus_surface.blm.sl order=3
>>>>> KENLM name=LM1 factor=2 path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/language_model/IT_corpus_parts.blm.sl order=3
>>>>>
>>>>> # dense weights for feature functions
>>>>> [weight]
>>>>> # The default weights are NOT optimized for translation quality. You MUST tune the weights.
>>>>> # Documentation for tuning is here: http://www.statmt.org/moses/?n=FactoredTraining.Tuning
>>>>> UnknownWordPenalty0= 1
>>>>> WordPenalty0= -1
>>>>> PhrasePenalty0= 0.2
>>>>> TranslationModel0= 0.2 0.2 0.2 0.2
>>>>> TranslationModel1= 0.2 0.2 0.2 0.2
>>>>> GenerationModel0= 0.3 0
>>>>> Distortion0= 0.3
>>>>> LM0= 0.5
>>>>> LM1= 0.5
>>>>>
>>>>> # MERT optimized configuration
>>>>> # decoder /home/ksaso/mosesdecoder/bin/moses
>>>>> # BLEU 0 on dev /home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/tuning/tuning-corpus.tagged.en
>>>>> # We were before running iteration 2
>>>>> # finished Mon Jun 20 16:19:08 CEST 2016
>>>>> ### MOSES CONFIG FILE ###
>>>>> #########################
>>>>>
>>>>> # input factors
>>>>> [input-factors]
>>>>> 0
>>>>> 1
>>>>>
>>>>> # mapping steps
>>>>> [mapping]
>>>>> 0 T 0
>>>>> 0 G 0
>>>>> 0 T 1
>>>>>
>>>>> [distortion-limit]
>>>>> 6
>>>>>
>>>>> # feature functions
>>>>> [feature]
>>>>> UnknownWordPenalty
>>>>> WordPenalty
>>>>> PhrasePenalty
>>>>> PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/morphgen/model/phrase-table.0-1.gz input-factor=0 output-factor=1
>>>>> PhraseDictionaryMemory name=TranslationModel1 num-features=4 path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/morphgen/model/phrase-table.1-2.gz input-factor=1 output-factor=2
>>>>> Generation name=GenerationModel0 num-features=2 path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/morphgen/model/generation.1-0,3.gz input-factor=1 output-factor=0,3
>>>>> Distortion
>>>>> KENLM name=LM0 factor=0 path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/language_model/IT_corpus_surface.blm.sl order=3
>>>>> KENLM name=LM1 factor=2 path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/language_model/IT_corpus_parts.blm.sl order=3
>>>>>
>>>>> # dense weights for feature functions
>>>>>
>>>>> [threads]
>>>>> 16
>>>>> [weight]
>>>>>
>>>>> Distortion0= 0
>>>>> LM0= 0
>>>>> LM1= 0
>>>>> WordPenalty0= 0
>>>>> PhrasePenalty0= 0
>>>>> TranslationModel0= 0 0 0 0
>>>>> TranslationModel1= 0 0 0 0
>>>>> GenerationModel0= 0 0
>>>>> UnknownWordPenalty0= 1
>>>>>
>>>>> First of all, is it strange that I get all zeroes after tuning?
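One way to confirm what the optimizer produced is to read the [weight] section back. A minimal Python sketch, assuming the "Name= v1 v2 ..." layout shown in the files above; the helper names are hypothetical, and UnknownWordPenalty0 is excluded by default only because features.list marks it UNTUNEABLE, as mentioned at the top of this thread.

```python
def parse_weights(ini_text):
    """Parse the [weight] section of a moses.ini into {name: [floats]}.

    Assumes lines of the form "Name= v1 v2 ..."; blank lines and lines
    starting with "#" are skipped, and other sections are ignored.
    """
    weights = {}
    in_weight_section = False
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            in_weight_section = line == "[weight]"
            continue
        if in_weight_section and "=" in line:
            name, values = line.split("=", 1)
            weights[name.strip()] = [float(v) for v in values.split()]
    return weights


def all_tuned_weights_zero(weights, untuneable=("UnknownWordPenalty0",)):
    """True if every weight outside `untuneable` is exactly zero."""
    return all(
        v == 0.0
        for name, values in weights.items()
        if name not in untuneable
        for v in values
    )
```

Applied to the tuned file above, all_tuned_weights_zero returns True, which matches the "BEST at 2: 0 0 ... => 0" line and the "BLEU 0 on dev" comment.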
>>>>>
>>>>> My problem is that translation with this model is spectacularly slow
>>>>> (a few days to translate a couple of thousand words with a 2.4
>>>>> million-line corpus), so naturally I tried to binarize my phrase tables
>>>>> with the commands
>>>>>
>>>>> ~/mosesdecoder/bin/processPhraseTableMin \
>>>>>   -in ~/working/IT_corpus/TMX/txt/factored_corpus/morphgen/model/phrase-table.0-1.gz \
>>>>>   -nscores 4 -out ~/working/binarised_model/phrase-table.0-1
>>>>>
>>>>> and
>>>>>
>>>>> ~/mosesdecoder/bin/processPhraseTableMin \
>>>>>   -in ~/working/IT_corpus/TMX/txt/factored_corpus/morphgen/model/phrase-table.1-2.gz \
>>>>>   -nscores 4 -out ~/working/binarised_model/phrase-table.1-2
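Since -nscores 4 is expected to match the number of scores per entry (and num-features=4 in moses.ini), a cheap sanity check around binarization is to count the scores in the text tables. A sketch, assuming the standard text phrase-table layout "src ||| tgt ||| scores ||| ..."; it is a diagnostic only, the helper names are hypothetical, and it does not reproduce anything processPhraseTableMin does internally.

```python
import gzip
from collections import Counter


def score_counts(lines):
    """Histogram of how many scores each phrase-table entry carries.

    The third " ||| "-separated field is assumed to hold the
    space-separated feature scores; shorter lines count as malformed.
    """
    counts = Counter()
    for line in lines:
        fields = line.split(" ||| ")
        if len(fields) < 3:
            counts["malformed"] += 1
        else:
            counts[len(fields[2].split())] += 1
    return counts


def score_counts_gz(path, limit=100000):
    """Scan up to `limit` lines of a gzipped text phrase table."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return score_counts(line for _, line in zip(range(limit), f))
```

For phrase-table.0-1.gz and phrase-table.1-2.gz, a healthy result would contain only the key 4; any other count, or malformed entries, would be worth investigating before looking elsewhere.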
>>>>>
>>>>> The process itself finishes without errors and I can run the
>>>>> translation with the command
>>>>>
>>>>> ~/mosesdecoder/bin/moses -f \
>>>>>   /home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/morphgen/binarised_model/moses.ini
>>>>>
>>>>> But when I try to enter my text, I get the following:
>>>>>
>>>>>  Translating: use|NN of|IN light|JJ
>>>>> Line 1: Initialize search took 0.000 seconds total
>>>>> Segmentation fault (core dumped)
>>>>>
>>>>> When I try to filter my model, I get the same error. Any ideas what
>>>>> could be causing this?
>>>>>
>>>>> My final moses.ini file looks like this:
>>>>>
>>>>> # MERT optimized configuration
>>>>> # decoder /home/ksaso/mosesdecoder/bin/moses
>>>>> # BLEU 0 on dev /home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/tuning/tuning-corpus.tagged.en
>>>>> # We were before running iteration 2
>>>>> # finished Mon Jun 20 16:19:08 CEST 2016
>>>>> ### MOSES CONFIG FILE ###
>>>>> #########################
>>>>>
>>>>> # input factors
>>>>> [input-factors]
>>>>> 0
>>>>> 1
>>>>>
>>>>> # mapping steps
>>>>> [mapping]
>>>>> 0 T 0
>>>>> 0 G 0
>>>>> 0 T 1
>>>>>
>>>>> [distortion-limit]
>>>>> 6
>>>>>
>>>>> # feature functions
>>>>> [feature]
>>>>> UnknownWordPenalty
>>>>> WordPenalty
>>>>> PhrasePenalty
>>>>> PhraseDictionaryCompact name=TranslationModel0 num-features=4 path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/morphgen/binarised_model/phrase-table.0-1.minphr input-factor=0 output-factor=1
>>>>> PhraseDictionaryCompact name=TranslationModel1 num-features=4 path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/morphgen/binarised_model/phrase-table.1-2.minphr input-factor=1 output-factor=2
>>>>> Generation name=GenerationModel0 num-features=2 path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/morphgen/model/generation.1-0,3.gz input-factor=1 output-factor=0,3
>>>>> Distortion
>>>>> KENLM name=LM0 factor=0 path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/language_model/IT_corpus_surface.blm.sl order=3
>>>>> KENLM name=LM1 factor=2 path=/home/ksaso/working/IT_corpus/TMX/txt/factored_corpus/language_model/IT_corpus_parts.blm.sl order=3
>>>>>
>>>>> # dense weights for feature functions
>>>>>
>>>>> [threads]
>>>>> 16
>>>>> [weight]
>>>>>
>>>>> Distortion0= 0
>>>>> LM0= 0
>>>>> LM1= 0
>>>>> WordPenalty0= 0
>>>>> PhrasePenalty0= 0
>>>>> TranslationModel0= 0 0 0 0
>>>>> TranslationModel1= 0 0 0 0
>>>>> GenerationModel0= 0 0
>>>>> UnknownWordPenalty0= 1
>>>>>
>>>>> And one more question ... can I run a translation (with the
>>>>> ~/mosesdecoder/bin/moses command) multi-threaded?
>>>>>
>>>>> Thanks for all the help and best regards,
>>>>>
>>>>> Saso
>>>>>
>>>>>
>>>>> --
>>>>> Kind regards,
>>>>>
>>>>> Sašo
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> Moses-support@mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> -Regards,
>>>>  Rajen Chatterjee.
>>>>
>>>
>>
>
>



