Hi Rico,

Thanks for the links. Following them, I tried decreasing the learning rate to
0.25 and started seeing numbers instead of nan in the log-likelihood. The
vocabulary files are not needed when using train_nplm.py.
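
For anyone hitting the same issue: the other remedy Rico mentions below,
gradient clipping, just caps the gradient magnitude before each update so a
single large step cannot blow the log-likelihood up to nan. A minimal NumPy
sketch of the idea (illustrative only; nplm's actual commented-out clipping
code may differ, and the threshold 5.0 is an arbitrary example value):

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Scale grad so its L2 norm is at most max_norm.

    Illustrative sketch of gradient norm clipping; max_norm=5.0 is an
    arbitrary example value, not nplm's setting.
    """
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad
```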

I restarted tuning, and the 'nan' scores for the bilingual LM disappeared
from the N-best lists as well. I'll post the new scores for the German-English
baseline.
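
For reference, the feature that produced the nan in an N-best line (like the
one quoted below) can be spotted by parsing the third '|||' field. This
hypothetical helper is my own sketch for illustration, not part of the Moses
tools:

```python
def find_nan_features(nbest_line):
    """Return names of features with a 'nan' score in a Moses N-best line.

    Expects the usual format: id ||| hypothesis ||| name= scores ... ||| total.
    Hypothetical helper for illustration, not part of the Moses tools.
    """
    fields = nbest_line.split(" ||| ")
    if len(fields) < 4:
        return []
    bad, current = [], None
    for tok in fields[2].split():
        if tok.endswith("="):
            current = tok[:-1]  # feature name, e.g. 'BLMcomb'
        elif tok == "nan" and current is not None:
            bad.append(current)
    return bad

line = "0 ||| a test ||| LM0= -685.828 BLMcomb= nan WordPenalty0= -76 ||| nan"
# find_nan_features(line) -> ['BLMcomb']
```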

Ergun

On Mon, Apr 15, 2019 at 3:43 PM Rico Sennrich <rico.sennr...@gmx.ch> wrote:

> Hello Ergun,
>
> we've had the 'nan' issue reported before (see
> https://moses-support.mit.narkive.com/hs8LwsnT/blingual-neural-lm-log-likelihood-nan
> and
> https://moses-support.mit.narkive.com/fklzlBiW/bilingual-lm-nan-nan-nan ).
>
> You can follow Nick's recommendation of lowering the learning rate, or try
> to enable gradient clipping (which is commented out in the code).
>
> I'm afraid nplm is no longer heavily used, so it's unlikely that somebody
> has fresh experience with it.
>
> best wishes,
> Rico
>
> On 15/04/2019 12:44, Ergun Bicici wrote:
>
>
> I found that training also produced 'nan' scores:
> Training NCE log-likelihood: nan.
>
> I used EMS training:
> [LM:comb]
> nplm-dir = "Programs/nplm/"
> order = 5
> source-window = 4
> bilingual-lm = yes
> bilingual-lm-settings = "--prune-source-vocab 100000 --prune-target-vocab 100000"
>
> I am re-running train_nplm.py.
>
> Ergun
>
> On Mon, Apr 15, 2019 at 2:26 PM Ergun Bicici <bic...@gmail.com> wrote:
>
>>
>> Dear moses-support,
>>
>> I tried the nplm model on the German-English baseline dataset (wget
>> http://www.statmt.org/wmt13/training-parallel-nc-v8.tgz), and it improved
>> the score from 0.2266 to 0.2317 BLEU.
>>
>> I tried the bilingual LM:
>>
>> http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc37
> However:
>> - the vocab files were not written at the end, so I used extract_training.py
>> to obtain them.
>> - I still obtained 'nan' scores from the bilingual LM model.
>> Error: "Not a label, not a score 'nan'. Failed to parse the scores string:
>> 0 ||| ... айта ... болатын .  ||| LexicalReordering0= -11.3723 -15.4848
>> -26.5152 -17.8301 -6.95664 -16.8553 -29.4425 -22.5538 OpSequenceModel0=
>> -403.825 99 22 45 5 Distortion0= -146 LM0= -685.828 BLMcomb= nan
>> WordPenalty0= -76 PhrasePenalty0= 53 TranslationModel0= -242.874 -179.189
>> -291.623 -342.085 ||| nan
>>
>> KENLM name=LM0 factor=0 path=en-kk/lm.corpus.tok.kk.6.blm.bin order=6
>> BilingualNPLM name=BLMcomb order=5 source_window=4
>> path=wmt19_en-kk/lm/comb.blm.2/train.10
>> source_vocab=wmt19_en-kk/lm/comb.blm.2/vocab.source
>> target_vocab=wmt19_en-kk/lm/comb.blm.2/vocab.target
>>
>> Therefore, this may be due to a bug in the Moses C++ code rather than the
>> input data or configuration.
>>
>> The documentation also appears to be out of sync regarding the "average
>> the <null> word embedding as per the instructions here
>> <http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#anchorNULL>"
>> part, since averageNullEmbedding.py asks for -i, -o, and -t.
>>
>> I found a related note in a paper by Barry Haddow at WMT'15 saying that
>> the model was not used in the final submission due to insignificant
>> differences.
>>
>> Do you have any recent results on the bilingual LM model?
>>
>> --
>>
>> Regards,
>> Ergun
>>
>>
>>
>
> --
>
> Regards,
> Ergun
>
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>


-- 

Regards,
Ergun
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
