Re: [Moses-support] BLEU Score Variance: Which score to use?

Marcin Junczys-Dowmunt Sun, 21 Jun 2015 23:24:07 -0700

Hm. That's interesting. The language should not matter.

1) Do not report results without tuning. They are meaningless. There is 
a whole thread on that, look for "Major bug found in Moses". If you 
ignore the trollish aspects, it contains many good descriptions of why 
this is a mistake.


2) Assuming it was the same data every time (was it?), I do not quite 
see where the variance is coming from without tuning. This rather 
suggests you have something weird in your pipeline. Mgiza is the only 
stochastic element there, but its results are usually quite consistent. 
For the same weights in your ini-file you should get very similar 
results. Tuning is the part that introduces instability, but even then 
differences this large would be on the extreme end, though possible.
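On the original question of whether to report the average or the best score, here is a minimal sketch of summarizing repeated runs as mean +/- sample standard deviation instead of picking the best run. This follows the spirit of the Clark et al. paper linked further down. It assumes Python 3's standard `statistics` module, and `summarize_runs` is a hypothetical helper name. The inputs are the three untuned scores from the question and serve only to illustrate the arithmetic; per point 1, numbers you actually report should come from tuned systems.

```python
from statistics import mean, stdev

def summarize_runs(scores):
    """Return (mean, sample standard deviation) of BLEU scores from repeated runs.

    Hypothetical helper for illustration; stdev() uses the n-1 denominator.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate variance")
    return mean(scores), stdev(scores)

# The three BLEU scores from the question (untuned, illustration only).
runs = [17.84, 16.51, 15.33]
avg, sd = summarize_runs(runs)
print(f"BLEU = {avg:.2f} +/- {sd:.2f} over {len(runs)} runs")
# prints: BLEU = 16.56 +/- 1.26 over 3 runs
```

A standard deviation of about 1.3 BLEU on a mean of about 16.6 is the "quite high" variance mentioned in the thread.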

On 22.06.2015 08:12, Hokage Sama wrote:
> Thanks Marcin. It's for a new resource-poor language, so I only 
> trained it with what I could collect so far (i.e. only 190,630 words 
> of parallel data). I retrained the entire system each time without 
> any tuning.
>
> On 22 June 2015 at 01:00, Marcin Junczys-Dowmunt <junc...@amu.edu.pl> wrote:
>
>     Hi,
>     I think the average is OK; your variance, however, is quite high.
>     Did you retrain the entire system or just optimize parameters a
>     couple of times?
>
>     Two useful papers on the topic:
>
>     https://www.cs.cmu.edu/~jhclark/pubs/significance.pdf
>     http://www.mt-archive.info/MTS-2011-Cettolo.pdf
>
>
>     On 22.06.2015 02:37, Hokage Sama wrote:
>     > Hi,
>     >
>     > Since MT training is non-convex and thus the BLEU score varies,
>     > which score should I use for my system? I trained my system three
>     > times using the same data and obtained the three different scores
>     > below. Should I take the average or the best score?
>     >
>     > BLEU = 17.84, 49.1/22.0/12.5/7.5 (BP=1.000, ratio=1.095, hyp_len=3952, ref_len=3609)
>     > BLEU = 16.51, 48.4/20.7/11.4/6.5 (BP=1.000, ratio=1.093, hyp_len=3945, ref_len=3609)
>     > BLEU = 15.33, 48.2/20.1/10.3/5.5 (BP=1.000, ratio=1.087, hyp_len=3924, ref_len=3609)
>     >
>     > Thanks,
>     > Hilton
>     >
>     >
>     > _______________________________________________
>     > Moses-support mailing list
>     > Moses-support@mit.edu <mailto:Moses-support@mit.edu>
>     > http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
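The score lines quoted above appear to be in the output format of Moses' `multi-bleu.perl`. Corpus BLEU (Papineni et al., 2002) is the brevity penalty BP times the geometric mean of the 1- to 4-gram precisions. BP is 1 when the hypothesis length c exceeds the reference length r and exp(1 - r/c) otherwise. The sketch below recombines the first line's numbers using a hypothetical helper, `bleu_from_parts`. Because the printed precisions are rounded to one decimal, recomputed scores can be off in the last digit.

```python
import math

def bleu_from_parts(precisions, hyp_len, ref_len):
    """Recombine corpus BLEU (in %) from n-gram precisions (in %) and lengths.

    Hypothetical helper for illustration; uses uniform weights over the n-gram orders.
    """
    # Brevity penalty: 1 if the hypothesis is longer than the reference,
    # otherwise exp(1 - r/c).
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    # Geometric mean of the n-gram precisions (average of their logs).
    log_avg = sum(math.log(p / 100) for p in precisions) / len(precisions)
    return 100 * bp * math.exp(log_avg), bp, hyp_len / ref_len

# First run from the question: 49.1/22.0/12.5/7.5, hyp_len=3952, ref_len=3609.
score, bp, ratio = bleu_from_parts([49.1, 22.0, 12.5, 7.5], hyp_len=3952, ref_len=3609)
print(f"BLEU = {score:.2f} (BP={bp:.3f}, ratio={ratio:.3f})")
# prints: BLEU = 17.84 (BP=1.000, ratio=1.095)
```

Since BP = 1.000 in all three runs, the spread between 15.33 and 17.84 comes entirely from the n-gram precisions. Most of it comes from the 3- and 4-gram precisions (12.5/7.5 versus 10.3/5.5).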

