Hi Jun, all:

I just released a new version of MultEval (v0.5.1) that no longer produces the strange NaNs. Instead, it prints a warning telling you that you're using a single optimizer run, so that value cannot be calculated, and that any conclusions you draw from these numbers may be unreliable and not reproducible.
I think Barry has correctly identified the reason for the differences in scores: comparing tuning-set scores to test-set scores.

I have regression tests against Moses' multi-bleu.pl that make sure results come out the same when expected. You can see such tests being run automatically here: https://travis-ci.org/jhclark/multeval/jobs/4355357/#L310. The only time Moses and MultEval might disagree is when smoothing n-gram orders that have zero matches -- I use Papineni's method instead of the multi-bleu.pl method. Such differences are generally rare and unimportant.

Cheers,
Jon

On Thu, Jan 24, 2013 at 6:01 AM, Rico Sennrich <rico.sennr...@gmx.ch> wrote:
> Barry Haddow <bhaddow@...> writes:
> > The NaNs in the MultiEval output are a bit strange. I'm not familiar
> > with this tool, but Moses contains multi-bleu.pl (in scripts/generic),
> > which you can also use to calculate Bleu.
> >
> > cheers - Barry
>
> s_opt is the variance across different optimizer runs. MultEval is
> intended to deal with optimizer instability, and is most useful if you
> run your optimizer (e.g. MERT) multiple times and create multiple
> hypotheses per system.
>
> With only one hypothesis per system, there is no way to calculate
> optimizer variance, and you get NaN for this field.
>
> Rico
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
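[Editor's note: to illustrate why zero-match n-gram orders need smoothing at all, here is a minimal sketch. It is not MultEval's or multi-bleu.pl's actual code; the `bleu` function and the epsilon-substitution smoothing are hypothetical stand-ins showing that any smoothing convention changes the score only when some order has zero matches, which is why the tools agree otherwise.]

```python
import math

def bleu(precisions, brevity_penalty=1.0, smooth=False, eps=1e-9):
    """Geometric mean of n-gram precisions p_1..p_N times the brevity penalty.

    If any order has zero matches, the unsmoothed geometric mean collapses
    to 0. Substituting a tiny epsilon is one (hypothetical) smoothing
    convention; real tools differ in exactly how they handle this case.
    """
    logs = []
    for p in precisions:
        if p == 0.0:
            if not smooth:
                return 0.0          # one zero-match order zeroes the whole score
            p = eps                 # hypothetical smoothing, for illustration only
        logs.append(math.log(p))
    return brevity_penalty * math.exp(sum(logs) / len(logs))

precisions = [0.5, 0.25, 0.1, 0.0]  # zero 4-gram matches
print(bleu(precisions))                 # → 0.0
print(bleu(precisions, smooth=True))    # small but nonzero
```

With no zero-match order, `smooth` has no effect, which matches the observation above that the two tools disagree only in that rare case.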
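[Editor's note: Rico's point about s_opt can be sketched in a few lines. This is not MultEval's implementation; `optimizer_variance` and the sample scores are illustrative, but it shows why a single optimizer run yields NaN: a sample standard deviation needs at least two observations.]

```python
import math
import statistics

def optimizer_variance(scores):
    """Sample standard deviation of a metric across optimizer runs (s_opt).

    With fewer than two runs there is nothing to estimate, so return NaN,
    mirroring the warning the new MultEval version prints in that case.
    """
    if len(scores) < 2:
        return math.nan
    return statistics.stdev(scores)

runs = [25.1, 24.7, 25.4]            # hypothetical BLEU from three MERT runs
print(optimizer_variance(runs))      # spread due to optimizer instability
print(optimizer_variance([25.1]))    # → nan: single run, variance undefined
```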