Hi Jun, all:

I just released a new version of MultEval (v0.5.1) that no longer produces the strange NaNs. Instead, it prints a warning telling you that you're using a single optimizer run, so that value cannot be calculated, and that any conclusions you draw from these numbers may be unreliable and not reproducible.
I think Barry has correctly identified the reason for the differences in scores: comparing tuning-set scores to test-set scores.

I have regression tests against Moses' multi-bleu.pl that make sure results come out the same when expected. You can see such tests being run automatically here: https://travis-ci.org/jhclark/multeval/jobs/4355357/#L310. The only time Moses and MultEval might disagree is when smoothing n-gram orders that have zero matches -- I use Papineni's method instead of the multi-bleu.pl method. Such differences are generally rare and unimportant.

Cheers,
Jon

On Thu, Jan 24, 2013 at 6:01 AM, Rico Sennrich <rico.sennr...@gmx.ch> wrote:
> Barry Haddow <bhaddow@...> writes:
> > The NaNs in the MultiEval output are a bit strange. I'm not familiar
> > with this tool, but Moses contains multi-bleu.pl (in scripts/generic),
> > which you can also use to calculate Bleu.
> >
> > cheers - Barry
>
> s_opt is the variance across different optimizer runs. MultEval is
> intended to deal with optimizer instability, and is most useful if you
> run your optimizer (e.g. MERT) multiple times and create multiple
> hypotheses per system.
>
> With only one hypothesis per system, there is no way to calculate
> optimizer variance, and you get NaN for this field.
>
> Rico
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
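[Editor's note: to illustrate why zero-match n-gram orders need smoothing at all, here is a minimal sketch. It is not MultEval's or multi-bleu.pl's actual code; the `bleu` function and the epsilon-substitution smoothing are hypothetical stand-ins showing that any smoothing convention changes the score only when some order has zero matches, which is why the tools agree otherwise.]

```python
import math

def bleu(precisions, brevity_penalty=1.0, smooth=False, eps=1e-9):
    """Geometric mean of n-gram precisions p_1..p_N times the brevity penalty.

    If any order has zero matches, the unsmoothed geometric mean collapses
    to 0. Substituting a tiny epsilon is one (hypothetical) smoothing
    convention; real tools differ in exactly how they handle this case.
    """
    logs = []
    for p in precisions:
        if p == 0.0:
            if not smooth:
                return 0.0          # one zero-match order zeroes the whole score
            p = eps                 # hypothetical smoothing, for illustration only
        logs.append(math.log(p))
    return brevity_penalty * math.exp(sum(logs) / len(logs))

precisions = [0.5, 0.25, 0.1, 0.0]  # zero 4-gram matches
print(bleu(precisions))                 # → 0.0
print(bleu(precisions, smooth=True))    # small but nonzero
```

With no zero-match order, `smooth` has no effect, which matches the observation above that the two tools disagree only in that rare case.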
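[Editor's note: Rico's point about s_opt can be sketched in a few lines. This is not MultEval's implementation; `optimizer_variance` and the sample scores are illustrative, but it shows why a single optimizer run yields NaN: a sample standard deviation needs at least two observations.]

```python
import math
import statistics

def optimizer_variance(scores):
    """Sample standard deviation of a metric across optimizer runs (s_opt).

    With fewer than two runs there is nothing to estimate, so return NaN,
    mirroring the warning the new MultEval version prints in that case.
    """
    if len(scores) < 2:
        return math.nan
    return statistics.stdev(scores)

runs = [25.1, 24.7, 25.4]            # hypothetical BLEU from three MERT runs
print(optimizer_variance(runs))      # spread due to optimizer instability
print(optimizer_variance([25.1]))    # → nan: single run, variance undefined
```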