Very short sentences will give you high BLEU scores, and scoring against multiple references will boost them as well.

Miles
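To make the multiple-reference point concrete, here is a minimal sketch using NLTK's sentence-level BLEU (the toy sentences are invented for illustration, and NLTK's smoothing is not the same as mteval's defaults):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

hypothesis = "the cat sat on the mat".split()
ref_a = "the cat sat on the mat".split()        # matches the hypothesis exactly
ref_b = "a cat was sitting on the mat".split()  # a paraphrase

smooth = SmoothingFunction().method1

# One reference only: several n-grams go unmatched.
print(sentence_bleu([ref_b], hypothesis, smoothing_function=smooth))

# Two references: each n-gram may match either reference, so the
# score can only go up -- here it reaches 1.0 because ref_a matches.
print(sentence_bleu([ref_a, ref_b], hypothesis, smoothing_function=smooth))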
On Apr 26, 2012 8:13 PM, "John D Burger" <j...@mitre.org> wrote:

> I =think= I recall that pairwise BLEU scores for human translators are
> usually around 0.50, so anything much better than that is indeed suspect.
>
> - JB
>
> On Apr 26, 2012, at 14:18, Daniel Schaut wrote:
>
> > Hi all,
> >
> > I’m running some experiments for my thesis and I’ve been told by a more
> > experienced user that the achieved scores for BLEU/METEOR of my MT engine
> > were too good to be true. Since this is the very first MT engine I’ve ever
> > made and I am not experienced with interpreting scores, I really don’t know
> > how to assess them. The first test set achieves a BLEU score of 0.6508
> > (v13). METEOR’s final score is 0.7055 (v1.3, exact, stem, paraphrase). A
> > second test set indicated a slightly lower BLEU score of 0.6267 and a
> > METEOR score of 0.6748.
> >
> > Here are some basic facts about my system:
> > Decoding direction: EN-DE
> > Training corpus: 1.8 million sentences
> > Tuning runs: 5
> > Test sets: a) 2,000 sentences, b) 1,000 sentences (both in-domain)
> > LM type: trigram
> > TM type: unfactored
> >
> > I’m now trying to figure out if these scores are realistic at all, as
> > different papers report far lower BLEU scores, e.g. Koehn and Hoang
> > 2011. Any comments regarding the mentioned decoding direction and related
> > scores will be much appreciated.
> >
> > Best,
> > Daniel
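Incidentally, anyone wanting to cross-check numbers like these outside of the mteval-v13 script can recompute corpus-level BLEU in a few lines; a sketch using NLTK follows (the file names are hypothetical, and NLTK's unsmoothed BLEU over pre-tokenized text will not reproduce mteval-v13 exactly):

from nltk.translate.bleu_score import corpus_bleu

# Hypothetical file names: one tokenized sentence per line,
# hypotheses and references aligned line by line.
with open("testset.hyp", encoding="utf-8") as f:
    hypotheses = [line.split() for line in f]
with open("testset.ref", encoding="utf-8") as f:
    # corpus_bleu expects a list of reference sets per segment,
    # so each single reference is wrapped in its own list.
    references = [[line.split()] for line in f]

print("BLEU: %.4f" % corpus_bleu(references, hypotheses))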
_______________________________________________ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support