Hi Daniel

BLEU scores do vary from test set to test set, but the scores you report are
much higher than usual.

The most likely explanation is that some of your test set is also included in
your training set, which would inflate the scores.
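A quick way to check is to count how many test sentences occur verbatim in
the training data. Here's a minimal sketch in Python (train.en and test.en
are placeholder file names; it assumes plain text with one sentence per
line):

#!/usr/bin/env python
# Count test sentences that also occur verbatim in the training data.
# The file names are placeholders - substitute your own corpus files.

def load_sentences(path):
    # One sentence per line; strip surrounding whitespace.
    with open(path) as f:
        return [line.strip() for line in f]

train = set(load_sentences("train.en"))   # source side of the training corpus
test = load_sentences("test.en")          # source side of the test set

overlap = sum(1 for sentence in test if sentence in train)
print("%d of %d test sentences appear in the training data (%.1f%%)"
      % (overlap, len(test), 100.0 * overlap / len(test)))

This only catches exact matches, so run it on the same tokenised, lowercased
data you trained on; otherwise casing or tokenisation differences can hide
the overlap.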

cheers - Barry

On Thursday 26 April 2012 19:18:33 Daniel Schaut wrote:
> Hi all,
> 
> I'm running some experiments for my thesis, and a more experienced user
> has told me that the BLEU/METEOR scores my MT engine achieves are too
> good to be true. Since this is the very first MT engine I've ever built
> and I have no experience interpreting scores, I really don't know what
> to make of them. The first test set achieves a BLEU score of 0.6508
> (v13). METEOR's final score is 0.7055 (v1.3, exact, stem, paraphrase).
> A second test set gave a slightly lower BLEU score of 0.6267 and a
> METEOR score of 0.6748.
> 
> Here are some basic facts about my system:
> Decoding direction: EN-DE
> Training corpus: 1.8 million sentences
> Tuning runs: 5
> Test sets: a) 2,000 sentences, b) 1,000 sentences (both in-domain)
> LM type: trigram
> TM type: unfactored
> 
> I'm now trying to figure out whether these scores are realistic at all,
> as various papers report far lower BLEU scores, e.g. Koehn and Hoang
> 2011. Any comments on this decoding direction and the scores one can
> expect would be much appreciated.
> 
> Best,
> Daniel
> 
 
--
Barry Haddow
University of Edinburgh
+44 (0) 131 651 3173

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
