Hi, I trained a phrase-based system from a low-resource language to English and got a BLEU score of *13.6633*, as reported in the moses.ini that is generated at the end of tuning/development. However, when I tested on the same dev set and computed BLEU against the English side of that dev set, I only got *3.69*. I then did a manual grid search over the parameter space in that moses.ini and got a BLEU of *3.77* at best. Both recasing and tokenization were applied to the dev set I computed BLEU on. What could explain why the BLEU score reported in moses.ini, which is derived from the dev set, doesn't match the one I computed on the same dev set?
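For context, here is a minimal sketch of the kind of comparison involved. It is not the Moses scorer: it is a plain corpus-level BLEU in Python with one reference per sentence and no smoothing, and the function names and toy sentences are made up for illustration. It only shows that the same output can score very differently when its casing and tokenization do not match the reference.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per sentence, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_toks, ref_toks = hyp.split(), ref.split()
        hyp_len += len(hyp_toks)
        ref_len += len(ref_toks)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_toks, n)
            ref_counts = ngrams(ref_toks, n)
            # Clipped counts: an n-gram is credited at most as often
            # as it occurs in the reference sentence.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = min(1.0, math.exp(1.0 - ref_len / hyp_len))
    return 100.0 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Toy data (illustrative only): a tokenized, cased reference.
    refs = ["The cat sat on the mat .", "It is raining in London today ."]
    # Same system output, once tokenized and cased like the reference ...
    hyps_matched = ["The cat sat on the mat .", "It is raining in Paris today ."]
    # ... and once lowercased and untokenized.
    hyps_mismatched = ["the cat sat on the mat.", "it is raining in paris today."]
    print("matched preprocessing:   ", corpus_bleu(hyps_matched, refs))
    print("mismatched preprocessing:", corpus_bleu(hyps_mismatched, refs))
```

The second score comes out lower than the first even though the underlying translation is identical, since only the preprocessing differs.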
Thanks. - Angli
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support