Hello all,

I have created an English–Chinese MT engine with Moses, and I'm now evaluating its translation quality. I have an evaluation report produced by the MultiEval tool on about 1000 sentences, and I found that the BLEU score is much lower than the score reported by the mert-moses.pl script: MultiEval gives only 0.3, while mert-moses.pl gives 0.65.
MultiEval report:

            BLEU (s_sel/s_opt/p)  METEOR (s_sel/s_opt/p)  TER (s_sel/s_opt/p)   Length (s_sel/s_opt/p)
EMC DATA    29.0 (0.6/NaN/-)      31.7 (0.3/NaN/-)        57.1 (0.7/NaN/-)      100.4 (0.6/NaN/-)
TAUS DATA   21.8 (0.5/NaN/0.00)   28.1 (0.2/NaN/0.00)     61.8 (0.6/NaN/0.00)   97.5 (0.6/NaN/0.00)

Top unmatched hypothesis words according to METEOR:
[的 x 341, , x 177, 在 x 117, " x 91, 和 x 85, 中 x 84, 到 x 84, 将 x 74, / x 65, 一个 x 65]
[的 x 436, , x 273, 在 x 163, 将 x 85, 中 x 82, 时 x 71, 上 x 65, 以 x 54, 为 x 52, 数据 x 50]
[的 x 400, , x 197, 在 x 139, 一个 x 91, 数据 x 89, 将 x 89, 是 x 85, “ x 85, 和 x 82, 数据域 x 77]
[的 x 369, , x 227, 在 x 151, Domain x 139, Data x 136, 数据 x 115, 上 x 96, 中 x 93, 将 x 86, 消除 x 83]

I have some questions regarding this issue:
1. What causes this discrepancy?
2. Has anyone else had a similar experience?
3. Is this normal?
4. Which tool do you recommend for MT evaluation?
5. How can I improve the engine based on the MultiEval report?

Any comments or suggestions are welcome ~

Thanks,
Jun
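One way I've tried to sanity-check a discrepancy like this is to recompute corpus BLEU directly on the same tokenized hypothesis and reference files that both tools see, so any difference has to come from the inputs (tokenization, segmentation, which test set was scored) rather than the metric itself. Below is a minimal stdlib-only sketch of standard corpus-level BLEU (clipped n-gram precisions plus brevity penalty); the toy sentences are invented for illustration, and this is not the exact implementation inside MultiEval or mert-moses.pl:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times the brevity penalty.
    Assumes one reference per hypothesis, pre-tokenized."""
    clipped = [0] * max_n   # matched n-gram counts, clipped by the reference
    total = [0] * max_n     # total hypothesis n-gram counts
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            total[n - 1] += sum(h.values())
            clipped[n - 1] += sum(min(c, r[g]) for g, c in h.items())
    if min(clipped) == 0:   # some n-gram order has zero matches
        return 0.0
    log_p = sum(math.log(c / t) for c, t in zip(clipped, total)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_p)

# Toy example (invented): identical sentences should score 1.0.
hyp = [["it", "is", "a", "guide", "to", "action"]]
print(corpus_bleu(hyp, hyp))  # → 1.0
```

Note that for Chinese output the segmentation matters a lot: if one tool scores at the character level and the other at the word level (or the references were segmented differently), the BLEU numbers are not comparable, which may be part of what's going on here.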
_______________________________________________ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support