Hi,
I’ve written a BLEU scoring tool called “sacreBLEU” that may be of use to
people here. The goal is to get people to start reporting WMT-matrix compatible
scores in their papers (i.e., scoring on detokenized outputs with a fixed
reference tokenization) so that numbers can be compared directly, in the spirit
of Rico Sennrich's multi-bleu-detok.pl. The nice part for you is that it
auto-downloads WMT datasets and makes it so you no longer have to deal with
SGML. You can install it via pip:
pip3 install sacrebleu
For starters, you can use it to easily download datasets:
sacrebleu -t wmt17 -l en-de --echo src > wmt17.en-de.en
sacrebleu -t wmt17 -l en-de --echo ref > wmt17.en-de.de
You don’t need to download the reference, though. You can just score against it
using sacreBLEU directly. After decoding and detokenizing, try:
cat output.detok.txt | sacrebleu -t wmt17 -l en-de
I have tested it, and it produces exactly the same numbers as Moses'
mteval-v13a.pl, the official scoring script for WMT, for all 153 WMT17 system
submissions (column BLEU-cased at http://matrix.statmt.org/). For example:
$ cat newstest2017.uedin-nmt.4722.en-de | sacrebleu -t wmt17 -l en-de
BLEU+case.mixed+lang.en-de+numrefs.1+smooth.exp+test.wmt17+tok.13a+version.1.1.4 = 28.30 59.9/34.0/21.8/14.4 (BP = 1.000 ratio = 1.026 hyp_len = 62873 ref_len = 61287)
This means numbers computed with it are directly comparable across papers. As
you can see, in addition to the score, it outputs a version string that records
the exact BLEU parameters used. The output string is compatible with the output
of multi-bleu.pl, so your old code for parsing the BLEU score out of
multi-bleu.pl should still work.
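In case you don't have such parsing code lying around, here is a minimal sketch of pulling the score and the version string out of an output line. It runs on the sample line from the uedin-nmt example above rather than on a live run, and it assumes the layout shown there (version string, then " = ", then the score). The variable names are just for illustration.

```shell
# Sample output line, copied from the uedin-nmt example in this message.
line='BLEU+case.mixed+lang.en-de+numrefs.1+smooth.exp+test.wmt17+tok.13a+version.1.1.4 = 28.30 59.9/34.0/21.8/14.4 (BP = 1.000 ratio = 1.026 hyp_len = 62873 ref_len = 61287)'

# The BLEU score is the first number after the first " = ".
score=$(printf '%s\n' "$line" | sed -E 's/^[^ ]+ = ([0-9.]+) .*/\1/')

# The version string is everything before the first " = ".
signature=${line%% = *}

echo "score:     $score"
echo "signature: $signature"
```

Recording the version string next to the score is probably worth doing, since it is what tells a reader which BLEU parameters were used.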
You can also use the tool in a backward-compatible mode with arbitrary
references, in the same way:
cat output.detok.txt | sacrebleu ref1 [ref2 …]
The official code is in sockeye (Amazon’s NMT system):
http://github.com/awslabs/sockeye/tree/master/contrib/sacrebleu
I will also likely maintain a clone here:
http://github.com/mjpost/sacreBLEU
matt
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support