Re: [Moses-support] Cumulative BLEU scores
Dear Phi and Moses community,

@Phi, thanks for confirming that the 2nd row of BLEU up to the 4th order is our "normally-regarded" BLEU.

From looking through some sets, it seems that the cumulative BLEU declines roughly linearly with n-gram order, while the individual BLEU decays at some exponential rate. Possibly this is because higher-order n-grams are rarer to match than unigrams/bigrams.

So are we artificially inflating the numbers by over-counting the repeats (e.g. a unigram also appears inside a bigram) when we solely consider the cumulative BLEU, especially when the individual n-gram matches are decaying exponentially? Any input from a statistics pro?

Regards,
Nat

On Thu, Oct 27, 2016 at 3:46 AM, Philipp Koehn wrote:

> Hi,
>
> I think you are right - the first set of numbers are the n-gram precisions
> for each order of n-gram. The second set are the numbers that you get if
> you take the geometric mean of the n-gram precisions. Hence, the number
> under 4-gram is the BLEU score.
>
> The BLEU score is traditionally computed for 1-4 grams; the original BLEU
> paper discusses this. There was the expectation that if machine translation
> got better, we would use higher-order BLEU, but we never did.
>
> -phi
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
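Nat's observation above, that the cumulative row declines roughly linearly while the individual precisions decay roughly exponentially, is what a geometric mean does to a geometric decay: it keeps the exponential form but halves the decay rate, which can look nearly linear over orders 1-4. A minimal sketch with made-up values (the decay rate here is hypothetical, not fitted to the thread's numbers):

```python
import math

# Hypothetical individual n-gram precisions decaying geometrically:
# p_n = a * r**(n - 1). Both a and r are made-up illustration values.
a, r = 0.55, 0.5
individual = [a * r ** (n - 1) for n in range(1, 10)]

# Cumulative score at order N (with brevity penalty = 1):
# the geometric mean of p_1 .. p_N.
cumulative = [
    math.exp(sum(math.log(p) for p in individual[:n]) / n)
    for n in range(1, 10)
]

# For p_n = a * r**(n-1), the geometric mean works out to a * r**((N-1)/2):
# the same exponential form but with half the decay rate, hence the much
# slower, nearly linear-looking decline of the cumulative row.
for n in range(1, 10):
    assert abs(cumulative[n - 1] - a * r ** ((n - 1) / 2)) < 1e-12
```

So the cumulative scores are not linear, just exponential with a halved rate; over the first few orders the two are hard to tell apart by eye.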
Re: [Moses-support] Cumulative BLEU scores
Hi,

I think you are right - the first set of numbers are the n-gram precisions for each order of n-gram. The second set are the numbers that you get if you take the geometric mean of the n-gram precisions. Hence, the number under 4-gram is the BLEU score.

The BLEU score is traditionally computed for 1-4 grams; the original BLEU paper discusses this. There was the expectation that if machine translation got better, we would use higher-order BLEU, but we never did.

-phi

On Wed, Oct 26, 2016 at 12:44 AM, Nat Gillin wrote:

> Dear Moses community,
>
> Ah, I found out what "cumulative" means. The cumulative scores are the
> usual BLEU scores that we report, because they include the orders of
> n-grams before the desired order.
>
> The only odd numbers from mteval-v13a.pl are the individual BLEU scores.
> Is it right that the individual BLEU scores are the bp * weights *
> modified_precision for each order of n-gram? Are there corresponding
> papers that investigate these numbers?
>
> Regards,
> Nat
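Phi's description can be checked directly against the numbers reported earlier in the thread: since the logged brevity penalty is 0 for this system, each cumulative BLEU entry should be just the geometric mean of the individual precisions up to that order. A quick check (the printed precisions are rounded to 4 decimals, so agreement is up to that rounding):

```python
import math

# Individual BLEU n-gram precisions from mteval-v13a.pl (orders 1-4),
# as printed (rounded to 4 decimals) in the thread.
individual = [0.5415, 0.2972, 0.1752, 0.1025]

# Cumulative BLEU at orders 1-4 as reported by the script.
reported = [0.5415, 0.4012, 0.3044, 0.2318]

# Cumulative score at order N = exp(mean of the log precisions).
# BP = 1 here, since the reported penalty (log) is 0 for this system.
for n in range(1, 5):
    geo_mean = math.exp(sum(math.log(p) for p in individual[:n]) / n)
    # Agreement up to rounding of the printed individual precisions:
    assert abs(geo_mean - reported[n - 1]) < 1e-4
```

The 4-gram entry of the cumulative row (0.2318) is therefore the familiar headline BLEU score, exactly as Phi says.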
Re: [Moses-support] Cumulative BLEU scores
Dear Moses community,

Ah, I found out what "cumulative" means. The cumulative scores are the usual BLEU scores that we report, because they include the orders of n-grams before the desired order.

The only odd numbers from mteval-v13a.pl are the individual BLEU scores. Is it right that the individual BLEU scores are the bp * weights * modified_precision for each order of n-gram? Are there corresponding papers that investigate these numbers?

Regards,
Nat

On Tue, Oct 25, 2016 at 12:02 PM, Nat Gillin wrote:

> Dear Moses community,
>
> To make the question clearer: why does the cumulative score add the
> brevity penalty before taking the exponent at every order of n-gram,
> while the individual score only takes the brevity penalty into account at
> https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v13a.pl#L874
>
> Any pointers to papers describing the cumulative score would be nice =)
>
> Thanks in advance again,
> Nat
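For reference, the per-order numbers in the "Individual" row are modified (clipped) n-gram precisions in the sense of the original BLEU paper: each candidate n-gram's count is clipped to the maximum count observed in the references. A minimal single-reference sketch (toy sentences and whitespace tokenization, not mteval's actual tokenizer):

```python
from collections import Counter

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of one candidate against one reference."""
    cand_ngrams = Counter(
        tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)
    )
    ref_ngrams = Counter(
        tuple(reference[i:i + n]) for i in range(len(reference) - n + 1)
    )
    if not cand_ngrams:
        return 0.0
    # Clip each candidate n-gram count to its count in the reference.
    clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    return clipped / sum(cand_ngrams.values())

# Degenerate example from the BLEU paper: clipping caps the credit for "the".
cand = "the the the the the the the".split()
ref = "the cat is on the mat".split()
print(modified_precision(cand, ref, 1))  # 2/7: only two "the" in the reference
```

Whether mteval's "individual" numbers additionally fold in the brevity penalty and weights, as Nat asks, is a separate question about the script's bookkeeping; the precisions themselves are computed this way.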
Re: [Moses-support] Cumulative BLEU scores
Dear Moses community,

To make the question clearer: why does the cumulative score add the brevity penalty before taking the exponent at every order of n-gram, while the individual score only takes the brevity penalty into account at https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v13a.pl#L874

Any pointers to papers describing the cumulative score would be nice =)

Thanks in advance again,
Nat

On Tue, Oct 25, 2016 at 11:58 AM, Nat Gillin wrote:

> Dear Moses Community,
>
> When using mteval-v13a.pl, we note that the output looks like this:
>
> length ratio: 1.07303974221267 (1998/1862), penalty (log): 0
>
> NIST score = 5.0564  BLEU score = 0.2318 for system "Google"
>
> # Individual N-gram scoring
>         1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
>         ------  ------  ------  ------  ------  ------  ------  ------  ------
>  NIST:  4.4488  0.5554  0.0477  0.0045  0.      0.      0.      0.      0.      "Google"
>  BLEU:  0.5415  0.2972  0.1752  0.1025  0.0626  0.0354  0.0193  0.0085  0.0017  "Google"
>
> # Cumulative N-gram scoring
>         1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
>         ------  ------  ------  ------  ------  ------  ------  ------  ------
>  NIST:  4.4488  5.0043  5.0520  5.0564  5.0564  5.0564  5.0564  5.0564  5.0564  "Google"
>  BLEU:  0.5415  0.4012  0.3044  0.2318  0.1784  0.1362  0.1031  0.0754  0.0493  "Google"
>
> And at https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v13a.pl#L823,
> it calculates the cumulative score by accumulating the individual n-gram
> precisions: at each order of n-gram it adds to the running sum and
> normalizes it before calculating the cumulative score for that order.
>
> The question is: why does it add the brevity penalty (i.e. $len_score)?
>
> Also, is this score discussed in any paper?
>
> Thanks in advance for the clarifications!
>
> Regards,
> Nat
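The accumulation described above can be sketched as follows. In the standard BLEU definition, score_N = BP * exp((1/N) * sum of log p_n for n <= N), and adding log(BP) (mteval's $len_score) inside the exponent at every order is algebraically identical to multiplying each cumulative score by BP afterwards, so the cumulative row is just a BP-adjusted geometric mean at every order. A minimal sketch using the standard BLEU brevity penalty, not mteval's exact code:

```python
import math

def bleu_cumulative(precisions, hyp_len, ref_len, max_order=4):
    """Cumulative BLEU at orders 1..max_order.

    score_N = exp((1/N) * sum_{n<=N} log p_n + log BP), i.e. the log brevity
    penalty is added inside the exponent at every order, which equals
    multiplying the geometric mean of precisions by BP.
    """
    # Standard BLEU brevity penalty: BP = 1 if hyp is long enough,
    # else exp(1 - ref_len/hyp_len), so log BP = min(0, 1 - ref_len/hyp_len).
    log_bp = min(0.0, 1.0 - ref_len / hyp_len)
    scores, log_sum = [], 0.0
    for n in range(1, max_order + 1):
        log_sum += math.log(precisions[n - 1])
        scores.append(math.exp(log_sum / n + log_bp))
    return scores

# Thread's system: hyp/ref length ratio is 1998/1862 > 1, so log BP = 0
# and the brevity penalty leaves every cumulative score unchanged.
scores = bleu_cumulative([0.5415, 0.2972, 0.1752, 0.1025], 1998, 1862)
```

On this reading, adding $len_score at every order simply makes each cumulative entry a fully-formed BLEU-style score (BP included) truncated at that order, rather than a bare geometric mean.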