Re: [Moses-support] Cumulative BLEU scores

2016-10-26 Thread Nat Gillin
Dear Phi and Moses community,

@Phi, thanks for confirming that the 4-gram entry in the second (cumulative)
row is what we normally regard as the BLEU score.

From looking through some test sets, it seems that the cumulative BLEU
declines roughly linearly with n-gram order while the individual BLEU decays
at something like an exponential rate. Possibly this is because higher-order
n-grams are much rarer to match than unigrams/bigrams.

So are we artificially inflating the numbers by overcounting repeated
matches (e.g. a matched unigram also appears inside a matched bigram) when we
solely consider the cumulative BLEU, especially when the individual n-gram
matches decay exponentially?
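
For what it's worth, the cumulative row can be reproduced from the individual row directly: the cumulative BLEU at order N is the geometric mean of the first N individual precisions, times the brevity penalty (which is 1 in this run, since the reported log penalty is 0). Here is a minimal sketch of that standard definition, not the mteval code itself, using the rounded numbers from the output quoted below (so the last digit can be off by one):

```python
import math

# Individual BLEU n-gram precisions as printed by mteval-v13a.pl
# (the "Individual N-gram scoring" BLEU row quoted below).
precisions = [0.5415, 0.2972, 0.1752, 0.1025, 0.0626,
              0.0354, 0.0193, 0.0085, 0.0017]

def cumulative_bleu(p, n, log_bp=0.0):
    """Geometric mean of the first n precisions, times the brevity
    penalty; log_bp is the logged penalty (0 in this run)."""
    return math.exp(sum(math.log(x) for x in p[:n]) / n + log_bp)

for n in range(1, len(precisions) + 1):
    print(n, round(cumulative_bleu(precisions, n), 4))
```

This reproduces 0.5415, 0.4012, 0.3044, 0.2318, ... up to rounding. So there is no double counting of matches; the log of a geometric mean is the average of the logs, which is why an exponentially decaying individual row turns into a much gentler cumulative decline.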


Any input from some statistics pro?

Regards,
Nat

On Thu, Oct 27, 2016 at 3:46 AM, Philipp Koehn  wrote:

> Hi,
>
> I think you are right - the first set of numbers are the n-gram precisions
> for each order of n-gram.
> The second set are numbers that you get if you take the geometric mean of
> the n-gram precisions.
> Hence, the number under 4-gram is the BLEU score.
>
> The BLEU score is traditionally computed for 1-4 grams; the original BLEU
> paper discusses this.
> There was the expectation that, as machine translation got better, we
> would move to higher-order BLEU,
> but we never did.
>
> -phi
>
>
>
>
> On Wed, Oct 26, 2016 at 12:44 AM, Nat Gillin  wrote:
>
>> Dear Moses community,
>>
>> Ah, I found out what "cumulative" means. The cumulative scores are the
>> usual BLEU scores that we report, because each cumulative score includes
>> all n-gram orders up to the desired order.
>>
>> The only odd numbers from mteval-v13a.pl are the individual BLEU
>> scores. Is it right that the individual BLEU scores are bp * weights *
>> modified_precision for each order of n-gram? Are there corresponding papers
>> that investigate these numbers?
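
On the modified precision part of the question: as a point of reference, the clipped ("modified") n-gram precision from the original BLEU paper can be sketched as below. This is a generic illustration of the definition, not the mteval-v13a.pl code itself:

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(hyp, refs, n):
    # Each hypothesis n-gram is credited at most as many times as it
    # occurs in any single reference (the "clipping" step).
    hyp_counts = Counter(ngrams(hyp, n))
    max_ref = Counter()
    for ref in refs:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return clipped / total if total else 0.0

# Classic clipping example: "the" is credited once despite appearing 3 times.
print(modified_precision("the the the cat".split(),
                         ["the cat sat".split()], 1))  # 0.5
```

With uniform weights and a log penalty of 0, as in this run, bp * exp(w_n * log p_n) with w_n = 1 is just p_n itself, which would explain why the individual 1-gram BLEU equals the raw unigram precision.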
>>
>> Regards,
>> Nat
>>
>> On Tue, Oct 25, 2016 at 12:02 PM, Nat Gillin 
>> wrote:
>>
>>> Dear Moses community,
>>>
>>> To make the question clearer:
>>>
>>> The question is: why does the cumulative score add the brevity penalty
>>> before taking the exponent at every order of n-gram, while the individual
>>> score only takes the brevity penalty into account at
>>> https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v13a.pl#L874
>>>
>>> Any pointers to the papers describing the cumulative score would be nice
>>> =)
>>>
>>> Thanks in advance again,
>>> Nat
>>>
>>> On Tue, Oct 25, 2016 at 11:58 AM, Nat Gillin 
>>> wrote:
>>>
 Dear Moses Community,

 When using mteval-v13a.pl, we note that the output looks like this:

 length ratio: 1.07303974221267 (1998/1862), penalty (log): 0

 NIST score = 5.0564  BLEU score = 0.2318 for system "Google"


 # ------------------------------------------------------------------------

 Individual N-gram scoring
         1-gram   2-gram   3-gram   4-gram   5-gram   6-gram   7-gram   8-gram   9-gram
         ------   ------   ------   ------   ------   ------   ------   ------   ------
  NIST:  4.4488   0.5554   0.0477   0.0045   0.0000   0.0000   0.0000   0.0000   0.0000  "Google"
  BLEU:  0.5415   0.2972   0.1752   0.1025   0.0626   0.0354   0.0193   0.0085   0.0017  "Google"


 # ------------------------------------------------------------------------

 Cumulative N-gram scoring
         1-gram   2-gram   3-gram   4-gram   5-gram   6-gram   7-gram   8-gram   9-gram
         ------   ------   ------   ------   ------   ------   ------   ------   ------
  NIST:  4.4488   5.0043   5.0520   5.0564   5.0564   5.0564   5.0564   5.0564   5.0564  "Google"
  BLEU:  0.5415   0.4012   0.3044   0.2318   0.1784   0.1362   0.1031   0.0754   0.0493  "Google"

 And at https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v13a.pl#L823,
 it calculates the cumulative score by accumulating the individual n-gram
 (log) precisions: at each order it adds the next precision to a running sum
 and normalizes it before computing the cumulative score for that order.

 The question is: why does it add the brevity penalty (i.e. $len_score) at every order?
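
For reference, the brevity penalty from the standard BLEU definition can be sketched with the lengths printed in the header above (hypothesis 1998 tokens, reference 1862); this is an illustration of the definition, not the mteval code:

```python
import math

def log_brevity_penalty(hyp_len, ref_len):
    # BLEU's brevity penalty in log space: 0 when the hypothesis is at
    # least as long as the reference, 1 - ref/hyp otherwise.
    return min(0.0, 1.0 - ref_len / hyp_len)

# This run: hypothesis 1998 tokens vs. reference 1862, so no penalty,
# matching "penalty (log): 0" in the output.
print(log_brevity_penalty(1998, 1862))  # 0.0

# Had the lengths been swapped, the penalty would bite:
print(math.exp(log_brevity_penalty(1862, 1998)))
```

Adding this logged penalty to the normalized log-precision sum before exponentiating is just the standard BLEU definition, BLEU = BP * exp(sum of w_n * log p_n), carried out in log space.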

 Also, is this score discussed in any paper?

 Thanks in advance for the clarifications!

 Regards,
 Nat


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

