Thanks Phil.
I figured out the lowercase thing, thanks.
For the short n-grams, this is not exactly what I meant.
Any 1-gram sentence will score either 1 on an exact match or 0.8409 on a
mismatch.
Any 2-gram sentence will score 1 on an exact match or 0.7598 when only one
word matches.
....
My point is that this is a twist on the BLEU+1 algorithm.
It is supposed to avoid a 0 score when there is no match at or below the
4-gram order (because of the geometric mean),
but it distorts the scores of short segments, giving much too high a score.
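To illustrate, here is a minimal Python sketch of my reading of the smoothing (add-one smoothing (correct+1)/(total+1) on each n-gram precision, with orders longer than the sentence counted as precision 1 — an assumption about the behavior, not the exact analysis.perl code):

```python
from collections import Counter
from math import exp, log

def sentence_bleu(hyp, ref, max_n=4):
    """Smoothed sentence-level BLEU sketch: add-one smoothing per order,
    and orders longer than the sentence contribute log(1) = 0."""
    hyp, ref = hyp.split(), ref.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i+n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i+n]) for i in range(len(ref) - n + 1))
        total = sum(hyp_ngrams.values())
        if total == 0:          # sentence shorter than n: treated as 100%
            continue
        correct = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        log_prec_sum += log((correct + 1) / (total + 1))
    bp = min(1.0, exp(1 - len(ref) / len(hyp)))  # brevity penalty
    return bp * exp(log_prec_sum / max_n)

print(round(sentence_bleu("Ireland", "Irish"), 4))          # 0.8409
print(round(sentence_bleu("Very strong", "Very high"), 4))  # 0.7598
```

Under these assumptions a mismatched unigram pair gets (1/2)^(1/4) = 0.8409 and a half-matched bigram pair gets (2/3 * 1/2)^(1/4) = 0.7598, which reproduces the scores above.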
By the way, the reason I am looking into this is that I am using
sentence-level BLEU to filter some noisy corpora.
For instance, out of 6 million sentence pairs I keep only the sentences
with a BLEU score > XX, to avoid keeping misaligned segments.
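That filtering step could be sketched like this (hypothetical names; `score_fn` stands in for whatever sentence-level scorer is used, and the threshold in the usage example is a placeholder, not a value from this thread):

```python
def filter_corpus(pairs, score_fn, threshold):
    """Keep only (translation, reference) pairs whose score clears
    the threshold; drops likely misaligned segments."""
    return [(t, r) for t, r in pairs if score_fn(t, r) > threshold]

# Usage with a trivial exact-match scorer and a placeholder threshold:
kept = filter_corpus(
    [("Public sector", "Public sector"), ("Issuer", "Italie")],
    lambda t, r: 1.0 if t == r else 0.0,
    0.5,
)
print(kept)  # [('Public sector', 'Public sector')]
```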
On 04/03/2016 at 21:59, Philipp Koehn wrote:
Hi,
this BLEU calculation happens in the function bleu_annotation, in lines
224ff of scripts/ems/support/analysis.perl.
You could convert the system translation $system and the reference
translations $REFERENCE[$i] to lowercase (lc) if you prefer that.
The code suggests that the n-gram precision for sentences of length < n is
treated as 100%, which may not be what you want, but it is a
degenerate case, so how to treat it is somewhat undefined.
-phi
On Sat, Feb 27, 2016 at 6:20 AM, <vngu...@neuf.fr> wrote:
OK, obviously this is a modified BLEU+1 algorithm, similar to what
sentence-bleu does.
However, I believe this is still not right for unigram sentences.
____________________
From: "Vincent Nguyen"
Date: 26 Feb 2016 22:21:59
To: moses-support@mit.edu
Subject: Re: [Moses-support] bleu-annotation / analysis.perl
Am I correct in saying that when a sentence is less than or equal to 4
tokens long, the BLEU score should be 1 for an exact match and 0 for
anything else?
(by definition of http://www1.cs.columbia.edu/nlp/sgd/bleu.pdf)
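For reference, under the unsmoothed definition any n-gram order with zero matches zeroes the whole geometric mean; a quick check, with the precisions hand-computed for a hypothetical 2-token pair:

```python
from math import exp, log

# Unsmoothed BLEU: one zero precision collapses the geometric mean to 0.
# Precisions for a pair like "Very strong" vs "Very high":
# 1-gram 1/2, 2-gram 0/1.
precisions = [1 / 2, 0 / 1]
score = (0.0 if 0.0 in precisions
         else exp(sum(log(p) for p in precisions) / len(precisions)))
print(score)  # 0.0
```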
On 26/02/2016 at 10:02, Vincent Nguyen wrote:
> Hi,
>
> I would like to understand better the analysis.perl script that
> generates the bleu-annotation file.
>
> Is there an easy way to get the uncased BLEU score of each line
> instead of the cased calculation?
> Am I right that this script recomputes its own BLEU score without
> calling the NIST-BLEU or Multi-BLEU external scripts?
>
>
> Also, I find it strange sometimes when there are only one or two
> words:
>
> Translation / reference / score
> Contents / Content / 0.8409
> Ireland / Irish / 0.8409
> Issuer / Italie / 0.8409
> PT / US / 0.8409
> .....
> and so on: two unrelated words will always generate the same
> 0.8409 score.
>
> For 2-grams:
> Very strong / Very high / 0.7598
> Public sector / Public Sector / 0.7598
> However : / But : / 0.7598
>
> So, for 2-grams, when only one word matches, the score is always
> 0.7598.
>
>
> Thanks,
>
> Vincent
>
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support