Thanks Phil.
I figured out the lowercase thing, thanks.

Regarding the short n-grams, this is not exactly what I meant.

Any 1-gram (single-word) sentence scores 1 on an exact match and 0.8409 on any mismatch.
Any 2-gram sentence scores 1 on an exact match and 0.7598 when only one word matches.
....

My point is that this is a twist on the BLEU+ algorithm.
It is meant to avoid a score of 0 when some n <= 4 has no matching n-gram (which would zero out the geometric mean),
but it distorts the scores of short segments, giving them a much too high score.
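For reference, these numbers are consistent with a smoothing that adds 1 to both the numerator and denominator of each n-gram precision, treats precisions for n longer than the sentence as 1, and ignores the brevity penalty (which is 1 when hypothesis and reference have equal length). Here is a minimal Python sketch of that reading; this is my reconstruction of the behavior, not the actual analysis.perl code:

```python
from math import prod

def smoothed_sentence_bleu(correct, total):
    """correct[i] / total[i]: matched and possible n-grams for n = i + 1.
    Each precision is smoothed to (correct + 1) / (total + 1); when the
    sentence is shorter than n (total == 0) this degenerates to 1."""
    precisions = [(c + 1) / (t + 1) for c, t in zip(correct, total)]
    return prod(precisions) ** 0.25  # geometric mean over n = 1..4

# 1-word sentence, no match: (1/2 * 1 * 1 * 1) ** 0.25 = 0.8409
print(round(smoothed_sentence_bleu([0, 0, 0, 0], [1, 0, 0, 0]), 4))
# 2-word sentence, one word matching: (2/3 * 1/2 * 1 * 1) ** 0.25 = 0.7598
print(round(smoothed_sentence_bleu([1, 0, 0, 0], [2, 1, 0, 0]), 4))
```

This reproduces exactly the constants in the examples below, which is why I believe the score floor for a completely wrong single word is as high as 0.8409.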

By the way:
The reason I am looking into this is that I am using sentence-level BLEU to filter noisy corpora. For instance, out of 6 million sentence pairs I keep only those with a BLEU score > XX, to avoid keeping misaligned segments.
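As a sketch of that filtering step (the function names and the toy scorer are mine for illustration only; the threshold stays a parameter, like the XX above, and a real setup would plug in an actual sentence-level BLEU):

```python
def filter_by_sentence_bleu(pairs, score, threshold):
    """Keep (hypothesis, reference) sentence pairs whose sentence-level
    score exceeds the threshold; lower-scoring pairs are presumed
    misaligned and are dropped."""
    return [(hyp, ref) for hyp, ref in pairs if score(hyp, ref) > threshold]

# Toy scorer for illustration only: unigram overlap ratio, not real BLEU.
def toy_score(hyp, ref):
    hyp_words, ref_words = hyp.split(), ref.split()
    return sum(1 for w in hyp_words if w in ref_words) / max(len(hyp_words), 1)

pairs = [("the cat sat", "the cat sat"),        # aligned pair
         ("stock prices fell", "le chat dort")]  # misaligned pair
kept = filter_by_sentence_bleu(pairs, toy_score, 0.5)
print(kept)  # only the aligned pair survives
```

The concern above applies directly here: with the smoothed scoring, even unrelated one-word pairs score 0.8409, so short misaligned segments can pass almost any reasonable threshold.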



On 04/03/2016 21:59, Philipp Koehn wrote:
Hi,

this BLEU calculation happens in the function bleu_annotation, at lines 224ff of scripts/ems/support/analysis.perl

You could convert the system translation $system and the reference translations $REFERENCE[$i] to lowercase (with lc) if you prefer that.

The code suggests that n-gram precision for sentences of length < n is treated as 100% - which may not be what you want, but it is a degenerate case, so how to treat it is somewhat undefined.

-phi

On Sat, Feb 27, 2016 at 6:20 AM, <vngu...@neuf.fr> wrote:

    OK, obviously this is a modified BLEU+ algorithm, similar to what
    sentence-bleu does.
    However, I believe this is still not right for unigram sentences.



    ____________________

    From: "Vincent Nguyen"
    Date: 26 Feb 2016 22:21:59
    To: moses-support@mit.edu

    Subject: Re: [Moses-support] bleu-annotation / analysis.perl



    Am I correct in saying that when a sentence's length is less than or
    equal to 4 tokens, the BLEU score should be 1 for an exact match and
    0 otherwise?
    (by the definition in http://www1.cs.columbia.edu/nlp/sgd/bleu.pdf)


    On 26/02/2016 10:02, Vincent Nguyen wrote:
    > Hi,
    >
    > I would like to understand better the analysis.perl script that
    > generates the bleu-annotation file.
    >
    > Is there an easy way to get the uncased BLEU score of each line
    > instead of the cased calculation?
    > Am I right that this script recomputes its own BLEU score without
    > calling the NIST-BLEU or multi-bleu external scripts?
    >
    >
    > Also I find it strange sometimes when there are only one or two
    > words:
    >
    > Translation / reference / score
    > Contents / Content / 0.8409
    > Ireland / Irish / 0.8409
    > Issuer / Italie / 0.8409
    > PT / US / 0.8409
    > .....
    > and so on: two unrelated words will always generate the same
    > 0.8409 score.
    >
    > for 2-grams
    > Very strong / Very high / 0.7598
    > Public sector / Public Sector / 0.7598
    > However : / But : / 0.7598
    >
    > so, for 2-grams, when only one word matches, it generates a
    > score of 0.7598
    >
    >
    > Thanks,
    >
    > Vincent
    >
    >
    > _______________________________________________
    > Moses-support mailing list
    > Moses-support@mit.edu
    > http://mailman.mit.edu/mailman/listinfo/moses-support



