Re: [Moses-support] bleu-annotation / analysis.perl
Thanks Phil. I figured out the lowercase part, thanks.

For the short n-grams, this is not exactly what I meant. Any one-token sentence gets either 1 on an exact match or 0.8409 on any mismatch; any two-token sentence gets 1 on an exact match or 0.7598 when only one word matches.

My point is that this is a quirk of the BLEU+1 smoothing: it is meant to avoid a zero score when a sentence of four tokens or fewer has no n-gram match (the geometric mean would otherwise collapse to 0), but it inflates the scores of short segments far too much.

By the way, the reason I am looking into this is that I am using sentence-level BLEU to filter noisy corpora. For instance, out of 6 million sentence pairs I keep only the sentences with a BLEU score > XX, to avoid keeping misaligned segments.

On 04/03/2016 21:59, Philipp Koehn wrote:
> Hi,
>
> this BLEU calculation happens in the function bleu_annotation, in lines
> 224ff of scripts/ems/support/analysis.perl. You could convert the system
> translation $system and the reference translations $REFERENCE[$i] to
> lowercase (lc) if you prefer that.
>
> The code suggests that n-gram precision for sentences of length < n is
> treated as 100% - which may not be what you want, but it is a degenerate
> case, so how to treat it is a bit undefined.
>
> -phi
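For anyone who wants to replicate this kind of filtering, here is a minimal sketch. It is hypothetical, not code from the thread: the file names hyp.txt/ref.txt, the 0.5 threshold, and the minimum-length guard are all made up, and sentence_bleu() refers to the reconstruction sketched at the bottom of this thread.

    # Hypothetical corpus filter: keep only sentence pairs whose smoothed
    # sentence-level BLEU clears a threshold. Because of the smoothing
    # floor discussed above (any one-token pair scores at least 0.8409),
    # a minimum-length guard is added so that short misaligned segments
    # do not always pass.
    open(my $hyp, "<", "hyp.txt") or die $!;
    open(my $ref, "<", "ref.txt") or die $!;
    while (my $h = <$hyp>) {
        my $r = <$ref>;
        last unless defined $r;
        chomp($h); chomp($r);
        my @tokens = split /\s+/, $h;
        my $bleu = sentence_bleu(lc($h), lc($r));
        print "$h\t$r\n" if $bleu > 0.5 && @tokens >= 5;
    }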
Re: [Moses-support] bleu-annotation / analysis.perl
Hi,

this BLEU calculation happens in the function bleu_annotation, in lines 224ff of scripts/ems/support/analysis.perl. You could convert the system translation $system and the reference translations $REFERENCE[$i] to lowercase (lc) if you prefer that.

The code suggests that n-gram precision for sentences of length < n is treated as 100% - which may not be what you want, but it is a degenerate case, so how to treat it is a bit undefined.

-phi

On Sat, Feb 27, 2016 at 6:20 AM, <vngu...@neuf.fr> wrote:
> Ok, obviously this is a modified BLEU+1 algorithm, similar to what
> sentence-bleu does. However, I believe this is still not right for
> one-token sentences.
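As a concrete illustration of the change Philipp describes, a hypothetical sketch (only the case normalization is shown; the surrounding bleu_annotation code is not reproduced here and may differ):

    # Hypothetical sketch, not the actual analysis.perl code: lowercase
    # the system output and all references before the n-gram counting.
    $system = lc($system);
    for my $i (0 .. $#REFERENCE) {
        $REFERENCE[$i] = lc($REFERENCE[$i]);
    }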
Re: [Moses-support] bleu-annotation / analysis.perl
Ok, obviously this is a modified BLEU+1 algorithm, similar to what sentence-bleu does. However, I believe this is still not right for one-token sentences.

From: "Vincent Nguyen"
Date: 26 Feb 2016 22:21:59
To: moses-support@mit.edu
Subject: Re: [Moses-support] bleu-annotation / analysis.perl

> Am I correct in saying that when a sentence is 4 tokens or fewer, the
> BLEU score should be 1 for an exact match and 0 otherwise?
> (by the definition in http://www1.cs.columbia.edu/nlp/sgd/bleu.pdf)
Re: [Moses-support] bleu-annotation / analysis.perl
Am I correct in saying that when a sentence is 4 tokens or fewer, the BLEU score should be 1 for an exact match and 0 otherwise?
(by the definition in http://www1.cs.columbia.edu/nlp/sgd/bleu.pdf)
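For what it is worth, the scores quoted below are exactly what add-one smoothing of the n-gram precisions would produce (this is an inference from the numbers, not a reading of the analysis.perl source). Strict BLEU gives 0 for any non-exact short match: a one-token sentence with no unigram match has p1 = 0, so the geometric mean of p1..p4 collapses to 0. With add-one smoothing, and with missing higher-order n-grams counted as precision 1, a one-token mismatch gives p1 = (0+1)/(1+1) = 1/2 and BLEU = (1/2)^(1/4) = 0.8409; a two-token sentence with one matching word gives p1 = (1+1)/(2+1) = 2/3, p2 = (0+1)/(1+1) = 1/2, and BLEU = (1/3)^(1/4) = 0.7598. A quick check:

    perl -e 'printf "%.4f %.4f\n", 0.5 ** 0.25, (1/3) ** 0.25'
    # prints: 0.8409 0.7598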
[Moses-support] bleu-annotation / analysis.perl
Hi,

I would like to understand better the analysis.perl script that generates the bleu-annotation file.

Is there an easy way to get the uncased BLEU score of each line instead of the cased calculation? Am I right that this script recomputes its own BLEU score, without calling the NIST-BLEU or Multi-BLEU external scripts?

Also, I find the scores strange when there are only one or two words:

Translation / reference / score
Contents / Content / 0.8409
Ireland / Irish / 0.8409
Issuer / Italie / 0.8409
PT / US / 0.8409

and so on: any two unrelated words always produce the same 0.8409 score.

For 2-grams:
Very strong / Very high / 0.7598
Public sector / Public Sector / 0.7598
However : / But : / 0.7598

So, for 2-grams, when only one word is right, the score is always 0.7598.

Thanks,

Vincent
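A minimal reconstruction that reproduces every score in this thread. This is inferred from the numbers above, not copied from bleu_annotation: it assumes add-one smoothing, counts n-gram orders longer than the sentence as precision 1, and omits any brevity penalty.

    #!/usr/bin/perl
    # Hypothetical reconstruction of a smoothed sentence-level BLEU that
    # reproduces the scores in this thread; NOT the actual bleu_annotation
    # code from scripts/ems/support/analysis.perl.
    use strict;
    use warnings;

    sub sentence_bleu {
        my ($system, $reference) = @_;
        my @sys = split /\s+/, $system;
        my @ref = split /\s+/, $reference;
        my $product = 1;
        for my $n (1 .. 4) {
            # collect reference n-grams of order $n
            my %ref_ngrams;
            for my $i (0 .. $#ref - $n + 1) {
                $ref_ngrams{ join(" ", @ref[$i .. $i + $n - 1]) }++;
            }
            # count clipped matches in the system output
            my ($correct, $total) = (0, 0);
            for my $i (0 .. $#sys - $n + 1) {
                my $ngram = join(" ", @sys[$i .. $i + $n - 1]);
                $total++;
                if ($ref_ngrams{$ngram}) {
                    $correct++;
                    $ref_ngrams{$ngram}--;  # each reference n-gram matches once
                }
            }
            # add-one smoothing; a sentence shorter than $n has $total == 0
            # and therefore contributes precision (0+1)/(0+1) = 1
            $product *= ($correct + 1) / ($total + 1);
        }
        return $product ** 0.25;   # geometric mean of the four precisions
    }

    printf "%.4f\n", sentence_bleu("Contents", "Content");       # 0.8409
    printf "%.4f\n", sentence_bleu("Very strong", "Very high");  # 0.7598
    printf "%.4f\n", sentence_bleu("However :", "But :");        # 0.7598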