this age group
is decoded as
ce groupe d " âge

I'll check my corpus and see why it got " instead of ' in there.
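
For the corpus check, something like the following could be a starting point. It is only a sketch: the function name and the list of elided forms are my own assumptions, and it looks for a double quote (raw, or escaped as &quot; in tokenized text) right after a French elision such as d, l or qu.

```python
import re

# Elided French forms (d', l', j', qu', ...) followed by a double quote,
# either raw or escaped as &quot;, with at most one space on either side.
# The list of elided forms is an assumption and may need extending.
SUSPICIOUS = re.compile(
    r"\b(?:[dljnscmt]|qu)\s?(?:\"|&quot;)\s?\w", re.IGNORECASE
)


def find_suspicious_elisions(lines):
    """Return (line_number, text) pairs where an elision uses a double quote."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SUSPICIOUS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

It can be fed an open file object for one side of the training corpus; the returned line numbers then point at the sentence pairs worth inspecting.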

Thanks.


On 10/03/2016 13:00, Philipp Koehn wrote:
Hi,

I do not think that the detokenizer would cause conversion of ' to ".
You can check the raw output of the decoder, and see how it is
changed by the detokenizer.
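
One rough way to do that comparison is sketched below. It assumes the raw decoder output and the detokenized output are available as two line-aligned lists of strings, and that the raw output may carry the &apos; and &quot; escapes; the function names are hypothetical. A line is flagged when the number of single or double quotes changes between the two.

```python
def count_quotes(text):
    """Count double and single quotes after undoing &quot; / &apos; escapes."""
    text = text.replace("&quot;", '"').replace("&apos;", "'")
    return text.count('"'), text.count("'")


def quote_changes(raw_lines, detok_lines):
    """Return 1-based line numbers where the quote counts differ."""
    changed = []
    for number, (raw, detok) in enumerate(zip(raw_lines, detok_lines), start=1):
        if count_quotes(raw) != count_quotes(detok):
            changed.append(number)
    return changed
```

An empty result would suggest that the double quote is already present in the decoder output, so it comes from the model or the corpus rather than from detokenization.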

-phi

On Wed, Mar 9, 2016 at 11:44 AM, Vincent Nguyen <vngu...@neuf.fr> wrote:

    Hi,

    I got the following situation:

    This group age
    is translated sometimes in:
    ce groupe d'âge (correct)
    ce groupe d" âge (incorrect)
    ce groupe d "âge (incorrect)

    I am wondering whether this is more a detokenizer issue, a corpus
    issue, or both.

    Technically, in French there should not be any space before or after
    the apostrophe. In the Europarl corpus, as well as in the News2014 one,
    there are some instances with a space before or after it.

    So I have the feeling that the decoder gets a &apos; with surrounding
    spaces, which leads the detokenizer to transform it into "

    Has anyone had a similar issue?

    thanks.
    _______________________________________________
    Moses-support mailing list
    Moses-support@mit.edu
    http://mailman.mit.edu/mailman/listinfo/moses-support
