Hi Fatma,

Another frequent problem with “real-world” Arabic script (or, basically,
with anything that is not just “plain ASCII”) is that the text often
contains invisible or unexpected Unicode characters, such as right-to-left
marks, non-ASCII spaces, ligatures, and so on.
Token matching within Moses happens on the “byte string” level, not on a
“visual” level, so any such character left in the training data or in the
input at translation time may prevent phrase table entries from matching.
The simplest way to check whether this is happening is to find the
corresponding string in your (preprocessed) training data, in the phrase
table, and in your input, and to compare them at the level of Unicode
code points.
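
As a rough sketch of such a comparison (assuming Python 3 is at hand; the
function names `describe` and `suspicious` and the sample strings are made
up for illustration and are not part of Moses), something like the
following lists the code points of a string and flags the characters that
are easy to miss by eye: format characters such as U+200F, spaces other
than the ASCII space, and characters with a compatibility decomposition,
which is how Arabic presentation-form ligatures show up.

```python
import unicodedata


def describe(text):
    """Return one (code point, category, name) tuple per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch),
         unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


def suspicious(text):
    """Return the code points of characters that are hard to spot visually.

    Flags format characters (category Cf, e.g. directional marks), space
    separators other than U+0020, and characters whose decomposition is a
    compatibility mapping (e.g. presentation-form ligatures).
    """
    flagged = []
    for ch in text:
        category = unicodedata.category(ch)
        odd_space = category == "Zs" and ch != " "
        compat = unicodedata.decomposition(ch).startswith("<")
        if category == "Cf" or odd_space or compat:
            flagged.append(f"U+{ord(ch):04X}")
    return flagged


if __name__ == "__main__":
    # Hypothetical token: right-to-left mark, SEEN, lam-alef ligature,
    # MEEM, no-break space.
    token = "\u200f\u0633\ufefb\u0645\u00a0"
    for code_point, category, name in describe(token):
        print(code_point, category, name)
    print("suspicious:", suspicious(token))
```

Running the same check on the token as it appears in your input, in the
preprocessed training data, and in the phrase table should show quickly
whether the three are really the same sequence of code points.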

Best,

Gregor



-----Original Message-----
From: Rico Sennrich <rico.sennr...@gmx.ch>
Date: Friday 19 June 2015 11:27
To: "moses-support@mit.edu" <moses-support@mit.edu>
Subject: Re: [Moses-support] problem in translation

>fatma elzahraa Eltaher <fatmaeltaher@...> writes:
>
>> 
>> Dears,
>> I have a problem in translation. After building Moses model, I try to
>> test it by a word but the output was the same word.
>> I did not know where is the problem? could you help me?
>> kindly find attached pic.
>> 
>> 
>> 
>> thank you,
>
>hello Fatma,
>
>I'd check if your input words are in your phrase table, and if they're
>correctly aligned to English words. I don't know how you trained your
>model, but the words could be unknown because you have too little
>training data, or because you mixed up the languages in the training
>corpora. Another possibility is that you have sentences in your training
>data that are Arabic on both sides of your parallel corpus. A look at the
>phrase [...]
>


_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
