Hi Fatma, another frequent problem with “real-world” Arabic script (or, basically, with anything that is not plain ASCII) is that text often contains invisible or unexpected Unicode characters, such as right-to-left marks, non-ASCII spaces, ligatures, etc. Token matching within Moses happens on the “byte string” level, not on a “visual” level, so any such character left in the data during training or in the input during translation may prevent phrase table entries from matching. The simplest way to check whether this is happening is to find the corresponding string in your (preprocessed) training data, in the phrase table, and in your input, and compare them at the level of Unicode code points.
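A rough sketch of such a code point comparison in Python, in case it helps. The function names `describe` and `suspicious` are made up for this example and are not part of Moses. The NFKC check is only a heuristic for spotting ligatures and presentation forms, so it may also flag characters that are perfectly legitimate in your data.

```python
import unicodedata

def describe(text):
    """Return one (char, code point, Unicode name) tuple per character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in text]

def suspicious(text):
    """Return (index, code point, name) for characters that often break matching.

    Heuristic (an assumption of this sketch, not a Moses rule): flag
    format controls (category Cf, e.g. right-to-left marks), spaces other
    than the ASCII space (category Zs), and anything that NFKC
    normalization would rewrite (e.g. ligatures, presentation forms).
    """
    found = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        if (cat == "Cf"
                or (cat == "Zs" and ch != " ")
                or unicodedata.normalize("NFKC", ch) != ch):
            found.append((i, f"U+{ord(ch):04X}",
                          unicodedata.name(ch, "<unnamed>")))
    return found

# The same word twice; the second copy starts with an invisible
# RIGHT-TO-LEFT MARK, so the two look identical but are different strings.
from_phrase_table = "\u0643\u062A\u0627\u0628"
from_input = "\u200F\u0643\u062A\u0627\u0628"

print(from_phrase_table == from_input)
for entry in suspicious(from_input):
    print(entry)
```

Running `describe` on the string as it appears in the training data, in the phrase table, and in the input, and comparing the three lists, should show where they diverge.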
Best,
Gregor

-----Original Message-----
From: Rico Sennrich <rico.sennr...@gmx.ch>
Date: Friday 19 June 2015 11:27
To: "moses-support@mit.edu" <moses-support@mit.edu>
Subject: Re: [Moses-support] problem in translation

>fatma elzahraa Eltaher <fatmaeltaher@...> writes:
>
>> Dears,
>> I have a problem in translation. After building Moses model , I try to
>> test it by a word but the output was the same word.
>> I did not know where is the problem? could you help me?
>> kindly find attached pic.
>>
>> thank you,
>
>hello Fatma,
>
>I'd check if your input words are in your phrase table, and if they're
>correctly aligned to English words. I don't know how you trained your
>model, but the words could be unknown because you have too little
>training data, or because you mixed up the languages in the training
>corpora. Another possibility is that you have sentences in your training
>data that are Arabic on both sides of your parallel corpus. A look at the
>phras[...]
>
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support