I found the following code in moses/TranslationOptionCollection.cpp:

isDigit = s.find_first_of("0123456789");
if (isDigit == 1)
    isDigit = 1;
else
    isDigit = 0;

But nearly the same code segment appears in moses/ChartParser.cpp:

isDigit = s.find_first_of("0123456789");
if (isDigit == string::npos)
    isDigit = 0;
else
    isDigit = 1;


I guess the intent is to treat a token that contains a digit as a normal word 
rather than an unknown word. However, the digit '0' is a character in Latin 
Mongolian, so words that contain this digit will be treated as known words. It 
is also strange that at the MERT stage the unknown words do not appear in the 
n-best file, but do appear in the final dev result file.
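
To make the difference between the two conditions concrete, here is a minimal 
standalone sketch. It is not Moses code: the helper names are hypothetical, and 
it only assumes that the token is a std::string and that the result of 
find_first_of is compared as shown in the two snippets above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring the condition quoted from
// TranslationOptionCollection.cpp: true only when the FIRST digit
// in the token sits at index 1.
bool firstDigitAtIndexOne(const std::string& s) {
    return s.find_first_of("0123456789") == 1;
}

// Hypothetical helper mirroring the condition quoted from
// ChartParser.cpp: true when the token contains a digit anywhere.
bool containsDigit(const std::string& s) {
    return s.find_first_of("0123456789") != std::string::npos;
}

// Small self-check showing where the two conditions disagree.
void demo() {
    // Digit at index 1: both checks fire.
    assert(firstDigitAtIndexOne("a0b") && containsDigit("a0b"));
    // Digit at index 0 or 2: only the npos-based check fires.
    assert(!firstDigitAtIndexOne("0ab") && containsDigit("0ab"));
    assert(!firstDigitAtIndexOne("ab0") && containsDigit("ab0"));
    // No digit at all: neither check fires.
    assert(!firstDigitAtIndexOne("abc") && !containsDigit("abc"));
}
```

Under these assumptions the two checks agree only when the first digit is the 
second character of the token, so the same word could be handled differently by 
the two decoders.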


On 25 July 2013, at 2:44, Hieu Hoang <hieu.ho...@ed.ac.uk> wrote:

> I think you asked this question before. I checked and was pretty sure it works.
> 
> How exactly are you running Moses? Can you send me your config files and any 
> other info that you think might be useful to debug this issue.
> 
> On 23 July 2013 07:46, Li Xiang <lixiang....@gmail.com> wrote:
> At the MERT stage, I turned on the "-drop-unknown" switch for the moses_chart 
> decoder, but some OOV words still appear in the output translation. I carefully 
> checked the source training data, but I did not find the OOV words there.
> 
> The source language is Latin Mongolian. Its character set additionally 
> includes "0 % _ -".
> 
> Does the switch have no effect during MERT?
> 
> -- 
> Xiang Li
> 
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
> 
> 
> 
> 
> -- 
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
