I found the following code in moses/TranslationOptionCollection.cpp:

    isDigit = s.find_first_of("0123456789");
    if (isDigit == 1)
      isDigit = 1;
    else
      isDigit == 0;
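As far as I understand, std::string::find_first_of returns the index of the first matching character, or std::string::npos when nothing matches. So comparing the result with 1 is true only when the first digit is the second character of the token, not whenever the token contains a digit. Here is a minimal sketch of that difference; the helper names are my own for illustration and are not taken from Moses.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not Moses code): mirrors the "== 1" comparison.
// True only when the first digit in s is at index 1.
bool firstDigitAtIndexOne(const std::string &s) {
  return s.find_first_of("0123456789") == 1;
}

// Hypothetical helper (not Moses code): true when s contains a digit
// anywhere, i.e. find_first_of did not return npos.
bool containsDigit(const std::string &s) {
  return s.find_first_of("0123456789") != std::string::npos;
}

// A few sample tokens showing where the two checks disagree.
void runExamples() {
  assert(firstDigitAtIndexOne("a0b"));   // first digit at index 1
  assert(!firstDigitAtIndexOne("0ab"));  // first digit at index 0
  assert(!firstDigitAtIndexOne("ab0"));  // first digit at index 2
  assert(!firstDigitAtIndexOne("abc"));  // no digit, result is npos

  assert(containsDigit("a0b"));
  assert(containsDigit("0ab"));
  assert(containsDigit("ab0"));
  assert(!containsDigit("abc"));
}
```

With these sample tokens, only "a0b" passes the first check, while every token with a digit passes the second.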
But nearly the same code segment appears in moses/ChartParser.cpp:

    isDigit = s.find_first_of("0123456789");
    if (isDigit == string::npos)
      isDigit = 0;
    else
      isDigit == 1;

I guess the intent is to treat a token that contains a digit as a normal word rather than an unknown word. However, the digit '0' is a character in Latin Mongolian, so words that contain a digit will be treated as known words.

It is strange that during MERT, unknown words do not appear in the n-best file, but they do appear in the final dev result file.

On 25 July 2013, at 2:44, Hieu Hoang <hieu.ho...@ed.ac.uk> wrote:

> I think you asked this question before. I checked and was pretty sure it works.
>
> How exactly are you running Moses? Can you send me your config files and any
> other info that you think might be useful to debug this issue.
>
> On 23 July 2013 07:46, Li Xiang <lixiang....@gmail.com> wrote:
> At the MERT stage, I turn on the switch "-drop-unknown" for the decoder
> moses_chart. But some OOV words still appear in the output translation. I
> carefully checked the source training data, but I could not find the OOV words.
>
> The source language is Latin Mongolian. Its character set additionally
> includes "0 % _ -".
>
> Does the switch have no effect during MERT?
>
> --
> Xiang Li
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu