Dear Moses Support Team,

I added a source-context-dependent translation feature to a Moses baseline system. To avoid modifying the Moses source code, I append a unique identifier to every word in the test/dev source files. For example, a source file with these two lines:

    this is sentence 1 .
    sentence 2

would become:

    this~~1 is~~2 sentence~~3 1~~4 .~~5
    sentence~~6 2~~7

I then generate a sentence-specific phrase table for each sentence. The phrase-table entries use the same IDs as the words in the source file. I concatenate all of these phrase tables into one, then run MERT and the decoder as usual.

My experiments are on Chinese-to-English translation tasks. I found that OOV words in the output file still carry their IDs. For example, here is the translation of one NIST03 sentence:

    published by the british science weekly , according to the study by the 14th on chromosome sequencing of genes and gene segments 一千零五十~~97 .

Here "一千零五十~~97" is the OOV word "一千零五十" ("1,050") with its ID ~~97 still attached.

I also found that the BLEU score is significantly different when I remove the IDs from the output file. I have no idea what is happening. Could you give me some advice? I use the mteval-v13a.pl script to calculate BLEU scores in my experiments.
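Below is a minimal Python sketch of the ID scheme described above and of its inverse. The inverse removes the trailing ~~N suffix from each output token before scoring. The ~~ separator and the corpus-wide running counter come from the example above. The function names add_ids and strip_ids are hypothetical. One plausible reason the suffix survives on OOV words is that Moses, by default, copies unknown source tokens to the output unchanged. In that case the tagged form would pass straight through to the output.

```python
import re

# Separator assumed from the "word~~N" examples above (hypothetical constant).
SEP = "~~"
# Matches a trailing "~~<digits>" at the end of a single token.
ID_SUFFIX = re.compile(r"~~\d+$")


def add_ids(lines):
    """Append a running, corpus-wide ID to every word.

    IDs keep counting across lines, so every token in the file gets a unique ID.
    """
    next_id = 1
    tagged = []
    for line in lines:
        out = []
        for word in line.split():
            out.append(f"{word}{SEP}{next_id}")
            next_id += 1
        tagged.append(" ".join(out))
    return tagged


def strip_ids(line):
    """Remove a trailing ~~N suffix from each token (e.g. pass-through OOV words).

    Whitespace is normalized to single spaces as a side effect of split().
    """
    return " ".join(ID_SUFFIX.sub("", tok) for tok in line.split())


if __name__ == "__main__":
    for tagged in add_ids(["this is sentence 1 .", "sentence 2"]):
        print(tagged)
    print(strip_ids("gene segments 一千零五十~~97 ."))
```

Under these assumptions, running every line of the decoder output through strip_ids before it goes to mteval-v13a.pl means the scored hypotheses contain only the original source-side tokens, with none of the ID material.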
Thanks, Liang
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support