Dear Moses Support Team,
  
   I added a source-context-dependent translation feature to the Moses baseline 
system.
   To avoid modifying the source code, I append a unique identifier 
to every word in the test/dev source file.
   For example, a source file with the following two lines:
      this is sentence 1
      . sentence 2
would become:
      this~~1 is~~2 sentence~~3 1~~4
      .~~5 sentence~~6 2~~7
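
A minimal Python sketch of the tagging step described above. It assumes tokens are whitespace-separated and that IDs keep counting across lines, as in the example; the function name tag_lines is hypothetical and not part of Moses.

```python
def tag_lines(lines, sep="~~"):
    """Append a running numeric ID to every whitespace-separated token.

    Assumption: the counter starts at 1 and is NOT reset per line,
    matching the two-line example in this message.
    """
    tagged = []
    next_id = 1
    for line in lines:
        out = []
        for tok in line.split():
            out.append(f"{tok}{sep}{next_id}")
            next_id += 1
        tagged.append(" ".join(out))
    return tagged


if __name__ == "__main__":
    for row in tag_lines(["this is sentence 1", ". sentence 2"]):
        print(row)
```

Because every token gets a distinct ID, each tagged source word is a unique string, so it can only match phrase table entries that carry the same ID.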
Then I generate a sentence-specific phrase table for each sentence, using the 
same IDs as the source-file words in the phrase table entries. 
I concatenate all the phrase tables, then run MERT and the decoder as usual. 
 
I run my experiments on Chinese-to-English translation tasks, and I found that 
OOV words in the output file still carry their IDs.
For example, the translation of one NIST03 sentence is as follows:
 published by the british science weekly , according to the study by the 14th 
on chromosome sequencing of genes and gene segments 一千零五十~~97 .
     In 一千零五十~~97, ~~97 is the ID of the word "一千零五十".
I found that when I remove the IDs from the output file, the BLEU scores 
change significantly. I do not understand why this happens. Could you give me 
some advice?
I use the mteval-v13a.pl script to compute BLEU scores in my experiments.
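
A minimal Python sketch of the ID removal step mentioned above, applied to the decoder output before scoring. It assumes an ID is always "~~" followed by digits at the end of a token; strip_ids is a hypothetical helper and not part of Moses or mteval.

```python
import re

# Assumption: an ID is "~~" plus one or more digits, ending a token.
_ID_SUFFIX = re.compile(r"~~\d+(?=\s|$)")


def strip_ids(line):
    """Remove trailing ~~<digits> IDs from every token in an output line."""
    return _ID_SUFFIX.sub("", line)


if __name__ == "__main__":
    print(strip_ids("gene segments 一千零五十~~97 ."))
```

Tokens without an ID are left unchanged, so the same function can be run over the whole output file.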




Thanks,
Liang


       

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
