Hi,

My colleagues and I noticed the following in the KenLM code that runs when a 
hypothesis is evaluated with the LM:


https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/Ken.cpp#L203


Do we understand correctly that, because of this line, only the first N words 
of a phrase longer than the LM order N are scored with the LM, and the 
subsequent words are not scored? At least I don't see a call that adds their 
scores anywhere; they are just passed on to update the LM state in lines 
222-225.


Could you please clarify? It seems that the LM should score a phrase 
completely; otherwise, longer phrases that start with frequent n-grams but 
continue with unlikely word sequences are wrongly preferred. Such scoring 
would also favor longer phrases in general, because every unscored word 
avoids a negative log-probability.
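
To make the concern concrete, here is a small sketch of the behaviour as we 
understand it. It is not the Moses or KenLM code: ToyLogProb, ScorePrefix and 
ScoreAll are made-up names, and the toy model ignores context entirely, which 
a real n-gram model would not do.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an LM lookup: returns a log10 probability for a
// single word. A real n-gram model would condition on the preceding words.
double ToyLogProb(const std::string &word) {
  return word == "rare" ? -5.0 : -1.0;
}

// Sums the toy score over at most `limit` words at the start of the phrase.
// This mirrors our reading of the capped loop: words past the limit
// contribute nothing to the score.
double ScorePrefix(const std::vector<std::string> &phrase, std::size_t limit) {
  const std::size_t end = std::min(phrase.size(), limit);
  double score = 0.0;
  for (std::size_t i = 0; i < end; ++i) {
    score += ToyLogProb(phrase[i]);
  }
  return score;
}

// Sums the toy score over every word of the phrase (what we expected).
double ScoreAll(const std::vector<std::string> &phrase) {
  return ScorePrefix(phrase, phrase.size());
}
```

With a limit of 3, the phrase "the house is rare rare" gets -3 from 
ScorePrefix but -13 from ScoreAll, so the capped variant hides the two 
unlikely words at the end.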


Thanks,


Evgeny.

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
