Hi,
My colleagues and I noticed the following in the KenLM code, at the point where a Hypo is evaluated with the LM: https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/Ken.cpp#L203

Do we understand correctly that, because of this line, only the first N words of a phrase longer than the LM order N are scored with the LM, and the subsequent words are not scored at all? At least I don't see a call that adds their scores anywhere; they are only passed on to update the LM state in lines 222-225. Please clarify.

It seems to us that a phrase should be scored by the LM in full. Otherwise, longer phrases that start with frequent n-grams but continue with unlikely word sequences are wrongly preferred. Such scoring also favors longer phrases in general.

Thanks,
Evgeny
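P.S. To make the concern concrete, here is a minimal sketch of the scoring pattern as we read it. It is not the Moses or KenLM code: `WordScores`, `ScoreCapped`, `ScoreFull` and the `max_scored` cap are hypothetical names, and the per-word log-probabilities are simply given as numbers instead of being queried from a model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: one LM log-probability per target word of a phrase.
// A real decoder would query the language model with the current context.
using WordScores = std::vector<float>;

// Sums only the first `max_scored` per-word scores. This mirrors the capped
// loop we believe we see; the remaining words contribute nothing here.
float ScoreCapped(const WordScores &phrase, std::size_t max_scored) {
  const std::size_t end = std::min(phrase.size(), max_scored);
  float total = 0.0f;
  for (std::size_t i = 0; i < end; ++i) {
    assert(phrase[i] <= 0.0f);  // log-probabilities are never positive
    total += phrase[i];
  }
  return total;
}

// Sums every per-word score in the phrase, which is what we would expect.
float ScoreFull(const WordScores &phrase) {
  return ScoreCapped(phrase, phrase.size());
}
```

With per-word log-probabilities -1.0, -0.5, -4.0, -3.0 and a cap of two words, `ScoreCapped` returns -1.5 while `ScoreFull` returns -8.5, so the unlikely tail of the phrase never counts against it.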
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support