Re: [Moses-support] KenLM scoring of long target phrases

2016-04-19 Thread Kenneth Heafield
Hi,

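Any words beyond the first N-1 have full context within the phrase and
are already included in the target phrase's own score.  So the total is
hypothesis + target phrase + adjustments, where the adjustments cover the
first N-1 words, whose context comes from the preceding hypothesis.  The
routine you cite computes only the adjustments.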
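
A toy sketch of that decomposition, in Python rather than the C++ of
Ken.cpp.  Everything named here (ORDER, toy_logprob, phrase_score,
adjustment, score_by_phrases) is made up for illustration and is not a
KenLM or Moses API; the "LM" is an arbitrary deterministic function, and
sentence-boundary handling is left out.

```python
import math

ORDER = 3  # toy LM order N; an assumption for this sketch


def toy_logprob(context, word):
    """Hypothetical stand-in for an N-gram LM: score of `word` given at most N-1 previous words."""
    context = list(context)[-(ORDER - 1):] if ORDER > 1 else []
    # Arbitrary deterministic value; a real model would query its n-gram tables.
    return -(1 + len(word) + sum(len(w) for w in context)) / 10.0


def score_sequence(words):
    """Reference: score every word of the full sequence left to right."""
    return sum(toy_logprob(words[:i], words[i]) for i in range(len(words)))


def phrase_score(phrase):
    """Part known from the phrase alone: words at positions >= N-1 have full context."""
    return sum(toy_logprob(phrase[:i], phrase[i]) for i in range(ORDER - 1, len(phrase)))


def adjustment(history, phrase):
    """Part that needs the hypothesis: the first N-1 words of the phrase."""
    adjust_end = min(len(phrase), ORDER - 1)
    return sum(toy_logprob(history + phrase[:i], phrase[i]) for i in range(adjust_end))


def score_by_phrases(phrases):
    """Accumulate hypothesis + target phrase + adjustments, phrase by phrase."""
    history, total = [], 0.0
    for phrase in phrases:
        total += phrase_score(phrase) + adjustment(history, phrase)
        history = history + phrase
    return total


phrases = [["the", "cat"], ["sat", "on", "the", "very", "old", "mat"], ["today"]]
flat = [w for p in phrases for w in p]
assert math.isclose(score_by_phrases(phrases), score_sequence(flat))
```

The final assertion checks that scoring phrase by phrase gives the same
total as scoring the whole sequence at once, so no word of the long second
phrase is skipped.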

Kenneth

On 04/19/16 10:50, Evgeny Matusov wrote:
> Hi,
>
> My colleagues and I noticed the following in the KenLM code that runs
> when a Hypo is evaluated with the LM:
>
> https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/Ken.cpp#L203
>
> Do we understand correctly that, because of this line, for phrases
> longer than the LM order N only the first N words are scored with the
> LM and the subsequent words are not scored?  At least I don't see a
> call that adds their scores anywhere; they are just passed on to update
> the LM state in lines 222-225.
>
> Please clarify. It seems that a phrase should be scored by the LM
> completely; otherwise, longer phrases which start with frequent
> n-grams but continue with unlikely word sequences are wrongly
> preferred. Also, longer phrases are preferred in general with such
> scoring.
>
> Thanks,
>
> Evgeny.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] KenLM scoring of long target phrases

2016-04-19 Thread Evgeny Matusov
Hi,

My colleagues and I noticed the following in the KenLM code that runs when a
Hypo is evaluated with the LM:

https://github.com/moses-smt/mosesdecoder/blob/master/moses/LM/Ken.cpp#L203

Do we understand correctly that, because of this line, for phrases longer
than the LM order N only the first N words are scored with the LM and the
subsequent words are not scored?  At least I don't see a call that adds their
scores anywhere; they are just passed on to update the LM state in lines
222-225.

Please clarify. It seems that a phrase should be scored by the LM completely;
otherwise, longer phrases which start with frequent n-grams but continue with
unlikely word sequences are wrongly preferred. Also, longer phrases are
preferred in general with such scoring.

Thanks,

Evgeny.
