[Moses-support] KenLM: "The context of every 4-gram should appear as a 3-gram"

Sylvain Raybaud Thu, 16 Feb 2012 05:00:59 -0800

Hi

  LM stuff again!


I've created a language model with IRSTLM (release 5.70.04):
tlm -tr=toy.sent_start_end.en -lm=msb -n=5 -o=toy.en.n5.lm

When I specify type 1 (IRSTLM) in moses.ini, it loads fine. But if I
try to load it with KenLM, I get:

The context of every 4-gram should appear as a 3-gram Byte: 471440 File:
/global/markov/raybauds/DATA/TOY/toy.en.n5.lm

Byte 471440 seems to be the '\n' between the following lines:
-1.16894        to support them .       -0.0679314
-0.836008       to deal with hamas

Indeed, "to support them" does not appear as a trigram in the model. If
I remove this 4-gram, the same error comes up for another 4-gram whose
3-gram prefix is also missing, so I think the missing contexts are the
cause. If I change the smoothing method from "msb" to "sb", I get a
usable LM. Is this normal behavior? Do you think this is a KenLM problem
or an IRSTLM problem?


cheers,

-- 
Sylvain Raybaud
