Hi,

No, I haven't turned on pruning. I looked in the IRSTLM manual to see whether it is on by default, but I couldn't find that information (and I couldn't find an up-to-date manual either, only one for version 5.60.something).
Since it seems to depend on the smoothing method, maybe msb turns it on, but not sb?

The solution you propose would indeed make me happy :) Actually, I just need it to run with moses and yield acceptable performance to be happy. I can even live with -lm=sb, since finding the best LM parameters isn't the core of my research :)

thanks for your reply!

cheers,

Sylvain

On 16/02/12 17:46, Kenneth Heafield wrote:
> Hi,
>
> This is hopefully a stupid question. Did you turn on pruning? I don't
> see it in the command line: "tlm -tr=toy.sent_start_end.en -lm=msb -n=5
> -o=toy.en.n5.lm". Or did IRSTLM make pruning the default in new releases?
>
> KenLM should be accepting pruned models and I take responsibility for
> that. But I am also confused as to how "to support them" did not appear
> if pruning was off.
>
> Kenneth
>
> On 02/16/2012 10:16 AM, Kenneth Heafield wrote:
>> Hi,
>>
>> Interesting. The only other person to run into this is David Chiang
>> who had some custom software to prune/build models.
>>
>> I have been requiring that property to make right state minimization
>> work correctly: if it doesn't match "to support them" then the right
>> state contains at most "support them", rendering "to support them ."
>> inaccessible. I could reinsert "to support them" when this happens,
>> with p(to support them) = b(to support)p(support them) and b(to support
>> them) = 0.
>>
>> It's a bit of a pain to do this correctly. Would you be happy if only
>> the default probing model supported it, but the trie continued to throw
>> an error message?
>>
>> The ARPA standard, to the extent that there is one, does not require
>> this behavior, so IRSTLM is within their rights to prune them.
>>
>> Nicola, how does IRSTLM handle these cases at inference time?
>>
>> Kenneth
>>
>> On 02/16/2012 07:59 AM, Sylvain Raybaud wrote:
>>> Hi
>>>
>>> LM stuff again!
>>>
>>> I've created a language model with IRSTLM (release 5.70.04):
>>> tlm -tr=toy.sent_start_end.en -lm=msb -n=5 -o=toy.en.n5.lm
>>>
>>> When I specify type 1 (IRSTLM) in moses.ini it's loading fine. But if I
>>> try to load it with KenLM I get:
>>>
>>> The context of every 4-gram should appear as a 3-gram Byte: 471440 File:
>>> /global/markov/raybauds/DATA/TOY/toy.en.n5.lm
>>>
>>> Byte 471440 seems to be the '\n' between the following lines:
>>> -1.16894 to support them . -0.0679314
>>> -0.836008 to deal with hamas
>>>
>>> As a matter of fact, "to support them" does not appear as a trigram in
>>> the model. If I remove this 4-gram the same problem arises with another
>>> one, whose 3-gram prefix is also missing. I think it is the problem. If
>>> I change the smoothing method to "sb" instead of "msb" I get a usable
>>> LM. Is this normal behavior? Do you think it's a KenLM or an IRSTLM
>>> related problem?
>>>
>>>
>>> cheers,
>>>

-- 
Sylvain Raybaud
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
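
For anyone who hits the same error, here is a rough Python sketch of the property discussed in this thread: it lists every n-gram whose (n-1)-gram context is missing from an ARPA file, and computes the entry Kenneth suggests reinserting. It is not KenLM's or IRSTLM's code. The names `read_arpa`, `missing_contexts`, `reinserted_entry` and `TOY_ARPA` are made up for illustration, and every toy value except the 4-gram line quoted in the thread is invented. It also assumes ARPA values are log10, so the product b(to support)p(support them) becomes a sum, and it reads "b(to support them) = 0" as a log10 backoff of 0.0.

```python
import re
from collections import defaultdict

# Toy ARPA model: the 3-gram "to support them" is deliberately absent.
TOY_ARPA = r"""
\data\
ngram 1=4
ngram 2=2
ngram 3=1
ngram 4=1

\1-grams:
-1.0 to -0.5
-1.0 support -0.5
-1.0 them -0.5
-1.0 .

\2-grams:
-0.7 to support -0.3
-0.6 support them -0.2

\3-grams:
-0.4 support them .

\4-grams:
-1.16894 to support them . -0.0679314

\end\
"""


def read_arpa(lines):
    """Parse ARPA text into {order: {ngram_tuple: (log10_prob, log10_backoff)}}."""
    ngrams = defaultdict(dict)
    order = None
    for raw in lines:
        line = raw.strip()
        if not line or line == "\\data\\" or line.startswith("ngram "):
            continue
        if line == "\\end\\":
            break
        header = re.fullmatch(r"\\(\d+)-grams:", line)
        if header:
            order = int(header.group(1))
            continue
        if order is None:
            continue
        fields = line.split()
        prob = float(fields[0])
        words = tuple(fields[1:1 + order])
        # A trailing field after the words is the backoff weight; default 0.0.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        ngrams[order][words] = (prob, backoff)
    return ngrams


def missing_contexts(ngrams):
    """Yield (ngram, context) pairs where the (n-1)-gram context is absent."""
    for order in sorted(ngrams):
        if order < 2:
            continue
        lower = ngrams.get(order - 1, {})
        for words in ngrams[order]:
            context = words[:-1]
            if context not in lower:
                yield words, context


def reinserted_entry(ngrams, context):
    """Log10 (prob, backoff) for a missing context, following the thread's formula."""
    n = len(context)
    # b(context without its last word); a missing entry means log10 backoff 0.0.
    _, backoff = ngrams[n - 1].get(context[:-1], (0.0, 0.0))
    # p(context without its first word); this sketch assumes that entry exists.
    prob, _ = ngrams[n - 1][context[1:]]
    return backoff + prob, 0.0


model = read_arpa(TOY_ARPA.splitlines())
for ngram, context in missing_contexts(model):
    print(" ".join(ngram), "-> missing context:", " ".join(context))
    print("reinserted (log10 prob, log10 backoff):", reinserted_entry(model, context))
```

To check a real model, pass an open file handle to `read_arpa` instead of the toy string. The sketch keeps the whole model in memory, so it is only practical for small files such as the toy model in this thread.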