Re: [Moses-support] KenLM: "The context of every 4-gram should appear as a 3-gram"

Sylvain Raybaud Thu, 16 Feb 2012 09:24:34 -0800

Hi

  No, I haven't turned on pruning. I looked through the IRSTLM manual to
see whether it is on by default, but I couldn't find that information (and
I couldn't find an up-to-date manual either, only one for version
5.60.something).


Since it seems to depend on the smoothing method, maybe msb turns it on,
but not sb?
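
One way to check whether contexts are really missing from the msb model
but not from the sb one would be to scan both files. Below is a minimal
sketch, not IRSTLM or KenLM code: the function name missing_contexts is
made up for illustration, and it assumes a standard ARPA text layout
("\N-grams:" section headers, then one line per n-gram with the log
probability first, the words next, and an optional backoff last).

```python
import re
from collections import defaultdict

# Matches ARPA section headers such as "\3-grams:".
SECTION = re.compile(r"\\(\d+)-grams:")


def missing_contexts(arpa_lines):
    """Return (order, ngram) pairs whose context (all words but the last)
    is not listed as an (order-1)-gram in the same ARPA data."""
    ngrams = defaultdict(set)  # order -> set of word tuples
    order = 0                  # 0 means "not inside an n-gram section"
    for raw in arpa_lines:
        line = raw.strip()
        header = SECTION.fullmatch(line)
        if header:
            order = int(header.group(1))
            continue
        if line.startswith("\\"):  # \data\ or \end\
            order = 0
            continue
        if order == 0 or not line:
            continue
        fields = line.split()
        # fields[0] is the log10 probability; the next `order` fields are words.
        words = tuple(fields[1:1 + order])
        if len(words) == order:
            ngrams[order].add(words)

    missing = []
    for n in sorted(ngrams):
        if n == 1:
            continue  # unigrams have an empty context
        shorter = ngrams.get(n - 1, set())
        for words in sorted(ngrams[n]):
            if words[:-1] not in shorter:
                missing.append((n, " ".join(words)))
    return missing


# Possible usage on the model from this thread:
# with open("toy.en.n5.lm", encoding="utf-8") as f:
#     for n, ngram in missing_contexts(f):
#         print(n, ngram)
```

An empty result for the sb model and a non-empty one for the msb model
would at least confirm that the difference comes from the smoothing
method.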

The solution you propose would indeed make me happy :) Actually, I just
need it to run with moses and yield acceptable performance to be happy.
I can even live with -lm=sb, since finding the best LM parameters isn't
the core of my research :)

thanks for your reply!

cheers,

Sylvain

On 16/02/12 17:46, Kenneth Heafield wrote:
> Hi,
> 
>       This is hopefully a stupid question.  Did you turn on pruning?  I don't 
> see it in the command line: "tlm -tr=toy.sent_start_end.en -lm=msb -n=5 
> -o=toy.en.n5.lm".  Or did IRSTLM make pruning the default in new releases?
> 
>       KenLM should be accepting pruned models and I take responsibility for 
> that.  But I am also confused as to how "to support them" did not appear 
> if pruning was off.
> 
> Kenneth
> 
> On 02/16/2012 10:16 AM, Kenneth Heafield wrote:
>> Hi,
>>
>>      Interesting.  The only other person to run into this is David Chiang
>> who had some custom software to prune/build models.
>>
>>      I have been requiring that property to make right state minimization
>> work correctly: if it doesn't match "to support them" then the right
>> state contains at most "support them", rendering "to support them ."
>> inaccessible.  I could reinsert "to support them" when this happens,
>> with p(to support them) = b(to support)p(support them) and b(to support
>> them) = 0.
>>
>>      It's a bit of a pain to do this correctly.  Would you be happy if only
>> the default probing model supported it, but the trie continued to throw
>> an error message?
>>
>>      The ARPA standard, to the extent that there is one, does not require
>> this behavior, so IRSTLM is within their rights to prune them.
>>
>>      Nicola, how does IRSTLM handle these cases at inference time?
>>
>> Kenneth
>>
>> On 02/16/2012 07:59 AM, Sylvain Raybaud wrote:
>>> Hi
>>>
>>>     LM stuff again!
>>>
>>> I've created a language model with IRSTLM (release 5.70.04):
>>> tlm -tr=toy.sent_start_end.en -lm=msb -n=5 -o=toy.en.n5.lm
>>>
>>> When I specify type 1 (IRSTLM) in moses.ini it's loading fine. But if I
>>> try to load it with KenLM I get:
>>>
>>> The context of every 4-gram should appear as a 3-gram Byte: 471440 File:
>>> /global/markov/raybauds/DATA/TOY/toy.en.n5.lm
>>>
>>> Byte 471440 seems to be the '\n' between the following lines:
>>> -1.16894        to support them .       -0.0679314
>>> -0.836008       to deal with hamas
>>>
>>> As a matter of fact, "to support them" does not appear as a trigram in
>>> the model. If I remove this 4-gram the same problem arises with another
>>> one, whose 3-gram prefix is also missing. I think this is the problem. If
>>> I change the smoothing method to "sb" instead of "msb" I get a usable
>>> LM. Is this normal behavior? Do you think it's a KenLM or an IRSTLM
>>> related problem?
>>>
>>>
>>> cheers,
>>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support


-- 
Sylvain Raybaud
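
The reinsertion Kenneth describes in the quoted message, p(to support
them) = b(to support)p(support them) with b(to support them) = 0, can be
sketched as an ordinary backoff lookup. This is only an illustration under
stated assumptions, not KenLM's implementation: the function names are
hypothetical, values are taken to be log10 as in ARPA files (so the
product becomes a sum, and "b = 0" is read as a log10 backoff of 0, i.e. a
weight of 1), and the numbers in the usage comment are invented rather
than taken from the toy model.

```python
def backoff_log_prob(words, log_prob, log_backoff):
    """Standard backoff: use the stored log10 probability if the n-gram is
    listed, otherwise add the context's log10 backoff (0.0 if the context
    is not listed) to the score of the n-gram with its first word dropped."""
    if words in log_prob:
        return log_prob[words]
    if len(words) == 1:
        raise KeyError("unknown word: %s" % words[0])
    context_backoff = log_backoff.get(words[:-1], 0.0)
    return context_backoff + backoff_log_prob(words[1:], log_prob, log_backoff)


def reinserted_entry(words, log_prob, log_backoff):
    """Entry for a missing context n-gram: the probability the model would
    have assigned anyway, plus a log10 backoff of 0.0."""
    return backoff_log_prob(words, log_prob, log_backoff), 0.0


# Made-up values for illustration:
# log_prob = {("support", "them"): -0.5}
# log_backoff = {("to", "support"): -0.25}
# reinserted_entry(("to", "support", "them"), log_prob, log_backoff)
```

Because the reinserted probability equals what backoff would have
produced and the added backoff is neutral, queries should score the same
as before; the entry only makes "to support them ." reachable from the
right state.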