Re: [Moses-support] Merging language models with IRSTLM..?

2012-04-26 Thread Marcello Federico
Hi,

we are currently working on a project that includes incremental training of
LMs. Hence, there are plans to introduce quick adaptation in IRSTLM, but not
soon.

The question is really how often you need to adapt the LM. If you are working
with large news LMs, adapting once a week seems to be enough (you simply do
not collect enough data in fewer days to change the LM significantly).

If you want to continuously update the LM, you can also consider external
interpolation: you interpolate two distinct LMs, one fixed and one smaller
that is continuously retrained (which should be fast), using the
interpolate-lm command (see the manual).
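
For intuition, this external interpolation amounts to a per-query linear
mixture of the two models. A minimal Python sketch of the idea (the toy
tables and function names are invented for illustration; this is not the
interpolate-lm implementation):

  # Toy stand-ins for the two LMs: (history,) -> {word: P(w|h)}.
  # Tables and names are illustrative, not part of IRSTLM's API.
  P_FIXED = {("the",): {"cat": 0.6, "dog": 0.4}}
  P_NEW   = {("the",): {"cat": 0.2, "dog": 0.8}}

  def interpolated_prob(word, history, lam=0.7):
      """P(w|h) = lam * P_fixed(w|h) + (1 - lam) * P_new(w|h)."""
      p_fixed = P_FIXED.get(history, {}).get(word, 0.0)
      p_new = P_NEW.get(history, {}).get(word, 0.0)
      return lam * p_fixed + (1 - lam) * p_new

  print(interpolated_prob("dog", ("the",)))  # 0.7*0.4 + 0.3*0.8 = 0.52

Only the small LM ever needs retraining; the interpolation itself is cheap,
since each query touches each model once.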

Greetings,
Marcello

 

On Apr 22, 2012, at 9:12 PM, Pratyush Banerjee wrote:

 Hi,
 
 I have recently been trying to create incrementally adapted language models
 using IRSTLM.
 
 I have an in-domain data set on which the mixture adaptation weights are
 computed using the -lm=mix option, and I have a larger out-of-domain dataset
 from which I incrementally add data to create adapted LMs of different sizes.
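 
 For context, mixture weights of this kind are typically estimated by EM on
 held-out in-domain text. A minimal sketch of that estimation (the function
 name and probabilities are invented for illustration; this is not the
 -lm=mix implementation):
 
   def em_mixture_weights(token_probs, iters=50):
       """token_probs: one list per held-out token, giving the probability
       each component LM assigns to that token. Returns mixture weights."""
       k = len(token_probs[0])
       w = [1.0 / k] * k  # start from uniform weights
       for _ in range(iters):
           counts = [0.0] * k
           for probs in token_probs:
               mix = sum(wi * pi for wi, pi in zip(w, probs))
               for i in range(k):
                   counts[i] += w[i] * probs[i] / mix  # posterior of LM i
           total = sum(counts)
           w = [c / total for c in counts]
       return w
 
   # Two LMs scoring three held-out tokens (made-up probabilities):
   print(em_mixture_weights([[0.10, 0.30], [0.20, 0.10], [0.05, 0.40]]))
 
 The weights converge to the mixture that maximizes held-out likelihood.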
 
 Currently, every time saveBIN is called, the entire lmtable is estimated and
 saved, which makes the process slow...
 
 Is there functionality in IRSTLM to incrementally train/save adapted
 language models?
 
 Secondly, given an existing adapted language model in ARPA format (old) and
 another small language model built on incremental data (new), would it be
 safe to update the smoothed probabilities (f*) using the following formula:
 
 c_sum(wh) = c_old(wh) + c_new(wh)
 f*_merged(w|h) = f*_old(w|h) * (c_old(wh) / c_sum(wh))
                + f*_new(w|h) * (c_new(wh) / c_sum(wh))
 
 where the c_old and c_new counts are estimated from the n-gram tables?
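 
 As a worked instance of the formula above (all names and numbers are made
 up; this restates the proposed update rather than endorsing it):
 
   def merge_fstar(f_old, f_new, c_old, c_new):
       """f*_merged(w|h): each model's smoothed probability weighted by
       its share of the combined count c_sum(wh) = c_old(wh) + c_new(wh)."""
       c_sum = c_old + c_new
       return f_old * (c_old / c_sum) + f_new * (c_new / c_sum)
 
   # Example: the old LM saw the n-gram 900 times, the new data adds 100.
   print(merge_fstar(f_old=0.30, f_new=0.50, c_old=900, c_new=100))  # 0.32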
 
 
 Thanks and Regards,
 
 Pratyush
 
 


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

