The problem is that there are many ways to create 5-grams (and other n-grams) from common words -- words you will find in any large parallel set. You will save some space, but it will not be significant.
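To make the trade-off concrete, here is a rough sketch of what such a
vocabulary filter over an ARPA file would look like (a hypothetical
script, not something any toolkit ships). Note that surviving entries
keep their original log-probs and backoff weights -- nothing is
renormalized -- so the model is no longer properly normalized, and the
"ngram N=..." counts in the \data\ section would also need patching:

    #!/usr/bin/env python
    # Keep only n-grams whose words all appear in a vocabulary file
    # (one word per line).  Usage: filter_arpa.py vocab lm.arpa
    import sys

    vocab = set(line.strip() for line in open(sys.argv[1]))
    vocab.update(['<s>', '</s>', '<unk>'])  # always keep special tokens

    for line in open(sys.argv[2]):
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 2:                 # \data\, "ngram N=...", blanks
            sys.stdout.write(line)
            continue
        # Entry lines are: log10prob <TAB> n-gram [<TAB> backoff]
        if all(w in vocab for w in fields[1].split()):
            sys.stdout.write(line)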
Nowadays there really is no need to use SRILM at all. Before
considering drastic action, look at the other methods I mentioned
(KenLM, RandLM), which are explicitly designed to be more
space-efficient and do not involve any filtering; a short KenLM usage
sketch is appended after the quoted thread below. Filtering is an act
of desperation.

Miles

On 24 November 2011 13:22, Thomas Schoenemann <thomas_schoenem...@yahoo.de> wrote:
> Dear Miles,
>
> Thank you for the quick answer! Does it not save (significant) space
> even for a trigram or higher-order model? That is my major concern.
>
> Concerning OOVs, I understand. The filtering would have to be a little
> more refined then.
>
> Best,
> Thomas
>
> ________________________________
> From: Miles Osborne <mi...@inf.ed.ac.uk>
> To: Thomas Schoenemann <thomas_schoenem...@yahoo.de>
> Cc: "moses-support@mit.edu" <moses-support@mit.edu>
> Sent: Thursday, 24 November 2011, 14:07
> Subject: Re: [Moses-support] Filtering LMs
>
> This can be done, but it tends not to save much space. It also does
> not help with OOVs, which the language model can still score even
> though they are not in the parallel set.
>
> If you are worried about saving space then you should look at either
> KenLM or RandLM.
>
> Miles
>
> On 24 November 2011 12:58, Thomas Schoenemann
> <thomas_schoenem...@yahoo.de> wrote:
>> Dear all,
>>
>> I hope that this is not too stupid a question, and that it hasn't
>> been asked recently.
>>
>> In the Moses EMS, when running experiments the phrase table is
>> automatically reduced to only those phrases that actually occur in
>> the respective dev/test set. Obviously this saves a lot of memory
>> without changing the resulting translations.
>>
>> Now, I was wondering whether something similar can be (or is) done
>> with the language model. That is, can one reduce the ARPA file to
>> only those words that occur on the target side of the (filtered)
>> phrase table? The objective would of course be to maintain the
>> translation result. Would the LM software renormalize internally if
>> some of the original entries were removed? Then the results would
>> differ.
>>
>> This may even depend on which toolkit is used to load (rather than
>> train) the ARPA file. I am using SRILM in my own translation
>> programs, but would also be interested in other toolkits in case
>> they behave more suitably.
>>
>> Can anyone point me to anything?
>> Many thanks!
>> Thomas Schoenemann (currently University of Pisa)

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
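For reference, loading and querying a model with KenLM takes only a few
lines. A minimal sketch, assuming KenLM's Python bindings are installed
(pip install kenlm) and a hypothetical model file lm.arpa; KenLM's
binary format is accepted the same way:

    import kenlm  # assumes the kenlm Python bindings are installed

    model = kenlm.Model('lm.arpa')  # hypothetical path; binary files also work
    # Total log10 probability of the sentence, with <s> and </s> added.
    print(model.score('this is a test', bos=True, eos=True))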