Revision 3671 introduces an updated version of kenlm. Queries are now faster: there are no more string vocabulary lookups, and state is kept between queries so backoffs cost less. As a result, the binary format has changed; please rebuild your binary files. Timing benchmarks are forthcoming.
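"State is kept so backoffs cost less" refers to standard back-off n-gram scoring: when the full n-gram is missing, log10 p(w | c1..ck) = log10 b(c1..ck) + log10 p(w | c2..ck), applied recursively down to the longest n-gram that does exist. The sketch below is a minimal illustration under stated assumptions, not kenlm's code. The names `ToyModel`, `Entry`, `State`, and `Score` are hypothetical, the table is a plain `std::map`, and words are integer ids so no strings are touched at query time. Each query returns a state holding the recent context words together with their back-off weights, so the next query can add those weights without looking them up again.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical toy back-off model for illustration; NOT kenlm's real types.
// Words are integer ids, so scoring never does a string vocabulary lookup.
using WordId = int;
using NGram = std::vector<WordId>;  // oldest word first, predicted word last

struct Entry {
  float log_prob;  // log10 p(last word | preceding words)
  float backoff;   // log10 back-off weight when this n-gram is a context
};

struct ToyModel {
  int order;  // assumed >= 2
  std::map<NGram, Entry> table;
};

// Carried from one query to the next: recent words (newest first) and the
// back-off weight of each context suffix, looked up once when produced.
struct State {
  std::vector<WordId> words;    // words[0] is the most recent word
  std::vector<float> backoffs;  // backoffs[j] = b(words[j], ..., words[0])
};

// Returns log10 p(word | in) and writes the state for the next query.
float Score(const ToyModel& m, const State& in, WordId word, State* out) {
  NGram gram{word};
  auto it = m.table.find(gram);
  assert(it != m.table.end());  // assumption: every word has a unigram entry
  float log_prob = it->second.log_prob;
  out->words.assign(1, word);
  out->backoffs.assign(1, it->second.backoff);

  // Find the longest matching n-gram by adding one context word at a time
  // (assumes an n-gram is present only if its shorter suffixes are).
  std::size_t matched = 0;
  while (matched < in.words.size()) {
    gram.insert(gram.begin(), in.words[matched]);
    auto found = m.table.find(gram);
    if (found == m.table.end()) break;
    log_prob = found->second.log_prob;
    ++matched;
    if (out->words.size() < static_cast<std::size_t>(m.order - 1)) {
      out->words.push_back(in.words[matched - 1]);
      out->backoffs.push_back(found->second.backoff);
    }
  }

  // Back off from every longer context that did not match. Those weights
  // were stored in the incoming state, so no further table lookups happen.
  for (std::size_t j = matched; j < in.backoffs.size(); ++j) {
    log_prob += in.backoffs[j];
  }
  return log_prob;
}
```

With a bigram table holding unigrams A and B plus only the bigram (A, B), scoring A, B, A in sequence charges B's back-off weight on the third query, read directly from the state that the second query produced.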
Kenneth

On 10/18/10 20:31, Kenneth Heafield wrote:
> Hi Moses,
>
> Introducing kenlm in Moses trunk. You no longer need to download a
> separate language model toolkit to use Moses; it's distributed with Moses
> and compiled in by default on UNIX. This is thread-safe language model
> inference code that returns the same probabilities as SRI (up to
> floating-point rounding). It loads ARPA files in 2/3 the time SRI takes
> and uses less memory too. Using kenlm is simple: in the [lmodel-file]
> section of your moses.ini, change the first digit to 8. For example,
>
> "0 0 2 foo.arpa" changes to "8 0 2 foo.arpa"
>
> For even faster loading, use the binary format:
>
> kenlm/build_binary foo.arpa foo.binary
>
> then simply provide the binary filename in your moses.ini, e.g.
> "8 0 2 foo.binary"; kenlm auto-detects binary files using magic bytes at
> the beginning of the file.
>
> The code is ready for use and provides correct results. Inference is
> slower than it should be due to inefficiencies in the Moses-side wrapper
> code (it does a vocab lookup for all 5 words every time). I'm working
> on it, and once this is done I'll post some benchmarks against SRI and
> IRST. The binary format is subject to change, but it contains a version
> number, so on the rare occasions it does change, new versions will tell
> you to rebuild your binary files. Windows is currently not supported
> (kenlm uses mmap), though I welcome contributions using #ifdef and
> CreateFileMapping.
>
> Have fun and let me know about your experiences with it.
>
> "Ken"
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
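Auto-detecting the binary format by "magic bytes" amounts to comparing the first few bytes of the file against a fixed signature before choosing a loader; ARPA text files normally open with a `\data\` header instead. The sketch below is minimal and rests on assumptions: `kBinaryMagic` is a placeholder, not the signature kenlm actually writes, and `ReadHead`, `HasBinaryMagic`, and `IsBinaryModel` are hypothetical names rather than kenlm functions.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical placeholder signature; NOT the bytes kenlm actually writes.
const std::string kBinaryMagic = "EXAMPLE-LM-BINARY";

// Reads up to n bytes from the start of a file (fewer if the file is shorter
// or cannot be opened).
std::string ReadHead(const std::string& path, std::size_t n) {
  std::ifstream in(path, std::ios::binary);
  std::string head(n, '\0');
  in.read(&head[0], static_cast<std::streamsize>(n));
  head.resize(static_cast<std::size_t>(in.gcount()));
  return head;
}

// True if the leading bytes start with the signature; anything else would be
// handed to the ARPA text parser instead.
bool HasBinaryMagic(const std::string& head) {
  assert(!kBinaryMagic.empty());  // an empty signature would match every file
  return head.compare(0, kBinaryMagic.size(), kBinaryMagic) == 0;
}

// Convenience wrapper: inspect only as many bytes as the signature needs.
bool IsBinaryModel(const std::string& path) {
  return HasBinaryMagic(ReadHead(path, kBinaryMagic.size()));
}
```

A loader along these lines would call `IsBinaryModel` on the filename from the `[lmodel-file]` line and pick either the memory-mapped binary reader or the ARPA parser. Checking a signature is cheap and lets one config field accept both formats.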