Forwarding to the list for reference.
-------- Original Message --------
Subject: Re: [Moses-support] Is there multithread option for KenLM's
build_binary?
Date: Fri, 07 Aug 2015 22:02:36 +0200
From: Marcin Junczys-Dowmunt <junc...@amu.edu.pl>
To: liling tan <alvati...@gmail.com>
Hi Liling,
There is a switch, e.g. "-S 30G", with which you can control memory usage.
I actually recommend setting it to something smaller than 50G, otherwise
memory mapping sometimes goes crazy and the process stalls (OS bug?).
I cannot tell you much about the time needed. I think I binarized a 250G
gzipped ARPA on a similar machine in less than 24 hours. Sometimes the
process will slow down a lot; that is the on-disk sorting of n-grams
before they are crunched into a trie. I doubt multi-threading would help much.
Quantization is lossy. In practice however you may not notice any
difference in BLEU. I am using "-a 22 -q 8 -b 8" and I haven't seen any
quality loss. I think there were increases in perplexity, but
translation quality was not affected.
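
For illustration, the flags mentioned above could be combined into a
single build_binary call. This is only a sketch under assumptions: the
file names lm.arpa.gz and lm.trie.bin are hypothetical, "-S 30G" is just
the example memory cap from above, and the "trie" type is spelled out on
the assumption that -a, -q and -b only apply to the trie data structure.
Whether gzipped input can be read directly depends on how KenLM was built.

```python
import shlex


def build_binary_command(arpa_path, out_path, sort_mem="30G",
                         pointer_bits=22, prob_bits=8, backoff_bits=8):
    """Return the argument list for a quantized trie build.

    The paths are placeholders; the flag values mirror the ones
    quoted in this thread ("-S 30G", "-a 22 -q 8 -b 8").
    """
    return [
        "build_binary",
        "-S", sort_mem,            # memory budget for building the trie
        "-a", str(pointer_bits),   # pointer compression bits
        "-q", str(prob_bits),      # bits per quantized probability
        "-b", str(backoff_bits),   # bits per quantized backoff
        "trie",
        arpa_path,
        out_path,
    ]


cmd = build_binary_command("lm.arpa.gz", "lm.trie.bin")
print(shlex.join(cmd))
```

Once the paths point at real files, the same list could be handed to
subprocess.run(cmd, check=True).
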
Best,
Marcin
On 07.08.2015 21:31, liling tan wrote:
Dear Moses dev/users,
On a related note, without multi-threading, can anyone give a gauge of
how much RAM is required to binarize an 80GB (compressed .gz) 6-gram
ARPA file? The n-gram counts are:
\data\
ngram 1=7503209
ngram 2=131003943
ngram 3=671005861
ngram 4=1510529519
ngram 5=2165163610
ngram 6=2477533666
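
To give a sense of scale, the counts above can be totalled with a short
script; they come to roughly 6.96 billion n-grams. This is only a sketch
that parses the header lines as pasted here. The total is an input to any
size estimate, not a RAM figure in itself.

```python
import re

# Header lines copied from the ARPA file discussed in this message.
HEADER = """\\data\\
ngram 1=7503209
ngram 2=131003943
ngram 3=671005861
ngram 4=1510529519
ngram 5=2165163610
ngram 6=2477533666
"""


def ngram_counts(header_text):
    """Map n-gram order to count from 'ngram N=COUNT' header lines."""
    counts = {}
    for line in header_text.splitlines():
        match = re.fullmatch(r"ngram (\d+)=(\d+)", line.strip())
        if match:
            counts[int(match.group(1))] = int(match.group(2))
    return counts


counts = ngram_counts(HEADER)
total = sum(counts.values())
print(f"total n-grams: {total:,}")
```
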
Also, how long would it take (single-threaded) on a 2.4GHz core with
128GB RAM? Is there a way to mathematically estimate the time taken
and RAM required to binarize a language model?
Also, is a binarized and quantized LM from KenLM lossy? If so, how lossy?
The KenLM paper states "To conserve memory at the expense of accuracy,
values may be quantized using q bits per probability and r bits per
backoff". Can someone help point us to papers that quantify how lossy
it gets in terms of MT experiments or word perplexity tasks?
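
To make "q bits per probability" concrete, here is a toy sketch of why
quantization loses information: with q bits there are at most 2^q
distinct stored values, so every log probability is replaced by a
representative of its bin. The equal-width binning and the sample values
are assumptions made for illustration and are not meant to reproduce
KenLM's actual binning scheme.

```python
def quantize(values, bits):
    """Snap each value to the centre of one of 2**bits equal-width bins."""
    lo, hi = min(values), max(values)
    bins = 2 ** bits
    width = (hi - lo) / bins
    if width == 0:
        # All values identical: nothing to quantize.
        return list(values)
    out = []
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        out.append(lo + (index + 0.5) * width)
    return out


# Made-up log10 probabilities, purely for demonstration.
log_probs = [-0.31, -1.27, -2.05, -3.9, -5.42, -6.8]

for bits in (2, 4, 8):
    approx = quantize(log_probs, bits)
    worst = max(abs(a - b) for a, b in zip(log_probs, approx))
    print(f"{bits} bits: max abs error {worst:.4f}")
```

In this toy scheme the error per value is bounded by half a bin width,
so it shrinks as bits are added, but it never reaches zero.
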
Thanks in advance for the pointers!
Regards,
Liling
On Fri, Aug 7, 2015 at 8:56 PM, liling tan <alvati...@gmail.com
<mailto:alvati...@gmail.com>> wrote:
Dear Moses dev/users,
Is there a multithread option for KenLM's build_binary?
Regards,
Liling
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support