Quantization is also supported in the grammar packer. Another idea: since we know the model weights when we publish a language pack, we should pre-compute the dot product of the weight vector against the grammar weights and reduce it to a single (quantized) score.
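A minimal sketch of that precomputation idea (the function name, the uniform 8-bit bucketing, and the decode formula are illustrative assumptions, not Joshua's actual packer code):

```python
import numpy as np

def precompute_scores(feature_vectors, model_weights, bits=8):
    """Collapse each rule's feature vector into a single dot-product
    score, then quantize the scores into 2**bits uniform buckets."""
    feats = np.asarray(feature_vectors, dtype=np.float32)
    weights = np.asarray(model_weights, dtype=np.float32)
    scores = feats @ weights                 # one float score per grammar rule

    # Uniform quantization: map the observed score range onto the buckets.
    lo, hi = scores.min(), scores.max()
    step = (hi - lo) / (2 ** bits - 1) or 1.0   # avoid divide-by-zero
    codes = np.round((scores - lo) / step).astype(np.uint8)

    # At decode time: score ~ lo + code * step
    return codes, float(lo), float(step)
```

Storing one byte per rule instead of a full float vector is where the size win comes from; the cost is exactly the loss of per-feature tunability mentioned below.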
(This would reduce the ability for users to play with the individual weights, but I don't think that's a huge loss, since the main weight is LM vs. TM.)

matt

> On May 13, 2016, at 4:45 PM, Matt Post <p...@cs.jhu.edu> wrote:
>
> Oh, yes, of course. That's in build_binary.
>
>
>> On May 13, 2016, at 4:39 PM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>>
>> Could we also use quantization with the language model to reduce the size?
>> KenLM supports this, right?
>>
>> On Fri, May 13, 2016 at 1:19 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>
>>> Great idea, hadn't thought of that.
>>>
>>> I think we could also get some leverage out of:
>>>
>>> - Reducing the language model to a 4-gram one
>>> - Doing some filtering of the phrase table to reduce low-probability
>>>   translation options
>>>
>>> These would be a bit lossier, but I doubt it would matter much at all.
>>>
>>> matt
>>>
>>>
>>>> On May 13, 2016, at 4:02 PM, Tom Barber <t...@analytical-labs.com> wrote:
>>>>
>>>> Out of curiosity more than anything else, I tested XZ compression on a
>>>> model instead of Gzip. It takes the Spain pack down from 1.9GB to 1.5GB:
>>>> not the most ever, but it obviously means 400MB+ less in remote storage
>>>> and data going over the wire.
>>>>
>>>> Worth considering, I guess.
>>>>
>>>> Tom
>>>> --------------
>>>>
>>>> Director Meteorite.bi - Saiku Analytics Founder
>>>> Tel: +44(0)5603641316
>>>>
>>>> (Thanks to the Saiku community we reached our Kickstarter
>>>> <http://kickstarter.com/projects/2117053714/saiku-reporting-interactive-report-designer/>
>>>> goal, but you can always help by sponsoring the project
>>>> <http://www.meteorite.bi/products/saiku/sponsorship>)
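For reference, the steps discussed in the quoted thread look roughly like the following (a sketch; the file names are placeholders, so check each tool's help output before relying on the exact flags):

```shell
# Quantize a KenLM model with build_binary: 8-bit probability (-q) and
# 8-bit backoff (-b) quantization, which requires the trie data structure.
build_binary -q 8 -b 8 trie model.arpa model.quantized.binary

# Train a lower-order (4-gram) LM in the first place with KenLM's lmplz.
lmplz -o 4 < corpus.txt > model.4gram.arpa

# Repack a language pack with xz instead of gzip for better compression.
tar -cJf language-pack.tar.xz language-pack/
```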