Oh, yeah, of course. So to summarize:

Language model:
- quantize
- try 4-grams

Packing:
- pre-compute the dot product with the weight vector
- quantize the score
- prune out low-probability translation options
- pre-sort the grammar

This would make things both smaller and faster, and most of it wouldn't even require changing any code (pre-sorting might).

I am working on fixing up the language pack pages; I want to have a suite of test sets for each language pack, so we can know whether a new language pack is better.

matt


> On May 13, 2016, at 5:11 PM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>
> That's a great idea, can we pre-sort the grammar as well?
>
> On Fri, May 13, 2016 at 1:47 PM, Matt Post <p...@cs.jhu.edu> wrote:
>
>> Quantization is also supported in the grammar packer.
>>
>> Another idea: since we know the model weights when we publish a language
>> pack, we should pre-compute the dot product of the weight vector against
>> the grammar weights and reduce it to a single (quantized) score.
>>
>> (This would reduce the ability for users to play with the individual
>> weights, but I don't think that's a huge loss, since the main weight is LM
>> vs. TM.)
>>
>> matt
>>
>>
>>> On May 13, 2016, at 4:45 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>>
>>> Oh, yes, of course. That's in build_binary.
>>>
>>>
>>>> On May 13, 2016, at 4:39 PM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>>>>
>>>> Could we also use quantization with the language model to reduce the
>>>> size? KenLM supports this, right?
>>>>
>>>> On Fri, May 13, 2016 at 1:19 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>>>
>>>>> Great idea, hadn't thought of that.
>>>>>
>>>>> I think we could also get some leverage out of:
>>>>>
>>>>> - Reducing the language model to a 4-gram one
>>>>> - Filtering the phrase table to remove low-probability
>>>>>   translation options
>>>>>
>>>>> These would be a bit lossier, but I doubt it would matter much at all.
>>>>>
>>>>> matt
>>>>>
>>>>>
>>>>>> On May 13, 2016, at 4:02 PM, Tom Barber <t...@analytical-labs.com> wrote:
>>>>>>
>>>>>> Out of curiosity more than anything else, I tested XZ compression on a
>>>>>> model instead of Gzip: it takes the Spain pack down from 1.9GB to 1.5GB.
>>>>>> Not the biggest saving ever, but it obviously means 400MB+ less in remote
>>>>>> storage and in data going over the wire.
>>>>>>
>>>>>> Worth considering, I guess.
>>>>>>
>>>>>> Tom
>>>>>> --------------
>>>>>>
>>>>>> Director Meteorite.bi - Saiku Analytics Founder
>>>>>> Tel: +44(0)5603641316
>>>>>>
>>>>>> (Thanks to the Saiku community we reached our Kickstarter
>>>>>> <http://kickstarter.com/projects/2117053714/saiku-reporting-interactive-report-designer/>
>>>>>> goal, but you can always help by sponsoring the project
>>>>>> <http://www.meteorite.bi/products/saiku/sponsorship>)
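
---

[Editor's sketch] The pre-computed dot-product idea above can be sketched roughly as follows. The weight vector, rule names, and feature values here are invented for illustration; this is not how Joshua's grammar packer actually stores things, just the shape of the computation:

```python
import numpy as np

# Hypothetical model weights, fixed at language-pack build time.
weights = np.array([0.5, -0.2, 1.0, 0.3])

# Each grammar rule carries a vector of feature values (made-up examples).
rules = {
    "casa ||| house": np.array([-0.1, -2.3, -0.4, -1.7]),
    "casa ||| home":  np.array([-0.3, -1.9, -0.9, -2.1]),
}

# Collapse each rule's feature vector into a single dot-product score.
scores = {rule: float(weights @ feats) for rule, feats in rules.items()}

# Quantize the scores to 8 bits over the observed score range, so each
# rule stores one byte instead of a full feature vector.
lo = min(scores.values())
hi = max(scores.values())

def quantize(s, lo=lo, hi=hi, bits=8):
    levels = (1 << bits) - 1
    return round((s - lo) / (hi - lo) * levels) if hi > lo else 0

quantized = {rule: quantize(s) for rule, s in scores.items()}
```

As noted upthread, the trade-off is that users can no longer re-tune individual feature weights after packing, since only the collapsed score survives.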
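[Editor's sketch] Pruning low-probability translation options and pre-sorting the grammar, also discussed above, could look something like this. The phrase table, threshold, and top-k values are invented for illustration:

```python
# Hypothetical phrase table: source phrase -> list of (target, score),
# where higher scores are better (e.g. the collapsed dot-product score).
table = {
    "casa": [("house", -0.5), ("home", -1.3), ("shack", -7.2), ("casa", -9.0)],
}

def prune_and_sort(options, top_k=2, threshold=-5.0):
    # Drop low-probability options, then pre-sort best-first so the
    # decoder can read translation options in order at load time.
    kept = [(t, s) for t, s in options if s >= threshold]
    kept.sort(key=lambda ts: ts[1], reverse=True)
    return kept[:top_k]

pruned = {src: prune_and_sort(opts) for src, opts in table.items()}
```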
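[Editor's sketch] Tom's XZ-vs-Gzip comparison can be reproduced on any file with standard tools; the sample file below is illustrative, not the Spain pack (real language packs are multi-GB grammar and LM files):

```shell
# Generate some repetitive sample data standing in for a grammar file.
printf 'rule line %s\n' $(seq 1 1000) > /tmp/pack_sample.txt

# Compress the same data with gzip and xz at maximum level.
gzip -9 -c /tmp/pack_sample.txt > /tmp/pack_sample.gz
xz -9 -c /tmp/pack_sample.txt > /tmp/pack_sample.xz

# Compare the resulting sizes.
ls -l /tmp/pack_sample.gz /tmp/pack_sample.xz
```

The cost of xz is slower compression at pack-build time and slower decompression on the user's machine, which may matter for large packs.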