Quantization is also supported in the grammar packer. Another idea: since we know the model weights when we publish a language pack, we should pre-compute the dot product of the weight vector against the grammar weights and reduce it to a single (quantized) score.
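A minimal sketch of that precomputation idea (the function name, the uniform 8-bit bucketing, and the decode formula are illustrative assumptions, not Joshua's actual packer code):

```python
import numpy as np

def precompute_scores(feature_vectors, model_weights, bits=8):
    """Collapse each rule's feature vector into a single dot-product
    score, then quantize the scores into 2**bits uniform buckets."""
    feats = np.asarray(feature_vectors, dtype=np.float32)
    weights = np.asarray(model_weights, dtype=np.float32)
    scores = feats @ weights                 # one float score per grammar rule

    # Uniform quantization: map the observed score range onto the buckets.
    lo, hi = scores.min(), scores.max()
    step = (hi - lo) / (2 ** bits - 1) or 1.0   # avoid divide-by-zero
    codes = np.round((scores - lo) / step).astype(np.uint8)

    # At decode time: score ~ lo + code * step
    return codes, float(lo), float(step)
```

Storing one byte per rule instead of a full float vector is where the size win comes from; the cost is exactly the loss of per-feature tunability mentioned below.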
(This would reduce the ability for users to play with the individual weights, but I don't think that's a huge loss, since the main weight is LM vs. TM.)

matt

> On May 13, 2016, at 4:45 PM, Matt Post <p...@cs.jhu.edu> wrote:
>
> Oh, yes, of course. That's in build_binary.
>
>
>> On May 13, 2016, at 4:39 PM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>>
>> Could we also use quantization with the language model to reduce the size?
>> KenLM supports this, right?
>>
>> On Fri, May 13, 2016 at 1:19 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>
>>> Great idea, hadn't thought of that.
>>>
>>> I think we could also get some leverage out of:
>>>
>>> - Reducing the language model to a 4-gram one
>>> - Doing some filtering of the phrase table to reduce low-probability
>>>   translation options
>>>
>>> These would be a bit lossier, but I doubt it would matter much at all.
>>>
>>> matt
>>>
>>>
>>>> On May 13, 2016, at 4:02 PM, Tom Barber <t...@analytical-labs.com> wrote:
>>>>
>>>> Out of curiosity more than anything else, I tested XZ compression on a
>>>> model instead of Gzip. It takes the Spain pack down from 1.9GB to 1.5GB:
>>>> not the most ever, but it obviously means 400MB+ less in remote storage
>>>> and data going over the wire.
>>>>
>>>> Worth considering, I guess.
>>>>
>>>> Tom
>>>> --------------
>>>>
>>>> Director Meteorite.bi - Saiku Analytics Founder
>>>> Tel: +44(0)5603641316
>>>>
>>>> (Thanks to the Saiku community we reached our Kickstarter
>>>> <http://kickstarter.com/projects/2117053714/saiku-reporting-interactive-report-designer/>
>>>> goal, but you can always help by sponsoring the project
>>>> <http://www.meteorite.bi/products/saiku/sponsorship>)
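For reference, the steps discussed in the quoted thread look roughly like the following (a sketch; the file names are placeholders, so check each tool's help output before relying on the exact flags):

```shell
# Quantize a KenLM model with build_binary: 8-bit probability (-q) and
# 8-bit backoff (-b) quantization, which requires the trie data structure.
build_binary -q 8 -b 8 trie model.arpa model.quantized.binary

# Train a lower-order (4-gram) LM in the first place with KenLM's lmplz.
lmplz -o 4 < corpus.txt > model.4gram.arpa

# Repack a language pack with xz instead of gzip for better compression.
tar -cJf language-pack.tar.xz language-pack/
```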