Re: Language Pack size

2016-05-13 Thread kellen sunderland
That's a great idea. Can we pre-sort the grammar as well?

On Fri, May 13, 2016 at 1:47 PM, Matt Post  wrote:

> Quantization is also supported in the grammar packer.
>
> Another idea: since we know the model weights when we publish a language
> pack, we should pre-compute the dot product of the weight vector against
> the grammar weights and reduce it to a single (quantized) score.
>
> (This would reduce the ability for users to play with the individual
> weights, but I don't think that's a huge loss, since the main weight is LM
> vs. TM).
>
> matt


Re: Language Pack size

2016-05-13 Thread Matt Post
Quantization is also supported in the grammar packer.

Another idea: since we know the model weights when we publish a language pack, 
we should pre-compute the dot product of the weight vector against the grammar 
weights and reduce it to a single (quantized) score.

(This would reduce the ability for users to play with the individual weights, 
but I don't think that's a huge loss, since the main weight is LM vs. TM).
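A rough sketch of that pre-computation (array shapes and names are
illustrative, not the packer's actual API):

    import numpy as np

    def precompute_scores(rule_features, weights, bits=8):
        # rule_features: (num_rules, num_features); weights: (num_features,).
        # Collapse each rule's feature vector into a single dot-product score.
        scores = rule_features @ weights
        # Uniformly quantize the scores to 2**bits levels.
        lo, hi = float(scores.min()), float(scores.max())
        step = (hi - lo) / (2 ** bits - 1) or 1.0  # guard against hi == lo
        codes = np.round((scores - lo) / step).astype(np.uint8)
        return codes, lo, step  # decode a rule's score as lo + code * step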

matt


> On May 13, 2016, at 4:45 PM, Matt Post  wrote:
> 
> Oh, yes, of course. That's in build_binary.



Re: Language Pack size

2016-05-13 Thread Matt Post
Oh, yes, of course. That's in build_binary.
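If I remember the tool right, quantization applies to KenLM's trie format
and is enabled when the binary is built, along these lines (file names
made up):

    build_binary -q 8 -b 8 trie lm.arpa lm.binary

where -q and -b set the number of bits used to store probabilities and
backoffs, respectively.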


> On May 13, 2016, at 4:39 PM, kellen sunderland wrote:
> 
> Could we also use quantization with the language model to reduce the size?
> KenLM supports this right?
> 
> On Fri, May 13, 2016 at 1:19 PM, Matt Post  wrote:
> 
>> Great idea, hadn't thought of that.
>> 
>> I think we could also get some leverage out of:
>> 
>> - Reducing the language model to a 4-gram one
>> - Doing some filtering of the phrase table to reduce low-probability
>> translation options
>> 
>> These would be a bit lossier but I doubt it would matter much at all.
>> 
>> matt
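On the two lossier ideas quoted above: the order reduction is
straightforward if the LM is retrained (with KenLM's estimator that would
be roughly "lmplz -o 4 < corpus.txt > lm4.arpa"; paths illustrative), and
the phrase-table filtering could be as simple as keeping the k most
probable translations per source phrase. A throwaway sketch, assuming a
Moses-style "src ||| tgt ||| features" text format:

    from collections import defaultdict

    def prune_grammar(lines, k=20):
        # Group candidate translations by source phrase, scoring each
        # by its first feature (assumed to be a translation probability).
        candidates = defaultdict(list)
        for line in lines:
            src, tgt, feats = line.rstrip("\n").split(" ||| ")[:3]
            candidates[src].append((float(feats.split()[0]), line))
        # Emit only the k best candidates for each source phrase.
        for translations in candidates.values():
            for _, line in sorted(translations, reverse=True)[:k]:
                yield line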



Language Pack size

2016-05-13 Thread Tom Barber
Out of curiosity more than anything else, I tested XZ compression on a model
instead of Gzip. It takes the Spain pack down from 1.9 GB to 1.5 GB; not the
most dramatic saving, but it does mean 400 MB+ less in remote storage and
data going over the wire.
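(For anyone wanting to reproduce the comparison, it is essentially the
following, modulo compression level; file names made up. Note that xz -9
is much slower to compress, though decompression stays reasonably fast.)

    tar -cf - language-pack/ | gzip -9 > pack.tar.gz
    tar -cf - language-pack/ | xz -9 > pack.tar.xz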

Worth considering I guess.

Tom
--

Director Meteorite.bi - Saiku Analytics Founder
Tel: +44(0)5603641316

(Thanks to the Saiku community we reached our Kickstarter
<http://kickstarter.com/projects/2117053714/saiku-reporting-interactive-report-designer/>
goal, but you can always help by sponsoring the project.)