Hi,

With regard to pruning:

the example EMS config files include

[TRAINING]
score-settings = "--GoodTuring --MinScore 2:0.0001"

which carries out threshold pruning during phrase table construction, going
a good way towards avoiding too many translation options per phrase.
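
As a rough illustration of what that threshold does, here is a minimal
Python sketch. It is not the Moses scorer itself. It assumes the usual
"source ||| target ||| scores" phrase-table layout, and it assumes that
"2:0.0001" means "drop entries whose score at 0-based index 2 is below
0.0001". The function names and the toy entries are made up for the example.

```python
def parse_min_score(spec):
    """Split a 'FIELD:VALUE' spec such as '2:0.0001' into (index, threshold)."""
    field, value = spec.split(":")
    return int(field), float(value)


def threshold_prune(lines, spec):
    """Keep only phrase-table lines whose selected score reaches the threshold.

    Assumption: fields are separated by '|||' and the third field holds
    whitespace-separated scores; the score index is taken to be 0-based.
    """
    index, threshold = parse_min_score(spec)
    kept = []
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        scores = [float(s) for s in fields[2].split()]
        if scores[index] >= threshold:
            kept.append(line)
    return kept


# Toy entries, invented for illustration only.
table = [
    "the ||| le ||| 0.5 0.4 0.3 0.2",
    "the ||| chat ||| 0.001 0.001 0.00001 0.001",
    "the ||| la ||| 0.4 0.3 0.25 0.2",
]
pruned = threshold_prune(table, "2:0.0001")
```

Here the "chat" entry is dropped because its score at index 2 falls below
the threshold, while the other two entries survive.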

-phi

On Mon, Oct 5, 2015 at 11:08 AM, Barry Haddow <bhad...@inf.ed.ac.uk> wrote:

> Hi Hieu
>
> That's exactly why I took to pre-pruning the phrase table, as I mentioned
> on Friday. I had something like 750,000 translations of the most common
> word, and it took half an hour to get the first sentence translated.
>
> cheers - Barry
>
>
> On 05/10/15 15:48, Hieu Hoang wrote:
>
> Which pt implementation did you use, and had it been pre-pruned so that
> there's a limit on how many target phrases a particular source phrase can
> have? I.e. don't have 10,000 entries for 'the'.
>
> I've been digging around multithreading in the last few weeks. I've
> noticed that the compact pt is VERY bad at handling unpruned pt.
>
>                  Cores:    1    5   10   15   20   25
> Unpruned  compact pt     143   42   32   38   52   62
>           probing pt     245   58   33   25   24   21
> Pruned    compact pt     119   24   15   10   10   10
>           probing pt     117   25   25   10   10   10
>
> Hieu Hoang
> http://www.hoang.co.uk/hieu
>
> On 5 October 2015 at 15:15, Michael Denkowski <
> michael.j.denkow...@gmail.com> wrote:
>
>> Hi all,
>>
>> Like some other Moses users, I noticed diminishing returns from running
>> Moses with several threads.  To work around this, I added a script to run
>> multiple single-threaded instances of moses instead of one multi-threaded
>> instance.  In practice, this sped things up by about 2.5x for 16 cpus and
>> using memory mapped models still allowed everything to fit into memory.
>>
>> If anyone else is interested in using this, you can prefix a moses
>> command with scripts/generic/multi_moses.py.  To use multiple instances in
>> mert-moses.pl, specify --multi-moses and control the number of parallel
>> instances with --decoder-flags='-threads N'.
>>
>> Below is a benchmark on WMT fr-en data (2M training sentences, 400M words
>> mono, suffix array PT, compact reordering, 5-gram KenLM) testing default
>> stack decoding vs cube pruning without and with the parallelization script
>> (+multi):
>>
>> ---
>> 1cpu   sent/sec
>> stack      1.04
>> cube       2.10
>> ---
>> 16cpu  sent/sec
>> stack      7.63
>> +multi    12.20
>> cube       7.63
>> +multi    18.18
>> ---
>>
>> --Michael
>>
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>