cool, I was expecting only single-digit improvements. If the PT in Moses1 hadn't been pruned, the speedup is largely down to the pruning, I think.
Hieu Hoang
http://moses-smt.org/

On 14 December 2017 at 07:41, liling tan <alvati...@gmail.com> wrote:
> With Moses2 and ProbingPT, I got 4M sentences, 86M words in 14 hours
> with -threads 50 on 56 cores. So it's around 6M words per hour for
> Moses2.
>
> With Moses1, ProbingPT and a gzipped LO table, but with only 32K
> sentences: 280K words per hour with -threads 50 on 56 cores.
>
> Moses2 is 20x faster than Moses1 for my model!!
>
> For Moses1, my moses.ini:
>
> #########################
> ### MOSES CONFIG FILE ###
> #########################
>
> # input factors
> [input-factors]
> 0
>
> # mapping steps
> [mapping]
> 0 T 0
>
> [distortion-limit]
> 6
>
> # feature functions
> [feature]
> UnknownWordPenalty
> WordPenalty
> PhrasePenalty
> #PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/home/ltan/momo/pt.gz input-factor=0 output-factor=0
> ProbingPT name=TranslationModel0 num-features=4 path=/home/ltan/momo/momo-bin input-factor=0 output-factor=0
> LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/home/ltan/momo/reordering-table.wbe-msd-bidirectional-fe.gz
> #LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 property-index=0
>
> Distortion
> KENLM name=LM0 factor=0 path=/home/ltan/momo/lm.ja.kenlm order=5
>
> On Thu, Dec 14, 2017 at 8:58 AM, liling tan <alvati...@gmail.com> wrote:
>> I don't have a comparison between Moses vs Moses2. I'll give some Moses
>> numbers once the full dataset is decoded. And I can repeat the decoding
>> for Moses on the same machine.
>>
>> BTW, the ProbingPT directory created by binarize4moses2.perl, could it
>> be used for old Moses? Or would I have to re-prune the phrase table and
>> then use PhraseDictionaryMemory and LexicalReordering separately?
>> But I'm getting 4M sentences, 86M words in 14 hours on Moses2 with
>> -threads 50 on 56 cores.
>>
>> #########################
>> ### MOSES CONFIG FILE ###
>> #########################
>>
>> # input factors
>> [input-factors]
>> 0
>>
>> # mapping steps
>> [mapping]
>> 0 T 0
>>
>> [distortion-limit]
>> 6
>>
>> # feature functions
>> [feature]
>> UnknownWordPenalty
>> WordPenalty
>> PhrasePenalty
>> #PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/home/ltan/momo/phrase-table.gz input-factor=0 output-factor=0
>> ProbingPT name=TranslationModel0 num-features=4 path=/home/ltan/momo/momo-bin input-factor=0 output-factor=0
>> #LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/home/ltan/momo/reordering-table.wbe-msd-bidirectional-fe.gz
>> LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 property-index=0
>>
>> Distortion
>> KENLM name=LM0 factor=0 path=/home/ltan/momo/lm.ja.kenlm order=5
>>
>> On Thu, Dec 14, 2017 at 3:52 AM, Hieu Hoang <hieuho...@gmail.com> wrote:
>>> do you have comparison figures for Moses vs Moses2? I never managed to
>>> get reliable info for more than 32 cores.
>>>
>>> config/moses.ini files would be good too.
>>>
>>> Hieu Hoang
>>> http://moses-smt.org/
>>>
>>> On 13 December 2017 at 06:10, liling tan <alvati...@gmail.com> wrote:
>>>> Ah, that's why the phrase table is exploding... I've never decoded
>>>> more than 100K sentences before =)
>>>>
>>>> binarize4moses2.perl is awesome! Let me see how much speed-up I get
>>>> with Moses2 and pruned tables.
>>>>
>>>> Thank you Hieu and Barry!
>>>>
>>>> On Tue, Dec 12, 2017 at 6:38 PM, Hieu Hoang <hieuho...@gmail.com> wrote:
>>>>> Barry is correct, having 750,000 translations for '.' severely
>>>>> degrades speed.
>>>>>
>>>>> I had forgotten about the script I created:
>>>>> scripts/generic/binarize4moses2.perl
>>>>> which takes in the phrase table & lex reordering model, prunes them
>>>>> and runs addLexROtoPT. Basically, everything you need to do to create
>>>>> a fast model for Moses2.
>>>>>
>>>>> Hieu Hoang
>>>>> http://moses-smt.org/
>>>>>
>>>>> On 12 December 2017 at 09:16, Barry Haddow <bhad...@staffmail.ed.ac.uk> wrote:
>>>>>> Hi Liling
>>>>>>
>>>>>> The short answer is you need to prune/filter your phrase table
>>>>>> prior to creating the compact phrase table. I don't mean "filter
>>>>>> model given input", because that won't make much difference if you
>>>>>> have a very large input; I mean getting rid of rare translations
>>>>>> which won't be used anyway.
>>>>>>
>>>>>> The compact phrase table does not do pruning; it ends up being done
>>>>>> in memory, so if you have 750,000 translations of the full stop in
>>>>>> your model then they all get loaded into memory before Moses
>>>>>> selects the top 20.
>>>>>>
>>>>>> You can use prunePhraseTable from Moses (which bizarrely needs to
>>>>>> load a phrase table in order to parse the config file, last time I
>>>>>> looked). You could also apply Johnson / entropic pruning, whatever
>>>>>> works for you.
>>>>>>
>>>>>> cheers - Barry
>>>>>>
>>>>>> On 11/12/17 09:20, liling tan wrote:
>>>>>>
>>>>>> Dear Moses community/developers,
>>>>>>
>>>>>> I have a question on how to handle large models created using Moses.
>>>>>>
>>>>>> I have a vanilla phrase-based model with:
>>>>>>
>>>>>> - PhraseDictionary num-features=4 input-factor=0 output-factor=0
>>>>>> - LexicalReordering num-features=6 input-factor=0 output-factor=0
>>>>>> - KENLM order=5 factor=0
>>>>>>
>>>>>> The size of the model is:
>>>>>>
>>>>>> - compressed phrase table: 5.4GB
>>>>>> - compressed reordering table: 1.9GB
>>>>>> - quantized LM: 600MB
>>>>>>
>>>>>> I'm running on a single 56-core machine with 256GB RAM.
>>>>>> Whenever I'm decoding I use the -threads 56 parameter.
>>>>>>
>>>>>> It takes really long to load the table, and after loading, it
>>>>>> breaks inconsistently at different lines when decoding. I notice
>>>>>> that the RAM goes into swap before it breaks.
>>>>>>
>>>>>> I've tried the compact phrase table and get a
>>>>>>
>>>>>> - 3.2GB .minphr
>>>>>> - 1.5GB .minlexr
>>>>>>
>>>>>> And the same kind of random breakage happens when RAM goes into
>>>>>> swap after loading the phrase table.
>>>>>>
>>>>>> Strangely, it still manages to decode ~500K sentences before it
>>>>>> breaks.
>>>>>>
>>>>>> Then I tried the on-disk phrase table, which is around 37GB
>>>>>> uncompressed. Using the on-disk PT didn't cause breakage, but the
>>>>>> decoding time increased significantly; now it can only decode 15K
>>>>>> sentences in an hour.
>>>>>>
>>>>>> The setup is a little different from the normal train/dev/test
>>>>>> split. Currently, my task is to decode the train set. I've tried
>>>>>> filtering the table against the train set with
>>>>>> filter-model-given-input.pl, but the size of the compressed table
>>>>>> didn't really decrease much.
>>>>>>
>>>>>> The entire training set is made up of 5M sentence pairs, and it's
>>>>>> taking 3+ days just to decode ~1.5M sentences with the on-disk PT.
>>>>>>
>>>>>> My questions are:
>>>>>>
>>>>>> - Are there best practices with regard to deploying large Moses
>>>>>>   models?
>>>>>> - Why does the 5+GB phrase table take up >250GB RAM when decoding?
>>>>>> - How else should I filter/compress the phrase table?
>>>>>> - Is it normal to decode only ~500K sentences a day given the
>>>>>>   machine specs and the model size?
>>>>>>
>>>>>> I understand that I could split the train set in two, train two
>>>>>> models and then cross-decode, but if the training size is 10M
>>>>>> sentence pairs, we'll face the same issues.
>>>>>>
>>>>>> Thank you for reading the long post, and thank you in advance for
>>>>>> any answers, discussions and enlightenment on this issue =)
>>>>>>
>>>>>> Regards,
>>>>>> Liling
>>>>>>
>>>>>> _______________________________________________
>>>>>> Moses-support mailing list
>>>>>> Moses-support@mit.edu
>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>
>>>>>> The University of Edinburgh is a charitable body, registered in
>>>>>> Scotland, with registration number SC005336.
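[Editor's note] Barry's advice above (drop rare translations so that, e.g., '.' keeps only its best entries rather than 750,000) can be illustrated with a small standalone sketch. This is not Moses's own prunePhraseTable or binarize4moses2.perl; it is a hypothetical Python script (function name `prune_top_n` is mine) that keeps the top N translations per source phrase in a gzipped text phrase table. It assumes the standard Moses format `src ||| tgt ||| scores ||| ...` with the table sorted by source phrase, and ranks by the third score, conventionally the direct phrase probability p(e|f); adjust the column if your table differs.

```python
# Hypothetical top-N phrase-table pruning sketch (illustrative only,
# not the Moses prunePhraseTable tool).
import gzip
from itertools import groupby

def prune_top_n(in_path, out_path, n=20, score_col=2):
    with gzip.open(in_path, "rt", encoding="utf-8") as fin, \
         gzip.open(out_path, "wt", encoding="utf-8") as fout:
        # Moses text phrase tables are sorted by source phrase, so all
        # entries for one source phrase are contiguous and groupby suffices.
        for _, group in groupby(fin, key=lambda l: l.split(" ||| ", 1)[0]):
            entries = list(group)
            # Rank by one score column (assumed: direct phrase prob p(e|f)).
            entries.sort(
                key=lambda l: float(l.split(" ||| ")[2].split()[score_col]),
                reverse=True)
            fout.writelines(entries[:n])
```

On a real model you would then rebuild the binary table (compact PT or ProbingPT) from the pruned text table; the supported routes in Moses itself are prunePhraseTable and binarize4moses2.perl, which does the pruning for you.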
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
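[Editor's note] The throughput figures quoted in the thread can be sanity-checked with a few lines of arithmetic (all numbers taken from the messages above):

```python
# Sanity check of the throughput numbers reported in the thread.
moses2_words = 86_000_000          # 86M words decoded by Moses2
moses2_hours = 14                  # in 14 hours
moses2_wph = moses2_words / moses2_hours   # words per hour

moses1_wph = 280_000               # 280K words/hour reported for Moses1

speedup = moses2_wph / moses1_wph
print(f"Moses2: {moses2_wph:,.0f} words/hour; speedup: {speedup:.1f}x")
# ~6.1M words/hour and a ~22x speedup, consistent with the "20x" claim
```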