Hi all, what about mosesserver? Do you think the same speed gains would occur?
Best,
Vito

2015-10-06 22:39 GMT+02:00 Michael Denkowski <michael.j.denkow...@gmail.com>:

> Hi Hieu and all,
>
> I just checked in a bug fix for the multi_moses.py script. I forgot to
> override the number of threads for each moses command, so if [threads] was
> specified in the moses.ini, the multi-moses runs were cheating by running
> a bunch of multi-threaded instances. If threads were only being specified
> on the command line, the script was correctly stripping the flag, so
> everything should be good. I finished a benchmark on my system with an
> unpruned compact PT (with the fixed script) and got the following:
>
> 16 threads  5.38 sent/sec
> 16 procs   13.51 sent/sec
>
> This definitely used a lot more memory, though. Based on some very rough
> estimates from free system memory, the memory-mapped suffix array PT went
> from 2G to 6G with 16 processes, while the compact PT went from 3G to 37G.
> For cases where everything fits into memory, I've seen significant speedup
> from multi-process decoding.
>
> For cases where things don't fit into memory, the multi-moses script could
> be extended to start as many multi-threaded instances as will fit into RAM
> and farm out sentences in a way that keeps all of the CPUs busy. I know
> Marcin has mentioned using GNU parallel.
>
> Best,
> Michael
>
> On Tue, Oct 6, 2015 at 4:16 PM, Hieu Hoang <hieuho...@gmail.com> wrote:
>
>> I've just run some comparisons between the multithreaded decoder and the
>> multi_moses.py script. It's good stuff.
>>
>> It makes me seriously wonder whether we should abandon multi-threading
>> and go all out for the multi-process approach.
>>
>> There are some advantages to multi-threading, e.g. where model files are
>> loaded into memory rather than memory-mapped. But there are disadvantages
>> too: it's more difficult to maintain, and there's about a 10% overhead.
>>
>> What do people think?
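[Editor's aside: the fix Michael describes (forcing each spawned moses command to run single-threaded) amounts to rewriting the argument list before launching each instance. A minimal sketch, assuming a hypothetical helper `override_threads` rather than the actual multi_moses.py code:]

```python
def override_threads(cmd, n=1):
    """Return a copy of a moses command with any -threads flag replaced.

    Hypothetical helper: each spawned instance must run single-threaded
    even if the command line (or moses.ini) asks for more threads.
    """
    out = []
    i = 0
    while i < len(cmd):
        if cmd[i] in ("-threads", "--threads"):
            i += 2  # skip the flag and its value
        else:
            out.append(cmd[i])
            i += 1
    return out + ["-threads", str(n)]

print(override_threads(["moses", "-f", "moses.ini", "-threads", "16"]))
# ['moses', '-f', 'moses.ini', '-threads', '1']
```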
>>
>> Phrase-based (real/user/sys times by thread or process count):
>>
>>                         1          5         10         15         20         25         30         32
>> Baseline (Compact pt)
>>   real          4m37.000s  1m15.391s  0m51.217s  0m48.287s  0m50.719s  0m52.027s  0m53.045s
>>   user          4m21.544s  5m28.597s  6m38.227s  8m0.975s   8m21.122s  8m3.195s   8m4.663s
>>   sys           0m15.451s  0m34.669s  0m53.867s  1m10.515s  1m20.746s  1m24.368s  1m23.677s
>>
>> + multi_moses
>>   real          4m49.474s  1m17.867s  0m43.096s  0m31.999s  0m26.497s  0m26.296s  killed
>>   user          4m33.580s  4m40.486s  4m56.749s  5m6.692s   5m43.845s  7m34.617s
>>   sys           0m15.957s  0m32.347s  0m51.016s  1m11.106s  1m44.115s  2m21.263s
>>
>> Baseline (Probing pt)
>>   real          4m46.254s  1m16.637s  0m49.711s  0m48.389s  0m49.144s  0m51.676s  0m52.472s
>>   user          4m30.596s  5m32.500s  6m23.706s  7m40.791s  7m51.946s  7m52.892s  7m53.569s
>>   sys           0m15.624s  0m36.169s  0m49.433s  1m6.812s   1m9.614s   1m13.108s  1m12.644s
>>
>> + multi_moses
>>   real          4m43.882s  1m17.849s  0m34.245s  0m31.318s  0m28.054s  0m24.120s  0m22.520s
>>   user          4m29.212s  4m47.693s  5m5.750s   5m33.573s  6m18.847s  7m19.642s  8m38.013s
>>   sys           0m15.835s  0m25.398s  0m36.716s  0m41.349s  0m48.494s  1m0.843s   1m13.215s
>>
>> Hiero:
>>
>> Baseline
>>   real          5m33.011s  1m28.935s  0m59.470s  1m0.315s   0m55.619s  0m57.347s  0m59.191s  1m2.786s
>>   user          4m53.187s  6m23.521s  8m17.170s  12m48.303s 14m45.954s 17m58.109s 20m22.891s 21m13.605s
>>   sys           0m39.696s  0m51.519s  1m3.788s   1m22.125s  1m58.718s  2m51.249s  4m4.807s   4m37.691s
>>
>> + multi_moses
>>   real                     1m27.215s  0m40.495s  0m36.206s  0m28.623s  0m26.631s  0m25.817s  0m25.401s
>>   user                     5m4.819s   5m42.070s  5m35.132s  6m46.001s  7m38.151s  9m6.500s   10m32.739s
>>   sys                      0m38.039s  0m45.753s  0m44.117s  0m52.285s  0m56.655s  1m6.749s   1m16.935s
>>
>> On 05/10/2015 16:05, Michael Denkowski wrote:
>>
>> Hi Philipp,
>>
>> Unfortunately I don't have a precise measurement. If anyone knows of a
>> good way to benchmark a process tree with lots of processes memory-mapping
>> the same files, I would be glad to run it.
>>
>> --Michael
>>
>> On Mon, Oct 5, 2015 at 10:26 AM, Philipp Koehn <p...@jhu.edu> wrote:
>>
>>> Hi,
>>>
>>> great - that will be very useful.
>>>
>>> Since you just ran the comparison - do you have any numbers on "still
>>> allowed everything to fit into memory", i.e., how much more memory is
>>> used by running parallel instances?
>>>
>>> -phi
>>>
>>> On Mon, Oct 5, 2015 at 10:15 AM, Michael Denkowski
>>> <michael.j.denkow...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Like some other Moses users, I noticed diminishing returns from running
>>>> Moses with several threads. To work around this, I added a script to run
>>>> multiple single-threaded instances of moses instead of one multi-threaded
>>>> instance. In practice, this sped things up by about 2.5x for 16 CPUs, and
>>>> using memory-mapped models still allowed everything to fit into memory.
>>>>
>>>> If anyone else is interested in using this, you can prefix a moses
>>>> command with scripts/generic/multi_moses.py. To use multiple instances in
>>>> mert-moses.pl, specify --multi-moses and control the number of parallel
>>>> instances with --decoder-flags='-threads N'.
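[Editor's aside: the division of labor behind this approach (one pool of single-threaded decoder processes, sentences farmed out across them) can be sketched as below. This is not the actual multi_moses.py; `decode` is a stand-in that uppercases its input where the real script would pipe the sentence to a moses process:]

```python
import multiprocessing as mp

def decode(sentence):
    # Stand-in for one single-threaded moses instance translating a
    # sentence; the real script feeds input lines to decoder processes.
    return sentence.upper()

def multi_decode(sentences, procs=4):
    # Farm sentences out to a pool of single-threaded workers and
    # return results in the original input order.
    with mp.Pool(procs) as pool:
        return pool.map(decode, sentences)

if __name__ == "__main__":
    print(multi_decode(["le chat", "la maison"], procs=2))
    # ['LE CHAT', 'LA MAISON']
```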
>>>>
>>>> Below is a benchmark on WMT fr-en data (2M training sentences, 400M
>>>> words mono, suffix array PT, compact reordering, 5-gram KenLM) testing
>>>> default stack decoding vs cube pruning, without and with the
>>>> parallelization script (+multi):
>>>>
>>>> ---
>>>> 1 cpu    sent/sec
>>>> stack        1.04
>>>> cube         2.10
>>>> ---
>>>> 16 cpu   sent/sec
>>>> stack        7.63
>>>>  +multi     12.20
>>>> cube         7.63
>>>>  +multi     18.18
>>>> ---
>>>>
>>>> --Michael
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> Moses-support@mit.edu
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>> --
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu

--
M. Vito MANDORINO -- Chief Scientist
Lingua Custodia, The Translation Trustee
1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux
Tel : +33 1 30 44 04 23  Mobile : +33 6 84 65 68 89
Email : vito.mandor...@linguacustodia.com
Website : www.linguacustodia.com - www.thetranslationtrustee.com
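[Editor's aside: the relative gains in Michael's 16-cpu benchmark can be checked with a quick sketch, using the sent/sec figures quoted above:]

```python
# Speedup of 16 single-threaded processes over one 16-threaded instance,
# computed from the sent/sec figures in the benchmark above.
for name, threaded, multi in [("stack", 7.63, 12.20), ("cube", 7.63, 18.18)]:
    print(f"{name}: {multi / threaded:.2f}x")
# stack: 1.60x
# cube: 2.38x
```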
_______________________________________________ Moses-support mailing list Moses-support@mit.edu http://mailman.mit.edu/mailman/listinfo/moses-support