Hi Tom

I'll try to answer below.

On Apr 9, 2012, at 3:58 PM, Tom Hoar wrote:


I sent this to the irstlm list, but I'm also including it here in case this team has 
some comments.

Hieu recently checked in changes to the build-lm.sh script to run the splits in 
parallel. About 6 months ago, we replaced IRSTLM's shell script with a Python 
wrapper to give us more control in our environment. We also prepared to 
multi-process the splits. We stopped work because of concerns that parallel 
processing might overload system RAM resources.

I think Hieu's change relies on the empirical assumption that the number of 
splits does not exceed the number of CPUs. And in any case, using 
parallelization does not necessarily mean we will run out of memory.


As we know, building LMs is memory intensive. Without parallel processing, 
each serialized split can use 100% of the host's RAM, but the extra CPU cores 
sit idle. Parallel processing uses all CPUs, but each CPU competes for RAM 
resources.

  1.  Is the final result of a build identical whether you build with one chunk, 3 
splits, or 30 splits?

YES, the regression test build-lm-sublm2 checks for that (1 split vs. 5 
splits).


  2.  Are there any advantages/disadvantages to using a large number of splits 
with a queue manager, so as to parallel-process only up to the max number of 
CPUs and reduce the RAM requirements with more but smaller splits?

The main rules to take into account are the following:
- the smaller the splits, the lower the RAM requirement (for each single split)
- the larger the number of splits, the longer the time needed to merge the results 
(though this is not a very big issue)

Hence, if a queue manager like the one you are proposing is available, 
I think the best policy is to use more but smaller splits.

I am going to write such a manager, because I think it would be a good enhancement 
to the IRSTLM toolkit.
Do you already have something written in Python that I can mimic in my scripts?
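
Just to make the idea concrete, here is a rough Python sketch of the kind of 
manager I have in mind: it caps the number of splits being built concurrently 
at the number of CPUs. The per-split command (build_one_split.sh) and the split 
file names are only placeholders, not actual IRSTLM code.

import subprocess
from multiprocessing import cpu_count
from concurrent.futures import ProcessPoolExecutor, as_completed

def build_split(split_file):
    # Placeholder: replace with the real per-split command your wrapper runs.
    cmd = ["./build_one_split.sh", split_file]
    return split_file, subprocess.call(cmd)

def run_splits(split_files, max_workers=None):
    # Never run more than max_workers split jobs at once (default: number of CPUs).
    max_workers = max_workers or cpu_count()
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(build_split, s) for s in split_files]
        for fut in as_completed(futures):
            name, rc = fut.result()
            print("%s finished with exit code %d" % (name, rc))

if __name__ == "__main__":
    # e.g. 30 small splits, but at most cpu_count() of them building at any time
    run_splits(["split.%02d" % i for i in range(30)])

With, say, 30 small splits on an 8-core machine, at most 8 splits are in memory 
at any time, which should keep the peak RAM usage bounded while all cores stay busy.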


The best tradeoff between the number of splits (and hence the RAM requirements) and 
the computation time should be found through some experimentation, 
on different machines with different RAM sizes, different numbers of threads, and 
so on.

  3.  Has anyone experimented with other ways to reduce the RAM requirement for 
each process while still allowing them to run in parallel?

Not at FBK, no.


Tom


best regards,
Nicola Bertoldi


_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

