Thanks. Please git push that branch; otherwise the code will only exist on
your own computer.

If you're unsure how to use git, send me the files and I'll add them to
the Moses repository for you.
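
For reference, if the branch only exists on your machine, pushing it should
just be (assuming the remote has the usual name "origin"):

  git push origin mert-sge-nosync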


On 27 April 2014 18:43, Raymond W. M. Ng <wm...@sheffield.ac.uk> wrote:

> Hi Hieu (and all),
>
> I have created a branch called mert-sge-nosync and uploaded the scripts for
> doing MERT in SGE no-sync mode. I have done git add and git commit (I'm not
> sure whether I have to push, though). Altogether 12 files are added:
>
> scripts/generic/moses-parallel-sge-nosync.pl
> scripts/generic/qsub-wrapper-exit-sge-nosync.pl
> scripts/generic/qsub-wrapper-sge-nosync.pl
> scripts/training/mert-moses-sge-nosync.pl
> scripts/training/sge-nosync/cleartmpfiles.pl
> scripts/training/sge-nosync/create-config-sge-nosync.pl
> scripts/training/sge-nosync/moses-parallel-postdecode-sge-nosync.pl
> scripts/training/sge-nosync/poll-decoder.pl
> scripts/training/sge-nosync/process-featlist-sge-nosync.pl
> scripts/training/sge-nosync/process-moses-result-sge-nosync.pl
> scripts/training/sge-nosync/run-decoder-sge-nosync.pl
> scripts/training/sge-nosync/zipextract-decoder-result.pl
>
> You will also need some SSH packages for Perl to run these scripts. I have
> written up some details at
> http://staffwww.dcs.shef.ac.uk/people/W.Ng/MERT.html
>
> Thanks
> raymond
>
>
> On 27 April 2014 00:32, Hieu Hoang <hieu.ho...@ed.ac.uk> wrote:
>
>> It looks like there is interest in your changes. You should add them and
>> let other people play with them.
>>
>> I'm not sure what the best way to add it is; scripts are sometimes a
>> little fragile. It's probably best if you add it to a separate directory or
>> a branch, rather than try to change or replace what's there. It would also
>> be good if you can stay on the Moses mailing list after you've added it, in
>> case of trouble.
>>
>>
>> On 18 April 2014 21:59, Raymond W. M. Ng <wm...@sheffield.ac.uk> wrote:
>>
>>> Hi all,
>>>
>>> Thanks for the reply. I will try creating a branch in the git repository
>>> later.
>>>
>>> Ondrej> I have reasonably reliable disk access on my side, so there are
>>> almost no failure cases. I do recall some job failures in the course of
>>> development, when a newly created process (on machine B) happened to take
>>> an existing process id (from an earlier job on machine A) and the two
>>> crashed. I got around this by appending the start time to the process id
>>> as well. In any case, I kept the "retry 5 times" behaviour in
>>> moses-parallel-decode: when one of the splits fails, the parallel decode
>>> is incomplete and the translation restarts, up to 5 failures. I haven't
>>> tested what happens after 5 failures, simply because of the hardware
>>> setting I mentioned above.
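>>>
>>> (The idea is roughly the sketch below; the variable and function names
>>> here are made up for illustration and the real scripts will differ.)
>>>
>>>   # make the per-split job identifier unique across machines by appending
>>>   # the job start time to the process id
>>>   my $start_time = time();
>>>   my $uid        = "$$-$start_time";          # $$ = current process id
>>>
>>>   # retry a failed split up to 5 times before giving up
>>>   my $max_retries = 5;
>>>   my $ok          = 0;
>>>   for my $attempt (1 .. $max_retries) {
>>>       $ok = run_decoder_on_split($uid);       # hypothetical helper; true on success
>>>       last if $ok;
>>>       warn "split failed (attempt $attempt of $max_retries), retrying\n";
>>>   }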
>>>
>>> I am new to the Moses code, so when I started modifying it I kept the
>>> originals and renamed the related files by adding the suffix -sge-nosync
>>> (e.g. qsub-wrapper-sge-nosync.pl). There are 12 Perl scripts in total,
>>> and they are distributed between $MOSES/scripts/training and
>>> $MOSES/scripts/generic.
>>>
>>> Best
>>> raymond
>>>
>>>
>>>
>>>
>>>
>>> On 16 April 2014 21:44, Ondrej Bojar <bo...@ufal.mff.cuni.cz> wrote:
>>>
>>>> Hi, Raymond.
>>>>
>>>> Interesting. Is your parallelization also tolerant to random job
>>>> failures? How does it decide when to stop waiting? Could it not degrade
>>>> to optimizing on only, e.g., one of the splits, with all the others
>>>> failing?
>>>>
>>>> An option to commit your code in a more visible way is to put it in the
>>>> main branch under a different name, if the change affects just one script.
>>>> But I agree it's not very nice and clean.
>>>>
>>>> Cheers, O.
>>>>
>>>> On April 16, 2014 5:38:35 PM CEST, Hieu Hoang <hieu.ho...@ed.ac.uk>
>>>> wrote:
>>>> >Hi Raymond,
>>>> >
>>>> >You're welcome to create a branch on the Moses GitHub repository and add
>>>> >your code there. It's unlikely anyone will look at it or use it, but at
>>>> >least it won't get lost.
>>>> >
>>>> >Maybe in the future you or someone else might want to merge it into the
>>>> >master branch, where it will get much more exposure.
>>>> >
>>>> >
>>>> >On 16 April 2014 16:21, Raymond W. M. Ng <wm...@sheffield.ac.uk>
>>>> wrote:
>>>> >
>>>> >> Dear Ondrej,
>>>> >>
>>>> >> I checked with Hieu when I met him in February; it seems that the SGE
>>>> >> submission in MERT uses the -sync mode, which makes submission awkward
>>>> >> (the user's submission process stays alive until all the jobs end). In
>>>> >> short, my modification runs in a "no-sync" mode.
>>>> >>
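>>>> >> (Roughly, the difference between the two modes is the sketch below;
>>>> >> this is only an illustration, not the actual wrapper code, and the
>>>> >> job-id parsing and polling interval are my own assumptions.)
>>>> >>
>>>> >>   my $job_script = "run-decoder.sh";   # hypothetical SGE job script
>>>> >>
>>>> >>   # -sync mode: the submitting process blocks until the SGE job finishes
>>>> >>   system("qsub", "-sync", "y", $job_script);
>>>> >>
>>>> >>   # no-sync mode: submit, remember the job id, then poll SGE until the
>>>> >>   # job has left the queue (qstat -j exits non-zero once the job is gone)
>>>> >>   my $out = `qsub $job_script`;        # "Your job 12345 (...) has been submitted"
>>>> >>   my ($job_id) = $out =~ /Your job (\d+)/;
>>>> >>   sleep 30 while system("qstat -j $job_id >/dev/null 2>&1") == 0;
>>>> >>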
>>>> >> In terms of efficiency, for the reasons you mention, the combined
>>>> >> wallclock time across N machines (N times the actual program runtime)
>>>> >> may be longer than single-threaded execution. But in a lot of shared
>>>> >> computing environments, a single-threaded job running for 20+ hours is
>>>> >> not favoured (and is sometimes disallowed), so the parallel mode
>>>> >> shortens the runtime of the individual jobs. In my experience it cuts
>>>> >> the tuning time from 30 hours (single-threaded) to 3 hours (20 threads
>>>> >> on different machines). We have InfiniBand access among the nodes,
>>>> >> though, which is a bit more sophisticated than NFS mounting.
>>>> >>
>>>> >> best
>>>> >> raymond
>>>> >>
>>>> >>
>>>> >> On 16 April 2014 13:48, Ondrej Bojar <bo...@ufal.mff.cuni.cz> wrote:
>>>> >>
>>>> >>> Dear Raymond,
>>>> >>>
>>>> >>> The existing scripts have always allowed running MERT in parallel
>>>> >>> jobs on SGE; one just had to use generic/moses-parallel as the "moses
>>>> >>> executable".
>>>> >>>
>>>> >>> Is there some other functionality that your modifications now bring?
>>>> >>>
>>>> >>> Btw, in my experience, parallelization into SGE jobs can be even less
>>>> >>> efficient than single-job multi-threaded execution. It is hard to
>>>> >>> describe the circumstances exactly, but in general, if your models are
>>>> >>> big and loaded from NFS and you run many experiments at the same time,
>>>> >>> the slowdown of the network, multiplied across the many SGE jobs,
>>>> >>> makes the parallelization much more wasteful and sometimes slower (in
>>>> >>> wallclock time).
>>>> >>>
>>>> >>> Cheers, Ondrej.
>>>> >>>
>>>> >>> On April 16, 2014 1:07:37 PM CEST, "Raymond W. M. Ng" <
>>>> >>> wm...@sheffield.ac.uk> wrote:
>>>> >>> >Hi Moses support,
>>>> >>> >
>>>> >>> >I'm not sure I am making this enquiry on the right mailing list...
>>>> >>> >I have some modified scripts for parallel MERT tuning which can run
>>>> >>> >on SGE. Now I would like to share them. They are based on an old
>>>> >>> >version of Moses (around April 2012); what is the best way to share
>>>> >>> >them?
>>>> >>> >
>>>> >>> >Best
>>>> >>> >raymond
>>>> >>> >
>>>> >>> >
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> Ondrej Bojar (mailto:o...@cuni.cz / bo...@ufal.mff.cuni.cz)
>>>> >>> http://www.cuni.cz/~obo
>>>> >>>
>>>> >>>
>>>> >>
>>>> >>
>>>> >>
>>>>
>>>> --
>>>> Ondrej Bojar (mailto:o...@cuni.cz / bo...@ufal.mff.cuni.cz)
>>>> http://www.cuni.cz/~obo
>>>>
>>>>
>>>
>>
>>
>> --
>> Hieu Hoang
>> Research Associate
>> University of Edinburgh
>> http://www.hoang.co.uk/hieu
>>
>>
>


-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
