The relevant digest files (steps/1/TRAINING_run-giza.1.STDERR.digest
and steps/1/TRAINING_run-giza-inverse.1.STDERR.digest) each contain
one line:

not found

When EMS crashes while running via SGE, the STDERR files for run-giza
and run-giza-inverse are (modulo time-stamp messages) identical to the
respective STDERR files produced when those steps complete
successfully in a local run (without the -cluster flag).

I grepped the EMS scripts directory for the message "not found". It
appears in experiment.meta under the run-giza and run-giza-inverse
steps, but I don't know enough about EMS to tell why that error is
being triggered.
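
For anyone who wants to check the same thing on their own run, here is
a minimal sketch of how the digest files could be collected in one
pass. It assumes only the naming pattern shown above
(steps/<run>/<STEP>.<run>.STDERR.digest) and, as a guess on my part,
that an empty digest means nothing was flagged for that step. The
function name collect_digests is made up for illustration and is not
part of EMS.

```python
from pathlib import Path


def collect_digests(steps_dir):
    """Map each non-empty *.STDERR.digest file under steps_dir to its text.

    Assumption: an empty digest means no problem was flagged for that step,
    so empty files are skipped.
    """
    reports = {}
    for digest in sorted(Path(steps_dir).rglob("*.STDERR.digest")):
        text = digest.read_text(errors="replace").strip()
        if text:
            reports[str(digest)] = text
    return reports


if __name__ == "__main__":
    # "steps" is the EMS steps directory relative to the working directory.
    for path, text in collect_digests("steps").items():
        print(f"{path}: {text}")
```

Run from the experiment's working directory, this should list the two
digest files named above alongside their one-line contents.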

Any ideas for what else I should look for?

Thanks,
Lane


On Thu, Sep 20, 2012 at 2:09 AM, Barry Haddow
<[email protected]> wrote:
> Hi Lane
>
> If ems failed on a given step, then there should be a message in the digest
> file for that step. What exactly does ems report?
>
> Cheers - Barry
>
>
>
> Sent from my ZX81
>
>
> ----- Reply message -----
> From: "Lane Schwartz" <[email protected]>
> Date: Wed, Sep 19, 2012 20:18
> Subject: [Moses-support] EMS, mgiza, and SGE
> To: <[email protected]>
>
> I'm trying to get up to speed using EMS. I have a small dataset (IWSLT
> 2008) that I am using to train, tune, and test using EMS.
>
> I am able to reliably run EMS on my data on a single machine.
>
> My config file specifies jobs=10 and qsub-settings="-l
> hostname=*machinesA*|*machinesB*|*machinesC*" where the hostname
> patterns match machine names in my grid.
>
> When I run experiment.perl with the -cluster flag, the experiment
> runs, but it consistently dies while running run-giza and
> run-giza-inverse. Strangely, when I look in the steps directory and
> the training directory, it appears that mgiza has run successfully in
> both directions. I don't see any error messages. Does anyone have any
> idea what might be going on here?
>
> I am using the exact same config file, and it runs successfully when I
> launch experiment.perl without the -cluster flag. When I use the
> -cluster flag, everything runs successfully until it gets to the giza
> steps, which it appears to run, and then EMS dies.
>
> Thanks,
> Lane Schwartz
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
>
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>



-- 
When a place gets crowded enough to require ID's, social collapse is not
far away.  It is time to go elsewhere.  The best thing about space travel
is that it made it possible to go elsewhere.
                -- R.A. Heinlein, "Time Enough For Love"
