Re: Pipeline Mystery

2016-10-27 Thread Matt Post
Yes, MERT must be dying. Can you post the contents of the tune/ directory and
the tail of mert.log?

matt (from my phone)
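
For anyone wanting to gather that information in one go, here is a minimal sketch. It assumes mert.log sits inside the tune/ directory, which the thread does not confirm, and the helper name `tune_report` is made up for illustration; it is not part of Joshua.

```python
from pathlib import Path


def tune_report(tune_dir, log_name="mert.log", n_lines=20):
    """List the files in tune_dir and return the last n_lines of its log.

    Assumption: the tuner log lives inside tune_dir under log_name.
    """
    tune = Path(tune_dir)
    # Sorted file names, so that a missing joshua.config.final is easy to spot.
    files = sorted(p.name for p in tune.iterdir())
    log = tune / log_name
    if log.is_file():
        tail = log.read_text(errors="replace").splitlines()[-n_lines:]
    else:
        # The tuner may have died before writing any log at all.
        tail = []
    return files, tail
```

Calling `tune_report("/usr/local/joshua_resources/russian_experiments/exp3/tune")` would return the directory listing and the end of the log, ready to paste into a reply.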

> On 27 Oct 2016, at 00:49, John Hewitt  wrote:
> 
> It seems like MERT isn't writing its final config file (which is typical
> of MERT, in my experience). I recall giving up and using kbmira. This final
> config file is the one used in test, so I can see why skipping to test ends
> up failing pretty quickly.
> 
> To answer your question, though, I haven't tried. Not in my bandwidth right
> now.
> 
> -John
> 

Re: Pipeline Mystery

2016-10-26 Thread lewis john mcgibbney
Hi John,
Thanks for your response. Replies inline...

On Wed, Oct 26, 2016 at 9:49 PM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
> From: John Hewitt <john...@seas.upenn.edu>
> To: dev@joshua.incubator.apache.org
> Cc:
> Date: Thu, 27 Oct 2016 00:49:34 -0400
> Subject: Re: Pipeline Mystery
> It seems like MERT isn't writing its final config file (which is typical
> of MERT, in my experience). I recall giving up and using kbmira. This final
> config file is the one used in test, so I can see why skipping to test ends
> up failing pretty quickly.
>

From my understanding, in order to use --tuner kbmira, I need to download,
configure and run Moses. Is this correct? I would REALLY prefer not to do
this if at all possible. In the meantime, it looks like I'm going to try
another fresh pipeline run and see where I get. Sometimes starting afresh
has led to surprising and delightful results :)


>
> To answer your question, though, I haven't tried. Not in my bandwidth right
> now.


No problems. In all honesty, an entire pipeline execution on a small
parallel dataset would be a killer smoke test for any contributions
coming into Joshua. Language pack creation is so important, and having
confidence in the overall process is something which I really look forward
to building over the next while.
Thanks
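
To make the smoke-test idea concrete, here is a rough sketch of the final check such a test might run after a small pipeline execution. The checklist is an assumption built only from the two output paths that appear in the log in this thread; a real test would likely need more entries, and `missing_outputs` is a hypothetical helper, not an existing Joshua script.

```python
from pathlib import Path

# Assumed checklist: the two config files named in the failing log.
EXPECTED_OUTPUTS = [
    "tune/joshua.config.final",
    "test/1/model/joshua.config",
]


def missing_outputs(rundir, expected=EXPECTED_OUTPUTS):
    """Return the expected files (relative to rundir) that do not exist."""
    root = Path(rundir)
    return [rel for rel in expected if not (root / rel).is_file()]
```

A smoke test could then fail with a clear message whenever `missing_outputs(rundir)` is non-empty, rather than surfacing later as an [Errno 2] in the bundler.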


Re: Pipeline Mystery

2016-10-26 Thread John Hewitt
It seems like MERT isn't writing its final config file (which is typical
of MERT, in my experience). I recall giving up and using kbmira. This final
config file is the one used in test, so I can see why skipping to test ends
up failing pretty quickly.

To answer your question, though, I haven't tried. Not in my bandwidth right
now.

-John


Pipeline Mystery

2016-10-26 Thread lewis john mcgibbney
Hi Folks,
So I've been plodding away again and feel I am very close to generating my
first language pack; however, I've arrived at the following fankle!!!
If I run a pipeline from start to finish, it fails at the 'test-bundle-1'
phase as below, stating "[Errno 2] No such file or directory:
'/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final'"

lmcgibbn@LMC-056430 /usr/local/joshua_resources/russian_experiments/exp3 $
/usr/local/incubator-joshua/bin/pipeline.pl --rundir . --type hiero \
  --corpus /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en \
  --tune /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.tune \
  --test /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.test \
  --source en --target ru \
  --readme "Experiment 3 Run 1 of ru --> en model training" \
  --aligner berkeley --hadoop-mem 10g \
  --tmp /usr/local/hadoop-2.5.2/hadoop_tmp_dir
[train-copy-and-filter] cached, skipping...
[train-tokenize-en] cached, skipping...
[train-tokenize-ru] cached, skipping...
[train-trim] cached, skipping...
[train-lowercase-en] cached, skipping...
[train-lowercase-ru] cached, skipping...
[train-vocab-en] cached, skipping...
[train-vocab-ru] cached, skipping...
[tune-copy-and-filter] cached, skipping...
[tune-tokenize-en] cached, skipping...
[tune-tokenize-ru] cached, skipping...
[tune-lowercase-en] cached, skipping...
[tune-lowercase-ru] cached, skipping...
[tune-vocab-en] cached, skipping...
[tune-vocab-ru] cached, skipping...
[test-copy-and-filter] cached, skipping...
[test-tokenize-en] cached, skipping...
[test-tokenize-ru] cached, skipping...
[test-lowercase-en] cached, skipping...
[test-lowercase-ru] cached, skipping...
[test-vocab-en] cached, skipping...
[test-vocab-ru] cached, skipping...
[lm-sort-uniq] cached, skipping...
[kenlm] cached, skipping...
[compile-kenlm] cached, skipping...
[glue-tune] cached, skipping...
[tune-bundle] cached, skipping...
[mert-1] rebuilding...

dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en

dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config
[CHANGED]
  dep=tune/model/grammar.gz.packed/slice_0.source

dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final
[NOT FOUND]
  cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py
/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en
/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru
--tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner
mert --decoder
/usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command
--decoder-config
/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config
--decoder-output-file
/usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest
--decoder-log-file
/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log
--iterations 10 --metric 'BLEU 4 closest'
  took 27 seconds (27s)
[test-bundle-1] rebuilding...

dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final
[NOT FOUND]
  dep=grammar.gz

dep=/usr/local/joshua_resources/russian_experiments/exp3/test/1/model/joshua.config
  cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force
--symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir
/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final
/usr/local/joshua_resources/russian_experiments/exp3/test/1/model
--copy-config-options '-top-n 300 -pop-limit 5000 -output-format "%i ||| %s
||| %f ||| %c" -mark-oovs false' --pack-tm grammar.gz --tm
/usr/local/joshua_resources/russian_experiments/exp3/data/tune/grammar.glue
  JOB FAILED (return code 2)
ERROR:root:ERROR: argument config: can't open
'/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final':
[Errno 2] No such file or directory:
'/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final'
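
A log like the one above gets long, so a small parser can help pull out which steps reported a missing dependency. The sketch below assumes the layout seen in this paste: a `[step] rebuilding...` line, then `dep=` lines, with the `[NOT FOUND]` marker on the line after the dependency it refers to. That layout may partly be an artifact of email wrapping, and `missing_deps` is a hypothetical helper, not part of pipeline.pl.

```python
import re

# "[mert-1] rebuilding..." or "[kenlm] cached, skipping..."
STEP_RE = re.compile(r"^\[([\w-]+)\] (?:rebuilding|cached)")
# "dep=/some/path" (leading whitespace already stripped)
DEP_RE = re.compile(r"^dep=(\S+)")


def missing_deps(log_text):
    """Return (step, dependency) pairs that the log flags as [NOT FOUND]."""
    step = None
    dep = None
    missing = []
    for raw in log_text.splitlines():
        line = raw.strip()
        step_match = STEP_RE.match(line)
        if step_match:
            # A new pipeline step starts; forget the previous dependency.
            step, dep = step_match.group(1), None
            continue
        dep_match = DEP_RE.match(line)
        if dep_match:
            dep = dep_match.group(1)
            continue
        if line == "[NOT FOUND]" and dep is not None:
            # The marker refers to the most recent dep= line.
            missing.append((step, dep))
            dep = None
    return missing
```

Run over the output above, this should report joshua.config.final as missing for both 'mert-1' and 'test-bundle-1', which points at the tuner, not the bundler, as the step that went wrong.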

However, if I run the pipeline with the --first-step test flag, then I get
the following, where the 'test-bundle-1' phase executes and completes
flawlessly; however, the pipeline then goes on to die at the 'test-decode-1'
phase!!!

lmcgibbn@LMC-056430 /usr/local/joshua_resources/russian_experiments/exp3 $
/usr/local/incubator-joshua/bin/pipeline.pl --rundir . --type hiero \
  --corpus /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en \
  --tune /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.tune \
  --test /usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.test \
  --source en --target ru \
  --readme "Experiment 3 Run 1 of ru --> en model training" \
  --aligner berkeley --hadoop-mem 10g \
  --tmp /usr/local/hadoop-2.5.2/hadoop_tmp_dir \
  --first-step test \
  --grammar /usr/local/joshua_resources/russian_experiments/exp3/grammar.gz \
  --joshua-mem 10g
[train-copy-and-filter] cached, skipping...
[train-tokenize-en] cached, skipping...
[train-tokenize-ru]