[jira] [Updated] (JOSHUA-107) Verbosity levels

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated JOSHUA-107:

Fix Version/s: (was: 6.2)
   6.1

> Verbosity levels
> 
>
> Key: JOSHUA-107
> URL: https://issues.apache.org/jira/browse/JOSHUA-107
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Matt Post
> Fix For: 6.1
>
>
> Joshua should support verbosity levels with a command-line switch, so it's 
> easy to shut it up with something like {{-v 0}} or {{-q}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (JOSHUA-22) Parallelize MBR computation

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated JOSHUA-22:
---
Fix Version/s: (was: 6.2)
   6.1

> Parallelize MBR computation
> ---
>
> Key: JOSHUA-22
> URL: https://issues.apache.org/jira/browse/JOSHUA-22
> Project: Joshua
>  Issue Type: Bug
>Reporter: Joshua Decoder
> Fix For: 6.1
>
>
> MBR should be multithreaded.  This would be easy to add following the model 
> used in the InputManager.





[jira] [Closed] (JOSHUA-95) Vocabulary locking

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed JOSHUA-95.
--
Resolution: Fixed

> Vocabulary locking
> --
>
> Key: JOSHUA-95
> URL: https://issues.apache.org/jira/browse/JOSHUA-95
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Juri Ganitkevitch
> Fix For: 6.1
>
>
> Vocabulary::id() is still synchronized and a potential point of contention. 
> It would be nice to resolve this.





[jira] [Reopened] (JOSHUA-22) Parallelize MBR computation

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney reopened JOSHUA-22:


> Parallelize MBR computation
> ---
>
> Key: JOSHUA-22
> URL: https://issues.apache.org/jira/browse/JOSHUA-22
> Project: Joshua
>  Issue Type: Bug
>Reporter: Joshua Decoder
> Fix For: 6.1
>
>
> MBR should be multithreaded.  This would be easy to add following the model 
> used in the InputManager.





[jira] [Closed] (JOSHUA-71) OS X installation depends on coreutils to run thrax test

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed JOSHUA-71.
--
Resolution: Fixed

> OS X installation depends on coreutils to run thrax test
> 
>
> Key: JOSHUA-71
> URL: https://issues.apache.org/jira/browse/JOSHUA-71
> Project: Joshua
>  Issue Type: Bug
>Reporter: Luke Orland
> Fix For: 6.1
>
>
> The {{gstat}} command from coreutils is not installed in Darwin by default. 
> One must resolve that dependency via Homebrew, MacPorts, etc.
> The {{test/thrax/test.sh}} test will fail on an OS X system that does not 
> have coreutils installed. We should either change the test so that it does 
> not require coreutils in Darwin or make it clear in the (developer) 
> installation/setup instructions that coreutils are required for this test, 
> check for coreutils when running the thrax test, and output a helpful message 
> instructing the developer to go install coreutils if {{gstat}} is not found.
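
The check proposed above could look something like the following Python sketch. It assumes only that the test needs a gstat binary on the PATH when running on Darwin. The function name missing_gstat_message, its injectable parameters, and the wording of the message are illustrative and are not part of the Joshua test suite.

```python
import platform
import shutil


def missing_gstat_message(system=None, which=shutil.which):
    """Return a help message if gstat is needed but absent, else None.

    `system` and `which` are injectable so the logic can be exercised
    without a real OS X machine; both are hypothetical parameters.
    """
    system = system or platform.system()
    # Only Darwin (OS X) ships without GNU stat; elsewhere nothing to check.
    if system != "Darwin":
        return None
    # shutil.which returns the path of the executable, or None if not on PATH.
    if which("gstat") is not None:
        return None
    return ("gstat not found: install GNU coreutils "
            "(e.g. via Homebrew or MacPorts) before running "
            "test/thrax/test.sh")
```

A test wrapper could call missing_gstat_message() first and exit with the returned text when it is not None, so the developer sees the instruction instead of an obscure failure.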





[jira] [Closed] (JOSHUA-22) Parallelize MBR computation

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed JOSHUA-22.
--
Resolution: Fixed

> Parallelize MBR computation
> ---
>
> Key: JOSHUA-22
> URL: https://issues.apache.org/jira/browse/JOSHUA-22
> Project: Joshua
>  Issue Type: Bug
>Reporter: Joshua Decoder
> Fix For: 6.1
>
>
> MBR should be multithreaded.  This would be easy to add following the model 
> used in the InputManager.





[jira] [Updated] (JOSHUA-100) Add Shen et al. (2008) dependency LM

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated JOSHUA-100:

Fix Version/s: (was: 6.2)
   6.1

> Add Shen et al. (2008) dependency LM
> 
>
> Key: JOSHUA-100
> URL: https://issues.apache.org/jira/browse/JOSHUA-100
> Project: Joshua
>  Issue Type: New Feature
>Reporter: Matt Post
>Assignee: Matt Post
> Fix For: 6.1
>
>






[jira] [Updated] (JOSHUA-95) Vocabulary locking

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated JOSHUA-95:
---
Fix Version/s: (was: 6.2)
   6.1

> Vocabulary locking
> --
>
> Key: JOSHUA-95
> URL: https://issues.apache.org/jira/browse/JOSHUA-95
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Juri Ganitkevitch
> Fix For: 6.1
>
>
> Vocabulary::id() is still synchronized and a potential point of contention. 
> It would be nice to resolve this.





[jira] [Updated] (JOSHUA-71) OS X installation depends on coreutils to run thrax test

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated JOSHUA-71:
---
Fix Version/s: (was: 6.2)
   6.1

> OS X installation depends on coreutils to run thrax test
> 
>
> Key: JOSHUA-71
> URL: https://issues.apache.org/jira/browse/JOSHUA-71
> Project: Joshua
>  Issue Type: Bug
>Reporter: Luke Orland
> Fix For: 6.1
>
>
> The {{gstat}} command from coreutils is not installed in Darwin by default. 
> One must resolve that dependency via Homebrew, MacPorts, etc.
> The {{test/thrax/test.sh}} test will fail on an OS X system that does not 
> have coreutils installed. We should either change the test so that it does 
> not require coreutils in Darwin or make it clear in the (developer) 
> installation/setup instructions that coreutils are required for this test, 
> check for coreutils when running the thrax test, and output a helpful message 
> instructing the developer to go install coreutils if {{gstat}} is not found.





[jira] [Closed] (JOSHUA-100) Add Shen et al. (2008) dependency LM

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed JOSHUA-100.
---
Resolution: Fixed

> Add Shen et al. (2008) dependency LM
> 
>
> Key: JOSHUA-100
> URL: https://issues.apache.org/jira/browse/JOSHUA-100
> Project: Joshua
>  Issue Type: New Feature
>Reporter: Matt Post
>Assignee: Matt Post
> Fix For: 6.1
>
>






[jira] [Closed] (JOSHUA-107) Verbosity levels

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed JOSHUA-107.
---

> Verbosity levels
> 
>
> Key: JOSHUA-107
> URL: https://issues.apache.org/jira/browse/JOSHUA-107
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Matt Post
> Fix For: 6.1
>
>
> Joshua should support verbosity levels with a command-line switch, so it's 
> easy to shut it up with something like {{-v 0}} or {{-q}}.





[jira] [Reopened] (JOSHUA-100) Add Shen et al. (2008) dependency LM

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney reopened JOSHUA-100:
-

> Add Shen et al. (2008) dependency LM
> 
>
> Key: JOSHUA-100
> URL: https://issues.apache.org/jira/browse/JOSHUA-100
> Project: Joshua
>  Issue Type: New Feature
>Reporter: Matt Post
>Assignee: Matt Post
> Fix For: 6.1
>
>






[jira] [Reopened] (JOSHUA-71) OS X installation depends on coreutils to run thrax test

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-71?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney reopened JOSHUA-71:


> OS X installation depends on coreutils to run thrax test
> 
>
> Key: JOSHUA-71
> URL: https://issues.apache.org/jira/browse/JOSHUA-71
> Project: Joshua
>  Issue Type: Bug
>Reporter: Luke Orland
> Fix For: 6.1
>
>
> The {{gstat}} command from coreutils is not installed in Darwin by default. 
> One must resolve that dependency via Homebrew, MacPorts, etc.
> The {{test/thrax/test.sh}} test will fail on an OS X system that does not 
> have coreutils installed. We should either change the test so that it does 
> not require coreutils in Darwin or make it clear in the (developer) 
> installation/setup instructions that coreutils are required for this test, 
> check for coreutils when running the thrax test, and output a helpful message 
> instructing the developer to go install coreutils if {{gstat}} is not found.





[jira] [Reopened] (JOSHUA-95) Vocabulary locking

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney reopened JOSHUA-95:


> Vocabulary locking
> --
>
> Key: JOSHUA-95
> URL: https://issues.apache.org/jira/browse/JOSHUA-95
> Project: Joshua
>  Issue Type: Bug
>Reporter: Matt Post
>Assignee: Juri Ganitkevitch
> Fix For: 6.1
>
>
> Vocabulary::id() is still synchronized and a potential point of contention. 
> It would be nice to resolve this.





[jira] [Updated] (JOSHUA-259) Integration tests are failing

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated JOSHUA-259:

Fix Version/s: (was: 6.2)
   6.1

> Integration tests are failing
> -
>
> Key: JOSHUA-259
> URL: https://issues.apache.org/jira/browse/JOSHUA-259
> Project: Joshua
>  Issue Type: Bug
>Reporter: Kellen Sunderland
> Fix For: 6.1
>
>
> Several integration tests are currently failing with Joshua.  I have a quick 
> fix coming for one of the tests but just in case we need more discussion 
> around the failures I'll open a bug.
> The currently failing tests for me:
> test/decoder/too-long
> test/server/http
> test/server/tcp-text
> test/thrax/extraction
> and 
> test/decoder/moses-compat (but this is easy to fix, simple extra space in the 
> expected file)
> These are failing under OS X 10.11.  If working under other environments feel 
> free to post a 'works for me'.





[jira] [Reopened] (JOSHUA-259) Integration tests are failing

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney reopened JOSHUA-259:
-

> Integration tests are failing
> -
>
> Key: JOSHUA-259
> URL: https://issues.apache.org/jira/browse/JOSHUA-259
> Project: Joshua
>  Issue Type: Bug
>Reporter: Kellen Sunderland
> Fix For: 6.1
>
>
> Several integration tests are currently failing with Joshua.  I have a quick 
> fix coming for one of the tests but just in case we need more discussion 
> around the failures I'll open a bug.
> The currently failing tests for me:
> test/decoder/too-long
> test/server/http
> test/server/tcp-text
> test/thrax/extraction
> and 
> test/decoder/moses-compat (but this is easy to fix, simple extra space in the 
> expected file)
> These are failing under OS X 10.11.  If working under other environments feel 
> free to post a 'works for me'.





[jira] [Closed] (JOSHUA-259) Integration tests are failing

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney closed JOSHUA-259.
---
Resolution: Not A Problem

> Integration tests are failing
> -
>
> Key: JOSHUA-259
> URL: https://issues.apache.org/jira/browse/JOSHUA-259
> Project: Joshua
>  Issue Type: Bug
>Reporter: Kellen Sunderland
> Fix For: 6.1
>
>
> Several integration tests are currently failing with Joshua.  I have a quick 
> fix coming for one of the tests but just in case we need more discussion 
> around the failures I'll open a bug.
> The currently failing tests for me:
> test/decoder/too-long
> test/server/http
> test/server/tcp-text
> test/thrax/extraction
> and 
> test/decoder/moses-compat (but this is easy to fix, simple extra space in the 
> expected file)
> These are failing under OS X 10.11.  If working under other environments feel 
> free to post a 'works for me'.





Re: Joshua Model Input Format(s) and LM Loading

2016-10-26 Thread lewis john mcgibbney
I hear ye loud and clear Matt :) Thank you for the response.

On Wed, Oct 26, 2016 at 12:30 AM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
> From: Matt Post 
> To: dev@joshua.incubator.apache.org
> Cc:
> Date: Tue, 25 Oct 2016 08:49:19 -0400
> Subject: Re: Joshua Model Input Format(s) and LM Loading
> Hi Lewis,
>
> Joshua supports two language model representation packages: KenLM [0] and
> BerkeleyLM [1]. These were both developed at about the same time, and
> represented huge gains in doing this task efficiently, over what had
> previously been the standard approach (SRILM). Ken Heafield (who has
> contributed a lot to Joshua) went on to contribute a lot of other
> improvements to language model representation, decoder integration, and
> also the actual construction of language models and their efficient
> interpolation. His goal for a while was to make SRILM completely
> unnecessary, and I think he succeeded.
>
> BerkeleyLM was more of a one-off project. It is slower than KenLM and
> hasn't been touched in years. If you want to understand, your efforts are
> probably best spent looking into KenLM papers. But it's also worth noting
> that Ken is a crack C++ programmer who has spent years hacking away on
> these problems, and your chances of finding any further efficiencies there
> are probably quite limited unless you have a lot of background in the area.
> But even if you did, I would recommend you not spend your time that way — I
> basically consider the LM representation problem to have been solved by
> KenLM. That's not to say that there aren't some improvements to be had on the
> Joshua / JNI bridge, but even there, there are probably better things to do.
>
> matt
>
> [0] KenLM: Faster and Smaller Language Model Queries
> http://www.kheafield.com/professional/avenue/kenlm.pdf
>
> [1] Faster and Smaller N-Gram Language Models
> http://nlp.cs.berkeley.edu/pubs/Pauls-Klein_2011_LM_paper.pdf
>
>


[jira] [Created] (JOSHUA-317) SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391

2016-10-26 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created JOSHUA-317:
---

 Summary: SyntaxError: invalid syntax 
scripts/training/run_tuner.py", line 391
 Key: JOSHUA-317
 URL: https://issues.apache.org/jira/browse/JOSHUA-317
 Project: Joshua
  Issue Type: Bug
  Components: er
Affects Versions: 6.0.5
 Environment: Python 3.5
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 6.1


{code}
[tune-bundle] rebuilding...
  dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config 
[CHANGED]
  
dep=/usr/local/joshua_resources/russian_experiments/exp3/grammar.packed/slice_0.source
 [CHANGED]
  
dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/model/run-joshua.sh
 [NOT FOUND]
  cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force 
--symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir 
/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config 
/usr/local/joshua_resources/russian_experiments/exp3/tune/model 
--copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" 
-mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 
tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function 
"StateMinimizingLanguageModel -lm_order 5 -lm_file 
/usr/local/joshua_resources/russian_experiments/exp3/lm.kenlm"  -tm0/type hiero 
-tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm 
/usr/local/joshua_resources/russian_experiments/exp3/grammar.packed --tm 
/usr/local/joshua_resources/russian_experiments/exp3/data/tune/grammar.glue
  took 0 seconds (0s)
[mert-1] rebuilding...
  dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en 
[CHANGED]
  dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config 
[CHANGED]
  dep=tune/model/grammar.packed/slice_0.source [CHANGED]
  
dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final
 [NOT FOUND]
  cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py 
/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en 
/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru 
--tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner 
mert --decoder 
/usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command 
--decoder-config 
/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config 
--decoder-output-file 
/usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest 
--decoder-log-file 
/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log 
--iterations 10 --metric 'BLEU 4 closest'
  JOB FAILED (return code 1)
  File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 391
'ITERATIONS': `iterations`,
  ^
SyntaxError: invalid syntax
{code}
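
The caret in the trace points at the backquotes. Wrapping an expression in backquotes was Python 2 shorthand for repr() and was removed in Python 3, which is why the script fails to parse under Python 3.5. A minimal sketch of the portable spelling follows; the dictionary here is an illustrative stand-in for whatever run_tuner.py builds at line 391, not a copy of it.

```python
iterations = 10

# Python 2 only (a SyntaxError on Python 3):
#     'ITERATIONS': `iterations`,
# Portable across Python 2 and 3: call repr() explicitly.
params = {
    'ITERATIONS': repr(iterations),
}
```

If the value is only ever interpolated into a text template, str(iterations) would serve equally well for integers.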





[jira] [Updated] (JOSHUA-317) SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391

2016-10-26 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated JOSHUA-317:

Component/s: (was: er)
 tuner

> SyntaxError: invalid syntax scripts/training/run_tuner.py", line 391
> 
>
> Key: JOSHUA-317
> URL: https://issues.apache.org/jira/browse/JOSHUA-317
> Project: Joshua
>  Issue Type: Bug
>  Components: tuner
>Affects Versions: 6.0.5
> Environment: Python 3.5
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 6.1
>
>
> {code}
> [tune-bundle] rebuilding...
>   
> dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config 
> [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp3/grammar.packed/slice_0.source
>  [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/model/run-joshua.sh
>  [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force 
> --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir 
> /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/model 
> --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" 
> -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 
> tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function 
> "StateMinimizingLanguageModel -lm_order 5 -lm_file 
> /usr/local/joshua_resources/russian_experiments/exp3/lm.kenlm"  -tm0/type 
> hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm 
> /usr/local/joshua_resources/russian_experiments/exp3/grammar.packed --tm 
> /usr/local/joshua_resources/russian_experiments/exp3/data/tune/grammar.glue
>   took 0 seconds (0s)
> [mert-1] rebuilding...
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en 
> [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config 
> [CHANGED]
>   dep=tune/model/grammar.packed/slice_0.source [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final
>  [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py 
> /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en 
> /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru 
> --tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner 
> mert --decoder 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command 
> --decoder-config 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config 
> --decoder-output-file 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest 
> --decoder-log-file 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log 
> --iterations 10 --metric 'BLEU 4 closest'
>   JOB FAILED (return code 1)
>   File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 391
> 'ITERATIONS': `iterations`,
>   ^
> SyntaxError: invalid syntax
> {code}





[jira] [Created] (JOSHUA-318) scripts/training/run_tuner.py should enable configurable memory usage when invoking joshua-decoder

2016-10-26 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created JOSHUA-318:
---

 Summary: scripts/training/run_tuner.py should enable configurable 
memory usage when invoking joshua-decoder
 Key: JOSHUA-318
 URL: https://issues.apache.org/jira/browse/JOSHUA-318
 Project: Joshua
  Issue Type: Improvement
  Components: tuner
Affects Versions: 6.0.5
Reporter: Lewis John McGibbney
 Fix For: 6.2


When I run the run_tuner.py script I can easily run into the following
{code}
[mert-1] rebuilding...
  dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en
  dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config 
[CHANGED]
  dep=tune/model/grammar.gz.packed/slice_0.source [CHANGED]
  
dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final
 [NOT FOUND]
  cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py 
/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en 
/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru 
--tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner 
mert --decoder 
/usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command 
--decoder-config 
/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config 
--decoder-output-file 
/usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest 
--decoder-log-file 
/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log 
--iterations 10 --metric 'BLEU 4 closest'
  JOB FAILED (return code 1)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.apache.joshua.decoder.ff.tm.packed.PackedGrammar$PackedSlice.initializeFeatureStructures(PackedGrammar.java:385)
    at org.apache.joshua.decoder.ff.tm.packed.PackedGrammar$PackedSlice.<init>(PackedGrammar.java:368)
    at org.apache.joshua.decoder.ff.tm.packed.PackedGrammar.<init>(PackedGrammar.java:153)
    at org.apache.joshua.decoder.Decoder.initializeTranslationGrammars(Decoder.java:458)
    at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:389)
    at org.apache.joshua.decoder.Decoder.<init>(Decoder.java:128)
    at org.apache.joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:69)
Traceback (most recent call last):
  File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 553, in <module>
    main(sys.argv)
  File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 536, in main
    run_zmert(opts.tunedir, opts.source, opts.target, opts.decoder, opts.decoder_config, opts.decoder_output_file, opts)
  File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 417, in run_zmert
    opts.metric, opts.iterations or 10)
  File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 399, in setup_configs
    for feature,weight in get_features(config):
  File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 351, in get_features
    output = check_output("%s/bin/joshua-decoder -c %s -show-weights -v 0" % (JOSHUA, config_file), shell=True)
  File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '/usr/local/incubator-joshua/bin/joshua-decoder -c /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config -show-weights -v 0' returned non-zero exit status 1
{code}
This is because, by default, the joshua-decoder script runs with 4g of memory. 
The run_tuner.py script should be flexible enough to reuse the memory 
allocation provided when the pipeline was initially invoked. This value should 
then be passed to the joshua-decoder script.
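
One way to thread the memory setting through, sketched under assumptions: the function below builds the same decoder command that get_features runs, but takes the heap size as a parameter. The -m flag is assumed here to be how the joshua-decoder wrapper accepts a memory setting and should be checked against the script itself. The names build_show_weights_command and memory are hypothetical, and the default of 4g mirrors the default mentioned above.

```python
import shlex


def build_show_weights_command(joshua_home, config_file, memory="4g"):
    """Build the argv used to query the decoder for its feature weights.

    ASSUMPTION: the joshua-decoder wrapper takes its heap size via `-m`;
    verify this against bin/joshua-decoder before relying on it.
    """
    return [
        "%s/bin/joshua-decoder" % joshua_home,
        "-m", memory,
        "-c", config_file,
        "-show-weights",
        "-v", "0",
    ]


# Example: reuse the 10g that was given to the pipeline.
cmd = build_show_weights_command(
    "/usr/local/incubator-joshua", "tune/joshua.config", memory="10g")
print(" ".join(shlex.quote(part) for part in cmd))
```

Returning a list rather than a formatted string means the command can be handed to subprocess.check_output without shell=True, which avoids quoting problems with paths that contain spaces.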





[jira] [Commented] (JOSHUA-318) scripts/training/run_tuner.py should enable configurable memory usage when invoking joshua-decoder

2016-10-26 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609503#comment-15609503
 ] 

Lewis John McGibbney commented on JOSHUA-318:
-

The following code is where the sh*t hits the fan
{code}
def get_features(config_file):
    """Queries the decoder for all dense features that will be fired by the feature
    functions activated in the config file"""

    output = check_output("%s/bin/joshua-decoder -c %s -show-weights -v 0" % (JOSHUA, config_file), shell=True)
    features = []
    for index, item in enumerate(output.split('\n')):
        if item != "":
            features.append(tuple(item.split()))
    return features
{code}

> scripts/training/run_tuner.py should enable configurable memory usage when 
> invoking joshua-decoder
> ---
>
> Key: JOSHUA-318
> URL: https://issues.apache.org/jira/browse/JOSHUA-318
> Project: Joshua
>  Issue Type: Improvement
>  Components: tuner
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
> Fix For: 6.2
>
>
> When I run the run_tuner.py script I can easily run into the following
> {code}
> [mert-1] rebuilding...
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en
>   dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config 
> [CHANGED]
>   dep=tune/model/grammar.gz.packed/slice_0.source [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final
>  [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py 
> /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en 
> /usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru 
> --tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner 
> mert --decoder 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command 
> --decoder-config 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config 
> --decoder-output-file 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest 
> --decoder-log-file 
> /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log 
> --iterations 10 --metric 'BLEU 4 closest'
>   JOB FAILED (return code 1)
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at org.apache.joshua.decoder.ff.tm.packed.PackedGrammar$PackedSlice.initializeFeatureStructures(PackedGrammar.java:385)
>     at org.apache.joshua.decoder.ff.tm.packed.PackedGrammar$PackedSlice.<init>(PackedGrammar.java:368)
>     at org.apache.joshua.decoder.ff.tm.packed.PackedGrammar.<init>(PackedGrammar.java:153)
>     at org.apache.joshua.decoder.Decoder.initializeTranslationGrammars(Decoder.java:458)
>     at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:389)
>     at org.apache.joshua.decoder.Decoder.<init>(Decoder.java:128)
>     at org.apache.joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:69)
> Traceback (most recent call last):
>   File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 553, in <module>
>     main(sys.argv)
>   File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 536, in main
>     run_zmert(opts.tunedir, opts.source, opts.target, opts.decoder, opts.decoder_config, opts.decoder_output_file, opts)
>   File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 417, in run_zmert
>     opts.metric, opts.iterations or 10)
>   File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 399, in setup_configs
>     for feature,weight in get_features(config):
>   File "/usr/local/incubator-joshua/scripts/training/run_tuner.py", line 351, in get_features
>     output = check_output("%s/bin/joshua-decoder -c %s -show-weights -v 0" % (JOSHUA, config_file), shell=True)
>   File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 626, in check_output
>     **kwargs).stdout
>   File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 708, in run
>     output=stdout, stderr=stderr)
> subprocess.CalledProcessError: Command '/usr/local/incubator-joshua/bin/joshua-decoder -c /usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config -show-weights -v 0' returned non-zero exit status 1
> {code}
> This is because, by default, the joshua-decoder script runs with 4g of memory. 
> The run_tuner.py script should be flexible enough to reuse the memory 
> allocation provided when the pipeline was initially invoked. This value should 
> then be passed to the joshua-decoder script.





Podling Report Reminder - November 2016

2016-10-26 Thread johndament
Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 16 November 2016, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, November 02).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

*   Your project name
*   A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
*   A list of the three most important issues to address in the move
towards graduation.
*   Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
*   How has the community developed since the last report
*   How has the project developed since the last report.

This should be appended to the Incubator Wiki page at:

http://wiki.apache.org/incubator/November2016

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC


[jira] [Created] (JOSHUA-319) test-decode decoder_command results in java.lang.NumberFormatException: For input string: "MAXSPAN"

2016-10-26 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created JOSHUA-319:
---

 Summary: test-decode decoder_command results in 
java.lang.NumberFormatException: For input string: "MAXSPAN"
 Key: JOSHUA-319
 URL: https://issues.apache.org/jira/browse/JOSHUA-319
 Project: Joshua
  Issue Type: Bug
  Components: decoders
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
 Fix For: 6.1


When I run the following command
{code}
/usr/local/incubator-joshua/bin/pipeline.pl  --rundir . --type hiero --corpus 
/usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en --tune 
/usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.tune 
--test 
/usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.test 
--source en --target ru --readme "Experiment 3 Run 1 of ru --> en model 
training" --aligner berkeley --hadoop-mem 10g --tmp 
/usr/local/hadoop-2.5.2/hadoop_tmp_dir --first-step test --grammar 
/usr/local/joshua_resources/russian_experiments/exp3/grammar.gz --joshua-mem 10g
{code}
I end up with the following message.
{code}
INFO - Parameters read from configuration file: joshua.config
INFO - tm = 'TYPE -maxspan MAXSPAN -owner OWNER -path 
/usr/local/joshua_resources/russian_experiments/exp3/test/1/model/grammar.gz.packed'
INFO - tm = 'thrax -maxspan -1 -owner glue -path 
/usr/local/joshua_resources/russian_experiments/exp3/test/1/model/grammar.glue'
INFO - defaultnonterminal = 'X'
INFO - goalsymbol = 'GOAL'
INFO - markoovs = 'false'
INFO - search = 'cky'
INFO - pop-limit: 5000
INFO - poplimit = '5000'
INFO - topn = '300'
INFO - useuniquenbest = 'true'
INFO - outputformat = '%i ||| %s ||| %f ||| %c'
INFO - includealignindex = 'false'
INFO - featurefunction = 'OOVPenalty'
INFO - featurefunction = 'WordPenalty'
INFO - c = 'joshua.config'
INFO - threads = '1'
INFO - topn = '0'
INFO - outputformat = '%s'
INFO - Read 3 weights (0 of them dense)
Exception in thread "main" java.lang.NumberFormatException: For input string: "MAXSPAN"
  at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
  at java.lang.Integer.parseInt(Integer.java:580)
  at java.lang.Integer.parseInt(Integer.java:615)
  at org.apache.joshua.decoder.Decoder.initializeTranslationGrammars(Decoder.java:451)
  at org.apache.joshua.decoder.Decoder.initialize(Decoder.java:389)
  at org.apache.joshua.decoder.Decoder.<init>(Decoder.java:128)
  at org.apache.joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:69)
{code}





[jira] [Commented] (JOSHUA-319) test-decode decoder_command results in java.lang.NumberFormatException: For input string: "MAXSPAN"

2016-10-26 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610441#comment-15610441
 ] 

Lewis John McGibbney commented on JOSHUA-319:
-

As you can see, the following line from joshua.config is the culprit
{code}
tm = 'TYPE -maxspan MAXSPAN -owner OWNER -path 
/usr/local/joshua_resources/russian_experiments/exp3/test/1/model/grammar.gz.packed'
{code}
The string substitutions have not taken place!!! For the time being I have 
manually edited the file to look like the following...
{code}
tm = packed -maxspan -1 -owner packed -path 
/usr/local/joshua_resources/russian_experiments/exp3/test/1/model/grammar.gz.packed
{code}
... continuing.
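
A rough sketch of a check that would catch this before the decoder starts. It is not part of Joshua: the helper name `find_unsubstituted_tm_lines` is hypothetical, and the placeholder names are simply the ones visible in the offending line above (TYPE, MAXSPAN, OWNER).

```python
import re

# Placeholder tokens seen in the unsubstituted template line.
PLACEHOLDERS = {"TYPE", "MAXSPAN", "OWNER"}


def find_unsubstituted_tm_lines(config_text):
    """Return (line_number, line) pairs for tm lines that still hold placeholders."""
    problems = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        # Only "tm = ..." entries are of interest here.
        if not re.match(r"tm\s*=", stripped):
            continue
        value = stripped.split("=", 1)[1]
        # Drop optional quoting, then compare whole tokens against the placeholders.
        tokens = value.replace("'", " ").split()
        if PLACEHOLDERS.intersection(tokens):
            problems.append((number, stripped))
    return problems
```

Running something like this over the generated joshua.config would give a readable error instead of the NumberFormatException from Integer.parseInt.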






Being realistic about memory usage

2016-10-26 Thread lewis john mcgibbney
Hi Folks,
By default everything is set to 4g.
IMHO this is unrealistic and 9/10 times leads to OOM Java heap space issues.
I would suggest that this is increased to around 8g across the board.
Further justification is that a) the models which are being distributed,
when run within the Joshua Server, consume around 5 1/2 GB RAM when idle, and
b) language packs will undoubtedly grow; for example, I am working with
~850K Russian sentences... 4GB RAM simply does not cut the mustard.
Any comments?
Thanks

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney


[GitHub] incubator-joshua issue #73: JOSHUA-316 run_bundler.py returning JOB FAILED (...

2016-10-26 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/incubator-joshua/pull/73
  
The most recent commit I submitted here also addresses 
[JOSHUA-317](https://issues.apache.org/jira/browse/JOSHUA-317) and 
[JOSHUA-318](https://issues.apache.org/jira/browse/JOSHUA-318) as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (JOSHUA-316) run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a bytes-like object is required, not 'str'

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610450#comment-15610450
 ] 

ASF GitHub Bot commented on JOSHUA-316:
---

Github user lewismc commented on the issue:

https://github.com/apache/incubator-joshua/pull/73
  
The most recent commit I submitted here also addresses 
[JOSHUA-317](https://issues.apache.org/jira/browse/JOSHUA-317) and 
[JOSHUA-318](https://issues.apache.org/jira/browse/JOSHUA-318) as well.


> run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a 
> bytes-like object is required, not 'str'
> -
>
> Key: JOSHUA-316
> URL: https://issues.apache.org/jira/browse/JOSHUA-316
> Project: Joshua
>  Issue Type: Bug
>  Components: bundler
>Affects Versions: 6.0.5
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Critical
> Fix For: 6.1
>
>
> {code}
> [glue-tune] rebuilding...
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source
>  [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue
>  [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/support/create_glue_grammar.sh 
> /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed > 
> /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue
>   took 1 seconds (1s)
> [tune-bundle] rebuilding...
>   
> dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config 
> [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source
>  [CHANGED]
>   
> dep=/usr/local/joshua_resources/russian_experiments/exp2/tune/model/run-joshua.sh
>  [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force 
> --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir 
> /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config 
> /usr/local/joshua_resources/russian_experiments/exp2/tune/model 
> --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" 
> -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 
> tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function 
> "StateMinimizingLanguageModel -lm_order 5 -lm_file 
> /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm"  -tm0/type 
> hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm 
> /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed --tm 
> /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue
>   JOB FAILED (return code 1)
> * Running the copy-config.pl script with the command: 
> /usr/local/incubator-joshua/scripts/copy-config.pl -top-n 300 -output-format 
> "%i ||| %s ||| %f ||| %c" -mark-oovs false -search cky -weights "lm_0 1 
> tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " 
> -feature-function "StateMinimizingLanguageModel -lm_order 5 -lm_file 
> /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm"  -tm0/type 
> hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue
> Traceback (most recent call last):
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 
> 748, in main
> operations = collect_operations(opts)
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 
> 637, in collect_operations
> opts.copy_config_options
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 
> 202, in filter_through_copy_config_script
> result, err = p.communicate(config_text)
>   File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1072, 
> in communicate
> stdout, stderr = self._communicate(input, endtime, timeout)
>   File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1700, 
> in _communicate
> input_view = memoryview(self._input)
> TypeError: memoryview: a bytes-like object is required, not 'str'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 
> 760, in <module>
> main(sys.argv)
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 
> 751, in main
> error_quit(e.message)
> AttributeError: 'TypeError' object has no attribute 'message'
> * WARNING: no key 'outputformat' found in config file (appending to end)
> * WARNING: no key 'search' found in config file (appending to end)
> * WARNING: no key 'topn' found in config file (appending to end)
> * WARNING: no key 'markoovs' found in config file (appending to end)
> {code}
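
The two exceptions in the traceback above are both Python 2 to 3 differences: `communicate()` on a binary pipe needs bytes rather than str, and exceptions no longer carry a `.message` attribute. A minimal sketch of how both could be handled follows. It is not the actual run_bundler.py patch, and the helper names `filter_through` and `error_text` are hypothetical.

```python
import subprocess
import sys


def filter_through(command, text):
    """Pipe text through command and return its standard output as str."""
    p = subprocess.Popen(command, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Python 3 pipes carry bytes: encode on the way in, decode on the way out.
    out, err = p.communicate(text.encode("utf-8"))
    if p.returncode != 0:
        raise RuntimeError(err.decode("utf-8", errors="replace"))
    return out.decode("utf-8")


def error_text(exc):
    """Return a printable message; str() works where exc.message no longer exists."""
    return str(exc)


if __name__ == "__main__":
    # Stand-in for copy-config.pl: a child process that upper-cases its input.
    upper = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(filter_through(upper, "tm_pt_0 1"))
```

Passing `universal_newlines=True` to `Popen` is another option, since it makes the pipes text-mode so that str can be passed directly.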




Re: Being realistic about memory usage

2016-10-26 Thread John Hewitt
+1 I've never used Joshua successfully without twiddling around with memory
allowances. We'll put a nice warning up about the default memory usage, and
an advisory about how to set the maximum lower if the user's box can't
handle it.

-John

On Wed, Oct 26, 2016 at 10:52 PM, lewis john mcgibbney 
wrote:

> Hi Folks,
> By default everything is set to 4g.
> IMHO this is unrealistic and 9/10 times leads to OOM Java Heap Space
> issues.
> I would suggest that this is increased to around 8g across the board.
> Further justification is that a) the models which are being distributed,
> when run within the Joshua Server, consume around 5 1/2 GB RAM when idle,
> b) Language packs will undoubtedly grow, for example I am working with
> ~850K Russian sentences... 4GB RAM simply does not cut the mustard.
> Any comments?
> Thanks
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
>


Pipeline Mystery

2016-10-26 Thread lewis john mcgibbney
Hi Folks,
So I've been plodding away again and feel I am very close to generating my
first language pack; however, I've arrived at the following fankle!!!
If I run a pipeline from start to finish it fails at the 'test-bundle-1'
phase as below stating " [Errno 2] No such file or directory:
'/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final'"

lmcgibbn@LMC-056430 /usr/local/joshua_resources/russian_experiments/exp3 $
/usr/local/incubator-joshua/bin/pipeline.pl  --rundir . --type hiero
--corpus
/usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en
--tune
/usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.tune
--test
/usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.test
--source en --target ru --readme "Experiment 3 Run 1 of ru --> en model
training" --aligner berkeley --hadoop-mem 10g --tmp
/usr/local/hadoop-2.5.2/hadoop_tmp_dir
[train-copy-and-filter] cached, skipping...
[train-tokenize-en] cached, skipping...
[train-tokenize-ru] cached, skipping...
[train-trim] cached, skipping...
[train-lowercase-en] cached, skipping...
[train-lowercase-ru] cached, skipping...
[train-vocab-en] cached, skipping...
[train-vocab-ru] cached, skipping...
[tune-copy-and-filter] cached, skipping...
[tune-tokenize-en] cached, skipping...
[tune-tokenize-ru] cached, skipping...
[tune-lowercase-en] cached, skipping...
[tune-lowercase-ru] cached, skipping...
[tune-vocab-en] cached, skipping...
[tune-vocab-ru] cached, skipping...
[test-copy-and-filter] cached, skipping...
[test-tokenize-en] cached, skipping...
[test-tokenize-ru] cached, skipping...
[test-lowercase-en] cached, skipping...
[test-lowercase-ru] cached, skipping...
[test-vocab-en] cached, skipping...
[test-vocab-ru] cached, skipping...
[lm-sort-uniq] cached, skipping...
[kenlm] cached, skipping...
[compile-kenlm] cached, skipping...
[glue-tune] cached, skipping...
[tune-bundle] cached, skipping...
[mert-1] rebuilding...

dep=/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en

dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config
[CHANGED]
  dep=tune/model/grammar.gz.packed/slice_0.source

dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final
[NOT FOUND]
  cmd=/usr/local/incubator-joshua/scripts/training/run_tuner.py
/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.en
/usr/local/joshua_resources/russian_experiments/exp3/data/tune/corpus.ru
--tunedir /usr/local/joshua_resources/russian_experiments/exp3/tune --tuner
mert --decoder
/usr/local/joshua_resources/russian_experiments/exp3/tune/decoder_command
--decoder-config
/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config
--decoder-output-file
/usr/local/joshua_resources/russian_experiments/exp3/tune/output.nbest
--decoder-log-file
/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.log
--iterations 10 --metric 'BLEU 4 closest'
  took 27 seconds (27s)
[test-bundle-1] rebuilding...

dep=/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final
[NOT FOUND]
  dep=grammar.gz

dep=/usr/local/joshua_resources/russian_experiments/exp3/test/1/model/joshua.config
  cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force
--symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir
/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final
/usr/local/joshua_resources/russian_experiments/exp3/test/1/model
--copy-config-options '-top-n 300 -pop-limit 5000 -output-format "%i ||| %s
||| %f ||| %c" -mark-oovs false' --pack-tm grammar.gz --tm
/usr/local/joshua_resources/russian_experiments/exp3/data/tune/grammar.glue
  JOB FAILED (return code 2)
ERROR:root:ERROR: argument config: can't open
'/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final':
[Errno 2] No such file or directory:
'/usr/local/joshua_resources/russian_experiments/exp3/tune/joshua.config.final'

However, if I run the pipeline with the --first-step test flag, I get
the following: the 'test-bundle-1' phase executes and completes
flawlessly, but the pipeline then goes on to die at the 'test-decode-1'
phase!!!

lmcgibbn@LMC-056430 /usr/local/joshua_resources/russian_experiments/exp3 $
/usr/local/incubator-joshua/bin/pipeline.pl  --rundir . --type hiero
--corpus
/usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en
--tune
/usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.tune
--test
/usr/local/joshua_resources/russian_experiments/data/commoncrawl.ru-en.test
--source en --target ru --readme "Experiment 3 Run 1 of ru --> en model
training" --aligner berkeley --hadoop-mem 10g --tmp
/usr/local/hadoop-2.5.2/hadoop_tmp_dir --first-step test --grammar
/usr/local/joshua_resources/russian_experiments/exp3/grammar.gz
--joshua-mem 10g
[train-copy-and-filter] cached, skipping...
[train-tokenize-en] cached, skipping...
[train-tokenize-ru] cached

Re: Pipeline Mystery

2016-10-26 Thread John Hewitt
It seems like MERT isn't writing its final config file (which is typical
of MERT, in my experience). I recall giving up and using kbmira. This final
config file is the one used in test, so I can see why skipping to test ends
up failing pretty quickly.
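
For what it's worth, a guard along these lines would make the failure show up at the tuning step rather than at test-bundle. This is only a sketch under assumptions: `require_tuner_output` is a hypothetical helper, not something in the pipeline, and the default file name is just the one from the error message.

```python
import os


def require_tuner_output(tune_dir, filename="joshua.config.final"):
    """Return the path of the tuner's final config, or fail with a clear message."""
    path = os.path.join(tune_dir, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "tuner did not write %s; check the tuner log before running test steps"
            % path)
    return path
```

Calling it right after the tuner returns would stop the run with a message pointing at the tuning log instead of a missing-file error two steps later.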

To answer your question, though, I haven't tried. Not in my bandwidth right
now.

-John


Re: Pipeline Mystery

2016-10-26 Thread lewis john mcgibbney
Hi John,
Thanks for your response. Replies inline...

On Wed, Oct 26, 2016 at 9:49 PM, <
dev-digest-h...@joshua.incubator.apache.org> wrote:

>
> From: John Hewitt 
> To: dev@joshua.incubator.apache.org
> Cc:
> Date: Thu, 27 Oct 2016 00:49:34 -0400
> Subject: Re: Pipeline Mystery
> It seems like MERT isn't writing its final config file (which is typical
> of MERT, in my experience). I recall giving up and using kbmira. This final
> config file is the one used in test, so I can see why skipping to test ends
> up failing pretty quickly.
>

From my understanding, in order to use --tuner kbmira, I need to download,
configure and run Moses. Is this correct? I would REALLY prefer not to do
this if at all possible. In the meantime, it looks like I'm going to try
another fresh pipeline run and see where I get. Sometimes starting afresh
has led to surprising and delightful results :)


>
> To answer your question, though, I haven't tried. Not in my bandwidth right
> now.


No problems. In all honesty, an entire pipeline execution on a small
parallel dataset would be a killer smoke test for any contributions
coming into Joshua. Language pack creation is so important and having
confidence in the overall process is something which I really look forward
to building over the next while.
Thanks


[jira] [Commented] (JOSHUA-319) test-decode decoder_command results in java.lang.NumberFormatException: For input string: "MAXSPAN"

2016-10-26 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610743#comment-15610743
 ] 

Lewis John McGibbney commented on JOSHUA-319:
-

Some supplementary reading folks
http://www.mail-archive.com/dev%40joshua.incubator.apache.org/msg01769.html




