[jira] [Updated] (JOSHUA-316) run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a bytes-like object is required, not 'str'

2016-10-25 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated JOSHUA-316:

Fix Version/s: (was: 6.2)
   6.1

> run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a 
> bytes-like object is required, not 'str'
> -
>
> Key: JOSHUA-316
> URL: https://issues.apache.org/jira/browse/JOSHUA-316
> Project: Joshua
> Issue Type: Bug
> Components: bundler
> Affects Versions: 6.0.5
> Reporter: Lewis John McGibbney
> Priority: Critical
> Fix For: 6.1
>
>
> {code}
> [glue-tune] rebuilding...
>   dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/support/create_glue_grammar.sh 
> /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed > 
> /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue
>   took 1 seconds (1s)
> [tune-bundle] rebuilding...
>   dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp2/tune/model/run-joshua.sh [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force 
> --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir 
> /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config 
> /usr/local/joshua_resources/russian_experiments/exp2/tune/model 
> --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" 
> -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 
> tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function 
> "StateMinimizingLanguageModel -lm_order 5 -lm_file 
> /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm"  -tm0/type 
> hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm 
> /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed --tm 
> /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue
>   JOB FAILED (return code 1)
> * Running the copy-config.pl script with the command: 
> /usr/local/incubator-joshua/scripts/copy-config.pl -top-n 300 -output-format 
> "%i ||| %s ||| %f ||| %c" -mark-oovs false -search cky -weights "lm_0 1 
> tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " 
> -feature-function "StateMinimizingLanguageModel -lm_order 5 -lm_file 
> /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm"  -tm0/type 
> hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue
> Traceback (most recent call last):
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 748, in main
>     operations = collect_operations(opts)
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 637, in collect_operations
>     opts.copy_config_options
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 202, in filter_through_copy_config_script
>     result, err = p.communicate(config_text)
>   File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1072, in communicate
>     stdout, stderr = self._communicate(input, endtime, timeout)
>   File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1700, in _communicate
>     input_view = memoryview(self._input)
> TypeError: memoryview: a bytes-like object is required, not 'str'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 760, in <module>
>     main(sys.argv)
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 751, in main
>     error_quit(e.message)
> AttributeError: 'TypeError' object has no attribute 'message'
> * WARNING: no key 'outputformat' found in config file (appending to end)
> * WARNING: no key 'search' found in config file (appending to end)
> * WARNING: no key 'topn' found in config file (appending to end)
> * WARNING: no key 'markoovs' found in config file (appending to end)
> {code}
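
The two tracebacks above point at two separate Python 3 incompatibilities. First, Popen.communicate() is handed a str while the pipe is in binary mode. Second, the error handler reads e.message, which Python 3 exceptions do not have. Below is a minimal sketch of both fixes, assuming the pipe stays in binary mode. filter_through and describe_error are hypothetical names, not the functions in run_bundler.py, and the child process is a stand-in for copy-config.pl.

```python
import subprocess
import sys


def filter_through(cmd, text):
    """Pipe `text` through `cmd` and return its stdout as str.

    Hypothetical helper: the text is encoded before communicate()
    because a pipe opened in binary mode expects bytes on Python 3.
    """
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate(text.encode("utf-8"))
    if p.returncode != 0:
        raise RuntimeError(err.decode("utf-8", errors="replace"))
    return out.decode("utf-8")


def describe_error(exc):
    """Return the exception text. str(exc) works on Python 2 and 3,
    whereas exc.message exists only on Python 2."""
    return str(exc)


if __name__ == "__main__":
    # Stand-in for copy-config.pl: a child Python that echoes stdin.
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read())"]
    print(filter_through(echo, "top-n = 300\n"), end="")
```

Encoding at the call site keeps the change local. Opening the pipe in text mode (universal_newlines=True) would be another option, at the cost of depending on the locale's default encoding.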



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (JOSHUA-316) run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a bytes-like object is required, not 'str'

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607165#comment-15607165
 ] 

ASF GitHub Bot commented on JOSHUA-316:
---

Github user lewismc commented on the issue:

https://github.com/apache/incubator-joshua/pull/73
  
BTW I am using Python 3.5 here.


> run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a 
> bytes-like object is required, not 'str'
> -
>
> Key: JOSHUA-316
> URL: https://issues.apache.org/jira/browse/JOSHUA-316
> Project: Joshua
> Issue Type: Bug
> Components: bundler
> Affects Versions: 6.0.5
> Reporter: Lewis John McGibbney
> Priority: Critical
> Fix For: 6.2
>
>





[GitHub] incubator-joshua issue #73: JOSHUA-316 run_bundler.py returning JOB FAILED (...

2016-10-25 Thread lewismc
Github user lewismc commented on the issue:

https://github.com/apache/incubator-joshua/pull/73
  
BTW I am using Python 3.5 here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-joshua pull request #73: JOSHUA-316 run_bundler.py returning JOB F...

2016-10-25 Thread lewismc
GitHub user lewismc opened a pull request:

https://github.com/apache/incubator-joshua/pull/73

JOSHUA-316 run_bundler.py returning JOB FAILED (return code 1) TypeError: 
memoryview: a bytes-like object is required, not 'str'

This pull request addresses https://issues.apache.org/jira/browse/JOSHUA-316

This issue was a complete PITA. The non-ASCII character in the run_bundler.py 
README template was a frigging big PITA.

Done now folks :)
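
For readers hitting the same problem: one common way a non-ASCII character in a template breaks a script is an implicit encoding when the text is written out. The sketch below assumes that was the failure mode here, which the pull request text does not confirm. README_TEMPLATE and write_readme are hypothetical names and the template text is made up.

```python
import io
import os
import tempfile

# Hypothetical template containing a non-ASCII character (a curly apostrophe).
README_TEMPLATE = u"Joshua bundle\nIt\u2019s ready to run: {script}\n"


def write_readme(path, script):
    """Write the filled-in template as UTF-8 regardless of the locale."""
    # io.open accepts encoding= on both Python 2 and Python 3.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(README_TEMPLATE.format(script=script))


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "README")
    write_readme(target, "run-joshua.sh")
    print("{} ({} bytes)".format(target, os.path.getsize(target)))
```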

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/incubator-joshua JOSHUA-316

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-joshua/pull/73.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #73


commit d137f2970d1eace1dcec576d287712cd5b88
Author: Lewis John McGibbney 
Date:   2016-10-26T02:31:10Z

JOSHUA-316 run_bundler.py returning JOB FAILED (return code 1) TypeError: 
memoryview: a bytes-like object is required, not 'str'






Re: openjdk 8 incompatibility

2016-10-25 Thread John Hewitt
Checks out. Thanks, Matt.

-John

On Tue, Oct 25, 2016 at 3:56 PM, Matt Post  wrote:

> Hmm, inclusion of that line looks like a mistake. I've seen Eclipse add
> random imports because it sorts the suggestions in a very unhelpful manner.
> I just removed the line and pushed, try again.
>
>
> > On Oct 25, 2016, at 1:11 PM, John Hewitt  wrote:
> >
> > Hi all,
> >
> > Has anyone been able to compile Joshua with openjdk? I get this message:
> >
> > /home/john/java/incubator-joshua/src/main/java/org/apache/joshua/decoder/ff/lm/KenLM.java:[21,19]
> > error: package javafx.scene does not exist
> >
> > And the following link seems to confirm that javafx is not a part of
> > openjdk.
> > https://ask.fedoraproject.org/en/question/93407/there-is-no-javafx-packages-in-openjdk-180-fedora-gnulinux/
> >
> > -John
>
>


Re: openjdk 8 incompatibility

2016-10-25 Thread Matt Post
Hmm, inclusion of that line looks like a mistake. I've seen Eclipse add random 
imports because it sorts the suggestions in a very unhelpful manner. I just 
removed the line and pushed, try again.
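
Because javafx is not part of every OpenJDK build, a stray import like the one removed here only surfaces on some machines. Below is a small sketch of a source check that flags such imports before they reach a build. The function name and the default "javafx." prefix are illustrative choices, not part of the Joshua build.

```python
import io
import os
import re


def find_imports(root, package_prefix="javafx."):
    """Yield (path, line_number, line) for Java imports under `root`
    whose package starts with `package_prefix`."""
    pattern = re.compile(
        r"^\s*import\s+(static\s+)?" + re.escape(package_prefix))
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".java"):
                continue
            path = os.path.join(dirpath, name)
            with io.open(path, encoding="utf-8", errors="replace") as f:
                for number, line in enumerate(f, start=1):
                    if pattern.match(line):
                        yield path, number, line.strip()


if __name__ == "__main__":
    # Prints nothing when the directory is missing or no import matches.
    for path, number, line in find_imports("src/main/java"):
        print("{}:{}: {}".format(path, number, line))
```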


> On Oct 25, 2016, at 1:11 PM, John Hewitt  wrote:
> 
> Hi all,
> 
> Has anyone been able to compile Joshua with openjdk? I get this message:
> 
> /home/john/java/incubator-joshua/src/main/java/org/apache/joshua/decoder/ff/lm/KenLM.java:[21,19]
> error: package javafx.scene does not exist
> 
> And the following link seems to confirm that javafx is not a part of
> openjdk.
> https://ask.fedoraproject.org/en/question/93407/there-is-no-javafx-packages-in-openjdk-180-fedora-gnulinux/
> 
> -John



openjdk 8 incompatibility

2016-10-25 Thread John Hewitt
Hi all,

Has anyone been able to compile Joshua with openjdk? I get this message:

/home/john/java/incubator-joshua/src/main/java/org/apache/joshua/decoder/ff/lm/KenLM.java:[21,19]
error: package javafx.scene does not exist

And the following link seems to confirm that javafx is not a part of
openjdk.
https://ask.fedoraproject.org/en/question/93407/there-is-no-javafx-packages-in-openjdk-180-fedora-gnulinux/

-John


[jira] [Commented] (JOSHUA-288) Port fast_align to java

2016-10-25 Thread John Hewitt (JIRA)

[ 
https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605667#comment-15605667
 ] 

John Hewitt commented on JOSHUA-288:


Replaced gnu-getopt (not Apache licence-compliant) with commons-cli

> Port fast_align to java
> ---
>
> Key: JOSHUA-288
> URL: https://issues.apache.org/jira/browse/JOSHUA-288
> Project: Joshua
> Issue Type: New Feature
> Reporter: Matt Post
> Assignee: John Hewitt
> Priority: Minor
> Fix For: 6.2
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> It would be great to have a Java port of fast_align, so that we don't have to 
> worry about compiling it, and could distribute it via Maven.
> https://github.com/clab/fast_align
> The port we'll use, in progress, is hosted at:
> https://github.com/john-hewitt/fast_align.java





[jira] [Updated] (JOSHUA-288) Port fast_align to java

2016-10-25 Thread John Hewitt (JIRA)

 [ 
https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewitt updated JOSHUA-288:
---
Description: 
It would be great to have a Java port of fast_align, so that we don't have to 
worry about compiling it, and could distribute it via Maven.

https://github.com/clab/fast_align

The port we'll use, in progress, is hosted at:

https://github.com/john-hewitt/fast_align.java

  was:
It would be great to have a Java port of fast_align, so that we don't have to 
worry about compiling it, and could distribute it via Maven.

https://github.com/clab/fast_align


> Port fast_align to java
> ---
>
> Key: JOSHUA-288
> URL: https://issues.apache.org/jira/browse/JOSHUA-288
> Project: Joshua
> Issue Type: New Feature
> Reporter: Matt Post
> Assignee: John Hewitt
> Priority: Minor
> Fix For: 6.2
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> It would be great to have a Java port of fast_align, so that we don't have to 
> worry about compiling it, and could distribute it via Maven.
> https://github.com/clab/fast_align
> The port we'll use, in progress, is hosted at:
> https://github.com/john-hewitt/fast_align.java





Re: Joshua Model Input Format(s) and LM Loading

2016-10-25 Thread Matt Post
Hi Lewis,

Joshua supports two language model representation packages: KenLM [0] and 
BerkeleyLM [1]. These were both developed at about the same time, and 
represented huge gains in doing this task efficiently, over what had previously 
been the standard approach (SRILM). Ken Heafield (who has contributed a lot to 
Joshua) went on to contribute a lot of other improvements to language model 
representation, decoder integration, and also the actual construction of 
language models and their efficient interpolation. His goal for a while was to 
make SRILM completely unnecessary, and I think he succeeded.

BerkeleyLM was more of a one-off project. It is slower than KenLM and hasn't 
been touched in years. If you want to understand this area, your efforts are 
probably best spent looking into the KenLM papers. But it's also worth noting 
that Ken is a crack C++ programmer who has spent years hacking away on these 
problems, and your chances of finding any further efficiencies there are 
probably quite limited unless you have a lot of background in the area. But 
even if you did, I would recommend you not spend your time that way; I 
basically consider the LM representation problem to have been solved by KenLM. 
That's not to say that there are no improvements to be had on the Joshua / JNI 
bridge, but even there, there are probably better things to do.

matt

[0] KenLM: Faster and Smaller Language Model Queries
http://www.kheafield.com/professional/avenue/kenlm.pdf

[1] Faster and Smaller N-Gram Language Models
http://nlp.cs.berkeley.edu/pubs/Pauls-Klein_2011_LM_paper.pdf




> On Oct 24, 2016, at 10:21 PM, lewis john mcgibbney  wrote:
> 
> Hi Folks,
> I have set out with the aim of learning more about the underlying Joshua
> language model serialization(s) e.g. statistical n-gram model in ARPA
> format [0] as well as trying to JProfile a Joshua server running to better
> understand how objects are used and what runtime memory usage looks like
> for typical translation tasks.
> This has led me to think about the fundamental performance issues we
> experience when loading large LMs into memory in the first place... and
> the efficiency of searching models regardless of whether they are cached in
> memory (e.g. Joshua server), or not.
> Does anyone have detailed technical/journal documentation which would set
> me in the right direction to address the above area?
> Thanks
> Lewis
> 
> [0]
> http://cmusphinx.sourceforge.net/wiki/sphinx4:standardgrammarformats#statistical_n-gram_models_in_the_arpa_format
> 
> -- 
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
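
As a starting point for the ARPA format mentioned in the question: an ARPA file opens with a \data\ section listing how many n-grams of each order follow. Below is a minimal sketch that reads just those counts, which gives a rough feel for model size before anything is loaded. The function name is hypothetical, and this is not how Joshua, KenLM, or BerkeleyLM read models.

```python
import re


def read_arpa_counts(lines):
    """Return {order: count} from the \\data\\ header of an ARPA file.

    `lines` is any iterable of text lines. Parsing stops at the first
    n-gram section (e.g. "\\1-grams:"), so the model body is never read.
    """
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
        elif in_header:
            match = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line.startswith("\\"):
                break
    return counts


if __name__ == "__main__":
    # Tiny made-up header, for illustration only.
    sample = ["\\data\\", "ngram 1=3", "ngram 2=2", "",
              "\\1-grams:", "-0.5\t<s>\t-0.3"]
    print(read_arpa_counts(sample))
```

Passing an open file object instead of a list works the same way, since only the header lines are consumed.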



Re: language pack #1

2016-10-25 Thread Matt Post
Hi Lewis,

I have parameters to set the default amount of memory when building the 
language pack. The comment therein is just boilerplate that didn't get 
parameterized. I'll add that to the script. In general, memory usage can be 
heuristically set to the size of the model files that are loaded (the grammar 
and the language model).
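
The heuristic above can be sketched as a small calculation: add up the on-disk size of the grammar and the language model and round up to whole gigabytes. The function names, the 1 GB floor, and the use of the JVM's -Xmx flag to express the result are assumptions for illustration, not what the language pack script does.

```python
import math
import os


def total_size_bytes(path):
    """Size of a file, or the summed size of all files under a directory
    (a packed grammar is a directory of slice files)."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def suggest_heap_flag(model_paths, minimum_gb=1):
    """Return a JVM heap flag such as "-Xmx6g" sized to the model files."""
    total = sum(total_size_bytes(p) for p in model_paths)
    gigabytes = max(minimum_gb, int(math.ceil(total / float(1024 ** 3))))
    return "-Xmx{}g".format(gigabytes)
```

In practice some headroom on top of the raw file size is likely needed for the decoder itself, so the result is best treated as a lower bound.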

Great to hear that things are working well for you!

matt


> On Oct 24, 2016, at 11:48 PM, lewis john mcgibbney  wrote:
> 
> Hi Matt,
> I got around to testing out the language pack you posted and have a few
> suggestions.
> 
>   -  The Joshua bash script states in a number of places that ..."# The
>   default amount of memory is 4gb". This is not true as it is set to a
>   different (higher) number by default.
>   - When starting the Joshua server, I monitored memory usage (JProfiler)
>   and it seems to somewhat stabilize and linger at around 5 1/2 GB. Is this
>   normal based on the size of the Berkeley LM?
>   - Translations are working pretty damn well. I've run a large amount of
>   current Spanish text relating to current news stories and the output looks
>   pretty comprehensive.
> 
> It would be great if we could update the Joshua Homebrew recipe with this
> language pack and also link to the pack from the Wiki.
> 
> Lewis
> 
> On Mon, Oct 10, 2016 at 2:48 AM, <
> dev-digest-h...@joshua.incubator.apache.org> wrote:
> 
>> 
>> From: Matt Post 
>> To: dev@joshua.incubator.apache.org
>> Cc:
>> Date: Fri, 7 Oct 2016 11:51:41 -0400
>> Subject: Re: language pack #1
>> That would be awesome.
>> 
>>