[jira] [Updated] (JOSHUA-316) run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a bytes-like object is required, not 'str'
[ https://issues.apache.org/jira/browse/JOSHUA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated JOSHUA-316:
    Fix Version/s: (was: 6.2)
                   6.1

> run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a bytes-like object is required, not 'str'
> -
>
> Key: JOSHUA-316
> URL: https://issues.apache.org/jira/browse/JOSHUA-316
> Project: Joshua
> Issue Type: Bug
> Components: bundler
> Affects Versions: 6.0.5
> Reporter: Lewis John McGibbney
> Priority: Critical
> Fix For: 6.1
>
>
> {code}
> [glue-tune] rebuilding...
>   dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/support/create_glue_grammar.sh /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed > /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue
>   took 1 seconds (1s)
> [tune-bundle] rebuilding...
>   dep=/usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp2/grammar.packed/slice_0.source [CHANGED]
>   dep=/usr/local/joshua_resources/russian_experiments/exp2/tune/model/run-joshua.sh [NOT FOUND]
>   cmd=/usr/local/incubator-joshua/scripts/support/run_bundler.py --force --symlink --absolute --verbose -T /usr/local/hadoop-2.5.2/hadoop_tmp_dir /usr/local/incubator-joshua/scripts/training/templates/tune/joshua.config /usr/local/joshua_resources/russian_experiments/exp2/tune/model --copy-config-options '-top-n 300 -output-format "%i ||| %s ||| %f ||| %c" -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function "StateMinimizingLanguageModel -lm_order 5 -lm_file /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm" -tm0/type hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue' --pack-tm /usr/local/joshua_resources/russian_experiments/exp2/grammar.packed --tm /usr/local/joshua_resources/russian_experiments/exp2/data/tune/grammar.glue
>   JOB FAILED (return code 1)
> * Running the copy-config.pl script with the command: /usr/local/incubator-joshua/scripts/copy-config.pl -top-n 300 -output-format "%i ||| %s ||| %f ||| %c" -mark-oovs false -search cky -weights "lm_0 1 tm_pt_0 1 tm_pt_1 1 tm_pt_2 1 tm_pt_3 1 tm_pt_4 1 tm_pt_5 1 tm_glue_0 1 " -feature-function "StateMinimizingLanguageModel -lm_order 5 -lm_file /usr/local/joshua_resources/russian_experiments/exp2/lm.kenlm" -tm0/type hiero -tm0/owner pt -tm0/maxspan 20 -tm1/owner glue
> Traceback (most recent call last):
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 748, in main
>     operations = collect_operations(opts)
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 637, in collect_operations
>     opts.copy_config_options
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 202, in filter_through_copy_config_script
>     result, err = p.communicate(config_text)
>   File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1072, in communicate
>     stdout, stderr = self._communicate(input, endtime, timeout)
>   File "/Users/lmcgibbn/miniconda3/lib/python3.5/subprocess.py", line 1700, in _communicate
>     input_view = memoryview(self._input)
> TypeError: memoryview: a bytes-like object is required, not 'str'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 760, in <module>
>     main(sys.argv)
>   File "/usr/local/incubator-joshua/scripts/support/run_bundler.py", line 751, in main
>     error_quit(e.message)
> AttributeError: 'TypeError' object has no attribute 'message'
> * WARNING: no key 'outputformat' found in config file (appending to end)
> * WARNING: no key 'search' found in config file (appending to end)
> * WARNING: no key 'topn' found in config file (appending to end)
> * WARNING: no key 'markoovs' found in config file (appending to end)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (JOSHUA-316) run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a bytes-like object is required, not 'str'
[ https://issues.apache.org/jira/browse/JOSHUA-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607165#comment-15607165 ]

ASF GitHub Bot commented on JOSHUA-316:
---

Github user lewismc commented on the issue:

    https://github.com/apache/incubator-joshua/pull/73

    BTW I am using Python 3.5 here.
[GitHub] incubator-joshua issue #73: JOSHUA-316 run_bundler.py returning JOB FAILED (...
Github user lewismc commented on the issue:

    https://github.com/apache/incubator-joshua/pull/73

    BTW I am using Python 3.5 here.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-joshua pull request #73: JOSHUA-316 run_bundler.py returning JOB F...
GitHub user lewismc opened a pull request:

    https://github.com/apache/incubator-joshua/pull/73

    JOSHUA-316 run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a bytes-like object is required, not 'str'

    This issue addresses https://issues.apache.org/jira/browse/JOSHUA-316
    This issue was a complete PITA. The non ascii character in run_bundler.py README template was a frigging big PITA. Done now folks :)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lewismc/incubator-joshua JOSHUA-316

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-joshua/pull/73.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #73

commit d137f2970d1eace1dcec576d287712cd5b88
Author: Lewis John McGibbney
Date:   2016-10-26T02:31:10Z

    JOSHUA-316 run_bundler.py returning JOB FAILED (return code 1) TypeError: memoryview: a bytes-like object is required, not 'str'
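
Editor's sketch: the traceback reported in JOSHUA-316 shows two Python 3 incompatibilities. `p.communicate(config_text)` is handed a `str` while the pipe expects bytes, and the error handler reads `e.message`, an attribute Python 3 exceptions do not have. The code below is one possible way to handle both, not the change made in pull request #73. The names `filter_through_command` and `describe_error`, and the stand-in child command, are assumptions made for illustration.

```python
import subprocess
import sys


def filter_through_command(cmd, text, encoding="utf-8"):
    """Pipe `text` through `cmd` and return the command's stdout as text."""
    p = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Pipes opened without text mode carry bytes on Python 3,
    # so the input is encoded before it is passed to communicate().
    out, err = p.communicate(text.encode(encoding))
    if p.returncode != 0:
        raise RuntimeError(err.decode(encoding, "replace"))
    return out.decode(encoding)


def describe_error(exc):
    """Return a printable message for an exception."""
    # str(exc) works on Python 2 and 3; exc.message exists only on Python 2.
    return str(exc)


if __name__ == "__main__":
    # Hypothetical stand-in for copy-config.pl: upper-cases its stdin.
    upper = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())",
    ]
    try:
        print(filter_through_command(upper, "top-n = 300\n"))
    except Exception as e:
        sys.exit(describe_error(e))
```

Encoding at the call site keeps the rest of the script working with text. Opening the pipe in text mode (`universal_newlines=True`) would be an alternative on Python 3.5.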
Re: openjdk 8 incompatibility
Checks out. Thanks, Matt.

-John

On Tue, Oct 25, 2016 at 3:56 PM, Matt Post wrote:

> Hmm, inclusion of that line looks like a mistake. I've seen Eclipse add
> random imports because it sorts the suggestions in a very unhelpful manner.
> I just removed the line and pushed, try again.
>
> > On Oct 25, 2016, at 1:11 PM, John Hewitt wrote:
> >
> > Hi all,
> >
> > Has anyone been able to compile Joshua with openjdk? I get this message:
> >
> > /home/john/java/incubator-joshua/src/main/java/org/apache/joshua/decoder/ff/lm/KenLM.java:[21,19]
> > error: package javafx.scene does not exist
> >
> > And the following link seems to confirm that javafx is not a part of
> > openjdk.
> > https://ask.fedoraproject.org/en/question/93407/there-is-no-javafx-packages-in-openjdk-180-fedora-gnulinux/
> >
> > -John
Re: openjdk 8 incompatibility
Hmm, inclusion of that line looks like a mistake. I've seen Eclipse add random imports because it sorts the suggestions in a very unhelpful manner. I just removed the line and pushed, try again.

> On Oct 25, 2016, at 1:11 PM, John Hewitt wrote:
>
> Hi all,
>
> Has anyone been able to compile Joshua with openjdk? I get this message:
>
> /home/john/java/incubator-joshua/src/main/java/org/apache/joshua/decoder/ff/lm/KenLM.java:[21,19]
> error: package javafx.scene does not exist
>
> And the following link seems to confirm that javafx is not a part of
> openjdk.
> https://ask.fedoraproject.org/en/question/93407/there-is-no-javafx-packages-in-openjdk-180-fedora-gnulinux/
>
> -John
openjdk 8 incompatibility
Hi all,

Has anyone been able to compile Joshua with openjdk? I get this message:

/home/john/java/incubator-joshua/src/main/java/org/apache/joshua/decoder/ff/lm/KenLM.java:[21,19]
error: package javafx.scene does not exist

And the following link seems to confirm that javafx is not a part of openjdk.
https://ask.fedoraproject.org/en/question/93407/there-is-no-javafx-packages-in-openjdk-180-fedora-gnulinux/

-John
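
Editor's sketch: the compile error in this thread came from a stray `javafx` import in one source file. A small script along the following lines could be used to look for other such imports before building with OpenJDK. The function name `find_javafx_imports` and the `src/main/java` default are assumptions for illustration, and the regular expression only matches ordinary single-line import statements.

```python
import os
import re
import sys

# Matches lines such as "import javafx.scene.Something;" (also static imports).
IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?javafx\.[\w.*]+\s*;")


def find_javafx_imports(root):
    """Return (path, line_number, line) for each javafx import under `root`."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith(".java"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for number, line in enumerate(fh, start=1):
                    if IMPORT_RE.match(line):
                        hits.append((path, number, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumed layout: run from the repository root, or pass a directory.
    source_root = sys.argv[1] if len(sys.argv) > 1 else "src/main/java"
    for path, number, line in find_javafx_imports(source_root):
        print("{}:{}: {}".format(path, number, line))
```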
[jira] [Commented] (JOSHUA-288) Port fast_align to java
[ https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15605667#comment-15605667 ]

John Hewitt commented on JOSHUA-288:

Replaced gnu-getopt (not Apache licence-compliant) with commons-cli

> Port fast_align to java
> ---
>
> Key: JOSHUA-288
> URL: https://issues.apache.org/jira/browse/JOSHUA-288
> Project: Joshua
> Issue Type: New Feature
> Reporter: Matt Post
> Assignee: John Hewitt
> Priority: Minor
> Fix For: 6.2
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> It would be great to have a Java port of fast_align, so that we don't have to
> worry about compiling it, and could distribute it via Maven.
> https://github.com/clab/fast_align
> The port we'll use, in progress, is hosted at:
> https://github.com/john-hewitt/fast_align.java
[jira] [Updated] (JOSHUA-288) Port fast_align to java
[ https://issues.apache.org/jira/browse/JOSHUA-288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewitt updated JOSHUA-288:
---
    Description:
It would be great to have a Java port of fast_align, so that we don't have to worry about compiling it, and could distribute it via Maven.
https://github.com/clab/fast_align
The port we'll use, in progress, is hosted at:
https://github.com/john-hewitt/fast_align.java

  was:
It would be great to have a Java port of fast_align, so that we don't have to worry about compiling it, and could distribute it via Maven.
https://github.com/clab/fast_align

> Port fast_align to java
> ---
>
> Key: JOSHUA-288
> URL: https://issues.apache.org/jira/browse/JOSHUA-288
> Project: Joshua
> Issue Type: New Feature
> Reporter: Matt Post
> Assignee: John Hewitt
> Priority: Minor
> Fix For: 6.2
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> It would be great to have a Java port of fast_align, so that we don't have to
> worry about compiling it, and could distribute it via Maven.
> https://github.com/clab/fast_align
> The port we'll use, in progress, is hosted at:
> https://github.com/john-hewitt/fast_align.java
Re: Joshua Model Input Format(s) and LM Loading
Hi Lewis,

Joshua supports two language model representation packages: KenLM [0] and BerkeleyLM [1]. Both were developed at about the same time, and both were far more efficient than what had previously been the standard approach (SRILM). Ken Heafield (who has contributed a lot to Joshua) went on to contribute many other improvements to language model representation, decoder integration, and the construction of language models and their efficient interpolation. His goal for a while was to make SRILM completely unnecessary, and I think he succeeded. BerkeleyLM was more of a one-off project. It is slower than KenLM and hasn't been touched in years.

If you want to understand this area, your efforts are probably best spent looking into the KenLM papers. But it's also worth noting that Ken is a crack C++ programmer who has spent years hacking away on these problems, and your chances of finding any further efficiencies there are probably quite limited unless you have a lot of background in the area. But even if you did, I would recommend you not spend your time that way; I basically consider the LM representation problem to have been solved by KenLM. That's not to say there are no improvements to be had on the Joshua / JNI bridge, but even there, there are probably better things to do.

matt

[0] KenLM: Faster and Smaller Language Model Queries
    http://www.kheafield.com/professional/avenue/kenlm.pdf
[1] Faster and Smaller N-Gram Language Models
    http://nlp.cs.berkeley.edu/pubs/Pauls-Klein_2011_LM_paper.pdf

> On Oct 24, 2016, at 10:21 PM, lewis john mcgibbney wrote:
>
> Hi Folks,
> I have set out with the aim of learning more about the underlying Joshua
> language model serialization(s), e.g. the statistical n-gram model in ARPA
> format [0], as well as trying to JProfile a running Joshua server to better
> understand how objects are used and what runtime memory usage looks like
> for typical translation tasks.
> This has led me to think about the fundamental performance issues we
> experience when loading large LMs into memory in the first place... and
> the efficiency of searching models regardless of whether they are cached in
> memory (e.g. Joshua server), or not.
> Does anyone have detailed technical/journal documentation which would set
> me in the right direction to address the above area?
> Thanks
> Lewis
>
> [0]
> http://cmusphinx.sourceforge.net/wiki/sphinx4:standardgrammarformats#statistical_n-gram_models_in_the_arpa_format
>
> --
> http://home.apache.org/~lewismc/
> @hectorMcSpector
> http://www.linkedin.com/in/lmcgibbney
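
Editor's sketch: for readers unfamiliar with the ARPA format mentioned above, the file is plain text. A `\data\` header lists how many n-grams of each order follow, and each `\N-grams:` section holds one n-gram per line with its log probability. The snippet below reads only the header counts, which is a cheap way to gauge the size of a model before loading it. It is an illustration with an assumed function name (`read_arpa_counts`) and a made-up toy model, and it is unrelated to how KenLM or BerkeleyLM load models.

```python
import re

# Header lines look like "ngram 1=3".
NGRAM_COUNT_RE = re.compile(r"^ngram\s+(\d+)\s*=\s*(\d+)\s*$")


def read_arpa_counts(lines):
    """Return {order: count} from the \\data\\ header of an ARPA file."""
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header:
            continue
        match = NGRAM_COUNT_RE.match(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line.startswith("\\"):
            break  # the first "\N-grams:" section ends the header
    return counts


# Toy model, invented for this example.
SAMPLE = """\\data\\
ngram 1=3
ngram 2=2

\\1-grams:
-0.5 <s> -0.3
-0.5 a -0.3
-0.5 </s>

\\2-grams:
-0.2 <s> a
-0.2 a </s>

\\end\\
"""

if __name__ == "__main__":
    print(read_arpa_counts(SAMPLE.splitlines()))
```

For a real file, passing the open file object works as well, since the function stops reading at the first n-gram section.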
Re: language pack #1
Hi Lewis,

I have parameters to set the default amount of memory when building the language pack. The comment therein is just boilerplate that didn't get parameterized. I'll add that to the script.

In general, memory usage can be heuristically set to the size of the model files that are loaded (the grammar and the language model).

Great to hear that things are working well for you!

matt

> On Oct 24, 2016, at 11:48 PM, lewis john mcgibbney wrote:
>
> Hi Matt,
> I got around to testing out the language pack you posted and have a few
> suggestions.
>
> - The Joshua bash script states in a number of places that "# The
> default amount of memory is 4gb". This is not true as it is set to a
> different (higher) number by default.
> - When starting the Joshua server, I monitored memory usage (JProfiler)
> and it seems to somewhat stabilize and linger at around 5 1/2 GB. Is this
> normal based on the size of the Berkeley LM?
> - Translations are working pretty damn well. I've run a large amount of
> current Spanish text relating to current news stories and the output looks
> pretty comprehensive.
>
> It would be great if we could update the Joshua Homebrew recipe with this
> language pack and also link to the pack from the Wiki.
>
> Lewis
>
> On Mon, Oct 10, 2016 at 2:48 AM, <
> dev-digest-h...@joshua.incubator.apache.org> wrote:
>
>> From: Matt Post
>> To: dev@joshua.incubator.apache.org
>> Cc:
>> Date: Fri, 7 Oct 2016 11:51:41 -0400
>> Subject: Re: language pack #1
>> That would be awesome.
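
Editor's sketch: the heuristic Matt describes (set memory to roughly the size of the grammar and language model files) can be estimated with a few lines of Python. The function names, the 1.25 headroom factor, and the rounding to whole gigabytes are all assumptions for illustration. They are not values taken from the Joshua scripts.

```python
import math
import os
import sys


def total_size_bytes(paths):
    """Sum the on-disk size of files and, recursively, of directories."""
    total = 0
    for path in paths:
        if os.path.isdir(path):
            for dirpath, _dirs, files in os.walk(path):
                total += sum(
                    os.path.getsize(os.path.join(dirpath, name)) for name in files
                )
        else:
            total += os.path.getsize(path)
    return total


def suggest_heap_gb(paths, headroom=1.25):
    """Model size times an assumed headroom factor, rounded up to whole GB."""
    gib = total_size_bytes(paths) * headroom / float(1 << 30)
    return max(1, int(math.ceil(gib)))


if __name__ == "__main__":
    # Pass the model paths on the command line, e.g. a packed grammar
    # directory and a language model file.
    model_paths = sys.argv[1:]
    print("suggested heap: {}g".format(suggest_heap_gb(model_paths)))
```

The result is only a starting point. Observed usage, such as the roughly 5 1/2 GB Lewis reports from JProfiler, remains the better guide.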