[GitHub] incubator-joshua issue #42: Fix various issues related to resources, warning...
Github user mjpost commented on the issue: https://github.com/apache/incubator-joshua/pull/42 This actually looks really great, but I'm worried about the complexity of merging this with the 7, which moves every file and changed a bunch with refactoring the feature function interface. How much work went into this, and how hard would it be to repeat it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #42: Fix various issues related to resources, ...
GitHub user maxthomas opened a pull request: https://github.com/apache/incubator-joshua/pull/42 Fix various issues related to resources, warnings, and copy-pasted code. I didn't make an issue for this, because it hits on a lot of topics that would be tough to summarize in one issue. But, let me know if you'd like me to do so. The gist of this commit is code cleanup. Resource leaks are closed, warnings are resolved, and the beginning of refactoring the 10K lines of copy/pasted code throughout MERT, MIRA, etc. was started. I plan on working on this further on the 7 branch. But, because many resources weren't being closed explicitly (I found the finalize override in LineReader __extremely__ questionable), I figured I'd submit this on master right away. Feel free to ignore/postpone/whatever You can merge this pull request into a Git repository by running: $ git pull https://github.com/maxthomas/incubator-joshua fix-warnings-resources Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/42.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #42 commit 980f735853a9403823501a66ff857e0c5763ae52 Author: max thomasDate: 2016-08-18T19:04:52Z Fix a number of issues: - Reader now implements autocloseable - Close various leaks from LineReader - LineReader no longer implements custom finalize(). Resources should be explicitly closed when no longer needed. The compiler helps with this. - Start refactoring copy/pasted code into a new type: ExistingUTF8EncodedTextFile. There is so much use of text files that this should really have its own type. - Fix warnings about unused fields, unused methods - Delete some old/legacy/unused classes --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (JOSHUA-291) Improve code quality via static analysis
[ https://issues.apache.org/jira/browse/JOSHUA-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426843#comment-15426843 ] Max Thomas commented on JOSHUA-291: --- Nice. I'll probably hold off a bit - I'm just getting acquainted with the code base, and I think there are more pressing issues, so I'll probably revisit this after 7 gets merged in. > Improve code quality via static analysis > > > Key: JOSHUA-291 > URL: https://issues.apache.org/jira/browse/JOSHUA-291 > Project: Joshua > Issue Type: Improvement > Components: core >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > > We can improve code quality / readability leveraging code analysis from tools > like FindBugs and others integrated in IDEs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-221) ArrayIndexOutOfBoundsException when passing arguments to JoshuaDecoder.main
[ https://issues.apache.org/jira/browse/JOSHUA-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426484#comment-15426484 ] Matt Post commented on JOSHUA-221: -- That sounds like a nice general way to solve this problem. > ArrayIndexOutOfBoundsException when passing arguments to JoshuaDecoder.main > --- > > Key: JOSHUA-221 > URL: https://issues.apache.org/jira/browse/JOSHUA-221 > Project: Joshua > Issue Type: Bug >Reporter: Lewis John McGibbney > Fix For: 6.2 > > > {code} > lmcgibbn@LMC-032857 /usr/local/joshua(master) $ java -jar class/joshua.jar > -version > Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2 > at joshua.decoder.ArgsParser.(ArgsParser.java:43) > at joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:30) > lmcgibbn@LMC-032857 /usr/local/joshua(master) $ java -jar class/joshua.jar > -version -v > Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2 > at joshua.decoder.ArgsParser.(ArgsParser.java:43) > at joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:30) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-265) Refactor key interfaces and core code for a future release.
[ https://issues.apache.org/jira/browse/JOSHUA-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426428#comment-15426428 ] ASF GitHub Bot commented on JOSHUA-265: --- Github user mjpost commented on the issue: https://github.com/apache/incubator-joshua/pull/41 This looks great. I put it on a new branch off 7 so that we can Travis-CI it (on the idea that 7 is its own master). Once we get all the regression tests removed and merged in from master, I'll merge this back into 7. Tagging this on JOSHUA-265 and JOSHUA-273. > Refactor key interfaces and core code for a future release. > > > Key: JOSHUA-265 > URL: https://issues.apache.org/jira/browse/JOSHUA-265 > Project: Joshua > Issue Type: Improvement >Reporter: Kellen Sunderland >Priority: Minor > Fix For: 6.2 > > > We've discussed making some modifications to the key interfaces. This ticket > can focus on making large changes to the codebase for a future release. This > work will likely take some time and some collaboration. I'd suggest some the > code for this be a separate release branch. > Some issues we can work on: > * I'd propose we conform to the SOLID principles for our major interfaces. > https://en.wikipedia.org/wiki/SOLID_(object-oriented_design) . > * We can look at Sparse / Dense feature vectors and how to handle them > naturally in Joshua. > * Refactor objects that may now be used more broadly than was originally > intended (for example Vocabulary class). > * We should have a general discussion around what parts of the codebase are > responsible for what functions. We should clearly define what logic should > be a part of the Grammar versus the Feature Functions for example, and make > sure logic doesn't leak from one of these objects to the others. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-joshua issue #41: JOSHUA-265 Major refactoring of features/rules/g...
Github user mjpost commented on the issue: https://github.com/apache/incubator-joshua/pull/41 This looks great. I put it on a new branch off 7 so that we can Travis-CI it (on the idea that 7 is its own master). Once we get all the regression tests removed and merged in from master, I'll merge this back into 7. Tagging this on JOSHUA-265 and JOSHUA-273. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Resolved] (JOSHUA-292) Add travis CI build status badge to README.md
[ https://issues.apache.org/jira/browse/JOSHUA-292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Post resolved JOSHUA-292. -- Resolution: Fixed > Add travis CI build status badge to README.md > - > > Key: JOSHUA-292 > URL: https://issues.apache.org/jira/browse/JOSHUA-292 > Project: Joshua > Issue Type: Improvement >Reporter: Max Thomas >Priority: Trivial > Original Estimate: 5m > Remaining Estimate: 5m > > Would be nice to see the status of the latest master branch build from the > README - many projects do this already. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-291) Improve code quality via static analysis
[ https://issues.apache.org/jira/browse/JOSHUA-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426112#comment-15426112 ] Hudson commented on JOSHUA-291: --- SUCCESS: Integrated in Jenkins build joshua_master #92 (See [https://builds.apache.org/job/joshua_master/92/]) JOSHUA-291 - static analysis based code improvements on adagrad package (tommaso: rev 233818d6ce526f3a77b33110d67314b723371743) * (edit) src/main/java/org/apache/joshua/adagrad/Optimizer.java * (edit) src/main/java/org/apache/joshua/adagrad/AdaGradCore.java JOSHUA-291 - static analysis based code improvements on corpus package (tommaso: rev 356b173d4fa3f0efa4ea53809e46c9e04c8c1ca5) * (edit) src/main/java/org/apache/joshua/corpus/SymbolTable.java * (edit) src/main/java/org/apache/joshua/corpus/Span.java * (edit) src/main/java/org/apache/joshua/corpus/Phrase.java * (edit) src/main/java/org/apache/joshua/corpus/ContiguousPhrase.java * (edit) src/main/java/org/apache/joshua/corpus/Vocabulary.java * (edit) src/main/java/org/apache/joshua/corpus/syntax/ArraySyntaxTree.java * (edit) src/main/java/org/apache/joshua/corpus/syntax/SyntaxTree.java * (edit) src/main/java/org/apache/joshua/corpus/BasicPhrase.java JOSHUA-291 - static analysis based code improvements on decoder package (tommaso: rev 029cbbcc156d4939ebe503cc58962ab24728f653) * (edit) src/main/java/org/apache/joshua/decoder/ff/SourceDependentFF.java * (edit) src/main/java/org/apache/joshua/decoder/hypergraph/OutputStringExtractor.java * (edit) src/main/java/org/apache/joshua/decoder/ff/lm/bloomfilter_lm/BloomFilterLanguageModel.java * (edit) src/main/java/org/apache/joshua/decoder/phrase/Coverage.java * (edit) src/main/java/org/apache/joshua/decoder/ff/tm/hash_based/ExtensionIterator.java * (edit) src/main/java/org/apache/joshua/decoder/ff/tm/CreateGlueGrammar.java * (edit) src/main/java/org/apache/joshua/decoder/ff/tm/GrammarReader.java * (edit) src/main/java/org/apache/joshua/decoder/Decoder.java * (edit) src/main/java/org/apache/joshua/decoder/ff/tm/format/MosesFormatReader.java * (edit) src/main/java/org/apache/joshua/decoder/phrase/Hypothesis.java * (edit) src/main/java/org/apache/joshua/decoder/phrase/PhraseChart.java * (edit) src/main/java/org/apache/joshua/decoder/ArgsParser.java * (edit) src/main/java/org/apache/joshua/decoder/BLEU.java * (edit) src/main/java/org/apache/joshua/decoder/ff/lm/bloomfilter_lm/BloomFilter.java * (edit) src/main/java/org/apache/joshua/decoder/chart_parser/ComputeNodeResult.java * (edit) src/main/java/org/apache/joshua/decoder/ff/fragmentlm/Trees.java * (edit) src/main/java/org/apache/joshua/decoder/phrase/Header.java * (edit) src/main/java/org/apache/joshua/decoder/ff/fragmentlm/FragmentLMFF.java * (edit) src/main/java/org/apache/joshua/decoder/Translation.java * (edit) src/main/java/org/apache/joshua/decoder/ff/LexicalFeatures.java * (edit) src/main/java/org/apache/joshua/decoder/StructuredTranslation.java * (edit) src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java * (edit) src/main/java/org/apache/joshua/decoder/hypergraph/ViterbiExtractor.java * (edit) src/main/java/org/apache/joshua/decoder/chart_parser/Chart.java * (edit) src/main/java/org/apache/joshua/decoder/phrase/PhraseTable.java * (edit) src/main/java/org/apache/joshua/decoder/ff/TargetBigram.java * (edit) src/main/java/org/apache/joshua/decoder/phrase/Candidate.java * (edit) src/main/java/org/apache/joshua/decoder/phrase/Stacks.java * (edit) src/main/java/org/apache/joshua/decoder/ff/SourcePathFF.java * (edit) src/main/java/org/apache/joshua/decoder/ff/lm/buildin_lm/TrieLM.java * (edit) src/main/java/org/apache/joshua/decoder/ff/FeatureVector.java * (edit) src/main/java/org/apache/joshua/decoder/ff/LabelCombinationFF.java * (edit) src/main/java/org/apache/joshua/decoder/ff/RuleShape.java * (edit) src/main/java/org/apache/joshua/decoder/ff/tm/OwnerMap.java * (edit) src/main/java/org/apache/joshua/decoder/ff/tm/Rule.java * (edit) src/main/java/org/apache/joshua/decoder/ff/lm/KenLM.java * (edit) src/main/java/org/apache/joshua/decoder/ff/PhraseModel.java * (edit) src/main/java/org/apache/joshua/decoder/ff/phrase/Distortion.java * (edit) src/main/java/org/apache/joshua/decoder/segment_file/Sentence.java * (edit) src/main/java/org/apache/joshua/decoder/NbestMinRiskReranker.java * (edit) src/main/java/org/apache/joshua/decoder/ff/fragmentlm/Tree.java * (edit) src/main/java/org/apache/joshua/decoder/ff/tm/AbstractGrammar.java * (edit) src/main/java/org/apache/joshua/decoder/ff/tm/hash_based/MemoryBasedBatchGrammar.java * (edit) src/main/java/org/apache/joshua/decoder/ff/tm/SentenceFilteredGrammar.java * (edit) src/main/java/org/apache/joshua/decoder/DecoderThread.java * (edit) src/main/java/org/apache/joshua/decoder/ff/lm/LanguageModelFF.java * (edit) src/main/java/org/apache/joshua/decoder/io/JSONMessage.java * (edit) src/main/java/org/apache/joshua/decoder/chart_parser/Cell.java * (edit)
[jira] [Commented] (JOSHUA-291) Improve code quality via static analysis
[ https://issues.apache.org/jira/browse/JOSHUA-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426067#comment-15426067 ] Tommaso Teofili commented on JOSHUA-291: I've committed some per package clean up I had been doing weeks ago, I'll now let [~maxthomas] go ahead with his fixes. > Improve code quality via static analysis > > > Key: JOSHUA-291 > URL: https://issues.apache.org/jira/browse/JOSHUA-291 > Project: Joshua > Issue Type: Improvement > Components: core >Reporter: Tommaso Teofili >Assignee: Tommaso Teofili > > We can improve code quality / readability leveraging code analysis from tools > like FindBugs and others integrated in IDEs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-301) Add findbugs plugin to maven build
[ https://issues.apache.org/jira/browse/JOSHUA-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili updated JOSHUA-301: --- Fix Version/s: 6.1 > Add findbugs plugin to maven build > -- > > Key: JOSHUA-301 > URL: https://issues.apache.org/jira/browse/JOSHUA-301 > Project: Joshua > Issue Type: Sub-task > Components: build >Reporter: Max Thomas >Priority: Minor > Fix For: 6.1 > > > Add the findbugs maven plugin to the build. This will allow the build to > report on issues found via the popular Findbugs static analysis tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-95) Vocabulary locking
[ https://issues.apache.org/jira/browse/JOSHUA-95?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426055#comment-15426055 ] Kellen Sunderland commented on JOSHUA-95: - Yes, all the contention on Vocabulary has been removed. > Vocabulary locking > -- > > Key: JOSHUA-95 > URL: https://issues.apache.org/jira/browse/JOSHUA-95 > Project: Joshua > Issue Type: Bug >Reporter: Matt Post >Assignee: Juri Ganitkevitch > Fix For: 6.2 > > > Vocabulary::id() is still synchronized and a potential point of contention. > It would be nice to resolve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (JOSHUA-295) Revamp dependency organization in Joshua
[ https://issues.apache.org/jira/browse/JOSHUA-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kellen Sunderland updated JOSHUA-295: - Affects Version/s: 6.2 > Revamp dependency organization in Joshua > > > Key: JOSHUA-295 > URL: https://issues.apache.org/jira/browse/JOSHUA-295 > Project: Joshua > Issue Type: Improvement >Affects Versions: 6.2 >Reporter: Kellen Sunderland > > We would like to separate dependencies in Joshua by create a multi-module > maven project. This will allow us to decouple our codebase and make it more > modular. This means consumers of Joshua who are only interested in a core > library do not have to pull in dependencies for things like Http servers or > database clients. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (JOSHUA-303) Simplify feature handling code within Joshua
Kellen Sunderland created JOSHUA-303: Summary: Simplify feature handling code within Joshua Key: JOSHUA-303 URL: https://issues.apache.org/jira/browse/JOSHUA-303 Project: Joshua Issue Type: Improvement Affects Versions: 6.2, 7 Reporter: Kellen Sunderland There's currently a lot of code branching and special cases necessary in Joshua to properly handle sparse versus dense features. We could refactor this code to remove the distinction which would simplify many classes (FeatureVector, Rule, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JOSHUA-221) ArrayIndexOutOfBoundsException when passing arguments to JoshuaDecoder.main
[ https://issues.apache.org/jira/browse/JOSHUA-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426045#comment-15426045 ] Kellen Sunderland commented on JOSHUA-221: -- Maybe we could resolve this by using args4j for the JoshuaDecoder.main? > ArrayIndexOutOfBoundsException when passing arguments to JoshuaDecoder.main > --- > > Key: JOSHUA-221 > URL: https://issues.apache.org/jira/browse/JOSHUA-221 > Project: Joshua > Issue Type: Bug >Reporter: Lewis John McGibbney > Fix For: 6.2 > > > {code} > lmcgibbn@LMC-032857 /usr/local/joshua(master) $ java -jar class/joshua.jar > -version > Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2 > at joshua.decoder.ArgsParser.(ArgsParser.java:43) > at joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:30) > lmcgibbn@LMC-032857 /usr/local/joshua(master) $ java -jar class/joshua.jar > -version -v > Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2 > at joshua.decoder.ArgsParser.(ArgsParser.java:43) > at joshua.decoder.JoshuaDecoder.main(JoshuaDecoder.java:30) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-joshua pull request #41: Major refactoring of features/rules/gramm...
GitHub user fhieber opened a pull request: https://github.com/apache/incubator-joshua/pull/41 Major refactoring of features/rules/grammars in Joshua Major refactoring of core decoder components (Rule.java, FeatureVector.java and grammars). The core idea of this change is to simplify feature handling inside Joshua. Please note that this change is NOT backwards compatible. The following changes were made: - No distinction between sparse and dense features inside the decoder anymore. Each feature stored at the rule is 'owned' by the grammar that contains the rule. An 'owned' feature simply means that its name is prepended with the owner string: 0=0.2 becomes _0=0.2. This applies to both dense features (features that occur at every rule), as well as sparse features. Please note that the old prefix 'tm_' is no longer used. - Having only one type of feature, a revised version of FeatureVector.java was built that is greatly simplified. It is basically a HashMap of FeatureId (typed as ints) to feature values. FeatureIds are created/hashed by the new global mapping FeatureMap.java, which maintains a bidirectional mapping between feature ids and feature names. This also allowed getting rid of storing feature names in the vocabulary. - The simplified FeatureVector cause removal of all 'reportDenseFeatures'/'getNumDenseFeatures' method in the decoder and the grammar interface. - The tradition but very obscure way of flipping the sign of dense features but not sparse features was removed. The feature value in the decoder is now just the value as you see it stored at the rule. - The Rule class was changed to adhere to object-oriented principles. It now has only one constructor that requires all of its dependencies and these can not be changed later. This forces Rule creators to finalize the dependencies (deciding on an owner of the rule and the hashing of the feature vector). - Also the unused concept of the precomputableCost in a rule was removed. Rules still 'cache' their estimated cost. - The various Grammar and MemoryBasedBatchGrammar constructors were unified and a lot of old obscure code was removed. - Due to the change above, the PhraseModel feature function that fires feature values for features stored at rules is greatly simplified. - As featureVectors at Rules are final and have to have an owner, feature sharing across multiple grammars would need to be handled by a separate feature function implementation which is transparent. This commit also updates all existing (and enabled) Unit tests which also pass. Existing regression tests do NOT work in this commit since many of the grammars are packed and would need to be re-packed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhieber/incubator-joshua 7_features Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/41.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #41 commit 20afddf85263f3def242f721728ac148ef143ad5 Author: Felix HieberDate: 2016-08-17T11:52:39Z Major refactoring of core decoder components (Rule.java, FeatureVector.java and grammars). The core idea of this change is to simplify feature handling inside Joshua. Please note that this change is NOT backwards compatible. The following changes were made: - No distinction between sparse and dense features inside the decoder anymore. Each feature stored at the rule is 'owned' by the grammar that contains the rule. An 'owned' feature simply means that its name is prepended with the owner string: 0=0.2 becomes _0=0.2. This applies to both dense features (features that occur at every rule), as well as sparse features. Please note that the old prefix 'tm_' is no longer used. - Having only one type of feature, a revised version of FeatureVector.java was built that is greatly simplified. It is basically a HashMap of FeatureId (typed as ints) to feature values. FeatureIds are created/hashed by the new global mapping FeatureMap.java, which maintains a bidirectional mapping between feature ids and feature names. This also allowed getting rid of storing feature names in the vocabulary. - The simplified FeatureVector cause removal of all 'reportDenseFeatures'/'getNumDenseFeatures' method in the decoder and the grammar interface. - The tradition but very obscure way of flipping the sign of dense features but not sparse features was removed. The feature value in the decoder is now just the value as you see it stored at the rule. - The Rule class was changed to adhere to object-oriented principles. It now has only one constructor that requires all of its dependencies and these can not be changed later. This forces Rule creators to