[GitHub] incubator-joshua pull request #64: Fixed Rescoring test
GitHub user fhieber opened a pull request: https://github.com/apache/incubator-joshua/pull/64 Fixed Rescoring test You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhieber/incubator-joshua 7 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/64.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #64 commit 695feb33293423ab6fca1d8f03b2eb0f4fe9393a Author: Hieber, Felix Date: 2016-09-15T15:16:56Z Fixed Rescoring test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #63: Fixed more tests in 7
GitHub user fhieber opened a pull request: https://github.com/apache/incubator-joshua/pull/63 Fixed more tests in 7 You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhieber/incubator-joshua 7_new Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/63.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #63 commit 1091b9e8d49165a2ec9f20b62f3a15b37ffb4dda Author: Hieber, Felix Date: 2016-09-15T14:57:01Z Fixed SourceAnnotations Test for 7 commit 0d99e4d47f868d83afcb34379fbf984a5452dca1 Author: Hieber, Felix Date: 2016-09-15T14:58:52Z Fixed TooLongTest for 7 commit c6cd09748312f0338c1002f2adb19fd8308fcb7d Author: Hieber, Felix Date: 2016-09-15T15:05:10Z Merge remote-tracking branch 'upstream/7' into 7_new --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #62: Fixed NumTranslationOptions test and adde...
GitHub user fhieber opened a pull request: https://github.com/apache/incubator-joshua/pull/62 Fixed NumTranslationOptions test and added deterministic output order⦠⦠of feature vectors You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhieber/incubator-joshua 7 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/62.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #62 commit bc050e3b5c99f71c4d5945dc9852a13e825958f7 Author: Hieber, Felix Date: 2016-09-15T14:48:58Z Fixed NumTranslationOptions test and added deterministic output order of feature vectors --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #61: Fixed NAry test for 7
GitHub user fhieber opened a pull request: https://github.com/apache/incubator-joshua/pull/61 Fixed NAry test for 7 You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhieber/incubator-joshua 7 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/61.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #61 commit f08573b5f64a37afcc824df233db8a7e8719f7e9 Author: Hieber, Felix Date: 2016-09-15T12:28:13Z Fixed NAry test for 7 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #59: Fixed NoGrammarTest for 7
GitHub user fhieber opened a pull request: https://github.com/apache/incubator-joshua/pull/59 Fixed NoGrammarTest for 7 You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhieber/incubator-joshua 7 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/59.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #59 commit c2747787ae222a36362be2de8cc2e5e26fcc01fb Author: Hieber, Felix Date: 2016-09-15T11:43:02Z Fixed NoGrammarTest for 7 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua issue #58: Moved new unit tests (former regression tests) t...
Github user fhieber commented on the issue: https://github.com/apache/incubator-joshua/pull/58 Depends whether we had any end-to-end tests like this already in 7. For the existing unit tests I already made the necessary changes to pass in 7. Quote from my commit for the featureFector refactoring: > This commit also updates all existing (and enabled) Unit tests which also pass. Existing regression tests do NOT work in this commit since many of the grammars are packed and would need to be re-packed. [Commit](https://github.com/apache/incubator-joshua/commit/20afddf85263f3def242f721728ac148ef143ad5) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #58: Moved new unit tests (former regression t...
GitHub user fhieber opened a pull request: https://github.com/apache/incubator-joshua/pull/58 Moved new unit tests (former regression tests) to correct location. The previous merge from master placed the new regression tests from Michael into the wrong location. This moves them to joshua-core/src/ so that they actually get run. Now they are failing. Can someone please investigate? You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhieber/incubator-joshua 7 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/58.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #58 commit ee7398f7d0c921cf1aaf7937a370a41d31431033 Author: Hieber, Felix Date: 2016-09-14T21:19:00Z moved new unit tests (former regression tests) to correct location in joshua 7 (joshua-core/src) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #50: fix basedir path for java.library.path
GitHub user fhieber opened a pull request: https://github.com/apache/incubator-joshua/pull/50 fix basedir path for java.library.path makes unit tests run with kenlm again You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhieber/incubator-joshua 7 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/50.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #50 commit f0883156c10a521cc22ee0698b7c15456256b4bb Author: Felix Hieber Date: 2016-09-12T14:08:54Z fix basedir path for java.library.path --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/d4fdbfd88bab99e244d3ed1fc9cff4ba5e6d124c#commitcomment-18742911 Yes, sounds like the right thing to do. Would there be a way to decide on statefulness earlier to avoid the instanceof checks and adding a new interface method which is redundant for Stateless feature functions? Maybe @KellenSunderland can comment on what the right way to instantiate feature function classes is in general. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/d4fdbfd88bab99e244d3ed1fc9cff4ba5e6d124c#commitcomment-18738740 If resetting global state in the constructor works for now, thats fine with me. That should in principle be equivalent with calling Decoder.cleanUp() in setup/teardown methods for the tests. I never know whether unit tests are executed in parallel or not. If thats the case, we might have a general problem with these 'Decoder-level' tests anyway. Maybe someone else has more insight here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #41: Major refactoring of features/rules/gramm...
GitHub user fhieber opened a pull request: https://github.com/apache/incubator-joshua/pull/41 Major refactoring of features/rules/grammars in Joshua Major refactoring of core decoder components (Rule.java, FeatureVector.java and grammars). The core idea of this change is to simplify feature handling inside Joshua. Please note that this change is NOT backwards compatible. The following changes were made: - No distinction between sparse and dense features inside the decoder anymore. Each feature stored at the rule is 'owned' by the grammar that contains the rule. An 'owned' feature simply means that its name is prepended with the owner string: 0=0.2 becomes _0=0.2. This applies to both dense features (features that occur at every rule), as well as sparse features. Please note that the old prefix 'tm_' is no longer used. - Having only one type of feature, a revised version of FeatureVector.java was built that is greatly simplified. It is basically a HashMap of FeatureId (typed as ints) to feature values. FeatureIds are created/hashed by the new global mapping FeatureMap.java, which maintains a bidirectional mapping between feature ids and feature names. This also allowed getting rid of storing feature names in the vocabulary. - The simplified FeatureVector cause removal of all 'reportDenseFeatures'/'getNumDenseFeatures' method in the decoder and the grammar interface. - The tradition but very obscure way of flipping the sign of dense features but not sparse features was removed. The feature value in the decoder is now just the value as you see it stored at the rule. - The Rule class was changed to adhere to object-oriented principles. It now has only one constructor that requires all of its dependencies and these can not be changed later. This forces Rule creators to finalize the dependencies (deciding on an owner of the rule and the hashing of the feature vector). - Also the unused concept of the precomputableCost in a rule was removed. Rules still 'cache' their estimated cost. - The various Grammar and MemoryBasedBatchGrammar constructors were unified and a lot of old obscure code was removed. - Due to the change above, the PhraseModel feature function that fires feature values for features stored at rules is greatly simplified. - As featureVectors at Rules are final and have to have an owner, feature sharing across multiple grammars would need to be handled by a separate feature function implementation which is transparent. This commit also updates all existing (and enabled) Unit tests which also pass. Existing regression tests do NOT work in this commit since many of the grammars are packed and would need to be re-packed. You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhieber/incubator-joshua 7_features Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/41.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #41 commit 20afddf85263f3def242f721728ac148ef143ad5 Author: Felix Hieber Date: 2016-08-17T11:52:39Z Major refactoring of core decoder components (Rule.java, FeatureVector.java and grammars). The core idea of this change is to simplify feature handling inside Joshua. Please note that this change is NOT backwards compatible. The following changes were made: - No distinction between sparse and dense features inside the decoder anymore. Each feature stored at the rule is 'owned' by the grammar that contains the rule. An 'owned' feature simply means that its name is prepended with the owner string: 0=0.2 becomes _0=0.2. This applies to both dense features (features that occur at every rule), as well as sparse features. Please note that the old prefix 'tm_' is no longer used. - Having only one type of feature, a revised version of FeatureVector.java was built that is greatly simplified. It is basically a HashMap of FeatureId (typed as ints) to feature values. FeatureIds are created/hashed by the new global mapping FeatureMap.java, which maintains a bidirectional mapping between feature ids and feature names. This also allowed getting rid of storing feature names in the vocabulary. - The simplified FeatureVector cause removal of all 'reportDenseFeatures'/'getNumDenseFeatures' method in the decoder and the grammar interface. - The tradition but very obscure way of flipping the sign of dense features but not sparse features was removed. The feature value in the decoder is now just the value as you see it stored at the rule. - The Rule class was changed to adhere to object-oriented principles. It now has only one constructor that r
[GitHub] incubator-joshua pull request #:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/3e2e053db9cafcdc5b4885a0f6fa5e54a63bc468#commitcomment-18672173 In src/test/java/org/apache/joshua/system/LmOovFeatureTest.java: In src/test/java/org/apache/joshua/system/LmOovFeatureTest.java on line 69: is this main required? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #29: Added option to fire a dense language mod...
GitHub user fhieber opened a pull request: https://github.com/apache/incubator-joshua/pull/29 Added option to fire a dense language model oov indicator feature from LanguageModelFF. By default it is turned off. To activate, specify '-oov_feature' in the lm line in joshua.config. This OOV feature is different from the OOV rules in Joshua grammars/phrase tables. It indicates an OOV in the language model which typically has a much larger vocabulary. You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhieber/incubator-joshua lm_feature Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/29.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #29 commit b3063a4674222a3419eac17e310b1d2cb693fbc8 Author: Felix Hieber Date: 2016-06-24T07:13:52Z Added option to fire a dense language model oov indicator feature from LanguageModelFF. By default it is turned off. To activate, specify '-oov_feature' in the lm line in joshua.config --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/748cf32f956d3fd5772f62e023d0cc8d70a7dcad#commitcomment-17995315 No its totally fine. Thanks for the fix! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/3fb809c444c7855255b51080c368071bc508be1f#commitcomment-17995297 I am fine with testng. Its equally easy to run in eclipse with the plugin --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/748cf32f956d3fd5772f62e023d0cc8d70a7dcad#commitcomment-17994947 Ah, right! Thanks! Should have looked more closely. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/748cf32f956d3fd5772f62e023d0cc8d70a7dcad#commitcomment-17994753 Hi Matt, Can you comment how this was a bug and what this change changes? Using the Rule constructor without an owner will just assign the Unknown owner id to it in the constructor itself afaik. So what should the owner be here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua issue #26: Owner Ids are maintained in separate mapping now
Github user fhieber commented on the issue: https://github.com/apache/incubator-joshua/pull/26 Alright, I can rebase this PR once the ClassLM PR is merged so that we have a clean separation of those two commits on Github --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua issue #25: ClassLMs: fixed a bug with class-based lms not m...
Github user fhieber commented on the issue: https://github.com/apache/incubator-joshua/pull/25 Great, thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #26: Owner Ids are maintained in separate mapp...
GitHub user fhieber opened a pull request: https://github.com/apache/incubator-joshua/pull/26 Owner Ids are maintained in separate mapping now Removed owner ids from Vocabulary. These are now maintained in their own mapping. Fixes a bug with multiple packed grammars that would overwrite each others owner Vocab id. Also cleaned up grammar constructors a little bit. The owner id is now strongly typed to prevent users to accidentally use ints that do not represent an actual OwnerId. Further OwnerIds can only be returned by the OwnerMap.register() method. @mjpost I am obviously not a Github PR expert. This PR includes the other one about class LMs. Lets coordinate merging if you are ok with both. You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhieber/incubator-joshua owner Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/26.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #26 commit 8fc7544eaaf35f71367b48778eaa1f22772ca390 Author: Felix Hieber Date: 2016-06-20T09:21:03Z ClassLMs: fixed a bug with class-based lms not mapping to class ids when estimateCost(). Also refactored the code a little bit to have StateMinimizingLanguageModels support classes as well. Added some unit tests. The existing regression test output was changed to the new output. commit 1011bbb03b29b57eb2903e4817a4d6a3d553354e Author: Felix Hieber Date: 2016-06-20T15:55:23Z Removed owner ids from Vocabulary. These are now maintained in their own mapping. Fixes a bug with multiple packed grammars that would overwrite each others owner Vocab id. Also cleaned up grammar constructors a little bit. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #25: ClassLMs: fixed a bug with class-based lm...
GitHub user fhieber opened a pull request: https://github.com/apache/incubator-joshua/pull/25 ClassLMs: fixed a bug with class-based lms not mapping to class ids for estimateCost() Also refactored the code a little bit to have StateMinimizingLanguageModels support classes as well. Added some unit tests. The existing regression test output was changed to the new output. @mjpost It is hard to see whether the new regression output for test-classlm.sh is 'more' correct than before. If you could test this change with some of your models that use class-based lms, that'd be great. You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhieber/incubator-joshua master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/25.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #25 commit 8fc7544eaaf35f71367b48778eaa1f22772ca390 Author: Felix Hieber Date: 2016-06-20T09:21:03Z ClassLMs: fixed a bug with class-based lms not mapping to class ids when estimateCost(). Also refactored the code a little bit to have StateMinimizingLanguageModels support classes as well. Added some unit tests. The existing regression test output was changed to the new output. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/6d2213a20b74432fc7cb131c732f7507b74053e9#commitcomment-17933645 In src/main/java/org/apache/joshua/decoder/StructuredTranslation.java: In src/main/java/org/apache/joshua/decoder/StructuredTranslation.java on line 57: Hi Matt, Thanks, StructuredTranslation is an object to be used in a downstream codebase and I think it would be good to have minimal dependencies on Joshua objects as possible there :) Regarding your question: Iâd go for a container/value object for the reason stated above if possible. Cheers, Felix --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/6d2213a20b74432fc7cb131c732f7507b74053e9#commitcomment-17928740 In src/main/java/org/apache/joshua/decoder/StructuredTranslation.java: In src/main/java/org/apache/joshua/decoder/StructuredTranslation.java on line 57: Hi matt, Do you need the features member of structuredTranslation to be a FeatureVector? The problem is that this leaks IDs of features into downstream applications. Cheers, Felix --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request #:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/3fb809c444c7855255b51080c368071bc508be1f#commitcomment-17817018 I would vote for junit, since these tests were created much more recently and the older testng tests have not been used for quite some time right? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request: Reworked some SparseFeature functions. Also a...
GitHub user fhieber opened a pull request: https://github.com/apache/incubator-joshua/pull/21 Reworked some SparseFeature functions. Also added 'lexical' sparse features. This makes some of the sparse feature functions more efficient. I also removed the redundant "feature-function" string in the JoshuaConfiguration.feature_functions String list, since it served no purposes, afaik. You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhieber/incubator-joshua sparse Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/21.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21 commit 25a92cbca7c3a11c1d99c3e71686aea9874e0133 Author: Felix Hieber Date: 2016-04-30T16:35:10Z Added Sparse lexical feature function. Revised various other sparse feature functions to avoid String formatting. Expensive feature functions now use an LRU cache to avoid re-calculation of feature hashes for commonly used rules. Also cleaned up the feature string parsing a little bit. commit 5591c6769c162e3268243aa3324c367c6ba9c945 Author: Felix Hieber Date: 2016-05-30T09:54:53Z revert change to ivy.xml --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request: Some changes to structured translation interf...
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/pull/20 I think it doesn't break master so it could also go there and then JOSHUA-273/252 could be rebased. :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request: Some changes to structured translation interf...
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/pull/20 You are right, the structural changes in this PR are subsumed by JOSHUA-273 mostly. However, the changes to the alignment code would still be beneficial. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request: Some changes to structured translation interf...
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/pull/20 These changes would not be immediately compatible with JOSHUA-273. The latter is unlikely to be immediately merged, right? Basically, the KBestExtractor now may return StructuredTranslations so that you can return them if Joshua is used as a library. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request: Some changes to structured translat...
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/pull/20#issuecomment-222445352 Unit tests should all pass now except the MultithreadedTranslation test because of the interface change to decodeAll() --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request: Some changes to structured translat...
GitHub user fhieber opened a pull request: https://github.com/apache/incubator-joshua/pull/20 Some changes to structured translation interface to provide kbest access. Hi Matt, I tried to fix all compile errors, but some of the tests still fail due to the changed decodeAll() interface (not returning Translations object, but writing to output stream. Also, I noticed that the StructuredTranslationTest and word alignment unit tests fail (e.g. tst/joshua/system/StructuredTranslationTest.java). I can't investigate right now, but the expected output for word alignments in the tests should be correct. I am getting some error messages regarding the MemoryBasedBatchgrammars being loaded there: 'Couldn't create a GrammarReader for file null with format phrase' With your recent changes to the Grammar interface/readers, these tests might use an outdated format or so? It would be great if you could take a look / try run all unit tests under tst/. On a side note: I also had trouble building the project using ant (unable to download doxygen and commons-cli dependencies). You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhieber/incubator-joshua master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-joshua/pull/20.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20 commit e3673e988d5d27f93e69cf270dd4056547a752b9 Author: Felix Hieber Date: 2016-03-15T10:26:29Z StructuredTranslation objects can now be generated from KBest Derivations. This gives way to expose k-best lists if Joshua is used as a library. Also fixed some code issues and tests. commit b6010671dba008669282a1e7b2b334217443c587 Author: Artem Sokolov Date: 2016-04-22T13:50:38Z Handled an edge case where a null hypergraph with topn!=0 would not return an empty output translation --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/868b340949f324b12d810d03c09c25fd5877d3dc#commitcomment-17621082 As far as I know, you can add the tst folder to the Eclipse build path and then select it and click run. Even if not run, they should be added to the Eclipse build to see compile errors. But I am not an Eclipse expert :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/19aadf0e240acb09c5b4336068e6368083f26bef#commitcomment-17621052 In src/joshua/decoder/Decoder.java: In src/joshua/decoder/Decoder.java on line 515: If one uses Joshua as a library, threading could be handled by the surrounding codebase. I think supporting both use cases is important. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/f2f82c38af9aebd28f9d27f685a2e99767a2e575#commitcomment-17621008 In src/joshua/decoder/Translations.java: In src/joshua/decoder/Translations.java on line 49: Maybe we can use the TranslationFactory for it. This is just brainstorming, but lets say at Decoder initialization/construction the decoder instantiates its member translationFactory with the joshuaConfiguration and the factory configures itself in the constructor of what to do. We only would need a single instance of factory and it would build for all requests. I am not sure if the pattern below is nice, we should get some engineering comments. :) public TranslationFactory { private final boolean buildAlignments; public TranslationFactory(JoshuaConfiguration config) { buildAlignments = config.outputFormat.needsAlignments } public getTranslation(DerivationState derivation, Sentence sentence) { Translation t = new Translation(derivation, Sentence); if (buildAlignments) { t.setWordAlignments(this.buildWordAlignments(derivation)); } return t; } } } --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/19aadf0e240acb09c5b4336068e6368083f26bef#commitcomment-17620734 In src/joshua/decoder/Decoder.java: In src/joshua/decoder/Decoder.java on line 515: what if one uses Decoder.decode() and not decodeAll()? Is there a way to put this into the TranslationFactory by passing it a reference to the feature functions? If someone calls TranslationFactory().translation() it could execute this KenLM hack since at this point one knows that the KenLM states can be destroyed. Also, if TranslationFactory is used in this pattern I would suggest calling it TranslationBuilder and adopt the builder pattern: TranslationBuilder tb = new TranslationBuilder() Translation translation = tb.withAlignments().withFeatures().buildTranslation(); --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/f2f82c38af9aebd28f9d27f685a2e99767a2e575#commitcomment-17620605 In src/joshua/decoder/hypergraph/KBestExtractor.java: In src/joshua/decoder/hypergraph/KBestExtractor.java on line 93: I love the idea of this being an iterator. Could we set k in its constructor to make sure it stops producing iterations once k or all possible derivations are hit? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/f2f82c38af9aebd28f9d27f685a2e99767a2e575#commitcomment-17620594 In src/joshua/decoder/Translations.java: In src/joshua/decoder/Translations.java on line 49: I am not sure I have understood it 100%, but what I worry about with this change is that when you have to wait for sentence x very long, and other requests x+1x+n are already finished, the decoder will hold on to all these hypergraphs until the hypergraph for x is complete, its translation is created and all hgs can be destroyed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/f2f82c38af9aebd28f9d27f685a2e99767a2e575#commitcomment-17620411 In src/joshua/decoder/Decoder.java: In src/joshua/decoder/Decoder.java on line 193: where is the right side of this assignment coming from? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/f2f82c38af9aebd28f9d27f685a2e99767a2e575#commitcomment-17620359 In src/joshua/decoder/Decoder.java: In src/joshua/decoder/Decoder.java on line 39: is this required? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/f5adcdefd3e52c58b36713798c0830d1c42099e3#commitcomment-17618810 In src/joshua/tools/GrammarPacker.java: In src/joshua/tools/GrammarPacker.java on line 295: Completely agreed :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/868b340949f324b12d810d03c09c25fd5877d3dc#commitcomment-17618752 There is a Vocabulary unit test (https://github.com/apache/incubator-joshua/blob/master/tst/joshua/corpus/VocabularyTest.java) and a FormatUtil Unit test (https://github.com/apache/incubator-joshua/blob/master/tst/joshua/util/FormatUtilsTest.java) These should be updated accordingly. I guess Unit tests are not run currently during build. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/f5adcdefd3e52c58b36713798c0830d1c42099e3#commitcomment-17618502 In src/joshua/decoder/ff/tm/format/MosesFormatReader.java: In src/joshua/decoder/ff/tm/format/MosesFormatReader.java on line 95: this should probably go at some point or changed to logging --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/76eb9587a9c321d45a9560481429e2dffd918e4e#commitcomment-17618437 Great change! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/f5adcdefd3e52c58b36713798c0830d1c42099e3#commitcomment-17618536 In src/joshua/tools/GrammarPacker.java: In src/joshua/tools/GrammarPacker.java on line 261: should probably be removed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/f5adcdefd3e52c58b36713798c0830d1c42099e3#commitcomment-17618628 I love this! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/366f408672e2d29b69a78531b57056649629e978#commitcomment-17618694 In src/joshua/decoder/phrase/PhraseTable.java: In src/joshua/decoder/phrase/PhraseTable.java on line 99: "[X]" should be defined as a constant somewhere in the project --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/f5adcdefd3e52c58b36713798c0830d1c42099e3#commitcomment-17618524 In src/joshua/tools/GrammarPacker.java: In src/joshua/tools/GrammarPacker.java on line 242: I have to look into this. I think we found a case where it was necessary but I cannot remember why. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/f5adcdefd3e52c58b36713798c0830d1c42099e3#commitcomment-17618617 In src/joshua/tools/GrammarPacker.java: In src/joshua/tools/GrammarPacker.java on line 295: The "\\s+" string should probably defined as a token separator constant somewhere in the project to avoid hard-coding it everywhere. It might be useful to add a method to Vocabulary.java that returns a list of strings instead of the joined version to avoid the splitting here: public List getSourceWordList(int[] ids) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/f5adcdefd3e52c58b36713798c0830d1c42099e3#commitcomment-17618489 In src/joshua/decoder/ff/tm/format/MosesFormatReader.java: In src/joshua/decoder/ff/tm/format/MosesFormatReader.java on line 82: you can initialize the StringBuffer directly with the following line: new StringBuffer("[X] ||| [X,1] " + fields[0] + " ||| [X,1] " + fields[1] + " |||") --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/76eb9587a9c321d45a9560481429e2dffd918e4e#commitcomment-17618411 In src/joshua/decoder/ff/tm/hash_based/MemoryBasedBatchGrammar.java: In src/joshua/decoder/ff/tm/hash_based/MemoryBasedBatchGrammar.java on line 130: We could think about making these type strings an enum at some point to clearly define what they are. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/76eb9587a9c321d45a9560481429e2dffd918e4e#commitcomment-17618391 In src/joshua/decoder/ff/tm/format/MosesFormatReader.java: In src/joshua/decoder/ff/tm/format/MosesFormatReader.java on line 45: this could be made private static final int lhs = Vocabulary.id("[X]"). However, ideally all 'dependencies' of a class should be instantiated before the class is instantiated. We should think at some point about what 'contracts' we can define for grammars: i.e. when and how do they modify the vocabulary? At construction time? At loading time? Later? etc. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/76eb9587a9c321d45a9560481429e2dffd918e4e#commitcomment-17618301 In src/joshua/decoder/ff/tm/format/HieroFormatReader.java: In src/joshua/decoder/ff/tm/format/HieroFormatReader.java on line 83: to be consistent with the above changes (English/French -> target/source) the comment could be adopted --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/76eb9587a9c321d45a9560481429e2dffd918e4e#commitcomment-17618246 In src/joshua/decoder/ff/tm/Rule.java: In src/joshua/decoder/ff/tm/Rule.java on line 201: Fantasic commit! Would it make sense to refactor these methods to setTarget/setSource as well? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/8243195611a17e0ef067ec7dbf6c4a57612d041b#commitcomment-17242365 In src/joshua/decoder/StructuredTranslation.java: In src/joshua/decoder/StructuredTranslation.java on line 47: this could be solved using the Suppliers pattern as already done in the PackedGrammar class --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/805b643187e07c1d4dcd5047c3ac2dfa0a84e256#commitcomment-17176979 I agree, this is kind of a big deal. Thanks for catching this! I have used this code before, but did not see clear improvements with class-based lms. Now I know why :) Regarding StateMinimizingLanguageModels: that would be great. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-joshua pull request:
Github user fhieber commented on the pull request: https://github.com/apache/incubator-joshua/commit/805b643187e07c1d4dcd5047c3ac2dfa0a84e256#commitcomment-17172640 Hi Matt, this looks like a quite important change. Can you comment on how this changes results? Also, the lookup in the class map returns int 10 if word is unknown. Why 10? Further, do you know if there are any limitations using class lms as another state minimizing language model? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---