GitHub user KellenSunderland opened a pull request:

    https://github.com/apache/incubator-joshua/pull/6

    More work on structuring translation output

    These commits focus on two areas:  
    *  One is structured translation output, and this should be our last PR for
       this topic for the time being.
    *  We have also done some static code analysis and fixed various issues
       flagged by our tool.
    
    There are also some new unit tests included.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/KellenSunderland/incubator-joshua master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-joshua/pull/6.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6
    
----
commit 53de11905134e32191d11cafd07d5a033c16e411
Author: Felix Hieber <fhie...@amazon.com>
Date:   2015-11-25T14:11:27Z

    Reworked most of the hypergraph traversals for Viterbi and n-best
    extractions. Most importantly, translation string extraction is now
    int-based instead of relying on regex matching and string operations, which
    should be a lot faster. However, this only works for Hiero models for now;
    phrase-based decoding still uses the String-based extraction.

    Previously there were two ways to traverse the hypergraph: (1) regular
    tailNode order (used for Viterbi and WordAlignment) and (2) tailNode order
    according to target-side nonterminal indices (used for KBestExtraction).
    This caused considerable inconsistency in how to write general extractors
    (output string, input string, feature vector, word alignments, tree, etc.)
    that support both. The main issue was that some extractors (the
    String-based HypothesisExtractor) rely on traversal order (2) to simply
    always merge children strings into the first nonTerminal symbol on the
    target side. However, this breaks the very same class when the input string
    is requested (which is a supported feature in Joshua).

    This change gets rid of these inconsistencies for Hiero and simplifies a
    lot of the code. For phrase-based decoding, we still rely on the
    HypothesisExtractor, but this is not a use case for Saar at the moment, and
    phrase-based decoding is a hack in Joshua anyway.

    JoshuaConfiguration now throws an exception if you want to have the
    'align_index' in output strings (Moses style) for Hiero models. This is not
    supported by int[]-based extraction and is useless anyway.
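
    As a rough illustration of the int-based extraction described in the
    commit message above, here is a minimal sketch. It is not the code from
    this pull request. The Node class, its fields, and the convention that
    negative ids -1, -2, ... on the target side point at tail nodes are all
    assumptions made for the example. The idea is that tail nodes are stored
    in one order only, and the target side itself says where each child goes,
    so no regex or string substitution is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the Joshua code base.
final class IntExtractionSketch {

    /** Assumed hypergraph node: one rule application plus its tail nodes. */
    static final class Node {
        final int[] target; // ids >= 0 are word ids; -1, -2, ... refer to tail nodes
        final Node[] tails; // tail nodes in regular (source-side) order

        Node(int[] target, Node... tails) {
            this.target = target;
            this.tails = tails;
        }
    }

    /** Appends the target-side word ids of the derivation under node to out. */
    static void extract(Node node, List<Integer> out) {
        for (int symbol : node.target) {
            if (symbol < 0) {
                // Assumed convention: -1 refers to tails[0], -2 to tails[1], ...
                extract(node.tails[-symbol - 1], out);
            } else {
                out.add(symbol);
            }
        }
    }

    /** Returns the translation under root as an array of word ids. */
    static int[] translate(Node root) {
        List<Integer> out = new ArrayList<>();
        extract(root, out);
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        Node x1 = new Node(new int[] {10});
        Node x2 = new Node(new int[] {20, 21});
        // The target side reorders the children: the second tail node comes first.
        Node root = new Node(new int[] {-2, 5, -1}, x1, x2);
        System.out.println(Arrays.toString(translate(root))); // prints [20, 21, 5, 10]
    }
}
```

    Word ids would be mapped back to strings once, at the end, through the
    vocabulary. Because the tail nodes keep a single order, an extractor for
    the input side could walk the same structure without special handling.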

commit c72faea7c241395b8827cf6ab41aec67c7fdc54c
Author: Pavel Danchenko <danch...@amazon.com>
Date:   2015-12-22T11:49:22Z

    LanguageModelFF.estimateFutureCost refactorings and test

commit d1c3caac1da8c3c1175059b20c411a3ebd965465
Author: Kellen Sunderland <kell...@amazon.com>
Date:   2016-04-27T22:12:00Z

    Updated license files for tests

commit 9c3f2e6e60e68c9a55733d872d15c5c39c937ab0
Author: Felix Hieber <fhie...@amazon.com>
Date:   2015-12-31T10:57:18Z

    Modified the KenLM JNI to support querying the LM using strings, not only
    ids. Also added a method to check whether a word or id is known to the LM.
    Made output of regression tests more concise.

commit 8d86ff2b989c2b1db8aff7eaaa479cae38c73357
Author: Felix Hieber <fhie...@amazon.com>
Date:   2016-02-03T14:17:30Z

    Mostly a refactor for improved readability

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
