[ 
https://issues.apache.org/jira/browse/JOSHUA-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Post updated JOSHUA-272:
-----------------------------
    Fix Version/s:     (was: 6.2)
                   6.1

> Simplify the packing and usage of phrase-based grammars
> -------------------------------------------------------
>
>                 Key: JOSHUA-272
>                 URL: https://issues.apache.org/jira/browse/JOSHUA-272
>             Project: Joshua
>          Issue Type: Improvement
>            Reporter: Matt Post
>            Assignee: Matt Post
>             Fix For: 6.1
>
>
> For historical reasons, phrase-based grammars add some complexity to 
> decoding. The complete tree under each top-level trie node in a packed 
> grammar has to fit within a single packed-grammar slice, which is limited 
> to 2 GB due to constraints on the size of Java byte[] arrays. We used to 
> sort on just the first item in the trie, which was a problem for 
> phrase-based decoding: phrase-based rules are implemented as left-branching 
> hierarchical rules, so every one of them starts with [X,1] and they all 
> end up under the same top-level node. In order to pack large grammars, we 
> packed them without the leading [X,1] and then added it back when loading 
> the grammars, for both the packed and the memory-based grammars. This was 
> a real mess.
>
> This was all fixed by a commit a while ago that packs and reads packed 
> grammars based on the first two symbols on the source side. So we should 
> remove all the complexity associated with phrases. They should just be 
> regular rules. There is also a lot of redundancy across the codebase in 
> parsing rules, converting them to different formats, and so on, which 
> should be cleaned up as well.
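
To illustrate the point about keying on the first two source-side symbols, here is a minimal sketch. It is not Joshua's packer code: the class SliceKeySketch, its methods, and the sample rules are hypothetical, and the "LHS ||| source ||| target ||| features" rule layout is assumed. It only shows why a one-symbol key puts every phrase-based rule into a single group, while a two-symbol key spreads them out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Joshua.
public class SliceKeySketch {

    /** Returns the first prefixLength source-side symbols of a rule, space-joined. */
    static String sliceKey(String rule, int prefixLength) {
        // Assumed rule layout: "LHS ||| source ||| target ||| features"
        String[] fields = rule.split("\\s*\\|\\|\\|\\s*");
        String[] source = fields[1].trim().split("\\s+");
        int n = Math.min(prefixLength, source.length);
        return String.join(" ", Arrays.copyOfRange(source, 0, n));
    }

    /** Groups rules by their slice key; each group stands in for one slice. */
    static Map<String, List<String>> groupBySliceKey(List<String> rules, int prefixLength) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String rule : rules) {
            groups.computeIfAbsent(sliceKey(rule, prefixLength), k -> new ArrayList<>())
                  .add(rule);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up phrase-based rules, each written with the leading [X,1].
        List<String> rules = List.of(
            "[X] ||| [X,1] le chat ||| [X,1] the cat ||| 0.5",
            "[X] ||| [X,1] le chien ||| [X,1] the dog ||| 0.4",
            "[X] ||| [X,1] un ||| [X,1] a ||| 0.9");

        // Keyed on one symbol, every rule shares the key "[X,1]".
        System.out.println("1 symbol:  " + groupBySliceKey(rules, 1).keySet());
        // Keyed on two symbols, the rules split by the word after [X,1].
        System.out.println("2 symbols: " + groupBySliceKey(rules, 2).keySet());
    }
}
```

With a two-symbol key the leading [X,1] no longer forces all phrase-based rules into one group, which is why it should not need to be stripped before packing and restored at load time.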



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
