[ https://issues.apache.org/jira/browse/JOSHUA-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Post resolved JOSHUA-272.
------------------------------
    Resolution: Fixed

Fixed with [a recent commit|https://github.com/apache/incubator-joshua/commit/aef0b2dbe4555070aec9f15bb2c8d9dcb5671dcd].

> Simplify the packing and usage of phrase-based grammars
> -------------------------------------------------------
>
>                 Key: JOSHUA-272
>                 URL: https://issues.apache.org/jira/browse/JOSHUA-272
>             Project: Joshua
>          Issue Type: Improvement
>            Reporter: Matt Post
>            Assignee: Matt Post
>             Fix For: 6.1
>
>
> For historical reasons, phrase-based grammars add some complexity to
> decoding. The complete tree under each top-level trie node in a packed
> grammar has to fit within a single packed grammar slice, which is limited
> to 2 GB due to constraints on the size of Java byte[] arrays (they are
> indexed by int). We used to sort on just the first item in the trie, which
> was a problem for phrase-based decoding: phrase-based rules are implemented
> as left-branching hierarchical rules, so every one of them begins with
> [X,1] and they all fell under the same top-level trie node. In order to
> pack large grammars, we packed them without the leading [X,1] and then
> added it back when loading the grammars, for both packed and memory-based
> grammars. This was a real mess.
>
> This was all fixed a while ago by a commit that packs and reads packed
> grammars based on the first two symbols on the source side. So we should
> remove all the complexity associated with phrases: they should just be
> regular rules. There is also a lot of redundancy across the codebase in
> parsing rules, converting them to different formats, and so on.
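To make the slicing constraint described in the issue concrete, here is a minimal Java sketch. It is not Joshua's packer: the class `SliceKeySketch`, its methods `sliceKey` and `groupByKey`, and the example source sides are all hypothetical. It assumes a rule's source side is a whitespace-separated string and treats each group of rules sharing a key as the subtree that would have to fit inside one packed slice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical illustration (not Joshua code): groups rule source sides by a key
 * built from their first k symbols, standing in for the top-level trie node whose
 * whole subtree must fit in a single packed slice.
 */
public class SliceKeySketch {

    // Returns the first k whitespace-separated symbols of a source side, joined by single spaces.
    static String sliceKey(String sourceSide, int k) {
        String[] symbols = sourceSide.trim().split("\\s+");
        int n = Math.min(k, symbols.length);
        return String.join(" ", Arrays.copyOfRange(symbols, 0, n));
    }

    // Groups source sides by their k-symbol key; each group is one unit that cannot be split across slices.
    static Map<String, List<String>> groupByKey(List<String> sourceSides, int k) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String src : sourceSides) {
            groups.computeIfAbsent(sliceKey(src, k), key -> new ArrayList<>()).add(src);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Phrase rules written as left-branching hierarchical rules all start with [X,1].
        List<String> phraseRules = List.of(
            "[X,1] la maison",
            "[X,1] la",
            "[X,1] maison bleue",
            "[X,1] le chat");

        Map<String, List<String>> byFirst = groupByKey(phraseRules, 1);
        Map<String, List<String>> byFirstTwo = groupByKey(phraseRules, 2);

        System.out.println("1-symbol key: " + byFirst.size() + " group(s)");
        System.out.println("2-symbol key: " + byFirstTwo.size() + " group(s)");
        System.out.println(byFirstTwo);
    }
}
```

Keying on one symbol puts all four phrase rules into the single `[X,1]` group, which for a real phrase table is the one oversized subtree the issue describes. Keying on the first two symbols splits them into three groups by their first word, so no special handling of the leading [X,1] is needed.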