[ https://issues.apache.org/jira/browse/JOSHUA-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435616#comment-15435616 ]

Matt Post commented on JOSHUA-272:
----------------------------------

Phrase-based decoding has been changed to no longer use left-branching rules, so this no longer applies.

> Simplify the packing and usage of phrase-based grammars
> -------------------------------------------------------
>
>                 Key: JOSHUA-272
>                 URL: https://issues.apache.org/jira/browse/JOSHUA-272
>             Project: Joshua
>          Issue Type: Improvement
>            Reporter: Matt Post
>            Assignee: Matt Post
>             Fix For: 6.1
>
>
> For historical reasons, phrase-based grammars add some complexity to
> decoding. The complete tree under each top-level trie node in packed grammars
> has to fit within a single packed grammars slice, which is limited to 2 GB
> due to constraints on the size of Java byte[] arrays. We used to sort on just
> the first item in the trie, which was a problem for phrase-based decoding,
> since phrase-based rules are implemented as left-branching hierarchical
> rules. In order to pack large grammars, we packed them without the leading
> [X,1], and then added it when loading the grammars, both for the packed and
> memory-based grammars. This was a real mess.
> This was all fixed with a commit a while ago that packs and reads packed
> grammars based on the first two symbols on the source side. So we should
> remove all the complexity associated with phrases. They should just be
> regular rules. There is also a lot of redundancy across the codebase in
> parsing rules, converting them to different formats, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)