[jira] [Resolved] (JOSHUA-272) Simplify the packing and usage of phrase-based grammars

2016-08-24 Thread Matt Post (JIRA)

 [ https://issues.apache.org/jira/browse/JOSHUA-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Post resolved JOSHUA-272.
--
Resolution: Fixed

> Simplify the packing and usage of phrase-based grammars
> ---
>
> Key: JOSHUA-272
> URL: https://issues.apache.org/jira/browse/JOSHUA-272
> Project: Joshua
> Issue Type: Improvement
> Reporter: Matt Post
> Assignee: Matt Post
> Fix For: 6.1
>
>
> For historical reasons, phrase-based grammars add some complexity to
> decoding. In packed grammars, the complete subtree under each top-level trie
> node has to fit within a single packed grammar slice, which is limited to
> 2 GB because Java byte[] arrays are indexed by int. We used to sort on just
> the first item in the trie, which was a problem for phrase-based decoding:
> phrase-based rules are implemented as left-branching hierarchical rules, so
> they all begin with the same nonterminal, [X,1], and the entire grammar fell
> under a single top-level trie node. In order to pack large grammars, we
> packed them without the leading [X,1] and then added it back when loading
> the grammars, for both packed and memory-based grammars. This was a real mess.
> This was all fixed a while ago by a commit that packs and reads packed
> grammars based on the first two symbols on the source side. So we should
> remove all the complexity associated with phrases; they should just be
> regular rules. There is also a lot of redundancy across the codebase in
> parsing rules, converting them to different formats, and so on.
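
The effect of keying slices on one versus two source symbols can be illustrated with a small sketch. This is not Joshua's packer: the class TrieSliceSketch and the methods sliceKey and bucketSizes are hypothetical names, and counting rules per key stands in for the real per-node byte size. It only shows that when every phrase-based rule begins with [X,1], a one-symbol key puts all rules under one top-level node, while a two-symbol key spreads them across several.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TrieSliceSketch {
    // Hypothetical helper: the key under which a rule's source side would be
    // grouped, built from its first `keyLength` whitespace-separated symbols.
    static String sliceKey(String sourceSide, int keyLength) {
        String[] symbols = sourceSide.trim().split("\\s+");
        int n = Math.min(keyLength, symbols.length);
        return String.join(" ", Arrays.copyOf(symbols, n));
    }

    // Counts how many rules fall under each key (a stand-in for the size of
    // the subtree that would have to fit in one packed grammar slice).
    static Map<String, Integer> bucketSizes(List<String> sources, int keyLength) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String src : sources) {
            counts.merge(sliceKey(src, keyLength), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Phrase-based rules written as left-branching hierarchical rules:
        // every source side starts with the nonterminal [X,1].
        List<String> sources = List.of(
            "[X,1] das Haus",
            "[X,1] das Buch",
            "[X,1] ein Haus",
            "[X,1] klein");
        System.out.println(bucketSizes(sources, 1)); // {[X,1]=4}
        System.out.println(bucketSizes(sources, 2)); // {[X,1] das=2, [X,1] ein=1, [X,1] klein=1}
    }
}
```

With one symbol, the single [X,1] bucket grows with the whole grammar and eventually hits the roughly 2 GB byte[] ceiling (a Java array holds at most Integer.MAX_VALUE elements). With two symbols, the bucket sizes follow the distribution of first source words instead.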



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (JOSHUA-272) Simplify the packing and usage of phrase-based grammars

2016-05-25 Thread Matt Post (JIRA)

 [ https://issues.apache.org/jira/browse/JOSHUA-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Post resolved JOSHUA-272.
--
Resolution: Fixed

Fixed with [a recent commit|https://github.com/apache/incubator-joshua/commit/aef0b2dbe4555070aec9f15bb2c8d9dcb5671dcd].
