[jira] [Updated] (LUCENE-3297) FST doesn't fully share common prefix across all outputs

Michael McCandless (JIRA) Sat, 09 Jul 2011 06:05:44 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Michael McCandless updated LUCENE-3297:
---------------------------------------

    Component/s: core/FSTs

> FST doesn't fully share common prefix across all outputs
> --------------------------------------------------------
>
>                 Key: LUCENE-3297
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3297
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>            Reporter: Michael McCandless
>            Priority: Minor
>
> FST will try to share prefixes of outputs when possible. However, in the [I
> think unusual in practice] case where all outputs share a common prefix, FST
> really ought to store that prefix just once, on the root arc; instead it is
> only able to push it back as far as the N root arcs. It's sort of an
> off-by-one in how far back the pushing goes...
> One [synthetic] example where this makes a big difference is the new
> Test2BPostings test when it uses MemoryCodec: the test has 26 terms (the
> letters of the alphabet), and each term has exactly the same long (~85 MB)
> all-1s byte[] as its postings. If we fixed this issue, the resulting FST
> would be only ~85 MB; as it stands, it needs ~85 * 26 MB (about 2.2 GB).
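
The size arithmetic in the description can be sketched in a few lines of Java. This is only an illustration of the accounting, not the Lucene FST code: the class and method names are hypothetical, the outputs are scaled down from ~85 MB to 85 bytes, the fill value 1 is a stand-in for the "all 1s" postings, and it assumes every term starts with a different letter so that each root arc ends up carrying the whole output for its term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from Lucene.
public class CommonPrefixSketch {

    /** Length of the longest prefix shared by every output. */
    public static int commonPrefixLength(List<byte[]> outputs) {
        if (outputs.isEmpty()) {
            return 0;
        }
        byte[] first = outputs.get(0);
        int len = first.length;
        for (byte[] out : outputs) {
            len = Math.min(len, out.length);
            for (int i = 0; i < len; i++) {
                if (out[i] != first[i]) {
                    len = i; // first mismatch caps the shared prefix
                    break;
                }
            }
        }
        return len;
    }

    /** Bytes stored when each of the N root arcs keeps its own full copy. */
    public static long bytesPushedToRootArcs(List<byte[]> outputs) {
        long total = 0;
        for (byte[] out : outputs) {
            total += out.length;
        }
        return total;
    }

    /** Bytes stored when the shared prefix is kept once and only suffixes stay per arc. */
    public static long bytesWithSharedPrefix(List<byte[]> outputs) {
        int prefix = commonPrefixLength(outputs);
        long total = prefix;
        for (byte[] out : outputs) {
            total += out.length - prefix;
        }
        return total;
    }

    public static void main(String[] args) {
        // 26 terms, each with an identical 85-byte output (85 MB in the real test).
        List<byte[]> outputs = new ArrayList<>();
        for (int i = 0; i < 26; i++) {
            byte[] out = new byte[85];
            Arrays.fill(out, (byte) 1);
            outputs.add(out);
        }
        System.out.println("per root arc:  " + bytesPushedToRootArcs(outputs));
        System.out.println("shared prefix: " + bytesWithSharedPrefix(outputs));
    }
}
```

Run as is, this prints 2210 for the per-root-arc layout and 85 for the shared layout, which matches the ~85 * 26 MB versus ~85 MB figures above once the units are scaled back up.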

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
