Han Jiang created LUCENE-5029:
---------------------------------
Summary: factor out a generic 'TermState' for better sharing in
FST-based term dict
Key: LUCENE-5029
URL: https://issues.apache.org/jira/browse/LUCENE-5029
Project: Lucene - Core
Issue Type: Task
Reporter: Han Jiang
Assignee: Han Jiang
Priority: Minor
Fix For: 4.4
Currently, the two FST-based term dicts (memory codec & blocktree) both use
FST<BytesRef> as their base data structure. This might not share much data on
parent arcs, since the encoded BytesRef doesn't guarantee that
'Outputs.common()' always produces a long prefix.
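To illustrate why a byte-wise common prefix can be short, here is a minimal sketch. It assumes a file pointer is serialized as a variable-length integer with the low-order 7 bits first; the class and method names are hypothetical and not part of Lucene. Two nearby pointers then differ in their very first byte, so nothing can be shared on the parent arc.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Lucene's actual term dict code.
final class VLongPrefixSketch {

    /** Encodes v with 7 data bits per byte, low-order bits first; 0x80 marks "more bytes follow". */
    static byte[] encodeVLong(long v) {
        byte[] buf = new byte[10]; // 10 bytes are enough for any 64-bit value
        int n = 0;
        while ((v & ~0x7FL) != 0) {
            buf[n++] = (byte) ((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        buf[n++] = (byte) v;
        return Arrays.copyOf(buf, n);
    }

    /** Length of the shared leading bytes, i.e. what a byte-wise common() could push to the parent. */
    static int commonPrefixLength(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        int i = 0;
        while (i < limit && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] fp1 = encodeVLong(300);
        byte[] fp2 = encodeVLong(301);
        System.out.println("shared bytes: " + commonPrefixLength(fp1, fp2));
        System.out.println("shared as longs: " + Math.min(300L, 301L));
    }
}
```

Under this assumed encoding, the pointers 300 and 301 share no leading byte, although as numbers the smaller one could be shared almost entirely.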
For the current postings format, however, it is guaranteed that each FP (pointing to
.doc, .pos, etc.) increases monotonically with 'larger' terms. That means that,
between two Outputs, the Outputs from the smaller term can be safely pushed towards
the root. However, we always have some tricky TermState to deal with (like the
singletonDocID for the pulsing trick), so, as Mike suggested, we can simply cut the
whole TermState into two parts: one part for comparison and intersection,
another for restoring generic data. Then the data structure will be clear: this
generic 'TermState' will consist of a fixed-length LongsRef and a variable-length
BytesRef.
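The two-part split could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed patch: the class name GenericTermStateSketch and the methods common, subtract and add are hypothetical, plain long[] and byte[] stand in for LongsRef and BytesRef, and the shared part is taken as the element-wise minimum of the longs while the bytes are never shared.

```java
import java.util.Arrays;

// Hypothetical sketch of the two-part term state; not Lucene's actual API.
final class GenericTermStateSketch {
    final long[] longs; // fixed-length, monotonic part (e.g. file pointers into .doc, .pos)
    final byte[] bytes; // variable-length opaque part (e.g. a singletonDocID for pulsing)

    GenericTermStateSketch(long[] longs, byte[] bytes) {
        this.longs = longs.clone();
        this.bytes = bytes.clone();
    }

    /** Part that can be pushed towards the root: element-wise minimum of the longs, no bytes. */
    static GenericTermStateSketch common(GenericTermStateSketch a, GenericTermStateSketch b) {
        requireSameLength(a, b);
        long[] shared = new long[a.longs.length];
        for (int i = 0; i < shared.length; i++) {
            shared[i] = Math.min(a.longs[i], b.longs[i]);
        }
        return new GenericTermStateSketch(shared, new byte[0]);
    }

    /** What stays on the child arc once prefix has been pushed up. */
    static GenericTermStateSketch subtract(GenericTermStateSketch state, GenericTermStateSketch prefix) {
        requireSameLength(state, prefix);
        long[] rest = new long[state.longs.length];
        for (int i = 0; i < rest.length; i++) {
            rest[i] = state.longs[i] - prefix.longs[i];
        }
        return new GenericTermStateSketch(rest, state.bytes);
    }

    /** Restores a full state while walking from the root; the bytes come from rest only. */
    static GenericTermStateSketch add(GenericTermStateSketch prefix, GenericTermStateSketch rest) {
        requireSameLength(prefix, rest);
        long[] sum = new long[prefix.longs.length];
        for (int i = 0; i < sum.length; i++) {
            sum[i] = prefix.longs[i] + rest.longs[i];
        }
        return new GenericTermStateSketch(sum, rest.bytes);
    }

    private static void requireSameLength(GenericTermStateSketch a, GenericTermStateSketch b) {
        if (a.longs.length != b.longs.length) {
            throw new IllegalArgumentException("longs part must have a fixed length");
        }
    }

    public static void main(String[] args) {
        GenericTermStateSketch small = new GenericTermStateSketch(new long[] {100, 40}, new byte[] {1});
        GenericTermStateSketch large = new GenericTermStateSketch(new long[] {250, 90}, new byte[] {2, 3});
        GenericTermStateSketch shared = common(small, large);
        GenericTermStateSketch restored = add(shared, subtract(large, shared));
        System.out.println(Arrays.toString(shared.longs) + " -> " + Arrays.toString(restored.longs));
    }
}
```

Because the pointers grow with 'larger' terms, the element-wise minimum in this sketch is simply the smaller term's longs, which is what allows that term's output to move towards the root without losing information.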