Optimize BlockTermsReader.seek
------------------------------
Key: LUCENE-2922
URL: https://issues.apache.org/jira/browse/LUCENE-2922
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 4.0
When we seek, we first consult the terms index to find the right block
of 32 (default) terms that may hold the target term. Then, we scan
that block looking for an exact match.
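
The two-step seek described above can be sketched roughly as follows. This is a simplified illustration, not the actual BlockTermsReader code: the class name, the method name, and the in-memory byte[][] layout are assumptions made for the example, and the "terms index" is modeled as nothing more than the first term of each block.

```java
import java.util.Arrays;

/** Hypothetical sketch of "consult the index, then scan the block"; not Lucene code. */
public class BlockSeekSketch {
    // Default block size mentioned in the issue description.
    static final int BLOCK_SIZE = 32;

    /**
     * terms: all terms, sorted by unsigned byte order.
     * indexTerms: assumed to hold the first term of each block (indexTerms[i] == terms[i * BLOCK_SIZE]).
     * Returns the position of target in terms, or -1 if it is absent.
     */
    static int seekExact(byte[][] terms, byte[][] indexTerms, byte[] target) {
        // Step 1: binary-search the index for the last block whose first term is <= target.
        int lo = 0, hi = indexTerms.length - 1, block = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(indexTerms[mid], target) <= 0) {
                block = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (block < 0) {
            return -1; // target sorts before every term
        }
        // Step 2: scan that block, comparing the full term each time (the unoptimized approach).
        int start = block * BLOCK_SIZE;
        int end = Math.min(start + BLOCK_SIZE, terms.length);
        for (int i = start; i < end; i++) {
            int cmp = Arrays.compareUnsigned(terms[i], target);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                break; // passed the spot where target would be
            }
        }
        return -1;
    }
}
```
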
The scanning just calls next() and then compares the full term, which
is rather wasteful. First, since all terms in the block share a common
prefix, we should compare the target against that common prefix once,
and then compare only the new suffix of each term. Second, since the
term suffixes have already been read up front into a byte[], we should
do a no-copy comparison (today we first read a copy into the local
BytesRef and then compare).
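
A minimal sketch of the two ideas above (check the shared prefix once, then compare each suffix in place). The Block class, its field names, and the layout (one block-wide prefix plus suffixes concatenated in a single byte[] with start offsets) are assumptions for illustration only; the real on-disk block format may differ.

```java
import java.util.Arrays;

/** Hypothetical sketch of prefix-once, no-copy suffix scanning; not Lucene code. */
public class PrefixBlockScanSketch {

    /** Assumed in-memory form of one block. */
    static final class Block {
        final byte[] prefix;       // bytes shared by every term in the block
        final byte[] suffixBytes;  // all suffixes, concatenated in sorted term order
        final int[] suffixStart;   // suffix t occupies [suffixStart[t], suffixStart[t + 1])

        Block(byte[] prefix, byte[] suffixBytes, int[] suffixStart) {
            this.prefix = prefix;
            this.suffixBytes = suffixBytes;
            this.suffixStart = suffixStart;
        }
    }

    /** Returns the ordinal of target within the block, or -1 if it is not present. */
    static int scan(Block b, byte[] target) {
        int prefixLen = b.prefix.length;
        // Compare the target against the common prefix exactly once.
        if (target.length < prefixLen
                || !Arrays.equals(b.prefix, 0, prefixLen, target, 0, prefixLen)) {
            return -1;
        }
        int count = b.suffixStart.length - 1;
        for (int t = 0; t < count; t++) {
            // Compare the suffix directly inside the shared byte[]; nothing is copied.
            int cmp = Arrays.compareUnsigned(
                    b.suffixBytes, b.suffixStart[t], b.suffixStart[t + 1],
                    target, prefixLen, target.length);
            if (cmp == 0) {
                return t;
            }
            if (cmp > 0) {
                return -1; // suffixes are sorted, so the target cannot appear later
            }
        }
        return -1;
    }
}
```

For example, a block holding "app", "apple", "apply", "apps" would have prefix "app", suffixBytes "lelys", and suffixStart {0, 0, 2, 4, 5}. The range-based overloads of Arrays.equals and Arrays.compareUnsigned require Java 9 or later.
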
With this optimization, I removed the ability for BlockTermsWriter/Reader
to support arbitrary term sort order; it's now hardwired to
BytesRef.utf8SortedAsUnicode.
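
To illustrate what a fixed "UTF-8 sorted as Unicode" order means in practice: comparing UTF-8 bytes as unsigned values yields Unicode code point order, which can differ from Java's UTF-16 based String.compareTo. The helper below is a hypothetical stand-in written for this example and is not the Lucene comparator itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration of unsigned UTF-8 byte order; not Lucene code. */
public class Utf8OrderSketch {

    /** Compares two strings by their UTF-8 bytes, treating each byte as unsigned. */
    static int compareUtf8(String a, String b) {
        return Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8),
                b.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String fullwidthTilde = "\uFF5E";   // U+FF5E, UTF-8 bytes EF BD 9E
        String emoji = "\uD83D\uDE00";      // U+1F600, UTF-8 bytes F0 9F 98 80
        // Code point order: U+FF5E sorts before U+1F600.
        System.out.println(compareUtf8(fullwidthTilde, emoji) < 0);
        // UTF-16 order disagrees, because the surrogate 0xD83D is less than 0xFF5E.
        System.out.println(fullwidthTilde.compareTo(emoji) > 0);
    }
}
```
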