Adrien Grand created LUCENE-6690:
------------------------------------

             Summary: Speed up MultiTermsEnum.next()
                 Key: LUCENE-6690
                 URL: https://issues.apache.org/jira/browse/LUCENE-6690
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Adrien Grand
            Assignee: Adrien Grand
            Priority: Minor


OrdinalMap is very useful when computing top terms on a multi-index segment. 
However I've seen it being occasionally slow to build, which was either making 
facets (when the ordinals map is computed lazily) or reopen (when computed 
eagerly) slow. So out of curiosity, I tried to profile ordinal map building on 
a simple index: 10M random strings of length between 0 and 20 stored as a 
SORTED doc values field. The index has 19 segments. The bottleneck was 
MultiTermsEnum.next() (by far) due to lots of BytesRef comparisons 
(UTF8SortedAsUnicodeComparator).

MultiTermsEnum stores sub enums in two different places:
 - top: a simple array containing all enums on the current term
 - queue: a queue for enums that are not exhausted yet but beyond the current 
term.

A non-exhausted enum is in exactly one of these data-structures. When moving to 
the next term, MultiTermsEnum advances all enums in {{top}}, then adds them to 
{{queue}} and finally, pops all enum that are on the same term back into 
{{top}}.

We could save reorderings of the priority queue by not removing entries from 
the priority queue and then calling updateTop to advance enums which are on the 
current term. This is already what we do for disjunctions of doc IDs in 
DISIPriorityQueue.

On the index described above and current trunk, building an OrdinalMap has to 
call UTF8SortedAsUnicodeComparator.compare 80114820 times and runs in 1.9 s. 
With the change, it calls UTF8SortedAsUnicodeComparator.compare 36900694 times, 
BytesRef.equals 16297638 times and runs in 1.4s (~26% faster).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to