Adrien Grand created LUCENE-7958:
------------------------------------
Summary: Give TermInSetQuery better advancing capabilities
Key: LUCENE-7958
URL: https://issues.apache.org/jira/browse/LUCENE-7958
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand
Priority: Minor
If a TermInSetQuery has more than 15 matching terms on a given segment, we
currently consume all postings lists into a bitset and return an iterator over
this bitset as a scorer. I would like to change this so that we keep the N
(N = 15) postings lists that have the largest document frequencies and consume
all other (shorter) postings lists into a bitset. In the end we return a
disjunction over the N longest postings lists and the bitset. This could help
consume fewer doc ids when the TermInSetQuery is intersected with other
queries, since the longest postings lists could then be advanced instead of
being read in full, especially if the document frequencies of the terms it
wraps follow a Zipfian distribution.
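
To make the proposed split concrete, here is a minimal sketch of it. It is
not Lucene code: postings lists are modelled as plain sorted int[] arrays
whose length stands in for the document frequency, and the names
TermInSetSketch, Split and split are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TermInSetSketch {

    /** Outcome of the split: the longest lists kept as-is, everything else as a bitset. */
    public static final class Split {
        public final List<int[]> longest;
        public final BitSet rest;

        Split(List<int[]> longest, BitSet rest) {
            this.longest = longest;
            this.rest = rest;
        }
    }

    /**
     * Keeps the n longest postings lists and merges the doc ids of all
     * other lists into a single bitset. Assumption: array length plays
     * the role of the term's document frequency.
     */
    public static Split split(List<int[]> postings, int n) {
        // Min-heap on length: the shortest list kept so far is evicted first.
        PriorityQueue<int[]> kept =
            new PriorityQueue<>(Comparator.comparingInt((int[] p) -> p.length));
        BitSet rest = new BitSet();
        for (int[] p : postings) {
            kept.add(p);
            if (kept.size() > n) {
                // Eagerly consume the shortest list into the bitset.
                for (int doc : kept.poll()) {
                    rest.set(doc);
                }
            }
        }
        return new Split(new ArrayList<>(kept), rest);
    }
}
```

The scorer would then be a disjunction over the entries of longest plus an
iterator over rest. With a skewed distribution, most doc ids live in the few
lists that are kept, so only a small share of them is consumed up front.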