[ https://issues.apache.org/jira/browse/LUCENE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453801#comment-17453801 ]
ASF subversion and git services commented on LUCENE-10233:
----------------------------------------------------------

Commit ebee531df7eefeed4f735e86745cf33a55df232f in lucene's branch refs/heads/branch_9x from gf2121
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ebee531 ]

LUCENE-10233: fix Unit Test TestFixedBitSet#testAndNot (#512)

Co-authored-by: guofeng.my <guofeng...@bytedance.com>

> Store docIds as bitset when leafCardinality = 1 to speed up addAll
> ------------------------------------------------------------------
>
>                 Key: LUCENE-10233
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10233
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Feng Guo
>            Priority: Major
>             Fix For: 9.1
>
>         Attachments: SparseFixedBitSet.png
>
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> For low-cardinality point fields, an id block usually stores doc IDs that
> all share the same point value, so {{intersect}} ends up in the {{addAll}}
> logic. If we store the IDs as a bitset and give the IntersectVisitor a bulk
> visiting ability, we can speed up addAll, because we can simply 'or' the
> block's IDs into the result.
> The optimization is triggered only when both of the following conditions
> hold:
> # the doc IDs are strictly sorted
> # max(docId) - min(docId) <= 16 * pointCount (to avoid using too much
> extra storage)
> I mocked a field with 10,000,000 docs per value and searched it with a
> one-term PointInSetQuery; the time to build the scorer decreased from
> 151ms to 5ms.
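
The two trigger conditions and the bulk 'or' from the issue description can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the Lucene patch: the class DocIdBitSetBlock and its methods qualifies, encode, orInto and get are hypothetical names, the result set is modeled as a plain long[] rather than Lucene's FixedBitSet, and aligning the block to a 64-bit word boundary is a simplification chosen so the merge is a pure word-wise OR.

```java
// Hypothetical sketch; none of these names come from Lucene.
final class DocIdBitSetBlock {
    final int base;      // smallest doc ID, rounded down to a multiple of 64
    final long[] words;  // bit (docId - base) is set for every doc ID in the block

    DocIdBitSetBlock(int base, long[] words) {
        this.base = base;
        this.words = words;
    }

    /** Checks both conditions from the issue: strictly sorted and dense enough. */
    static boolean qualifies(int[] docIds) {
        if (docIds.length == 0) {
            return false;
        }
        for (int i = 1; i < docIds.length; i++) {
            if (docIds[i] <= docIds[i - 1]) {
                return false; // unsorted or duplicate doc ID
            }
        }
        long range = (long) docIds[docIds.length - 1] - docIds[0];
        return range <= 16L * docIds.length;
    }

    /** Encodes a strictly sorted, non-empty block of non-negative doc IDs. */
    static DocIdBitSetBlock encode(int[] docIds) {
        int base = docIds[0] & ~63; // align to a word boundary
        int max = docIds[docIds.length - 1];
        long[] words = new long[((max - base) >> 6) + 1];
        for (int id : docIds) {
            int offset = id - base;
            words[offset >> 6] |= 1L << offset; // long shifts use only the low 6 bits
        }
        return new DocIdBitSetBlock(base, words);
    }

    /** Bulk addAll: OR whole words into the result instead of visiting each doc ID. */
    void orInto(long[] result) {
        int start = base >> 6;
        for (int i = 0; i < words.length; i++) {
            result[start + i] |= words[i];
        }
    }

    /** Tests whether a doc ID is set in a long[]-backed bitset. */
    static boolean get(long[] bits, int docId) {
        return (bits[docId >> 6] & (1L << docId)) != 0;
    }

    public static void main(String[] args) {
        int[] docIds = {100, 101, 105, 130};
        long[] result = new long[4]; // room for doc IDs 0..255
        if (qualifies(docIds)) {
            encode(docIds).orInto(result);
        }
        System.out.println("doc 105 matched: " + get(result, 105));
    }
}
```

With the density bound in place, a qualifying block needs at most about 16 bits per doc ID, and the merge touches one word per 64 candidate IDs rather than one call per doc ID, which is the kind of saving the issue attributes to the bulk visit.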