[ https://issues.apache.org/jira/browse/LUCENE-7254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257785#comment-15257785 ]
Robert Muir commented on LUCENE-7254:
-------------------------------------

{quote}
+1

I don't like that this patch might create iterators over sparse FixedBitSet instances. I am fine with doing that temporarily for queries that are likely to match many docs (I see that you modified the ranges but not the point-in-set queries, for instance), but in the longer term I think we should improve points so that we can know earlier how many docs are going to be added.
{quote}

No, it is the opposite way around. The sparse case is not the case to optimize, because it is already fast. Not doing point-in-set had nothing to do with that; I just don't have a good benchmark for it.

I think we should always use the fastest bitset here for these queries. Optimizations for esoteric/abuse cases (many values in a structured field, sparse fields, etc.) shouldn't drag down the hotspot of these searches for the common case.

> DocIDSetBuilder is no good for points
> -------------------------------------
>
>                 Key: LUCENE-7254
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7254
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-7254.patch, LUCENE-7254.patch
>
>
> For the postings lists, I think this approach works well in dense cases (e.g.
> whole DISIs are added, things are coming in order, etc.).
> However, in the points case, it holds back range performance significantly.
> There are a couple of problems here:
> * expensive cardinality computation (this is a 2% hit) when it's totally
> unnecessary. We can use index statistics to help here.
> * lots of conditional stuff in add(). This includes growing checks / bitset
> switching checks and so on (which happens even if you are smart and call
> grow, but this stuff all adds up).
> I don't think we should try to create a magical shared API that is both
> efficient for postings lists of unstructured stuff and at the same time point
> collection for structured fields; instead we should just do things
> differently for points and iterate from there.
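To make the trade-off discussed above concrete, here is a minimal, self-contained Java sketch that does not use Lucene at all. `DocCollectSketch`, `AdaptiveCollector` and `BitSetCollector` are hypothetical names, and the 1/128-of-maxDoc switching threshold is an arbitrary assumption, not DocIdSetBuilder's actual policy. The adaptive collector branches on every add() (already-dense check, grow check, and a switch check when the buffer is full) and has to count matches explicitly in cardinality(), while the fixed bitset does a single unconditional store per document.

```java
import java.util.Arrays;

/** Hypothetical sketch: adaptive sparse-to-dense collection vs. always using a bitset. */
public class DocCollectSketch {

  /** Starts with an int[] buffer and upgrades to a bitset once the buffer gets large. */
  static final class AdaptiveCollector {
    private final int maxDoc;
    private final int threshold; // assumption: switch to a bitset at maxDoc / 128 buffered docs
    private int[] buffer = new int[16];
    private int size;
    private long[] bits; // null until upgraded

    AdaptiveCollector(int maxDoc) {
      this.maxDoc = maxDoc;
      this.threshold = Math.max(1, maxDoc >>> 7);
    }

    void add(int doc) {
      if (bits != null) { // branch 1: already dense?
        bits[doc >>> 6] |= 1L << doc;
        return;
      }
      if (size == buffer.length) { // branch 2: buffer full, need to grow?
        if (size >= threshold) { // branch 3: switch to a bitset instead of growing?
          upgrade();
          bits[doc >>> 6] |= 1L << doc;
          return;
        }
        buffer = Arrays.copyOf(buffer, size << 1);
      }
      buffer[size++] = doc;
    }

    private void upgrade() {
      bits = new long[(maxDoc + 63) >>> 6];
      for (int i = 0; i < size; i++) {
        int d = buffer[i];
        bits[d >>> 6] |= 1L << d;
      }
      buffer = null;
    }

    /** Counting is extra work: popcount the bitset, or sort and dedupe the buffer. */
    int cardinality() {
      int count = 0;
      if (bits != null) {
        for (long word : bits) {
          count += Long.bitCount(word);
        }
        return count;
      }
      int[] copy = Arrays.copyOf(buffer, size);
      Arrays.sort(copy);
      for (int i = 0; i < copy.length; i++) {
        if (i == 0 || copy[i] != copy[i - 1]) {
          count++;
        }
      }
      return count;
    }
  }

  /** Always a bitset sized to maxDoc, so add() is one unconditional store. */
  static final class BitSetCollector {
    final long[] bits;

    BitSetCollector(int maxDoc) {
      bits = new long[(maxDoc + 63) >>> 6];
    }

    void add(int doc) {
      bits[doc >>> 6] |= 1L << doc;
    }

    boolean get(int doc) {
      return (bits[doc >>> 6] & (1L << doc)) != 0;
    }
  }

  public static void main(String[] args) {
    int maxDoc = 1 << 16;
    AdaptiveCollector adaptive = new AdaptiveCollector(maxDoc);
    BitSetCollector direct = new BitSetCollector(maxDoc);
    // Simulate a dense range query that matches every third document.
    for (int doc = 0; doc < maxDoc; doc += 3) {
      adaptive.add(doc);
      direct.add(doc);
    }
    System.out.println(adaptive.cardinality()); // prints 21846
  }
}
```

For a range query that matches a large fraction of the index, the direct bitset avoids both the per-add branching and the explicit count. The issue suggests that the count could come from index statistics instead; that part is not modeled in this sketch.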