[ https://issues.apache.org/jira/browse/LUCENE-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588903#comment-15588903 ]
Michael McCandless commented on LUCENE-7462:
--------------------------------------------

Sorry, that was {{wikimediumall}} that I ran, 20 JVM iters, multiple iters per JVM, multiple concurrent queries, etc.

> Faster search APIs for doc values
> ---------------------------------
>
> Key: LUCENE-7462
> URL: https://issues.apache.org/jira/browse/LUCENE-7462
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: master (7.0)
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7462-advanceExact.patch
>
>
> While the iterator API helps deal with sparse doc values more efficiently, it
> also makes search-time operations more costly. For instance, the old
> random-access API made it possible to compute facets on a given segment
> without any conditionals, by just incrementing the counter at index
> {{ordinal+1}}, while the new API requires advancing the iterator if necessary
> and then checking whether it is exactly on the right document or not.
> Since it is very common for fields to exist across most documents, I suspect
> codecs will keep an internal structure that is similar to the current codec
> in the dense case, by having a dense representation of the data and just
> making the iterator skip over the minority of documents that do not have a
> value.
> I suggest that we add APIs that make things cheaper at search time. For
> instance in the case of SORTED doc values, it could look like
> {{LegacySortedDocValues}} with the additional restriction that documents can
> only be consumed in order. Codecs that can implement this API efficiently
> would hide it behind a {{SortedDocValues}} adapter, and then at search time
> facets and comparators (which liked the {{LegacySortedDocValues}} API better)
> would either unwrap or hide the SortedDocValues they got behind a more
> random-access API (which would only happen in the truly sparse case if the
> codec optimizes the dense case).
> One challenge is that we already use the same idea for hiding single-valued
> impls behind multi-valued impls, so we would need to enforce the order in
> which the wrapping needs to happen. At first sight, it seems that it would be
> best to do the single-value-behind-multi-value-API wrapping above the
> random-access-behind-iterator-API wrapping. The complexity of
> wrapping/unwrapping in the right order could be contained in the
> {{DocValues}} helper class.
> I think this change would also simplify search-time consumption of doc
> values, which currently needs to spend several lines of code positioning the
> iterator every time it needs to do something interesting with doc values.
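
To make the cost difference described in the issue concrete, here is a rough, self-contained sketch of per-segment facet counting done both ways. It is a toy model under stated assumptions, not Lucene code: {{RandomAccessOrds}}, {{IteratorOrds}}, {{FacetCountSketch}} and {{asIterator}} are hypothetical names invented for illustration, missing values are assumed to be encoded as ordinal -1, and hits are assumed to arrive in increasing doc order. It does not model what the attached patch actually does.

```java
import java.util.Arrays;

/** Toy model only: none of these types are Lucene classes. */
final class FacetCountSketch {

    /** Hypothetical random-access view: ordinal for any doc, -1 when the doc has no value. */
    interface RandomAccessOrds {
        int getOrd(int docID);
    }

    /** Hypothetical forward-only view, loosely modeled on an iterator-style API. */
    interface IteratorOrds {
        int NO_MORE_DOCS = Integer.MAX_VALUE;
        int docID();              // current doc, -1 before the first advance
        int advance(int target);  // move to the first doc >= target that has a value
        int ordValue();           // ordinal of the current doc
    }

    /** Random-access counting: a single unconditional increment per hit. */
    static int[] countRandomAccess(RandomAccessOrds ords, int[] hits, int ordCount) {
        int[] counts = new int[ordCount + 1]; // slot 0 collects docs without a value
        for (int doc : hits) {
            counts[ords.getOrd(doc) + 1]++;
        }
        return counts;
    }

    /** Iterator counting: advance if behind, then check that we landed on the hit. */
    static int[] countIterator(IteratorOrds ords, int[] hits, int ordCount) {
        int[] counts = new int[ordCount + 1];
        for (int doc : hits) { // hits must be in increasing doc order
            int cur = ords.docID();
            if (cur < doc) {
                cur = ords.advance(doc);
            }
            if (cur == doc) {
                counts[ords.ordValue() + 1]++;
            } else {
                counts[0]++; // iterator skipped past this doc, so it has no value
            }
        }
        return counts;
    }

    /** Exposes a dense ordinal array (-1 = missing) through the iterator view. */
    static IteratorOrds asIterator(int[] denseOrds) {
        return new IteratorOrds() {
            private int doc = -1;

            public int docID() {
                return doc;
            }

            public int advance(int target) {
                doc = target;
                while (doc < denseOrds.length && denseOrds[doc] == -1) {
                    doc++; // skip the minority of docs without a value
                }
                if (doc >= denseOrds.length) {
                    doc = NO_MORE_DOCS;
                }
                return doc;
            }

            public int ordValue() {
                return denseOrds[doc];
            }
        };
    }

    public static void main(String[] args) {
        int[] dense = {0, -1, 2, 1, -1, 2};
        int[] hits = {0, 1, 2, 4, 5};
        System.out.println(Arrays.toString(countRandomAccess(d -> dense[d], hits, 3)));
        System.out.println(Arrays.toString(countIterator(asIterator(dense), hits, 3)));
    }
}
```

Both methods produce the same counts; the difference is that the iterator loop pays for two comparisons per hit, while the random-access loop pays for none. In this model, a consumer that could unwrap the adapter and reach the dense array directly would get the first loop back for dense fields.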