[ 
https://issues.apache.org/jira/browse/LUCENE-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588903#comment-15588903
 ] 

Michael McCandless commented on LUCENE-7462:
--------------------------------------------

Sorry, that was {{wikimediumall}} that I ran, 20 JVM iters, multiple iters per 
JVM, multiple concurrent queries, etc.

> Faster search APIs for doc values
> ---------------------------------
>
>                 Key: LUCENE-7462
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7462
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: master (7.0)
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7462-advanceExact.patch
>
>
> While the iterator API helps deal with sparse doc values more efficiently, it 
> also makes search-time operations more costly. For instance, the old 
> random-access API allowed to compute facets on a given segment without any 
> conditionals, by just incrementing the counter at index {{ordinal+1}} while 
> the new API requires to advance the iterator if necessary and then check 
> whether it is exactly on the right document or not.
> Since it is very common for fields to exist across most documents, I suspect 
> codecs will keep an internal structure that is similar to the current codec 
> in the dense case, by having a dense representation of the data and just 
> making the iterator skip over the minority of documents that do not have a 
> value.
> I suggest that we add APIs that make things cheaper at search time. For 
> instance in the case of SORTED doc values, it could look like 
> {{LegacySortedDocValues}} with the additional restriction that documents can 
> only be consumed in order. Codecs that can implement this API efficiently 
> would hide it behind a {{SortedDocValues}} adapter, and then at search time 
> facets and comparators (which liked the {{LegacySortedDocValues}} API better) 
> would either unwrap or hide the SortedDocValues they got behind a more 
> random-access API (which would only happen in the truly sparse case if the 
> codec optimizes the dense case).
> One challenge is that we already use the same idea for hiding single-valued 
> impls behind multi-valued impls, so we would need to enforce the order in 
> which the wrapping needs to happen. At first sight, it seems that it would be 
> best to do the single-value-behind-multi-value-API wrapping above the 
> random-access-behind-iterator-API wrapping. The complexity of 
> wrapping/unwrapping in the right order could be contained in the 
> {{DocValues}} helper class.
> I think this change would also simplify search-time consumption of doc 
> values, which currently needs to spend several lines of code positioning the 
> iterator everytime it needs to do something interesting with doc values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to