[ https://issues.apache.org/jira/browse/LUCENE-7460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517118#comment-15517118 ]

David Smiley commented on LUCENE-7460:
--------------------------------------

+1 to random access within a document.

> Should SortedNumericDocValues expose a per-document random-access API?
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-7460
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7460
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Sorted numerics used to expose a per-document random-access API so that 
> accessing the median or max element would be cheap. The new 
> SortedNumericDocValues still exposes the number of values a document has, but 
> the only way to read values is to call {{nextValue}}, which forces reading all 
> values just to get the max value.
> For instance, {{SortedNumericSelector.MAX}} does the following in master (the 
> important part is the for-loop):
> {code}
>     private void setValue() throws IOException {
>       int count = in.docValueCount();
>       for(int i=0;i<count;i++) {
>         value = in.nextValue();
>       }
>     }
>     @Override
>     public int nextDoc() throws IOException {
>       int docID = in.nextDoc();
>       if (docID != NO_MORE_DOCS) {
>         setValue();
>       }
>       return docID;
>     }
> {code}
> while it used to simply look up the value at index {{count-1}} in 6.x:
> {code}
>     @Override
>     public long get(int docID) {
>       in.setDocument(docID);
>       final int count = in.count();
>       if (count == 0) {
>         return 0; // missing
>       } else {
>         return in.valueAt(count-1);
>       }
>     }
> {code}
> This could be a conscious decision, since a sequential API gives the codec 
> more opportunities to compress efficiently, but on the other hand it 
> prevents sorting by max or median values from being efficient.
> On my end I have a preference for the random-access API.
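
To make the trade-off in the description concrete, here is a minimal, self-contained sketch of the two API shapes. The interfaces and helper names below (SequentialValues, RandomAccessValues, DocValuesAccessSketch) are hypothetical stand-ins written for illustration, not Lucene classes. They only mirror the method names quoted in the issue. Values are assumed to be sorted in ascending order within a document, and the "median" shown is the lower middle value, which is just one possible convention.

```java
// Hypothetical stand-ins for the two API shapes discussed in the issue.
// These are NOT Lucene classes; they only mimic the quoted method names.
final class DocValuesAccessSketch {

  /** Iterator-style access: the current document's values can only be read in order. */
  interface SequentialValues {
    int docValueCount();
    long nextValue();
  }

  /** Random-access style: any value of the current document can be read by index. */
  interface RandomAccessValues {
    int count();
    long valueAt(int index);
  }

  /** Wraps one document's values (assumed sorted ascending) behind the sequential shape. */
  static SequentialValues sequential(long[] sortedValues) {
    return new SequentialValues() {
      private int next = 0;
      @Override public int docValueCount() { return sortedValues.length; }
      @Override public long nextValue() { return sortedValues[next++]; }
    };
  }

  /** Wraps one document's values (assumed sorted ascending) behind the random-access shape. */
  static RandomAccessValues randomAccess(long[] sortedValues) {
    return new RandomAccessValues() {
      @Override public int count() { return sortedValues.length; }
      @Override public long valueAt(int index) { return sortedValues[index]; }
    };
  }

  /** Max via the sequential shape: every value has to be consumed, so O(count). */
  static long maxSequential(SequentialValues in) {
    int count = in.docValueCount();
    long value = 0; // returned as-is when the document has no values
    for (int i = 0; i < count; i++) {
      value = in.nextValue();
    }
    return value;
  }

  /** Max via the random-access shape: a single lookup at index count-1. */
  static long maxRandomAccess(RandomAccessValues in) {
    int count = in.count();
    return count == 0 ? 0 : in.valueAt(count - 1);
  }

  /** Lower middle value via the random-access shape: also a single lookup. */
  static long medianRandomAccess(RandomAccessValues in) {
    int count = in.count();
    return count == 0 ? 0 : in.valueAt((count - 1) >>> 1);
  }
}
```

For a document holding the values 2, 5, 9 and 14, both max helpers return 14, but the sequential one calls nextValue four times while the random-access one does one lookup. The median helper returns 5 with a single lookup, whereas a sequential equivalent would have to read at least up to the middle value.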



