[jira] [Commented] (LUCENE-7460) Should SortedNumericDocValues expose a per-document random-access API?

Michael McCandless (JIRA) Fri, 23 Sep 2016 11:48:31 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-7460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517240#comment-15517240
 ]


Michael McCandless commented on LUCENE-7460:
--------------------------------------------

I'm not convinced this is a good idea.

E.g. Lucene's postings don't provide random access to each term occurrence 
within one document: you have to iterate them if you want to see them.

And I think it's somewhat abusive to store such a huge number of numeric values 
in a single document that this API change would matter.  Apps that really 
need a fast way to compute the min and max could index the min and max 
themselves?
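The suggestion to index the min and max directly can be sketched as follows. This is a plain-Java model, not Lucene code: HashMaps stand in for per-document fields, and the names MinMaxIndex, addDocument, maxOrMissing and minOrMissing are hypothetical. In an actual Lucene index this would presumably mean adding one extra single-valued NumericDocValuesField per statistic next to the multi-valued field.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Plain-Java model, NOT Lucene code: HashMaps stand in for per-document
// fields. All names here are hypothetical.
public class MinMaxIndex {

    // docID -> all values of the multi-valued field, sorted ascending
    private final Map<Integer, long[]> values = new HashMap<>();
    // docID -> precomputed min / max, one value per document
    private final Map<Integer, Long> minField = new HashMap<>();
    private final Map<Integer, Long> maxField = new HashMap<>();

    // Index-time work: sort once and record min/max next to the values.
    public void addDocument(int docID, long... docValues) {
        long[] sorted = docValues.clone();
        Arrays.sort(sorted);
        values.put(docID, sorted);
        if (sorted.length > 0) {
            minField.put(docID, sorted[0]);
            maxField.put(docID, sorted[sorted.length - 1]);
        }
    }

    // Search-time work: one lookup, no matter how many values the doc has.
    public long maxOrMissing(int docID, long missing) {
        return maxField.getOrDefault(docID, missing);
    }

    public long minOrMissing(int docID, long missing) {
        return minField.getOrDefault(docID, missing);
    }

    public static void main(String[] args) {
        MinMaxIndex index = new MinMaxIndex();
        index.addDocument(0, 42, 7, 19);
        index.addDocument(1); // a document with no values
        System.out.println(index.maxOrMissing(0, 0)); // 42
        System.out.println(index.minOrMissing(0, 0)); // 7
        System.out.println(index.maxOrMissing(1, 0)); // 0 (missing)
    }
}
```

This trades a little index-time work and storage for constant-time reads of those two statistics; it does not help with other order statistics such as the median.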

It's important that we keep our search-time APIs in check.

> Should SortedNumericDocValues expose a per-document random-access API?
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-7460
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7460
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Sorted numerics used to expose a per-document random-access API so that 
> accessing the median or max element would be cheap. The new 
> SortedNumericDocValues still exposes the number of values a document has, but 
> the only way to read values is to use {{nextValue}}, which forces reading all 
> values just to get the max value.
> For instance, {{SortedNumericSelector.MAX}} does the following in master (the 
> important part is the for-loop):
> {code}
>     private void setValue() throws IOException {
>       int count = in.docValueCount();
>       // values come back in sorted order, so the last one read is the max
>       for (int i = 0; i < count; i++) {
>         value = in.nextValue();
>       }
>     }
>     @Override
>     public int nextDoc() throws IOException {
>       int docID = in.nextDoc();
>       if (docID != NO_MORE_DOCS) {
>         setValue();
>       }
>       return docID;
>     }
> {code}
> while it used to simply look up the value at index {{count-1}} in 6.x:
> {code}
>     @Override
>     public long get(int docID) {
>       in.setDocument(docID);
>       final int count = in.count();
>       if (count == 0) {
>         return 0; // missing
>       } else {
>         return in.valueAt(count-1); // values are sorted: the last index holds the max
>       }
>     }
> {code}
> This could be a conscious decision, since a sequential API gives the codec more 
> opportunities to compress efficiently, but on the other hand 
> this API prevents sorting by max or median values from being efficient.
> On my end, I have a preference for the random-access API.
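The trade-off in the last paragraph can be illustrated with a toy model in plain Java. It assumes, for illustration only, that a document's sorted values are stored as deltas; this is not a description of Lucene's actual doc-values format. The class DeltaEncodedValues and its inner Cursor are hypothetical; the method names nextValue, docValueCount and valueAt are borrowed from the snippets above only to make the correspondence obvious.

```java
import java.util.Arrays;

// Toy model, NOT Lucene's codec: one document's values, sorted, then stored
// as deltas (first value as-is, then gaps between neighbours).
public class DeltaEncodedValues {

    private final long[] deltas;

    public DeltaEncodedValues(long... values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        deltas = new long[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            deltas[i] = (i == 0) ? sorted[0] : sorted[i] - sorted[i - 1];
        }
    }

    public int docValueCount() {
        return deltas.length;
    }

    // Sequential access in the style of nextValue(): each call decodes one delta.
    private final class Cursor {
        private int upto = 0;
        private long current = 0;

        long nextValue() {
            current += deltas[upto++];
            return current;
        }
    }

    // MAX through sequential access: every value is decoded to reach the last one.
    public long maxSequential() {
        Cursor cursor = new Cursor();
        long value = 0; // 0 = missing, mirroring the 6.x snippet
        for (int i = 0; i < docValueCount(); i++) {
            value = cursor.nextValue();
        }
        return value;
    }

    // Lower median through sequential access: only the first half is decoded.
    public long medianSequential() {
        int count = docValueCount();
        if (count == 0) {
            return 0; // missing
        }
        Cursor cursor = new Cursor();
        long value = 0;
        for (int i = 0; i <= (count - 1) / 2; i++) {
            value = cursor.nextValue();
        }
        return value;
    }

    // Random access in the style of valueAt(): on a delta layout this is still
    // an O(index) prefix sum. Only storing absolute values (less compact)
    // would make valueAt(count - 1) a constant-time read.
    public long valueAt(int index) {
        long sum = 0;
        for (int i = 0; i <= index; i++) {
            sum += deltas[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        DeltaEncodedValues doc = new DeltaEncodedValues(1000, 1003, 1001, 1010);
        System.out.println(doc.maxSequential());                  // 1010
        System.out.println(doc.medianSequential());               // 1001
        System.out.println(doc.valueAt(doc.docValueCount() - 1)); // 1010
    }
}
```

Under this assumption, the small gaps (here 1, 2 and 7) are what a codec could pack into few bits, while random access either pays for a prefix sum or needs a less compact layout that stores absolute values.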



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
