[
https://issues.apache.org/jira/browse/LUCENE-7460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517240#comment-15517240
]
Michael McCandless commented on LUCENE-7460:
--------------------------------------------
I'm not convinced this is a good idea.
E.g. Lucene's postings don't provide random access to each term occurrence
within one document: you have to iterate them if you want to see them.
And I think it's somewhat abusive to store such a huge number of numeric values
in a single document that this API change would matter. For apps that really
need a fast way to compute the min and max, they can index the min and max
themselves?
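
A minimal sketch of that suggestion, assuming the application has all of a document's values in hand at index time. The class name, the helper `minMaxFields`, and the `_min` / `_max` field-name suffixes are hypothetical and are not part of Lucene; the point is only that the min and max can be computed once, up front, and stored as two ordinary single-valued fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: none of these names exist in Lucene.
class MinMaxAtIndexTime {

    /**
     * Computes the min and max of one document's values and returns them
     * keyed by the (assumed) names of the extra fields to index.
     * Returns an empty map when the document has no values.
     */
    static Map<String, Long> minMaxFields(String field, long[] values) {
        Map<String, Long> fields = new LinkedHashMap<>();
        if (values.length == 0) {
            return fields; // nothing to index for this document
        }
        long min = values[0];
        long max = values[0];
        for (long v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        fields.put(field + "_min", min);
        fields.put(field + "_max", max);
        return fields;
    }

    public static void main(String[] args) {
        // prints {price_min=7, price_max=42}
        System.out.println(minMaxFields("price", new long[] {42, 7, 19}));
    }
}
```

In a Lucene application each entry would presumably be added to the document as its own single-valued numeric doc values field (for example with `NumericDocValuesField`), so sorting by min or max reads one value per document instead of iterating all of them.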
It's important we keep our search-time APIs in check.
> Should SortedNumericDocValues expose a per-document random-access API?
> ----------------------------------------------------------------------
>
> Key: LUCENE-7460
> URL: https://issues.apache.org/jira/browse/LUCENE-7460
> Project: Lucene - Core
> Issue Type: Wish
> Reporter: Adrien Grand
> Priority: Minor
>
> Sorted numerics used to expose a per-document random-access API so that
> accessing the median or max element would be cheap. The new
> SortedNumericDocValues still exposes the number of values a document has, but
> the only way to read values is to use {{nextValue}}, which forces callers to
> read all values in order to reach the max value.
> For instance, {{SortedNumericSelector.MAX}} does the following in master (the
> important part is the for-loop):
> {code}
> private void setValue() throws IOException {
>   int count = in.docValueCount();
>   for(int i=0;i<count;i++) {
>     value = in.nextValue();
>   }
> }
> @Override
> public int nextDoc() throws IOException {
>   int docID = in.nextDoc();
>   if (docID != NO_MORE_DOCS) {
>     setValue();
>   }
>   return docID;
> }
> {code}
> while it used to simply look up the value at index {{count-1}} in 6.x:
> {code}
> @Override
> public long get(int docID) {
>   in.setDocument(docID);
>   final int count = in.count();
>   if (count == 0) {
>     return 0; // missing
>   } else {
>     return in.valueAt(count-1);
>   }
> }
> {code}
> This could be a conscious decision since a sequential API gives more
> opportunities to the codec to compress efficiently, but on the other hand
> this API prevents sorting by max or median values from being efficient.
> On my end I have a preference for the random-access API.