[ https://issues.apache.org/jira/browse/LUCENE-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410605#comment-15410605 ]
Michael McCandless commented on LUCENE-7407:
--------------------------------------------

bq. One minor suggestion I have would be to make the API return a DocIdSetIterator (like Scorer) rather than extend it.

Hmm, this seems somewhat awkward? It would mean that to iterate doc values, you would need to hold onto two classes: the iterator you pulled, and the parent class you pulled it from. I think it's also odd that the iterator is altering state in another class.

I realize we did this for performance reasons for {{Scorer}}, but it's not great to compromise the API so much for performance unless the performance change is really drastic. Could that be the case here?

> Explore switching doc values to an iterator API
> -----------------------------------------------
>
>                 Key: LUCENE-7407
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7407
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>              Labels: docValues
>
> I think it could be compelling if we restricted doc values to use an
> iterator API at read time, instead of the more general random access
> API we have today:
> * It would make doc values disk usage more of a "you pay for what
>   you actually use", like postings, which is a compelling
>   reduction for sparse usage.
> * I think codecs could compress better and maybe speed up decoding
>   of doc values, even in the non-sparse case, since the read-time
>   API is more restrictive "forward only" instead of random access.
> * We could remove {{getDocsWithField}} entirely, since that's
>   implicit in the iteration, and the awkward "return 0 if the
>   document didn't have this field" would go away.
> * We can remove the annoying thread locals we must make today in
>   {{CodecReader}}, and close the trappy "I accidentally shared a
>   single XXXDocValues instance across threads", since an iterator is
>   inherently "use once".
> * We could maybe leverage the numerous optimizations we've done for
>   postings over time, since the two problems ("iterate over doc ids
>   and store something interesting for each") are very similar.
>
> This idea has come up many times in the past, e.g. LUCENE-7253 is a
> recent example, and very early iterations of doc values started with
> exactly this ;)
>
> However, it's a truly enormous change, likely 7.0 only. Or maybe we
> could have the new iterator APIs also ported to 6.x side by side with
> the deprecated existing random-access APIs.
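
To make the two API shapes from the comment concrete, here is a minimal, self-contained Java sketch. Every name in it ({{DocIterator}}, {{NumericValuesIterator}}, {{NumericValuesSource}}, {{sumExtending}}, {{sumReturning}}) is hypothetical and chosen only for illustration; none of this is Lucene's actual API, and the in-memory arrays stand in for whatever a codec would really decode. Shape A has the values class extend the iterator, so the caller holds one object. Shape B has the values class return an iterator, so the caller holds two objects and the iterator changes state in its parent.

```java
class DocValuesShapes {
    // Sentinel returned once iteration is exhausted (assumption for this sketch).
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical forward-only doc id iterator. */
    interface DocIterator {
        int docID();
        int nextDoc();
    }

    /** Shape A: the values class IS the iterator; one object to hold. */
    static final class NumericValuesIterator implements DocIterator {
        private final int[] docs;     // sorted doc ids that have a value
        private final long[] values;  // values[i] belongs to docs[i]
        private int idx = -1;

        NumericValuesIterator(int[] docs, long[] values) {
            this.docs = docs;
            this.values = values;
        }

        @Override public int docID() {
            if (idx < 0) return -1;
            return idx >= docs.length ? NO_MORE_DOCS : docs[idx];
        }

        @Override public int nextDoc() {
            if (idx < docs.length) idx++;
            return docID();
        }

        /** Value for the current doc; only valid while positioned on a doc. */
        long longValue() { return values[idx]; }
    }

    /** Shape B: the values class RETURNS an iterator; two objects to hold. */
    static final class NumericValuesSource {
        private final int[] docs;
        private final long[] values;
        private int idx = -1;

        // Advancing this iterator mutates the enclosing object's position.
        private final DocIterator it = new DocIterator() {
            @Override public int docID() {
                if (idx < 0) return -1;
                return idx >= docs.length ? NO_MORE_DOCS : docs[idx];
            }
            @Override public int nextDoc() {
                if (idx < docs.length) idx++;
                return docID();
            }
        };

        NumericValuesSource(int[] docs, long[] values) {
            this.docs = docs;
            this.values = values;
        }

        DocIterator iterator() { return it; }

        long longValue() { return values[idx]; }
    }

    /** Caller code for shape A: a single reference does everything. */
    static long sumExtending(int[] docs, long[] values) {
        NumericValuesIterator dv = new NumericValuesIterator(docs, values);
        long sum = 0;
        while (dv.nextDoc() != NO_MORE_DOCS) {
            sum += dv.longValue();
        }
        return sum;
    }

    /** Caller code for shape B: must keep both the parent and its iterator. */
    static long sumReturning(int[] docs, long[] values) {
        NumericValuesSource dv = new NumericValuesSource(docs, values);
        DocIterator it = dv.iterator();
        long sum = 0;
        while (it.nextDoc() != NO_MORE_DOCS) {
            sum += dv.longValue();  // value read from the parent, not the iterator
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] docs = {1, 4, 7};          // sparse: only three docs have a value
        long[] values = {10L, 20L, 30L};
        System.out.println(sumExtending(docs, values));
        System.out.println(sumReturning(docs, values));
    }
}
```

Both loops visit only the documents that have a value, which is the "you pay for what you actually use" property from the issue description, and neither needs a separate docs-with-field check. The difference is purely in how many objects the caller has to juggle, which is the awkwardness the comment points at.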