[ https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277866#comment-15277866 ]
Michael McCandless commented on LUCENE-6766: -------------------------------------------- Thanks [~jpountz]! I folded in most of your feedback, except: bq. The only thing I am slightly worried about is how all optimized bulk mergers need to opt out if a sort order is configured. I am wondering if our base consumer classes should have two merge methods so that you would not have to check the sort order when overriding the method for regular merges? This is just an idea, it has drawbacks too since there would not be a single entry point to merging anymore and we would need another method in our API, but I'm suggesting it anyway hoping that it might give somebody a better idea. I think it's OK to keep a single merge method? This merge method already must deal with wild per-segment variabilities, e.g. different fields across segments, some have deletions some don't, etc., so I don't think we need to single out "has an index sort" into a separate method? Also, implementing merge methods is really an uber-expert thing to do, so such devs should be up to the task of handling an incoming index sort, I think. bq. I think this is buggy since it ignores null sorts at the beginning of the list but not at the end, Nice catch! I added test showing the bug, and then fixed it (pushed). bq. Let's remove it for now and later see whether this is something that could be added back? OK I did that. I think at least there is a simple solution for doc-block users: just index a doc values field with the "id" for each block, and then sort on that. bq. but leveraging index sorting at search time looks like a big task to me so maybe we should defer it to a follow-up issue like sorting on flush? I did move the early terminating to core, and I do think going forward we should make it easier to use this ... it should somehow be the default, and not a "make your own Collector" situation ... As Rob has pointed out, even today (before promoting index sorting) we could early-terminate in cases where the query is sorting on index order, such as collecting first N hits for a filter. But I agree we should do this separately. I will open follow-on issues for "can we sort on flush too" and "searching should take advantage of index sort by default". bq. Should DocIdMerger.Sub.nextDoc throw an IOException? I tried this out, but it started to sprawl: the doc values all wrap `DocIdMerger` under a java `Iterator` which cannot throw `IOException` ... I could move the `try/except` up there, but there are many places I'd have to move this to, so leaving it where it is seemed like the lesser evil. > Make index sorting a first-class citizen > ---------------------------------------- > > Key: LUCENE-6766 > URL: https://issues.apache.org/jira/browse/LUCENE-6766 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Adrien Grand > Priority: Minor > Attachments: LUCENE-6766.patch, LUCENE-6766.patch > > > Today index sorting is a very expert feature. You need to use a custom merge > policy, custom collectors, etc. I would like to explore making it a > first-class citizen so that: > - the sort order could be configured on IndexWriterConfig > - segments would record the sort order that was used to write them > - IndexSearcher could automatically early terminate when computing top docs > on a sort order that is a prefix of the sort order of a segment (and if the > user is not interested in totalHits). -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org