[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277866#comment-15277866
 ] 

Michael McCandless commented on LUCENE-6766:
--------------------------------------------

Thanks [~jpountz]!

I folded in most of your feedback, except:

bq. The only thing I am slightly worried about is how all optimized bulk 
mergers need to opt out if a sort order is configured. I am wondering if our 
base consumer classes should have two merge methods so that you would not have 
to check the sort order when overriding the method for regular merges? This is 
just an idea, it has drawbacks too since there would not be a single entry 
point to merging anymore and we would need another method in our API, but I'm 
suggesting it anyway hoping that it might give somebody a better idea.

I think it's OK to keep a single merge method?  This merge method
already must deal with wild per-segment variabilities, e.g. different
fields across segments, some have deletions some don't, etc., so I
don't think we need to single out "has an index sort" into a separate
method?

Also, implementing merge methods is really an uber-expert thing to
do, so such devs should be up to the task of handling an incoming
index sort, I think.

bq. I think this is buggy since it ignores null sorts at the beginning of the 
list but not at the end,

Nice catch!  I added test showing the bug, and then fixed it (pushed).

bq. Let's remove it for now and later see whether this is something that could 
be added back?

OK I did that.  I think at least there is a simple solution for doc-block
users: just index a doc values field with the "id" for each block, and
then sort on that.

bq.  but leveraging index sorting at search time looks like a big task to me so 
maybe we should defer it to a follow-up issue like sorting on flush?

I did move the early terminating to core, and I do think going forward
we should make it easier to use this ... it should somehow be the
default, and not a "make your own Collector" situation ...

As Rob has pointed out, even today (before promoting index sorting)
we could early-terminate in cases where the query is sorting on
index order, such as collecting first N hits for a filter.

But I agree we should do this separately.  I will open follow-on issues
for "can we sort on flush too" and "searching should take advantage
of index sort by default".

bq. Should DocIdMerger.Sub.nextDoc throw an IOException?

I tried this out, but it started to sprawl: the doc values all wrap
`DocIdMerger` under a java `Iterator` which cannot throw `IOException`
... I could move the `try/except` up there, but there are many places
I'd have to move this to, so leaving it where it is seemed like the
lesser evil.


> Make index sorting a first-class citizen
> ----------------------------------------
>
>                 Key: LUCENE-6766
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6766
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6766.patch, LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to