[ https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280234#comment-15280234 ]
Michael McCandless commented on LUCENE-6766:
--------------------------------------------
I tried sorting with the 10M Wikipedia index.
Sort by last-modified-date:
{noformat}
Indexer: indexing done (900389 msec); total 10000000 docs
Indexer: force merge done (took 134020 msec)
{noformat}
Sort by title:
{noformat}
Indexer: indexing done (907923 msec); total 10000000 docs
Indexer: force merge done (took 135041 msec)
{noformat}
vs. no sorting:
{noformat}
Indexer: indexing done (702761 msec); total 10000000 docs
Indexer: force merge done (took 65726 msec)
{noformat}
Index size was about the same in all cases, ~3.1 GB.
I also confirmed CheckIndex verifies the sorted indices are OK (it checks the
sort order).
So indexing is ~28% slower with sorting, and the final force merge takes
roughly twice as long... but this uses a single thread,
SerialMergeScheduler, and a small IW buffer, so it's very merge-heavy.
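For anyone who wants to re-derive the overhead from the timings above, here is a small sketch of the arithmetic. The class and method names are made up for illustration; only the millisecond values come from the indexer output. It works out to roughly 28-29% for the indexing phase alone, a bit over 100% for the force merge, and about 35% when the two phases are added together.

```java
class SortOverhead {
    /** Percentage by which sortedMsec exceeds baselineMsec. */
    static double slowdownPercent(long baselineMsec, long sortedMsec) {
        return 100.0 * (sortedMsec - baselineMsec) / baselineMsec;
    }

    public static void main(String[] args) {
        // Timings copied from the indexer output (msec).
        long baseIndex = 702761L, baseMerge = 65726L;     // no sorting
        long dateIndex = 900389L, dateMerge = 134020L;    // sort by last-modified-date
        long titleIndex = 907923L, titleMerge = 135041L;  // sort by title

        report("date", baseIndex, baseMerge, dateIndex, dateMerge);
        report("title", baseIndex, baseMerge, titleIndex, titleMerge);
    }

    /** Prints the slowdown for indexing, force merge, and both phases combined. */
    static void report(String label, long baseIndex, long baseMerge,
                       long sortedIndex, long sortedMerge) {
        System.out.printf("%s: indexing +%.1f%%, force merge +%.1f%%, total +%.1f%%%n",
                label,
                slowdownPercent(baseIndex, sortedIndex),
                slowdownPercent(baseMerge, sortedMerge),
                slowdownPercent(baseIndex + baseMerge, sortedIndex + sortedMerge));
    }
}
```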
> Make index sorting a first-class citizen
> ----------------------------------------
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-6766.patch, LUCENE-6766.patch, LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge
> policy, custom collectors, etc. I would like to explore making it a
> first-class citizen so that:
> - the sort order could be configured on IndexWriterConfig
> - segments would record the sort order that was used to write them
> - IndexSearcher could automatically early terminate when computing top docs
> on a sort order that is a prefix of the sort order of a segment (and if the
> user is not interested in totalHits).
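The early-termination idea in the last bullet of the description can be illustrated with a toy model. This is not Lucene code: a "segment" here is just an int array already stored in the requested sort order, and the class, method, and field names are hypothetical. Once k matching docs have been collected, every remaining doc sorts after them, so the scan can stop; the price is that the total hit count is never computed, which is why the description adds the totalHits caveat.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class EarlyTerminationSketch {
    /** Top hits plus the number of docs actually examined. */
    static final class Result {
        final List<Integer> hits = new ArrayList<>();
        int visited = 0;
    }

    /**
     * Collects the first k matching values from a segment whose docs are
     * already in the requested sort order (ascending here).
     */
    static Result topK(int[] sortedSegment, IntPredicate matches, int k) {
        Result result = new Result();
        for (int value : sortedSegment) {
            if (result.hits.size() >= k) {
                break; // early termination: later docs cannot beat the collected ones
            }
            result.visited++;
            if (matches.test(value)) {
                result.hits.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] segment = {1, 3, 4, 6, 8, 9, 12}; // sorted by the sort key
        Result r = topK(segment, v -> v % 2 == 0, 2);
        System.out.println("hits=" + r.hits + " visited=" + r.visited
                + " of " + segment.length);
    }
}
```

An unsorted segment would need a full scan with a priority queue to get the same top k, so the saving grows with segment size when k is small.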