[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-06-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325926#comment-15325926
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit 8bd27977dd993d4443be359a6f7ec92c7f012247 in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8bd2797 ]

LUCENE-6766: add changes


> Make index sorting a first-class citizen
> 
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Fix For: master (7.0), 6.2
>
> Attachments: LUCENE-6766.patch, LUCENE-6766.patch, LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).
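
The early-termination idea in the last bullet can be illustrated with a small, library-free sketch. It does not use Lucene's classes or reflect the committed patch: the class EarlyTerminationSketch, its Result type, and both method names are hypothetical, and a segment is modeled as a plain array of sort values. The assumption is that when a segment's documents are already stored in the requested sort order, the first N documents are the top N, so collection can stop there, at the cost of not knowing the total hit count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration only; not Lucene's API. */
final class EarlyTerminationSketch {

    /** Top-N values (ascending) plus how many docs had to be examined. */
    static final class Result {
        final List<Long> topValues;
        final int docsVisited;

        Result(List<Long> topValues, int docsVisited) {
            this.topValues = topValues;
            this.docsVisited = docsVisited;
        }
    }

    /**
     * Segment already sorted ascending by the sort key: the first n docs
     * are the top n, so the loop stops early. The total hit count is
     * never computed, which is why the caller must not need totalHits.
     */
    static Result topNSorted(long[] sortedSegment, int n) {
        List<Long> top = new ArrayList<>();
        int visited = 0;
        for (long value : sortedSegment) {
            if (top.size() == n) {
                break; // early termination
            }
            visited++;
            top.add(value);
        }
        return new Result(top, visited);
    }

    /**
     * Segment in arbitrary order: every doc must be visited. A max-heap
     * of size n keeps the n smallest values seen so far.
     */
    static Result topNUnsorted(long[] segment, int n) {
        PriorityQueue<Long> heap = new PriorityQueue<>(Comparator.reverseOrder());
        int visited = 0;
        for (long value : segment) {
            visited++;
            heap.offer(value);
            if (heap.size() > n) {
                heap.poll(); // drop the current largest
            }
        }
        List<Long> top = new ArrayList<>(heap);
        Collections.sort(top);
        return new Result(top, visited);
    }
}
```

Both methods return the same top values for the same data; the difference is only in how many documents are examined, which is where a sorted segment would save work.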



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-06-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325921#comment-15325921
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit 4740056f0987aef4eb727332d7ce9770964543c2 in lucene-solr's branch 
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4740056 ]

LUCENE-6766: fix parallel reader's detection of conflicting index sort





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-06-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325924#comment-15325924
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit a3270ac6e64012ec0a5b6864cdfcf190a1a36346 in lucene-solr's branch 
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a3270ac ]

LUCENE-6766: keep SortingMergePolicy for solr back-compat; fix Solr tests; fix 
precommit failures





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-06-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325919#comment-15325919
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit a4722befb3f878faa0a5ee9752ae21070c771cf2 in lucene-solr's branch 
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a4722be ]

LUCENE-6766: add deletions to random test





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-06-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325923#comment-15325923
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit 2f6cdea9a9ec3bb62cf0d111768969c2a6275276 in lucene-solr's branch 
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2f6cdea ]

LUCENE-6766: remove leftover sop





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-06-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325920#comment-15325920
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit 2703b827bf2316e8d39025666ed5f1d42ed70d64 in lucene-solr's branch 
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2703b82 ]

LUCENE-6766: resolve remaining nocommits; add more IW infoStream logging during 
merge





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-06-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325925#comment-15325925
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit ea26dd5855ec45dcdaa385dd240a6ef91aa1c4d9 in lucene-solr's branch 
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ea26dd5 ]

LUCENE-6766: finish 6.x backport





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-06-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325922#comment-15325922
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit 0dd65f6130dbcb1a9caae7963fed246c1068ebe0 in lucene-solr's branch 
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0dd65f6 ]

LUCENE-6766: more IW.infoStream logging around sorting; fix test bug





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-06-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325918#comment-15325918
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit 3010ffacafd5cc371f4d62413105294d0df37450 in lucene-solr's branch 
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3010ffa ]

LUCENE-6766: add another random test case; move early terminating collector to 
core





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-06-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325917#comment-15325917
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit c26bb87140eacbcdfa6c083a10714af275fe4ab6 in lucene-solr's branch 
refs/heads/branch_6x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c26bb87 ]

LUCENE-6766: simplify test case





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-13 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282602#comment-15282602
 ] 

Michael McCandless commented on LUCENE-6766:


I pushed this to master ... I will hold off on backporting to 6.x until we 
release 6.1, giving it time to bake.

I'll go open a bunch of follow-on issues now.




[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282594#comment-15282594
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit 5fb7413ccb9c690d3a59d7227b3cb194943290ef in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5fb7413 ]

LUCENE-6766: remove leftover sop





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282592#comment-15282592
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit e3ecc6a5361948c28679c7ac76161f167824e514 in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e3ecc6a ]

LUCENE-6766: merge master





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282597#comment-15282597
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit 9d5b834b09d4ff23e89755e5d1af407a2bd96c16 in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9d5b834 ]

LUCENE-6766: put Placeholder back so javadocs are OK; deprecate Lucene60Codec





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282588#comment-15282588
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit 87690f8b13b1def6c822ba36a42e4cb6939ab4c2 in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=87690f8 ]

LUCENE-6766: add another random test case; move early terminating collector to 
core





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282595#comment-15282595
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit 3cde9eb3d027b273a3c136e9eb284ae18f1824fe in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3cde9eb ]

LUCENE-6766: keep SortingMergePolicy for solr back-compat; fix Solr tests; fix 
precommit failures





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282593#comment-15282593
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit e283271aaf6da3033156f36b421d3241b5499d4e in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e283271 ]

LUCENE-6766: more IW.infoStream logging around sorting; fix test bug





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282596#comment-15282596
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit d715210467a4907ca34e7f0fe1a438908737894f in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=d715210 ]

LUCENE-6766: merged





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282590#comment-15282590
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit 1e82c13184621f6cefac35f8d10d8fe74d2a356c in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1e82c13 ]

LUCENE-6766: resolve remaining nocommits; add more IW infoStream logging during 
merge





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282591#comment-15282591
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit 8361de87becd64c8b217313877b996ac20167856 in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8361de8 ]

LUCENE-6766: fix parallel reader's detection of conflicting index sort





[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282589#comment-15282589
 ] 

ASF subversion and git services commented on LUCENE-6766:
-

Commit fa37241e784e0479da1637f863e07f1d909f40a9 in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=fa37241 ]

LUCENE-6766: add deletions to random test


> Make index sorting a first-class citizen
> 
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-6766.patch, LUCENE-6766.patch, LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).



[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280234#comment-15280234
 ] 

Michael McCandless commented on LUCENE-6766:


I tried sorting with the 10M wikipedia index.

Sort by last-modified-date:

{noformat}
  Indexer: indexing done (900389 msec); total 1000 docs
  Indexer: force merge done (took 134020 msec)
{noformat}
 
Sort by title:

{noformat}
  Indexer: indexing done (907923 msec); total 1000 docs
  Indexer: force merge done (took 135041 msec)
{noformat}
 
vs. no sorting:

{noformat}
  Indexer: indexing done (702761 msec); total 1000 docs
  Indexer: force merge done (took 65726 msec)
{noformat}
 
Index size was about the same in all cases, ~3.1 GB.

I also confirmed CheckIndex verifies the sorted indices are OK (it checks the 
sort order).

So indexing is ~28% slower with sorting, and the final force merge takes about 
twice as long... but this uses a single thread, SerialMergeScheduler, and a 
small IW buffer, so it's very merge-heavy.
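
For anyone who wants to redo the arithmetic, here is a small sketch that recomputes the slowdown from the timings above. The class and method names are made up for illustration; nothing here is part of Lucene or luceneutil.

```java
// Hypothetical helper (not part of Lucene or luceneutil): recomputes the
// slowdown percentages from the timings reported in this comment.
class SortSlowdown {
  // Percentage by which `sorted` exceeds `baseline`.
  static double slowdownPct(long sorted, long baseline) {
    return 100.0 * (sorted - baseline) / baseline;
  }

  public static void main(String[] args) {
    // All values are msec, copied from the {noformat} blocks above.
    long indexNoSort = 702761, mergeNoSort = 65726;
    long indexByDate = 900389, mergeByDate = 134020;
    long indexByTitle = 907923, mergeByTitle = 135041;

    System.out.printf("by date:  indexing +%.1f%%, force merge +%.1f%%%n",
        slowdownPct(indexByDate, indexNoSort),
        slowdownPct(mergeByDate, mergeNoSort));
    System.out.printf("by title: indexing +%.1f%%, force merge +%.1f%%%n",
        slowdownPct(indexByTitle, indexNoSort),
        slowdownPct(mergeByTitle, mergeNoSort));
  }
}
```

The ~28% figure is the indexing time alone; the force merge is reported separately because it is dominated by the extra merge-sort work.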


> Make index sorting a first-class citizen
> 
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-6766.patch, LUCENE-6766.patch, LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).



[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-11 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279829#comment-15279829
 ] 

Adrien Grand commented on LUCENE-6766:
--

+1

> Make index sorting a first-class citizen
> 
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-6766.patch, LUCENE-6766.patch, LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).



[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279810#comment-15279810
 ] 

Michael McCandless commented on LUCENE-6766:


I tested master vs patch indexing performance on luceneutil's "wikimedium10m" 
docs.  I ran indexing 5 times each.  This is just a "first do no harm test", 
i.e. in both cases I'm indexing without an index sort.

I use SerialMergeScheduler (SMS) and frequent flushing, so this is a very merge-heavy benchmark.

Master:

{noformat}
/l/logs/before0.log:Indexer: finished (675550 msec)
/l/logs/before1.log:Indexer: finished (671058 msec)
/l/logs/before2.log:Indexer: finished (683297 msec)
/l/logs/before3.log:Indexer: finished (670856 msec)
/l/logs/before4.log:Indexer: finished (671516 msec)
{noformat}

Patch:

{noformat}
/l/logs/after0.log:Indexer: finished (673302 msec)
/l/logs/after1.log:Indexer: finished (674855 msec)
/l/logs/after2.log:Indexer: finished (679655 msec)
/l/logs/after3.log:Indexer: finished (680151 msec)
/l/logs/after4.log:Indexer: finished (681921 msec)
{noformat}

Net/net I think any performance hit is very small, well within measurement 
noise.
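
A quick way to sanity-check the "within noise" reading is to compare the gap between the two means against the run-to-run spread of each set. The sketch below does that with the ten timings above; the class and method names are hypothetical.

```java
// Hypothetical helper (not part of luceneutil): compares the mean indexing
// time of the two sets of runs against the spread within each set.
class IndexingNoiseSketch {
  static double mean(long[] msec) {
    long sum = 0;
    for (long t : msec) {
      sum += t;
    }
    return (double) sum / msec.length;
  }

  // Difference between the slowest and the fastest run.
  static long spread(long[] msec) {
    long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
    for (long t : msec) {
      min = Math.min(min, t);
      max = Math.max(max, t);
    }
    return max - min;
  }

  public static void main(String[] args) {
    long[] master = {675550, 671058, 683297, 670856, 671516};
    long[] patch = {673302, 674855, 679655, 680151, 681921};
    System.out.println("master mean=" + mean(master) + " spread=" + spread(master));
    System.out.println("patch  mean=" + mean(patch) + " spread=" + spread(patch));
    System.out.println("mean gap=" + (mean(patch) - mean(master)));
  }
}
```

With these numbers the gap between the means is smaller than the spread inside either set of runs, which is what "within measurement noise" means here.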


> Make index sorting a first-class citizen
> 
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-6766.patch, LUCENE-6766.patch, LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).



[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277866#comment-15277866
 ] 

Michael McCandless commented on LUCENE-6766:


Thanks [~jpountz]!

I folded in most of your feedback, except:

bq. The only thing I am slightly worried about is how all optimized bulk 
mergers need to opt out if a sort order is configured. I am wondering if our 
base consumer classes should have two merge methods so that you would not have 
to check the sort order when overriding the method for regular merges? This is 
just an idea, it has drawbacks too since there would not be a single entry 
point to merging anymore and we would need another method in our API, but I'm 
suggesting it anyway hoping that it might give somebody a better idea.

I think it's OK to keep a single merge method?  This merge method
already must deal with wild per-segment variabilities, e.g. different
fields across segments, some have deletions some don't, etc., so I
don't think we need to single out "has an index sort" into a separate
method?

Also, implementing merge methods is really an uber-expert thing to
do, so such devs should be up to the task of handling an incoming
index sort, I think.

bq. I think this is buggy since it ignores null sorts at the beginning of the 
list but not at the end,

Nice catch!  I added a test showing the bug, and then fixed it (pushed).

bq. Let's remove it for now and later see whether this is something that could 
be added back?

OK I did that.  I think at least there is a simple solution for doc-block
users: just index a doc values field with the "id" for each block, and
then sort on that.

bq.  but leveraging index sorting at search time looks like a big task to me so 
maybe we should defer it to a follow-up issue like sorting on flush?

I did move the early terminating to core, and I do think going forward
we should make it easier to use this ... it should somehow be the
default, and not a "make your own Collector" situation ...

As Rob has pointed out, even today (before promoting index sorting)
we could early-terminate in cases where the query is sorting on
index order, such as collecting first N hits for a filter.

But I agree we should do this separately.  I will open follow-on issues
for "can we sort on flush too" and "searching should take advantage
of index sort by default".
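
To make the early-termination idea concrete, here is a toy model, not Lucene code: a "segment" is just the doc IDs 0..maxDoc-1, and we assume doc ID order already equals the requested sort order (either because the query sorts by index order, or because the segment was written with that index sort). All names below are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model, not Lucene's collector API: the segment's doc ID order is
// assumed to already match the sort order the caller asked for.
class EarlyTerminationSketch {
  // Returns the first n matching doc IDs and stops scanning as soon as
  // n hits were collected.
  static List<Integer> topN(int maxDoc, IntPredicate matches, int n) {
    List<Integer> hits = new ArrayList<>();
    for (int doc = 0; doc < maxDoc && hits.size() < n; doc++) {
      if (matches.test(doc)) {
        hits.add(doc);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    // A "filter" matching every third doc; only the first 3 hits are wanted.
    System.out.println(topN(1_000_000, doc -> doc % 3 == 0, 3));
  }
}
```

The price is that the loop never learns how many docs match in total, which is why this only applies when the user is not interested in totalHits.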

bq. Should DocIDMerger.Sub.nextDoc throw an IOException?

I tried this out, but it started to sprawl: the doc values all wrap
{{DocIDMerger}} under a java {{Iterator}} which cannot throw {{IOException}}
... I could move the {{try/catch}} up there, but there are many places
I'd have to move this to, so leaving it where it is seemed like the
lesser evil.
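
For readers who have not hit this before, the constraint is that {{java.util.Iterator}} methods declare no checked exceptions, so an {{IOException}} has to be caught and rethrown unchecked somewhere. A minimal sketch of that pattern follows; {{DocSource}} is a made-up stand-in for the real sub, and it rethrows as {{UncheckedIOException}} where the patch uses a plain {{RuntimeException}}.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Sketch only: DocSource is a hypothetical stand-in, not a Lucene type.
class UncheckedIteratorSketch {
  interface DocSource {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    int nextDoc() throws IOException;
  }

  // Adapts a DocSource to an Iterator, translating IOException into an
  // unchecked exception because Iterator methods cannot declare it.
  static Iterator<Integer> asIterator(DocSource source) {
    return new Iterator<Integer>() {
      private int upcoming = advance();

      private int advance() {
        try {
          return source.nextDoc();
        } catch (IOException ioe) {
          throw new UncheckedIOException(ioe);
        }
      }

      @Override
      public boolean hasNext() {
        return upcoming != DocSource.NO_MORE_DOCS;
      }

      @Override
      public Integer next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        int doc = upcoming;
        upcoming = advance();
        return doc;
      }
    };
  }
}
```

Callers that do want the checked exception back can catch {{UncheckedIOException}} and rethrow its {{getCause()}}, which is one reason to prefer it over a bare {{RuntimeException}}.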


> Make index sorting a first-class citizen
> 
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-6766.patch, LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).



[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-09 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276107#comment-15276107
 ] 

Adrien Grand commented on LUCENE-6766:
--

This looks great!

{quote}
it does add some increase in code complexity, which I think is OK/contained.
{quote}

Agreed. The only thing I am slightly worried about is how all optimized bulk 
mergers need to opt out if a sort order is configured. I am wondering if our 
base consumer classes should have two merge methods so that you would not have 
to check the sort order when overriding the method for regular merges? This is 
just an idea, it has drawbacks too since there would not be a single entry 
point to merging anymore and we would need another method in our API, but I'm 
suggesting it anyway hoping that it might give somebody a better idea. :)

bq. but this is already an enormous change so I think we really have to look 
into "sort on flush" (which is hairy by itself) later, separately

+1

{code}
+// nocommit if index time sorting is in use, don't try to bulk merge ... later 
we can make crazy bulk merger that looks for long runs from
+// one sub?
{code}

Maybe this one could be made a simple TODO. I think it is totally fine if index 
sorting always bypasses optimized bulk mergers, at least for now? Since we are 
still pulling a merge instance, it should not be too bad (no worse than merging 
across different codecs)?

{code}
 // nocommit in the unsorted case, this should map correctly, e.g. apply per 
segment docBase
{code}

This seems to already be the case based on the code?

{code}
// nocommit isn't liveDocs redundant?  docMap returns -1 for us?
{code}

+1 I think it would be easier if this part of the code only used the docMap.

{code}
// nocommit is it sub's job to skip deleted docs?
{code}

I think it is since there is no mapped doc ID for deleted docs?

{code}
  // nocommit doesn't support index sorting?  or sorts must be the same?
  public void addIndexes(Directory... dirs) throws IOException {
{code}

Can we do like the nocommit on {{addIndexes(CodecReader...)}} suggests and just 
make sure that we cannot end up with segments that have different sort orders 
in the index?

{code}
// nocommit what about MergedReaderWrapper in here?
{code}

I think we should still wrap with MergedReaderWrapper? This will help stored 
fields if two documents from the same block are read consecutively (which could 
likely happen if the order in which docs are indexed is somehow correlated to 
the index sort, like if sorting by timestamp)?

{code}
+Sort indexSort = null;
+
 // build FieldInfos and fieldToReader map:
 for (final LeafReader reader : this.parallelReaders) {
+  if (indexSort == null) {
+    indexSort = reader.getIndexSort();
+  } else if (indexSort.equals(reader.getIndexSort()) == false) {
+    throw new IllegalArgumentException("cannot combine LeafReaders that have different index sorts: saw both sort=" + indexSort + " and " + reader.getIndexSort());
+  }
{code}

I think this is buggy since it ignores {{null}} sorts at the beginning of the 
list but not at the end, so the same list of readers may or may not raise an 
exception depending on the order in which readers are provided?
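
To spell out the asymmetry: with readers (unsorted, sorted) the {{null}} from the first reader is silently overwritten by the second reader's sort, while (sorted, unsorted) throws. One order-independent way to write the check is sketched below. It models sorts as plain strings and assumes that mixing sorted and unsorted readers should always be rejected; that policy, and all names here, are my assumptions and not necessarily what ends up in the patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Sketch only: a sort is modelled as a String, null means "not sorted".
class IndexSortCheckSketch {
  // Returns the sort shared by all readers, or throws if any reader
  // disagrees with the first one, regardless of reader order.
  static String commonSort(List<String> readerSorts) {
    String indexSort = null;
    boolean first = true;
    for (String sort : readerSorts) {
      if (first) {
        indexSort = sort;
        first = false;
      } else if (!Objects.equals(indexSort, sort)) {
        throw new IllegalArgumentException(
            "cannot combine readers that have different index sorts: saw both sort="
                + indexSort + " and " + sort);
      }
    }
    return indexSort;
  }

  public static void main(String[] args) {
    System.out.println(commonSort(Arrays.asList("date", "date")));
  }
}
```

The key difference from the quoted diff is the explicit {{first}} flag, so a {{null}} sort from the first reader is remembered instead of being treated as "nothing seen yet".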

{code}
// nocommit does search time "do the right thing" automatically when segment is 
sorted?
{code}

Agreed it should. I see you also left nocommits about moving the 
early-terminating collectors from misc to core, but leveraging index sorting at 
search time looks like a big task to me so maybe we should defer it to a 
follow-up issue like sorting on flush?

{code}
// nocommit just do assertReaderEquals, don't use @BeforeClass, etc.?
{code}

+1!

{code}
--- trunk/lucene/misc/src/java/org/apache/lucene/search/BlockJoinComparatorSource.java  2016-02-16 11:18:34.753021816 -0500
+++ indexsort/lucene/misc/src/java/org/apache/lucene/search/BlockJoinComparatorSource.java  2016-05-06 19:17:29.893848515 -0400
@@ -20,13 +20,14 @@

+// nocommit what to do here?
{code}

Let's remove it for now and later see whether this is something that could be 
added back?

{code}
+@Override
+public int nextDoc() {
+  try {
+return postings.nextDoc();
+  } catch (IOException ioe) {
+throw new RuntimeException(ioe);
+  }
+}
{code}

Should DocIDMerger.Sub.nextDoc throw an IOException?

> Make index sorting a first-class citizen
> 
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-6766.patch, LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore mak

[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274915#comment-15274915
 ] 

Michael McCandless commented on LUCENE-6766:


And here's the same patch on github: 
https://github.com/apache/lucene-solr/compare/master...mikemccand:index_sort?expand=1

> Make index sorting a first-class citizen
> 
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-6766.patch, LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).



[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-05-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274734#comment-15274734
 ] 

Michael McCandless commented on LUCENE-6766:


I've been slowly iterating here and pushing changes to 
https://github.com/mikemccand/lucene-solr/tree/index_sort

There are tons of nocommits, but tests do pass, including index sorting tests 
(though they still need improving).

Some details:

  - I added a new {{DocIDMerger}} helper class, and the default merge impls use 
this to abstract away how to iterate the documents from the N sub-readers, 
whether they are simply concatenated or merge-sorted.  I think this should be 
quite a bit more efficient than what {{SortingMergePolicy}} does today, but it 
does add some increase in code complexity, which I think is OK/contained.

  - {{SlowCompositeReaderWrapper}} is no longer used for index sorting

  - Points now work fine w/ index sorting

  - CheckIndex verifies the claimed per-segment index sort is in fact true

  - IW gets angry if you open an existing index with a different index sort

  - Only simple sort types are allowed; no CUSTOM, SCORE or REWRITEABLE

  - I made a new {{Lucene62Codec}}, with a new {{Lucene62SegmentInfoFormat}} 
that supports index sorting.

  - I added {{LeafReader.getIndexSort}} so apps can check if a given segment 
was sorted

  - I disable bulk merge optimizations when index sorting is present

IW flush still does not sort, and so at merge time we wrap such segments with 
{{SortingLeafReader}}.  It is quite ugly that an index can have some segments 
sorted and some not sorted.  E.g. it means IW's check for whether the new 
index sort matches the existing one is just best effort ... but this is 
already an enormous change, so I think we really have to look into "sort on 
flush" (which is hairy by itself) later, separately.
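
As a rough picture of what sorting an unsorted flushed segment at merge time involves, and what a check of the "claimed" sort amounts to, here is a toy sketch. It is not how {{SortingLeafReader}} or CheckIndex are implemented: docs carry a single long sort key and all names are invented.

```java
import java.util.Arrays;

// Toy model only: each doc has one long sort key, keys[doc].
class SortingDocMapSketch {
  // Returns newToOld, where position i of the sorted view holds the
  // original doc newToOld[i]. Ties keep their original relative order
  // because Arrays.sort on objects is stable.
  static int[] newToOld(long[] keys) {
    Integer[] docs = new Integer[keys.length];
    for (int i = 0; i < docs.length; i++) {
      docs[i] = i;
    }
    Arrays.sort(docs, (a, b) -> Long.compare(keys[a], keys[b]));
    int[] result = new int[docs.length];
    for (int i = 0; i < docs.length; i++) {
      result[i] = docs[i];
    }
    return result;
  }

  // Verifies that reading docs in the given order yields non-decreasing keys.
  static boolean isSorted(long[] keys, int[] order) {
    for (int i = 1; i < order.length; i++) {
      if (keys[order[i - 1]] > keys[order[i]]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    long[] keys = {30, 10, 20, 10};
    int[] map = newToOld(keys);
    System.out.println(Arrays.toString(map) + " sorted=" + isSorted(keys, map));
  }
}
```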


> Make index sorting a first-class citizen
> 
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).



[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-04-26 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258000#comment-15258000
 ] 

Adrien Grand commented on LUCENE-6766:
--

Please let me know if you need help!

> Make index sorting a first-class citizen
> 
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).



[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-04-26 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257807#comment-15257807
 ] 

Michael McCandless commented on LUCENE-6766:


bq. Maybe we can abstract "concat vs merge sort" away

I'm exploring this and it looks like it may be a promising baby step, hopefully 
letting us stop using {{SlowCompositeReaderWrapper}} for index sorting...

> Make index sorting a first-class citizen
> 
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).



[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-04-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255402#comment-15255402
 ] 

Michael McCandless commented on LUCENE-6766:


bq. I think a challenge to sorting flushed segments is how we write stored 
fields and term vectors directly to the directory at index time. We should 
somehow buffer them in memory and sort on flush when a non-default sort order 
is configured? Or do you see an easier way?

Hmm tricky.  Yeah, we could buffer in heap if IWC.indexSort is set, or ... we 
could just write as we do today, but then ask the codec for a stored fields 
(and term vectors) reader to do the sorting at flush time.

Or we separate "sorting on flushed segments" out for the future, keeping 
{{SortingLeafReader}}, since the rest of this is already plenty hard, and focus 
here on making merging more efficient (don't use 
{{SlowCompositeReaderWrapper}})?  I think it would mean fixing the default merge 
impls ... today they all assume they concatenate each segment's documents 
sequentially (mapping around deletions) but with indexSort in use, they just 
need to merge sort instead.  Maybe we can abstract "concat vs merge sort" away 
so that all default merge impls could re-use it ... seems like it could be 
fairly clean maybe. 
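
A rough sketch of what such a "concat vs merge sort" abstraction could look like, as a toy model rather than a proposal for the actual API: each sub is a segment that is already sorted by one long key, deletions are ignored, and every name below is invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Toy model only: keys[sub][doc] is the sort key of a doc in a segment,
// and every segment is assumed to be sorted by that key already.
class DocMergeSketch {
  // Returns {sub, doc} pairs in the order the merged segment would hold
  // them: plain concatenation without an index sort, merge sort with one.
  static List<int[]> merge(long[][] keys, boolean indexSorted) {
    List<int[]> out = new ArrayList<>();
    if (!indexSorted) {
      for (int sub = 0; sub < keys.length; sub++) {
        for (int doc = 0; doc < keys[sub].length; doc++) {
          out.add(new int[] {sub, doc});
        }
      }
      return out;
    }
    // Queue entries are {sub, doc}, ordered by key, ties broken by sub.
    PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) -> {
      int cmp = Long.compare(keys[a[0]][a[1]], keys[b[0]][b[1]]);
      return cmp != 0 ? cmp : Integer.compare(a[0], b[0]);
    });
    for (int sub = 0; sub < keys.length; sub++) {
      if (keys[sub].length > 0) {
        queue.add(new int[] {sub, 0});
      }
    }
    while (!queue.isEmpty()) {
      int[] top = queue.poll();
      out.add(top);
      int nextDoc = top[1] + 1;
      if (nextDoc < keys[top[0]].length) {
        queue.add(new int[] {top[0], nextDoc});
      }
    }
    return out;
  }
}
```

The point is that callers only iterate the returned order; whether it came from concatenation or from the priority queue is hidden behind the one method, which is the property the default merge impls would need.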

> Make index sorting a first-class citizen
> 
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).



[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-04-23 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255384#comment-15255384
 ] 

Adrien Grand commented on LUCENE-6766:
--

I think a challenge to sorting flushed segments is how we write stored fields 
and term vectors directly to the directory at index time. We should somehow 
buffer them in memory and sort on flush when a non-default sort order is 
configured? Or do you see an easier way?

I agree merge sorting feels like the right approach to this problem. The reason 
why I used SlowCompositeReaderWrapper in the first place was that merging can 
be quite tricky and using SlowCompositeReaderWrapper allowed me to reuse the 
existing merging logic of all codec components. But it is likely less efficient 
like you said.



> Make index sorting a first-class citizen
> 
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).



[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2016-04-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255371#comment-15255371
 ] 

Michael McCandless commented on LUCENE-6766:


This looks like a great patch!  Probably we can make {{SortingLeafReader}} 
private?

I think it's OK to restrict the allowed {{SortField}} that we need to support 
and serialize/deserialize?

Can we fix IW to insist on open that the incoming index sort matches whatever 
the current index has (if the current index exists)?

Since this patch, we moved {{SlowCompositeReaderWrapper}} out of core ... I 
wonder if we can 1) fix flush to also write new segments in correct sort order, 
and 2) fix default merge implementation to look at sort order?  Merging should 
be an efficient merge sort (vs. what {{SortingLeafReader}} on top of 
{{SlowCompositeReaderWrapper}} does today).

> Make index sorting a first-class citizen
> 
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).



[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2015-11-21 Thread Elliott Bradshaw (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15020548#comment-15020548
 ] 

Elliott Bradshaw commented on LUCENE-6766:
--

This would be great!

+1

> Make index sorting a first-class citizen
> 
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-6766.patch
>
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).



[jira] [Commented] (LUCENE-6766) Make index sorting a first-class citizen

2015-08-27 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717454#comment-14717454
 ] 

Michael McCandless commented on LUCENE-6766:


+1

> Make index sorting a first-class citizen
> 
>
> Key: LUCENE-6766
> URL: https://issues.apache.org/jira/browse/LUCENE-6766
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> Today index sorting is a very expert feature. You need to use a custom merge 
> policy, custom collectors, etc. I would like to explore making it a 
> first-class citizen so that:
>  - the sort order could be configured on IndexWriterConfig
>  - segments would record the sort order that was used to write them
>  - IndexSearcher could automatically early terminate when computing top docs 
> on a sort order that is a prefix of the sort order of a segment (and if the 
> user is not interested in totalHits).


