[jira] Updated: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-1960: -- Attachment: lucene-1960-1.patch It was as easy as changing this method in FieldsReader:
{code:java}
boolean canReadRawDocs() {
  // Disable reading raw docs in 2.x format, because of the removal of compressed
  // fields in 3.0. We don't want rawDocs() to decode field bits to figure out
  // if a field was compressed, hence we enforce ordinary (non-raw) stored field merges
  // for <3.0 indexes.
  return format >= FieldsWriter.FORMAT_LUCENE_3_0_NO_COMPRESSED_FIELDS;
}
{code}
Uwe, I made some quick tests and it looks good. But I don't have any indexes with compressed fields (we don't use them), so I'll wait for you to test it with the indexes you mentioned. > Remove deprecated Field.Store.COMPRESS > Key: LUCENE-1960 > URL: https://issues.apache.org/jira/browse/LUCENE-1960 > Project: Lucene - Java > Issue Type: Task > Reporter: Michael Busch > Assignee: Michael Busch > Priority: Minor > Fix For: 3.0 > Attachments: lucene-1960-1.patch, lucene-1960-1.patch, lucene-1960.patch > Also remove FieldForMerge and related code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
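The gating in canReadRawDocs() can be illustrated with a small, self-contained sketch. It is not Lucene's merge code: the Segment class, plan() and the two FORMAT_* numbers are hypothetical stand-ins; only the idea of comparing each segment's stored-fields format against a 3.0 constant comes from the patch above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: decide per segment whether stored fields may be bulk-copied
 * ("raw") during a merge, based only on the segment's stored-fields format number.
 * Class, method and constant names are illustrative, not Lucene's.
 */
final class RawMergeSketch {
    // Assumed format numbers; the real constants live in Lucene's FieldsWriter.
    static final int FORMAT_PRE_3_0 = 1;
    static final int FORMAT_3_0_NO_COMPRESSED_FIELDS = 2;

    /** Minimal stand-in for one segment's stored-fields reader. */
    static final class Segment {
        final String name;
        final int format;

        Segment(String name, int format) {
            this.name = name;
            this.format = format;
        }

        // Same test as the patched canReadRawDocs(): only segments written in the
        // 3.0 format are known to contain no compressed stored fields.
        boolean canReadRawDocs() {
            return format >= FORMAT_3_0_NO_COMPRESSED_FIELDS;
        }
    }

    /** Returns, per segment, which copy strategy a merge would use. */
    static List<String> plan(List<Segment> segments) {
        List<String> steps = new ArrayList<>();
        for (Segment s : segments) {
            if (s.canReadRawDocs()) {
                steps.add(s.name + ": raw bulk copy");
            } else {
                // Older segments may hold compressed values, so every document is
                // loaded and rewritten field by field (which drops the compression).
                steps.add(s.name + ": per-document copy");
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
                new Segment("_0", FORMAT_PRE_3_0),
                new Segment("_1", FORMAT_3_0_NO_COMPRESSED_FIELDS));
        plan(segments).forEach(System.out::println);
    }
}
```

With these assumed values, main() prints "_0: per-document copy" followed by "_1: raw bulk copy". Keying the decision on the format number means rawDocs() never has to decode field bits.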
[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769531#action_12769531 ] Uwe Schindler commented on LUCENE-1960: --- I have some large indexes here from 2.9 with compressed XML documents in stored fields. I can compare the optimization time for Lucene 2.9 and Lucene 3.0 with your patch.
[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769527#action_12769527 ] Michael Busch commented on LUCENE-1960: --- Yes, I believe this would work.
[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769528#action_12769528 ] Uwe Schindler commented on LUCENE-1960: --- Then +1 from me!
[jira] Resolved: (LUCENE-2006) Optimization for FieldDocSortedHitQueue
[ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved LUCENE-2006. --- Resolution: Fixed Committed revision: 829274 Thanks Mark for perf testing! > Optimization for FieldDocSortedHitQueue > Key: LUCENE-2006 > URL: https://issues.apache.org/jira/browse/LUCENE-2006 > Project: Lucene - Java > Issue Type: Improvement > Components: Search > Affects Versions: 3.0 > Reporter: Uwe Schindler > Assignee: Uwe Schindler > Fix For: 3.0 > Attachments: LUCENE-2006.patch > When updating core for generics, I found the following optimization of FieldDocSortedHitQueue: > All FieldDoc values are Comparables (also the score or docid, if they appear as SortField in a MultiSearcher or ParallelMultiSearcher). The code of lessThan seems very inefficient, as it has a big switch statement on the SortField type, then casts the value to the underlying numeric type Object, calls Number.xxxValue() & co for it and then compares manually. As the concrete j.l.Number subclasses (Integer, Long, Float, Double, ...) are themselves Comparable, I see no reason to do this. Just call compareTo on the Comparable interface and we are happy. The main gain is that it avoids the casts and the two xxxValue() method calls, as compareTo on these types works more efficiently internally. > The only special cases are String sort, where the Locale may be used, and score sorting, which is reversed. But these are two if statements instead of the whole switch. > I have not yet tested it for performance, but in my opinion it should be faster for MultiSearchers. All tests still pass (as they should).
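The simplified comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the committed patch: the Kind enum, compareValue() and compare() are hypothetical names, a java.text.Collator stands in for the Locale-based string sort, and the score case simply negates the natural order so that higher scores come first.

```java
import java.text.Collator;
import java.util.Locale;

/**
 * Hypothetical sketch of the simplified comparison: every sort value is Comparable,
 * so a single compareTo call replaces the per-type switch. Only locale-aware string
 * sorting and score sorting (highest score first) need special cases.
 * Names are illustrative, not Lucene's FieldDocSortedHitQueue API.
 */
final class FieldValueComparison {
    enum Kind { SCORE, STRING, OTHER }

    @SuppressWarnings("unchecked")
    static int compareValue(Kind kind, Collator collator, boolean reverse, Object a, Object b) {
        int c;
        if (kind == Kind.STRING && collator != null) {
            // Locale-sensitive string order.
            c = collator.compare((String) a, (String) b);
        } else {
            // Integer, Long, Float, Double, String, ... all implement Comparable.
            c = ((Comparable<Object>) a).compareTo(b);
        }
        if (kind == Kind.SCORE) {
            c = -c; // higher scores sort first
        }
        return reverse ? -c : c;
    }

    /** Compares two documents' sort values field by field; 0 means all values tie. */
    static int compare(Kind[] kinds, Collator[] collators, boolean[] reverse,
                       Object[] a, Object[] b) {
        for (int i = 0; i < kinds.length; i++) {
            int c = compareValue(kinds[i], collators[i], reverse[i], a[i], b[i]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        Collator english = Collator.getInstance(Locale.ENGLISH);
        System.out.println(compareValue(Kind.STRING, english, false, "apple", "Banana") < 0); // true
        System.out.println(compareValue(Kind.STRING, null, false, "apple", "Banana") < 0);    // false
        System.out.println(compareValue(Kind.SCORE, null, false, 2.0f, 1.0f) < 0);           // true
    }
}
```

Each sort field then costs one compareTo (or Collator.compare) call instead of a switch, a cast and two xxxValue() calls.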
[jira] Commented: (LUCENE-2006) Optimization for FieldDocSortedHitQueue
[ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769522#action_12769522 ] Uwe Schindler commented on LUCENE-2006: --- OK, I'll commit soon!
[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769520#action_12769520 ] Uwe Schindler commented on LUCENE-1960: --- So the idea is to raise the version number of the stored fields file by one in 3.0, and all new or merged segments get this version number? When merging, for all versions before the current one we do not use addRawDocuments() when copying contents. The current lucene-1960-1.patch stays unchanged.
[jira] Commented: (LUCENE-2006) Optimization for FieldDocSortedHitQueue
[ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769521#action_12769521 ] Mark Miller commented on LUCENE-2006: - Right - I don't think we have to worry about things much over top 1000. And while I don't want to take the time to do top 4*64,000, for kicks I tried top 64,000 over a couple of runs. It actually does show a 2-3% win with the new method once you get up that high ;) It's somethin' anyway :)
[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769517#action_12769517 ] Michael Busch commented on LUCENE-1960: --- {quote} But this is only one-time. As soon as it is optimized it is fast again. Because of that I said, one could use a tool to enforce optimization, or the new IndexSplitter can also copy the old index to a new one. {quote} That's right, I'm just trying to make sure we all understand the consequences. It would be nice to know how much longer it takes, though. If everyone else is OK with this approach, I can work on a patch.
[jira] Commented: (LUCENE-2006) Optimization for FieldDocSortedHitQueue
[ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769515#action_12769515 ] Uwe Schindler commented on LUCENE-2006: --- I think it is because you only merge the top 1000 docs into the HitQueue. The merging of the HQs at the end of search is simple, because it only merges the top n docs of each queue. You would only see a difference if you sort all hits. I think we can commit this, too.
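Uwe's explanation of why the benchmark showed no difference can be made concrete. In this sketch, plain integers stand in for sort values (larger is better) and java.util.PriorityQueue stands in for the hit queue; TopNMerge and mergeTopN() are hypothetical names. The point is that the final merge only sees each sub-searcher's top n entries, so its cost grows with n and the number of sub-searchers, not with the total number of hits.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch of the final merge step in a multi-searcher: each sub-searcher
 * already returns only its own top n hits, so the merge examines at most
 * (number of sub-searchers * n) values, however many documents matched in total.
 * Plain ints stand in for sort values (larger = better); this is not Lucene code.
 */
final class TopNMerge {
    static List<Integer> mergeTopN(List<List<Integer>> perSearcherTopN, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap holding the n best values seen so far; its head is the weakest one.
        PriorityQueue<Integer> heap = new PriorityQueue<>(n);
        for (List<Integer> hits : perSearcherTopN) {
            for (int v : hits) {
                if (heap.size() < n) {
                    heap.add(v);
                } else if (v > heap.peek()) {
                    heap.poll();
                    heap.add(v);
                }
            }
        }
        List<Integer> merged = new ArrayList<>(heap);
        merged.sort(Collections.reverseOrder()); // best first
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> perSearcher = List.of(
                List.of(9, 7, 5), List.of(8, 6, 4), List.of(10, 3, 1));
        System.out.println(mergeTopN(perSearcher, 3)); // [10, 9, 8]
    }
}
```

With top 1000 over four sub-indexes, such a merge handles at most 4,000 values, which fits the observation that a difference only showed up at much larger top-n values.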
[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769511#action_12769511 ] Michael Busch commented on LUCENE-1960: --- {quote} Or better, we look into the FieldInfos of the segment-to-merge and check whether the compressed flag is set for one of the fields. {quote} A second earlier I had the same idea - it would be the most convenient solution. BUT: bummer! There is no compressed flag in the FieldInfos... It's a bit per stored field *instance*.
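Why FieldInfos cannot answer the question can be shown with a sketch of per-instance flags. Assumptions: the three bit constants mirror what we believe the 2.x FieldsWriter uses, and FieldBitsSketch and its methods are hypothetical. The only point taken from the comment is that the compressed bit is stored per stored field instance, so two documents can differ for the same field name and every document has to be decoded to find out.

```java
/**
 * Hypothetical sketch: in the 2.x stored-fields format each stored field value is
 * written with its own flag byte, so "does this segment contain compressed fields?"
 * can only be answered by decoding the flags of every field of every document.
 * The bit values are assumptions for illustration, modelled on 2.x FieldsWriter.
 */
final class FieldBitsSketch {
    static final byte FIELD_IS_TOKENIZED = 0x1;
    static final byte FIELD_IS_BINARY = 0x2;
    static final byte FIELD_IS_COMPRESSED = 0x4;

    static boolean isCompressed(byte bits) {
        return (bits & FIELD_IS_COMPRESSED) != 0;
    }

    /** Scans one document's per-field flag bytes. */
    static boolean documentHasCompressedField(byte[] perFieldBits) {
        for (byte bits : perFieldBits) {
            if (isCompressed(bits)) {
                return true; // a single compressed instance rules out a raw copy
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same field names, different documents: the flag can differ per instance.
        byte[] doc1 = {FIELD_IS_TOKENIZED, FIELD_IS_BINARY};
        byte[] doc2 = {FIELD_IS_TOKENIZED, (byte) (FIELD_IS_BINARY | FIELD_IS_COMPRESSED)};
        System.out.println(documentHasCompressedField(doc1)); // false
        System.out.println(documentHasCompressedField(doc2)); // true
    }
}
```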
[jira] Commented: (LUCENE-2006) Optimization for FieldDocSortedHitQueue
[ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769509#action_12769509 ] Mark Miller commented on LUCENE-2006: - Okay Uwe - I took a 2 GB zipped Wiki dump and used a SkipDocTask to create four unique indices of 64,000 docs each. Then I ran a search matching all docs and sorting on title, taking the average of 1000 runs and recording that average a few times for each method. I tried top-n values of 10, 100, and 1000. I couldn't measure a meaningful difference one way or the other. Let's do it.
[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769510#action_12769510 ] Uwe Schindler commented on LUCENE-1960: --- But this is only one-time. As soon as it is optimized it is fast again. Because of that I said, one could use a tool to enforce optimization, or the new IndexSplitter can also copy the old index to a new one.
[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769507#action_12769507 ] Michael Busch commented on LUCENE-1960: --- {quote} Can't we detect that we're dealing w/ an older version segment and not use addRawDocuments when merging them (and uncompress when we merge)? {quote} So then any 2.x index (including 2.9) would not be merged in the optimized way with 3.x. I'm actually not even sure how much of a slowdown this is. Did you (or anyone else) ever measure that?
[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769503#action_12769503 ] Uwe Schindler commented on LUCENE-1960: --- Good idea, but where do we take the version from? Or better, we look into the FieldInfos of the segment-to-merge and check whether the compressed flag is set for one of the fields. If yes, do not use addRawDocuments. Is there a way to see this flag OR'ed segment-wise (like omitNorms is per-segment)?
[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769499#action_12769499 ] Michael Busch commented on LUCENE-1960: --- Right, because FieldsReader#rawDocs() does not decode the field bits, so it doesn't know which fields are compressed. If we want to change that it would have a significant negative performance impact on *all* stored fields.
[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769497#action_12769497 ] Michael McCandless commented on LUCENE-1960: Can't we detect that we're dealing w/ an older version segment and not use addRawDocuments when merging them (and uncompress when we merge)?
[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769494#action_12769494 ] Uwe Schindler commented on LUCENE-1960: --- And how about keeping the current lucene-1960-1.patch? It works for me as I expected. The only problem is that we cannot be sure the fields get decompressed on optimize?
[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769490#action_12769490 ] Michael Busch commented on LUCENE-1960: --- I'm actually -1 for option 1). The whole implementation of addRawDocuments() would have to change, and the necessary changes would kind of defeat its purpose. If we do 2), nobody will be able to use an index that has compressed fields in 4.0 anymore, and to convert it they would have to reindex manually (which might not always be possible). Of course our policy says that 4.0 does not need to be able to read <3.0 indexes anymore; however, normally users can take a 2.x index, optimize it with 3.x, and then 4.0 can read it without problems. This wouldn't be possible with 2).
RE: 2.9.1
> Well, we should then have added it to 2.9.0 already. Normally we don't introduce new APIs in bugfix releases.
>
> This could be a candidate for the backwards-compat break section: If you have compressed fields you need to change your code, otherwise drop-in will work.
2.9.1 already has such changes (see the recently closed issues about Version parameters in QueryParser and Analyzers). I still prefer simply decompressing compressed fields on merge; this is the best solution for easy migration.
[jira] Commented: (LUCENE-1960) Remove deprecated Field.Store.COMPRESS
[ https://issues.apache.org/jira/browse/LUCENE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769489#action_12769489 ] Uwe Schindler commented on LUCENE-1960: --- If we want to stay with the current patch, we should add a warning that indexes can suddenly get bigger on merges, and note this in CHANGES.txt. If one wants to regenerate the index with the stored fields decompressed, one could simply use the recently added IndexSplitter contrib module. This command-line tool uses addIndexes and therefore merges all segments into a new index. With option 1, the fields get decompressed. If somebody wants truly compressed fields again, he has to write code and reindex using CompressionTools.
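For users who still want compressed stored values after this change, the compression has to happen in application code before indexing. The following sketch uses only java.util.zip (Deflater/Inflater) as a stand-in for a helper such as Lucene's compression utility class; StoredValueCompression and its methods are hypothetical, and storing the resulting bytes in a binary stored field is left out.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Hypothetical sketch of application-side compression of a stored value with
 * java.util.zip. The compressed bytes would be stored in a binary stored field and
 * decompressed by the application after loading the document.
 */
final class StoredValueCompression {
    static byte[] compress(String value) {
        byte[] input = value.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    static String decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                out.write(buf, 0, n);
                // Guard against looping forever on truncated or dictionary-based input.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported compressed value");
                }
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a zlib-compressed value", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><title>hello</title></doc>".repeat(200);
        byte[] packed = compress(xml);
        System.out.println(xml.length() + " chars -> " + packed.length + " bytes");
        System.out.println(decompress(packed).equals(xml)); // true
    }
}
```

In this approach the application has to remember which fields it compressed, because the index itself no longer records that per field.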
Re: 2.9.1
On 10/23/09 3:19 PM, Uwe Schindler wrote:
> Open is still the problem with compressed fields (see LUCENE-1960): if we use option 3 (a deprecated isCompressed() method), we have to add it to 2.9, too, which I would not prefer. See the issue for details. I do not want to add this method, as it would break backwards compatibility in 2.9 if inserted there (but if it is in 3.0 it should also be available in 2.9). Otherwise it breaks 3.0, which is also bad. At this point, I can say: if you have removed all deprecations from your code in 2.9, you can drop in the 3.0 JAR. Adding such a method is a hard break, because you cannot read compressed fields easily.
Well, we should then have added it to 2.9.0 already. Normally we don't introduce new APIs in bugfix releases. This could be a candidate for the backwards-compat break section: If you have compressed fields you need to change your code, otherwise drop-in will work. Michael
[jira] Commented: (LUCENE-1942) NUM_THREADS is a static member of RunAddIndexesThreads and should be accessed in a static way
[ https://issues.apache.org/jira/browse/LUCENE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769485#action_12769485 ] Mark Miller commented on LUCENE-1942: - Well, if the title is self-explanatory, as stated, it just means he wants to switch access to that variable to a static way, rather than through a given instance (i.e. RunAddIndexesThreads.NUM_THREADS rather than runAddIndexesThreads.NUM_THREADS). Purists ;) But it's in the tests, it looks like... and the only accesses like that I can see are now commented out... and the patch is way too big - but perhaps that's all binary cruft :) It's a nice mystery for now :) > NUM_THREADS is a static member of RunAddIndexesThreads and should be accessed in a static way > Key: LUCENE-1942 > URL: https://issues.apache.org/jira/browse/LUCENE-1942 > Project: Lucene - Java > Issue Type: Bug > Components: Other > Environment: Eclipse 3.4.2 > Reporter: Hasan Diwan > Priority: Trivial > Attachments: lucene.pat > The summary contains the problem. No further description needed, I don't think.
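The distinction Mark describes is easy to see in a few lines. The nested RunAddIndexesThreads class and its NUM_THREADS value below are stand-ins, not the real test code.

```java
/**
 * Hypothetical sketch of the LUCENE-1942 warning: a static field read through an
 * instance reference compiles and works, but Eclipse flags it; reading it through the
 * class name is the conventional form. The nested class and its value are stand-ins.
 */
final class StaticAccessSketch {
    static final class RunAddIndexesThreads {
        static final int NUM_THREADS = 3; // illustrative value only
    }

    static int viaClass() {
        // Preferred: qualify the static member with its class.
        return RunAddIndexesThreads.NUM_THREADS;
    }

    static int viaInstance(RunAddIndexesThreads runner) {
        // Legal, but the instance plays no role: the value belongs to the class.
        return runner.NUM_THREADS;
    }

    public static void main(String[] args) {
        System.out.println(viaClass() == viaInstance(new RunAddIndexesThreads())); // true
    }
}
```

Both methods return the same value; only the access style differs, which is why the report is about an IDE warning rather than a functional bug.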
RE: 2.9.1
> > Open is still the problem with compressed fields (see LUCENE-1960): if we use option 3 (a deprecated isCompressed() method), we have to add it to 2.9, too, which I would not prefer.
See the issue for details. I do not want to add this method, as it would break backwards compatibility in 2.9 if inserted there (but if it is in 3.0 it should also be available in 2.9). Otherwise it breaks 3.0, which is also bad. At this point, I can say: if you have removed all deprecations from your code in 2.9, you can drop in the 3.0 JAR. Adding such a method is a hard break, because you cannot read compressed fields easily.
> Why would we have to add the method to 2.9.1?
> > After that 3.0 is also almost finished: I have generics (almost) everywhere in core, Parameter -> enum replacement, StringBuilder, varargs (not yet finished; I have to visit method signatures and add varargs where possible), new Number() replaced by valueOf(), ... Some new defaults also need to be implemented. Also some final revisiting of generics should be done; there are some strange parts with collections where it is not clearly defined what's in them.
> Thanks for all your hard work, Uwe!
Thanks!
[jira] Commented: (LUCENE-1942) NUM_THREADS is a static member of RunAddIndexesThreads and should be accessed in a static way
[ https://issues.apache.org/jira/browse/LUCENE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769483#action_12769483 ] Michael McCandless commented on LUCENE-1942: bq. I do not understand the whole problem. We have no compilation problems, so what's wrong? Right, I was curious what could be wrong, so I wanted to look at the patch...
Re: 2.9.1
On 10/23/09 3:00 PM, Uwe Schindler wrote: > I try to get the rest of search deprecations away in 3.0, but then we should be sure, that there are no more such problems like with the posIncrement in QueryParser that need additional changes in 2.9.1 API. > > Maybe somebody can help me with the rest of LUCENE-1973, the rest is explain() in Scorer (hard to do because lots of references to this method even in core), then we have IndexSearcher.fieldSortDoTrackScores / IS.fieldSortDoMaxScore (which is simple I think - I even did not know that these settings existed) and last but not least the deprecated MultiValueSource. The hardest one is the first. > > After that all deprecations are removed, only some small things need to be solved like the overridesTokenStreamMethod in Analyzer (I would keep it in 3.0, as we cannot guarantee that every analyzer reuses tokenstreams unless we make all core/contrib analyzers final). And there are some deprecated classes to be removed in 4.0 when the support for old indexes is gone. > > Open is still the problem with compressed fields (see LUCENE-1960), if we use option 3 (isCompressed() deprec method, we have to add it to 2.9, too -> I would not prefer this). Why would we have to add the method to 2.9.1? > After that 3.0 is also almost finished, I have generics (almost) everywhere in core, Parameter -> enum replacement, StringBuilder, varargs (not yet finished, I have to visit method signatures and add varargs where possible). New Number() removed by valueOf(),... Some new defaults also need to be implemented. > > Also some final revisiting of generics should be done, there are some strange parts with collections where it is not clearly defined what's in it. Thanks for all your hard work, Uwe! Michael > Uwe > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Friday, October 23, 2009 11:27 PM > To: java-dev@lucene.apache.org > Subject: 2.9.1 > > OK we are now down to 0 issues!! It's been exciting :) Assuming nothing crops up over the weekend, I plan to start the release process on Monday. > > Mike
RE: 2.9.1
> On Fri, Oct 23, 2009 at 6:00 PM, Uwe Schindler wrote: > > I try to get the rest of search deprecations away in 3.0, but then we > should > > be sure, that there are no more such problems like with the posIncrement > in > > QueryParser that need additional changes in 2.9.1 API. > > That sounds like a big job that shouldn't hold up 2.9.1, which fixes > serious bugs in 2.9.0. > Removing all deprecations is essentially finishing 3.0 (by the > original plan at least). It's 95% done :-) That's why I wrote this email. Uwe
Re: 2.9.1
On Fri, Oct 23, 2009 at 6:00 PM, Uwe Schindler wrote: > I try to get the rest of search deprecations away in 3.0, but then we should > be sure, that there are no more such problems like with the posIncrement in > QueryParser that need additional changes in 2.9.1 API. That sounds like a big job that shouldn't hold up 2.9.1, which fixes serious bugs in 2.9.0. Removing all deprecations is essentially finishing 3.0 (by the original plan at least). -Yonik http://www.lucidimagination.com
[jira] Commented: (LUCENE-1942) NUM_THREADS is a static member of RunAddIndexesThreads and should be accessed in a static way
[ https://issues.apache.org/jira/browse/LUCENE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769477#action_12769477 ] Uwe Schindler commented on LUCENE-1942: --- I do not understand the whole problem. We have no compilation problems, so what's wrong?
RE: 2.9.1
I'm trying to get rid of the remaining search deprecations in 3.0, but then we should be sure that there are no more problems like the posIncrement one in QueryParser that need additional changes in the 2.9.1 API. Maybe somebody can help me with the rest of LUCENE-1973. What remains is explain() in Scorer (hard to do because there are lots of references to this method, even in core), then IndexSearcher.fieldSortDoTrackScores / IS.fieldSortDoMaxScore (which is simple, I think; I did not even know these settings existed), and last but not least the deprecated MultiValueSource. The hardest one is the first. After that, all deprecations are removed; only some small things need to be solved, like overridesTokenStreamMethod in Analyzer (I would keep it in 3.0, as we cannot guarantee that every analyzer reuses TokenStreams unless we make all core/contrib analyzers final). And there are some deprecated classes to be removed in 4.0, when support for old indexes is gone. Still open is the problem with compressed fields (see LUCENE-1960): if we use option 3 (a deprecated isCompressed() method), we have to add it to 2.9, too, which I would not prefer. After that, 3.0 is also almost finished: I have generics (almost) everywhere in core, the Parameter -> enum replacement, StringBuilder, varargs (not yet finished; I have to visit method signatures and add varargs where possible), constructor calls like new Integer(...) replaced by valueOf(), ... Some new defaults also need to be implemented. Also, some final revisiting of generics should be done; there are some strange parts with collections where it is not clearly defined what's in them. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Friday, October 23, 2009 11:27 PM > To: java-dev@lucene.apache.org > Subject: 2.9.1 > > OK we are now down to 0 issues!! It's been exciting :) > > Assuming nothing crops up over the weekend, I plan to start the > release process on Monday. > > Mike
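The Java 5 clean-ups Uwe lists (Parameter -> enum, StringBuilder, varargs, generics, valueOf() instead of constructors) can be illustrated with a small before/after sketch. `OldStoreMode` is a hypothetical, simplified imitation of a pre-enum typesafe-constant class, not Lucene's actual `Parameter` class; `StoreMode`, `Java5Cleanups`, `joinTerms` and `boxedCount` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

// Before: a hypothetical typesafe-constant class in the pre-enum style
// (simplified; not Lucene's actual Parameter class).
final class OldStoreMode {
    private final String name;
    private OldStoreMode(String name) { this.name = name; }
    static final OldStoreMode YES = new OldStoreMode("YES");
    static final OldStoreMode NO = new OldStoreMode("NO");
    @Override public String toString() { return name; }
}

// After: the same constants as a Java 5 enum, which provides values(),
// valueOf(String) and switch support without extra code.
enum StoreMode { YES, NO }

final class Java5Cleanups {
    // Varargs instead of an explicit String[] parameter, and the
    // unsynchronized StringBuilder instead of StringBuffer.
    static String joinTerms(String... terms) {
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(term);
        }
        return sb.toString();
    }

    // A generic List<Integer> instead of a raw List, and Integer.valueOf(i)
    // instead of new Integer(i): valueOf may return cached instances
    // (always for -128..127) rather than allocating each time.
    static List<Integer> boxedCount(int n) {
        List<Integer> out = new ArrayList<Integer>();
        for (int i = 0; i < n; i++) {
            out.add(Integer.valueOf(i));
        }
        return out;
    }
}
```

Callers of a varargs method keep working with an explicit array, so a change like `joinTerms(String[])` to `joinTerms(String...)` is source compatible for existing code.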
[jira] Commented: (LUCENE-1942) NUM_THREADS is a static member of RunAddIndexesThreads and should be accessed in a static way
[ https://issues.apache.org/jira/browse/LUCENE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769471#action_12769471 ] Mark Miller commented on LUCENE-1942: - It's not only a binary file, but it clearly says at the start that it's an Eclipse patch.
[jira] Commented: (LUCENE-1942) NUM_THREADS is a static member of RunAddIndexesThreads and should be accessed in a static way
[ https://issues.apache.org/jira/browse/LUCENE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769468#action_12769468 ] Michael McCandless commented on LUCENE-1942: But the attachment here is a binary file -- I can't feed it to the "patch" command. If you have an svn command line client, you should be able to just run "svn diff > LUCENE-1942.patch" and then post that.
[jira] Commented: (LUCENE-1942) NUM_THREADS is a static member of RunAddIndexesThreads and should be accessed in a static way
[ https://issues.apache.org/jira/browse/LUCENE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769467#action_12769467 ] Robert Muir commented on LUCENE-1942: - I'm not able to read this patch file either... to me it looks malformed (all on one line?).
[jira] Commented: (LUCENE-1942) NUM_THREADS is a static member of RunAddIndexesThreads and should be accessed in a static way
[ https://issues.apache.org/jira/browse/LUCENE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769466#action_12769466 ] Hasan Diwan commented on LUCENE-1942: - Patch files are independent of the platform they were generated on. Indeed, I generated the patch using svn diff; I merely use Eclipse as a glorified text editor.
2.9.1
OK we are now down to 0 issues!! It's been exciting :) Assuming nothing crops up over the weekend, I plan to start the release process on Monday. Mike
Re: Call the authorities
OK, done! Mike On Fri, Oct 23, 2009 at 5:18 PM, Uwe Schindler wrote: > I think I was the guy that removed everything from this testcase :-) I > should have removed it after removing the old serach API. > > You can also remove it from BW branch, but I think a new tag is not needed > now, this can wait until the next big bw commit. > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > >> -Original Message- >> From: Michael McCandless [mailto:luc...@mikemccandless.com] >> Sent: Friday, October 23, 2009 11:15 PM >> To: java-dev@lucene.apache.org >> Subject: Re: Call the authorities >> >> I'll just remove TestStressSort... >> >> Mike >> >> On Mon, Oct 19, 2009 at 6:03 AM, Michael McCandless >> wrote: >> > Indeed! It's doing nothing now. Just creating Sort objects but not >> > in fact doing any searching with them. Hmm. >> > >> > Unfortunately, the test very much relied on the deprecated >> > "setUseLegacySearch" API, to compare old vs new sorting. I suppose >> > its time has past, given that it has had a good amount of time, now, >> > to assert that old and new were producing identical results. >> > >> > Should we just remove it? >> > >> > Mike >> > >> > On Sun, Oct 18, 2009 at 11:20 PM, Mark Miller >> wrote: >> >> Mark Miller wrote: >> >>> TestStressSort has been butchered. >> >>> >> >>> >> >> I suppose we could just pull it since it wouldn't check for much any >> >> more - looks awful funny as is. 
>> >> -- >> >> - Mark >> >> http://www.lucidimagination.com
[jira] Commented: (LUCENE-1942) NUM_THREADS is a static member of RunAddIndexesThreads and should be accessed in a static way
[ https://issues.apache.org/jira/browse/LUCENE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769456#action_12769456 ] Michael McCandless commented on LUCENE-1942: Can you make a simple patch for this? (I don't use Eclipse.) Thanks.
RE: Call the authorities
I think I was the guy who removed everything from this test case :-) I should have removed it after removing the old search API. You can also remove it from the BW branch, but I think a new tag is not needed now; this can wait until the next big BW commit. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Friday, October 23, 2009 11:15 PM > To: java-dev@lucene.apache.org > Subject: Re: Call the authorities > > I'll just remove TestStressSort... > > Mike > > On Mon, Oct 19, 2009 at 6:03 AM, Michael McCandless > wrote: > > Indeed! It's doing nothing now. Just creating Sort objects but not > > in fact doing any searching with them. Hmm. > > > > Unfortunately, the test very much relied on the deprecated > > "setUseLegacySearch" API, to compare old vs new sorting. I suppose > > its time has passed, given that it has had a good amount of time, now, > > to assert that old and new were producing identical results. > > > > Should we just remove it? > > > > Mike > > > > On Sun, Oct 18, 2009 at 11:20 PM, Mark Miller > wrote: > >> Mark Miller wrote: > >>> TestStressSort has been butchered. > >>> > >>> > >> I suppose we could just pull it since it wouldn't check for much any > >> more - looks awful funny as is. > >> > >> -- > >> - Mark > >> > >> http://www.lucidimagination.com
[jira] Resolved: (LUCENE-2003) Highlighter has problems when you use StandardAnalyzer with LUCENE_29 or simplier StopFilter with stopWordsPosIncr mode switched on
[ https://issues.apache.org/jira/browse/LUCENE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved LUCENE-2003. - Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [New]) > Highlighter has problems when you use StandardAnalyzer with LUCENE_29 or > simplier StopFilter with stopWordsPosIncr mode switched on > --- > > Key: LUCENE-2003 > URL: https://issues.apache.org/jira/browse/LUCENE-2003 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 2.9, 3.0 >Reporter: Uwe Schindler >Assignee: Mark Miller > Fix For: 2.9.1, 3.0 > > Attachments: LUCENE-2003.patch, LUCENE-2003.patch > > > This is a followup on LUCENE-1987: > If you change the constant static final Version TEST_VERSION in HighlighterTest from > Version.LUCENE_24 to LUCENE_29 or LUCENE_CURRENT, the test > testSimpleQueryScorerPhraseHighlighting fails. Please note that currently > (before LUCENE-2002 is fixed) you must also set the QueryParser to respect > posIncr.
Re: Call the authorities
I'll just remove TestStressSort... Mike On Mon, Oct 19, 2009 at 6:03 AM, Michael McCandless wrote: > Indeed! It's doing nothing now. Just creating Sort objects but not > in fact doing any searching with them. Hmm. > > Unfortunately, the test very much relied on the deprecated > "setUseLegacySearch" API, to compare old vs new sorting. I suppose > its time has passed, given that it has had a good amount of time, now, > to assert that old and new were producing identical results. > > Should we just remove it? > > Mike > > On Sun, Oct 18, 2009 at 11:20 PM, Mark Miller wrote: >> Mark Miller wrote: >>> TestStressSort has been butchered. >>> >>> >> I suppose we could just pull it since it wouldn't check for much any >> more - looks awful funny as is. >> >> -- >> - Mark >> >> http://www.lucidimagination.com
Re: contrib and lucene 3.0
I think we should allow new features into contrib for 3.0. I don't even like holding back new features from core for 3.0. In general I don't think it's healthy when trunk is locked down. Trunk should be like a locomotive that's plowing ahead at all times. Mike On Thu, Oct 22, 2009 at 1:48 PM, Robert Muir wrote: > Hi, > > What is the consensus on new features for contrib for Lucene 3.0? I know > that for core, it's mostly a Java 5 upgrade and deprecation removal. > > I want to make sure LUCENE-1606 is set to the right version, but I figured > it's really not just about that specific issue; I would like to know the > plans in general. > > Thanks, > Robert > > -- > Robert Muir > rcm...@gmail.com >
[jira] Resolved: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2002. Resolution: Fixed > Add oal.util.Version ctor to QueryParser > > > Key: LUCENE-2002 > URL: https://issues.apache.org/jira/browse/LUCENE-2002 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 2.9, 3.0 >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 2.9.1 > > Attachments: LUCENE-2002-29.patch, LUCENE-2002-29.patch, > LUCENE-2002-29.patch, LUCENE-2002.patch > > > This is a followup of LUCENE-1987: > If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses > QueryParser, phrase queries will not work, because the StopFilter enables > position increments for stop words, but QueryParser ignores them by default. > The user has to explicitly enable them. > This issue would add a ctor taking the Version constant and automatically > enable this setting. The same applies to the contrib queryparser. Possibly > StopAnalyzer should also get this Version ctor. > To be able to remove the default ctor for 3.0 (to remove a possible trap for > users of QueryParser), it must be deprecated and the new one also added to > 2.9.1.
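To see why phrase queries break, here is a minimal, self-contained simulation rather than Lucene code: `analyze` mimics a whitespace tokenizer plus a stop filter, `phraseMatches` mimics exact phrase matching by relative position, and the stop-word list and sample text are assumptions. When the index side keeps the gaps left by removed stop words but the query side does not, the relative positions disagree and the phrase no longer matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PositionIncrementDemo {
    // Hypothetical stop-word list for the sketch.
    static final Set<String> STOP_WORDS =
        new HashSet<String>(Arrays.asList("the", "of", "a", "an"));

    static final class Token {
        final String term;
        final int position;
        Token(String term, int position) { this.term = term; this.position = position; }
    }

    // Lower-cases, splits on whitespace and drops stop words. With keepGaps=true
    // every dropped stop word still advances the position counter (like a stop
    // filter with position increments enabled); with keepGaps=false the
    // surviving tokens get consecutive positions.
    static List<Token> analyze(String text, boolean keepGaps) {
        List<Token> out = new ArrayList<Token>();
        int pos = -1;
        for (String word : text.toLowerCase().split("\\s+")) {
            if (STOP_WORDS.contains(word)) {
                if (keepGaps) pos++;
                continue;
            }
            pos++;
            out.add(new Token(word, pos));
        }
        return out;
    }

    // Exact phrase check: every query term must occur in the document at the
    // same offset, relative to the first query term, that it has in the query.
    static boolean phraseMatches(List<Token> doc, List<Token> query) {
        if (query.isEmpty()) return false;
        Token first = query.get(0);
        for (Token start : doc) {
            if (!start.term.equals(first.term)) continue;
            boolean all = true;
            for (Token q : query) {
                int wanted = start.position + (q.position - first.position);
                boolean found = false;
                for (Token d : doc) {
                    if (d.position == wanted && d.term.equals(q.term)) { found = true; break; }
                }
                if (!found) { all = false; break; }
            }
            if (all) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "president of the united states";
        List<Token> indexed = analyze(text, true); // index side keeps gaps
        System.out.println(phraseMatches(indexed, analyze(text, false)));
        System.out.println(phraseMatches(indexed, analyze(text, true)));
    }
}
```

`main` prints `false` and then `true`: the indexed tokens sit at positions 0, 3 and 4, so only a query that also honors the gaps lines up. That is the setting the Version-aware QueryParser constructor is meant to turn on automatically.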
[jira] Commented: (LUCENE-2003) Highlighter has problems when you use StandardAnalyzer with LUCENE_29 or simplier StopFilter with stopWordsPosIncr mode switched on
[ https://issues.apache.org/jira/browse/LUCENE-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769452#action_12769452 ] Michael McCandless commented on LUCENE-2003: Mark, is this one done?
Re: lucene 2.9 sorting algorithm
They are included in my last patch on LUCENE-1997. It's somewhat hacked up though :) We'd have to redo it "for real" if we go forward with this... Mike 2009/10/23 John Wang : > Hi Mike: > Thank you! It would be really nice to get the optimizations you have > done. > -John > > 2009/10/23 Michael McCandless >> >> Agreed: so far I'm seeing serious performance loss with MultiPQ, >> especially as topN gets larger, and for int sorting. >> >> For small queue, String sort, it sometimes wins. >> >> So if I were forced to decide now based on the current results, I >> think we should keep the single PQ API. >> >> But: I am right now optimizing John's patch to see how fast Multi PQ >> can get. I'll post it once I get it working, and post output from >> re-running on my opensolaris box. >> >> Mike >> >> 2009/10/23 Mark Miller : >> >>>I still think we should if performance is no >> >>>better with the new one. >> > >> > Where is there any indication performance is not better with the new >> > one? >> > >> > The benchmarks are clearly against switching back. At best they could >> > argue for two API's - even then it depends - a loss of 10% on Java 1.5 >> > with the most recent linux for a topn:10 ? I'm all for more results, but >> > its not looking like a good switch to me. What API do I use? Well, it >> > depends - how many docs will you ask for back, what OS are running, how >> > hard >> > is it for you to grok one API over the other? >> > >> > And then as we make changes in the future we have to manage both APIs. >> > >> > bq. digging in deep and running thorough perf tests makes sense >> > >> > Again - no one is arguing against - dig all year - I'll help - but I >> > don't see the treasure yet, and the hole is starting to look deep. >> > >> > bq. removing that if from the Multi PQ patch makes sense >> > >> > I didn't have a problem with that either - or other code changes - but >> > jeeze, mention what you are seeing with the switch. 
I'll tell you what I >> > saw it - not that much - a bit of improvement, but take a look at the >> > Java 1.5 run - it ended up being a blade of grass holding up a boulder >> > on Linux. >> > >> > >> > >> > Michael McCandless wrote: >> >> Sheesh I go to bed and so much all of a sudden happens!! >> >> >> >> Sorry Mark; I should've called out "PATCH IS ON 2.9 BRANCH" more >> >> clearly ;) >> >> >> >> There's no question in my mind that the new comparator API is more >> >> complex than the old one, and I really don't like that. I had to >> >> rewrite the section of LIA that gives an example of a [simple] custom >> >> sort and it wasn't pleasant! Two compare methods (compare, >> >> compareBottom)? Two copy methods (copy, setBottom)? Sure, you can >> >> grok it and get through it if you have to, but it is more complex >> >> because it's conflated with the PQ API. >> >> >> >> Ease on consumption of our APIs is very important, so, only when >> >> performance clearly warrants it should we adopt a more complex API. >> >> >> >> Also, yeah, it would suck to have to switch back to the old API at >> >> this point, but net/net I still think we should if performance is no >> >> better with the new one. >> >> >> >> The old API also fits cleanly with per-segment searching (John's >> >> initial patch shows that -- it's simply another per-segment Colletor). >> >> The two APIs (collection, comparator) are well decoupled. >> >> >> >> So, digging in deep and running thorough perf tests makes sense; we >> >> need to understand the performance to make the API switch decision. >> >> And definitely we should tune both approaches as much as possible >> >> (removing that if from the Multi PQ patch makes sense). >> >> >> >> But... Multi PQ's performance isn't better in many cases... though, >> >> we're clearly still iterating. I'll run a 1.5 (32 & 64 bit) test, >> >> with the if statement removed. 
>> >> >> >> Mike >> >> >> >> On Fri, Oct 23, 2009 at 3:53 AM, Earwin Burrfoot >> >> wrote: >> >> >> >>> I did. >> >>> >> >>> On Fri, Oct 23, 2009 at 09:05, Jake Mannix >> >>> wrote: >> >>> >> On Thu, Oct 22, 2009 at 9:58 PM, Mark Miller >> wrote: >> >> > Yes - I've seen a handful of non core devs report back that they >> > upgraded with no complaints on the difficulty. It's in the mailing >> > list >> > archives. The only core dev I've seen say it's easy is Uwe. He's >> > super >> > sharp though, so I wasn't banking my comment on him ;) >> > >> Upgrade custom sorting? Where has anyone talked about this? >> >> 2.9 is great, I like the new apis, they're great in general. It's >> just this >> multi-segment sorting we're talking about here. >> >> -jake >> >> >> >> >>> >> >>> -- >> >>> Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) >> >>> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 >> >>> ICQ: 104465785
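Mike's description of the 2.9 comparator API quoted above ("two compare methods (compare, compareBottom)", "two copy methods (copy, setBottom)") is easier to follow with a concrete shape. The following is a hypothetical, simplified int comparator plus a toy top-N loop that drives it. The method names follow the thread, but the signatures are simplified, and this is not Lucene's FieldComparator, which also deals with per-segment readers. It shows how the comparator ends up carrying queue bookkeeping: it has to know about slots and about the current bottom entry.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical, simplified slot-based comparator for an ascending int sort.
// Method names follow the thread; signatures are not Lucene's actual API.
final class IntSlotComparator {
    private final int[] slotValues; // values copied into queue slots
    private final int[] docValues;  // per-document field values, assumed preloaded
    private int bottom;             // value of the weakest entry in a full queue

    IntSlotComparator(int numHits, int[] docValues) {
        this.slotValues = new int[numHits];
        this.docValues = docValues;
    }

    // Negative if slot1 sorts before slot2.
    int compare(int slot1, int slot2) {
        return Integer.compare(slotValues[slot1], slotValues[slot2]);
    }

    // Remember the weakest queued entry so new docs can be tested against it.
    void setBottom(int slot) { bottom = slotValues[slot]; }

    // Positive if doc sorts before the bottom entry, i.e. it is competitive.
    int compareBottom(int doc) { return Integer.compare(bottom, docValues[doc]); }

    // Copy doc's value into a queue slot.
    void copy(int slot, int doc) { slotValues[slot] = docValues[doc]; }

    int value(int slot) { return slotValues[slot]; }
}

final class TopNCollector {
    // Returns the numHits smallest values in ascending order, driving the
    // comparator the way a sorting collector would.
    static int[] collect(int[] docValues, int numHits) {
        if (numHits <= 0) return new int[0];
        final IntSlotComparator cmp = new IntSlotComparator(numHits, docValues);
        int filled = 0;
        for (int doc = 0; doc < docValues.length; doc++) {
            if (filled < numHits) {
                cmp.copy(filled++, doc);
                if (filled == numHits) cmp.setBottom(weakest(cmp, filled));
            } else if (cmp.compareBottom(doc) > 0) {
                cmp.copy(weakest(cmp, filled), doc); // replace the bottom entry
                cmp.setBottom(weakest(cmp, filled));
            }
        }
        Integer[] order = new Integer[filled];
        for (int i = 0; i < filled; i++) order[i] = i;
        Arrays.sort(order, new Comparator<Integer>() {
            public int compare(Integer a, Integer b) { return cmp.compare(a, b); }
        });
        int[] result = new int[filled];
        for (int i = 0; i < filled; i++) result[i] = cmp.value(order[i]);
        return result;
    }

    // Linear scan for the slot that sorts last; a real queue keeps this in a heap.
    private static int weakest(IntSlotComparator cmp, int filled) {
        int worst = 0;
        for (int slot = 1; slot < filled; slot++) {
            if (cmp.compare(slot, worst) > 0) worst = slot;
        }
        return worst;
    }
}
```

For docValues {5, 1, 9, 7, 3} and numHits = 3 this returns {1, 3, 5}. Even in this toy, the comparator cannot be written without thinking about the queue, which is the coupling Mike objects to.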
Re: lucene 2.9 sorting algorithm
Hi Mike: Thank you! It would be really nice to get the optimizations you have done. -John 2009/10/23 Michael McCandless > Agreed: so far I'm seeing serious performance loss with MultiPQ, > especially as topN gets larger, and for int sorting. > > For small queue, String sort, it sometimes wins. > > So if I were forced to decide now based on the current results, I > think we should keep the single PQ API. > > But: I am right now optimizing John's patch to see how fast Multi PQ > can get. I'll post it once I get it working, and post output from > re-running on my opensolaris box. > > Mike > > 2009/10/23 Mark Miller : > >>>I still think we should if performance is no > >>>better with the new one. > > > > Where is there any indication performance is not better with the new one? > > > > The benchmarks are clearly against switching back. At best they could > argue for two API's - even then it depends - a loss of 10% on Java 1.5 > > with the most recent linux for a topn:10 ? I'm all for more results, but > its not looking like a good switch to me. What API do I use? Well, it > depends - how many docs will you ask for back, what OS are running, how hard > is it for you to grok one API over the other? > > > > And then as we make changes in the future we have to manage both APIs. > > > > bq. digging in deep and running thorough perf tests makes sense > > > > Again - no one is arguing against - dig all year - I'll help - but I > don't see the treasure yet, and the hole is starting to look deep. > > > > bq. removing that if from the Multi PQ patch makes sense > > > > I didn't have a problem with that either - or other code changes - but > > jeeze, mention what you are seeing with the switch. I'll tell you what I > > saw it - not that much - a bit of improvement, but take a look at the > > Java 1.5 run - it ended up being a blade of grass holding up a boulder > > on Linux. > > > > > > > > Michael McCandless wrote: > >> Sheesh I go to bed and so much all of a sudden happens!! 
> >> > >> Sorry Mark; I should've called out "PATCH IS ON 2.9 BRANCH" more > >> clearly ;) > >> > >> There's no question in my mind that the new comparator API is more > >> complex than the old one, and I really don't like that. I had to > >> rewrite the section of LIA that gives an example of a [simple] custom > >> sort and it wasn't pleasant! Two compare methods (compare, > >> compareBottom)? Two copy methods (copy, setBottom)? Sure, you can > >> grok it and get through it if you have to, but it is more complex > >> because it's conflated with the PQ API. > >> > >> Ease on consumption of our APIs is very important, so, only when > >> performance clearly warrants it should we adopt a more complex API. > >> > >> Also, yeah, it would suck to have to switch back to the old API at > >> this point, but net/net I still think we should if performance is no > >> better with the new one. > >> > >> The old API also fits cleanly with per-segment searching (John's > >> initial patch shows that -- it's simply another per-segment Colletor). > >> The two APIs (collection, comparator) are well decoupled. > >> > >> So, digging in deep and running thorough perf tests makes sense; we > >> need to understand the performance to make the API switch decision. > >> And definitely we should tune both approaches as much as possible > >> (removing that if from the Multi PQ patch makes sense). > >> > >> But... Multi PQ's performance isn't better in many cases... though, > >> we're clearly still iterating. I'll run a 1.5 (32 & 64 bit) test, > >> with the if statement removed. > >> > >> Mike > >> > >> On Fri, Oct 23, 2009 at 3:53 AM, Earwin Burrfoot > wrote: > >> > >>> I did. > >>> > >>> On Fri, Oct 23, 2009 at 09:05, Jake Mannix > wrote: > >>> > On Thu, Oct 22, 2009 at 9:58 PM, Mark Miller > wrote: > > > Yes - I've seen a handful of non core devs report back that they > > upgraded with no complaints on the difficulty. Its in the mailing > list > > archives. 
The only core dev I've seen say it's easy is Uwe. He's super > > sharp though, so I wasn't banking my comment on him ;) > > > Upgrade custom sorting? Where has anyone talked about this? > > 2.9 is great, I like the new APIs, they're great in general. It's > just this > multi-segment sorting we're talking about here. > > -jake > > > > >>> > >>> -- > >>> Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) > >>> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 > >>> ICQ: 104465785 > >>> > >>> - > >>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > >>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
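To make Mike's point about the slot-based comparator concrete, here is a minimal sketch of the four methods he lists (compare, compareBottom, copy, setBottom) together with a naive top-N loop that drives them. `SlotComparator`, `IntSlotComparator` and `SlotTopN` are hypothetical names, not Lucene's actual FieldComparator or collector classes, and the bottom slot is found by a linear scan rather than a real priority queue, purely to keep the example short. The older API needed only one method comparing two documents, which is the simplicity being weighed in this thread.

```java
import java.util.Arrays;

/** Hypothetical slot-based comparator using the four method names from the thread (not Lucene's API). */
abstract class SlotComparator {
    /** Compares the values stored in two queue slots. */
    abstract int compare(int slot1, int slot2);

    /** Compares the current bottom (weakest) entry against a new document. */
    abstract int compareBottom(int doc);

    /** Copies a document's sort value into a queue slot. */
    abstract void copy(int slot, int doc);

    /** Remembers the value of the slot that is currently the bottom. */
    abstract void setBottom(int slot);
}

/** Ascending sort on a per-document int value (assumed preloaded, e.g. from a field cache). */
class IntSlotComparator extends SlotComparator {
    private final int[] docValues;  // sort value per docID
    private final int[] slotValues; // sort value per queue slot
    private int bottom;             // value of the current bottom slot

    IntSlotComparator(int[] docValues, int numSlots) {
        this.docValues = docValues;
        this.slotValues = new int[numSlots];
    }

    @Override int compare(int slot1, int slot2) {
        return Integer.compare(slotValues[slot1], slotValues[slot2]);
    }

    @Override int compareBottom(int doc) {
        return Integer.compare(bottom, docValues[doc]);
    }

    @Override void copy(int slot, int doc) {
        slotValues[slot] = docValues[doc];
    }

    @Override void setBottom(int slot) {
        bottom = slotValues[slot];
    }
}

/** Naive top-N driver: linear scan for the bottom slot instead of a heap. */
final class SlotTopN {
    static int[] topN(SlotComparator c, int numDocs, int n) {
        int size = Math.min(n, numDocs);
        if (size <= 0) return new int[0];
        int[] slotDoc = new int[size];
        for (int doc = 0; doc < size; doc++) { // fill the queue with the first docs
            c.copy(doc, doc);
            slotDoc[doc] = doc;
        }
        int bottomSlot = findBottom(c, size);
        c.setBottom(bottomSlot);
        for (int doc = size; doc < numDocs; doc++) {
            // Positive means the bottom entry sorts after this doc, so replace it.
            if (c.compareBottom(doc) > 0) {
                c.copy(bottomSlot, doc);
                slotDoc[bottomSlot] = doc;
                bottomSlot = findBottom(c, size);
                c.setBottom(bottomSlot);
            }
        }
        Integer[] slots = new Integer[size];
        for (int i = 0; i < size; i++) slots[i] = i;
        Arrays.sort(slots, (a, b) -> c.compare(a, b)); // best entry first
        int[] result = new int[size];
        for (int i = 0; i < size; i++) result[i] = slotDoc[slots[i]];
        return result;
    }

    /** Slot whose value sorts last, i.e. the weakest entry. */
    private static int findBottom(SlotComparator c, int size) {
        int bottom = 0;
        for (int s = 1; s < size; s++) {
            if (c.compare(s, bottom) > 0) bottom = s;
        }
        return bottom;
    }
}
```

Even in this toy form, a custom sort has to implement four cooperating methods whose meaning depends on the queue's slot bookkeeping, which is the coupling with the PQ API that Mike objects to.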
Re: svn:mergeinfo prop
Ahhh, ok, that makes sense. Mike 2009/10/23 Uwe Schindler : > It's very easy to explain: > > The mergeinfo is inherited from top-level directories down to each file. If > one of the files already contains its own mergeinfo property (e.g. > TestBackwardsCompatibility, because it was merged separately: I > reverse-merged this test as a separate action during my initial test > editing; as you know, it was deleted), this file-specific mergeinfo overrides > the one from the directory. > > If you then add a new mergeinfo to a top-level directory (like you did), the > files/subdirs with a separate mergeinfo need to be updated, too. Because of > this you see the spurious mergeinfo changes in unchanged files. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769412#action_12769412 ] Michael McCandless commented on LUCENE-2002: bq. So the QueryParser ctors should also use the TEST_VERSION constant, so that it can be easily updated to do the testing for various versions. Good idea -- I'll fix! > Add oal.util.Version ctor to QueryParser > > > Key: LUCENE-2002 > URL: https://issues.apache.org/jira/browse/LUCENE-2002 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 2.9, 3.0 >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 2.9.1 > > Attachments: LUCENE-2002-29.patch, LUCENE-2002-29.patch, > LUCENE-2002-29.patch, LUCENE-2002.patch > > > This is a followup of LUCENE-1987: > If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses > QueryParser, phrase queries that contain stop words will not work, because the StopFilter enables > position increments for stop words, but QueryParser ignores them by default. > The user has to explicitly enable them. > This issue would add a ctor taking the Version constant and automatically > enable this setting. The same applies to the contrib queryparser. Possibly > StopAnalyzer should also get this version ctor. > To be able to remove the default ctor for 3.0 (to remove a possible trap for > users of QueryParser), it must be deprecated and the new one also added to > 2.9.1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
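The version-dependent default described in the issue can be sketched as follows. `MatchVersion`, `VersionedParser` and `StopPositions` are hypothetical stand-ins, not Lucene's Version or QueryParser, and the sketch assumes the new ctor simply turns position increments on for 2.9 and later. The `positions` helper shows why the mismatch breaks phrase queries: with gaps preserved, "president of usa" yields positions 0 and 2, while a parser that ignores increments expects 0 and 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a release-version constant; only the ordering matters. */
enum MatchVersion {
    LUCENE_24, LUCENE_29, LUCENE_30;

    boolean onOrAfter(MatchVersion other) {
        return compareTo(other) >= 0;
    }
}

/** Hypothetical, stripped-down parser showing only the version-dependent default. */
class VersionedParser {
    private boolean enablePositionIncrements;

    /** Legacy ctor: keeps the old default, position increments ignored. */
    @Deprecated
    VersionedParser() {
        this(MatchVersion.LUCENE_24);
    }

    /** Version-aware ctor: enables position increments for 2.9 and later. */
    VersionedParser(MatchVersion matchVersion) {
        this.enablePositionIncrements = matchVersion.onOrAfter(MatchVersion.LUCENE_29);
    }

    boolean getEnablePositionIncrements() {
        return enablePositionIncrements;
    }

    void setEnablePositionIncrements(boolean enable) {
        this.enablePositionIncrements = enable;
    }
}

final class StopPositions {
    /** Term positions after dropping stop words, optionally keeping the gaps they leave. */
    static List<Integer> positions(List<String> tokens, Set<String> stopWords, boolean keepGaps) {
        List<Integer> out = new ArrayList<>();
        int pos = -1;
        int increment = 1;
        for (String token : tokens) {
            if (stopWords.contains(token)) {
                if (keepGaps) increment++; // remember the hole left by the stop word
                continue;
            }
            pos += increment;
            increment = 1;
            out.add(pos);
        }
        return out;
    }
}
```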
RE: svn:mergeinfo prop
It's very easy to explain: The mergeinfo is inherited from top-level directories down to each file. If one of the files already contains its own mergeinfo property (e.g. TestBackwardsCompatibility, because it was merged separately: I reverse-merged this test as a separate action during my initial test editing; as you know, it was deleted), this file-specific mergeinfo overrides the one from the directory. If you then add a new mergeinfo to a top-level directory (like you did), the files/subdirs with a separate mergeinfo need to be updated, too. Because of this you see the spurious mergeinfo changes in unchanged files. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: Friday, October 23, 2009 10:19 PM > To: java-dev@lucene.apache.org > Subject: Re: svn:mergeinfo prop > > OK thanks for the pointer :) It's very strange indeed. > > Mike
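Uwe's explanation can be illustrated with a toy model of mergeinfo inheritance. This is a hypothetical simplification (`MergeinfoModel` is an illustrative name), not how Subversion actually stores or propagates svn:mergeinfo: a path without its own property inherits from its nearest ancestor, so recording a merge at the root must also rewrite every descendant that carries an explicit property. That is why files with no source changes show up as modified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Toy model of svn:mergeinfo inheritance (a simplification, not Subversion's
 * implementation): a path without its own property inherits from the
 * nearest ancestor that has one.
 */
final class MergeinfoModel {
    // path -> merged revisions recorded explicitly on that path
    private final Map<String, Set<Integer>> explicit = new TreeMap<>();

    void setExplicit(String path, Integer... revs) {
        explicit.put(path, new TreeSet<>(Arrays.asList(revs)));
    }

    /** Explicit mergeinfo of the path, else that of its closest ancestor. */
    Set<Integer> effective(String path) {
        for (String p = path; p != null; p = parent(p)) {
            Set<Integer> revs = explicit.get(p);
            if (revs != null) return revs;
        }
        return Collections.emptySet();
    }

    /**
     * Records a merge of 'rev' at 'root'. Descendants with explicit mergeinfo
     * would shadow the new entry, so they are rewritten too. Returns every
     * path whose property changed.
     */
    List<String> recordMerge(String root, int rev) {
        List<String> touched = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> e : explicit.entrySet()) {
            String p = e.getKey();
            if (p.equals(root) || p.startsWith(root + "/")) {
                e.getValue().add(rev);
                touched.add(p);
            }
        }
        if (!explicit.containsKey(root)) {
            Set<Integer> revs = new TreeSet<>(effective(root)); // start from inherited info
            revs.add(rev);
            explicit.put(root, revs);
            touched.add(root);
        }
        return touched;
    }

    private static String parent(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? null : path.substring(0, slash);
    }
}
```

In this model, a file that only inherits mergeinfo (such as one that was never merged on its own) is not touched by the root-level merge, while a file with its own property is rewritten even though its content did not change.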
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769406#action_12769406 ] Uwe Schindler commented on LUCENE-2002: --- A mega patch! One thing: the highlighter test case uses a separate constant for the version. My idea was to iterate over all version constants and run the test several times to test all combinations. So the QueryParser ctors should also use the TEST_VERSION constant, so that the tests can easily be updated to run against various versions.
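Uwe's idea of running the same test once per version constant could look roughly like this. `TestedVersion` and `PerVersionRunner` are hypothetical names; Lucene's real Version class and test infrastructure may differ, and a real test would pass each version into the QueryParser and analyzer ctors instead of using a single fixed TEST_VERSION.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical version constants; not Lucene's real Version class. */
enum TestedVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

final class PerVersionRunner {
    /** The single constant tests would use today; hypothetical, mirroring TEST_VERSION. */
    static final TestedVersion TEST_VERSION = TestedVersion.LUCENE_30;

    /**
     * Runs the same test body once per version constant and returns the
     * versions for which it failed, so every combination is exercised.
     */
    static List<TestedVersion> runForAllVersions(Consumer<TestedVersion> testBody) {
        List<TestedVersion> failures = new ArrayList<>();
        for (TestedVersion v : TestedVersion.values()) {
            try {
                testBody.accept(v);
            } catch (AssertionError e) {
                failures.add(v); // keep going so one failure does not hide others
            }
        }
        return failures;
    }
}
```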
Re: svn:mergeinfo prop
OK thanks for the pointer :) It's very strange indeed. Mike On Fri, Oct 23, 2009 at 2:12 PM, Earwin Burrfoot wrote: > It's okay in a sense. > See, svn's merge-tracking support was grafted onto it in a particularly > hideous way and is really hairy on the insides. > So while there's no sane explanation for that behaviour, it is expected. > > See - > http://svnbook.red-bean.com/en/1.5/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword > > On Fri, Oct 23, 2009 at 21:55, Michael McCandless > wrote: >> I've noticed recently when merging from 2.9.x -> trunk or vice versa, >> for some reason it picks up files that had zero source changes in the >> revision I merged, but do show changes to their svn:mergeinfo. >> >> E.g. for LUCENE-2002, I merged 2.9.x -> trunk, and now on my trunk >> checkout I see these mods: >> >> Property changes on: >> src/test/org/apache/lucene/index/TestBackwardsCompatibility.java >> ___ >> Modified: svn:mergeinfo >> Merged >> /lucene/java/branches/lucene_2_9/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java:r829134 >> >> >> Property changes on: src/test/org/apache/lucene/document/TestNumberTools.java >> ___ >> Modified: svn:mergeinfo >> Merged >> /lucene/java/branches/lucene_2_9/src/test/org/apache/lucene/document/TestNumberTools.java:r829134 >> >> But the commit for LUCENE-2002 did not touch these files. Does >> anyone know why it's doing this? Is it OK? >> >> Mike
[jira] Commented: (LUCENE-2006) Optimization for FieldDocSortedHitQueue
[ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769404#action_12769404 ] Uwe Schindler commented on LUCENE-2006: --- The reason the code looked like this is simple (from the SVN log): originally, the FieldDoc values were just "Object[] fields", so the casts were needed. When custom comparators were added, they became "Comparable". So there was no real performance reason for doing it in such a complicated and inefficient way.
Re: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
No - I was considering how one might be added. Mike's Python script that turns benchmark results into JIRA output is just so cool, I'd hate to test any other way ;) The new colors feature makes it even better. Not sure how best to fit it in, though - we obviously need a way to specify multiple indices. Would love to get that Python into Java too :) I thought Jason had started an issue for that, but I don't think it went very far. It would be great if all that were built more generically into the benchmarker somehow. Uwe Schindler wrote: > I opened LUCENE-2006. > > Is there any MultiSearcher related task/alg in contrib/benchmark? > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de
[jira] Commented: (LUCENE-2006) Optimization for FieldDocSortedHitQueue
[ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769398#action_12769398 ] Uwe Schindler commented on LUCENE-2006: --- Is there any MultiSearcher related task/alg in contrib/benchmark or somewhere in JIRA?
RE: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
I opened LUCENE-2006. Is there any MultiSearcher related task/alg in contrib/benchmark? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Mark Miller [mailto:markrmil...@gmail.com] > Sent: Friday, October 23, 2009 7:53 PM > To: java-dev@lucene.apache.org > Subject: Re: [jira] Issue Comment Edited: (LUCENE-1997) Explore > performance of multi-PQ vs single-PQ sorting API > > Nice! I like it. Even if its not much faster (havn't checked either), I > can't see it being much slower and its cleaner code. > > I'd be happy to do some quick perf tests when I get a chance, but I'm +1 > on it. > > Uwe Schindler wrote: > > Mark, > > > > when removing may comment (as I now understand the whole > > FieldDocSortedHitQueue), I found the following as a optimization of the > > whole hq: > > > > All FieldDoc values are Compareables (also the score or docid, if they > > appear as SortField in a MultiSearcher or ParallelMultiSearcher). The > code > > of lessThan seems very ineffective, as it has a big switch statement on > the > > SortField type, then casts the value to the underlying numeric type > Object, > > calls Number.xxxValue() & co for it and then compares manually. As > > j.l.Number is itself Comparable, I see no reason to do this. Just call > > compareTo on the Comparable interface and we are happy. The big deal is > that > > it prevents casting and the two method calls xxxValue(), as > Number.compareTo > > works more efficient internally. > > > > The only special cases are String sort, where the Locale may be used and > the > > score sorting which is backwards. But these are two if statements > instead of > > the whole switch. > > > > I had not tested it now for performance, but in my opinion it should be > > faster for MultiSearchers. All tests still pass (because they should). > > > > Attached patch applies to (current) trunk. 
> > > > - > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: u...@thetaphi.de > > > > > >> -Original Message- > >> From: Mark Miller (JIRA) [mailto:j...@apache.org] > >> Sent: Friday, October 23, 2009 3:33 PM > >> To: java-dev@lucene.apache.org > >> Subject: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance > of > >> multi-PQ vs single-PQ sorting API > >> > >> > >> [ https://issues.apache.org/jira/browse/LUCENE- > >> 1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment- > >> tabpanel&focusedCommentId=12769221#action_12769221 ] > >> > >> Mark Miller edited comment on LUCENE-1997 at 10/23/09 1:31 PM: > >> --- > >> > >> bq. but how does this fit together. > >> > >> Thats what Comparable FieldComparator#value is for - fillFields will > grab > >> all those and load up FieldDoc fields - so the custom FieldComparator > is > >> tied into it - it creates Comparable objects that can be compared by > the > >> native compareTos. (the old API did the same thing) > >> > >> {code} > >> /** > >>* Given a queue Entry, creates a corresponding FieldDoc > >>* that contains the values used to sort the given document. > >>* These values are not the raw values out of the index, but the > >> internal > >>* representation of them. This is so the given search hit can be > >> collated by > >>* a MultiSearcher with other search hits. > >>* > >>* @param entry The Entry used to create a FieldDoc > >>* @return The newly created FieldDoc > >>* @see Searchable#search(Weight,Filter,int,Sort) > >>*/ > >> FieldDoc fillFields(final Entry entry) { > >> final int n = comparators.length; > >> final Comparable[] fields = new Comparable[n]; > >> for (int i = 0; i < n; ++i) { > >> fields[i] = comparators[i].value(entry.slot); > >> } > >> //if (maxscore > 1.0f) doc.score /= maxscore; // normalize scores > >> return new FieldDoc(entry.docID, entry.score, fields); > >> } > >> {code} > >> > >> was (Author: markrmil...@gmail.com): > >> bq. 
but how does this fit together. > >> > >> Thats what Comparable FieldComparator#value is for - fillFields will > grab > >> all those and load up FieldDoc fields - so the custom FieldComparator > is > >> tied into it - it creates Comparable objects that can be compared by > the > >> native compareTos. > >> > >> {code} > >> /** > >>* Given a queue Entry, creates a corresponding FieldDoc > >>* that contains the values used to sort the given document. > >>* These values are not the raw values out of the index, but the > >> internal > >>* representation of them. This is so the given search hit can be > >> collated by > >>* a MultiSearcher with other search hits. > >>* > >>* @param entry The Entry used to create a FieldDoc > >>* @return The newly created FieldDoc > >>* @see Searchable#search(Weight,Filter,int,Sort) > >>*/ > >> FieldDoc fill
[jira] Created: (LUCENE-2006) Optimization for FieldDocSortedHitQueue
Optimization for FieldDocSortedHitQueue --- Key: LUCENE-2006 URL: https://issues.apache.org/jira/browse/LUCENE-2006 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 3.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.0 Attachments: LUCENE-2006.patch When updating core for generics, I found the following optimization for FieldDocSortedHitQueue: All FieldDoc values are Comparables (including the score or docid, if they appear as a SortField in a MultiSearcher or ParallelMultiSearcher). The code of lessThan seems very inefficient: it has a big switch statement on the SortField type, then casts the value to the underlying numeric wrapper type, calls Number.xxxValue() and the like on it, and then compares manually. As the java.lang.Number subclasses used here (Integer, Float, etc.) are themselves Comparable, I see no reason to do this. Just call compareTo through the Comparable interface and we are happy. The big win is that it avoids the casts and the two xxxValue() method calls, as the wrappers' compareTo works more efficiently internally. The only special cases are String sort, where a Locale may be used, and score sorting, which is reversed. But these are two if statements instead of the whole switch. I have not tested it for performance yet, but in my opinion it should be faster for MultiSearchers. All tests still pass (as they should).
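A minimal sketch of the proposed lessThan, assuming every sort value is already a Comparable: one generic compareTo call replaces the per-type switch, and only the two special cases remain (locale-aware String sort via a java.text.Collator, and score sort in reverse). `SortKey`, `SortedHit` and `HitOrder` are hypothetical stand-ins for SortField, FieldDoc and FieldDocSortedHitQueue, not the code in LUCENE-2006.patch; ties are broken by docID so the order stays deterministic.

```java
import java.text.Collator;
import java.util.Locale;

/** Hypothetical sort descriptor (stand-in for SortField). */
final class SortKey {
    enum Kind { SCORE, STRING, OTHER }

    final Kind kind;
    final boolean reverse;
    final Collator collator; // only set for locale-sensitive String sort

    SortKey(Kind kind, boolean reverse, Locale locale) {
        this.kind = kind;
        this.reverse = reverse;
        this.collator = (kind == Kind.STRING && locale != null) ? Collator.getInstance(locale) : null;
    }
}

/** Hypothetical stand-in for FieldDoc: the sort values are already Comparable. */
@SuppressWarnings("rawtypes")
final class SortedHit {
    final int doc;
    final Comparable[] fields;

    SortedHit(int doc, Comparable[] fields) {
        this.doc = doc;
        this.fields = fields;
    }
}

final class HitOrder {
    /** One compareTo call replaces the per-type switch; two special cases remain. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int compareField(SortKey key, Comparable a, Comparable b) {
        int c;
        if (key.collator != null) {
            c = key.collator.compare((String) a, (String) b); // locale-aware String sort
        } else if (key.kind == SortKey.Kind.SCORE) {
            c = b.compareTo(a); // higher scores rank first
        } else {
            c = a.compareTo(b); // Integer, Float, String, ... are all Comparable
        }
        return key.reverse ? -c : c;
    }

    /** True if hit a ranks below hit b; equal keys fall back to docID order. */
    static boolean lessThan(SortKey[] keys, SortedHit a, SortedHit b) {
        for (int i = 0; i < keys.length; i++) {
            int c = compareField(keys[i], a.fields[i], b.fields[i]);
            if (c != 0) return c > 0;
        }
        return a.doc > b.doc;
    }
}
```

With an English collator, "apple" sorts before "Banana", whereas plain String.compareTo would put "Banana" first because uppercase letters have lower char values; that difference is why the Locale case cannot simply fall through to compareTo.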
[jira] Updated: (LUCENE-2006) Optimization for FieldDocSortedHitQueue
[ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2006: -- Attachment: LUCENE-2006.patch Patch.
[jira] Commented: (LUCENE-2006) Optimization for FieldDocSortedHitQueue
[ https://issues.apache.org/jira/browse/LUCENE-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769389#action_12769389 ] Uwe Schindler commented on LUCENE-2006: --- Mark Miller on java-dev: {quote} Nice! I like it. Even if its not much faster (havn't checked either), I can't see it being much slower and its cleaner code. I'd be happy to do some quick perf tests when I get a chance, but I'm +1 on it. {quote} > Optimization for FieldDocSortedHitQueue > --- > > Key: LUCENE-2006 > URL: https://issues.apache.org/jira/browse/LUCENE-2006 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 3.0 >Reporter: Uwe Schindler >Assignee: Uwe Schindler > Fix For: 3.0 > > Attachments: LUCENE-2006.patch > > > When updating core for generics, I found the following as a optimization of > FieldDocSortedHitQueue: > All FieldDoc values are Compareables (also the score or docid, if they > appear as SortField in a MultiSearcher or ParallelMultiSearcher). The code > of lessThan seems very ineffective, as it has a big switch statement on the > SortField type, then casts the value to the underlying numeric type Object, > calls Number.xxxValue() & co for it and then compares manually. As > j.l.Number is itself Comparable, I see no reason to do this. Just call > compareTo on the Comparable interface and we are happy. The big deal is that > it prevents casting and the two method calls xxxValue(), as Number.compareTo > works more efficient internally. > The only special cases are String sort, where the Locale may be used and the > score sorting which is backwards. But these are two if statements instead of > the whole switch. > I had not tested it now for performance, but in my opinion it should be > faster for MultiSearchers. All tests still pass (because they should). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
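A minimal sketch of the compareTo-based ordering described above, not the actual LUCENE-2006 patch: `SortSpec`, `SortedHit` and `HitOrder` are hypothetical stand-ins for SortField and FieldDoc, sort values are assumed to be non-null, and only the two special cases named in the description (locale-aware strings via `java.text.Collator`, and reversed score order) get their own branch. It returns a comparator-style int; a queue's lessThan could be derived from its sign.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical stand-in for SortField (not the Lucene class).
final class SortSpec {
    enum Kind { SCORE, STRING, OTHER }

    final Kind kind;
    final boolean reverse;
    final Collator collator; // null unless a Locale was given for a STRING sort

    SortSpec(Kind kind, boolean reverse, Locale locale) {
        this.kind = kind;
        this.reverse = reverse;
        this.collator = (locale == null) ? null : Collator.getInstance(locale);
    }
}

// Hypothetical stand-in for FieldDoc: a doc id plus one Comparable value per sort field.
final class SortedHit {
    final int doc;
    final Object[] fields;

    SortedHit(int doc, Object... fields) {
        this.doc = doc;
        this.fields = fields;
    }
}

final class HitOrder {
    /** Negative if a sorts before b, positive if after; no per-type switch needed. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int compare(SortedHit a, SortedHit b, SortSpec[] specs) {
        int c = 0;
        for (int i = 0; i < specs.length && c == 0; i++) {
            SortSpec spec = specs[i];
            Comparable va = (Comparable) a.fields[i];
            Comparable vb = (Comparable) b.fields[i];
            if (spec.kind == SortSpec.Kind.STRING && spec.collator != null) {
                // Special case 1: locale-sensitive string order.
                c = spec.collator.compare((String) va, (String) vb);
            } else if (spec.kind == SortSpec.Kind.SCORE) {
                // Special case 2: higher scores come first, so flip natural order.
                c = vb.compareTo(va);
            } else {
                // Integer, Long, Float, String, ...: natural order via compareTo.
                c = va.compareTo(vb);
            }
            if (spec.reverse) {
                c = -c;
            }
        }
        // Tie-break on doc id so equal keys still get a deterministic order.
        return (c != 0) ? c : Integer.compare(a.doc, b.doc);
    }
}
```

With an English collator, "apple" sorts before "Banana", whereas plain String.compareTo would put "Banana" first because uppercase letters have lower code points.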
[jira] Updated: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2002: --- Attachment: LUCENE-2002.patch Patch for trunk; I plan to commit soon... > Add oal.util.Version ctor to QueryParser > > > Key: LUCENE-2002 > URL: https://issues.apache.org/jira/browse/LUCENE-2002 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 2.9, 3.0 >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 2.9.1 > > Attachments: LUCENE-2002-29.patch, LUCENE-2002-29.patch, > LUCENE-2002-29.patch, LUCENE-2002.patch > > > This is a followup of LUCENE-1987: > If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses > QueryParser, phrase queries containing stop words will not match, because the StopFilter enables > position increments for stop words, but QueryParser ignores them by default. > The user has to explicitly enable them. > This issue would add a ctor taking the Version constant that automatically > enables this setting. The same applies to the contrib queryparser. StopAnalyzer > should possibly get this version ctor as well. > To be able to remove the default ctor in 3.0 (to remove a possible trap for > users of QueryParser), it must be deprecated, and the new one must also be added to > 2.9.1.
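A minimal sketch of the version-ctor idea, using hypothetical `MatchVersion` and `ParserSettings` types rather than Lucene's real Version and QueryParser classes: the version passed to the constructor selects the default for position increments, so callers who opt into newer behaviour no longer have to remember the setter. The cutoff at 2.9 is an assumption for illustration.

```java
// Hypothetical stand-in for a version constant (not org.apache.lucene.util.Version).
enum MatchVersion {
    LUCENE_24, LUCENE_29, LUCENE_30;

    /** True if this version is the same as or newer than the other one. */
    boolean onOrAfter(MatchVersion other) {
        return compareTo(other) >= 0;
    }
}

// Hypothetical stand-in for the parser's configuration (not QueryParser itself).
final class ParserSettings {
    private boolean enablePositionIncrements;

    /** The version argument picks the default, so callers need not call the setter. */
    ParserSettings(MatchVersion matchVersion) {
        // Assumed cutoff: from 2.9 on, honour position increments left by StopFilter.
        this.enablePositionIncrements = matchVersion.onOrAfter(MatchVersion.LUCENE_29);
    }

    boolean getEnablePositionIncrements() {
        return enablePositionIncrements;
    }

    void setEnablePositionIncrements(boolean enable) {
        this.enablePositionIncrements = enable;
    }
}
```

Old behaviour stays reachable by passing an older constant, which is what lets the no-arg ctor be deprecated without silently changing results.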
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769368#action_12769368 ] Michael McCandless commented on LUCENE-1997: I agree the new results are now more ambiguous. > Explore performance of multi-PQ vs single-PQ sorting API > > > Key: LUCENE-1997 > URL: https://issues.apache.org/jira/browse/LUCENE-1997 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.9 >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, > LUCENE-1997.patch > > > Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev, > where a simpler (non-segment-based) comparator API is proposed that > gathers results into multiple PQs (one per segment) and then merges > them in the end. > I started from John's multi-PQ code and worked it into > contrib/benchmark so that we could run perf tests. Then I generified > the Python script I use for running search benchmarks (in > contrib/benchmark/sortBench.py). > The script first creates indexes with 1M docs (based on > SortableSingleDocSource, and based on wikipedia, if available). Then > it runs various combinations: > * Index with 20 balanced segments vs index with the "normal" log > segment size > * Queries with different numbers of hits (only for wikipedia index) > * Different top N > * Different sorts (by title, for wikipedia, and by random string, > random int, and country for the random index) > For each test, 7 search rounds are run and the best QPS is kept. The > script runs singlePQ then multiPQ, records the resulting best QPS > for each, and produces a table (in Jira format) as output.
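A rough sketch of the multi-PQ strategy from the description: each segment keeps its own bounded top-N heap, and the per-segment lists are merged at the end. Hits are reduced to a hypothetical (doc, int key) pair sorted ascending by key; this is not John's patch or Lucene's collector API, just `java.util.PriorityQueue`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class MultiPQ {
    /** Hypothetical hit: global doc id plus an int sort key (smaller key sorts first). */
    static final class Hit {
        final int doc;
        final int key;

        Hit(int doc, int key) {
            this.doc = doc;
            this.key = key;
        }
    }

    /** Ascending by key, ties broken by doc id. */
    static final Comparator<Hit> ORDER =
        Comparator.<Hit>comparingInt(h -> h.key).thenComparingInt(h -> h.doc);

    /** Per-segment top N: a bounded heap whose head is the worst hit kept so far (topN >= 1). */
    static List<Hit> collectSegment(int docBase, int[] keys, int topN) {
        PriorityQueue<Hit> pq = new PriorityQueue<>(topN, ORDER.reversed());
        for (int i = 0; i < keys.length; i++) {
            Hit h = new Hit(docBase + i, keys[i]);
            if (pq.size() < topN) {
                pq.add(h);
            } else if (ORDER.compare(h, pq.peek()) < 0) {
                pq.poll(); // evict the current worst hit
                pq.add(h);
            }
        }
        List<Hit> out = new ArrayList<>(pq);
        out.sort(ORDER);
        return out;
    }

    /** Merges the per-segment lists into a single global top N. */
    static List<Hit> merge(List<List<Hit>> perSegment, int topN) {
        PriorityQueue<Hit> all = new PriorityQueue<>(ORDER);
        for (List<Hit> seg : perSegment) {
            all.addAll(seg);
        }
        List<Hit> out = new ArrayList<>();
        while (!all.isEmpty() && out.size() < topN) {
            out.add(all.poll());
        }
        return out;
    }
}
```

In this sketch the final merge handles up to topN × segments entries, so both its memory and its cost grow with top N and with the number of segments.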
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769365#action_12769365 ] Mark Miller commented on LUCENE-1997: -
JAVA: java version "1.5.0_20"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_20-b02, mixed mode)
OS: Linux quad-laptop 2.6.31-14-generic #48-Ubuntu SMP x86_64 GNU/Linux
||Source||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|wiki|log|1|317914|title|10|80.09|78.20|{color:red}-2.4%{color}|
|wiki|log|1|317914|title|25|80.12|79.51|{color:red}-0.8%{color}|
|wiki|log|1|317914|title|50|78.61|76.03|{color:red}-3.3%{color}|
|wiki|log|1|317914|title|100|77.18|75.13|{color:red}-2.7%{color}|
|wiki|log|1|317914|title|500|75.01|54.74|{color:red}-27.0%{color}|
|wiki|log|1|317914|title|1000|67.77|41.29|{color:red}-39.1%{color}|
|wiki|log||100|title|10|109.30|119.29|{color:green}9.1%{color}|
|wiki|log||100|title|25|108.34|116.02|{color:green}7.1%{color}|
|wiki|log||100|title|50|106.86|110.70|{color:green}3.6%{color}|
|wiki|log||100|title|100|94.72|101.10|{color:green}6.7%{color}|
|wiki|log||100|title|500|78.69|62.04|{color:red}-21.2%{color}|
|wiki|log||100|title|1000|71.93|43.05|{color:red}-40.2%{color}|
|random|log||100|rand string|10|112.81|117.80|{color:green}4.4%{color}|
|random|log||100|rand string|25|113.92|115.73|{color:green}1.6%{color}|
|random|log||100|rand string|50|113.55|110.08|{color:red}-3.1%{color}|
|random|log||100|rand string|100|90.30|95.35|{color:green}5.6%{color}|
|random|log||100|rand string|500|76.77|51.88|{color:red}-32.4%{color}|
|random|log||100|rand string|1000|66.78|36.93|{color:red}-44.7%{color}|
|random|log||100|country|10|114.26|118.72|{color:green}3.9%{color}|
|random|log||100|country|25|113.96|115.81|{color:green}1.6%{color}|
|random|log||100|country|50|113.59|109.78|{color:red}-3.4%{color}|
|random|log||100|country|100|91.97|94.05|{color:green}2.3%{color}|
|random|log||100|country|500|75.03|27.37|{color:red}-63.5%{color}|
|random|log||100|country|1000|66.62|34.85|{color:red}-47.7%{color}|
|random|log||100|rand int|10|118.06|124.42|{color:green}5.4%{color}|
|random|log||100|rand int|25|117.76|120.76|{color:green}2.5%{color}|
|random|log||100|rand int|50|117.35|115.81|{color:red}-1.3%{color}|
|random|log||100|rand int|100|96.35|103.60|{color:green}7.5%{color}|
|random|log||100|rand int|500|87.04|50.63|{color:red}-41.8%{color}|
|random|log||100|rand int|1000|77.59|35.47|{color:red}-54.3%{color}|
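The Pct change column appears to be (QPS new - QPS old) / QPS old × 100, rounded to one decimal; for the first row, (78.20 - 80.09) / 80.09 ≈ -2.4%. A small helper reproducing that arithmetic, under that assumption (`PctChange` is a hypothetical name, not part of sortBench.py):

```java
final class PctChange {
    /** Percentage change from oldQps to newQps, rounded to one decimal place. */
    static double pct(double oldQps, double newQps) {
        double raw = (newQps - oldQps) / oldQps * 100.0;
        return Math.round(raw * 10.0) / 10.0;
    }

    public static void main(String[] args) {
        // First and sixth rows of the Linux table above.
        System.out.println(pct(80.09, 78.20)); // -2.4
        System.out.println(pct(67.77, 41.29)); // -39.1
    }
}
```

A negative value means the multi-PQ ("new") run was slower than single-PQ ("old"), matching the red cells.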
Re: svn:mergeinfo prop
It's okay in a sense. See, svn's merge-tracking support was grafted onto it in a particularly hideous way and is really hairy on the insides. So while there's no sane explanation for that behaviour, it is expected. See - http://svnbook.red-bean.com/en/1.5/svn.branchmerge.advanced.html#svn.branchmerge.advanced.finalword On Fri, Oct 23, 2009 at 21:55, Michael McCandless wrote: > I've noticed recently when merging from 2.9.x -> trunk or vice versa, > for some reason it picks up files that had zero source changes in the > revision I merged, but do show changes to their svn:mergeinfo. > > E.g. for LUCENE-2002, I merged 2.9.x -> trunk, and now on my trunk > checkout I see these mods: > > Property changes on: > src/test/org/apache/lucene/index/TestBackwardsCompatibility.java > ___ > Modified: svn:mergeinfo > Merged > /lucene/java/branches/lucene_2_9/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java:r829134 > > > Property changes on: src/test/org/apache/lucene/document/TestNumberTools.java > ___ > Modified: svn:mergeinfo > Merged > /lucene/java/branches/lucene_2_9/src/test/org/apache/lucene/document/TestNumberTools.java:r829134 > > But the commit for LUCENE-2002 did not touch these files. Does > anyone know why it's doing this? Is it OK? > > Mike > -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 ICQ: 104465785
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769332#action_12769332 ] Mark Miller commented on LUCENE-1997: - Mike's latest results are more ambiguous - let me run the new stuff on Linux too.
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769328#action_12769328 ] Jake Mannix commented on LUCENE-1997: - Mike, thanks for all the hard work on this - it's clearly far more work than anyone has spent yet on just doing the upgrade to the newer API, and that's appreciated. Am I wrong in thinking that these results are pretty ambiguous? How often do people take the top 500 or top 1000 sorted hits? If you don't focus on that case (that of looking for pages *50 through 100* of normal 10-per-page search results), there's a bunch of green and a bunch of red, and both techniques are within +/- 10-20% of each other? Is that what everyone else sees in Mike's newest numbers here, or am I misreading them?
svn:mergeinfo prop
I've noticed recently when merging from 2.9.x -> trunk or vice versa, for some reason it picks up files that had zero source changes in the revision I merged, but do show changes to their svn:mergeinfo. E.g. for LUCENE-2002, I merged 2.9.x -> trunk, and now on my trunk checkout I see these mods:

Property changes on: src/test/org/apache/lucene/index/TestBackwardsCompatibility.java
___
Modified: svn:mergeinfo
   Merged /lucene/java/branches/lucene_2_9/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java:r829134

Property changes on: src/test/org/apache/lucene/document/TestNumberTools.java
___
Modified: svn:mergeinfo
   Merged /lucene/java/branches/lucene_2_9/src/test/org/apache/lucene/document/TestNumberTools.java:r829134

But the commit for LUCENE-2002 did not touch these files. Does anyone know why it's doing this? Is it OK? Mike
Re: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
Nice! I like it. Even if it's not much faster (haven't checked either), I can't see it being much slower, and it's cleaner code. I'd be happy to do some quick perf tests when I get a chance, but I'm +1 on it. Uwe Schindler wrote: > Mark, > > when removing my comment (as I now understand the whole > FieldDocSortedHitQueue), I found the following optimization of the > whole hq: > > All FieldDoc values are Comparables (including the score or docid, if they > appear as a SortField in a MultiSearcher or ParallelMultiSearcher). The code > of lessThan seems very inefficient: it has a big switch statement on the > SortField type, then casts the value to the underlying numeric wrapper type, > calls Number.xxxValue() & co on it, and then compares manually. As the > java.lang.Number subclasses (Integer, Float, ...) are themselves Comparable, I see no reason to do this. Just call > compareTo on the Comparable interface and we are happy. The big win is that > it avoids the casts and the two xxxValue() calls, as the wrapper classes' compareTo > works more efficiently internally. > > The only special cases are String sort, where a Locale may be used, and > score sort, which is in reverse order. But these are two if statements instead of > the whole switch. > > I have not yet tested it for performance, but in my opinion it should be > faster for MultiSearchers. All tests still pass (as they should). > > The attached patch applies to (current) trunk.
> > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > >> -Original Message- >> From: Mark Miller (JIRA) [mailto:j...@apache.org] >> Sent: Friday, October 23, 2009 3:33 PM >> To: java-dev@lucene.apache.org >> Subject: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance of >> multi-PQ vs single-PQ sorting API >> >> >> [ https://issues.apache.org/jira/browse/LUCENE- >> 1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment- >> tabpanel&focusedCommentId=12769221#action_12769221 ] >> >> Mark Miller edited comment on LUCENE-1997 at 10/23/09 1:31 PM: >> --- >> >> bq. but how does this fit together. >> >> That's what Comparable FieldComparator#value is for - fillFields will grab >> all those and load up FieldDoc fields - so the custom FieldComparator is >> tied into it - it creates Comparable objects that can be compared by the >> native compareTos. (the old API did the same thing) >> >> {code} >> /** >>* Given a queue Entry, creates a corresponding FieldDoc >>* that contains the values used to sort the given document. >>* These values are not the raw values out of the index, but the >> internal >>* representation of them. This is so the given search hit can be >> collated by >>* a MultiSearcher with other search hits. >>* >>* @param entry The Entry used to create a FieldDoc >>* @return The newly created FieldDoc >>* @see Searchable#search(Weight,Filter,int,Sort) >>*/ >> FieldDoc fillFields(final Entry entry) { >> final int n = comparators.length; >> final Comparable[] fields = new Comparable[n]; >> for (int i = 0; i < n; ++i) { >> fields[i] = comparators[i].value(entry.slot); >> } >> //if (maxscore > 1.0f) doc.score /= maxscore; // normalize scores >> return new FieldDoc(entry.docID, entry.score, fields); >> } >> {code} >> >> was (Author: markrmil...@gmail.com): >> bq. but how does this fit together.
>> >> That's what Comparable FieldComparator#value is for - fillFields will grab >> all those and load up FieldDoc fields - so the custom FieldComparator is >> tied into it - it creates Comparable objects that can be compared by the >> native compareTos. >> >> {code} >> /** >>* Given a queue Entry, creates a corresponding FieldDoc >>* that contains the values used to sort the given document. >>* These values are not the raw values out of the index, but the >> internal >>* representation of them. This is so the given search hit can be >> collated by >>* a MultiSearcher with other search hits. >>* >>* @param entry The Entry used to create a FieldDoc >>* @return The newly created FieldDoc >>* @see Searchable#search(Weight,Filter,int,Sort) >>*/ >> FieldDoc fillFields(final Entry entry) { >> final int n = comparators.length; >> final Comparable[] fields = new Comparable[n]; >> for (int i = 0; i < n; ++i) { >> fields[i] = comparators[i].value(entry.slot); >> } >> //if (maxscore > 1.0f) doc.score /= maxscore; // normalize scores >> return new FieldDoc(entry.docID, entry.score, fields); >> } >> {code} >> >> >>> Explore performance of multi-PQ vs single-PQ sorting API >>> >>> >>> Key: LUCENE-1997 >>> URL: https://issues.apache.org/jira/browse/LUCENE-1997 >>> Project: Lucene - Java >>> Issue Type: Improvement >>>
Re: svn commit: r829128 - /lucene/java/branches/lucene_2_9/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
Yeah - coming up. Yonik Seeley wrote: > This probably warrants a CHANGES entry? > > -Yonik > http://www.lucidimagination.com > > > > On Fri, Oct 23, 2009 at 1:07 PM, wrote: > >> Author: markrmiller >> Date: Fri Oct 23 17:07:22 2009 >> New Revision: 829128 >> >> URL: http://svn.apache.org/viewvc?rev=829128&view=rev >> Log: >> LUCENE-2003: Highlighter doesn't respect position increments other than 1 >> with PhraseQuerys >> >> Modified: >> >> lucene/java/branches/lucene_2_9/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java >> >> Modified: >> lucene/java/branches/lucene_2_9/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java >> URL: >> http://svn.apache.org/viewvc/lucene/java/branches/lucene_2_9/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java?rev=829128&r1=829127&r2=829128&view=diff >> == >> --- >> lucene/java/branches/lucene_2_9/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java >> (original) >> +++ >> lucene/java/branches/lucene_2_9/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java >> Fri Oct 23 17:07:22 2009 >> @@ -109,13 +109,32 @@ >> } >> } >> } else if (query instanceof PhraseQuery) { >> - Term[] phraseQueryTerms = ((PhraseQuery) query).getTerms(); >> + PhraseQuery phraseQuery = ((PhraseQuery) query); >> + Term[] phraseQueryTerms = phraseQuery.getTerms(); >> SpanQuery[] clauses = new SpanQuery[phraseQueryTerms.length]; >> for (int i = 0; i < phraseQueryTerms.length; i++) { >> clauses[i] = new SpanTermQuery(phraseQueryTerms[i]); >> } >> + int slop = phraseQuery.getSlop(); >> + int[] positions = phraseQuery.getPositions(); >> + // add largest position increment to slop >> + if (positions.length > 0) { >> +int lastPos = positions[0]; >> +int largestInc = 0; >> +int sz = positions.length; >> +for (int i = 1; i < sz; i++) { >> + int pos = 
positions[i]; >> + int inc = pos - lastPos; >> + if (inc > largestInc) { >> +largestInc = inc; >> + } >> + lastPos = pos; >> +} >> +if(largestInc > 1) { >> + slop += largestInc; >> +} >> + } >> >> - int slop = ((PhraseQuery) query).getSlop(); >> boolean inorder = false; >> >> if (slop == 0) { >> >> >> >> > > -- - Mark http://www.lucidimagination.com
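The position-increment handling in the diff can be pulled out into a standalone helper. This sketch mirrors the committed logic (find the largest gap between consecutive phrase positions and, if it exceeds 1, add the whole gap to the slop); the class and method names are hypothetical, and the example gaps in the comments are only illustrations, e.g. a hole left by a removed stop word.

```java
final class PhraseSlop {
    /**
     * Slop to use when turning a phrase into span clauses: the original slop,
     * plus the largest gap between consecutive positions if that gap is > 1.
     */
    static int adjustedSlop(int slop, int[] positions) {
        if (positions.length == 0) {
            return slop;
        }
        int lastPos = positions[0];
        int largestInc = 0;
        for (int i = 1; i < positions.length; i++) {
            int inc = positions[i] - lastPos;
            if (inc > largestInc) {
                largestInc = inc;
            }
            lastPos = positions[i];
        }
        return (largestInc > 1) ? slop + largestInc : slop;
    }

    public static void main(String[] args) {
        System.out.println(adjustedSlop(0, new int[] {0, 1, 2})); // 0: no gaps
        System.out.println(adjustedSlop(0, new int[] {0, 2}));    // 2: one hole
        System.out.println(adjustedSlop(1, new int[] {0, 1, 4})); // 4: 1 + largest gap 3
    }
}
```

Only the single largest gap is added, not the sum of all gaps, exactly as in the diff.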
Re: svn commit: r829128 - /lucene/java/branches/lucene_2_9/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java
This probably warrants a CHANGES entry? -Yonik http://www.lucidimagination.com On Fri, Oct 23, 2009 at 1:07 PM, wrote: > Author: markrmiller > Date: Fri Oct 23 17:07:22 2009 > New Revision: 829128 > > URL: http://svn.apache.org/viewvc?rev=829128&view=rev > Log: > LUCENE-2003: Highlighter doesn't respect position increments other than 1 > with PhraseQuerys > > Modified: > > lucene/java/branches/lucene_2_9/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java > > Modified: > lucene/java/branches/lucene_2_9/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java > URL: > http://svn.apache.org/viewvc/lucene/java/branches/lucene_2_9/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java?rev=829128&r1=829127&r2=829128&view=diff > == > --- > lucene/java/branches/lucene_2_9/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java > (original) > +++ > lucene/java/branches/lucene_2_9/contrib/highlighter/src/java/org/apache/lucene/search/highlight/WeightedSpanTermExtractor.java > Fri Oct 23 17:07:22 2009 > @@ -109,13 +109,32 @@ > } > } > } else if (query instanceof PhraseQuery) { > - Term[] phraseQueryTerms = ((PhraseQuery) query).getTerms(); > + PhraseQuery phraseQuery = ((PhraseQuery) query); > + Term[] phraseQueryTerms = phraseQuery.getTerms(); > SpanQuery[] clauses = new SpanQuery[phraseQueryTerms.length]; > for (int i = 0; i < phraseQueryTerms.length; i++) { > clauses[i] = new SpanTermQuery(phraseQueryTerms[i]); > } > + int slop = phraseQuery.getSlop(); > + int[] positions = phraseQuery.getPositions(); > + // add largest position increment to slop > + if (positions.length > 0) { > + int lastPos = positions[0]; > + int largestInc = 0; > + int sz = positions.length; > + for (int i = 1; i < sz; i++) { > + int pos = positions[i]; > + int inc = pos - lastPos; > + if (inc > largestInc) { > + largestInc = inc; > + } > + 
lastPos = pos; > + } > + if(largestInc > 1) { > + slop += largestInc; > + } > + } > > - int slop = ((PhraseQuery) query).getSlop(); > boolean inorder = false; > > if (slop == 0) { > > >
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769299#action_12769299 ] Michael McCandless commented on LUCENE-2002: Actually, to port to trunk I was going to do an svn merge, resolve the conflicts, remove the merged-in but deprecated methods, and then fix all resulting compilation/test errors. I'll try to do this myself... wish me luck ;) I'm only using emacs over here!!
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769295#action_12769295 ] Michael McCandless commented on LUCENE-1997: 32 bit 1.5 JRE:
JAVA: java version "1.5.0_19"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02)
Java HotSpot(TM) Server VM (build 1.5.0_19-b02, mixed mode)
OS: SunOS rhumba 5.11 snv_111b i86pc i386 i86pc Solaris
||Source||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|wiki|log|1|318481|title|10|97.31|92.69|{color:red}-4.7%{color}|
|wiki|log|1|318481|title|25|96.74|92.09|{color:red}-4.8%{color}|
|wiki|log|1|318481|title|50|98.57|90.03|{color:red}-8.7%{color}|
|wiki|log|1|318481|title|100|97.20|103.72|{color:green}6.7%{color}|
|wiki|log|1|318481|title|500|84.14|78.23|{color:red}-7.0%{color}|
|wiki|log|1|318481|title|1000|77.84|63.62|{color:red}-18.3%{color}|
|wiki|log||100|title|10|114.99|136.86|{color:green}19.0%{color}|
|wiki|log||100|title|25|114.63|125.92|{color:green}9.8%{color}|
|wiki|log||100|title|50|113.33|130.58|{color:green}15.2%{color}|
|wiki|log||100|title|100|115.36|111.81|{color:red}-3.1%{color}|
|wiki|log||100|title|500|107.30|86.16|{color:red}-19.7%{color}|
|wiki|log||100|title|1000|98.07|55.39|{color:red}-43.5%{color}|
|random|log||100|rand string|10|115.55|140.86|{color:green}21.9%{color}|
|random|log||100|rand string|25|125.66|137.15|{color:green}9.1%{color}|
|random|log||100|rand string|50|123.58|133.82|{color:green}8.3%{color}|
|random|log||100|rand string|100|115.51|134.82|{color:green}16.7%{color}|
|random|log||100|rand string|500|102.73|93.24|{color:red}-9.2%{color}|
|random|log||100|rand string|1000|88.70|65.09|{color:red}-26.6%{color}|
|random|log||100|country|10|113.92|139.72|{color:green}22.6%{color}|
|random|log||100|country|25|113.44|131.36|{color:green}15.8%{color}|
|random|log||100|country|50|122.88|128.62|{color:green}4.7%{color}|
|random|log||100|country|100|121.88|135.58|{color:green}11.2%{color}|
|random|log||100|country|500|96.94|79.38|{color:red}-18.1%{color}|
|random|log||100|country|1000|82.01|62.31|{color:red}-24.0%{color}|
|random|log||100|rand int|10|124.58|134.20|{color:green}7.7%{color}|
|random|log||100|rand int|25|123.46|134.82|{color:green}9.2%{color}|
|random|log||100|rand int|50|117.96|128.61|{color:green}9.0%{color}|
|random|log||100|rand int|100|113.92|122.09|{color:green}7.2%{color}|
|random|log||100|rand int|500|105.49|38.92|{color:red}-63.1%{color}|
|random|log||100|rand int|1000|92.27|53.14|{color:red}-42.4%{color}|
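The LUCENE-1997 description says each test runs 7 search rounds and keeps the best QPS. A minimal sketch of that reporting rule, assuming QPS per round is simply queries divided by elapsed seconds; `BestOfRounds`, `bestQps` and `measure` are hypothetical names, and this is not the actual sortBench.py or contrib/benchmark code.

```java
final class BestOfRounds {
    /** QPS for one round: queries executed divided by elapsed seconds. */
    static double qps(int queries, long elapsedNanos) {
        return queries / (elapsedNanos / 1e9);
    }

    /** Keeps the best (highest) QPS across all rounds. */
    static double bestQps(int queriesPerRound, long[] roundNanos) {
        double best = 0.0;
        for (long nanos : roundNanos) {
            best = Math.max(best, qps(queriesPerRound, nanos));
        }
        return best;
    }

    /** Runs a (hypothetical) query batch `rounds` times and reports the best QPS. */
    static double measure(Runnable batch, int queriesPerRound, int rounds) {
        long[] times = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            batch.run();
            times[r] = Math.max(1L, System.nanoTime() - start); // avoid division by zero
        }
        return bestQps(queriesPerRound, times);
    }
}
```

Keeping the maximum rather than the mean presumably damps one-off slow rounds such as JIT warm-up or GC pauses; the trade-off is that it reports best-case rather than typical throughput.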
RE: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
Mark, when removing my comment (as I now understand the whole FieldDocSortedHitQueue), I found the following optimization of the whole hq: All FieldDoc values are Comparables (including the score or docid, if they appear as a SortField in a MultiSearcher or ParallelMultiSearcher). The code of lessThan seems very inefficient: it has a big switch statement on the SortField type, then casts the value to the underlying numeric wrapper type, calls Number.xxxValue() & co on it, and then compares manually. As the java.lang.Number subclasses (Integer, Float, ...) are themselves Comparable, I see no reason to do this. Just call compareTo on the Comparable interface and we are happy. The big win is that it avoids the casts and the two xxxValue() calls, as the wrapper classes' compareTo works more efficiently internally. The only special cases are String sort, where a Locale may be used, and score sort, which is in reverse order. But these are two if statements instead of the whole switch. I have not yet tested it for performance, but in my opinion it should be faster for MultiSearchers. All tests still pass (as they should). The attached patch applies to (current) trunk. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Mark Miller (JIRA) [mailto:j...@apache.org] > Sent: Friday, October 23, 2009 3:33 PM > To: java-dev@lucene.apache.org > Subject: [jira] Issue Comment Edited: (LUCENE-1997) Explore performance of > multi-PQ vs single-PQ sorting API > > > [ https://issues.apache.org/jira/browse/LUCENE- > 1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment- > tabpanel&focusedCommentId=12769221#action_12769221 ] > > Mark Miller edited comment on LUCENE-1997 at 10/23/09 1:31 PM: > --- > > bq. but how does this fit together.
> > Thats what Comparable FieldComparator#value is for - fillFields will grab > all those and load up FieldDoc fields - so the custom FieldComparator is > tied into it - it creates Comparable objects that can be compared by the > native compareTos. (the old API did the same thing) > > {code} > /** >* Given a queue Entry, creates a corresponding FieldDoc >* that contains the values used to sort the given document. >* These values are not the raw values out of the index, but the > internal >* representation of them. This is so the given search hit can be > collated by >* a MultiSearcher with other search hits. >* >* @param entry The Entry used to create a FieldDoc >* @return The newly created FieldDoc >* @see Searchable#search(Weight,Filter,int,Sort) >*/ > FieldDoc fillFields(final Entry entry) { > final int n = comparators.length; > final Comparable[] fields = new Comparable[n]; > for (int i = 0; i < n; ++i) { > fields[i] = comparators[i].value(entry.slot); > } > //if (maxscore > 1.0f) doc.score /= maxscore; // normalize scores > return new FieldDoc(entry.docID, entry.score, fields); > } > {code} > > was (Author: markrmil...@gmail.com): > bq. but how does this fit together. > > Thats what Comparable FieldComparator#value is for - fillFields will grab > all those and load up FieldDoc fields - so the custom FieldComparator is > tied into it - it creates Comparable objects that can be compared by the > native compareTos. > > {code} > /** >* Given a queue Entry, creates a corresponding FieldDoc >* that contains the values used to sort the given document. >* These values are not the raw values out of the index, but the > internal >* representation of them. This is so the given search hit can be > collated by >* a MultiSearcher with other search hits. 
>* >* @param entry The Entry used to create a FieldDoc >* @return The newly created FieldDoc >* @see Searchable#search(Weight,Filter,int,Sort) >*/ > FieldDoc fillFields(final Entry entry) { > final int n = comparators.length; > final Comparable[] fields = new Comparable[n]; > for (int i = 0; i < n; ++i) { > fields[i] = comparators[i].value(entry.slot); > } > //if (maxscore > 1.0f) doc.score /= maxscore; // normalize scores > return new FieldDoc(entry.docID, entry.score, fields); > } > {code} > > > Explore performance of multi-PQ vs single-PQ sorting API > > > > > > Key: LUCENE-1997 > > URL: https://issues.apache.org/jira/browse/LUCENE-1997 > > Project: Lucene - Java > > Issue Type: Improvement > > Components: Search > >Affects Versions: 2.9 > >Reporter: Michael McCandless > >Assignee: Michael McCandless > > Attachments: LUCENE-1997.patch, LUCENE-1997.patch > > > > > > Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev, > > where a simpler (non-segment-based) comparator API is proposed that > > gathe
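Uwe's proposal (compare FieldDoc values through Comparable.compareTo, with special cases only for score and locale-sensitive String sorts) can be sketched as follows. This is a minimal, self-contained illustration, not his patch: SortKind, SortSpec and FieldDocSketch are hypothetical stand-ins for Lucene's SortField and FieldDoc, and the doc-id tie-break assumes the queue keeps its worst hit on top, so lessThan(a, b) returns true when a sorts after b.

```java
import java.text.Collator;
import java.util.Locale;

/**
 * Sketch of a compareTo-based lessThan for merged, field-sorted hits.
 * SortKind, SortSpec and FieldDocSketch are hypothetical stand-ins, not Lucene classes.
 */
public class ComparableLessThanSketch {

    enum SortKind { SCORE, STRING, OTHER }

    /** One sort criterion; a non-null locale only matters for STRING sorts. */
    static final class SortSpec {
        final SortKind kind;
        final Collator collator; // null unless this is a locale-sensitive String sort
        final boolean reverse;

        SortSpec(SortKind kind, Locale locale, boolean reverse) {
            this.kind = kind;
            this.collator = (kind == SortKind.STRING && locale != null)
                    ? Collator.getInstance(locale) : null;
            this.reverse = reverse;
        }
    }

    /** Minimal stand-in for FieldDoc: a doc id plus one Comparable per sort field. */
    static final class FieldDocSketch {
        final int doc;
        final Comparable<?>[] fields;

        FieldDocSketch(int doc, Comparable<?>... fields) {
            this.doc = doc;
            this.fields = fields;
        }
    }

    private final SortSpec[] specs;

    ComparableLessThanSketch(SortSpec... specs) {
        this.specs = specs;
    }

    /** Negative if a sorts before b, positive if a sorts after b. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    int compare(FieldDocSketch a, FieldDocSketch b) {
        for (int i = 0; i < specs.length; i++) {
            SortSpec spec = specs[i];
            Comparable va = a.fields[i];
            Comparable vb = b.fields[i];
            int c;
            if (spec.kind == SortKind.SCORE) {
                c = vb.compareTo(va); // special case 1: higher scores sort first
            } else if (spec.collator != null) {
                c = spec.collator.compare(va, vb); // special case 2: locale-aware Strings
            } else {
                c = va.compareTo(vb); // everything else: natural order, no type switch
            }
            if (spec.reverse) {
                c = -c;
            }
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.doc, b.doc); // stable tie-break on doc id
    }

    /** True if a sorts after b, i.e. a is the "lesser" (worse) hit in the queue. */
    boolean lessThan(FieldDocSketch a, FieldDocSketch b) {
        return compare(a, b) > 0;
    }

    public static void main(String[] args) {
        ComparableLessThanSketch byScore =
                new ComparableLessThanSketch(new SortSpec(SortKind.SCORE, null, false));
        FieldDocSketch high = new FieldDocSketch(1, 2.5f);
        FieldDocSketch low = new FieldDocSketch(2, 0.5f);
        System.out.println(byScore.lessThan(low, high)); // true: the lower score is worse
    }
}
```

The point of the design is that the per-type switch collapses into the final else branch; only the two cases Uwe names need their own branch.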
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769287#action_12769287 ] Michael McCandless commented on LUCENE-1997:

Env:
JAVA:
java version "1.5.0_19"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_19-b02, mixed mode)
OS:
SunOS rhumba 5.11 snv_111b i86pc i386 i86pc Solaris

Results:
||Source||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|wiki|log|1|318481|title|10|98.47|104.60|{color:green}6.2%{color}|
|wiki|log|1|318481|title|25|97.90|103.63|{color:green}5.9%{color}|
|wiki|log|1|318481|title|50|105.12|101.50|{color:red}-3.4%{color}|
|wiki|log|1|318481|title|100|102.30|108.59|{color:green}6.1%{color}|
|wiki|log|1|318481|title|500|89.43|79.40|{color:red}-11.2%{color}|
|wiki|log|1|318481|title|1000|82.83|63.75|{color:red}-23.0%{color}|
|wiki|log||100|title|10|152.56|157.40|{color:green}3.2%{color}|
|wiki|log||100|title|25|151.95|148.52|{color:red}-2.3%{color}|
|wiki|log||100|title|50|148.52|142.90|{color:red}-3.8%{color}|
|wiki|log||100|title|100|127.70|138.72|{color:green}8.6%{color}|
|wiki|log||100|title|500|104.30|90.30|{color:red}-13.4%{color}|
|wiki|log||100|title|1000|99.10|66.05|{color:red}-33.4%{color}|
|random|log||100|rand string|10|153.13|157.74|{color:green}3.0%{color}|
|random|log||100|rand string|25|128.79|150.62|{color:green}17.0%{color}|
|random|log||100|rand string|50|122.46|153.95|{color:green}25.7%{color}|
|random|log||100|rand string|100|116.26|141.43|{color:green}21.6%{color}|
|random|log||100|rand string|500|98.24|96.17|{color:red}-2.1%{color}|
|random|log||100|rand string|1000|86.38|71.95|{color:red}-16.7%{color}|
|random|log||100|country|10|148.65|153.23|{color:green}3.1%{color}|
|random|log||100|country|25|148.52|152.69|{color:green}2.8%{color}|
|random|log||100|country|50|122.01|149.52|{color:green}22.5%{color}|
|random|log||100|country|100|120.39|145.99|{color:green}21.3%{color}|
|random|log||100|country|500|99.70|95.65|{color:red}-4.1%{color}|
|random|log||100|country|1000|90.18|69.46|{color:red}-23.0%{color}|
|random|log||100|rand int|10|150.85|171.22|{color:green}13.5%{color}|
|random|log||100|rand int|25|151.13|167.94|{color:green}11.1%{color}|
|random|log||100|rand int|50|152.51|162.23|{color:green}6.4%{color}|
|random|log||100|rand int|100|130.54|145.04|{color:green}11.1%{color}|
|random|log||100|rand int|500|108.38|43.74|{color:red}-59.6%{color}|
|random|log||100|rand int|1000|98.27|63.56|{color:red}-35.3%{color}|

> Explore performance of multi-PQ vs single-PQ sorting API > > > Key: LUCENE-1997 > URL: https://issues.apache.org/jira/browse/LUCENE-1997 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.9 >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, > LUCENE-1997.patch > > > Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev, > where a simpler (non-segment-based) comparator API is proposed that > gathers results into multiple PQs (one per segment) and then merges > them in the end. > I started from John's multi-PQ code and worked it into > contrib/benchmark so that we could run perf tests. Then I generified > the Python script I use for running search benchmarks (in > contrib/benchmark/sortBench.py). > The script first creates indexes with 1M docs (based on > SortableSingleDocSource, and based on wikipedia, if available). Then > it runs various combinations: > * Index with 20 balanced segments vs index with the "normal" log > segment size > * Queries with different numbers of hits (only for wikipedia index) > * Different top N > * Different sorts (by title, for wikipedia, and by random string, > random int, and country for the random index) > For each test, 7 search rounds are run and the best QPS is kept.
The > script runs singlePQ then multiPQ, and records the resulting best QPS > for each and produces table (in Jira format) as output.
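The "Pct change" column in the results table is the relative change between the two QPS columns, e.g. (104.60 - 98.47) / 98.47 ≈ 6.2%. The small helper below reproduces a few of those cells. It is a hypothetical illustration only (sortBench.py is a Python script whose code is not shown here); bestQps mirrors the "best QPS of 7 rounds is kept" rule from the issue description, and the green/red choice matches how the table colors gains and losses.

```java
import java.util.Locale;

/** Hypothetical helper reproducing the "Pct change" column from two QPS numbers. */
public class PctChange {

    /** Best QPS over several search rounds (the benchmark keeps the best of 7). */
    static double bestQps(double... roundQps) {
        double best = Double.NEGATIVE_INFINITY;
        for (double q : roundQps) {
            best = Math.max(best, q);
        }
        return best;
    }

    /** Relative change from "QPS old" to "QPS new", in percent. */
    static double pctChange(double oldQps, double newQps) {
        return (newQps - oldQps) / oldQps * 100.0;
    }

    /** Formats a Jira table cell: green for a gain, red for a loss. */
    static String jiraCell(double oldQps, double newQps) {
        double pct = pctChange(oldQps, newQps);
        String color = pct > 0 ? "green" : "red";
        return String.format(Locale.ROOT, "{color:%s}%.1f%%{color}", color, pct);
    }

    public static void main(String[] args) {
        System.out.println(jiraCell(98.47, 104.60)); // {color:green}6.2%{color}
        System.out.println(jiraCell(108.38, 43.74)); // {color:red}-59.6%{color}
    }
}
```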
[jira] Updated: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1997: --- Attachment: LUCENE-1997.patch New patch, fixes silly bug in sortBench.py. > Explore performance of multi-PQ vs single-PQ sorting API > > > Key: LUCENE-1997 > URL: https://issues.apache.org/jira/browse/LUCENE-1997 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.9 >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, > LUCENE-1997.patch > > > Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev, > where a simpler (non-segment-based) comparator API is proposed that > gathers results into multiple PQs (one per segment) and then merges > them in the end. > I started from John's multi-PQ code and worked it into > contrib/benchmark so that we could run perf tests. Then I generified > the Python script I use for running search benchmarks (in > contrib/benchmark/sortBench.py). > The script first creates indexes with 1M docs (based on > SortableSingleDocSource, and based on wikipedia, if available). Then > it runs various combinations: > * Index with 20 balanced segments vs index with the "normal" log > segment size > * Queries with different numbers of hits (only for wikipedia index) > * Different top N > * Different sorts (by title, for wikipedia, and by random string, > random int, and country for the random index) > For each test, 7 search rounds are run and the best QPS is kept. The > script runs singlePQ then multiPQ, and records the resulting best QPS > for each and produces table (in Jira format) as output.
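The issue description summarizes the multi-PQ approach as gathering results into one PQ per segment and merging them at the end. Below is a minimal sketch of that idea for an ascending int sort, built on java.util.PriorityQueue. The class and method names are hypothetical; this is not John's patch or the contrib/benchmark code. Each segment keeps its own bounded queue of size topN, so the merge step sees at most numSegments × topN candidates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Sketch of the multi-PQ idea: one bounded queue per segment, merged at the end.
 * Hit, collectSegment and search are hypothetical names, not Lucene or patch code.
 */
public class MultiPQSketch {

    static final class Hit {
        final int doc;   // global doc id (docBase + segment-local id)
        final int value; // sort key; this sketch sorts ascending

        Hit(int doc, int value) {
            this.doc = doc;
            this.value = value;
        }
    }

    /** Ascending by value; ties broken by lower doc id. */
    static final Comparator<Hit> ORDER =
            Comparator.comparingInt((Hit h) -> h.value).thenComparingInt(h -> h.doc);

    /** Keeps the topN (must be > 0) best hits of one segment; the queue head is the worst. */
    static List<Hit> collectSegment(int[] segmentValues, int docBase, int topN) {
        PriorityQueue<Hit> pq = new PriorityQueue<>(topN, ORDER.reversed());
        for (int i = 0; i < segmentValues.length; i++) {
            Hit hit = new Hit(docBase + i, segmentValues[i]);
            if (pq.size() < topN) {
                pq.add(hit);
            } else if (ORDER.compare(hit, pq.peek()) < 0) {
                pq.poll(); // evict the current worst hit to make room
                pq.add(hit);
            }
        }
        return new ArrayList<>(pq);
    }

    /** Collects each segment independently, then merges the per-segment winners. */
    static List<Hit> search(int[][] segments, int topN) {
        List<Hit> merged = new ArrayList<>();
        if (topN <= 0) {
            return merged;
        }
        int docBase = 0;
        for (int[] segment : segments) {
            merged.addAll(collectSegment(segment, docBase, topN));
            docBase += segment.length;
        }
        merged.sort(ORDER);
        return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
    }
}
```

Both the per-segment collection and the final merge grow with topN, which is worth keeping in mind when reading how the numbers change across the Top N column.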
[jira] Updated: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1997: --- Attachment: LUCENE-1997.patch New patch attached:
* Made some basic code-level optimizations, e.g. created an explicit DocIDPriorityQueue (which deals in int, not Object, to avoid casting) and subclassed it directly as a SortByStringQueue and a SortByIntQueue. It turns out the if statement (when comparing int values) must stay, because the subtraction can overflow int.
* Added "sortBench.py -verify", which quickly runs each API across all tests and confirms the results are identical (a proxy for real unit tests).
* Added "Source" (wiki or random) to the Jira table output.
* Print the java/os version at start.
I'll re-run my test.
> Explore performance of multi-PQ vs single-PQ sorting API > > > Key: LUCENE-1997 > URL: https://issues.apache.org/jira/browse/LUCENE-1997 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.9 >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch > > > Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev, > where a simpler (non-segment-based) comparator API is proposed that > gathers results into multiple PQs (one per segment) and then merges > them in the end. > I started from John's multi-PQ code and worked it into > contrib/benchmark so that we could run perf tests. Then I generified > the Python script I use for running search benchmarks (in > contrib/benchmark/sortBench.py). > The script first creates indexes with 1M docs (based on > SortableSingleDocSource, and based on wikipedia, if available).
Then > it runs various combinations: > * Index with 20 balanced segments vs index with the "normal" log > segment size > * Queries with different numbers of hits (only for wikipedia index) > * Different top N > * Different sorts (by title, for wikipedia, and by random string, > random int, and country for the random index) > For each test, 7 search rounds are run and the best QPS is kept. The > script runs singlePQ then multiPQ, and records the resulting best QPS > for each and produces table (in Jira format) as output.
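The patch notes above mention an int-based DocIDPriorityQueue and explain why the if statement must stay when comparing int values. The sketch below illustrates both points under stated assumptions: the heap is 1-based with the worst hit on top, in the style of Lucene's own PriorityQueue, but DocIDQueue, SortByIntQueue and subtractCompare are hypothetical names, not the code in the patch. With a subtraction-based compare, Integer.MAX_VALUE - (-1) wraps around to Integer.MIN_VALUE, so the larger value would wrongly compare as smaller.

```java
/**
 * Hypothetical int-based doc id queue in the spirit of DocIDPriorityQueue /
 * SortByIntQueue (not the patch's code). The heap stores raw int doc ids, so no casts.
 */
public class IntDocQueueSketch {

    /** Bounded binary heap of doc ids; the top is the "least" (worst) doc. */
    abstract static class DocIDQueue {
        private final int[] heap; // 1-based: heap[1] is the top
        private int size;

        DocIDQueue(int maxSize) {
            heap = new int[maxSize + 1];
        }

        /** True if doc a sorts after doc b, i.e. a is the worse hit. */
        protected abstract boolean lessThan(int a, int b);

        int size() { return size; }

        /** Adds doc while not full; afterwards replaces the top only if doc beats it. */
        void insert(int doc) {
            if (size < heap.length - 1) {
                heap[++size] = doc;
                upHeap(size);
            } else if (size > 0 && lessThan(heap[1], doc)) {
                heap[1] = doc;
                downHeap(1);
            }
        }

        /** Removes and returns the worst doc; the queue must not be empty. */
        int pop() {
            int result = heap[1];
            heap[1] = heap[size--];
            if (size > 0) {
                downHeap(1);
            }
            return result;
        }

        private void upHeap(int i) {
            int node = heap[i];
            while (i > 1 && lessThan(node, heap[i >>> 1])) {
                heap[i] = heap[i >>> 1];
                i >>>= 1;
            }
            heap[i] = node;
        }

        private void downHeap(int i) {
            int node = heap[i];
            while (2 * i <= size) {
                int child = 2 * i;
                if (child < size && lessThan(heap[child + 1], heap[child])) {
                    child++; // pick the worse of the two children
                }
                if (!lessThan(heap[child], node)) {
                    break;
                }
                heap[i] = heap[child];
                i = child;
            }
            heap[i] = node;
        }
    }

    /** Ascending sort on a per-doc int key. */
    static final class SortByIntQueue extends DocIDQueue {
        private final int[] values; // values[doc] is the doc's sort key

        SortByIntQueue(int maxSize, int[] values) {
            super(maxSize);
            this.values = values;
        }

        @Override
        protected boolean lessThan(int a, int b) {
            int va = values[a];
            int vb = values[b];
            // Explicit comparison instead of "va - vb > 0", which can overflow.
            if (va != vb) {
                return va > vb;
            }
            return a > b; // tie: the higher doc id is worse
        }
    }

    /** Subtraction-based compare: wrong whenever a - b overflows int. */
    static int subtractCompare(int a, int b) {
        return a - b;
    }
}
```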
[jira] Updated: (LUCENE-1257) Port to Java5
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Kay updated LUCENE-1257: Attachment: LUCENE-1257_contrib_benchmark_2.patch > Port to Java5 > - > > Key: LUCENE-1257 > URL: https://issues.apache.org/jira/browse/LUCENE-1257 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis, Examples, Index, Other, Query/Scoring, > QueryParser, Search, Store, Term Vectors >Affects Versions: 3.0 >Reporter: Cédric Champeau >Assignee: Uwe Schindler >Priority: Minor > Fix For: 3.0 > > Attachments: instantiated_fieldable.patch, > LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, > LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, > LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, > LUCENE-1257-CompoundFileReaderWriter.patch, > LUCENE-1257-ConcurrentMergeScheduler.patch, > LUCENE-1257-DirectoryReader.patch, > LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, > LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, > LUCENE-1257-FieldCacheImpl.patch, LUCENE-1257-FieldCacheRangeFilter.patch, > LUCENE-1257-IndexDeleter.patch, > LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, > LUCENE-1257-MTQWF.patch, LUCENE-1257-NormalizeCharMap.patch, > LUCENE-1257-o.a.l.util.patch, LUCENE-1257-org_apache_lucene_document.patch, > LUCENE-1257-org_apache_lucene_document.patch, > LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, > LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, > LUCENE-1257-StringBuffer.patch, LUCENE-1257-TopDocsCollector.patch, > LUCENE-1257-WordListLoader.patch, LUCENE-1257_analysis.patch, > LUCENE-1257_BooleanFilter_Generics.patch, > LUCENE-1257_contrib_benchmark.patch, LUCENE-1257_contrib_benchmark_2.patch, > LUCENE-1257_contrib_highlighting.patch, LUCENE-1257_javacc_upgrade.patch, > LUCENE-1257_messages.patch, LUCENE-1257_more_unnecessary_casts.patch, > 
LUCENE-1257_MultiFieldQueryParser.patch, LUCENE-1257_o.a.l.queryParser.patch, > LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_index_test.patch, > LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_search.patch, > LUCENE-1257_o_a_l_search_spans.patch, > LUCENE-1257_org_apache_lucene_index.patch, > LUCENE-1257_org_apache_lucene_index.patch, LUCENE-1257_queryParser_jj.patch, > LUCENE-1257_unnecessary_casts.patch, LUCENE-1257_unnnecessary_casts_2.patch, > lucene1257surround1.patch, lucene1257surround1.patch, > shinglematrixfilter_generified.patch > > > For my needs I've updated Lucene so that it uses Java 5 constructs. I know > Java 5 migration was planned for 2.1 at some point in the past, but I don't know > when it is planned now. This patch against the trunk includes: > - most obvious generics usage (there are tons of usages of sets, ... Those > which are commonly used have been generified) > - PriorityQueue generification > - replacement of indexed for loops with for-each constructs > - removal of unnecessary unboxing > In my opinion the code is much more readable with those features (you > actually *know* what is stored in collections reading the code, without the > need to look up field definitions every time) and it simplifies many > algorithms. > Note that this patch also includes an interface for the Query class. This has > been done for my company's needs for building custom Query classes which add > some behaviour to the base Lucene queries. It prevents multiple unnecessary > casts. I know this introduction is not wanted by the team, but it really > makes our code easier to maintain. If you don't want to use this, > replace all /Queriable/ calls with standard /Query/.
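The issue lists generics, for-each loops and removal of unnecessary unboxing among the Java 5 changes. As a small hypothetical illustration (not code from any of the attached patches), here is the same loop in pre-Java 5 and Java 5 style:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical before/after illustrating the kinds of Java 5 changes listed above
 * (generics, for-each loops, no explicit casts or intValue() calls).
 */
public class Java5PortSketch {

    /** Pre-Java 5 style: raw List, explicit cast, indexed loop, manual unboxing. */
    @SuppressWarnings("rawtypes")
    static int sumLegacy(List values) {
        int sum = 0;
        for (int i = 0; i < values.size(); i++) {
            sum += ((Integer) values.get(i)).intValue();
        }
        return sum;
    }

    /** Java 5 style: the element type is in the signature, so no cast is needed. */
    static int sum(List<Integer> values) {
        int sum = 0;
        for (int v : values) { // for-each plus auto-unboxing
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<Integer>();
        values.add(1);
        values.add(2);
        values.add(3);
        System.out.println(sumLegacy(values) + " " + sum(values)); // 6 6
    }
}
```

The generic signature is what gives the readability win described in the issue: the reader knows the element type without looking up the field definition.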
Re: 2.9 and IndexReader deletions
Yes, please do move to the Back Compat section; I think it really does belong there. Mike On Fri, Oct 23, 2009 at 11:28 AM, Grant Ingersoll wrote: > I'm going through and updating my Lucene Boot Camp training for 2.9. In it, > I have some code that shows the various ways you can do deletes. > > In 2.4, the code worked fine, in 2.9 it now fails. Here's the code: > public void testDeletions() throws Exception { > log.info("testDeletions()-"); > //we should have indexed already > IndexReader reader = IndexReader.open(directory, false); > assertTrue("reader is null and it shouldn't be", reader != null); > Document doc = reader.document(0); > assertTrue("doc is null and it shouldn't be", doc != null); > reader.deleteDocument(0); > assertTrue(reader.isDeleted(0) + " does not equal: " + true, > reader.isDeleted(0) == true); > try { > doc = reader.document(0); //verify the document is not retrievable > assertTrue(doc.toString(), false); > } catch (IllegalArgumentException e) { > > } > > log.info("end testDeletions()-"); > } > > This is, of course, due to LUCENE-1708: > > * LUCENE-1708 - IndexReader.document() no longer checks if the document is > deleted. You can call IndexReader.isDeleted(n) prior to calling > document(n). > (Shai Erera via Mike McCandless) > > However, it seems like this is a break in back-compat as it requires people > to actively change their code. Granted, my test case is admittedly > contrived, but I wonder if other people come across this. I understand that > I'm not going to find (search) a deleted document so in most cases it's not > a big deal, but plenty of people use Lucene as a document store, too, and > may access documents directly such that their code may now well be broken. > > Anyone against me moving the issue from the Runtime changes section to the > Back Compat section? 
> > -Grant
2.9 and IndexReader deletions
I'm going through and updating my Lucene Boot Camp training for 2.9. In it, I have some code that shows the various ways you can do deletes.

In 2.4 the code worked fine; in 2.9 it now fails. Here's the code:

public void testDeletions() throws Exception {
  log.info("testDeletions()-");
  //we should have indexed already
  IndexReader reader = IndexReader.open(directory, false);
  assertTrue("reader is null and it shouldn't be", reader != null);
  Document doc = reader.document(0);
  assertTrue("doc is null and it shouldn't be", doc != null);
  reader.deleteDocument(0);
  assertTrue(reader.isDeleted(0) + " does not equal: " + true, reader.isDeleted(0) == true);
  try {
    doc = reader.document(0); //verify the document is not retrievable
    assertTrue(doc.toString(), false);
  } catch (IllegalArgumentException e) {

  }

  log.info("end testDeletions()-");
}

This is, of course, due to LUCENE-1708:

* LUCENE-1708 - IndexReader.document() no longer checks if the document is deleted. You can call IndexReader.isDeleted(n) prior to calling document(n). (Shai Erera via Mike McCandless)

However, it seems like this is a break in back-compat, as it requires people to actively change their code. Granted, my test case is admittedly contrived, but I wonder if other people will run into this. I understand that I'm not going to find (search) a deleted document, so in most cases it's not a big deal, but plenty of people use Lucene as a document store, too, and may access documents directly, so their code may well be broken now.

Anyone against me moving the issue from the Runtime changes section to the Back Compat section?

-Grant
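With LUCENE-1708, code that relied on document(n) rejecting a deleted doc has to check isDeleted(n) itself. Below is a minimal sketch of that guard. To keep it runnable without Lucene, DocReader and InMemoryReader are hypothetical stand-ins and documents are plain Strings rather than Document objects; only the method names maxDoc, isDeleted, document and deleteDocument mirror the IndexReader calls used in the test above. The toy document(n) simply returns deleted docs, which is an assumption made for the sketch, not a specification of the 2.9 behavior.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Sketch of the LUCENE-1708 workaround: check isDeleted(n) before document(n).
 * DocReader and InMemoryReader are hypothetical stand-ins, not Lucene classes.
 */
public class DeletedDocGuardSketch {

    interface DocReader {
        int maxDoc();
        boolean isDeleted(int n);
        String document(int n); // the real IndexReader.document(int) returns a Document
    }

    /** Returns the stored document, or null if it has been deleted. */
    static String documentIfLive(DocReader reader, int n) {
        return reader.isDeleted(n) ? null : reader.document(n);
    }

    /** Walks all doc ids and skips deleted ones, as a "document store" caller must. */
    static List<String> liveDocuments(DocReader reader) {
        List<String> docs = new ArrayList<String>();
        for (int n = 0; n < reader.maxDoc(); n++) {
            if (!reader.isDeleted(n)) {
                docs.add(reader.document(n));
            }
        }
        return docs;
    }

    /** Toy reader whose document(n) performs no deletion check. */
    static final class InMemoryReader implements DocReader {
        private final String[] docs;
        private final BitSet deleted = new BitSet();

        InMemoryReader(String... docs) {
            this.docs = docs;
        }

        void deleteDocument(int n) {
            deleted.set(n);
        }

        public int maxDoc() { return docs.length; }

        public boolean isDeleted(int n) { return deleted.get(n); }

        public String document(int n) { return docs[n]; } // no deletion check
    }
}
```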
Re: lucene 2.9 sorting algorithm
Agreed: so far I'm seeing serious performance loss with MultiPQ, especially as topN gets larger, and for int sorting. For a small queue with String sort, it sometimes wins. So if I were forced to decide now based on the current results, I think we should keep the single PQ API. But: I am right now optimizing John's patch to see how fast Multi PQ can get. I'll post it once I get it working, and post output from re-running on my opensolaris box. Mike 2009/10/23 Mark Miller : >>>I still think we should if performance is no >>>better with the new one. > > Where is there any indication performance is not better with the new one? > > The benchmarks are clearly against switching back. At best they could argue > for two APIs - even then it depends - a loss of 10% on Java 1.5 > with the most recent linux for a topn:10 ? I'm all for more results, but it's > not looking like a good switch to me. What API do I use? Well, it depends - > how many docs will you ask for back, what OS are you running, how hard is it for > you to grok one API over the other? > > And then as we make changes in the future we have to manage both APIs. > > bq. digging in deep and running thorough perf tests makes sense > > Again - no one is arguing against - dig all year - I'll help - but I don't > see the treasure yet, and the hole is starting to look deep. > > bq. removing that if from the Multi PQ patch makes sense > > I didn't have a problem with that either - or other code changes - but > jeeze, mention what you are seeing with the switch. I'll tell you what I > saw - not that much - a bit of improvement, but take a look at the > Java 1.5 run - it ended up being a blade of grass holding up a boulder > on Linux. > > > > Michael McCandless wrote: >> Sheesh I go to bed and so much all of a sudden happens!!
>> >> Sorry Mark; I should've called out "PATCH IS ON 2.9 BRANCH" more >> clearly ;) >> >> There's no question in my mind that the new comparator API is more >> complex than the old one, and I really don't like that. I had to >> rewrite the section of LIA that gives an example of a [simple] custom >> sort and it wasn't pleasant! Two compare methods (compare, >> compareBottom)? Two copy methods (copy, setBottom)? Sure, you can >> grok it and get through it if you have to, but it is more complex >> because it's conflated with the PQ API. >> >> Ease of consumption of our APIs is very important, so only when >> performance clearly warrants it should we adopt a more complex API. >> >> Also, yeah, it would suck to have to switch back to the old API at >> this point, but net/net I still think we should if performance is no >> better with the new one. >> >> The old API also fits cleanly with per-segment searching (John's >> initial patch shows that -- it's simply another per-segment Collector). >> The two APIs (collection, comparator) are well decoupled. >> >> So, digging in deep and running thorough perf tests makes sense; we >> need to understand the performance to make the API switch decision. >> And definitely we should tune both approaches as much as possible >> (removing that if from the Multi PQ patch makes sense). >> >> But... Multi PQ's performance isn't better in many cases... though, >> we're clearly still iterating. I'll run a 1.5 (32 & 64 bit) test, >> with the if statement removed. >> >> Mike >> >> On Fri, Oct 23, 2009 at 3:53 AM, Earwin Burrfoot wrote: >> >>> I did. >>> >>> On Fri, Oct 23, 2009 at 09:05, Jake Mannix wrote: >>> On Thu, Oct 22, 2009 at 9:58 PM, Mark Miller wrote: > Yes - I've seen a handful of non-core devs report back that they > upgraded with no complaints about the difficulty. It's in the mailing list > archives. The only core dev I've seen say it's easy is Uwe.
He's super > sharp though, so I wasn't banking my comment on him ;) > Upgrade custom sorting? Where has anyone talked about this? 2.9 is great, I like the new apis, they're great in general. It's just this multi-segment sorting we're talking about here. -jake >>> >>> -- >>> Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) >>> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423 >>> ICQ: 104465785 >>> >>> - >>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: java-dev-h...@lucene.apache.org >>> >>> >>> >> >> - >> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-dev-h...@lucene.apache.org >> >> > > > -- > - Mark > > http://www.lucidimagination.com > > > > > - > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org > > --
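Mike's complaint that the new comparator API is "conflated with the PQ API" is easier to see side by side. Below is a deliberately simplified, hypothetical pair of ascending int comparators: the first compares two documents directly, roughly in the spirit of the old API, and the second uses the method names the thread mentions (compare, compareBottom, copy, setBottom) in a slot-based form. The signatures and the setNextSegment method are assumptions made for a self-contained sketch; they are not Lucene's FieldComparator, whose real methods take readers and may throw IOException.

```java
/**
 * Simplified, hypothetical contrast between the two comparator styles discussed above.
 * Neither class is Lucene code.
 */
public class ComparatorStylesSketch {

    /** Old style: compare two hits directly using index-wide values. */
    static final class DirectIntComparator {
        private final int[] values; // values[doc] for the whole index

        DirectIntComparator(int[] values) {
            this.values = values;
        }

        int compare(int docA, int docB) {
            return Integer.compare(values[docA], values[docB]);
        }
    }

    /** Slot style: values live in queue slots, and the comparator tracks the queue's bottom. */
    static final class SlotIntComparator {
        private final int[] slotValues; // one entry per queue slot
        private int[] segmentValues;    // values[doc] for the current segment only
        private int bottom;             // value of the current worst entry in the queue

        SlotIntComparator(int numSlots) {
            slotValues = new int[numSlots];
        }

        void setNextSegment(int[] values) {
            segmentValues = values;
        }

        /** Compare two entries already in the queue. */
        int compare(int slot1, int slot2) {
            return Integer.compare(slotValues[slot1], slotValues[slot2]);
        }

        /** Compare the queue's bottom with a candidate doc; negative means not competitive. */
        int compareBottom(int doc) {
            return Integer.compare(bottom, segmentValues[doc]);
        }

        /** Copy a doc's value into a slot when it enters the queue. */
        void copy(int slot, int doc) {
            slotValues[slot] = segmentValues[doc];
        }

        /** Remember which slot is now the bottom (worst) entry. */
        void setBottom(int slot) {
            bottom = slotValues[slot];
        }
    }
}
```

The second class has to know about queue slots and the bottom entry, which is the coupling to the priority queue that makes a custom sort harder to write.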
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769235#action_12769235 ] Uwe Schindler commented on LUCENE-2002: --- I know about this problem on trunk. But the first step is to merge the changes in and then remove the deprecated parts again. Then the tests need fixing, and there may be many, as QueryParser is used almost everywhere. Maybe we can split it into contrib and core. > Add oal.util.Version ctor to QueryParser > > > Key: LUCENE-2002 > URL: https://issues.apache.org/jira/browse/LUCENE-2002 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 2.9, 3.0 >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 2.9.1 > > Attachments: LUCENE-2002-29.patch, LUCENE-2002-29.patch, > LUCENE-2002-29.patch > > > This is a follow-up of LUCENE-1987: > If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses > QueryParser, phrase queries will not work, because the StopFilter enables > position increments for stop words, but QueryParser ignores them by default. > The user has to explicitly enable them. > This issue would add a ctor that takes the Version constant and automatically > enables this setting. The same applies to the contrib queryparser. Possibly > StopAnalyzer should also get this Version ctor. > To be able to remove the default ctor for 3.0 (to remove a possible trap for > users of QueryParser), it must be deprecated and the new one also added to > 2.9.1.
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769232#action_12769232 ] Michael McCandless commented on LUCENE-2002: bq. I am happy to then use the merge operations in SVN to apply this in trunk! Are you offering to port this to trunk? That'd be nice :) I wasn't looking forward to that part! Note that on trunk it'll be fairly different, since we'll remove (not deprecate) the old methods, and we have to go fix all usages of the deprecated APIs (e.g. in tests, contrib), which I haven't done for 2.9. > Add oal.util.Version ctor to QueryParser > > > Key: LUCENE-2002 > URL: https://issues.apache.org/jira/browse/LUCENE-2002 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 2.9, 3.0 >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 2.9.1 > > Attachments: LUCENE-2002-29.patch, LUCENE-2002-29.patch, > LUCENE-2002-29.patch > > > This is a follow-up of LUCENE-1987: > If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses > QueryParser, phrase queries will not work, because the StopFilter enables > position increments for stop words, but QueryParser ignores them by default. > The user has to explicitly enable them. > This issue would add a ctor that takes the Version constant and automatically > enables this setting. The same applies to the contrib queryparser. Possibly > StopAnalyzer should also get this Version ctor. > To be able to remove the default ctor for 3.0 (to remove a possible trap for > users of QueryParser), it must be deprecated and the new one also added to > 2.9.1.
[jira] Updated: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2002: --- Attachment: LUCENE-2002-29.patch New patch, adding Version to StopAnalyzer as well. Thanks for reviewing -- I'll commit in a bit! > Add oal.util.Version ctor to QueryParser > > > Key: LUCENE-2002 > URL: https://issues.apache.org/jira/browse/LUCENE-2002 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 2.9, 3.0 >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 2.9.1 > > Attachments: LUCENE-2002-29.patch, LUCENE-2002-29.patch, > LUCENE-2002-29.patch > > > This is a follow-up of LUCENE-1987: > If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses > QueryParser, phrase queries will not work, because the StopFilter enables > position increments for stop words, but QueryParser ignores them by default. > The user has to explicitly enable them. > This issue would add a ctor that takes the Version constant and automatically > enables this setting. The same applies to the contrib queryparser. Possibly > StopAnalyzer should also get this Version ctor. > To be able to remove the default ctor for 3.0 (to remove a possible trap for > users of QueryParser), it must be deprecated and the new one also added to > 2.9.1.
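The LUCENE-2002 description is easier to follow with concrete positions. In this hypothetical sketch (not Lucene code), removing the stop word "the" from "foo the bar" leaves "bar" at position 2 when position increments are enabled, but at position 1 when they are ignored, so a phrase query built with the latter positions cannot match text indexed with the former. The Version enum is a stand-in, and the assumption that 2.9 is the first match version turning increments on by default is a labeled guess for the sketch, not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the LUCENE-2002 problem and fix.
 * Version and both helpers are illustrative stand-ins, not Lucene's classes.
 */
public class VersionDefaultsSketch {

    /** A few release constants, declared in release order (assumption of this sketch). */
    enum Version { LUCENE_24, LUCENE_29, LUCENE_CURRENT }

    /** Token positions after stop-word removal, with or without position increments. */
    static List<Integer> positions(List<String> tokens, Set<String> stopWords,
                                   boolean enablePositionIncrements) {
        List<Integer> result = new ArrayList<Integer>();
        int pos = -1;
        int skipped = 0;
        for (String token : tokens) {
            if (stopWords.contains(token)) {
                skipped++; // the stop word is dropped; remember the hole it leaves
                continue;
            }
            pos += enablePositionIncrements ? 1 + skipped : 1;
            skipped = 0;
            result.add(pos);
        }
        return result;
    }

    /** Version-dependent default; the LUCENE_29 threshold is an assumption. */
    static boolean defaultEnablePositionIncrements(Version matchVersion) {
        return matchVersion.compareTo(Version.LUCENE_29) >= 0;
    }

    public static void main(String[] args) {
        List<String> text = Arrays.asList("foo", "the", "bar");
        Set<String> stop = new HashSet<String>(Arrays.asList("the"));
        System.out.println(positions(text, stop, true));  // [0, 2]
        System.out.println(positions(text, stop, false)); // [0, 1]
    }
}
```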
Re: lucene 2.9 sorting algorithm
Mark Miller wrote: > bq. removing that if from the Multi PQ patch makes sense > > I didn't have a problem with that either - or other code changes - but > jeeze, mention what you are seeing with the switch. I'll tell you what I > saw - not that much - a bit of improvement, but take a look at the > Java 1.5 run - it ended up being a blade of grass holding up a boulder > on Linux. > In fact, looking closer at my runs - for me it didn't really help at all - I had noticed a difference of a couple to a few percent here and there - but now looking closer, it goes both ways on different ones - some slightly better, some slightly worse - making me think that, at least for my setup, it just doesn't matter. -- - Mark http://www.lucidimagination.com
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769227#action_12769227 ] Uwe Schindler commented on LUCENE-1997: --- bq. it creates Comparable objects that can be compared by the native compareTos. (the old API did the same thing) OK, understood. I will try to fix the generics somehow so that the SuppressWarnings can be removed. > Explore performance of multi-PQ vs single-PQ sorting API > > > Key: LUCENE-1997 > URL: https://issues.apache.org/jira/browse/LUCENE-1997 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Affects Versions: 2.9 >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: LUCENE-1997.patch, LUCENE-1997.patch > > > Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev, > where a simpler (non-segment-based) comparator API is proposed that > gathers results into multiple PQs (one per segment) and then merges > them in the end. > I started from John's multi-PQ code and worked it into > contrib/benchmark so that we could run perf tests. Then I generified > the Python script I use for running search benchmarks (in > contrib/benchmark/sortBench.py). > The script first creates indexes with 1M docs (based on > SortableSingleDocSource, and based on wikipedia, if available). Then > it runs various combinations: > * Index with 20 balanced segments vs index with the "normal" log > segment size > * Queries with different numbers of hits (only for wikipedia index) > * Different top N > * Different sorts (by title, for wikipedia, and by random string, > random int, and country for the random index) > For each test, 7 search rounds are run and the best QPS is kept. The > script runs singlePQ then multiPQ, and records the resulting best QPS > for each and produces table (in Jira format) as output.
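To make the two strategies under test concrete, here is a minimal, self-contained sketch of both in plain Java: top-N hits sorted by an int value, ties broken by global doc id. `MultiPqSketch`, `Hit` and the int[]-per-segment layout are hypothetical illustration choices; this is not John's multi-PQ code or Lucene's collector code. It also ignores what makes single-PQ hard in Lucene (values from different segments must be comparable inside one queue); here every value is a plain int, so both versions return identical results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MultiPqSketch {

    /** A hit: a global doc id plus the int value it is sorted by. */
    static final class Hit {
        final int doc;
        final int value;

        Hit(int doc, int value) {
            this.doc = doc;
            this.value = value;
        }
    }

    /** Sort order: ascending value, ties broken by ascending global doc id. */
    static final Comparator<Hit> ORDER =
            Comparator.<Hit>comparingInt(h -> h.value).thenComparingInt(h -> h.doc);

    /** Bounded queue whose head is the worst hit kept so far. */
    static PriorityQueue<Hit> newQueue() {
        return new PriorityQueue<>(ORDER.reversed());
    }

    /** Adds a hit, then evicts the worst one if the queue grew past n entries. */
    static void offer(PriorityQueue<Hit> queue, Hit hit, int n) {
        queue.add(hit);
        if (queue.size() > n) {
            queue.poll();
        }
    }

    /** Sorts the surviving hits and returns the doc ids of the best n. */
    static int[] topDocs(List<Hit> hits, int n) {
        hits.sort(ORDER);
        return hits.stream().limit(n).mapToInt(h -> h.doc).toArray();
    }

    /** Single-PQ: one queue shared by all segments; doc ids are rebased by docBase. */
    static int[] singlePq(int[][] segments, int n) {
        PriorityQueue<Hit> queue = newQueue();
        int docBase = 0;
        for (int[] segment : segments) {
            for (int i = 0; i < segment.length; i++) {
                offer(queue, new Hit(docBase + i, segment[i]), n);
            }
            docBase += segment.length;
        }
        return topDocs(new ArrayList<>(queue), n);
    }

    /** Multi-PQ: a fresh queue per segment, then merge the per-segment top-n lists. */
    static int[] multiPq(int[][] segments, int n) {
        List<Hit> candidates = new ArrayList<>();
        int docBase = 0;
        for (int[] segment : segments) {
            PriorityQueue<Hit> queue = newQueue();
            for (int i = 0; i < segment.length; i++) {
                offer(queue, new Hit(docBase + i, segment[i]), n);
            }
            candidates.addAll(queue); // at most n candidates from this segment
            docBase += segment.length;
        }
        return topDocs(candidates, n);
    }

    public static void main(String[] args) {
        int[][] segments = { {5, 1, 9}, {3, 7}, {2, 8, 0} };
        System.out.println(Arrays.toString(singlePq(segments, 3))); // [7, 1, 5]
        System.out.println(Arrays.toString(multiPq(segments, 3)));  // [7, 1, 5]
    }
}
```

Each segment contributes at most N candidates, so the final merge in `multiPq` sorts at most (number of segments × N) hits; how that trade-off plays out in practice is what the benchmark script measures.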
[jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769222#action_12769222 ] Mark Miller commented on LUCENE-1997: -
bq. As most servers are running 64 bit,
Aren't we at the tipping point where even non-servers are 64-bit now? My consumer desktops and laptops have been 64-bit for years.
[jira] Issue Comment Edited: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API
[ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769221#action_12769221 ] Mark Miller edited comment on LUCENE-1997 at 10/23/09 1:31 PM: ---
bq. but how does this fit together.
That's what Comparable FieldComparator#value is for: fillFields grabs all those values and loads them into the FieldDoc fields, so the custom FieldComparator is tied into it. It creates Comparable objects that can be compared by their native compareTo methods. (The old API did the same thing.)
{code}
/**
 * Given a queue Entry, creates a corresponding FieldDoc
 * that contains the values used to sort the given document.
 * These values are not the raw values out of the index, but the internal
 * representation of them. This is so the given search hit can be collated by
 * a MultiSearcher with other search hits.
 *
 * @param entry The Entry used to create a FieldDoc
 * @return The newly created FieldDoc
 * @see Searchable#search(Weight,Filter,int,Sort)
 */
FieldDoc fillFields(final Entry entry) {
  final int n = comparators.length;
  final Comparable[] fields = new Comparable[n];
  for (int i = 0; i < n; ++i) {
    fields[i] = comparators[i].value(entry.slot);
  }
  //if (maxscore > 1.0f) doc.score /= maxscore; // normalize scores
  return new FieldDoc(entry.docID, entry.score, fields);
}
{code}
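Because fillFields stores each comparator's per-slot value in a Comparable[] on the FieldDoc, hits from several searchers can be merged by comparing those arrays position by position. A minimal sketch of that collation step, assuming plain Java, a hypothetical `SortedHit` class standing in for FieldDoc, ascending sorts only, and doc ids already rebased to global ids; it is not Lucene's own merge code. The raw `Comparable` cast is the kind of spot that needs a `@SuppressWarnings` when values of different types share one array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FieldDocCollateSketch {

    /** Hypothetical stand-in for FieldDoc: a global doc id plus its sort values. */
    static final class SortedHit {
        final int doc;
        final Comparable<?>[] fields;

        SortedHit(int doc, Comparable<?>... fields) {
            this.doc = doc;
            this.fields = fields;
        }
    }

    /** Compares two hits sort field by sort field, using each value's own compareTo. */
    static final Comparator<SortedHit> BY_FIELDS = (x, y) -> {
        for (int i = 0; i < x.fields.length; i++) {
            // Values at the same index come from the same sort field and share a type,
            // but the array element type is Comparable<?>, so a raw cast is needed.
            @SuppressWarnings({"unchecked", "rawtypes"})
            int c = ((Comparable) x.fields[i]).compareTo(y.fields[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(x.doc, y.doc); // tie-break on global doc id
    };

    /** Merges the hits returned by several searchers and keeps the best n. */
    @SafeVarargs
    static List<SortedHit> collate(int n, List<SortedHit>... perSearcher) {
        List<SortedHit> all = new ArrayList<>();
        for (List<SortedHit> hits : perSearcher) {
            all.addAll(hits);
        }
        all.sort(BY_FIELDS);
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    static int[] docs(List<SortedHit> hits) {
        return hits.stream().mapToInt(h -> h.doc).toArray();
    }

    public static void main(String[] args) {
        // Sort by title (String), then by an int field.
        List<SortedHit> searcherA = Arrays.asList(
                new SortedHit(0, "apple", 3), new SortedHit(2, "pear", 1));
        List<SortedHit> searcherB = Arrays.asList(
                new SortedHit(10, "banana", 7), new SortedHit(11, "apple", 1));
        System.out.println(Arrays.toString(docs(collate(3, searcherA, searcherB)))); // [11, 0, 10]
    }
}
```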
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769218#action_12769218 ] Uwe Schindler commented on LUCENE-2002: --- I am happy to then use SVN's merge operations to apply this to trunk! +1 from my side!
> Add oal.util.Version ctor to QueryParser > > > Key: LUCENE-2002 > URL: https://issues.apache.org/jira/browse/LUCENE-2002 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 2.9, 3.0 >Reporter: Uwe Schindler >Assignee: Michael McCandless > Fix For: 2.9.1 > > Attachments: LUCENE-2002-29.patch, LUCENE-2002-29.patch > > > This is a followup of LUCENE-1987: > If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses > QueryParser, phrase queries will not work, because the StopFilter enables > position increments for stop words, but QueryParser ignores them by default. > The user has to explicitly enable them. > This issue would add a ctor taking the Version constant and automatically > enable this setting. The same applies to the contrib queryparser. Possibly > StopAnalyzer should also get this version ctor. > To be able to remove the default ctor for 3.0 (to remove a possible trap for > users of QueryParser), it must be deprecated and the new one also added to > 2.9.1.
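The trap described in this issue can be reproduced without Lucene. The sketch below assigns positions the way a stop filter would with and without position increments, then checks whether a phrase lines up. `MatchVersion`, `SketchQueryParser` and the stop-word list are hypothetical stand-ins for `oal.util.Version`, `QueryParser` and `StopAnalyzer` (in Lucene itself the manual switch is QueryParser's `setEnablePositionIncrements`); the assumption that increments turn on from `LUCENE_29` onward is made only for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class PositionIncrementSketch {

    /** Hypothetical stand-in for oal.util.Version; only the declaration order matters. */
    enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "in", "the"));

    /**
     * Maps each surviving term to its position (one occurrence per term, for brevity).
     * With increments enabled, a removed stop word still consumes a position, leaving
     * a gap; without them, the surviving terms get consecutive positions.
     */
    static Map<String, Integer> analyze(String text, boolean enableIncrements) {
        Map<String, Integer> positions = new LinkedHashMap<>();
        int pos = -1;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (STOP_WORDS.contains(token)) {
                if (enableIncrements) {
                    pos++; // the gap left by the stop word
                }
                continue;
            }
            pos++;
            positions.put(token, pos);
        }
        return positions;
    }

    /** True if every query term sits at the same offset in the doc as in the query. */
    static boolean phraseMatches(Map<String, Integer> doc, Map<String, Integer> query) {
        boolean first = true;
        int start = 0;
        for (Map.Entry<String, Integer> e : query.entrySet()) {
            Integer docPos = doc.get(e.getKey());
            if (docPos == null) {
                return false;
            }
            int offset = docPos - e.getValue();
            if (first) {
                start = offset;
                first = false;
            } else if (offset != start) {
                return false;
            }
        }
        return true;
    }

    /** Hypothetical parser whose version argument decides the increments default. */
    static final class SketchQueryParser {
        final boolean enablePositionIncrements;

        SketchQueryParser(MatchVersion matchVersion) {
            // Assumption for this sketch: increments are on from LUCENE_29 onward.
            this.enablePositionIncrements = matchVersion.compareTo(MatchVersion.LUCENE_29) >= 0;
        }

        Map<String, Integer> parsePhrase(String phrase) {
            return analyze(phrase, enablePositionIncrements);
        }
    }

    public static void main(String[] args) {
        // Index side: the stop filter keeps the gaps (increments enabled).
        Map<String, Integer> doc = analyze("fox in the box", true); // {fox=0, box=3}
        for (MatchVersion v : MatchVersion.values()) {
            Map<String, Integer> query = new SketchQueryParser(v).parsePhrase("fox in the box");
            System.out.println(v + " -> " + phraseMatches(doc, query));
        }
    }
}
```

Running `main` prints `LUCENE_24 -> false` and then `true` for the two newer constants: with increments ignored, the parser places `box` one position after `fox`, while the index recorded it three positions later.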
Re: lucene 2.9 sorting algorithm
>> I still think we should if performance is no
>> better with the new one.

Where is there any indication that performance is not better with the new one? The benchmarks clearly argue against switching back. At best they could argue for two APIs, and even then it depends: a loss of 10% on Java 1.5 with the most recent Linux for a topn:10? I'm all for more results, but it's not looking like a good switch to me. Which API do I use? Well, it depends: how many docs will you ask for back, what OS are you running, how hard is it for you to grok one API over the other? And then, as we make changes in the future, we have to manage both APIs.

bq. digging in deep and running thorough perf tests makes sense

Again, no one is arguing against that. Dig all year, I'll help, but I don't see the treasure yet, and the hole is starting to look deep.

bq. removing that if from the Multi PQ patch makes sense

I didn't have a problem with that either (or with other code changes), but jeez, mention what you are seeing with the switch. I'll tell you what I saw: not that much. A bit of improvement, but take a look at the Java 1.5 run: on Linux it ended up being a blade of grass holding up a boulder.

Michael McCandless wrote: > Sheesh I go to bed and so much all of a sudden happens!! > > Sorry Mark; I should've called out "PATCH IS ON 2.9 BRANCH" more > clearly ;) > > There's no question in my mind that the new comparator API is more > complex than the old one, and I really don't like that. I had to > rewrite the section of LIA that gives an example of a [simple] custom > sort and it wasn't pleasant! Two compare methods (compare, > compareBottom)? Two copy methods (copy, setBottom)? Sure, you can > grok it and get through it if you have to, but it is more complex > because it's conflated with the PQ API. > > Ease of consumption of our APIs is very important, so, only when > performance clearly warrants it should we adopt a more complex API. > > Also, yeah, it would suck to have to switch back to the old API at > this point, but net/net I still think we should if performance is no > better with the new one. > > The old API also fits cleanly with per-segment searching (John's > initial patch shows that -- it's simply another per-segment Collector). > The two APIs (collection, comparator) are well decoupled. > > So, digging in deep and running thorough perf tests makes sense; we > need to understand the performance to make the API switch decision. > And definitely we should tune both approaches as much as possible > (removing that if from the Multi PQ patch makes sense). > > But... Multi PQ's performance isn't better in many cases... though, > we're clearly still iterating. I'll run a 1.5 (32 & 64 bit) test, > with the if statement removed. > > Mike > > On Fri, Oct 23, 2009 at 3:53 AM, Earwin Burrfoot wrote: > >> I did. >> >> On Fri, Oct 23, 2009 at 09:05, Jake Mannix wrote: >> >>> On Thu, Oct 22, 2009 at 9:58 PM, Mark Miller wrote: >>> Yes - I've seen a handful of non core devs report back that they upgraded with no complaints on the difficulty. It's in the mailing list archives. The only core dev I've seen say it's easy is Uwe. He's super sharp though, so I wasn't banking my comment on him ;) >>> Upgrade custom sorting? Where has anyone talked about this? >>> >>> 2.9 is great, I like the new apis, they're great in general. It's just this >>> multi-segment sorting we're talking about here. >>> >>> -jake

-- - Mark http://www.lucidimagination.com
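For readers who have not implemented one yet, this is roughly the shape of the slot-based comparator being debated: the queue stores hits in numbered slots, and the comparator must copy values into slots, compare slots with each other, compare a new doc against the current bottom, and expose a slot's value for fillFields. The sketch is a simplified, hypothetical mimic in plain Java: method names follow the ones named in this thread, `setNextSegment` is an invented stand-in for the real per-segment hook, and a linear scan replaces the priority queue to keep it short.

```java
import java.util.Arrays;

public class SlotComparatorSketch {

    /** Simplified, hypothetical mimic of a slot-based comparator over int values. */
    static final class IntSlotComparator {
        private final int[] slotValues; // one value per queue slot
        private int[] segmentValues;    // per-doc values of the current segment
        private int bottom;             // value held by the weakest competitive slot

        IntSlotComparator(int numHits) {
            slotValues = new int[numHits];
        }

        /** Invented stand-in for the real per-segment hook. */
        void setNextSegment(int[] values) {
            segmentValues = values;
        }

        /** Compares two hits that are already in the queue. */
        int compare(int slot1, int slot2) {
            return Integer.compare(slotValues[slot1], slotValues[slot2]);
        }

        /** Remembers the value of the slot that is currently the bottom of the queue. */
        void setBottom(int slot) {
            bottom = slotValues[slot];
        }

        /** Compares the bottom against a new doc; positive means the doc is better. */
        int compareBottom(int doc) {
            return Integer.compare(bottom, segmentValues[doc]);
        }

        /** Copies a new doc's value into a queue slot. */
        void copy(int slot, int doc) {
            slotValues[slot] = segmentValues[doc];
        }

        /** Exposes a slot's value, the kind of value fillFields puts on a FieldDoc. */
        Integer value(int slot) {
            return slotValues[slot];
        }
    }

    /** Collects the numHits smallest values across segments through the slot API. */
    static int[] topValues(int[][] segments, int numHits) {
        if (numHits <= 0) {
            return new int[0];
        }
        IntSlotComparator cmp = new IntSlotComparator(numHits);
        int filled = 0;
        int bottomSlot = -1;
        for (int[] segment : segments) {
            cmp.setNextSegment(segment);
            for (int doc = 0; doc < segment.length; doc++) {
                if (filled < numHits) {
                    cmp.copy(filled++, doc); // queue not full yet: keep every hit
                    if (filled == numHits) {
                        bottomSlot = worstSlot(cmp, numHits);
                        cmp.setBottom(bottomSlot);
                    }
                } else if (cmp.compareBottom(doc) > 0) { // competitive: replace bottom
                    cmp.copy(bottomSlot, doc);
                    bottomSlot = worstSlot(cmp, numHits);
                    cmp.setBottom(bottomSlot);
                }
            }
        }
        int[] result = new int[filled];
        for (int slot = 0; slot < filled; slot++) {
            result[slot] = cmp.value(slot);
        }
        Arrays.sort(result);
        return result;
    }

    /** Linear scan for the slot with the largest (least competitive) value. */
    private static int worstSlot(IntSlotComparator cmp, int numHits) {
        int worst = 0;
        for (int slot = 1; slot < numHits; slot++) {
            if (cmp.compare(slot, worst) > 0) {
                worst = slot;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        int[][] segments = { {5, 1, 9}, {3, 7}, {2, 8, 0} };
        System.out.println(Arrays.toString(topValues(segments, 3))); // [0, 1, 2]
    }
}
```

Even in this stripped-down form, the consumer has to keep `compare`/`compareBottom` and `copy`/`setBottom` consistent with one another, which is the coupling to the PQ API that the quoted message objects to.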
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769214#action_12769214 ] Grant Ingersoll commented on LUCENE-2002: - +1 on this patch.
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769213#action_12769213 ] Uwe Schindler commented on LUCENE-2002: --- They only appear with the native patch tool. All higher-level, SVN-aware patch implementations "know" about this. I think that is why they were removed from trunk.
Re: lucene 2.9 sorting algorithm
Yup, I'm not against the testing or the thought, and it is clearly more complicated; I'm not saying it's not. But I haven't seen anyone come and say they haven't grokked it yet or that they had a hard time with it (though they have run into limitations in what they have tried to do). John and Jake are not the first to upgrade (and while they have noted it's more complicated, they grokked it too). It's a matter of how difficult it is, not that it's a little more complicated than the old one. Doing this is an advanced Lucene operation; a bunch of people have done it already, these guys are not the first, and no one has really gotten tripped up. I'm not in love with the new API, but I'm still waiting to see the list of people who haven't grokked it. I don't like the idea of supporting two APIs, and I don't like the idea of herding everyone back right after herding them over. If it clearly made sense, I'd have no reason to be against it, but I'm just not seeing that myself.

-- - Mark http://www.lucidimagination.com
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769203#action_12769203 ] Grant Ingersoll commented on LUCENE-2002: - Yes, they are near the $Id tags. That's kind of an odd error, though. At any rate, I'm running the tests now.
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769196#action_12769196 ] Michael McCandless commented on LUCENE-2002:
bq. I'm getting errors applying this, but they look like they are just in the javadoc comments.
Are the patch errors near the $Id$ tags? (We removed those from trunk for this reason.)
bq. I would add it to StopAnalyzer
OK, I'll add it.
[jira] Commented: (LUCENE-1973) Remove deprecated query components
[ https://issues.apache.org/jira/browse/LUCENE-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769194#action_12769194 ] Uwe Schindler commented on LUCENE-1973: --- Committed removal of BoostingTermQuery in revision: 829020
> Remove deprecated query components > -- > > Key: LUCENE-1973 > URL: https://issues.apache.org/jira/browse/LUCENE-1973 > Project: Lucene - Java > Issue Type: Task > Components: Search >Reporter: Uwe Schindler > Fix For: 3.0 > > Attachments: LUCENE-1973-BoostingTermQuery.patch, > LUCENE-1973-Similarity-BW.patch, LUCENE-1973-Similarity.patch > > > Remove the rest of the deprecated query components.
[jira] Issue Comment Edited: (LUCENE-1973) Remove deprecated query components
[ https://issues.apache.org/jira/browse/LUCENE-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765093#action_12765093 ] Uwe Schindler edited comment on LUCENE-1973 at 10/23/09 12:22 PM: -- There are still some of them:
- explain() in Scorer (I do not know exactly what to do here; I rarely use explain())
- -idf() in Similarity- (DONE)
- IndexSearcher.fieldSortDoTrackScores / IS.fieldSortDoMaxScore
- -BoostingTermQuery- (DONE)
- MultiValueSource (what to do with it?)
- -BooleanQuery scoreDocOutOfOrder & others (LUCENE-944)- (DONE)

I am not familiar with all of these, so I do not want to fix them myself.
[jira] Updated: (LUCENE-1973) Remove deprecated query components
[ https://issues.apache.org/jira/browse/LUCENE-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1973: -- Attachment: LUCENE-1973-BoostingTermQuery.patch
Remove BoostingTermQuery. The xml-query-parser now creates a PayloadTermQuery with the default payload function.
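In a PayloadTermQuery, the payload function decides how the payload scores seen at a term's matching positions are folded into one per-document factor (Lucene 2.9 ships PayloadFunction subclasses such as AveragePayloadFunction and MaxPayloadFunction for this). The sketch below illustrates the idea with a made-up, simplified `PayloadAggregator` interface and two example policies; its methods are not PayloadFunction's real signatures, and which function the xml-query-parser uses as its default is not asserted here.

```java
public class PayloadAggregationSketch {

    /** Hypothetical, simplified shape of a payload function. */
    interface PayloadAggregator {
        /** Folds one matching position's payload score into the running score. */
        float accumulate(float runningScore, float payloadScore);

        /** Turns the running score into the per-document payload factor. */
        float finish(int numPayloadsSeen, float runningScore);
    }

    /** Averages the payload scores; a doc without payloads gets a neutral 1.0. */
    static final PayloadAggregator AVERAGE = new PayloadAggregator() {
        public float accumulate(float running, float payload) {
            return running + payload;
        }

        public float finish(int seen, float running) {
            return seen > 0 ? running / seen : 1.0f;
        }
    };

    /** Keeps the largest payload score (assumes payload scores are non-negative). */
    static final PayloadAggregator MAX = new PayloadAggregator() {
        public float accumulate(float running, float payload) {
            return Math.max(running, payload);
        }

        public float finish(int seen, float running) {
            return seen > 0 ? running : 1.0f;
        }
    };

    /** Applies a function to the payload scores of one term's positions in one doc. */
    static float docPayloadFactor(PayloadAggregator fn, float[] payloadScores) {
        float running = 0f;
        for (float p : payloadScores) {
            running = fn.accumulate(running, p);
        }
        return fn.finish(payloadScores.length, running);
    }

    public static void main(String[] args) {
        float[] payloads = {2f, 4f};
        System.out.println(docPayloadFactor(AVERAGE, payloads)); // 3.0
        System.out.println(docPayloadFactor(MAX, payloads));     // 4.0
    }
}
```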
[jira] Commented: (LUCENE-2002) Add oal.util.Version ctor to QueryParser
[ https://issues.apache.org/jira/browse/LUCENE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769190#action_12769190 ] Grant Ingersoll commented on LUCENE-2002: - I'm getting errors applying this, but they look like they are just in the javadoc comments.
[jira] Commented: (LUCENE-1257) Port to Java5
[ https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769186#action_12769186 ] Uwe Schindler commented on LUCENE-1257: --- Committed:
LUCENE-1257_contrib_benchmark.patch 2009-10-22 05:53 PM Kay Kay 56 kB
LUCENE-1257_unnnecessary_casts_2.patch 2009-10-22 08:16 PM Kay Kay 22 kB
Revision: 829013

I will hold off on the highlighter patch until LUCENE-2003 is done, so as not to break Mark's patch.
> Port to Java5 > - > > Key: LUCENE-1257 > URL: https://issues.apache.org/jira/browse/LUCENE-1257 > Project: Lucene - Java > Issue Type: Improvement > Components: Analysis, Examples, Index, Other, Query/Scoring, > QueryParser, Search, Store, Term Vectors >Affects Versions: 3.0 >Reporter: Cédric Champeau >Assignee: Uwe Schindler >Priority: Minor > Fix For: 3.0 > > Attachments: instantiated_fieldable.patch, > LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, > LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, > LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, > LUCENE-1257-CompoundFileReaderWriter.patch, > LUCENE-1257-ConcurrentMergeScheduler.patch, > LUCENE-1257-DirectoryReader.patch, > LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, > LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, > LUCENE-1257-FieldCacheImpl.patch, LUCENE-1257-FieldCacheRangeFilter.patch, > LUCENE-1257-IndexDeleter.patch, > LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, > LUCENE-1257-MTQWF.patch, LUCENE-1257-NormalizeCharMap.patch, > LUCENE-1257-o.a.l.util.patch, LUCENE-1257-org_apache_lucene_document.patch, > LUCENE-1257-org_apache_lucene_document.patch, > LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, > LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, > LUCENE-1257-StringBuffer.patch, LUCENE-1257-TopDocsCollector.patch, > LUCENE-1257-WordListLoader.patch,
LUCENE-1257_analysis.patch, > LUCENE-1257_BooleanFilter_Generics.patch, > LUCENE-1257_contrib_benchmark.patch, LUCENE-1257_contrib_highlighting.patch, > LUCENE-1257_javacc_upgrade.patch, LUCENE-1257_messages.patch, > LUCENE-1257_more_unnecessary_casts.patch, > LUCENE-1257_MultiFieldQueryParser.patch, LUCENE-1257_o.a.l.queryParser.patch, > LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_index_test.patch, > LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_search.patch, > LUCENE-1257_o_a_l_search_spans.patch, > LUCENE-1257_org_apache_lucene_index.patch, > LUCENE-1257_org_apache_lucene_index.patch, LUCENE-1257_queryParser_jj.patch, > LUCENE-1257_unnecessary_casts.patch, LUCENE-1257_unnnecessary_casts_2.patch, > lucene1257surround1.patch, lucene1257surround1.patch, > shinglematrixfilter_generified.patch > > > For my needs I've updated Lucene so that it uses Java 5 constructs. I know > Java 5 migration had been planned for 2.1 someday in the past, but don't know > when it is planned now. This patch against the trunk includes : > - most obvious generics usage (there are tons of usages of sets, ... Those > which are commonly used have been generified) > - PriorityQueue generification > - replacement of indexed for loops with for each constructs > - removal of unnececessary unboxing > The code is to my opinion much more readable with those features (you > actually *know* what is stored in collections reading the code, without the > need to lookup for field definitions everytime) and it simplifies many > algorithms. > Note that this patch also includes an interface for the Query class. This has > been done for my company's needs for building custom Query classes which add > some behaviour to the base Lucene queries. It prevents multiple unnnecessary > casts. I know this introduction is not wanted by the team, but it really > makes our developments easier to maintain. If you don't want to use this, > replace all /Queriable/ calls with standard /Query/. 
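The description lists generics, for-each loops and the removal of explicit unboxing as the core of the port. As a minimal sketch of what those changes look like in practice, assuming a hypothetical term-counting helper (the class `Java5PortSketch` and its methods are illustrative only and appear neither in Lucene nor in the attached patches), here is the same logic written in pre-Java-5 style and with the Java 5 constructs:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class, not Lucene code: counts how often each term
// occurs, written once in pre-Java-5 style and once with Java 5 constructs.
final class Java5PortSketch {

    // Pre-Java-5 style: raw collections, a cast on every read, an indexed
    // for loop, and explicit boxing/unboxing of the counts.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static Map countTermsLegacy(List terms) {
        Map counts = new HashMap();
        for (int i = 0; i < terms.size(); i++) {
            String term = (String) terms.get(i);      // cast: element type unknown
            Integer old = (Integer) counts.get(term); // cast again
            int next = (old == null) ? 1 : old.intValue() + 1;
            counts.put(term, Integer.valueOf(next));  // explicit boxing
        }
        return counts;
    }

    // Java 5 style: the generic types state what the collections hold, the
    // for-each loop removes the index variable, and autoboxing removes the
    // explicit valueOf()/intValue() calls.
    static Map<String, Integer> countTerms(List<String> terms) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String term : terms) {
            Integer old = counts.get(term);               // no cast needed
            counts.put(term, old == null ? 1 : old + 1);  // unboxed, then re-boxed
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<String>();
        terms.add("lucene");
        terms.add("java");
        terms.add("lucene");
        // Both versions build equal maps.
        System.out.println(countTermsLegacy(terms).equals(countTerms(terms)));
        System.out.println(countTerms(terms).get("lucene"));
    }
}
```

Running `main` prints `true` followed by `2`. Both methods compute the same map, but only the generic version lets the compiler check element types, which is the readability gain the description refers to.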
[jira] Updated: (LUCENE-1973) Remove deprecated query components
[ https://issues.apache.org/jira/browse/LUCENE-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1973:
----------------------------------

    Attachment:     (was: LUCENE-1973-Similarity.patch)

> Remove deprecated query components
> ----------------------------------
>
>                 Key: LUCENE-1973
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1973
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: Search
>            Reporter: Uwe Schindler
>             Fix For: 3.0
>
>         Attachments: LUCENE-1973-Similarity-BW.patch,
> LUCENE-1973-Similarity.patch
>
>
> Remove the rest of the deprecated query components.
[jira] Updated: (LUCENE-1973) Remove deprecated query components
[ https://issues.apache.org/jira/browse/LUCENE-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1973:
----------------------------------

    Attachment:     (was: LUCENE-1973-Similarity-BW.patch)

> Remove deprecated query components
> ----------------------------------
>
>                 Key: LUCENE-1973
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1973
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: Search
>            Reporter: Uwe Schindler
>             Fix For: 3.0
>
>         Attachments: LUCENE-1973-Similarity-BW.patch,
> LUCENE-1973-Similarity.patch
>
>
> Remove the rest of the deprecated query components.
[jira] Updated: (LUCENE-1973) Remove deprecated query components
[ https://issues.apache.org/jira/browse/LUCENE-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-1973:
----------------------------------

    Attachment:     (was: LUCENE-1973-Similarity.patch)

> Remove deprecated query components
> ----------------------------------
>
>                 Key: LUCENE-1973
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1973
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: Search
>            Reporter: Uwe Schindler
>             Fix For: 3.0
>
>         Attachments: LUCENE-1973-Similarity-BW.patch,
> LUCENE-1973-Similarity.patch
>
>
> Remove the rest of the deprecated query components.