[jira] Created: (LUCENE-2272) PayloadNearQuery has hardwired explanation for 'AveragePayloadFunction'
PayloadNearQuery has hardwired explanation for 'AveragePayloadFunction' --- Key: LUCENE-2272 URL: https://issues.apache.org/jira/browse/LUCENE-2272 Project: Lucene - Java Issue Type: Bug Components: Search Reporter: Peter Keegan Attachments: payloadfunctin-patch.txt The 'explain' method in PayloadNearSpanScorer assumes the AveragePayloadFunction was used. This patch adds the 'explain' method to the 'PayloadFunction' interface, where the Scorer can call it. Added unit tests for 'explain' and for {Min,Max}PayloadFunction. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2272) PayloadNearQuery has hardwired explanation for 'AveragePayloadFunction'
[ https://issues.apache.org/jira/browse/LUCENE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Keegan updated LUCENE-2272: - Attachment: payloadfunctin-patch.txt This patch adds the 'explain' method to the 'PayloadFunction' interface, where the Scorer can call it. Added unit tests for 'explain' and for {Min,Max}PayloadFunction. PayloadNearQuery has hardwired explanation for 'AveragePayloadFunction' --- Key: LUCENE-2272 URL: https://issues.apache.org/jira/browse/LUCENE-2272 Project: Lucene - Java Issue Type: Bug Components: Search Reporter: Peter Keegan Attachments: payloadfunctin-patch.txt The 'explain' method in PayloadNearSpanScorer assumes the AveragePayloadFunction was used. This patch adds the 'explain' method to the 'PayloadFunction' interface, where the Scorer can call it. Added unit tests for 'explain' and for {Min,Max}PayloadFunction.
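The idea behind LUCENE-2272 can be pictured with a small self-contained sketch. This is not the attached patch and it does not use Lucene's classes: `PayloadFn`, `AverageFn`, `MaxFn`, `ScorerSketch`, and both method signatures are hypothetical names chosen for illustration. The only point is that each function describes itself, so the scorer no longer has to assume an average.

```java
// Hypothetical stand-ins for illustration; these are NOT Lucene's PayloadFunction classes.
interface PayloadFn {
    // Combines the payload scores seen for one document into a single factor.
    float combine(float[] payloads);

    // Each function reports its own name and value, so the caller hardwires nothing.
    String explain(float[] payloads);
}

final class AverageFn implements PayloadFn {
    public float combine(float[] payloads) {
        if (payloads.length == 0) {
            return 1f; // neutral factor when no payloads were read (assumed convention)
        }
        float sum = 0f;
        for (float p : payloads) {
            sum += p;
        }
        return sum / payloads.length;
    }

    public String explain(float[] payloads) {
        return "AverageFn = " + combine(payloads);
    }
}

final class MaxFn implements PayloadFn {
    public float combine(float[] payloads) {
        if (payloads.length == 0) {
            return 1f; // same assumed neutral factor as above
        }
        float max = payloads[0];
        for (float p : payloads) {
            max = Math.max(max, p);
        }
        return max;
    }

    public String explain(float[] payloads) {
        return "MaxFn = " + combine(payloads);
    }
}

final class ScorerSketch {
    // The scorer delegates the explanation to whichever function it was given.
    static String explain(PayloadFn fn, float[] payloads) {
        return "payload score: " + fn.explain(payloads);
    }
}
```

With this shape, swapping `AverageFn` for `MaxFn` changes the explanation text without touching the scorer.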
[jira] Commented: (LUCENE-1986) NPE in NearSpansUnordered from PayloadNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767373#action_12767373 ] Peter Keegan commented on LUCENE-1986: --
+ if (!more) {
+   return false;
+ }
I was about to submit this same patch today, but I see you beat me to it :) Thanks Mark.
NPE in NearSpansUnordered from PayloadNearQuery --- Key: LUCENE-1986 URL: https://issues.apache.org/jira/browse/LUCENE-1986 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.9 Reporter: Peter Keegan Assignee: Michael McCandless Fix For: 2.9.1, 3.0 Attachments: LUCENE-1986.patch, LUCENE-1986.patch, TestPayloadNearQuery1.java The following query causes an NPE in NearSpansUnordered, and is reproducible with the attached unit test. The failure occurs on the last document scored.
[jira] Created: (LUCENE-1986) NPE in NearSpansUnordered from PayloadNearQuery
NPE in NearSpansUnordered from PayloadNearQuery --- Key: LUCENE-1986 URL: https://issues.apache.org/jira/browse/LUCENE-1986 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.9 Reporter: Peter Keegan Attachments: TestPayloadNearQuery1.java The following query causes an NPE in NearSpansUnordered, and is reproducible with the attached unit test. The failure occurs on the last document scored.
[jira] Updated: (LUCENE-1986) NPE in NearSpansUnordered from PayloadNearQuery
[ https://issues.apache.org/jira/browse/LUCENE-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Keegan updated LUCENE-1986: - Attachment: TestPayloadNearQuery1.java Unit test that causes the NPE. NPE in NearSpansUnordered from PayloadNearQuery --- Key: LUCENE-1986 URL: https://issues.apache.org/jira/browse/LUCENE-1986 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.9 Reporter: Peter Keegan Attachments: TestPayloadNearQuery1.java The following query causes an NPE in NearSpansUnordered, and is reproducible with the attached unit test. The failure occurs on the last document scored.
[jira] Updated: (LUCENE-1341) BoostingNearQuery class (prototype)
[ https://issues.apache.org/jira/browse/LUCENE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Keegan updated LUCENE-1341: - Attachment: lucene-1341-new-2.patch New version that works with current trunk (8/5/09) BoostingNearQuery class (prototype) --- Key: LUCENE-1341 URL: https://issues.apache.org/jira/browse/LUCENE-1341 Project: Lucene - Java Issue Type: Improvement Components: Query/Scoring Affects Versions: 2.3.1 Reporter: Peter Keegan Assignee: Grant Ingersoll Priority: Minor Fix For: 3.0 Attachments: bnq.patch, bnq.patch, BoostingNearQuery.java, BoostingNearQuery.java, lucene-1341-new-1.patch, lucene-1341-new-2.patch, LUCENE-1341-new.patch, LUCENE-1341.patch This patch implements term boosting for SpanNearQuery. Refer to: http://www.gossamer-threads.com/lists/lucene/java-user/62779 This patch works but probably needs more work. I don't like the use of 'instanceof', but I didn't want to touch Spans or TermSpans. Also, the payload code is mostly a copy of what's in BoostingTermQuery and could be common-sourced somewhere. Feel free to throw darts at it :)
[jira] Updated: (LUCENE-1341) BoostingNearQuery class (prototype)
[ https://issues.apache.org/jira/browse/LUCENE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Keegan updated LUCENE-1341: - Attachment: lucene-1341-new-1.patch As I was debugging a unit test for BoostingNearQuery, I discovered that not all the payloads were getting read. The 'needToLoadPayload' flag on the termpos was getting reset on the last term in the span by NearSpansOrdered. Then I noticed that the term positions aren't even needed in BNQ because they were already collected by the Spans in 'matchPayload'. So, here is a newer, simpler implementation of BNQ along with some unit tests. Peter BoostingNearQuery class (prototype) --- Key: LUCENE-1341 URL: https://issues.apache.org/jira/browse/LUCENE-1341 Project: Lucene - Java Issue Type: Improvement Components: Query/Scoring Affects Versions: 2.3.1 Reporter: Peter Keegan Assignee: Grant Ingersoll Priority: Minor Fix For: 3.0 Attachments: bnq.patch, bnq.patch, BoostingNearQuery.java, BoostingNearQuery.java, lucene-1341-new-1.patch, LUCENE-1341-new.patch, LUCENE-1341.patch This patch implements term boosting for SpanNearQuery. Refer to: http://www.gossamer-threads.com/lists/lucene/java-user/62779 This patch works but probably needs more work. I don't like the use of 'instanceof', but I didn't want to touch Spans or TermSpans. Also, the payload code is mostly a copy of what's in BoostingTermQuery and could be common-sourced somewhere. Feel free to throw darts at it :)
[jira] Updated: (LUCENE-1341) BoostingNearQuery class (prototype)
[ https://issues.apache.org/jira/browse/LUCENE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Keegan updated LUCENE-1341: - Attachment: LUCENE-1341-new.patch Here is an updated patch for the 2.4 branch. It's 6 months late because I missed Grant's e-mail requesting me to retest. I was just recently looking to see what became of the original patch. Peter BoostingNearQuery class (prototype) --- Key: LUCENE-1341 URL: https://issues.apache.org/jira/browse/LUCENE-1341 Project: Lucene - Java Issue Type: Improvement Components: Query/Scoring Affects Versions: 2.3.1 Reporter: Peter Keegan Assignee: Grant Ingersoll Priority: Minor Fix For: 3.0 Attachments: bnq.patch, bnq.patch, BoostingNearQuery.java, BoostingNearQuery.java, LUCENE-1341-new.patch, LUCENE-1341.patch This patch implements term boosting for SpanNearQuery. Refer to: http://www.gossamer-threads.com/lists/lucene/java-user/62779 This patch works but probably needs more work. I don't like the use of 'instanceof', but I didn't want to touch Spans or TermSpans. Also, the payload code is mostly a copy of what's in BoostingTermQuery and could be common-sourced somewhere. Feel free to throw darts at it :)
[jira] Updated: (LUCENE-1341) BoostingNearQuery class (prototype)
[ https://issues.apache.org/jira/browse/LUCENE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Keegan updated LUCENE-1341: - Attachment: BoostingNearQuery.java bnq.patch Here is a version of the patch for Java 1.4. BoostingNearQuery class (prototype) --- Key: LUCENE-1341 URL: https://issues.apache.org/jira/browse/LUCENE-1341 Project: Lucene - Java Issue Type: Improvement Components: Query/Scoring Affects Versions: 2.3.1 Reporter: Peter Keegan Priority: Minor Fix For: 2.3.2 Attachments: bnq.patch, bnq.patch, BoostingNearQuery.java, BoostingNearQuery.java This patch implements term boosting for SpanNearQuery. Refer to: http://www.gossamer-threads.com/lists/lucene/java-user/62779 This patch works but probably needs more work. I don't like the use of 'instanceof', but I didn't want to touch Spans or TermSpans. Also, the payload code is mostly a copy of what's in BoostingTermQuery and could be common-sourced somewhere. Feel free to throw darts at it :)
[jira] Created: (LUCENE-1341) BoostingNearQuery class (prototype)
BoostingNearQuery class (prototype) --- Key: LUCENE-1341 URL: https://issues.apache.org/jira/browse/LUCENE-1341 Project: Lucene - Java Issue Type: Improvement Components: Query/Scoring Affects Versions: 2.3.1 Reporter: Peter Keegan Priority: Minor Fix For: 2.3.2 This patch implements term boosting for SpanNearQuery. Refer to: http://www.gossamer-threads.com/lists/lucene/java-user/62779 This patch works but probably needs more work. I don't like the use of 'instanceof', but I didn't want to touch Spans or TermSpans. Also, the payload code is mostly a copy of what's in BoostingTermQuery and could be common-sourced somewhere. Feel free to throw darts at it :)
[jira] Updated: (LUCENE-1341) BoostingNearQuery class (prototype)
[ https://issues.apache.org/jira/browse/LUCENE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Keegan updated LUCENE-1341: - Attachment: BoostingNearQuery.java bnq.patch BoostingNearQuery class (prototype) --- Key: LUCENE-1341 URL: https://issues.apache.org/jira/browse/LUCENE-1341 Project: Lucene - Java Issue Type: Improvement Components: Query/Scoring Affects Versions: 2.3.1 Reporter: Peter Keegan Priority: Minor Fix For: 2.3.2 Attachments: bnq.patch, BoostingNearQuery.java This patch implements term boosting for SpanNearQuery. Refer to: http://www.gossamer-threads.com/lists/lucene/java-user/62779 This patch works but probably needs more work. I don't like the use of 'instanceof', but I didn't want to touch Spans or TermSpans. Also, the payload code is mostly a copy of what's in BoostingTermQuery and could be common-sourced somewhere. Feel free to throw darts at it :)
[jira] Commented: (LUCENE-1341) BoostingNearQuery class (prototype)
[ https://issues.apache.org/jira/browse/LUCENE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615004#action_12615004 ] Peter Keegan commented on LUCENE-1341: -- Note that this patch requires Java 1.5 or later (it is easily modified to run on 1.4). BoostingNearQuery class (prototype) --- Key: LUCENE-1341 URL: https://issues.apache.org/jira/browse/LUCENE-1341 Project: Lucene - Java Issue Type: Improvement Components: Query/Scoring Affects Versions: 2.3.1 Reporter: Peter Keegan Priority: Minor Fix For: 2.3.2 Attachments: bnq.patch, BoostingNearQuery.java This patch implements term boosting for SpanNearQuery. Refer to: http://www.gossamer-threads.com/lists/lucene/java-user/62779 This patch works but probably needs more work. I don't like the use of 'instanceof', but I didn't want to touch Spans or TermSpans. Also, the payload code is mostly a copy of what's in BoostingTermQuery and could be common-sourced somewhere. Feel free to throw darts at it :)
[jira] Created: (LUCENE-1093) SpanFirstQuery modification to aid term boosting based on position.
SpanFirstQuery modification to aid term boosting based on position. --- Key: LUCENE-1093 URL: https://issues.apache.org/jira/browse/LUCENE-1093 Project: Lucene - Java Issue Type: Improvement Components: Query/Scoring Reporter: Peter Keegan This is a request for a modification to SpanFirstQuery that would allow term boosting based on a term's distance from the beginning of the document. The modification is that the Spans returned by SpanFirstQuery.getSpans() would always return 0 from its start() method. Then the slop passed to sloppyFreq(slop) would be the distance from the beginning of the indexed field to the end of the Spans of the SpanQuery passed to SpanFirstQuery. Here is the discussion behind this issue: http://www.nabble.com/Can-I-do-boosting-based-on-term-postions--to11939423.html#a11939423
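The effect of the LUCENE-1093 request can be illustrated with a few lines of arithmetic. This is a sketch under stated assumptions, not Lucene code: it assumes the match length handed to sloppyFreq is end minus start, and it assumes a sloppyFreq of the form 1 / (distance + 1). The class `PositionBoostSketch` and its method names are hypothetical.

```java
// Hypothetical sketch; the formula and the names are assumptions, not Lucene's API.
final class PositionBoostSketch {
    // Assumed similarity formula: larger distances give smaller frequencies.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Assumed current behaviour: the distance is the length of the span itself.
    static float currentFreq(int start, int end) {
        return sloppyFreq(end - start);
    }

    // Requested behaviour: start() always reports 0, so the distance is the
    // offset of the span's end from the beginning of the field.
    static float proposedFreq(int start, int end) {
        return sloppyFreq(end - 0);
    }
}
```

Under these assumptions a one-term span scores the same wherever it occurs today, while with the requested change a term at the start of the field contributes more than the same term further in.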
[jira] Created: (LUCENE-1088) PriorityQueue 'wouldBeInserted' method
PriorityQueue 'wouldBeInserted' method -- Key: LUCENE-1088 URL: https://issues.apache.org/jira/browse/LUCENE-1088 Project: Lucene - Java Issue Type: New Feature Components: Other Reporter: Peter Keegan This is a request for a new method in PriorityQueue:
public boolean wouldBeInserted(Object element) // returns true if doc would be inserted, without inserting
This would allow an application to prevent duplicate entries from being added to the queue. Here is a reference to the discussion behind this request: http://www.nabble.com/FieldSortedHitQueue-enhancement-to9733550.html#a9733550
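A minimal sketch of what such a method could look like, built on `java.util.PriorityQueue` rather than on Lucene's own PriorityQueue. The class `BoundedTopN`, its integer elements, and the strict "greater than the smallest" rule are assumptions made for illustration only.

```java
import java.util.PriorityQueue;

// Hypothetical bounded queue; NOT org.apache.lucene.util.PriorityQueue.
final class BoundedTopN {
    private final int maxSize;
    // Min-heap: the smallest retained element is always at the head.
    private final PriorityQueue<Integer> heap = new PriorityQueue<>();

    BoundedTopN(int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
    }

    // Returns true if insert(element) would keep the element; never modifies the queue.
    boolean wouldBeInserted(int element) {
        return heap.size() < maxSize || element > heap.peek();
    }

    // Keeps the element if there is room or it beats the current smallest.
    boolean insert(int element) {
        if (!wouldBeInserted(element)) {
            return false;
        }
        if (heap.size() == maxSize) {
            heap.poll(); // evict the smallest to make room
        }
        heap.add(element);
        return true;
    }

    int size() {
        return heap.size();
    }
}
```

A caller could check `wouldBeInserted` first and run its duplicate check only for elements that would actually make it into the queue.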
[jira] Commented: (LUCENE-1017) BoostingTermQuery performance
[ https://issues.apache.org/jira/browse/LUCENE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550203 ] Peter Keegan commented on LUCENE-1017: -- Grant, Unfortunately, my performance test bed isn't suitable for contrib/benchmark because it's designed to simulate real queries from our log files. These are multi-threaded queries sent at a very high rate to stress test the Lucene server, which runs on an 8-CPU system. Given the somewhat dynamic nature of the test bed, I don't think the 5% performance increase that I reported is statistically significant. You're probably right that skipTo is not likely any faster. I still think it would be nice to have a BoostingTermQuery that extends TermQuery, though. Peter BoostingTermQuery performance - Key: LUCENE-1017 URL: https://issues.apache.org/jira/browse/LUCENE-1017 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.2 Environment: all Reporter: Peter Keegan Priority: Minor Attachments: BoostingTermQuery.java, BoostingTermQuery.patch, termquery.patch I have been experimenting with payloads and BoostingTermQuery, which I think are excellent additions to Lucene core. Currently, BoostingTermQuery extends SpanQuery. I would suggest changing this class to extend TermQuery and refactoring the current version to something like 'BoostingSpanQuery'. The reason is rooted in performance. In my testing, I compared query throughput using TermQuery against 2 versions of BoostingTermQuery - the current one that extends SpanQuery and one that extends TermQuery (which I've included, below). Here are the results (qps = queries per second): TermQuery: 200 qps; BoostingTermQuery (extends SpanQuery): 97 qps; BoostingTermQuery (extends TermQuery): 130 qps. Here is a version of BoostingTermQuery that extends TermQuery. I had to modify TermQuery and TermScorer to make them public. A code review would be in order, and I would appreciate your comments on this suggestion. Peter
[jira] Commented: (LUCENE-1017) BoostingTermQuery performance
[ https://issues.apache.org/jira/browse/LUCENE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550238 ] Peter Keegan commented on LUCENE-1017: --
> What's the use case? Is there something that isn't possible with it as is?
I would say no, if we can conclude that there is no significant difference in performance between the 2 implementations. As a developer, when I see queries based on SpanQuery or PhraseQuery, this is a tip that there is a potential performance impact. On the plus side, renaming the current implementation to 'BoostingSpanQuery' might give the developer a better hint of the methods of its superclass, too. Would you expect the cost of traversing the postings to be higher than reading the payload? Peter
BoostingTermQuery performance - Key: LUCENE-1017 URL: https://issues.apache.org/jira/browse/LUCENE-1017 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.2 Environment: all Reporter: Peter Keegan Priority: Minor Attachments: BoostingTermQuery.java, BoostingTermQuery.patch, termquery.patch I have been experimenting with payloads and BoostingTermQuery, which I think are excellent additions to Lucene core. Currently, BoostingTermQuery extends SpanQuery. I would suggest changing this class to extend TermQuery and refactoring the current version to something like 'BoostingSpanQuery'. The reason is rooted in performance. In my testing, I compared query throughput using TermQuery against 2 versions of BoostingTermQuery - the current one that extends SpanQuery and one that extends TermQuery (which I've included, below). Here are the results (qps = queries per second): TermQuery: 200 qps; BoostingTermQuery (extends SpanQuery): 97 qps; BoostingTermQuery (extends TermQuery): 130 qps. Here is a version of BoostingTermQuery that extends TermQuery. I had to modify TermQuery and TermScorer to make them public. A code review would be in order, and I would appreciate your comments on this suggestion. Peter
[jira] Updated: (LUCENE-1017) BoostingTermQuery performance
[ https://issues.apache.org/jira/browse/LUCENE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Keegan updated LUCENE-1017: - Attachment: BoostingTermQuery.patch Here is a new version that traverses the term positions properly, I believe. It's only about 5% faster than the spans version, though. It's quite possible that this could be improved. I'm using TermScorer to get me to the current doc and then setting TermPositions to the same doc. Do you see any inefficiencies with the 'next' and 'skipTo' calls, particularly in 'getPayloads'? BoostingTermQuery performance - Key: LUCENE-1017 URL: https://issues.apache.org/jira/browse/LUCENE-1017 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.2 Environment: all Reporter: Peter Keegan Attachments: BoostingTermQuery.java, BoostingTermQuery.patch, termquery.patch I have been experimenting with payloads and BoostingTermQuery, which I think are excellent additions to Lucene core. Currently, BoostingTermQuery extends SpanQuery. I would suggest changing this class to extend TermQuery and refactoring the current version to something like 'BoostingSpanQuery'. The reason is rooted in performance. In my testing, I compared query throughput using TermQuery against 2 versions of BoostingTermQuery - the current one that extends SpanQuery and one that extends TermQuery (which I've included, below). Here are the results (qps = queries per second): TermQuery: 200 qps; BoostingTermQuery (extends SpanQuery): 97 qps; BoostingTermQuery (extends TermQuery): 130 qps. Here is a version of BoostingTermQuery that extends TermQuery. I had to modify TermQuery and TermScorer to make them public. A code review would be in order, and I would appreciate your comments on this suggestion. Peter
[jira] Commented: (LUCENE-1017) BoostingTermQuery performance
[ https://issues.apache.org/jira/browse/LUCENE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532094 ] Peter Keegan commented on LUCENE-1017: -- Grant, You are right about it not handling multiple payloads per document. In fact, I don't think it's even reading the payloads for all the hits. I'll see what I can do, but feel free to step in and help me get this right. I won't touch svn for now. Peter BoostingTermQuery performance - Key: LUCENE-1017 URL: https://issues.apache.org/jira/browse/LUCENE-1017 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.2 Environment: all Reporter: Peter Keegan Attachments: BoostingTermQuery.java, termquery.patch I have been experimenting with payloads and BoostingTermQuery, which I think are excellent additions to Lucene core. Currently, BoostingTermQuery extends SpanQuery. I would suggest changing this class to extend TermQuery and refactoring the current version to something like 'BoostingSpanQuery'. The reason is rooted in performance. In my testing, I compared query throughput using TermQuery against 2 versions of BoostingTermQuery - the current one that extends SpanQuery and one that extends TermQuery (which I've included, below). Here are the results (qps = queries per second): TermQuery: 200 qps; BoostingTermQuery (extends SpanQuery): 97 qps; BoostingTermQuery (extends TermQuery): 130 qps. Here is a version of BoostingTermQuery that extends TermQuery. I had to modify TermQuery and TermScorer to make them public. A code review would be in order, and I would appreciate your comments on this suggestion. Peter
[jira] Created: (LUCENE-1017) BoostingTermQuery performance
BoostingTermQuery performance - Key: LUCENE-1017 URL: https://issues.apache.org/jira/browse/LUCENE-1017 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.2 Environment: all Reporter: Peter Keegan I have been experimenting with payloads and BoostingTermQuery, which I think are excellent additions to Lucene core. Currently, BoostingTermQuery extends SpanQuery. I would suggest changing this class to extend TermQuery and refactoring the current version to something like 'BoostingSpanQuery'. The reason is rooted in performance. In my testing, I compared query throughput using TermQuery against 2 versions of BoostingTermQuery - the current one that extends SpanQuery and one that extends TermQuery (which I've included, below). Here are the results (qps = queries per second): TermQuery: 200 qps; BoostingTermQuery (extends SpanQuery): 97 qps; BoostingTermQuery (extends TermQuery): 130 qps. Here is a version of BoostingTermQuery that extends TermQuery. I had to modify TermQuery and TermScorer to make them public. A code review would be in order, and I would appreciate your comments on this suggestion. Peter
[jira] Updated: (LUCENE-1017) BoostingTermQuery performance
[ https://issues.apache.org/jira/browse/LUCENE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Keegan updated LUCENE-1017: - Attachment: BoostingTermQuery.java Suggested change to BoostingTermQuery to extend TermQuery BoostingTermQuery performance - Key: LUCENE-1017 URL: https://issues.apache.org/jira/browse/LUCENE-1017 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.2 Environment: all Reporter: Peter Keegan Attachments: BoostingTermQuery.java I have been experimenting with payloads and BoostingTermQuery, which I think are excellent additions to Lucene core. Currently, BoostingTermQuery extends SpanQuery. I would suggest changing this class to extend TermQuery and refactoring the current version to something like 'BoostingSpanQuery'. The reason is rooted in performance. In my testing, I compared query throughput using TermQuery against 2 versions of BoostingTermQuery - the current one that extends SpanQuery and one that extends TermQuery (which I've included, below). Here are the results (qps = queries per second): TermQuery: 200 qps; BoostingTermQuery (extends SpanQuery): 97 qps; BoostingTermQuery (extends TermQuery): 130 qps. Here is a version of BoostingTermQuery that extends TermQuery. I had to modify TermQuery and TermScorer to make them public. A code review would be in order, and I would appreciate your comments on this suggestion. Peter
[jira] Updated: (LUCENE-1017) BoostingTermQuery performance
[ https://issues.apache.org/jira/browse/LUCENE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Keegan updated LUCENE-1017: - Attachment: termquery.patch Changes to TermQuery and TermScorer for previous patch BoostingTermQuery performance - Key: LUCENE-1017 URL: https://issues.apache.org/jira/browse/LUCENE-1017 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.2 Environment: all Reporter: Peter Keegan Attachments: BoostingTermQuery.java, termquery.patch I have been experimenting with payloads and BoostingTermQuery, which I think are excellent additions to Lucene core. Currently, BoostingTermQuery extends SpanQuery. I would suggest changing this class to extend TermQuery and refactoring the current version to something like 'BoostingSpanQuery'. The reason is rooted in performance. In my testing, I compared query throughput using TermQuery against 2 versions of BoostingTermQuery - the current one that extends SpanQuery and one that extends TermQuery (which I've included, below). Here are the results (qps = queries per second): TermQuery: 200 qps; BoostingTermQuery (extends SpanQuery): 97 qps; BoostingTermQuery (extends TermQuery): 130 qps. Here is a version of BoostingTermQuery that extends TermQuery. I had to modify TermQuery and TermScorer to make them public. A code review would be in order, and I would appreciate your comments on this suggestion. Peter
[jira] Commented: (LUCENE-991) BoostingTermQuery.explain() bugs
[ https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525709 ] Peter Keegan commented on LUCENE-991: - Hi Grant,
TopDocs hits = searcher.search(query, null, 100);
assertTrue("hits Size: " + hits.totalHits + " is not: " + 0, hits.totalHits == 0);
TopDocCollector discards hits with score = 0, so that's not a fair comparison. If you do a similar test with TermQuery (with a field boost = 0) instead of BoostingTermQuery, you'll see the difference. Even terms with 0 weight are included in the explanation. Make sense? Peter
BoostingTermQuery.explain() bugs Key: LUCENE-991 URL: https://issues.apache.org/jira/browse/LUCENE-991 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.2 Reporter: Peter Keegan Assignee: Grant Ingersoll Priority: Minor Attachments: TestBoostingTermQuery.patch, TestBoostingTermQuery2.patch, TestBoostingTermQuery3.patch There are a couple of minor bugs in BoostingTermQuery.explain().
1. The computation of average payload score produces NaN if no payloads were found. It should probably be: float avgPayloadScore = super.score() * (payloadsSeen > 0 ? (payloadScore / payloadsSeen) : 1);
2. If the average payload score is zero, the value of the explanation is 0: result.setValue(nonPayloadExpl.getValue() * avgPayloadScore); If the query is part of a BooleanClause, this results in: no match on required clause... failure to meet condition(s) of required/prohibited clause(s) The average payload score can be zero if the field boost = 0.
I've attached a patch to 'TestBoostingTermQuery.java'; however, the test 'testNoPayload' fails in 'SpanScorer.score()' because the doc = -1. It looks like 'setFreqCurrentDoc()' should have been called before 'score()'. Maybe someone more knowledgeable about spans could investigate this.
[jira] Created: (LUCENE-991) BoostingTermQuery.explain() bugs
BoostingTermQuery.explain() bugs Key: LUCENE-991 URL: https://issues.apache.org/jira/browse/LUCENE-991 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.2 Reporter: Peter Keegan Priority: Minor Attachments: TestBoostingTermQuery.patch There are a couple of minor bugs in BoostingTermQuery.explain().
1. The computation of average payload score produces NaN if no payloads were found. It should probably be: float avgPayloadScore = super.score() * (payloadsSeen > 0 ? (payloadScore / payloadsSeen) : 1);
2. If the average payload score is zero, the value of the explanation is 0: result.setValue(nonPayloadExpl.getValue() * avgPayloadScore); If the query is part of a BooleanClause, this results in: no match on required clause... failure to meet condition(s) of required/prohibited clause(s) The average payload score can be zero if the field boost = 0.
I've attached a patch to 'TestBoostingTermQuery.java'; however, the test 'testNoPayload' fails in 'SpanScorer.score()' because the doc = -1. It looks like 'setFreqCurrentDoc()' should have been called before 'score()'. Maybe someone more knowledgeable about spans could investigate this.
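The NaN described in item 1 of LUCENE-991 is ordinary floating-point behaviour: dividing 0.0f by zero does not throw, it yields NaN. A minimal standalone sketch follows; the class `AvgPayloadSketch` and its method names are hypothetical, and only the guarded expression mirrors the one suggested in the issue.

```java
// Hypothetical helper for illustration; not part of BoostingTermQuery.
final class AvgPayloadSketch {
    // Unguarded average: 0f / 0 evaluates to NaN when no payloads were seen.
    static float unguarded(float payloadScore, int payloadsSeen) {
        return payloadScore / payloadsSeen;
    }

    // Guarded average: falls back to a neutral factor of 1 when nothing was seen.
    static float guarded(float payloadScore, int payloadsSeen) {
        return payloadsSeen > 0 ? (payloadScore / payloadsSeen) : 1;
    }
}
```

Because NaN propagates through every later multiplication, the guard has to sit at the division itself rather than further down the scoring chain.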
[jira] Commented: (LUCENE-693) ConjunctionScorer - more tuneup
[ http://issues.apache.org/jira/browse/LUCENE-693?page=comments#action_12444317 ]

Peter Keegan commented on LUCENE-693:

Yonik,

I tried out your patch, but it causes an exception on some boolean queries. This one occurred on a boolean query with 3 required terms:

    java.lang.ArrayIndexOutOfBoundsException: 2147483647
        at org.apache.lucene.search.TermScorer.score(TermScorer.java:129)
        at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:97)
        at org.apache.lucene.search.BooleanScorer2$2.score(BooleanScorer2.java:186)
        at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:318)
        at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:282)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
        at org.apache.lucene.search.Searcher.search(Searcher.java:116)
        at org.apache.lucene.search.Searcher.search(Searcher.java:95)

It looks like the doc id has the sentinel value (Integer.MAX_VALUE). Note: one of the terms had no occurrences in the index.

Peter

ConjunctionScorer - more tuneup
---
Key: LUCENE-693
URL: http://issues.apache.org/jira/browse/LUCENE-693
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 2.1
Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
Attachments: conjunction.patch

--
This message is automatically generated by JIRA.
- If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
- For more information on JIRA, see: http://www.atlassian.com/software/jira
- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
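The failure mode suggested by the stack trace in the comment above can be illustrated with a toy scorer. This is a sketch under stated assumptions, not Lucene's TermScorer: `SentinelDocSketch`, `ToyScorer`, and the helper methods are hypothetical names, and the sketch assumes (as the comment suggests) that an exhausted scorer parks its doc id at Integer.MAX_VALUE and that scoring indexes a per-document array by that id.

```java
// Illustrative sketch only: a toy scorer, not Lucene's TermScorer.
final class SentinelDocSketch {

    static final class ToyScorer {
        private final int[] docs;     // matching doc ids, ascending
        private final float[] norms;  // one entry per document in the index
        private int pos = -1;
        private int doc = -1;

        ToyScorer(int[] docs, float[] norms) {
            this.docs = docs;
            this.norms = norms;
        }

        // Advances to the next match; on exhaustion, parks doc at a sentinel value.
        boolean next() {
            pos++;
            if (pos < docs.length) {
                doc = docs[pos];
                return true;
            }
            doc = Integer.MAX_VALUE;
            return false;
        }

        // Indexes norms by the current doc id, so it must not be called after exhaustion.
        float score() {
            return norms[doc];
        }
    }

    // Returns true if scoring a scorer for a term with no postings
    // fails with an out-of-bounds index.
    static boolean scoringExhaustedScorerFails() {
        ToyScorer missingTerm = new ToyScorer(new int[0], new float[] {1f, 1f, 1f});
        missingTerm.next(); // no postings, so this returns false and sets the sentinel
        try {
            missingTerm.score();
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    // A conjunction can bail out as soon as any required scorer reports no match.
    static boolean hasMatch(int[] docs) {
        return new ToyScorer(docs, new float[] {1f, 1f, 1f}).next();
    }
}
```

In this toy model the fix is simply to honor the return value of `next()` before scoring. Whether the patch under discussion fails for exactly this reason is not confirmed by the thread.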
[jira] Commented: (LUCENE-693) ConjunctionScorer - more tuneup
[ http://issues.apache.org/jira/browse/LUCENE-693?page=comments#action_1208 ]

Peter Keegan commented on LUCENE-693:

Well, I'm seeing a good 7% increase over the trunk version. Conjunction scorer time is mostly in 'skipTo' now, which seems reasonable.

Do the test cases try queries with non-existent terms? My failed query contained 3 required terms, but one of the terms was misspelled and didn't exist in the index.

Peter

ConjunctionScorer - more tuneup
---
Key: LUCENE-693
URL: http://issues.apache.org/jira/browse/LUCENE-693
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 2.1
Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
Attachments: conjunction.patch, conjunction.patch
[jira] Commented: (LUCENE-693) ConjunctionScorer - more tuneup
[ http://issues.apache.org/jira/browse/LUCENE-693?page=comments#action_1236 ]

Peter Keegan commented on LUCENE-693:

fwiw, my tests were done using 'real world' queries and index. Most queries have several required clauses. The jvm is 1.6 beta2 with -server. I would be interested to see results from others, too.

thanks Yonik!

Peter

ConjunctionScorer - more tuneup
---
Key: LUCENE-693
URL: http://issues.apache.org/jira/browse/LUCENE-693
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 2.1
Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
Attachments: conjunction.patch, conjunction.patch, conjunction.patch
[jira] Created: (LUCENE-693) ConjunctionScorer - more tuneup
ConjunctionScorer - more tuneup
---
Key: LUCENE-693
URL: http://issues.apache.org/jira/browse/LUCENE-693
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 2.1
Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan

(See also: #LUCENE-443)

I did some profile testing with the new ConjunctionScorer in 2.1 and discovered a new bottleneck in ConjunctionScorer.sortScorers. The java.util.Arrays.sort method is cloning the Scorers array on every sort, which is quite expensive on large indexes because of the size of the 'norms' array within, and isn't necessary.

Here is one possible solution:

    private void sortScorers() {
      // squeeze the array down for the sort
      //if (length != scorers.length) {
      //  Scorer[] temps = new Scorer[length];
      //  System.arraycopy(scorers, 0, temps, 0, length);
      //  scorers = temps;
      //}
      insertionSort( scorers,length );
      // note that this comparator is not consistent with equals!
      //Arrays.sort(scorers, new Comparator() {  // sort the array
      //  public int compare(Object o1, Object o2) {
      //    return ((Scorer)o1).doc() - ((Scorer)o2).doc();
      //  }
      //});
      first = 0;
      last = length - 1;
    }

    private void insertionSort( Scorer[] scores, int len) {
      for (int i=0; i<len; i++) {
        for (int j=i; j>0 && scores[j-1].doc() > scores[j].doc(); j-- ) {
          swap (scores, j, j-1);
        }
      }
      return;
    }

    private void swap(Object[] x, int a, int b) {
      Object t = x[a];
      x[a] = x[b];
      x[b] = t;
    }

The squeezing of the array is no longer needed. We also initialized the Scorers array with a size of 8 (instead of 2) to avoid having to grow the array for common queries, although this probably has less performance impact.

This change added about 3% to query throughput in my testing.

Peter
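To try the proposed sort in isolation, the following self-contained sketch may help. It is not the attached patch: `ScorerInsertionSortSketch`, `ToyScorer`, and `sortedDocs` are hypothetical names, with `ToyScorer` standing in for Lucene's Scorer. It sorts only the first `len` slots of the array in place, which is why the array no longer has to be squeezed down or copied before sorting.

```java
// Illustrative sketch only: ToyScorer stands in for Lucene's Scorer.
final class ScorerInsertionSortSketch {

    static final class ToyScorer {
        private final int doc;

        ToyScorer(int doc) {
            this.doc = doc;
        }

        int doc() {
            return doc;
        }
    }

    // Sorts the first len entries in place by ascending doc id.
    // Slots at index len and beyond are never read or moved.
    static void insertionSort(ToyScorer[] scorers, int len) {
        for (int i = 1; i < len; i++) {
            for (int j = i; j > 0 && scorers[j - 1].doc() > scorers[j].doc(); j--) {
                ToyScorer t = scorers[j];
                scorers[j] = scorers[j - 1];
                scorers[j - 1] = t;
            }
        }
    }

    // Convenience for experimenting: wraps doc ids in scorers,
    // sorts the first len of them, and returns the resulting doc ids.
    static int[] sortedDocs(int[] docs, int len) {
        ToyScorer[] scorers = new ToyScorer[docs.length];
        for (int i = 0; i < docs.length; i++) {
            scorers[i] = new ToyScorer(docs[i]);
        }
        insertionSort(scorers, len);
        int[] out = new int[docs.length];
        for (int i = 0; i < docs.length; i++) {
            out[i] = scorers[i].doc();
        }
        return out;
    }
}
```

Insertion sort is quadratic in the worst case, so this trade only makes sense because a conjunction typically holds a handful of scorers. The 3% figure in the report comes from the reporter's own index and queries and is not reproduced by this sketch.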