[jira] Created: (LUCENE-2272) PayloadNearQuery has hardwired explanation for 'AveragePayloadFunction'

2010-02-19 Thread Peter Keegan (JIRA)
PayloadNearQuery has hardwired explanation for 'AveragePayloadFunction'
---

 Key: LUCENE-2272
 URL: https://issues.apache.org/jira/browse/LUCENE-2272
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Reporter: Peter Keegan
 Attachments: payloadfunctin-patch.txt

The 'explain' method in PayloadNearSpanScorer assumes that 
AveragePayloadFunction was used. This patch adds an 'explain' method to the 
'PayloadFunction' interface so that the Scorer can delegate to it. It also adds 
unit tests for 'explain' and for {Min,Max}PayloadFunction.
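A minimal, self-contained sketch of the idea: each payload function explains its own aggregation, and the scorer only delegates. PayloadFn, AveragePayloadFn, MinPayloadFn, MaxPayloadFn and PayloadScorerSketch are hypothetical, simplified stand-ins, not Lucene's PayloadFunction API or the attached patch, and the explanation is a plain string rather than a Lucene Explanation.

```java
import java.util.List;

// Hypothetical, simplified stand-in for a payload function (not Lucene's API).
interface PayloadFn {
    // Fold one payload into the running score; seen = payloads folded so far.
    float currentScore(int seen, float currentScore, float payload);

    // Final payload score for the document.
    float docScore(int seen, float payloadScore);

    // Each function explains its own aggregation, so the scorer does not
    // have to assume the average.
    default String explain(int seen, float payloadScore) {
        return getClass().getSimpleName() + ".docScore() = " + docScore(seen, payloadScore);
    }
}

class AveragePayloadFn implements PayloadFn {
    public float currentScore(int seen, float current, float payload) { return current + payload; }
    public float docScore(int seen, float score) { return seen > 0 ? score / seen : 1f; }
}

class MinPayloadFn implements PayloadFn {
    public float currentScore(int seen, float current, float payload) {
        return seen == 0 ? payload : Math.min(current, payload);
    }
    public float docScore(int seen, float score) { return seen > 0 ? score : 1f; }
}

class MaxPayloadFn implements PayloadFn {
    public float currentScore(int seen, float current, float payload) {
        return seen == 0 ? payload : Math.max(current, payload);
    }
    public float docScore(int seen, float score) { return seen > 0 ? score : 1f; }
}

class PayloadScorerSketch {
    // Plays the scorer's role: aggregate the payloads, then delegate explain().
    static String explainDoc(PayloadFn fn, List<Float> payloads) {
        float score = 0f;
        int seen = 0;
        for (float p : payloads) {
            score = fn.currentScore(seen, score, p);
            seen++;
        }
        return fn.explain(seen, score);
    }
}
```

With payloads 2, 4 and 6 the three functions explain scores of 4.0, 2.0 and 6.0 respectively, while the scorer code stays the same in every case.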

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2272) PayloadNearQuery has hardwired explanation for 'AveragePayloadFunction'

2010-02-19 Thread Peter Keegan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Keegan updated LUCENE-2272:
-

Attachment: payloadfunctin-patch.txt

This patch adds an 'explain' method to the 'PayloadFunction' interface so that 
the Scorer can delegate to it. It also adds unit tests for 'explain' and for 
{Min,Max}PayloadFunction.

 PayloadNearQuery has hardwired explanation for 'AveragePayloadFunction'
 ---

 Key: LUCENE-2272
 URL: https://issues.apache.org/jira/browse/LUCENE-2272
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Reporter: Peter Keegan
 Attachments: payloadfunctin-patch.txt



[jira] Commented: (LUCENE-1986) NPE in NearSpansUnordered from PayloadNearQuery

2009-10-19 Thread Peter Keegan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767373#action_12767373
 ] 

Peter Keegan commented on LUCENE-1986:
--

+      if (!more) {
+        return false;
+      }
I was about to submit this same patch today, but I see you beat me to it :) 
Thanks Mark.
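The guard in the patch follows a common pattern for span and postings iterators: once a sub-iterator reports that it is exhausted, return immediately instead of reading its current position. Below is a minimal, self-contained Java sketch of that pattern using a hypothetical two-cursor intersection (DocCursor and IntersectionSketch are illustrative names, not Lucene classes). Without the `if (!more)` check, the final call would read past the end of an exhausted cursor, analogous to the failure on the last scored document.

```java
// Hypothetical postings cursor (not a Lucene class).
class DocCursor {
    private final int[] docs; // ascending doc ids
    private int i = -1;

    DocCursor(int... docs) { this.docs = docs; }

    boolean next() { return ++i < docs.length; }

    // Only valid after next() returned true; calling it afterwards fails,
    // much like dereferencing an exhausted sub-span.
    int doc() { return docs[i]; }
}

// Hypothetical two-cursor intersection illustrating the early-return guard.
class IntersectionSketch {
    private final DocCursor a, b;

    IntersectionSketch(DocCursor a, DocCursor b) { this.a = a; this.b = b; }

    // Returns the next doc contained in both cursors, or -1 when done.
    int nextCommonDoc() {
        boolean more = a.next() && b.next();
        while (true) {
            if (!more) {
                return -1; // the guard: never call doc() on an exhausted cursor
            }
            int da = a.doc(), db = b.doc();
            if (da == db) {
                return da;
            }
            // Advance whichever cursor is behind.
            more = (da < db) ? a.next() : b.next();
        }
    }
}
```

For cursors over {1, 3, 5, 7} and {3, 4, 7, 9}, successive calls return 3, 7 and then -1, and further calls keep returning -1 instead of throwing.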

 NPE in NearSpansUnordered from PayloadNearQuery
 ---

 Key: LUCENE-1986
 URL: https://issues.apache.org/jira/browse/LUCENE-1986
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.9
Reporter: Peter Keegan
Assignee: Michael McCandless
 Fix For: 2.9.1, 3.0

 Attachments: LUCENE-1986.patch, LUCENE-1986.patch, 
 TestPayloadNearQuery1.java



[jira] Created: (LUCENE-1986) NPE in NearSpansUnordered from PayloadNearQuery

2009-10-16 Thread Peter Keegan (JIRA)
NPE in NearSpansUnordered from PayloadNearQuery
---

 Key: LUCENE-1986
 URL: https://issues.apache.org/jira/browse/LUCENE-1986
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.9
Reporter: Peter Keegan
 Attachments: TestPayloadNearQuery1.java

The following query causes an NPE in NearSpansUnordered and is reproducible 
with the attached unit test. The failure occurs on the last document scored.



[jira] Updated: (LUCENE-1986) NPE in NearSpansUnordered from PayloadNearQuery

2009-10-16 Thread Peter Keegan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Keegan updated LUCENE-1986:
-

Attachment: TestPayloadNearQuery1.java

Unit test that causes NPE

 NPE in NearSpansUnordered from PayloadNearQuery
 ---

 Key: LUCENE-1986
 URL: https://issues.apache.org/jira/browse/LUCENE-1986
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.9
Reporter: Peter Keegan
 Attachments: TestPayloadNearQuery1.java



[jira] Updated: (LUCENE-1341) BoostingNearQuery class (prototype)

2009-08-05 Thread Peter Keegan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Keegan updated LUCENE-1341:
-

Attachment: lucene-1341-new-2.patch

New version that works with current trunk (8/5/09)

 BoostingNearQuery class (prototype)
 ---

 Key: LUCENE-1341
 URL: https://issues.apache.org/jira/browse/LUCENE-1341
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Query/Scoring
Affects Versions: 2.3.1
Reporter: Peter Keegan
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.0

 Attachments: bnq.patch, bnq.patch, BoostingNearQuery.java, 
 BoostingNearQuery.java, lucene-1341-new-1.patch, lucene-1341-new-2.patch, 
 LUCENE-1341-new.patch, LUCENE-1341.patch



[jira] Updated: (LUCENE-1341) BoostingNearQuery class (prototype)

2009-04-23 Thread Peter Keegan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Keegan updated LUCENE-1341:
-

Attachment: lucene-1341-new-1.patch

As I was debugging a unit test for BoostingNearQuery, I discovered that not all 
the payloads were getting read. The 'needToLoadPayload' flag on the termpos was 
getting reset on the last term in the span by NearSpansOrdered. Then I noticed 
that the term positions aren't even needed in BNQ, because the payloads were 
already collected by the Spans in 'matchPayload'. So, here is a newer, simpler 
implementation of BNQ along with some unit tests.

Peter



 BoostingNearQuery class (prototype)
 ---

 Key: LUCENE-1341
 URL: https://issues.apache.org/jira/browse/LUCENE-1341
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Query/Scoring
Affects Versions: 2.3.1
Reporter: Peter Keegan
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.0

 Attachments: bnq.patch, bnq.patch, BoostingNearQuery.java, 
 BoostingNearQuery.java, lucene-1341-new-1.patch, LUCENE-1341-new.patch, 
 LUCENE-1341.patch



[jira] Updated: (LUCENE-1341) BoostingNearQuery class (prototype)

2009-02-11 Thread Peter Keegan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Keegan updated LUCENE-1341:
-

Attachment: LUCENE-1341-new.patch

Here is an updated patch for the 2.4 branch.
It's 6 months late because I missed Grant's e-mail asking me to retest it; I 
only recently went looking to see what had become of the original patch.

Peter


 BoostingNearQuery class (prototype)
 ---

 Key: LUCENE-1341
 URL: https://issues.apache.org/jira/browse/LUCENE-1341
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Query/Scoring
Affects Versions: 2.3.1
Reporter: Peter Keegan
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.0

 Attachments: bnq.patch, bnq.patch, BoostingNearQuery.java, 
 BoostingNearQuery.java, LUCENE-1341-new.patch, LUCENE-1341.patch



[jira] Updated: (LUCENE-1341) BoostingNearQuery class (prototype)

2008-07-21 Thread Peter Keegan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Keegan updated LUCENE-1341:
-

Attachment: BoostingNearQuery.java
bnq.patch

Here is a version of the patch for Java 1.4.

 BoostingNearQuery class (prototype)
 ---

 Key: LUCENE-1341
 URL: https://issues.apache.org/jira/browse/LUCENE-1341
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Query/Scoring
Affects Versions: 2.3.1
Reporter: Peter Keegan
Priority: Minor
 Fix For: 2.3.2

 Attachments: bnq.patch, bnq.patch, BoostingNearQuery.java, 
 BoostingNearQuery.java



[jira] Created: (LUCENE-1341) BoostingNearQuery class (prototype)

2008-07-19 Thread Peter Keegan (JIRA)
BoostingNearQuery class (prototype)
---

 Key: LUCENE-1341
 URL: https://issues.apache.org/jira/browse/LUCENE-1341
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Query/Scoring
Affects Versions: 2.3.1
Reporter: Peter Keegan
Priority: Minor
 Fix For: 2.3.2


This patch implements term boosting for SpanNearQuery. Refer to: 
http://www.gossamer-threads.com/lists/lucene/java-user/62779

This patch works but probably needs more work. I don't like the use of 
'instanceof', but I didn't want to touch Spans or TermSpans. Also, the payload 
code is mostly a copy of what's in BoostingTermQuery and could be 
common-sourced somewhere. Feel free to throw darts at it :)
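A rough sketch of the scoring idea, not the attached patch: scale the score of a near match by the average payload of the terms in it. It assumes each payload stores one float as 4 big-endian bytes; NearPayloadBoostSketch and its methods are hypothetical names, and no Lucene classes are used.

```java
import java.nio.ByteBuffer;
import java.util.List;

class NearPayloadBoostSketch {
    // Assumption: each payload holds one float as 4 big-endian bytes.
    static byte[] encodeFloat(float f) {
        return ByteBuffer.allocate(4).putFloat(f).array();
    }

    static float decodeFloat(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    // Scale the span score by the average payload of the terms in the match;
    // with no payloads, leave the score unchanged (no division by zero).
    static float boostedScore(float spanScore, List<byte[]> termPayloads) {
        if (termPayloads.isEmpty()) {
            return spanScore;
        }
        float sum = 0f;
        for (byte[] p : termPayloads) {
            sum += decodeFloat(p);
        }
        return spanScore * (sum / termPayloads.size());
    }
}
```

For a span score of 2.0 and term payloads 1.0 and 3.0, the boosted score is 2.0 * 2.0 = 4.0. Averaging is only one choice; a minimum or maximum over the payloads would slot into the same loop.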



[jira] Updated: (LUCENE-1341) BoostingNearQuery class (prototype)

2008-07-19 Thread Peter Keegan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Keegan updated LUCENE-1341:
-

Attachment: BoostingNearQuery.java
bnq.patch

 BoostingNearQuery class (prototype)
 ---

 Key: LUCENE-1341
 URL: https://issues.apache.org/jira/browse/LUCENE-1341
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Query/Scoring
Affects Versions: 2.3.1
Reporter: Peter Keegan
Priority: Minor
 Fix For: 2.3.2

 Attachments: bnq.patch, BoostingNearQuery.java



[jira] Commented: (LUCENE-1341) BoostingNearQuery class (prototype)

2008-07-19 Thread Peter Keegan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615004#action_12615004
 ] 

Peter Keegan commented on LUCENE-1341:
--

Note that this patch requires Java 1.5 or later (it is easily modified to run on 1.4).

 BoostingNearQuery class (prototype)
 ---

 Key: LUCENE-1341
 URL: https://issues.apache.org/jira/browse/LUCENE-1341
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Query/Scoring
Affects Versions: 2.3.1
Reporter: Peter Keegan
Priority: Minor
 Fix For: 2.3.2

 Attachments: bnq.patch, BoostingNearQuery.java



[jira] Created: (LUCENE-1093) SpanFirstQuery modification to aid term boosting based on position.

2007-12-18 Thread Peter Keegan (JIRA)
SpanFirstQuery modification to aid term boosting based on position.
---

 Key: LUCENE-1093
 URL: https://issues.apache.org/jira/browse/LUCENE-1093
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Query/Scoring
Reporter: Peter Keegan


This is a request for a modification to SpanFirstQuery that would allow term 
boosting based on a term's distance from the beginning of the document.

The modification would be for the Spans returned by SpanFirstQuery.getSpans() 
to always return 0 from their start() method. The slop passed to 
sloppyFreq(slop) would then be the distance from the beginning of the indexed 
field to the end of the Spans of the SpanQuery passed to SpanFirstQuery.

Here is the discussion behind this issue:
http://www.nabble.com/Can-I-do-boosting-based-on-term-postions--to11939423.html#a11939423
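To see why a start() of 0 would produce a position-based boost: as I understand Lucene 2.x, SpanScorer adds getSimilarity().sloppyFreq(end - start) for each match in a document, and DefaultSimilarity.sloppyFreq(distance) returns 1 / (distance + 1). The plain-Java sketch below reimplements only that formula (it does not call Lucene; PositionBoostSketch and positionFreq are hypothetical names) to show that, with start() forced to 0, each match is weighted by how far from the start of the field it ends.

```java
class PositionBoostSketch {
    // Same formula as DefaultSimilarity.sloppyFreq(int) in Lucene 2.x.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Sum of sloppyFreq over one document's matches, using (end - start) as
    // SpanScorer does. Under the proposal start() is always 0, so each match
    // contributes sloppyFreq(end): the later a match ends, the smaller its weight.
    static float positionFreq(int[] matchEnds) {
        float freq = 0f;
        for (int end : matchEnds) {
            int start = 0; // proposed: SpanFirstQuery spans always start at 0
            freq += sloppyFreq(end - start);
        }
        return freq;
    }
}
```

For a single-term span, end() is the term's position plus one, so a term at position 0 contributes 0.5 and a term at position 8 contributes 0.1.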


[jira] Created: (LUCENE-1088) PriorityQueue 'wouldBeInserted' method

2007-12-11 Thread Peter Keegan (JIRA)
PriorityQueue 'wouldBeInserted' method
--

 Key: LUCENE-1088
 URL: https://issues.apache.org/jira/browse/LUCENE-1088
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Other
Reporter: Peter Keegan


This is a request for a new method in PriorityQueue:

    // Returns true if the element would be inserted, without inserting it.
    public boolean wouldBeInserted(Object element)

This would allow an application to prevent duplicate entries from being added 
to the queue.
Here is a reference to the discussion behind this request:

http://www.nabble.com/FieldSortedHitQueue-enhancement-to9733550.html#a9733550
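A minimal sketch of the requested check, assuming a bounded queue that keeps the best maxSize elements with the weakest one at the head (as a full hit queue does). BoundedTopQueue is a hypothetical class built on java.util.PriorityQueue, not Lucene's PriorityQueue, and it treats an element that ties with the current weakest as not insertable.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical bounded top-N queue (not Lucene's PriorityQueue).
class BoundedTopQueue<T> {
    private final int maxSize;
    private final Comparator<? super T> order;
    private final PriorityQueue<T> heap; // head = weakest element kept

    BoundedTopQueue(int maxSize, Comparator<? super T> order) {
        this.maxSize = maxSize;
        this.order = order;
        this.heap = new PriorityQueue<>(Math.max(1, maxSize), order);
    }

    // True if insert(element) would keep the element; does not modify the queue.
    boolean wouldBeInserted(T element) {
        if (heap.size() < maxSize) {
            return true;
        }
        return maxSize > 0 && order.compare(element, heap.peek()) > 0;
    }

    // Inserts the element if it qualifies, evicting the weakest when full.
    boolean insert(T element) {
        if (!wouldBeInserted(element)) {
            return false;
        }
        if (heap.size() == maxSize) {
            heap.poll();
        }
        heap.offer(element);
        return true;
    }

    int size() {
        return heap.size();
    }
}
```

An application can call wouldBeInserted() first and run its (possibly expensive) duplicate check only for candidates that would actually enter the queue.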




[jira] Commented: (LUCENE-1017) BoostingTermQuery performance

2007-12-10 Thread Peter Keegan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550203
 ] 

Peter Keegan commented on LUCENE-1017:
--

Grant,

Unfortunately, my performance test bed isn't suitable for contrib/benchmark 
because it's designed to simulate real queries from our log files. These are 
multi-threaded queries sent at a very high rate to stress-test the Lucene 
server, which runs on an 8-CPU system.

Given the somewhat dynamic nature of the test bed, I don't think the 5% 
performance increase that I reported is statistically significant. You're 
probably right that skipTo is not any faster. I still think it would be nice 
to have a BoostingTermQuery that extends TermQuery, though.

Peter


 BoostingTermQuery performance
 -

 Key: LUCENE-1017
 URL: https://issues.apache.org/jira/browse/LUCENE-1017
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.2
 Environment: all
Reporter: Peter Keegan
Priority: Minor
 Attachments: BoostingTermQuery.java, BoostingTermQuery.patch, 
 termquery.patch



[jira] Commented: (LUCENE-1017) BoostingTermQuery performance

2007-12-10 Thread Peter Keegan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12550238
 ] 

Peter Keegan commented on LUCENE-1017:
--

 What's the use case? Is there something that isn't possible with it as is?

I would say no, if we can conclude that there is no significant difference in 
performance between the 2 implementations.

As a developer, when I see queries based on SpanQuery or PhraseQuery, I take 
it as a tip that there is a potential performance impact. On the plus side, 
renaming the current implementation to 'BoostingSpanQuery' might also give the 
developer a better hint about the methods of its superclass.

Would you expect the cost of traversing the postings to be higher than the cost 
of reading the payload?

Peter


 BoostingTermQuery performance
 -

 Key: LUCENE-1017
 URL: https://issues.apache.org/jira/browse/LUCENE-1017
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.2
 Environment: all
Reporter: Peter Keegan
Priority: Minor
 Attachments: BoostingTermQuery.java, BoostingTermQuery.patch, 
 termquery.patch



[jira] Updated: (LUCENE-1017) BoostingTermQuery performance

2007-10-04 Thread Peter Keegan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Keegan updated LUCENE-1017:
-

Attachment: BoostingTermQuery.patch

Here is a new version that, I believe, traverses the term positions properly. 
It's only about 5% faster than the spans version, though, and it could quite 
possibly be improved. I'm using TermScorer to get to the current doc and then 
setting TermPositions to the same doc. Do you see any inefficiencies in the 
'next' and 'skipTo' calls, particularly in 'getPayloads'?
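The alignment described here (let the scorer pick the document, then move a positions cursor to that same document) can be sketched with a hypothetical cursor over pre-decoded payload values. PositionsCursor and payloadsFor are illustrative names, not Lucene classes. The one point the sketch makes is that skipTo, which (roughly mirroring the documented TermDocs.skipTo contract) always moves past the current entry, should only be called when the cursor is behind the scorer's document.

```java
// Hypothetical positions cursor; payload values are pre-decoded ints.
class PositionsCursor {
    private final int[] docs;        // ascending doc ids containing the term
    private final int[][] payloads;  // payloads[k] = payloads for docs[k]
    private int i = -1;
    int skipCalls = 0;               // counts skipTo() calls, for illustration

    PositionsCursor(int[] docs, int[][] payloads) {
        this.docs = docs;
        this.payloads = payloads;
    }

    // Current doc: -1 before the first move, MAX_VALUE once exhausted.
    int doc() {
        if (i < 0) return -1;
        return i < docs.length ? docs[i] : Integer.MAX_VALUE;
    }

    // Always advances, stopping at the first doc >= target.
    // Linear scan here; a real index would use skip data.
    boolean skipTo(int target) {
        skipCalls++;
        do {
            i++;
        } while (i < docs.length && docs[i] < target);
        return i < docs.length;
    }

    // Payloads for the scorer's current doc; skips only when the cursor is behind,
    // because calling skipTo while already on scorerDoc would move past it.
    int[] payloadsFor(int scorerDoc) {
        if (doc() < scorerDoc) {
            skipTo(scorerDoc);
        }
        return doc() == scorerDoc ? payloads[i] : new int[0];
    }
}
```

Asking twice for the same document triggers only one skipTo, and documents the scorer never visits are never examined for payloads.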

 BoostingTermQuery performance
 -

 Key: LUCENE-1017
 URL: https://issues.apache.org/jira/browse/LUCENE-1017
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.2
 Environment: all
Reporter: Peter Keegan
 Attachments: BoostingTermQuery.java, BoostingTermQuery.patch, 
 termquery.patch



[jira] Commented: (LUCENE-1017) BoostingTermQuery performance

2007-10-03 Thread Peter Keegan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532094
 ] 

Peter Keegan commented on LUCENE-1017:
--

Grant,

You are right about it not handling multiple payloads per document. In fact, I 
don't think it's even reading the payloads for all the hits. I'll see what I 
can do, but feel free to step in and help me get this right. I won't touch svn 
for now.

Peter

 BoostingTermQuery performance
 -

 Key: LUCENE-1017
 URL: https://issues.apache.org/jira/browse/LUCENE-1017
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.2
 Environment: all
Reporter: Peter Keegan
 Attachments: BoostingTermQuery.java, termquery.patch



[jira] Created: (LUCENE-1017) BoostingTermQuery performance

2007-10-02 Thread Peter Keegan (JIRA)
BoostingTermQuery performance
-

 Key: LUCENE-1017
 URL: https://issues.apache.org/jira/browse/LUCENE-1017
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.2
 Environment: all
Reporter: Peter Keegan


I have been experimenting with payloads and BoostingTermQuery, which I think 
are excellent additions to Lucene core. Currently, BoostingTermQuery extends 
SpanQuery. I would suggest changing this class to extend TermQuery and 
refactoring the current version into something like 'BoostingSpanQuery'.

The reason is rooted in performance. In my testing, I compared query throughput 
using TermQuery against 2 versions of BoostingTermQuery - the current one that 
extends SpanQuery and one that extends TermQuery (which I've included, below). 
Here are the results (qps = queries per second):

TermQuery: 200 qps
BoostingTermQuery (extends SpanQuery): 97 qps
BoostingTermQuery (extends TermQuery): 130 qps
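For reference, the relative throughputs implied by these figures (plain ratios of the reported qps, no new measurements) can be computed as follows; ThroughputRatios is just an illustrative helper.

```java
import java.util.Locale;

// Relative throughput computed from the reported qps figures; no new data.
class ThroughputRatios {
    static double percentOf(double qps, double baselineQps) {
        return 100.0 * qps / baselineQps;
    }

    public static void main(String[] args) {
        // prints "SpanQuery-based BTQ vs TermQuery: 48.5%"
        System.out.printf(Locale.ROOT, "SpanQuery-based BTQ vs TermQuery: %.1f%%%n", percentOf(97, 200));
        // prints "TermQuery-based BTQ vs TermQuery: 65.0%"
        System.out.printf(Locale.ROOT, "TermQuery-based BTQ vs TermQuery: %.1f%%%n", percentOf(130, 200));
        // prints "TermQuery-based vs SpanQuery-based BTQ: 134.0%"
        System.out.printf(Locale.ROOT, "TermQuery-based vs SpanQuery-based BTQ: %.1f%%%n", percentOf(130, 97));
    }
}
```

In other words, the TermQuery-based variant gets roughly a third more throughput than the SpanQuery-based one, but it is still about 35% below plain TermQuery.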

Here is a version of BoostingTermQuery that extends TermQuery. I had to modify 
TermQuery and TermScorer to make them public. A code review would be in order, 
and I would appreciate your comments on this suggestion.

Peter


[jira] Updated: (LUCENE-1017) BoostingTermQuery performance

2007-10-02 Thread Peter Keegan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Keegan updated LUCENE-1017:
-

Attachment: BoostingTermQuery.java

Suggested change to BoostingTermQuery to extend TermQuery

 BoostingTermQuery performance
 -

 Key: LUCENE-1017
 URL: https://issues.apache.org/jira/browse/LUCENE-1017
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.2
 Environment: all
Reporter: Peter Keegan
 Attachments: BoostingTermQuery.java



[jira] Updated: (LUCENE-1017) BoostingTermQuery performance

2007-10-02 Thread Peter Keegan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Keegan updated LUCENE-1017:
-

Attachment: termquery.patch

Changes to TermQuery and TermScorer for previous patch

 BoostingTermQuery performance
 -

 Key: LUCENE-1017
 URL: https://issues.apache.org/jira/browse/LUCENE-1017
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.2
 Environment: all
Reporter: Peter Keegan
 Attachments: BoostingTermQuery.java, termquery.patch



[jira] Commented: (LUCENE-991) BoostingTermQuery.explain() bugs

2007-09-07 Thread Peter Keegan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525709
 ] 

Peter Keegan commented on LUCENE-991:
-



Hi Grant,

 TopDocs hits = searcher.search(query, null, 100);
 assertTrue("hits Size: " + hits.totalHits + " is not: " + 0,
     hits.totalHits == 0);

TopDocCollector discards hits with score = 0, so that's not a fair comparison. 
If you do a similar test with TermQuery (with a field boost = 0) instead of 
BoostingTermQuery, you'll see the difference. Even terms with 0 weight are 
included in the explanation. Make sense?
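To make the point about zero scores concrete, here is a plain-Java illustration rather than Lucene code. The class and method names are hypothetical, and the "score > 0" filter models the TopDocCollector behaviour described above. A document whose score is 0 (e.g. because of a field boost of 0) still matches, but a collector that drops non-positive scores never counts it, so a "totalHits == 0" assertion cannot tell "no match" apart from "matched with score 0".

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration (no Lucene classes); names are hypothetical.
public class ZeroScoreCollectors {

    /**
     * Collects matching doc ids. When dropNonPositive is true, hits whose score
     * is not greater than 0 are skipped, modeling the filtering described above.
     */
    static List<Integer> collect(int[] docs, float[] scores, boolean dropNonPositive) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < docs.length; i++) {
            if (dropNonPositive && !(scores[i] > 0.0f)) {
                continue; // a zero-scored match is silently discarded
            }
            kept.add(docs[i]);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical matches: doc 7 matches, but a field boost of 0 zeroes its score.
        int[] docs = {3, 7};
        float[] scores = {0.8f, 0.0f};
        System.out.println(collect(docs, scores, true));  // [3]
        System.out.println(collect(docs, scores, false)); // [3, 7]
    }
}
```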

Peter



 BoostingTermQuery.explain() bugs
 

 Key: LUCENE-991
 URL: https://issues.apache.org/jira/browse/LUCENE-991
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.2
Reporter: Peter Keegan
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: TestBoostingTermQuery.patch, 
 TestBoostingTermQuery2.patch, TestBoostingTermQuery3.patch


 There are a couple of minor bugs in BoostingTermQuery.explain().
 1. The computation of average payload score produces NaN if no payloads were 
 found. It should probably be:
 float avgPayloadScore = super.score() * (payloadsSeen > 0 ? (payloadScore / 
 payloadsSeen) : 1);
 2. If the average payload score is zero, the value of the explanation is 0:
 result.setValue(nonPayloadExpl.getValue() * avgPayloadScore);
 If the query is part of a BooleanClause, this results in:
 no match on required clause...
 failure to meet condition(s) of required/prohibited clause(s)
 The average payload score can be zero if the field boost = 0.
 I've attached a patch to 'TestBoostingTermQuery.java'; however, the test 
 'testNoPayload' fails in 'SpanScorer.score()' because doc is -1. It looks 
 like 'setFreqCurrentDoc()' should have been called before 'score()'. Maybe 
 someone more knowledgeable about spans could investigate this.




[jira] Created: (LUCENE-991) BoostingTermQuery.explain() bugs

2007-08-31 Thread Peter Keegan (JIRA)
BoostingTermQuery.explain() bugs


 Key: LUCENE-991
 URL: https://issues.apache.org/jira/browse/LUCENE-991
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.2
Reporter: Peter Keegan
Priority: Minor
 Attachments: TestBoostingTermQuery.patch

There are a couple of minor bugs in BoostingTermQuery.explain().

1. The computation of average payload score produces NaN if no payloads were 
found. It should probably be:
float avgPayloadScore = super.score() * (payloadsSeen > 0 ? (payloadScore / 
payloadsSeen) : 1);

2. If the average payload score is zero, the value of the explanation is 0:
result.setValue(nonPayloadExpl.getValue() * avgPayloadScore);
If the query is part of a BooleanClause, this results in:
no match on required clause...
failure to meet condition(s) of required/prohibited clause(s)

The average payload score can be zero if the field boost = 0.
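Both issues can be reproduced with plain floats. The sketch below is not the actual BoostingTermQuery code: "baseScore" stands in for super.score() and "nonPayloadValue" for nonPayloadExpl.getValue(), and both names are assumptions. It shows that 0f / 0 yields NaN, that the guard proposed in item 1 falls back to a factor of 1, and that a zero average payload score (as with a field boost of 0) forces the explanation value to 0.

```java
// Plain-float sketch of the two explain() issues; not Lucene code.
// baseScore ~ super.score(), nonPayloadValue ~ nonPayloadExpl.getValue() (hypothetical names).
public class PayloadExplainSketch {

    /** Unguarded average: 0f / 0 is NaN when no payloads were seen. */
    static float unguardedAverage(float baseScore, float payloadScore, int payloadsSeen) {
        return baseScore * (payloadScore / payloadsSeen);
    }

    /** Guarded version from item 1: use a factor of 1 when no payloads were seen. */
    static float guardedAverage(float baseScore, float payloadScore, int payloadsSeen) {
        return baseScore * (payloadsSeen > 0 ? (payloadScore / payloadsSeen) : 1);
    }

    public static void main(String[] args) {
        System.out.println(unguardedAverage(2.0f, 0.0f, 0)); // NaN
        System.out.println(guardedAverage(2.0f, 0.0f, 0));   // 2.0

        // Item 2: with a field boost of 0 the base score is 0, so the average
        // payload score is 0 and the explanation value becomes 0 as well.
        float nonPayloadValue = 1.5f;
        float avgPayloadScore = guardedAverage(0.0f, 0.0f, 3);
        System.out.println(nonPayloadValue * avgPayloadScore); // 0.0
    }
}
```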

I've attached a patch to 'TestBoostingTermQuery.java'; however, the test 
'testNoPayload' fails in 'SpanScorer.score()' because doc is -1. It looks 
like 'setFreqCurrentDoc()' should have been called before 'score()'. Maybe 
someone more knowledgeable about spans could investigate this.





[jira] Commented: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-24 Thread Peter Keegan (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-693?page=comments#action_12444317 ] 

Peter Keegan commented on LUCENE-693:
-

Yonik,

I tried out your patch, but it causes an exception on some boolean queries.
This one occurred on a boolean query with 3 required terms:

java.lang.ArrayIndexOutOfBoundsException: 2147483647
    at org.apache.lucene.search.TermScorer.score(TermScorer.java:129)
    at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:97)
    at org.apache.lucene.search.BooleanScorer2$2.score(BooleanScorer2.java:186)
    at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:318)
    at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:282)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
    at org.apache.lucene.search.Searcher.search(Searcher.java:116)
    at org.apache.lucene.search.Searcher.search(Searcher.java:95)

It looks like the doc id has the sentinel value (Integer.MAX_VALUE).
Note: one of the terms had no occurrences in the index.
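One way to read this trace: once a required sub-scorer is exhausted its doc id is the sentinel Integer.MAX_VALUE, and scoring that "document" presumably indexes past the end of a per-document array. The following is a hypothetical, self-contained model of a conjunction over sorted doc-id lists, not Lucene's ConjunctionScorer; "DocIterator" and its skipTo() are stand-ins. It leapfrogs the iterators with skipTo() and stops as soon as any of them is exhausted, so a term with no occurrences (an empty list) simply yields no matches and no sentinel doc is ever scored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a conjunction (all terms required) over sorted doc-id
// lists. Not Lucene's ConjunctionScorer; names are illustrative only.
public class ConjunctionSketch {

    // Sentinel doc id reported by an exhausted iterator.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal stand-in for one term's postings, with a skipTo() method. */
    static final class DocIterator {
        private final int[] docs; // sorted ascending
        private int pos = -1;     // -1 means "not yet positioned"

        DocIterator(int... docs) {
            this.docs = docs;
        }

        /** Current doc id, or NO_MORE_DOCS once exhausted. */
        int doc() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Moves forward to the first doc >= target; false when exhausted. */
        boolean skipTo(int target) {
            if (pos < 0) pos = 0;
            while (pos < docs.length && docs[pos] < target) pos++;
            return pos < docs.length;
        }
    }

    /** Returns the doc ids contained in every iterator, in ascending order. */
    static List<Integer> intersect(DocIterator... its) {
        List<Integer> matches = new ArrayList<>();
        if (its.length == 0) return matches;
        int target = 0;
        while (true) {
            boolean allOnTarget = true;
            for (DocIterator it : its) {
                if (!it.skipTo(target)) {
                    // One required term is exhausted (or never had any docs):
                    // stop here instead of scoring the sentinel doc id.
                    return matches;
                }
                if (it.doc() > target) {
                    target = it.doc(); // leapfrog to the larger doc id
                    allOnTarget = false;
                }
            }
            if (allOnTarget) {
                matches.add(target); // a real match: safe to score every sub-scorer here
                target++;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(intersect(new DocIterator(1, 3, 5, 7),
                                     new DocIterator(3, 4, 5, 8),
                                     new DocIterator(0, 3, 5, 9))); // [3, 5]
        // A misspelled term with no occurrences gives an empty posting list.
        System.out.println(intersect(new DocIterator(1, 2),
                                     new DocIterator()));           // []
    }
}
```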

Peter



 ConjunctionScorer - more tuneup
 ---

 Key: LUCENE-693
 URL: http://issues.apache.org/jira/browse/LUCENE-693
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
 Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
 Attachments: conjunction.patch


 (See also: #LUCENE-443)
 I did some profile testing with the new ConjunctionScorer in 2.1 and 
 discovered a new bottleneck in ConjunctionScorer.sortScorers. The 
 java.util.Arrays.sort method is cloning the Scorers array on every sort, 
 which is quite expensive on large indexes because of the size of the 'norms' 
 array within, and isn't necessary. 
 Here is one possible solution:
   private void sortScorers() {
     // squeeze the array down for the sort
     //if (length != scorers.length) {
     //  Scorer[] temps = new Scorer[length];
     //  System.arraycopy(scorers, 0, temps, 0, length);
     //  scorers = temps;
     //}
     insertionSort(scorers, length);
     // note that this comparator is not consistent with equals!
     //Arrays.sort(scorers, new Comparator() { // sort the array
     //    public int compare(Object o1, Object o2) {
     //      return ((Scorer)o1).doc() - ((Scorer)o2).doc();
     //    }
     //  });
 
     first = 0;
     last = length - 1;
   }
 
   // Sorts the first 'len' scorers by current doc id, in place,
   // without allocating a temporary array.
   private void insertionSort(Scorer[] scores, int len) {
     for (int i = 0; i < len; i++) {
       for (int j = i; j > 0 && scores[j-1].doc() > scores[j].doc(); j--) {
         swap(scores, j, j-1);
       }
     }
   }
 
   private void swap(Object[] x, int a, int b) {
     Object t = x[a];
     x[a] = x[b];
     x[b] = t;
   }
  
 The squeezing of the array is no longer needed. 
 We also initialized the Scorers array to 8 (instead of 2) to avoid having to 
 grow the array for common queries, although this probably has less 
 performance impact.
 This change added about 3% to query throughput in my testing.
 Peter




[jira] Commented: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-24 Thread Peter Keegan (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-693?page=comments#action_1208 ] 

Peter Keegan commented on LUCENE-693:
-

Well, I'm seeing a good 7% increase over the trunk version. Conjunction
scorer time is mostly in 'skipto' now, which seems reasonable.

Do the test cases try queries with non-existent terms? My failed query
contained 3 required terms, but one of the terms was misspelled and didn't
exist in the index.

Peter



 ConjunctionScorer - more tuneup
 ---

 Key: LUCENE-693
 URL: http://issues.apache.org/jira/browse/LUCENE-693
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
 Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
 Attachments: conjunction.patch, conjunction.patch






[jira] Commented: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-24 Thread Peter Keegan (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-693?page=comments#action_1236 ] 

Peter Keegan commented on LUCENE-693:
-

fwiw, my tests were done using 'real world' queries and index. Most queries
have several required clauses. The jvm is 1.6 beta2 with -server. I would be
interested to see results from others, too.

thanks Yonik!

Peter



 ConjunctionScorer - more tuneup
 ---

 Key: LUCENE-693
 URL: http://issues.apache.org/jira/browse/LUCENE-693
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
 Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan
 Attachments: conjunction.patch, conjunction.patch, conjunction.patch






[jira] Created: (LUCENE-693) ConjunctionScorer - more tuneup

2006-10-23 Thread Peter Keegan (JIRA)
ConjunctionScorer - more tuneup
---

 Key: LUCENE-693
 URL: http://issues.apache.org/jira/browse/LUCENE-693
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.1
 Environment: Windows Server 2003 x64, Java 1.6, pretty large index
Reporter: Peter Keegan


(See also: #LUCENE-443)
I did some profile testing with the new ConjunctionScorer in 2.1 and discovered 
a new bottleneck in ConjunctionScorer.sortScorers. The java.util.Arrays.sort 
method is cloning the Scorers array on every sort, which is quite expensive on 
large indexes because of the size of the 'norms' array within, and isn't 
necessary. 

Here is one possible solution:

  private void sortScorers() {
    // squeeze the array down for the sort
    //if (length != scorers.length) {
    //  Scorer[] temps = new Scorer[length];
    //  System.arraycopy(scorers, 0, temps, 0, length);
    //  scorers = temps;
    //}
    insertionSort(scorers, length);
    // note that this comparator is not consistent with equals!
    //Arrays.sort(scorers, new Comparator() { // sort the array
    //    public int compare(Object o1, Object o2) {
    //      return ((Scorer)o1).doc() - ((Scorer)o2).doc();
    //    }
    //  });

    first = 0;
    last = length - 1;
  }

  // Sorts the first 'len' scorers by current doc id, in place,
  // without allocating a temporary array.
  private void insertionSort(Scorer[] scores, int len) {
    for (int i = 0; i < len; i++) {
      for (int j = i; j > 0 && scores[j-1].doc() > scores[j].doc(); j--) {
        swap(scores, j, j-1);
      }
    }
  }

  private void swap(Object[] x, int a, int b) {
    Object t = x[a];
    x[a] = x[b];
    x[b] = t;
  }
 
The squeezing of the array is no longer needed. 
We also initialized the Scorers array to 8 (instead of 2) to avoid having to 
grow the array for common queries, although this probably has less performance 
impact.

This change added about 3% to query throughput in my testing.
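The approach above can be modeled in isolation. In the hypothetical sketch below, "SubScorer" is a stand-in for Lucene's Scorer (only doc() matters), the array starts at capacity 8 and doubles only when full, and the sub-scorers are ordered by current doc id with an in-place insertion sort, so sorting itself allocates nothing. Insertion sort is likely a good fit here because the number of required clauses per query tends to be small, where its quadratic worst case matters little.

```java
import java.util.Arrays;

// Hypothetical model of the sortScorers() change; not Lucene code.
public class ScorerSortSketch {

    /** Stand-in for a Scorer positioned on some document. */
    interface SubScorer {
        int doc();
    }

    private SubScorer[] scorers = new SubScorer[8]; // initial capacity 8 instead of 2
    private int length = 0;

    void add(SubScorer s) {
        if (length == scorers.length) {
            scorers = Arrays.copyOf(scorers, scorers.length * 2); // grow only when full
        }
        scorers[length++] = s;
    }

    /** In-place insertion sort of the first 'length' scorers by doc(). */
    void sortScorers() {
        for (int i = 1; i < length; i++) {
            for (int j = i; j > 0 && scorers[j - 1].doc() > scorers[j].doc(); j--) {
                SubScorer t = scorers[j];
                scorers[j] = scorers[j - 1];
                scorers[j - 1] = t;
            }
        }
    }

    /** Current doc ids, in array order (for inspection). */
    int[] docs() {
        int[] out = new int[length];
        for (int i = 0; i < length; i++) out[i] = scorers[i].doc();
        return out;
    }

    public static void main(String[] args) {
        ScorerSortSketch c = new ScorerSortSketch();
        for (int d : new int[] {42, 7, 19}) {
            final int doc = d;
            c.add(() -> doc); // lambda stand-in for a scorer positioned on 'doc'
        }
        c.sortScorers();
        System.out.println(Arrays.toString(c.docs())); // [7, 19, 42]
    }
}
```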

Peter

