[jira] Assigned: (LUCENE-991) BoostingTermQuery.explain() bugs

2007-09-07 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned LUCENE-991:
--

Assignee: Grant Ingersoll

 BoostingTermQuery.explain() bugs
 

 Key: LUCENE-991
 URL: https://issues.apache.org/jira/browse/LUCENE-991
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.2
Reporter: Peter Keegan
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: TestBoostingTermQuery.patch


 There are a couple of minor bugs in BoostingTermQuery.explain().
 1. The computation of the average payload score produces NaN if no payloads 
 were found. It should probably be:
 float avgPayloadScore = super.score() * (payloadsSeen > 0 ? (payloadScore / payloadsSeen) : 1);
 2. If the average payload score is zero, the value of the explanation is 0:
 result.setValue(nonPayloadExpl.getValue() * avgPayloadScore);
 If the query is part of a BooleanClause, this results in:
 no match on required clause...
 failure to meet condition(s) of required/prohibited clause(s)
 The average payload score can be zero if the field boost = 0.
 I've attached a patch to 'TestBoostingTermQuery.java'; however, the test 
 'testNoPayload' fails in 'SpanScorer.score()' because the doc = -1. It looks 
 like 'setFreqCurrentDoc()' should have been called before 'score()'. Maybe 
 someone more knowledgeable about spans could investigate this.
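
 The NaN in item 1 can be reproduced outside Lucene. The following is only a 
 minimal stand-alone sketch, not the actual BoostingTermQuery code: the class 
 and method names are hypothetical, and the super.score() factor is left out 
 because it depends on the scorer. It assumes payloadScore is a float sum and 
 payloadsSeen an int count, as the expression in item 1 suggests.

```java
// Hypothetical stand-alone sketch; not the actual BoostingTermQuery code.
final class PayloadAverageSketch {

    // Unguarded form: with payloadScore == 0f and payloadsSeen == 0 this is
    // 0f / 0f in float arithmetic, which evaluates to NaN.
    static float unguardedAverage(float payloadScore, int payloadsSeen) {
        return payloadScore / payloadsSeen;
    }

    // Guarded form from item 1: fall back to a neutral factor of 1 when no
    // payloads were seen, so the surrounding product is left unchanged.
    static float guardedAverage(float payloadScore, int payloadsSeen) {
        return payloadsSeen > 0 ? (payloadScore / payloadsSeen) : 1f;
    }

    public static void main(String[] args) {
        System.out.println(Float.isNaN(unguardedAverage(0f, 0))); // true
        System.out.println(guardedAverage(0f, 0));                // 1.0
        System.out.println(guardedAverage(6f, 3));                // 2.0
    }
}
```

 Using 1 as the fallback keeps the non-payload part of the score intact when a 
 document has no payloads at all.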

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-991) BoostingTermQuery.explain() bugs

2007-09-07 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525695
 ] 

Grant Ingersoll commented on LUCENE-991:


Hi Peter, 

A couple of comments.  #1 makes sense, except for the super.score() part: the 
score from the other part of the matching is already handled by the 
nonPayloadExpl part.  I do agree it should check for zero on payloadsSeen, 
though, and have added that.

I don't think I am understanding the issue with #2 above.  I am not sure the 
test is correct.  The results[0] being passed into the checkHitCollector say 
you expect Document 0 to be a match, but this can't be since the boost is 0, 
therefore there are no results.  This can be seen by running the query against 
the search without the explain, as in:
TopDocs hits = searcher.search(query, null, 100);
assertTrue("hits Size: " + hits.totalHits + " is not: " + 0, hits.totalHits == 0);

Or, perhaps I am missing something?  I guess I don't see why the boost part 
needs to be in there?  Can't you have a test that has no payloads?




[jira] Commented: (LUCENE-991) BoostingTermQuery.explain() bugs

2007-09-07 Thread Peter Keegan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525709
 ] 

Peter Keegan commented on LUCENE-991:
-



Hi Grant,

> TopDocs hits = searcher.search(query, null, 100);
> assertTrue("hits Size: " + hits.totalHits + " is not: " + 0, hits.totalHits == 0);

TopDocCollector discards hits with score = 0, so that's not a fair comparison. 
If you do a similar test with TermQuery (with a field boost = 0) instead of 
BoostingTermQuery, you'll see the difference. Even terms with 0 weight are 
included in the explanation. Make sense?
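
To illustrate the point, here is a small stand-alone model. It is not Lucene's 
TopDocCollector; the class name and the numbers are made up, and it assumes 
only what is said above, namely that the collector drops hits whose score is 0 
(the sketch uses score > 0 as the test, so negative scores are dropped too).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a hit collector; it is not Lucene's TopDocCollector.
final class ZeroScoreCollectorSketch {
    private final List<Integer> docs = new ArrayList<>();

    // Keep only hits with a strictly positive score (assumed behavior).
    void collect(int doc, float score) {
        if (score > 0.0f) {
            docs.add(doc);
        }
    }

    int totalHits() {
        return docs.size();
    }

    public static void main(String[] args) {
        ZeroScoreCollectorSketch collector = new ZeroScoreCollectorSketch();
        float termScore = 0.5f;  // made-up raw score for a term that matches
        float fieldBoost = 0.0f; // field boost of 0, as in the test
        // The term matches document 0, but the boosted score is 0.
        collector.collect(0, termScore * fieldBoost);
        System.out.println(collector.totalHits()); // 0
    }
}
```

So a hit count of 0 here says the score was 0, not that the term failed to 
match.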

Peter



 BoostingTermQuery.explain() bugs
 

 Key: LUCENE-991
 URL: https://issues.apache.org/jira/browse/LUCENE-991
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.2
Reporter: Peter Keegan
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: TestBoostingTermQuery.patch, 
 TestBoostingTermQuery2.patch, TestBoostingTermQuery3.patch





[jira] Updated: (LUCENE-991) BoostingTermQuery.explain() bugs

2007-09-07 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-991:
---

Attachment: TestBoostingTermQuery2.patch

but, I agree, there is something wrong here.  Attached is an update of the 
Test, plus a fix for #1.

 BoostingTermQuery.explain() bugs
 

 Key: LUCENE-991
 URL: https://issues.apache.org/jira/browse/LUCENE-991
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.2
Reporter: Peter Keegan
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: TestBoostingTermQuery.patch, TestBoostingTermQuery2.patch





[jira] Updated: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-07 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-847:
--

Attachment: LUCENE-847.take5.patch


Attached new patch (take5) incorporating Ning's feedback.

This patch includes LUCENE-845 (a new default merge policy plus a
merge policy that merges by segment size in bytes), LUCENE-847
(factor merge policy/scheduling out of IndexWriter) and LUCENE-870
(ConcurrentMergeScheduler).

The one thing remaining after these are done, that I'll open a
separate issue for and commit separately, is to switch IndexWriter to
flush by RAM usage by default (instead of by docCount == 10) as well
as merge by size-in-bytes by default.

I broke out a separate MergeScheduler interface.  SerialMergeScheduler
is the default (matches how merges are executed today: sequentially,
using the calling thread).  ConcurrentMergeScheduler runs the merges
as separate threads (up to a max number at which point the extras are
done sequentially).
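
As a rough illustration of the two scheduling strategies just described, here
is a stand-alone sketch.  Every name in it is hypothetical and it is not the
interface from the patch; in particular it assumes that "the extras" simply
run on the calling thread once the thread limit has been reached.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names throughout; this is not the API from the LUCENE-847 patch.
final class MergeSchedulingSketch {

    interface Scheduler {
        void merge(List<Runnable> pendingMerges);
    }

    // Runs every merge in order on the calling thread.
    static final class Serial implements Scheduler {
        @Override
        public void merge(List<Runnable> pendingMerges) {
            for (Runnable merge : pendingMerges) {
                merge.run();
            }
        }
    }

    // Starts the first maxThreads merges on their own threads; any further
    // merges run sequentially on the calling thread (an assumed reading).
    static final class Concurrent implements Scheduler {
        private final int maxThreads;

        Concurrent(int maxThreads) {
            this.maxThreads = maxThreads;
        }

        @Override
        public void merge(List<Runnable> pendingMerges) {
            List<Thread> started = new ArrayList<>();
            for (Runnable merge : pendingMerges) {
                if (started.size() < maxThreads) {
                    Thread t = new Thread(merge);
                    t.start();
                    started.add(t);
                } else {
                    merge.run();
                }
            }
            // Wait for the background merges before returning.
            for (Thread t : started) {
                try {
                    t.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger done = new AtomicInteger();
        List<Runnable> merges = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            merges.add(done::incrementAndGet); // stand-in for a real merge
        }
        new Serial().merge(merges);
        new Concurrent(2).merge(merges);
        System.out.println(done.get()); // 10
    }
}
```

Keeping the scheduler behind its own interface means the choice between the
two strategies does not touch the code that decides which segments to merge.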

Other changes:

  - Allow multiple threads to call optimize().  I added a unit test
for this.

  - Tightened calls to deleter.refresh(), which removes partially
created files on an exception, so that it removes only those files
that the given piece of code would create.  This is very important
because otherwise refresh() could remove the files being created by a
background merge.

  - Added some unit tests


 Factor merge policy out of IndexWriter
 --

 Key: LUCENE-847
 URL: https://issues.apache.org/jira/browse/LUCENE-847
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Steven Parkes
Assignee: Steven Parkes
 Fix For: 2.3

 Attachments: concurrentMerge.patch, LUCENE-847.patch.txt, 
 LUCENE-847.patch.txt, LUCENE-847.take3.patch, LUCENE-847.take4.patch, 
 LUCENE-847.take5.patch, LUCENE-847.txt


 If we factor the merge policy out of IndexWriter, we can make it pluggable, 
 making it possible for apps to choose a custom merge policy and for easier 
 experimenting with merge policy variants.




Regarding Lucene Nutch

2007-09-07 Thread Kunal Wku
Hello Everyone,

I am using Lucene & Nutch in my project for searching content in webpages.
For a webpage or any other document, Lucene takes all the words in the page, 
indexes them, and returns the result when searched.

Let's say I have 2 webpages as shown below:

Webpage1
--
This is the course page of Computer Science Department
Subject: Operating System I
Professor: Qi Li
Details:
The course Operating System I deals with the basics of the operating system. 
The three main topics covered are process management, storage management & 
memory management, etc.
..
--

Webpage2
--
This is the home page of Computer Science Department
The computer science department offers courses at undergraduate level and 
graduate level. The core courses for the graduate students are Mathematical 
Foundations of Computer Science, Compilers, Advanced Database, Analysis of 
Algorithms and Operating Systems, etc.
..
--

Now if I search using the words "operating system", the results show both 
webpages (webpage1 & webpage2), since the words "operating system" exist in 
both webpages.

But my requirement is different. If I search for the words "Operating 
System" and they should appear in the Subject field, i.e., as in webpage1, 
the result should show only webpage1. How can I achieve this result?

Please help me in this regard.

Thanks & Regards,
Kunal Gosar


   

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

2007-09-07 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525797
 ] 

Doug Cutting commented on LUCENE-847:
-

Is there any reason not to make ConcurrentMergeScheduler the default too after 
this is committed?
