[jira] Assigned: (LUCENE-991) BoostingTermQuery.explain() bugs
[ https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned LUCENE-991:

Assignee: Grant Ingersoll

BoostingTermQuery.explain() bugs

Key: LUCENE-991
URL: https://issues.apache.org/jira/browse/LUCENE-991
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 2.2
Reporter: Peter Keegan
Assignee: Grant Ingersoll
Priority: Minor
Attachments: TestBoostingTermQuery.patch

There are a couple of minor bugs in BoostingTermQuery.explain().

1. The computation of the average payload score produces NaN if no payloads were found. It should probably be:

    float avgPayloadScore = super.score() * (payloadsSeen > 0 ? (payloadScore / payloadsSeen) : 1);

2. If the average payload score is zero, the value of the explanation is 0:

    result.setValue(nonPayloadExpl.getValue() * avgPayloadScore);

If the query is part of a BooleanClause, this results in:

    no match on required clause...
    failure to meet condition(s) of required/prohibited clause(s)

The average payload score can be zero if the field boost = 0.

I've attached a patch to 'TestBoostingTermQuery.java'; however, the test 'testNoPayload' fails in 'SpanScorer.score()' because the doc = -1. It looks like 'setFreqCurrentDoc()' should have been called before 'score()'. Maybe someone more knowledgeable about spans could investigate this.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
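The NaN in item 1 of the report comes from a floating-point 0/0 division when no payloads were seen. A minimal, Lucene-free sketch of the guard follows; the class and method names (PayloadAverageSketch, unguardedAverage, guardedAverage) are hypothetical and are not part of BoostingTermQuery, and the fallback of 1 is taken from the expression proposed in the report.

```java
// Hypothetical sketch, not Lucene code: it only illustrates the
// division-by-zero guard proposed for the average payload score.
class PayloadAverageSketch {

    // Unguarded average: with payloadsSeen == 0 and payloadScore == 0
    // this is 0.0f / 0.0f, which evaluates to NaN in Java.
    static float unguardedAverage(float payloadScore, int payloadsSeen) {
        return payloadScore / payloadsSeen;
    }

    // Guarded average: falls back to a neutral factor of 1 when no
    // payloads were seen, so multiplying by it leaves a score unchanged.
    static float guardedAverage(float payloadScore, int payloadsSeen) {
        return payloadsSeen > 0 ? payloadScore / payloadsSeen : 1f;
    }

    public static void main(String[] args) {
        System.out.println(unguardedAverage(0f, 0));
        System.out.println(guardedAverage(0f, 0));
        System.out.println(guardedAverage(6f, 3));
    }
}
```

Because NaN propagates through multiplication, an unguarded average would turn the whole explanation value into NaN, which is why the check belongs before the product is computed.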
[jira] Commented: (LUCENE-991) BoostingTermQuery.explain() bugs
[ https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525695 ]

Grant Ingersoll commented on LUCENE-991:

Hi Peter,

A couple of comments.

#1 makes sense, except for the super.score() part: the score from the other part of the matching is handled by the nonPayloadExpl part. I do agree it should check for zero on payloadsSeen, though, and have added that.

I don't think I understand the issue with #2 above, and I am not sure the test is correct. The results[0] being passed into checkHitCollector says you expect Document 0 to be a match, but this can't be, since the boost is 0 and therefore there are no results. This can be seen by running the query against the searcher without the explain, as in:

    TopDocs hits = searcher.search(query, null, 100);
    assertTrue("hits Size: " + hits.totalHits + " is not: " + 0, hits.totalHits == 0);

Or perhaps I am missing something? I guess I don't see why the boost part needs to be in there. Can't you have a test that has no payloads?
[jira] Commented: (LUCENE-991) BoostingTermQuery.explain() bugs
[ https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525709 ]

Peter Keegan commented on LUCENE-991:

Hi Grant,

    TopDocs hits = searcher.search(query, null, 100);
    assertTrue("hits Size: " + hits.totalHits + " is not: " + 0, hits.totalHits == 0);

TopDocCollector discards hits with score = 0, so that's not a fair comparison. If you do a similar test with TermQuery (with a field boost = 0) instead of BoostingTermQuery, you'll see the difference. Even terms with 0 weight are included in the explanation.

Make sense?

Peter

BoostingTermQuery.explain() bugs
Key: LUCENE-991
Attachments: TestBoostingTermQuery.patch, TestBoostingTermQuery2.patch, TestBoostingTermQuery3.patch
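Peter's point is that a collector which drops zero-score hits will report no results even when the document matched the query. The sketch below illustrates that behaviour with a stand-in class; ZeroScoreFilterSketch and its members are hypothetical and are not Lucene's TopDocCollector, and the only assumption carried over from the thread is that hits with a score of 0 are discarded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a hit collector, not Lucene's TopDocCollector.
// It shows why totalHits can be 0 even though a document matched.
class ZeroScoreFilterSketch {
    final List<Integer> collectedDocs = new ArrayList<>();
    int totalHits = 0;

    // Called once per matching document; hits whose score is not
    // strictly positive are silently dropped.
    void collect(int doc, float score) {
        if (score > 0.0f) {
            totalHits++;
            collectedDocs.add(doc);
        }
    }

    public static void main(String[] args) {
        ZeroScoreFilterSketch collector = new ZeroScoreFilterSketch();
        collector.collect(0, 0.0f); // matched, but boost = 0 gives score 0
        collector.collect(1, 0.5f); // matched with a positive score
        System.out.println(collector.totalHits + " hit(s): " + collector.collectedDocs);
    }
}
```

Under this assumption, a check such as hits.totalHits == 0 says nothing about whether the document matched, which is the distinction an explanation is expected to preserve.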
[jira] Updated: (LUCENE-991) BoostingTermQuery.explain() bugs
[ https://issues.apache.org/jira/browse/LUCENE-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated LUCENE-991:

Attachment: TestBoostingTermQuery2.patch

but, I agree, there is something wrong here. Attached is an update of the Test, plus a fix for #1.
[jira] Updated: (LUCENE-847) Factor merge policy out of IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-847:

Attachment: LUCENE-847.take5.patch

Attached new patch (take5) incorporating Ning's feedback.

This patch includes LUCENE-845 (a new default merge policy plus a "merge by size in bytes of segment" merge policy), LUCENE-847 (factor merge policy/scheduling out of IndexWriter) and LUCENE-870 (ConcurrentMergeScheduler).

The one thing remaining after these are done, which I'll open a separate issue for and commit separately, is to switch IndexWriter to flush by RAM usage by default (instead of by docCount == 10) as well as merge by size-in-bytes by default.

I broke out a separate MergeScheduler interface. SerialMergeScheduler is the default (it matches how merges are executed today: sequentially, using the calling thread). ConcurrentMergeScheduler runs the merges as separate threads (up to a max number, at which point the extras are done sequentially).

Other changes:

- Allow multiple threads to call optimize(). I added a unit test for this.
- Tightened calls to deleter.refresh(), which removes partially created files on an exception, to remove only those files that the given piece of code would create. This is very important because otherwise refresh() could remove the files being created by a background merge.
- Added some unit tests.

Factor merge policy out of IndexWriter

Key: LUCENE-847
URL: https://issues.apache.org/jira/browse/LUCENE-847
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Steven Parkes
Assignee: Steven Parkes
Fix For: 2.3
Attachments: concurrentMerge.patch, LUCENE-847.patch.txt, LUCENE-847.patch.txt, LUCENE-847.take3.patch, LUCENE-847.take4.patch, LUCENE-847.take5.patch, LUCENE-847.txt

If we factor the merge policy out of IndexWriter, we can make it pluggable, making it possible for apps to choose a custom merge policy and making it easier to experiment with merge policy variants.
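The split between a serial and a concurrent scheduler described in the update above can be sketched without Lucene. Everything here is hypothetical: MergeRunnerSketch, SerialRunner, ConcurrentRunner and runAll are illustration names, not the MergeScheduler API from the patch, and the only behaviour assumed is what the message states (serial uses the calling thread; concurrent uses separate threads up to a maximum, with the extras run sequentially).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of pluggable merge execution; not the patch's API.
interface MergeRunnerSketch {
    void runAll(List<Runnable> merges);
}

// Runs every merge sequentially on the calling thread.
class SerialRunner implements MergeRunnerSketch {
    public void runAll(List<Runnable> merges) {
        for (Runnable merge : merges) {
            merge.run();
        }
    }
}

// Runs up to maxThreads merges on their own threads; any extras are
// run sequentially on the calling thread. Returns once all are done.
class ConcurrentRunner implements MergeRunnerSketch {
    private final int maxThreads;

    ConcurrentRunner(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    public void runAll(List<Runnable> merges) {
        List<Thread> started = new ArrayList<>();
        for (Runnable merge : merges) {
            if (started.size() < maxThreads) {
                Thread t = new Thread(merge);
                t.start();
                started.add(t);
            } else {
                merge.run(); // over the limit: do it in the caller
            }
        }
        for (Thread t : started) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return;
            }
        }
    }
}
```

Keeping the execution strategy behind one small interface is what lets the default be switched later without touching the code that decides which segments to merge.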
Regarding Lucene Nutch
Hello Everyone,

I am using Lucene Nutch in my project for searching content in webpages. For a webpage or any other document, Lucene takes all the words in the page, indexes them, and returns the result when searched. Let's say I have two webpages as shown below:

Webpage1
--
This is the course page of Computer Science Department
Subject: Operating System I
Professor: Qi Li
Details: The course operating system I deals with the basics of the operating system. Mainly the three topics dealt with are process management, storage management and memory management.
etc ..
--

Webpage2
--
This is the home page of Computer Science Department
The computer science department offers courses at undergraduate level and graduate level. The core courses for the graduate students are Mathematical Foundations of Computer Science, Compilers, Advanced Database, Analysis of Algorithms and Operating Systems.
etc ..
--

Now if I search using the words "operating system", the results show both webpages (webpage1 and webpage2), since the words "operating system" exist in both.

But my requirement is different. If I search for "Operating System" and want it to match only when it appears in the subject field, i.e., as in webpage1, the result should show only webpage1.

How can I achieve this result? Please help me in this regard.

Thanks & Regards,
Kunal Gosar
[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter
[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525797 ]

Doug Cutting commented on LUCENE-847:

Is there any reason not to make ConcurrentMergeScheduler the default too after this is committed?