[jira] [Commented] (LUCENE-3096) MultiSearcher does not work correctly with Not on NumericRange
[ https://issues.apache.org/jira/browse/LUCENE-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034289#comment-13034289 ]

hao yan commented on LUCENE-3096:

Thanks, Uwe!

MultiSearcher does not work correctly with Not on NumericRange

Key: LUCENE-3096
URL: https://issues.apache.org/jira/browse/LUCENE-3096
Project: Lucene - Java
Issue Type: Bug
Components: core/search
Affects Versions: 3.0.2
Reporter: John Wang
Fix For: 3.1

Hi, Keith

My colleague Xiaoyang and I just confirmed that this is actually caused by a Lucene bug in MultiSearcher. Specifically, if we search with a NOT on a NumericRange using MultiSearcher, we get wrong search results. With IndexSearcher, the result is correct. In short, the negated NumericRange has no effect under MultiSearcher. We suspect the cause is the createWeight() function in MultiSearcher, and we hope you can help us fix this Lucene bug. I attached code that reproduces the problem; please check it out.

The attached code has two separate functions:

(1) testNumericRangeSingleSearcher(Query query), where I create 6 documents with a field called id = 1, 2, 3, 4, 5, 6 respectively. Then I search with the query +MatchAllDocs -NumericRange(3,3). The expected result is 5 hits, since document 3 is MUST_NOT.

(2) testNumericRangeMultiSearcher(Query query), where I create 2 RAMDirectory instances with 3 documents each (1, 2, 3 and 4, 5, 6). Then I search with the same query using MultiSearcher. The expected result is again 5 hits.

However, while (1) returns 5 hits as expected, (2) returns 6 hits.

We also reproduced this with our Zoie/Bobo open-source tools and got the same results, because our multi-bobo-browser is built on Lucene's MultiSearcher. I have already emailed the Lucene community. Hopefully we can get some feedback soon. If you have any further concerns, please let me know!

Thank you very much!
Code: (based on Lucene 3.0.x)
{code}
import java.io.IOException;
import java.io.PrintStream;
import java.text.DecimalFormat;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.RAMDirectory;

import com.convertlucene.ConvertFrom2To3;

public class TestNumericRange {

  public final static void main(String[] args) {
    try {
      // +MatchAllDocs -numId:[3 TO 3]: should match every document except id 3
      BooleanQuery query = new BooleanQuery();
      query.add(NumericRangeQuery.newIntRange("numId", 3, 3, true, true), Occur.MUST_NOT);
      query.add(new MatchAllDocsQuery(), Occur.MUST);
      testNumericRangeSingleSearcher(query);
      testNumericRangeMultiSearcher(query);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  public static void testNumericRangeSingleSearcher(Query query)
      throws CorruptIndexException, LockObtainFailedException, IOException {
    String[] ids = {"1", "2", "3", "4", "5", "6"};
    Directory directory = new RAMDirectory();
    IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(),
        IndexWriter.MaxFieldLength.UNLIMITED);
    for (int i = 0; i < ids.length; i++) {
      Document doc = new Document();
      // "id" is an untokenized string field; "numId" holds the same value as a NumericField
      doc.add(new Field("id", ids[i], Field.Store.YES, Field.Index.NOT_ANALYZED));
      doc.add(new NumericField("numId").setIntValue(Integer.valueOf(ids[i])));
      writer.addDocument(doc);
    }
    writer.close();

    // Search all six documents in a single index
    IndexSearcher searcher = new IndexSearcher(directory);
    TopDocs docs = searcher.search(query, 10);
    System.out.println("SingleSearcher: testNumericRange: hitNum: " + docs.totalHits);
    for (ScoreDoc doc : docs.scoreDocs) {
      System.out.println(searcher.explain(query, doc.doc));
    }
    searcher.close();
{code}
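The expected counts in the report follow from the per-document semantics of MUST and MUST_NOT: a document is a hit if it satisfies every MUST clause and no MUST_NOT clause, so splitting the six documents across two indexes should not change the total. The following Lucene-free Java sketch is only an illustration under that assumption; the class and field names are hypothetical and are not Lucene API.

{code}
import java.util.List;
import java.util.function.IntPredicate;

// Lucene-free model of the report's query; the names are illustrative only.
final class ExpectedHits {
    // MUST clause (MatchAllDocs): every document satisfies it.
    static final IntPredicate MUST = id -> true;
    // MUST_NOT clause (NumericRange(3,3), both bounds inclusive).
    static final IntPredicate MUST_NOT = id -> id >= 3 && id <= 3;

    // A document is a hit if it satisfies MUST and does not satisfy MUST_NOT.
    static int countHits(List<Integer> docs) {
        int hits = 0;
        for (int id : docs) {
            if (MUST.test(id) && !MUST_NOT.test(id)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int single = countHits(List.of(1, 2, 3, 4, 5, 6));
        // The clauses are evaluated per document, so splitting the documents
        // across two indexes and adding the per-index counts keeps the total.
        int split = countHits(List.of(1, 2, 3)) + countHits(List.of(4, 5, 6));
        // prints "single index: 5, two indexes: 5"
        System.out.println("single index: " + single + ", two indexes: " + split);
    }
}
{code}

In this model both paths give 5 hits, which is the result the reporters expected from MultiSearcher as well.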
[jira] [Commented] (LUCENE-3096) MultiSearcher does not work correctly with Not on NumericRange
[ https://issues.apache.org/jira/browse/LUCENE-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033878#comment-13033878 ]

Xiaoyang Gu commented on LUCENE-3096:

Thank you very much!
Xiaoyang
[jira] [Commented] (LUCENE-3096) MultiSearcher does not work correctly with Not on NumericRange
[ https://issues.apache.org/jira/browse/LUCENE-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033478#comment-13033478 ]

Uwe Schindler commented on LUCENE-3096:

This is a well-known bug (LUCENE-2756) that cannot be fixed (query rewrite across different searchers is wrong) without completely changing the way queries are rewritten. To avoid the bug, use a MultiReader over your IndexReaders and a simple IndexSearcher on top of that MultiReader:

{code}
// One IndexReader per index
IndexReader[] readers = new IndexReader[2];
readers[0] = IndexReader.open(directory);
readers[1] = IndexReader.open(otherdirectory);
// ...
// A single IndexSearcher over the MultiReader rewrites the query once, against the terms of all indexes
IndexSearcher searcher = new IndexSearcher(new MultiReader(readers));
{code}

MultiSearcher and ParallelMultiSearcher were deprecated in 3.1 because of this and will be removed in the upcoming Lucene 4.0. ParallelMultiSearcher's functionality is now available through IndexSearcher in 3.1 (it parallelizes across index segments, LUCENE-2837).

I will close this as "Won't Fix" if nobody objects.
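Why rewriting per searcher goes wrong can be illustrated with a small Lucene-free Java model. It assumes, based on our reading of Lucene 3.0 rather than anything stated in this thread, that MultiSearcher rewrites the query separately on each sub-searcher, that the range is expanded into the terms present in that sub-index, and that the distinct rewritten queries are OR-ed together. All names are hypothetical, not Lucene API. Under these assumptions, the index holding 4, 5, 6 rewrites the MUST_NOT clause to nothing, and its rewrite then also matches document 3.

{code}
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of "+MatchAll -NumericRange(lo, hi)" across several indexes.
// ASSUMPTION: per-index term expansion, distinct rewrites OR-ed together.
// Names are hypothetical; this is not Lucene API.
final class RewriteModel {
    // Rewrite against one index: the MUST_NOT clause becomes the set of range
    // terms that actually exist in that index (possibly empty).
    static Set<Integer> rewrite(List<Integer> indexTerms, int lo, int hi) {
        Set<Integer> excluded = new LinkedHashSet<>();
        for (int t : indexTerms) {
            if (t >= lo && t <= hi) {
                excluded.add(t);
            }
        }
        return excluded;
    }

    // "+MatchAll -excluded" matches every document whose id is not excluded.
    static boolean matches(Set<Integer> excluded, int id) {
        return !excluded.contains(id);
    }

    // MultiSearcher-style: rewrite per index, keep distinct rewrites, OR them,
    // then run the combined query against every index.
    static int multiSearcherHits(List<List<Integer>> indexes, int lo, int hi) {
        Set<Set<Integer>> rewrites = new LinkedHashSet<>();
        for (List<Integer> index : indexes) {
            rewrites.add(rewrite(index, lo, hi));
        }
        int hits = 0;
        for (List<Integer> index : indexes) {
            for (int id : index) {
                boolean anyMatch = false;
                for (Set<Integer> excluded : rewrites) {
                    anyMatch |= matches(excluded, id);
                }
                if (anyMatch) {
                    hits++;
                }
            }
        }
        return hits;
    }

    // MultiReader-style: a single rewrite against the union of all terms.
    static int multiReaderHits(List<List<Integer>> indexes, int lo, int hi) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> index : indexes) {
            all.addAll(index);
        }
        Set<Integer> excluded = rewrite(all, lo, hi);
        int hits = 0;
        for (int id : all) {
            if (matches(excluded, id)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<List<Integer>> indexes = List.of(List.of(1, 2, 3), List.of(4, 5, 6));
        System.out.println("MultiSearcher-style: " + multiSearcherHits(indexes, 3, 3)); // 6
        System.out.println("MultiReader-style:   " + multiReaderHits(indexes, 3, 3));   // 5
    }
}
{code}

The model reproduces the reported 6 vs. 5 hits, and it shows why the MultiReader approach avoids the problem: there is only one rewrite, made against the terms of all indexes.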
[jira] [Commented] (LUCENE-3096) MultiSearcher does not work correctly with Not on NumericRange
[ https://issues.apache.org/jira/browse/LUCENE-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033479#comment-13033479 ]

Uwe Schindler commented on LUCENE-3096:

This was also already reported and answered on the java-user@lucene.apache.org list: [http://www.gossamer-threads.com/lists/lucene/java-user/123996]
[jira] [Commented] (LUCENE-3096) MultiSearcher does not work correctly with Not on NumericRange
[ https://issues.apache.org/jira/browse/LUCENE-3096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033504#comment-13033504 ]

Uwe Schindler commented on LUCENE-3096:

An alternative way to fix this in 3.0 (without giving up MultiSearcher) is to set the rewrite method of MultiTermQueries (like NumericRangeQuery) to CONSTANT_SCORE_FILTER_REWRITE. But this only fixes the bug for those queries (as no BooleanQuery is used during rewrite). Altogether, negative queries in MultiSearcher are broken, and whether the bug actually affects search results depends on the index contents.
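The effect of a filter-based constant-score rewrite can be sketched with a toy model in the same spirit. The assumption here (hypothetical names, not Lucene API) is that a filter-style rewrite keeps the range itself instead of expanding it into the terms of one index. Every sub-searcher then produces an identical rewritten query, the duplicates collapse into one, and the MUST_NOT clause is applied per document. A term-expanding rewrite, by contrast, differs between indexes.

{code}
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model contrasting two rewrite styles for "-NumericRange(lo, hi)" when each
// sub-searcher rewrites the query and the distinct results are OR-ed together.
// ASSUMPTION: a filter-style rewrite keeps the range and ignores the index
// contents. Names are hypothetical; this is not Lucene API.
final class RewriteStyleModel {
    // Values excluded by the rewritten MUST_NOT clause for one index.
    static Set<Integer> rewrite(List<Integer> indexTerms, int lo, int hi, boolean filterStyle) {
        Set<Integer> excluded = new LinkedHashSet<>();
        if (filterStyle) {
            // Independent of the index, so identical on every sub-searcher
            // (enumerating the range is only sensible for small toy ranges).
            for (int v = lo; v <= hi; v++) {
                excluded.add(v);
            }
        } else {
            // Term expansion: depends on which terms this index contains.
            for (int t : indexTerms) {
                if (t >= lo && t <= hi) {
                    excluded.add(t);
                }
            }
        }
        return excluded;
    }

    // The distinct rewritten queries that would be OR-ed together.
    static Set<Set<Integer>> rewrites(List<List<Integer>> indexes, int lo, int hi, boolean filterStyle) {
        Set<Set<Integer>> result = new LinkedHashSet<>();
        for (List<Integer> index : indexes) {
            result.add(rewrite(index, lo, hi, filterStyle));
        }
        return result;
    }

    // Hits for "+MatchAll -range" after OR-ing the distinct rewrites.
    static int hits(List<List<Integer>> indexes, int lo, int hi, boolean filterStyle) {
        Set<Set<Integer>> combined = rewrites(indexes, lo, hi, filterStyle);
        int hits = 0;
        for (List<Integer> index : indexes) {
            for (int id : index) {
                boolean anyMatch = false;
                for (Set<Integer> excluded : combined) {
                    anyMatch |= !excluded.contains(id);
                }
                if (anyMatch) {
                    hits++;
                }
            }
        }
        return hits;
    }
}
{code}

For the two indexes {1, 2, 3} and {4, 5, 6} with range (3,3), the filter-style rewrite yields one distinct query and 5 hits, while term expansion yields two distinct queries and 6 hits. This matches the point that the workaround only helps queries whose rewrite does not depend on index contents.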