[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12662907#action_12662907 ] Paul Elschot commented on LUCENE-1345: To add a Filter as a clause to a BooleanQuery, I would prefer to not give it a Weight. Instead I'd like the addition of a required Filter to behave exactly like the current Searcher.search(Query, Filter) API. That also touches another point: backward compatibility with BooleanQuery and Searcher. It's certainly possible to add scoring behaviour to a Filter when it is added to a BooleanQuery. A default score value could be used, and also a default coordination behaviour. In principle it is also possible to add a disjunction of Filters to a BooleanQuery, even with a minimum number of required filters. For this case a score value does make sense. Required Filters and prohibited Filters could be added to a BooleanQuery without scoring behaviour. In fact, for prohibited Queries the score value is never used, so one might even constrain prohibited clauses to be Filters only. Most, if not all, of the scoring behaviour for Filters that was discussed so far can be obtained by wrapping a Filter in a ConstantScoreQuery and adding that to a BooleanQuery. So I think it would be cleaner to keep the scoring yes/no distinction between Queries and Filters. In case a simplified interface is desired, it could then use any of the options available, for example always wrapping a Filter in a ConstantScoreQuery and then composing a BooleanQuery only from Query clauses.
Allow Filter as clause to BooleanQuery -- Key: LUCENE-1345 URL: https://issues.apache.org/jira/browse/LUCENE-1345 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Paul Elschot Priority: Minor Fix For: 2.9 Attachments: booleansetperf.txt, DisjunctionDISI.java, DisjunctionDISI.patch, DisjunctionDISI.patch, LUCENE-1345-Filter+Query-merge.patch, LUCENE-1345.patch, LUCENE-1345.patch, OpenBitSetIteratorExperiment.java, TestIteratorPerf.java, TestIteratorPerf.java -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1479) TrecDocMaker skips over documents when Date is missing from documents
[ https://issues.apache.org/jira/browse/LUCENE-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12662941#action_12662941 ] Michael McCandless commented on LUCENE-1479: Patch looks great Shai! I'll commit shortly. TrecDocMaker skips over documents when Date is missing from documents --- Key: LUCENE-1479 URL: https://issues.apache.org/jira/browse/LUCENE-1479 Project: Lucene - Java Issue Type: Bug Components: contrib/benchmark Reporter: Shai Erera Assignee: Michael McCandless Fix For: 2.4.1, 2.9 Attachments: LUCENE-1479-2.patch, LUCENE-1479.patch TrecDocMaker skips over Trec documents if they do not have a Date line. When such a document is encountered, the code may skip over several documents until the next tag that is searched for is found. The result is, instead of reading ~25M documents from the GOV2 collection, the code reads only ~23M (don't remember the actual numbers). The fix adds a terminatingTag to read() such that the code looks for prefix, but only until terminatingTag is found. Appropriate changes were made in getNextDocData(). Patch to follow
[jira] Resolved: (LUCENE-1479) TrecDocMaker skips over documents when Date is missing from documents
[ https://issues.apache.org/jira/browse/LUCENE-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1479. Resolution: Fixed Fix Version/s: (was: 2.4.1) Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) Committed revision 733697. Thanks Shai!
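Editorial sketch of the LUCENE-1479 fix described in the issue text: a scan for a prefix that gives up once a terminating tag is seen. This is a toy model, not the code from the patch. The class name BoundedTagScan, the helper countDocs, and the literal strings "<DOC>", "</DOC>" and "Date:" are assumptions made for illustration; only the idea of passing a terminatingTag to read() comes from the issue.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Toy model of a tag-bounded line scan; not the actual TrecDocMaker code. */
final class BoundedTagScan {
    private BoundedTagScan() {}

    /**
     * Returns the next line starting with prefix, or null if a line starting
     * with terminatingTag (or the end of input) is reached first.
     * A null terminatingTag disables the bound, which models the old behaviour.
     */
    static String read(BufferedReader in, String prefix, String terminatingTag) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith(prefix)) {
                return line;
            }
            if (terminatingTag != null && line.startsWith(terminatingTag)) {
                return null;
            }
        }
        return null;
    }

    /** Counts the documents seen while also looking for each document's Date line. */
    static int countDocs(String input, boolean bounded) {
        try (BufferedReader in = new BufferedReader(new StringReader(input))) {
            int docs = 0;
            while (read(in, "<DOC>", null) != null) {
                docs++;
                // Unbounded, a missing Date line lets this scan run into later documents.
                read(in, "Date:", bounded ? "</DOC>" : null);
            }
            return docs;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With three documents of which the second has no Date line, countDocs(input, true) sees all three, while countDocs(input, false) sees only two because the unbounded Date scan consumes the third document's opening tag.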
Re: How to see the indexed terms
Have a look at the TermEnum and TermDocs classes. http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/index/TermEnum.html Also, next time please use java-u...@lucene.apache.org for usage questions. Java-dev is for discussion on building the internals of Lucene and java-user is for usage. On Jan 12, 2009, at 4:37 AM, ayyanar wrote: I need to see the indexed terms of all my lucene documents. How to see? Luke not shows the terms -- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12662984#action_12662984 ] Earwin Burrfoot commented on LUCENE-1345: What about complete merge of filters/queries, and deciding whether to score/use constant score/don't score when adding a query to BooleanQuery (or AND/OR/NOT alternative)? Something along the lines of: boolQuery.add(new TermQuery(..), SHOULD, NO_SCORE)
[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12662993#action_12662993 ] Paul Elschot commented on LUCENE-1345: This: bq. boolQuery.add(new TermQuery(..), SHOULD, NO_SCORE) can be done (with the patch here applied) by: boolQuery.add(new QueryWrapperFilter(new TermQuery(..)), SHOULD). I'll post a working version of the patch within a few days. It's better to discuss on working code than on ideas only.
[jira] Issue Comment Edited: (LUCENE-1345) Allow Filter as clause to BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12662993#action_12662993 ] paul.elsc...@xs4all.nl edited comment on LUCENE-1345 at 1/12/09 8:38 AM: This: bq. boolQuery.add(new TermQuery(..), SHOULD, NO_SCORE) can be done (with the patch here applied) by: boolQuery.add(new QueryWrapperFilter(new TermQuery(..)), MUST). (SHOULD cannot be used for filters as clauses). I'll post a working version of the patch within a few days. It's better to discuss on working code than on ideas only. was (Author: paul.elsc...@xs4all.nl): This: bq. boolQuery.add(new TermQuery(..), SHOULD, NO_SCORE) can be done (with the patch here applied) by: boolQuery.add(new QueryWrapperFilter(new TermQuery(..)), SHOULD). I'll post a working version of the patch within a few days. It's better to discuss on working code than on ideas only.
[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12662998#action_12662998 ] Marvin Humphrey commented on LUCENE-1345: (SHOULD cannot be used for filters as clauses). It doesn't have to be that way. In KS, QueryFilter is a Query, which you can add as a clause to an ORQuery or a RequiredOptionalQuery. Docs which match only the QueryFilter are fed to the HitCollector with a score of 0.0.
[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663021#action_12663021 ] Marvin Humphrey commented on LUCENE-1345: Uwe Schindler: Maybe I should create an new JIRA issue out of my suggestion to merge Filters and Queries? In my opinion, this is something nice to have in 3.0. I agree with this tack, having taken it in KS. However, I don't think we have consensus as far as the best approach yet, so perhaps it would be beneficial to hash things out on the mailing list first.
[jira] Commented: (LUCENE-1314) IndexReader.clone
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663023#action_12663023 ] Jason Rutherglen commented on LUCENE-1314: Fixing the norms bytes loading is good, it seemed incorrect but I didn't want to mess with it as I didn't fully understand it. I executed TestIndexReaderReopen 7 times and did not see the error. IndexReader.clone - Key: LUCENE-1314 URL: https://issues.apache.org/jira/browse/LUCENE-1314 Project: Lucene - Java Issue Type: New Feature Components: Index Affects Versions: 2.3.1 Reporter: Jason Rutherglen Assignee: Michael McCandless Priority: Minor Fix For: 2.9 Attachments: LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch Based on discussion http://www.nabble.com/IndexReader.reopen-issue-td18070256.html. The problem is reopen returns the same reader if there are no changes, so if docs are deleted from the new reader, they are also reflected in the previous reader which is not always desired behavior.
Re: Realtime Search
Patch #2: Implement a realtime ram index class I think this one is optional, or, rather an optimization that we can swap in later if/when necessary? Ie for starters little segments are written into the main Directory. John, Zoie could be of use for this patch. In addition, we may want to implement flushing the IW ram buffer to a RAMDir for reading as M.M. suggested. First though the IW - IR integration LUCENE-1516 needs to be implemented otherwise it's not possible to properly execute updates in realtime. On Fri, Jan 9, 2009 at 5:39 AM, Michael McCandless luc...@mikemccandless.com wrote: Jason Rutherglen wrote: Patch #1: Expose an IndexWriter.getReader method that returns the current reader and shares the write lock I tentatively like this approach so far... That reader is opened using IndexWriter's SegmentInfos instance, so it can read segments and deletions that have been flushed but not committed. It's allowed to do its own deletions and norms updating. When reopen() is called, it grabs the writer's SegmentInfos again. Patch #2: Implement a realtime ram index class I think this one is optional, or, rather an optimization that we can swap in later if/when necessary? Ie for starters little segments are written into the main Directory. Patch #3: Implement realtime transactions in IndexWriter or in a subclass of IndexWriter by implementing a createTransaction method that generates a realtime Transaction object. When the transaction is flushed, the transaction index modifications are available via the getReader method of IndexWriter Can't this be layered on top? Or... are you looking to add support for multiple transactions in flight at once on IndexWriter? Mike
Re: Realtime Search
Grant, Do you have a proposal in mind? It would help to suggest something like some classes and methods to help understand an alternative to what is being discussed. -J On Fri, Jan 9, 2009 at 12:05 PM, Grant Ingersoll gsing...@apache.org wrote: I realize we aren't adding read functionality to the Writer, but it would be coupling the Writer to the Reader nonetheless. I understand it is brainstorming (like I said, not trying to distract from the discussion), just saying that if the Reader and the Writer both need access to the underlying data structures, then we should refactor to make that possible, not just glom the Reader onto the Writer. I suspect if that is done, anyway, that it may make the bigger picture a bit clearer, too. On Jan 9, 2009, at 2:53 PM, Michael McCandless wrote: Grant Ingersoll wrote: We've spent a lot of time up until now getting write functionality out of the Reader, and now we are going to add read functionality into the Writer? Well... we're not really adding read functionality into IW; instead, we are asking IW to open the reader for us, except the reader is provided the SegmentInfos it should use from IW (instead of trying to find the latest segments_N file in the Directory). Ie, what IW.getReader returns is an otherwise normal MultiSegmentReader. The goal is to allow an IndexReader to access segments flushed but not yet committed by IW. These segments are normally private to IW, in memory in its SegmentInfos instance. And this is all just thinking-out-loud-brainstorming. There are still many details to work through... Mike
[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663055#action_12663055 ] Doug Cutting commented on LUCENE-1345: Uwe Maybe I should create an new JIRA issue out of my suggestion to merge Filters and Queries? +1 to creating a new issue and +1 to the idea.
[jira] Created: (LUCENE-1518) Merge Query and Filter classes
Merge Query and Filter classes -- Key: LUCENE-1518 URL: https://issues.apache.org/jira/browse/LUCENE-1518 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.4 Reporter: Uwe Schindler

This issue presents a patch that merges Queries and Filters so that the new Filter class extends Query. This would make it possible to use every filter as a query. The new abstract Filter class would contain all methods of ConstantScoreQuery, which would then be deprecated. Anyone who implements the Filter's getDocIdSet()/bits() methods has nothing more to do and can use the filter as a normal query.

I do not want to completely convert Filters to ConstantScoreQueries. The idea is to combine Queries and Filters in such a way that every Filter can automatically be used in all places where a Query can be used (e.g. alone as a search query without any other constraint). For that, the abstract Query methods must be implemented and return a default weight for Filters, which is the current ConstantScore logic. If the filter is used as a real filter (where the API wants a Filter), the getDocIdSet part could be used directly and the weight is useless (as it is currently, too). The constant score default implementation is only used when the Filter is used as a Query (e.g. as direct parameter to Searcher.search()). For the special case of BooleanQueries combining Filters and Queries, the idea is to optimize the BooleanQuery logic so that it detects whether a BooleanClause is a Filter (using instanceof) and then uses the Filter API directly instead of taking on the burden of the ConstantScoreQuery (see LUCENE-1345).

Here are some ideas on how to implement Searcher.search() with Query and Filter:
- User runs Searcher.search() using a Filter as the only parameter. As every Filter is also a ConstantScoreQuery, the query can be executed and returns score 1.0 for all matching documents.
- User runs Searcher.search() using a Query as the only parameter: No change, all is the same as before.
- User runs Searcher.search() using a BooleanQuery as parameter: If the BooleanQuery does not contain a Query that is a subclass of Filter (the new Filter), everything works as usual. If the BooleanQuery contains exactly one Filter and nothing else, the Filter is used as a constant score query. If the BooleanQuery contains clauses with both Queries and Filters, the new algorithm could be used: the queries are executed and the results are filtered with the filters.

For the user this has the main advantage that a query can be constructed through a simplified API without thinking about Filters or Queries; clauses can simply be combined. The scorer/weight logic then identifies the cases in which to use the filter or the query weight API, just like the query optimizer of an RDB.
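Editorial sketch of the LUCENE-1518 proposal above, using toy classes rather than the real Lucene API. Every name here (ToyQuery, ToyFilter, ToyBitsFilter, ToyTermQuery, ToyRequiredClauses) is hypothetical; only getDocIdSet(), the constant score of 1.0, and the instanceof check come from the issue description. The sketch assumes all clauses are required, which is a simplification of BooleanQuery.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy stand-in for a query: decides whether a doc matches and how it scores. */
abstract class ToyQuery {
    abstract boolean matches(int doc);
    abstract float score(int doc);
}

/** Toy filter that is also a query: matching comes from a doc id set, the score is constant. */
abstract class ToyFilter extends ToyQuery {
    abstract BitSet getDocIdSet();

    @Override boolean matches(int doc) { return getDocIdSet().get(doc); }
    @Override float score(int doc) { return 1.0f; }
}

/** Filter backed by an explicit list of matching doc ids. */
final class ToyBitsFilter extends ToyFilter {
    private final BitSet bits = new BitSet();
    ToyBitsFilter(int... docs) { for (int d : docs) bits.set(d); }
    @Override BitSet getDocIdSet() { return bits; }
}

/** Scoring query: scores[doc] > 0 means the doc matches with that score. */
final class ToyTermQuery extends ToyQuery {
    private final float[] scores;
    ToyTermQuery(float... scores) { this.scores = scores; }
    @Override boolean matches(int doc) { return doc >= 0 && doc < scores.length && scores[doc] > 0f; }
    @Override float score(int doc) { return scores[doc]; }
}

/** Conjunction of required clauses; filters restrict the result but add no score. */
final class ToyRequiredClauses extends ToyQuery {
    private final List<ToyQuery> clauses = new ArrayList<>();
    ToyRequiredClauses add(ToyQuery q) { clauses.add(q); return this; }

    @Override boolean matches(int doc) {
        for (ToyQuery q : clauses) {
            if (!q.matches(doc)) return false;
        }
        return !clauses.isEmpty();
    }

    @Override float score(int doc) {
        float sum = 0f;
        boolean scored = false;
        for (ToyQuery q : clauses) {
            if (q instanceof ToyFilter) continue; // filter clause: no scoring work
            sum += q.score(doc);
            scored = true;
        }
        // With only filter clauses, fall back to the constant score.
        return scored ? sum : 1.0f;
    }
}
```

A filter used on its own behaves like a constant score query, while a filter combined with a scoring clause only narrows the matching docs and leaves the score to the scoring clause.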
[jira] Updated: (LUCENE-1518) Merge Query and Filter classes
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1518: Attachment: LUCENE-1518.patch This is the patch. Most tests pass. Problems are only in the tests that check the explanations (TestSimpleExplanations) when wrapping the deprecated ConstantScoreQuery. For that, the test must be rewritten to get the explanation directly from the Filter class (which is now a query). ConstantScoreQuery is just a no-op that rewrites to the wrapped Filter itself but returns no weight and no explanation. Some small problems remain in the handling of toString()/toString(fieldname). The abstract Filter class has no toString(fieldname), but Query has one (abstract, though). So there must be a default, or Field implementations must be extended to provide one (a backwards-compatibility break). The patch uses a bad workaround for that; maybe somebody has a better idea how to handle this.
[jira] Commented: (LUCENE-1518) Merge Query and Filter classes
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663069#action_12663069 ] Uwe Schindler commented on LUCENE-1518: Further patches must now remove the deprecated ConstantScoreQuery from the core and contrib classes (RangeQuery becomes identical to RangeFilter and so on). The rewrite method (inherited from the Filter API) may then be changed in RangeQuery to return a BooleanQuery when useConstantScoreRewrite==false. In this case the RangeFilter becomes the same as the Query and, when used as a Query, can also rewrite to TermQueries. As far as I know, though, this is to be removed in Lucene 3.0. In that case RangeFilter and the others can simply be constant score queries or filters.
[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663070#action_12663070 ] Uwe Schindler commented on LUCENE-1345: I created and linked a new issue LUCENE-1518, that handles the merge suggestion. I also included all relevant comments from me about this.
Re: Realtime Search
Just thinking out loud... haven't looked at your patch yet (one of these days I will be back up for air).

My initial thought is that you would have a factory that produced both the Reader and the Writer as a pair, or was at least aware of what to go get from the Writer. Something like:

class IndexFactory{
  IndexWriter getWriter()
  IndexReader getReader() //Not sure if this is needed yet, but
  IndexReader getReader(IndexWriter)
}

The factory (or whatever you want to call it) is responsible for making sure the Writer and Reader have the pieces they need, i.e. the SegmentInfos. The first getReader will get you the plain old Reader that everyone knows and loves today (assuming there is a benefit to keeping it around), the second one knows what to get off the Writer to create the appropriate Reader.

It's nothing particularly hard to implement over what you are proposing, I don't think. Just trying to keep the Reader out of the Writer from an API cleanliness standpoint.

-Grant

On Jan 12, 2009, at 12:55 PM, Jason Rutherglen wrote:

Grant, Do you have a proposal in mind? It would help to suggest something like some classes and methods to help understand an alternative to what is being discussed.

-J

On Fri, Jan 9, 2009 at 12:05 PM, Grant Ingersoll gsing...@apache.org wrote:

I realize we aren't adding read functionality to the Writer, but it would be coupling the Writer to the Reader nonetheless. I understand it is brainstorming (like I said, not trying to distract from the discussion), just saying that if the Reader and the Writer both need access to the underlying data structures, then we should refactor to make that possible, not just glom the Reader onto the Writer. I suspect if that is done, anyway, that it may make the bigger picture a bit clearer, too.

On Jan 9, 2009, at 2:53 PM, Michael McCandless wrote:

Grant Ingersoll wrote:

We've spent a lot of time up until now getting write functionality out of the Reader, and now we are going to add read functionality into the Writer?

Well... we're not really adding read functionality into IW; instead, we are asking IW to open the reader for us, except the reader is provided the SegmentInfos it should use from IW (instead of trying to find the latest segments_N file in the Directory). Ie, what IW.getReader returns is an otherwise normal MultiSegmentReader.

The goal is to allow an IndexReader to access segments flushed but not yet committed by IW. These segments are normally private to IW, in memory in its SegmentInfos instance.

And this is all just thinking-out-loud-brainstorming. There are still many details to work through...

Mike

--
Grant Ingersoll
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
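Grant's factory idea can be sketched in a few lines. This is only an illustration under stated assumptions, not Lucene code: every class here (SketchSegmentInfos, SketchWriter, SketchReader, SketchIndexFactory) is a hypothetical stand-in, and segments are reduced to plain names. The point it shows is that the factory owns the shared segment list, so the writer never has to hand out a reader itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SegmentInfos: segments flushed so far,
// plus how many of them have been committed.
final class SketchSegmentInfos {
    private final List<String> flushed = new ArrayList<String>();
    private int committedCount = 0;

    synchronized void addFlushed(String name) { flushed.add(name); }
    synchronized void commit() { committedCount = flushed.size(); }

    // Copy of the committed prefix only.
    synchronized List<String> committedSnapshot() {
        return new ArrayList<String>(flushed.subList(0, committedCount));
    }

    // Copy of everything flushed, committed or not.
    synchronized List<String> flushedSnapshot() {
        return new ArrayList<String>(flushed);
    }
}

// Hypothetical writer: it only knows the shared segment list, not the reader.
final class SketchWriter {
    private final SketchSegmentInfos infos;
    private int nextSegment = 0;

    SketchWriter(SketchSegmentInfos infos) { this.infos = infos; }
    void flush() { infos.addFlushed("_" + nextSegment++); }
    void commit() { infos.commit(); }
}

// Hypothetical point-in-time reader over a fixed list of segments.
final class SketchReader {
    private final List<String> segments;

    SketchReader(List<String> segments) { this.segments = segments; }
    int segmentCount() { return segments.size(); }
}

// The factory wires both sides to the same SketchSegmentInfos instance.
final class SketchIndexFactory {
    private final SketchSegmentInfos infos = new SketchSegmentInfos();
    private final SketchWriter writer = new SketchWriter(infos);

    SketchWriter getWriter() { return writer; }

    // "Plain old" reader: sees committed segments only.
    SketchReader getReader() { return new SketchReader(infos.committedSnapshot()); }

    // Reader that also sees segments the writer has flushed but not committed.
    SketchReader getReader(SketchWriter w) {
        if (w != writer) {
            throw new IllegalArgumentException("writer was not created by this factory");
        }
        return new SketchReader(infos.flushedSnapshot());
    }
}
```

After one flush, a commit and a second flush, getReader() would see one segment and getReader(writer) would see two, which is the distinction Mike describes between committed and merely flushed segments.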
[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663075#action_12663075 ] Paul Elschot commented on LUCENE-1345: -- Ok, I'll wait for LUCENE-1518.
[jira] Commented: (LUCENE-1518) Merge Query and Filter classes
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663077#action_12663077 ]
Marvin Humphrey commented on LUCENE-1518:
-
"the query can be executed and returns score 1.0 for all matching documents"

Why 1.0 and not 0.0? I chose to use 0.0 in KS because then a Filter would effectively perform only binary filtering and never affect scores. Perhaps your envisioned query optimization algo ensures that the Filter will only serve as a DocIDSetIterator unless it's used as a top-level Query? Can we agree that a Filter used as a sub-clause in a complex query should not contribute to the aggregate score?

Merge Query and Filter classes
--
Key: LUCENE-1518
URL: https://issues.apache.org/jira/browse/LUCENE-1518
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 2.4
Reporter: Uwe Schindler
Attachments: LUCENE-1518.patch

This issue presents a patch that merges Queries and Filters in such a way that the new Filter class extends Query. This would make it possible to use every filter as a query. The new abstract Filter class would contain all methods of ConstantScoreQuery and deprecate ConstantScoreQuery. If somebody implements the Filter's getDocIdSet()/bits() methods, he has nothing more to do; he can just use the filter as a normal query.

I do not want to completely convert Filters to ConstantScoreQueries. The idea is to combine Queries and Filters in such a way that every Filter can automatically be used in all places where a Query can be used (e.g. also alone, as a search query without any other constraint). For that, the abstract Query methods must be implemented and return a default weight for Filters, which is the current ConstantScore logic. If the filter is used as a real filter (where the API wants a Filter), the getDocIdSet part can be used directly and the weight is useless (as it is currently, too). The constant score default implementation is only used when the Filter is used as a Query (e.g. as a direct parameter to Searcher.search()). For the special case of BooleanQueries combining Filters and Queries, the idea is to optimize the BooleanQuery logic so that it detects whether a BooleanClause is a Filter (using instanceof) and then uses the Filter API directly, without the burden of the ConstantScoreQuery (see LUCENE-1345).

Here are some ideas on how to implement Searcher.search() with Query and Filter:
- User runs Searcher.search() using a Filter as the only parameter: As every Filter is also a ConstantScoreQuery, the query can be executed and returns score 1.0 for all matching documents.
- User runs Searcher.search() using a Query as the only parameter: No change, everything is the same as before.
- User runs Searcher.search() using a BooleanQuery as parameter: If the BooleanQuery does not contain a Query that is a subclass of Filter (the new Filter), everything works as usual. If the BooleanQuery contains exactly one Filter and nothing else, the Filter is used as a constant score query. If the BooleanQuery contains clauses with Queries and Filters, the new algorithm could be used: the queries are executed and the results are filtered with the filters.

For the user this has the main advantage that he can construct his query using a simplified API without thinking about Filters or Queries; he can just combine clauses together. The scorer/weight logic then identifies the cases in which to use the filter or the query weight API, just like the query optimizer of an RDB.
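The class shape proposed in the issue description (Filter extends Query, with a default constant-score behaviour) might look roughly like the sketch below. It is an illustration only and does not reproduce LUCENE-1518.patch: SketchQuery, SketchFilter and SketchRangeFilter are hypothetical names, the doc-id set is a plain java.util.BitSet, and getDocIdSet() takes no IndexReader argument to keep the example self-contained.

```java
import java.util.BitSet;

// Hypothetical minimal Query: returns a score for a document,
// or NO_MATCH if the document does not match.
abstract class SketchQuery {
    static final float NO_MATCH = -1.0f;

    abstract float score(int doc);
}

// Proposed shape: the Filter is itself a Query. Implementers supply only
// the doc-id set; scoring falls back to a constant.
abstract class SketchFilter extends SketchQuery {
    // The only method a concrete filter has to implement.
    abstract BitSet getDocIdSet();

    // Default constant-score behaviour when the filter is used as a query:
    // every matching document gets 1.0.
    @Override
    float score(int doc) {
        return getDocIdSet().get(doc) ? 1.0f : NO_MATCH;
    }
}

// Example filter: accepts doc ids in [from, to).
final class SketchRangeFilter extends SketchFilter {
    private final int from;
    private final int to;

    SketchRangeFilter(int from, int to) {
        this.from = from;
        this.to = to;
    }

    @Override
    BitSet getDocIdSet() {
        BitSet bits = new BitSet();
        bits.set(from, to); // end index is exclusive
        return bits;
    }
}
```

Code that wants a real filter would call getDocIdSet() and never touch score(); code that wants a query would call score() and get the constant. Whether that constant should be 1.0 or 0.0 is exactly what the comments in this thread go on to debate.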
[jira] Issue Comment Edited: (LUCENE-1518) Merge Query and Filter classes
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663077#action_12663077 ] creamyg edited comment on LUCENE-1518 at 1/12/09 12:40 PM: --- the query can be executed and returns score 1.0 for all matching documents Why 1.0 and not 0.0? I chose to use 0.0 in KS because then a Filter would effectively perform only binary filtering and never affect scores. Perhaps your envisioned query optimization algo ensures that the Filter will only serve as a DocIDSetIterator unless it's used as a top-level Query? Can we agree that a Filter used as a sub-clause in a complex query should not contribute to the aggregate score? UPDATE: A closer reading reveals that the original issue text contains the answer to this question (we agree), so the only remaining question is whether a top level query would return hits with scores of 0.0 or 1.0. Which is probably a bikeshed painting issue.
[jira] Commented: (LUCENE-1518) Merge Query and Filter classes
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663090#action_12663090 ] Doug Cutting commented on LUCENE-1518: -- Why 1.0 and not 0.0? 0.0 does seem more appropriate, since scores are typically added, not multiplied. There used to be places that filtered anything with a 0.0 score however. Are any of those left?
[jira] Commented: (LUCENE-1518) Merge Query and Filter classes
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663091#action_12663091 ]
Uwe Schindler commented on LUCENE-1518:
---
bq. Why 1.0 and not 0.0? I chose to use 0.0 in KS because then a Filter would effectively perform only binary filtering and never affect scores.

You are right. I was coming from the ConstantScoreQuery, which has a score of 1.0 in the result.

{quote}
Perhaps your envisioned query optimization algo ensures that the Filter will only serve as a DocIDSetIterator unless it's used as a top-level Query? Can we agree that a Filter used as a sub-clause in a complex query should not contribute to the aggregate score? UPDATE: A closer reading reveals that the original issue text contains the answer to this question (we agree), so the only remaining question is whether a top level query would return hits with scores of 0.0 or 1.0. Which is probably a bikeshed painting issue.
{quote}

This is exactly what I was thinking about. But it is not only a top-level query. In the case of a BooleanQuery containing only one clause that is a filter, the constant score implementation must also be used (as a Filter alone is useless). ConstantScoreQuery returns 1.0 if executed alone or alone in a BooleanQuery.

For backwards compatibility, I think we should just use the following logic: A filter is a filter, but can also be used as a query (if alone). The default implementation for this is a constant score query, as is currently the case when wrapping a filter with ConstantScoreQuery. In all other cases (combined with other boolean clauses and not alone), the score calculation is removed, and you can say the filter contributes 0.0f (if additive).
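The scoring rule from Uwe's comment (a filter that stands alone scores 1.0, a filter combined with scoring clauses only restricts the result and adds 0.0f) can be illustrated with a small sketch. It is not how BooleanQuery computes scores: coord and norm factors are ignored, all clauses are treated as required, and the names ScoreClauseSketch and FilterScoreSketch are hypothetical.

```java
import java.util.List;

// Hypothetical view of one required clause evaluated against one document.
final class ScoreClauseSketch {
    final boolean isFilter; // true for a filter clause, false for a scoring query
    final boolean matches;  // does the clause match the current document?
    final float score;      // score of a scoring clause; ignored for filters

    ScoreClauseSketch(boolean isFilter, boolean matches, float score) {
        this.isFilter = isFilter;
        this.matches = matches;
        this.score = score;
    }
}

final class FilterScoreSketch {
    static final float NO_MATCH = -1.0f;

    // Score of one document for a conjunction of required clauses.
    static float score(List<ScoreClauseSketch> required) {
        // A filter that is alone falls back to the constant score of 1.0.
        if (required.size() == 1 && required.get(0).isFilter) {
            return required.get(0).matches ? 1.0f : NO_MATCH;
        }
        float sum = 0.0f;
        for (ScoreClauseSketch clause : required) {
            if (!clause.matches) {
                return NO_MATCH; // any failing required clause rejects the document
            }
            if (!clause.isFilter) {
                sum += clause.score; // a filter adds nothing, i.e. 0.0f
            }
        }
        return sum;
    }
}
```

Because the clause scores are summed, treating a combined filter as contributing 0.0f leaves the scores of the other clauses untouched, which is the additive argument made earlier in the thread.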
[jira] Commented: (LUCENE-1518) Merge Query and Filter classes
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663120#action_12663120 ]
Eks Dev commented on LUCENE-1518:
-
Nice, you did it top down (API), Paul takes it bottom up (speed). This makes some really crazy things possible, e.g. implementing a normal TermQuery as a DirectFilter; once the optimization of the BooleanQuery is done (no score calculation, direct usage of DocIdSetIterators), you can speed up some queries containing a TermQuery without really instantiating a Filter. Of course, this only applies to cases where tf/idf/norm can be ignored.

It is a kind of middle ground between a Filter and a fully ranked TermQuery (or, better said, any BooleanQuery!): faster than the ranked case because score calculation is switched off, and more comfortable than Filter usage, with no instantiation of DocIdSets... Very nice indeed, a smooth mix between the ranked and the pure boolean model with the benefits of both.
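What Eks Dev calls "direct usage of DocIdSetIterators" with no score calculation amounts to intersecting sorted doc-id streams. The sketch below shows that idea under simplifying assumptions: SketchDocIdIterator is a hypothetical array-backed iterator with an advance(target) method, only loosely modeled on Lucene's iterator API, and no DocIdSet or bit set is materialized along the way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical iterator over a sorted array of doc ids.
final class SketchDocIdIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // sorted ascending, no duplicates
    private int pos = 0;

    SketchDocIdIterator(int... docs) {
        this.docs = docs;
    }

    // Moves to the first doc id >= target and returns it, or NO_MORE_DOCS.
    int advance(int target) {
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}

final class UnscoredConjunctionSketch {
    // Leapfrog intersection: collects doc ids present in both iterators
    // without computing any score.
    static List<Integer> intersect(SketchDocIdIterator a, SketchDocIdIterator b) {
        List<Integer> hits = new ArrayList<Integer>();
        int da = a.advance(0);
        int db = b.advance(0);
        while (da != SketchDocIdIterator.NO_MORE_DOCS && db != SketchDocIdIterator.NO_MORE_DOCS) {
            if (da == db) {
                hits.add(da);
                da = a.advance(da + 1);
                db = b.advance(db + 1);
            } else if (da < db) {
                da = a.advance(db); // skip a forward to b's position
            } else {
                db = b.advance(da); // skip b forward to a's position
            }
        }
        return hits;
    }
}
```

For the postings 1, 3, 5, 8 and 3, 4, 8, 9 this yields the doc ids 3 and 8. A term whose tf/idf/norm can be ignored could take part in such an intersection directly from its postings, which is the middle ground described in the comment.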
[jira] Commented: (LUCENE-1518) Merge Query and Filter classes
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663131#action_12663131 ] Paul Elschot commented on LUCENE-1518: -- bq. There used to be places that filtered anything with a 0.0 score however. Are any of those left? They should be gone by now, even from the javadocs.
[jira] Commented: (LUCENE-1518) Merge Query and Filter classes
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663133#action_12663133 ] Jason Rutherglen commented on LUCENE-1518: -- Hopefully this will allow caching of individual term queries even if they are a part of a boolean query?
[jira] Commented: (LUCENE-1518) Merge Query and Filter classes
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663134#action_12663134 ] Uwe Schindler commented on LUCENE-1518: --- In my opinion, both approches could be combined. I do not know how the scoring and the whole BooleanQery works, I did only the merging of the API. If we have consensus, that this may be good, I could remove the rest of deprecated ConstantScoreQuery. The merge then works without any problems and backwards compatible. Then Paul could adapt his patch to not create new methods to add Filter clauses to boolean queries, but do just some difference like that: if (clause.query instanceof Filter) { do only filter optimization } else { do conventional boolean scoring } If he cannot do the optimization (because the Filter is alone in BooleanQuery) he could just fall back to the standard query logic (that uses implicit the current ConstantScoreQuery algorithm). But before start implementing more for removing the deprecated class, I wanted to hear some more ideas, maybe something completely different (like every query can also automatically be a filter): Only one superclass Query no filter anymore, every query clause can implement a filter and/or a query with always a Fallback to the other side (if no filter implementation provided, a filter is provided by a algorithm like QueryFilter; if no weight/rewrite, constant score weight is provided). Just ideas... Merge Query and Filter classes -- Key: LUCENE-1518 URL: https://issues.apache.org/jira/browse/LUCENE-1518 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.4 Reporter: Uwe Schindler Attachments: LUCENE-1518.patch This issue presents a patch, that merges Queries and Filters in a way, that the new Filter class extends Query. This would make it possible, to use every filter as a query. 
The new abstract Filter class would contain all methods of ConstantScoreQuery, and ConstantScoreQuery would be deprecated. Anybody who implements the Filter's getDocIdSet()/bits() methods has nothing more to do; he could just use the filter as a normal query.

I do not want to completely convert Filters to ConstantScoreQueries. The idea is to combine Queries and Filters in such a way that every Filter can automatically be used wherever a Query can be used (e.g. also alone as a search query without any other constraint). For that, the abstract Query methods must be implemented and return a default weight for Filters, which is the current ConstantScore logic. If the filter is used as a real filter (where the API wants a Filter), the getDocIdSet part could be used directly and the weight is unused (as it is currently, too). The constant score default implementation is only used when the Filter is used as a Query (e.g. as a direct parameter to Searcher.search()).

For the special case of BooleanQueries combining Filters and Queries, the idea is to optimize the BooleanQuery logic so that it detects whether a BooleanClause is a Filter (using instanceof) and then uses the Filter API directly, avoiding the overhead of the ConstantScoreQuery (see LUCENE-1345).

Here are some ideas on how to implement Searcher.search() with Query and Filter:

- User runs Searcher.search() using a Filter as the only parameter. As every Filter is also a ConstantScoreQuery, the query can be executed and returns score 1.0 for all matching documents.
- User runs Searcher.search() using a Query as the only parameter: no change, everything works as before.
- User runs Searcher.search() using a BooleanQuery as parameter: if the BooleanQuery does not contain a Query that is a subclass of Filter (the new Filter), everything works as usual. If the BooleanQuery contains exactly one Filter and nothing else, the Filter is used as a constant score query. If the BooleanQuery contains both Query and Filter clauses, the new algorithm could be used: the queries are executed and the results are filtered with the filters.

For the user, the main advantage is that he can construct his query using a simplified API without thinking about Filters or Queries; he can just combine clauses. The scorer/weight logic then identifies the cases in which to use the filter API or the query weight API, just like the query optimizer of a relational database.
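The class relationship proposed in this issue (a Filter that extends Query and falls back to a constant score when used as a query) could look roughly like the following toy model. It is a sketch under assumptions and deliberately avoids the real Lucene types: FilterAsQuerySketch, execute(), RangeFilter and EveryNthQuery are hypothetical names, and only getDocIdSet() is borrowed from the description above, here with a simplified signature that returns a java.util.BitSet.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: these are not the Lucene Query/Filter classes.
final class FilterAsQuerySketch {

    /** Hypothetical query: maps every matching doc id to a score. */
    abstract static class Query {
        abstract Map<Integer, Float> execute(int maxDoc);
    }

    /** Hypothetical merged Filter: implementors only supply the doc id set. */
    abstract static class Filter extends Query {
        abstract BitSet getDocIdSet(int maxDoc);

        /** Default query behaviour: constant score 1.0 for every doc in the set. */
        @Override
        Map<Integer, Float> execute(int maxDoc) {
            Map<Integer, Float> hits = new TreeMap<>();
            BitSet bits = getDocIdSet(maxDoc);
            for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
                hits.put(doc, 1.0f);
            }
            return hits;
        }
    }

    /** Example filter accepting doc ids in [lo, hi). */
    static final class RangeFilter extends Filter {
        private final int lo;
        private final int hi;

        RangeFilter(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        BitSet getDocIdSet(int maxDoc) {
            BitSet bits = new BitSet(maxDoc);
            for (int doc = Math.max(lo, 0); doc < Math.min(hi, maxDoc); doc++) {
                bits.set(doc);
            }
            return bits;
        }
    }

    /** Example scoring query: matches every n-th doc with score 0.5 * doc. */
    static final class EveryNthQuery extends Query {
        private final int n;

        EveryNthQuery(int n) {
            if (n <= 0) {
                throw new IllegalArgumentException("n must be positive");
            }
            this.n = n;
        }

        @Override
        Map<Integer, Float> execute(int maxDoc) {
            Map<Integer, Float> hits = new TreeMap<>();
            for (int doc = 0; doc < maxDoc; doc += n) {
                hits.put(doc, 0.5f * doc);
            }
            return hits;
        }
    }

    /** Mirrors the shape of search(Query, Filter): the filter may be null. */
    static Map<Integer, Float> search(Query query, Filter filter, int maxDoc) {
        Map<Integer, Float> hits = query.execute(maxDoc);
        if (filter == null) {
            return hits;
        }
        // Used as a real filter: only the doc id set is consulted, never a score.
        BitSet allowed = filter.getDocIdSet(maxDoc);
        Map<Integer, Float> kept = new TreeMap<>();
        for (Map.Entry<Integer, Float> hit : hits.entrySet()) {
            if (allowed.get(hit.getKey())) {
                kept.put(hit.getKey(), hit.getValue());
            }
        }
        return kept;
    }
}
```

Passing a RangeFilter as the query argument exercises the constant score default, while passing it as the filter argument uses only its doc id set, which matches the two usages distinguished in the description.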
[jira] Issue Comment Edited: (LUCENE-1518) Merge Query and Filter classes
[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663134#action_12663134 ]

thetaphi edited comment on LUCENE-1518 at 1/12/09 2:41 PM:
-----------------------------------------------------------

In my opinion, both approaches could be combined. I do not know how the scoring and BooleanQuery as a whole work; I only did the merging of the API. If we have consensus that this may be good, I could remove the rest of the deprecated ConstantScoreQuery. The new code then works without any problems and is backwards compatible as before, but without any optimization.

Then Paul could adapt his patch so that it does not create new methods to add Filter clauses to BooleanQueries, but just makes a distinction in the overall BooleanQuery logic like this:

  if (clause.query instanceof Filter) {
    // do only filter optimization from Paul's patch
  } else {
    // do conventional boolean scoring
  }

If he cannot do the optimization (because the Filter is alone in the BooleanQuery), he could just fall back to the standard query logic (which implicitly uses the current ConstantScoreQuery algorithm).

But before I start implementing more to remove the deprecated class, I wanted to hear some more ideas, maybe something completely different (like every query automatically also being a filter): only one superclass Query and no Filter anymore; every query clause can implement a filter and/or a query, always with a fallback to the other side (if no filter implementation is provided, a filter is provided by an algorithm like QueryFilter; if no weight/rewrite is provided, a constant score weight is provided). Just ideas...

was (Author: thetaphi):

In my opinion, both approaches could be combined. I do not know how the scoring and BooleanQuery as a whole work; I only did the merging of the API. If we have consensus that this may be good, I could remove the rest of the deprecated ConstantScoreQuery. The merge then works without any problems and is backwards compatible.
Then Paul could adapt his patch so that it does not create new methods to add Filter clauses to boolean queries, but just makes a distinction like this:

  if (clause.query instanceof Filter) {
    // do only filter optimization
  } else {
    // do conventional boolean scoring
  }

If he cannot do the optimization (because the Filter is alone in the BooleanQuery), he could just fall back to the standard query logic (which implicitly uses the current ConstantScoreQuery algorithm).

But before I start implementing more to remove the deprecated class, I wanted to hear some more ideas, maybe something completely different (like every query automatically also being a filter): only one superclass Query and no Filter anymore; every query clause can implement a filter and/or a query, always with a fallback to the other side (if no filter implementation is provided, a filter is provided by an algorithm like QueryFilter; if no weight/rewrite is provided, a constant score weight is provided). Just ideas...

Merge Query and Filter classes
------------------------------

Key: LUCENE-1518
URL: https://issues.apache.org/jira/browse/LUCENE-1518
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 2.4
Reporter: Uwe Schindler
Attachments: LUCENE-1518.patch

This issue presents a patch that merges Queries and Filters in such a way that the new Filter class extends Query. This would make it possible to use every filter as a query. The new abstract Filter class would contain all methods of ConstantScoreQuery, and ConstantScoreQuery would be deprecated. Anybody who implements the Filter's getDocIdSet()/bits() methods has nothing more to do; he could just use the filter as a normal query.

I do not want to completely convert Filters to ConstantScoreQueries. The idea is to combine Queries and Filters in such a way that every Filter can automatically be used wherever a Query can be used (e.g. also alone as a search query without any other constraint).
For that, the abstract Query methods must be implemented and return a default weight for Filters, which is the current ConstantScore logic. If the filter is used as a real filter (where the API wants a Filter), the getDocIdSet part could be used directly and the weight is unused (as it is currently, too). The constant score default implementation is only used when the Filter is used as a Query (e.g. as a direct parameter to Searcher.search()).

For the special case of BooleanQueries combining Filters and Queries, the idea is to optimize the BooleanQuery logic so that it detects whether a BooleanClause is a Filter (using instanceof) and then uses the Filter API directly, avoiding the overhead of the ConstantScoreQuery (see LUCENE-1345).

Here are some ideas on how to implement Searcher.search() with Query and Filter:

- User runs Searcher.search() using a Filter as the only parameter. As