[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2009-01-12 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662907#action_12662907
 ] 

Paul Elschot commented on LUCENE-1345:
--

To add a Filter as a clause to a BooleanQuery, I would prefer not to give it 
a Weight. Instead I'd like the addition of a required Filter to behave exactly 
like the current Searcher.search(Query, Filter) API.
That also touches another point: backward compatibility with BooleanQuery and 
Searcher.

It's certainly possible to add scoring behaviour to a Filter when it is added 
to a BooleanQuery. A default score value could be used, and also a default 
coordination behaviour.
In principle it is also possible to add a disjunction of Filters to a 
BooleanQuery, even with a minimum number of required filters. For this case a 
score value does make sense.

Required Filters and prohibited Filters could be added to a BooleanQuery 
without scoring behaviour. In fact, for prohibited Queries, the score value is 
never used, so one might even constrain prohibited clauses to be Filters only.

Most, if not all, of the scoring behaviour for Filters that was discussed so 
far can be obtained by using a ConstantScoreQuery based on a Filter and adding 
it to a BooleanQuery. So I think it would be cleaner to keep the scoring yes/no 
distinction between Queries and Filters. In case a simplified interface is 
desired this could then use any of the options available, for example always 
wrapping a Filter in a ConstantScoreQuery, and then composing a BooleanQuery 
only from Query clauses.
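
The last option can be pictured with a small self-contained model. This is 
only a sketch under stated assumptions: ToyFilter, ToyQuery, constantScore and 
allOf are hypothetical stand-ins invented for illustration, not Lucene classes, 
and a negative score is used here to mean "no match". It shows a filter being 
wrapped so that it scores a constant, and a conjunction being composed from 
query clauses only.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types are hypothetical stand-ins, not Lucene classes.
class FilterAsQuerySketch {
    /** A filter only answers "does this doc match?". */
    interface ToyFilter { boolean matches(int doc); }

    /** A query matches and scores; a negative score means "no match". */
    interface ToyQuery { float score(int doc); }

    /** Wraps a filter so that every matching doc gets the same score. */
    static ToyQuery constantScore(ToyFilter f, float boost) {
        return doc -> f.matches(doc) ? boost : -1f;
    }

    /** Conjunction of required query clauses; clause scores are summed. */
    static ToyQuery allOf(List<ToyQuery> clauses) {
        return doc -> {
            float sum = 0f;
            for (ToyQuery q : clauses) {
                float s = q.score(doc);
                if (s < 0f) return -1f;   // one required clause failed
                sum += s;
            }
            return sum;
        };
    }

    static float demo(int doc) {
        ToyQuery text = d -> d % 2 == 0 ? 2.5f : -1f; // pretend term query: even docs
        ToyFilter range = d -> d < 10;                 // pretend range filter
        List<ToyQuery> clauses = new ArrayList<>();
        clauses.add(text);
        clauses.add(constantScore(range, 1.0f));       // filter enters as a query
        return allOf(clauses).score(doc);
    }

    public static void main(String[] args) {
        System.out.println(demo(4));   // matches both clauses
        System.out.println(demo(12));  // rejected by the wrapped filter
    }
}
```

In this model the boolean composition never needs to know that one of its 
clauses started life as a filter, which is the point of keeping the scoring 
yes/no distinction outside of BooleanQuery.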


 Allow Filter as clause to BooleanQuery
 --

 Key: LUCENE-1345
 URL: https://issues.apache.org/jira/browse/LUCENE-1345
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Paul Elschot
Priority: Minor
 Fix For: 2.9

 Attachments: booleansetperf.txt, DisjunctionDISI.java, 
 DisjunctionDISI.patch, DisjunctionDISI.patch, 
 LUCENE-1345-Filter+Query-merge.patch, LUCENE-1345.patch, LUCENE-1345.patch, 
 OpenBitSetIteratorExperiment.java, TestIteratorPerf.java, 
 TestIteratorPerf.java




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1479) TrecDocMaker skips over documents when Date is missing from documents

2009-01-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662941#action_12662941
 ] 

Michael McCandless commented on LUCENE-1479:


Patch looks great Shai!  I'll commit shortly.

 TrecDocMaker skips over documents when Date is missing from documents
 ---

 Key: LUCENE-1479
 URL: https://issues.apache.org/jira/browse/LUCENE-1479
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Michael McCandless
 Fix For: 2.4.1, 2.9

 Attachments: LUCENE-1479-2.patch, LUCENE-1479.patch


 TrecDocMaker skips over Trec documents if they do not have a Date line. 
 When such a document is encountered, the code may skip over several documents 
 until the next tag that is searched for is found.
 The result is, instead of reading ~25M documents from the GOV2 collection, 
 the code reads only ~23M (don't remember the actual numbers).
 The fix adds a terminatingTag to read() such that the code looks for prefix, 
 but only until terminatingTag is found. Appropriate changes were made in 
 getNextDocData().
 Patch to follow
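
 The idea behind the fix described above can be sketched as follows. This is 
 not the code from the patch: the class name, the simplified one-tag-per-line 
 input and the literal tag strings are assumptions made for illustration. 
 read() looks for a line with the given prefix but gives up as soon as the 
 terminating tag is seen, so a document without a Date line no longer 
 swallows the documents that follow it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; not the TrecDocMaker code from the patch.
class TerminatingTagSketch {
    /**
     * Returns the first line starting with prefix, or null if a line starting
     * with terminatingTag is reached first. Consumes lines from the iterator.
     */
    static String read(Iterator<String> lines, String prefix, String terminatingTag) {
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.startsWith(prefix)) return line;
            if (line.startsWith(terminatingTag)) return null; // stop at document end
        }
        return null;
    }

    /** One entry per document: its Date line, or null if it has none. */
    static List<String> dates(List<String> input) {
        List<String> result = new ArrayList<>();
        Iterator<String> it = input.iterator();
        while (it.hasNext()) {
            if (!it.next().startsWith("<DOC>")) continue;
            // Look for the date, but never past the end of this document.
            result.add(read(it, "Date:", "</DOC>"));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "<DOC>", "Date: Mon", "</DOC>",
            "<DOC>", "</DOC>",               // document without a Date line
            "<DOC>", "Date: Tue", "</DOC>");
        System.out.println(dates(input).size() + " documents read");
    }
}
```

 Without the terminating tag, the search for "Date:" in the second document 
 of this input would run on into the third document and only two documents 
 would be reported.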




[jira] Resolved: (LUCENE-1479) TrecDocMaker skips over documents when Date is missing from documents

2009-01-12 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1479.


   Resolution: Fixed
Fix Version/s: (was: 2.4.1)
Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])

Committed revision 733697.

Thanks Shai!




Re: How to see the indexed terms

2009-01-12 Thread Grant Ingersoll

Have a look at the TermEnum and TermDocs classes.  
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/index/TermEnum.html

Also, next time please use java-u...@lucene.apache.org for usage  
questions.  Java-dev is for discussion on building the internals of  
Lucene and java-user is for usage.


On Jan 12, 2009, at 4:37 AM, ayyanar wrote:



I need to see the indexed terms of all my Lucene documents. How can I see
them? Luke does not show the terms.
--
View this message in context: 
http://www.nabble.com/How-to-see-the-indexed-terms-tp21411226p21411226.html
Sent from the Lucene - Java Developer mailing list archive at  
Nabble.com.






--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ














[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2009-01-12 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662984#action_12662984
 ] 

Earwin Burrfoot commented on LUCENE-1345:
-

What about complete merge of filters/queries, and deciding whether to score/use 
constant score/don't score when adding a query to BooleanQuery (or AND/OR/NOT 
alternative)?
Something along the lines of: boolQuery.add(new TermQuery(..), SHOULD, NO_SCORE)




[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2009-01-12 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662993#action_12662993
 ] 

Paul Elschot commented on LUCENE-1345:
--

This:
bq. boolQuery.add(new TermQuery(..), SHOULD, NO_SCORE)
can be done (with the patch here applied) by:

boolQuery.add(new QueryWrapperFilter(new TermQuery(..)), SHOULD);

I'll post a working version of the patch within a few days. It's better to 
discuss working code than ideas only.




[jira] Issue Comment Edited: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2009-01-12 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662993#action_12662993
 ] 

paul.elsc...@xs4all.nl edited comment on LUCENE-1345 at 1/12/09 8:38 AM:
---

This:
bq. boolQuery.add(new TermQuery(..), SHOULD, NO_SCORE)
can be done (with the patch here applied) by:

boolQuery.add(new QueryWrapperFilter(new TermQuery(..)), MUST);

(SHOULD cannot be used for filters as clauses.)

I'll post a working version of the patch within a few days. It's better to 
discuss working code than ideas only.

  was (Author: paul.elsc...@xs4all.nl):
This:
bq. boolQuery.add(new TermQuery(..), SHOULD, NO_SCORE)
can be done (with the patch here applied) by:

boolQuery.add(new QueryWrapperFilter(new TermQuery(..)), SHOULD);

I'll post a working version of the patch within a few days. It's better to 
discuss working code than ideas only.
  



[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2009-01-12 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662998#action_12662998
 ] 

Marvin Humphrey commented on LUCENE-1345:
-

 (SHOULD cannot be used for filters as clauses).

It doesn't have to be that way.  In KS, QueryFilter is a Query, which you can 
add as a clause to an ORQuery or a RequiredOptionalQuery.  Docs which match 
only the QueryFilter are fed to the HitCollector with a score of 0.0.
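
A rough model of that behaviour, with invented names (this is neither 
KinoSearch nor Lucene code, and the map/set representation of hits is an 
assumption made to keep the example short): the OR of a scored clause and an 
unscored filter clause collects the union of their docs, and docs matched only 
by the filter clause are collected with a score of 0.0.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model with hypothetical names; not KinoSearch or Lucene code.
class ZeroScoreFilterClauseSketch {
    /** ORs a scored clause with an unscored filter clause over doc ids. */
    static Map<Integer, Float> or(Map<Integer, Float> scoredHits, Set<Integer> filterHits) {
        Set<Integer> docs = new TreeSet<>(scoredHits.keySet());
        docs.addAll(filterHits);
        Map<Integer, Float> collected = new LinkedHashMap<>();
        for (int doc : docs) {
            // Docs matched only by the filter clause contribute no score.
            collected.put(doc, scoredHits.getOrDefault(doc, 0.0f));
        }
        return collected;
    }

    public static void main(String[] args) {
        Map<Integer, Float> scored = new HashMap<>();
        scored.put(1, 0.8f);
        scored.put(3, 0.4f);
        Set<Integer> filter = new HashSet<>();
        filter.add(3);
        filter.add(5);
        System.out.println(or(scored, filter)); // docs 1, 3 and 5 are collected
    }
}
```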




[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2009-01-12 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663021#action_12663021
 ] 

Marvin Humphrey commented on LUCENE-1345:
-

Uwe Schindler:

 Maybe I should create a new JIRA issue out of my suggestion to merge 
 Filters and Queries? In my opinion, this is something nice to have in 3.0.

I agree with this tack, having taken it in KS.  However, I don't think we have
consensus on the best approach yet, so perhaps it would be beneficial
to hash things out on the mailing list first.




[jira] Commented: (LUCENE-1314) IndexReader.clone

2009-01-12 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663023#action_12663023
 ] 

Jason Rutherglen commented on LUCENE-1314:
--

Fixing the norms bytes loading is good; it seemed incorrect, but I
didn't want to mess with it as I didn't fully understand it. 

I executed TestIndexReaderReopen 7 times and did not see the error.

 IndexReader.clone
 -

 Key: LUCENE-1314
 URL: https://issues.apache.org/jira/browse/LUCENE-1314
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.3.1
Reporter: Jason Rutherglen
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, 
 LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, 
 LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, 
 lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, 
 lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, 
 lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch


 Based on discussion 
 http://www.nabble.com/IndexReader.reopen-issue-td18070256.html.  The problem 
 is reopen returns the same reader if there are no changes, so if docs are 
 deleted from the new reader, they are also reflected in the previous reader 
 which is not always desired behavior.
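
 The aliasing problem described above can be modelled in a few lines. This is 
 a toy sketch, not the IndexReader API: ToyReader, its version field, and the 
 copy() method are hypothetical, and a changed index is simply modelled as a 
 fresh reader with no deletions. It shows that when reopen() returns the same 
 instance, a deletion made through the "new" reader is visible in the old one, 
 whereas an independent copy keeps them apart.

```java
import java.util.BitSet;

// Toy model with hypothetical names; not the Lucene IndexReader API.
class ReaderCloneSketch {
    static class ToyReader {
        final long version;
        final BitSet deleted;

        ToyReader(long version, BitSet deleted) {
            this.version = version;
            this.deleted = deleted;
        }

        /** Mirrors the behaviour in the issue: same instance if nothing changed. */
        ToyReader reopen(long currentIndexVersion) {
            if (currentIndexVersion == version) return this;
            return new ToyReader(currentIndexVersion, new BitSet()); // toy: fresh view
        }

        /** Independent copy: later deletions do not leak into the original. */
        ToyReader copy() {
            return new ToyReader(version, (BitSet) deleted.clone());
        }

        void delete(int doc) { deleted.set(doc); }
        boolean isDeleted(int doc) { return deleted.get(doc); }
    }

    public static void main(String[] args) {
        ToyReader original = new ToyReader(1, new BitSet());
        ToyReader reopened = original.reopen(1);   // index unchanged: same object
        reopened.delete(7);
        System.out.println(original.isDeleted(7)); // deletion shows up in original

        ToyReader cloned = original.copy();
        cloned.delete(9);
        System.out.println(original.isDeleted(9)); // original is unaffected
    }
}
```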




Re: Realtime Search

2009-01-12 Thread Jason Rutherglen
 Patch #2: Implement a realtime ram index class

 I think this one is optional, or, rather an optimization that we can swap
 in later if/when necessary? Ie for starters little segments are written
 into the main Directory.

John, Zoie could be of use for this patch. In addition, we may want to
implement flushing the IW ram buffer to a RAMDir for reading as M.M.
suggested.

First, though, the IW - IR integration (LUCENE-1516) needs to be
implemented; otherwise it's not possible to properly execute updates
in realtime.


On Fri, Jan 9, 2009 at 5:39 AM, Michael McCandless 
luc...@mikemccandless.com wrote:


 Jason Rutherglen wrote:

  Patch #1: Expose an IndexWriter.getReader method that returns the current
 reader and shares the write lock


 I tentatively like this approach so far...

 That reader is opened using IndexWriter's SegmentInfos instance, so it
 can read segments & deletions that have been flushed but not
 committed.  It's allowed to do its own deletions & norms updating.
 When reopen() is called, it grabs the writer's SegmentInfos again.

  Patch #2: Implement a realtime ram index class


 I think this one is optional, or, rather an optimization that we can
 swap in later if/when necessary?  Ie for starters little segments are
 written into the main Directory.

  Patch #3: Implement realtime transactions in IndexWriter or in a subclass
 of IndexWriter by implementing a createTransaction method that generates a
 realtime Transaction object.  When the transaction is flushed, the
 transaction index modifications are available via the getReader method of
 IndexWriter


 Can't this be layered on top?

 Or... are you looking to add support for multiple transactions in
 flight at once on IndexWriter?

 Mike






Re: Realtime Search

2009-01-12 Thread Jason Rutherglen
Grant,

Do you have a proposal in mind?  It would help to suggest something like
some classes and methods to help understand an alternative to what is being
discussed.

-J

On Fri, Jan 9, 2009 at 12:05 PM, Grant Ingersoll gsing...@apache.org wrote:

 I realize we aren't adding read functionality to the Writer, but it would
 be coupling the Writer to the Reader nonetheless.  I understand it is
 brainstorming (like I said, not trying to distract from the discussion),
 just saying that if the Reader and the Writer both need access to the
 underlying data structures, then we should refactor to make that possible,
 not just glom the Reader onto the Writer.  I suspect if that is done,
 anyway, that it may make the bigger picture a bit clearer, too.


 On Jan 9, 2009, at 2:53 PM, Michael McCandless wrote:


 Grant Ingersoll wrote:

  We've spent a lot of time up until now getting write functionality out of
 the Reader, and now we are going to add read functionality into the Writer?


 Well... we're not really adding read functionality into IW; instead,
 we are asking IW to open the reader for us, except the reader is
 provided the SegmentInfos it should use from IW (instead of trying to
 find the latest segments_N file in the Directory).

 Ie, what IW.getReader returns is an otherwise normal
 MultiSegmentReader.

 The goal is to allow an IndexReader to access segments flushed but
 not yet committed by IW.  These segments are normally private to IW,
 in memory in its SegmentInfos instance.
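
 To make the distinction concrete, here is a toy model of the idea. It is a 
 sketch under stated assumptions only: ToyWriter and its methods are invented 
 for illustration, segments are plain strings, and nothing here reflects how 
 IndexWriter would actually implement getReader. A reader opened from the 
 directory sees committed segments only, while a reader handed out by the 
 writer also sees segments that have been flushed but not yet committed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model with hypothetical names; not the IndexWriter/IndexReader API.
class WriterReaderSketch {
    static class ToyWriter {
        private final List<String> committed = new ArrayList<>(); // visible via the directory
        private final List<String> flushed = new ArrayList<>();   // private to the writer

        void flush(String segment) { flushed.add(segment); }

        void commit() {
            committed.addAll(flushed);
            flushed.clear();
        }

        /** What a reader opened from the directory would see. */
        List<String> openFromDirectory() {
            return new ArrayList<>(committed);
        }

        /** What a reader obtained from the writer would see. */
        List<String> getReader() {
            List<String> segments = new ArrayList<>(committed);
            segments.addAll(flushed);
            return segments;
        }
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        writer.flush("_0");
        writer.commit();
        writer.flush("_1"); // flushed, not committed
        System.out.println(writer.openFromDirectory());
        System.out.println(writer.getReader());
    }
}
```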

 And this is all just thinking-out-loud-brainstorming.  There are still
 many details to work through...

 Mike






[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2009-01-12 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663055#action_12663055
 ] 

Doug Cutting commented on LUCENE-1345:
--

Uwe: Maybe I should create a new JIRA issue out of my suggestion to merge 
Filters and Queries?

+1 to creating a new issue and +1 to the idea.




[jira] Created: (LUCENE-1518) Merge Query and Filter classes

2009-01-12 Thread Uwe Schindler (JIRA)
Merge Query and Filter classes
--

 Key: LUCENE-1518
 URL: https://issues.apache.org/jira/browse/LUCENE-1518
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.4
Reporter: Uwe Schindler


This issue presents a patch that merges Queries and Filters so that the new 
Filter class extends Query. This would make it possible to use every filter 
as a query.

The new abstract Filter class would contain all methods of ConstantScoreQuery 
and deprecate ConstantScoreQuery. Somebody who implements the Filter's 
getDocIdSet()/bits() methods has nothing more to do; the filter could be used 
as a normal query.

I do not want to completely convert Filters to ConstantScoreQueries. The idea 
is to combine Queries and Filters in such a way that every Filter can 
automatically be used in all places where a Query can be used (e.g. also alone 
as a search query without any other constraint). For that, the abstract Query 
methods must be implemented and return a default weight for Filters, which is 
the current ConstantScore logic. If the filter is used as a real filter (where 
the API wants a Filter), the getDocIdSet part could be used directly, and the 
weight is useless (as it is currently, too). The constant score default 
implementation is only used when the Filter is used as a Query (e.g. as direct 
parameter to Searcher.search()). For the special case of BooleanQueries 
combining Filters and Queries, the idea is to optimize the BooleanQuery logic 
so that it detects whether a BooleanClause is a Filter (using instanceof) and 
then uses the Filter API directly instead of taking on the burden of the 
ConstantScoreQuery (see LUCENE-1345).

Here are some ideas on how to implement Searcher.search() with Query and Filter:
- User runs Searcher.search() using a Filter as the only parameter. As every 
Filter is also a ConstantScoreQuery, the query can be executed and returns 
score 1.0 for all matching documents.
- User runs Searcher.search() using a Query as the only parameter: no change, 
everything is the same as before.
- User runs Searcher.search() using a BooleanQuery as parameter: if the 
BooleanQuery does not contain a Query that is a subclass of Filter (the new 
Filter), everything works as usual. If the BooleanQuery contains exactly one 
Filter and nothing else, the Filter is used as a constant score query. If the 
BooleanQuery contains clauses with both Queries and Filters, the new algorithm 
could be used: the queries are executed and the results filtered with the 
filters.

For the user, the main advantage is that he can construct his query using a 
simplified API without thinking about Filters or Queries; clauses can simply 
be combined. The scorer/weight logic then identifies whether to use the filter 
or the query weight API, just like the query optimizer of an RDB.
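
The three cases listed above can be modelled roughly as follows. This is a 
sketch with invented names, not the classes from the attached patch: ToyQuery, 
ToyFilter and ToyBooleanQuery are hypothetical, all clauses are treated as 
required, and a negative score stands for "no match". A filter used alone 
scores a constant 1.0, and inside a boolean query it is detected with 
instanceof and only restricts the matches without contributing to the score.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model with hypothetical names; not the classes from the attached patch.
class MergedFilterQuerySketch {
    static abstract class ToyQuery {
        /** Returns a score, or a negative value if the doc does not match. */
        abstract float score(int doc);
    }

    /** A filter is a query whose default scoring is a constant 1.0. */
    static abstract class ToyFilter extends ToyQuery {
        abstract boolean accept(int doc);

        @Override
        float score(int doc) { return accept(doc) ? 1.0f : -1f; }
    }

    /** All clauses are required. Filters restrict matches but add no score. */
    static class ToyBooleanQuery extends ToyQuery {
        final List<ToyQuery> clauses = new ArrayList<>();

        ToyBooleanQuery add(ToyQuery q) { clauses.add(q); return this; }

        @Override
        float score(int doc) {
            // A lone clause (for example a single filter) is scored by itself.
            if (clauses.size() == 1) return clauses.get(0).score(doc);
            float sum = 0f;
            for (ToyQuery q : clauses) {
                if (q instanceof ToyFilter) {
                    // Filter path: test membership only, skip scoring.
                    if (!((ToyFilter) q).accept(doc)) return -1f;
                } else {
                    float s = q.score(doc);
                    if (s < 0f) return -1f;
                    sum += s;
                }
            }
            return sum;
        }
    }

    static ToyQuery evenDocs() {
        return new ToyQuery() {
            float score(int doc) { return doc % 2 == 0 ? 2.0f : -1f; }
        };
    }

    static ToyFilter below(int limit) {
        return new ToyFilter() {
            boolean accept(int doc) { return doc < limit; }
        };
    }

    public static void main(String[] args) {
        ToyQuery mixed = new ToyBooleanQuery().add(evenDocs()).add(below(10));
        System.out.println(below(10).score(4)); // filter used alone as a query
        System.out.println(mixed.score(4));     // query score, filter adds nothing
        System.out.println(mixed.score(12));    // rejected by the filter
    }
}
```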



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-1518) Merge Query and Filter classes

2009-01-12 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1518:
--

Attachment: LUCENE-1518.patch

This is the patch.

Most tests pass. The only problems are in the tests that check the explanations 
(TestSimpleExplanations) when wrapping the deprecated ConstantScoreQuery. For 
that, the test must be rewritten to get the explanation directly from the Filter 
class (which is now a query). ConstantScoreQuery is just a no-op that rewrites 
to the wrapped Filter itself but returns no weight and no explanation.

One small problem is the handling of toString()/toString(fieldname). The 
abstract Filter class has no toString(fieldname), but Query has one (declared 
abstract). So there must be a default, or Filter implementations must be 
extended to provide one (a backwards-compatibility break). The patch uses a 
poor workaround for that; maybe somebody has a better idea how to handle this.




[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

2009-01-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663069#action_12663069
 ] 

Uwe Schindler commented on LUCENE-1518:
---

Further patches must now remove the deprecated ConstantScoreQuery from the core 
and contrib classes (RangeQuery becomes identical to RangeFilter, and so on).

The rewrite method (inherited from the Filter API) may then be changed in 
RangeQuery to return a BooleanQuery when useConstantScoreRewrite==false. In 
this case the RangeFilter becomes the same as the Query and, when used as a 
Query, can also rewrite to TermQueries. But as far as I know, this is to be 
removed in Lucene 3.0. In that case every RangeFilter and the others can simply 
be constant score queries or filters.




[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2009-01-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663070#action_12663070
 ] 

Uwe Schindler commented on LUCENE-1345:
---

I created and linked a new issue LUCENE-1518, that handles the merge 
suggestion. I also included all relevant comments from me about this.

 Allow Filter as clause to BooleanQuery
 --

 Key: LUCENE-1345
 URL: https://issues.apache.org/jira/browse/LUCENE-1345
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Paul Elschot
Priority: Minor
 Fix For: 2.9

 Attachments: booleansetperf.txt, DisjunctionDISI.java, 
 DisjunctionDISI.patch, DisjunctionDISI.patch, 
 LUCENE-1345-Filter+Query-merge.patch, LUCENE-1345.patch, LUCENE-1345.patch, 
 OpenBitSetIteratorExperiment.java, TestIteratorPerf.java, 
 TestIteratorPerf.java







Re: Realtime Search

2009-01-12 Thread Grant Ingersoll
Just thinking out loud...  haven't looked at your patch yet (one of  
these days I will be back up for air)


My initial thought is that you would have a factory that produced both  
the Reader and the Writer as a pair, or was at least aware of what to  
go get from the Writer


Something like:

interface IndexFactory {
  IndexWriter getWriter();

  IndexReader getReader();

  // Not sure if this is needed yet, but
  IndexReader getReader(IndexWriter writer);
}

The factory (or whatever you want to call it) is responsible for  
making sure the Writer and Reader have the pieces they need, i.e. the  
SegmentInfos.


The first getReader will get you the plain old Reader that everyone  
knows and loves today (assuming there is a benefit to keeping it  
around), the second one knows what to get off the Writer to create the  
appropriate Reader.
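
A minimal sketch of how such a factory might hand out both kinds of reader. Everything here is a hypothetical stand-in (SketchSegmentInfos, SketchWriter, SketchReader, SketchIndexFactory); none of it is the real Lucene API, and segments are modelled as plain strings. The assumption is that the factory owns the shared segment bookkeeping, so neither the writer nor the reader needs a reference to the other.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the shared segment bookkeeping; not Lucene's SegmentInfos.
class SketchSegmentInfos {
    final List<String> committed = new ArrayList<>();
    final List<String> flushed = new ArrayList<>(); // flushed but not yet committed
}

// Hypothetical writer: it only records segment names.
class SketchWriter {
    private final SketchSegmentInfos infos;

    SketchWriter(SketchSegmentInfos infos) { this.infos = infos; }

    void flush(String segment) { infos.flushed.add(segment); }

    void commit() {
        infos.committed.addAll(infos.flushed);
        infos.flushed.clear();
    }
}

// Hypothetical reader: a point-in-time snapshot of segment names.
class SketchReader {
    private final List<String> segments;

    SketchReader(List<String> segments) { this.segments = new ArrayList<>(segments); }

    List<String> segments() { return segments; }
}

// The factory owns the shared state and pairs the writer with its readers.
class SketchIndexFactory {
    private final SketchSegmentInfos infos = new SketchSegmentInfos();
    private final SketchWriter writer = new SketchWriter(infos);

    SketchWriter getWriter() { return writer; }

    // Plain reader: sees committed segments only.
    SketchReader getReader() { return new SketchReader(infos.committed); }

    // Reader paired with the writer: also sees flushed, uncommitted segments.
    SketchReader getReader(SketchWriter w) {
        if (w != writer) {
            throw new IllegalArgumentException("writer was not created by this factory");
        }
        List<String> all = new ArrayList<>(infos.committed);
        all.addAll(infos.flushed);
        return new SketchReader(all);
    }
}
```

With this shape, a caller that flushes a segment without committing it would see that segment only through getReader(writer), while getReader() keeps the committed view.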


It's nothing particularly hard to implement over what you are  
proposing, I don't think.  Just trying to keep the Reader out of the  
Writer from an API cleanliness standpoint.


-Grant


On Jan 12, 2009, at 12:55 PM, Jason Rutherglen wrote:


Grant,

Do you have a proposal in mind?  It would help to suggest something  
like some classes and methods to help understand an alternative to  
what is being discussed.


-J

On Fri, Jan 9, 2009 at 12:05 PM, Grant Ingersoll  
gsing...@apache.org wrote:
I realize we aren't adding read functionality to the Writer, but it  
would be coupling the Writer to the Reader nonetheless.  I  
understand it is brainstorming (like I said, not trying to distract  
from the discussion), just saying that if the Reader and the Writer  
both need access to the underlying data structures, then we should  
refactor to make that possible, not just glom the Reader onto the  
Writer.  I suspect if that is done, anyway, that it may make the  
bigger picture a bit clearer, too.



On Jan 9, 2009, at 2:53 PM, Michael McCandless wrote:


Grant Ingersoll wrote:

We've spent a lot of time up until now getting write functionality  
out of the Reader, and now we are going to add read functionality  
into the Writer?


Well... we're not really adding read functionality into IW; instead,
we are asking IW to open the reader for us, except the reader is
provided the SegmentInfos it should use from IW (instead of trying to
find the latest segments_N file in the Directory).

Ie, what IW.getReader returns is an otherwise normal
MultiSegmentReader.

The goal is to allow an IndexReader to access segments flushed but
not yet committed by IW.  These segments are normally private to IW,
in memory in its SegmentInfos instance.

And this is all just thinking-out-loud-brainstorming.  There are still many
details to work through...

Mike






--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












[jira] Commented: (LUCENE-1345) Allow Filter as clause to BooleanQuery

2009-01-12 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663075#action_12663075
 ] 

Paul Elschot commented on LUCENE-1345:
--

Ok, I'll wait for  LUCENE-1518.




[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

2009-01-12 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663077#action_12663077
 ] 

Marvin Humphrey commented on LUCENE-1518:
-

 the query can be executed and returns score 1.0 for all matching documents

Why 1.0 and not 0.0?  I chose to use 0.0 in KS because then a Filter would 
effectively perform only binary filtering and never affect scores.

Perhaps your envisioned query optimization algo ensures that the Filter will 
only serve as a DocIDSetIterator unless it's used as a top-level Query?  Can we 
agree that a Filter used as a sub-clause in a complex query should not 
contribute to the aggregate score?

 Merge Query and Filter classes
 --

 Key: LUCENE-1518
 URL: https://issues.apache.org/jira/browse/LUCENE-1518
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.4
Reporter: Uwe Schindler
 Attachments: LUCENE-1518.patch


 This issue presents a patch that merges Queries and Filters such that the 
 new Filter class extends Query. This would make it possible to use every 
 filter as a query.
 The new abstract Filter class would contain all methods of 
 ConstantScoreQuery, and ConstantScoreQuery would be deprecated. Anyone who 
 implements the Filter's getDocIdSet()/bits() methods has nothing more to do 
 and can just use the filter as a normal query.
 I do not want to completely convert Filters to ConstantScoreQueries. The idea 
 is to combine Queries and Filters in such a way that every Filter can 
 automatically be used in all places where a Query can be used (e.g. also 
 alone as a search query without any other constraint). For that, the abstract 
 Query methods must be implemented and return a default weight for Filters, 
 which is the current ConstantScore logic. If the filter is used as a real 
 filter (where the API wants a Filter), the getDocIdSet part can be used 
 directly and the weight is unused (as it is currently, too). The constant 
 score default implementation is only used when the Filter is used as a Query 
 (e.g. as a direct parameter to Searcher.search()). For the special case of 
 BooleanQueries combining Filters and Queries, the idea is to optimize the 
 BooleanQuery logic so that it detects whether a BooleanClause is a Filter 
 (using instanceof) and then uses the Filter API directly, without the 
 overhead of the ConstantScoreQuery (see LUCENE-1345).
 Here are some ideas on how to implement Searcher.search() with Query and 
 Filter:
 - User runs Searcher.search() using a Filter as the only parameter. As every 
 Filter is also a ConstantScoreQuery, the query can be executed and returns 
 score 1.0 for all matching documents.
 - User runs Searcher.search() using a Query as the only parameter: no change, 
 everything is the same as before.
 - User runs Searcher.search() using a BooleanQuery as parameter: if the 
 BooleanQuery does not contain a Query that is a subclass of Filter (the new 
 Filter), everything works as usual. If the BooleanQuery contains exactly one 
 Filter and nothing else, the Filter is used as a constant score query. If the 
 BooleanQuery contains clauses with both Queries and Filters, the new 
 algorithm could be used: the queries are executed and the results are 
 filtered with the filters.
 The main advantage for the user is that he can construct his query using a 
 simplified API without thinking about Filters or Queries; clauses can simply 
 be combined. The scorer/weight logic then identifies the cases in which to 
 use the filter or the query weight API, just like the query optimizer of an 
 RDB.
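
The dispatch described in the issue text can be sketched roughly as follows. All types are hypothetical stand-ins (SketchQuery, SketchFilter, SketchScoredQuery, EvenDocsFilter, SketchBooleanQuery), not the classes from the attached patch or from Lucene itself. The sketch assumes that all clauses are required, that scores are summed, and that a query consisting only of filters falls back to the constant score 1.0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for Query: can match a document and score it.
abstract class SketchQuery {
    abstract boolean matches(int doc);

    // Only meaningful when matches(doc) is true.
    abstract float score(int doc);
}

// In the proposal Filter extends Query; the constant score is only a default
// for the case where the filter runs as a query on its own.
abstract class SketchFilter extends SketchQuery {
    @Override
    float score(int doc) { return 1.0f; }
}

// A scoring query backed by a doc -> score table (illustration only).
class SketchScoredQuery extends SketchQuery {
    private final Map<Integer, Float> scores;

    SketchScoredQuery(Map<Integer, Float> scores) { this.scores = scores; }

    @Override
    boolean matches(int doc) { return scores.containsKey(doc); }

    @Override
    float score(int doc) { return scores.get(doc); }
}

// A filter that accepts documents with even IDs (illustration only).
class EvenDocsFilter extends SketchFilter {
    @Override
    boolean matches(int doc) { return doc % 2 == 0; }
}

class SketchBooleanQuery {
    private final List<SketchQuery> required = new ArrayList<>();

    SketchBooleanQuery add(SketchQuery clause) {
        required.add(clause);
        return this;
    }

    // Returns doc -> score for every doc in [0, maxDoc) matching all clauses.
    Map<Integer, Float> search(int maxDoc) {
        List<SketchQuery> scoring = new ArrayList<>();
        List<SketchFilter> filters = new ArrayList<>();
        for (SketchQuery clause : required) {
            if (clause instanceof SketchFilter) {
                filters.add((SketchFilter) clause); // filter API only, no scoring
            } else {
                scoring.add(clause);
            }
        }
        Map<Integer, Float> hits = new TreeMap<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            boolean ok = true;
            float score = 0f;
            for (SketchFilter f : filters) {
                ok &= f.matches(doc);
            }
            for (SketchQuery q : scoring) {
                if (q.matches(doc)) {
                    score += q.score(doc);
                } else {
                    ok = false;
                }
            }
            if (ok) {
                // Filters alone: constant score; otherwise filters add nothing.
                hits.put(doc, scoring.isEmpty() ? 1.0f : score);
            }
        }
        return hits;
    }
}
```

The linear scan over all documents is only there to keep the sketch short; the point is the instanceof split, after which filter clauses restrict the result set without taking part in scoring.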




[jira] Issue Comment Edited: (LUCENE-1518) Merge Query and Filter classes

2009-01-12 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663077#action_12663077
 ] 

creamyg edited comment on LUCENE-1518 at 1/12/09 12:40 PM:
---

 the query can be executed and returns score 1.0 for all matching documents

Why 1.0 and not 0.0?  I chose to use 0.0 in KS because then a Filter would 
effectively perform only binary filtering and never affect scores.

Perhaps your envisioned query optimization algo ensures that the Filter will 
only serve as a DocIDSetIterator unless it's used as a top-level Query?  Can we 
agree that a Filter used as a sub-clause in a complex query should not 
contribute to the aggregate score?

UPDATE: A closer reading reveals that the original issue text contains the 
answer to this question (we agree), so the only remaining question is whether a 
top level query would return hits with scores of 0.0 or 1.0. Which is probably 
a bikeshed painting issue.




[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

2009-01-12 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663090#action_12663090
 ] 

Doug Cutting commented on LUCENE-1518:
--

 Why 1.0 and not 0.0?

0.0 does seem more appropriate, since scores are typically added, not 
multiplied.  There used to be places that filtered anything with a 0.0 score 
however.  Are any of those left?
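
As a small worked example of the additive argument: the sketch below simply sums clause scores, which is an assumption that ignores coord factors and normalization. AdditiveScoreDemo and aggregate are hypothetical names used for illustration only.

```java
// Hypothetical helper: sums the scores of the clauses that matched a document.
class AdditiveScoreDemo {
    static float aggregate(float... clauseScores) {
        float sum = 0f;
        for (float s : clauseScores) {
            sum += s;
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] termScores = {0.5f, 0.25f};
        // Filter clause scored 0.0: the aggregate equals the sum of the term scores.
        System.out.println(aggregate(termScores[0], termScores[1], 0.0f));
        // Filter clause scored 1.0: every hit is shifted upwards by 1.0.
        System.out.println(aggregate(termScores[0], termScores[1], 1.0f));
    }
}
```

Under a purely additive model, 0.0 is the neutral element, so a filter clause scored that way leaves the aggregate untouched.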





[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

2009-01-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663091#action_12663091
 ] 

Uwe Schindler commented on LUCENE-1518:
---

bq. Why 1.0 and not 0.0? I chose to use 0.0 in KS because then a Filter would 
effectively perform only binary filtering and never affect scores.

You are right. I was coming from the ConstantScoreQuery that has a score of 1.0 
in the result.

{quote}
Perhaps your envisioned query optimization algo ensures that the Filter will 
only serve as a DocIDSetIterator unless it's used as a top-level Query? Can we 
agree that a Filter used as a sub-clause in a complex query should not 
contribute to the aggregate score?

UPDATE: A closer reading reveals that the original issue text contains the 
answer to this question (we agree), so the only remaining question is whether a 
top level query would return hits with scores of 0.0 or 1.0. Which is probably 
a bikeshed painting issue.
{quote}

This is exactly what I was thinking about. But it does not apply only to a 
top-level query. In the case of a BooleanQuery containing only one clause that 
is a filter, the constant score implementation must also be used (as a Filter 
alone is useless).

ConstantScoreQuery returns 1.0 if executed alone or alone in a BooleanQuery.

For backwards compatibility, I think we should just use the following logic:
A filter remains a filter, but it can also be used as a query (if alone). The 
default implementation for this is a constant score query, as is currently the 
case when wrapping a filter with ConstantScoreQuery. In all other cases 
(combined with other boolean clauses and not alone), the score calculation is 
removed, which is equivalent to a score of 0.0f (if scores are additive).


[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

2009-01-12 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663120#action_12663120
 ] 

Eks Dev commented on LUCENE-1518:
-

nice, 
you did it top down (api), Paul takes it bottom up (speed). 

this makes some really crazy things possible, e.g. implementing normal 
TermQuery as a DirectFilter and when the optimization of the BooleanQuery 
gets done (no Score calculation, direct usage of DocIdSetIterators) you can 
speed up some queries containing TermQuery  without really instantiating 
Filter. Of course only for cases where tf/idf/norm can be ignored. 

Kind of middle-ground between Filter and full ranked TermQuery (better said any 
BooleanQuery!), Faster than ranked case due to the switched off score 
calculation and more comfortable than Filter usage, no instantiation of 
DocIdSet-s... 

very nice indeed, smooth mix between ranked and pure boolean model with both 
benefits.  
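
To illustrate the "no score calculation, direct usage of DocIdSetIterators" case, here is a rough sketch of an unscored conjunction over two sorted doc-ID streams. SketchDocIdIterator is a simplified, hypothetical iterator over an int array, not Lucene's DocIdSetIterator, and UnscoredConjunction is likewise an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified doc-ID iterator over a sorted array of doc IDs.
class SketchDocIdIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;
    private int pos = 0;

    SketchDocIdIterator(int... sortedDocs) { this.docs = sortedDocs; }

    // Moves forward to the first doc ID >= target and returns it,
    // or NO_MORE_DOCS when the stream is exhausted.
    int advance(int target) {
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}

class UnscoredConjunction {
    // Collects doc IDs present in both streams; no tf/idf/norm is consulted.
    static List<Integer> intersect(SketchDocIdIterator a, SketchDocIdIterator b) {
        List<Integer> hits = new ArrayList<>();
        int da = a.advance(0);
        int db = b.advance(0);
        while (da != SketchDocIdIterator.NO_MORE_DOCS
                && db != SketchDocIdIterator.NO_MORE_DOCS) {
            if (da == db) {
                hits.add(da);
                da = a.advance(da + 1);
                db = b.advance(db + 1);
            } else if (da < db) {
                da = a.advance(db); // skip a ahead to b's current doc
            } else {
                db = b.advance(da); // skip b ahead to a's current doc
            }
        }
        return hits;
    }
}
```

For example, intersecting the streams 1, 3, 5, 8, 9 and 2, 3, 8, 10 yields 3 and 8. Because each side only skips forward to the other side's current document, no per-document score and no materialized DocIdSet is needed.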




[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

2009-01-12 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663131#action_12663131
 ] 

Paul Elschot commented on LUCENE-1518:
--

bq. There used to be places that filtered anything with a 0.0 score however. 
Are any of those left?
They should be gone by now, even from the javadocs.




[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

2009-01-12 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12663133#action_12663133
 ] 

Jason Rutherglen commented on LUCENE-1518:
--

Hopefully this will allow caching of individual term queries even if they are a 
part of a boolean query?

 Merge Query and Filter classes
 --

 Key: LUCENE-1518
 URL: https://issues.apache.org/jira/browse/LUCENE-1518
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.4
Reporter: Uwe Schindler
 Attachments: LUCENE-1518.patch


 This issue presents a patch that merges Queries and Filters such that the
 new Filter class extends Query. This would make it possible to use every
 Filter as a Query.
 The new abstract Filter class would contain all methods of
 ConstantScoreQuery, and ConstantScoreQuery would be deprecated. Somebody who
 implements the Filter's getDocIdSet()/bits() methods has nothing more to do
 and can use the Filter as a normal Query.
 I do not want to completely convert Filters to ConstantScoreQueries. The idea
 is to combine Queries and Filters in such a way that every Filter can
 automatically be used wherever a Query can be used (e.g. alone, as a search
 query without any other constraint). For that, the abstract Query methods
 must be implemented and return a default Weight for Filters, which is the
 current ConstantScoreQuery logic. If the Filter is used as a real filter
 (where the API wants a Filter), the getDocIdSet() part can be used directly
 and the Weight is not needed (as is the case currently, too). The constant
 score default implementation is only used when the Filter is used as a Query
 (e.g. as the direct parameter to Searcher.search()). For the special case of
 BooleanQueries combining Filters and Queries, the idea is to optimize the
 BooleanQuery logic so that it detects whether a BooleanClause is a Filter
 (using instanceof) and then uses the Filter API directly, avoiding the
 overhead of ConstantScoreQuery (see LUCENE-1345).
 Here are some ideas on how to implement Searcher.search() with Query and
 Filter:
 - User runs Searcher.search() using a Filter as the only parameter: As every
 Filter is also a ConstantScoreQuery, the query can be executed and returns
 score 1.0 for all matching documents.
 - User runs Searcher.search() using a Query as the only parameter: No change,
 everything is the same as before.
 - User runs Searcher.search() using a BooleanQuery as parameter: If the
 BooleanQuery does not contain a Query that is a subclass of Filter (the new
 Filter), everything works as usual. If the BooleanQuery contains exactly one
 Filter and nothing else, the Filter is used as a constant score query. If the
 BooleanQuery contains clauses with both Queries and Filters, the new
 algorithm could be used: the Queries are executed and the results are
 filtered with the Filters.
 The main advantage for the user is that he can construct his query using a
 simplified API without thinking about Filters or Queries; he can just
 combine clauses. The scorer/weight logic then identifies the cases in which
 to use the Filter API or the Query weight API, just like the query optimizer
 of a relational database.
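
A minimal sketch of the "Filter extends Query" idea described above, using toy
stand-in classes rather than Lucene's real API. SketchQuery, SketchFilter and
EvenDocsFilter are hypothetical names introduced only for illustration, and
the scoring model (one float per document, 0 meaning "no match") is an
assumption made to keep the example short. It shows that an implementor only
supplies the bit set and gets constant-score query behaviour by default.

```java
import java.util.BitSet;

// Toy stand-in, NOT Lucene's Query: scores one document, 0 means "no match".
abstract class SketchQuery {
    abstract float score(int doc, int maxDoc);
}

// The proposed shape: a filter is a query whose default scoring is constant.
abstract class SketchFilter extends SketchQuery {
    // The only method an implementor has to provide (cf. getDocIdSet()/bits()).
    abstract BitSet bits(int maxDoc);

    // Default "constant score" behaviour: every matching document scores 1.0.
    @Override
    float score(int doc, int maxDoc) {
        return bits(maxDoc).get(doc) ? 1.0f : 0.0f;
    }
}

// Hypothetical example filter: matches the even document numbers.
class EvenDocsFilter extends SketchFilter {
    @Override
    BitSet bits(int maxDoc) {
        BitSet set = new BitSet(maxDoc);
        for (int d = 0; d < maxDoc; d += 2) {
            set.set(d);
        }
        return set;
    }
}

class FilterAsQueryDemo {
    public static void main(String[] args) {
        // A filter passed where a query is expected.
        SketchQuery q = new EvenDocsFilter();
        for (int doc = 0; doc < 4; doc++) {
            System.out.println("doc " + doc + " score " + q.score(doc, 4));
        }
    }
}
```

When the same object is used as a real filter, only bits() would be consulted
and the default scoring would never run.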




[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

2009-01-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663134#action_12663134
 ] 

Uwe Schindler commented on LUCENE-1518:
---

In my opinion, both approaches could be combined. I do not know how the 
scoring and the whole BooleanQuery work; I only did the merging of the API. If 
we have consensus that this is a good idea, I could remove the rest of the 
deprecated ConstantScoreQuery. The merge then works without any problems and 
is backwards compatible.

Then Paul could adapt his patch so that it does not create new methods for 
adding Filter clauses to boolean queries, but instead just makes a distinction 
like this:

if (clause.query instanceof Filter) { do only filter optimization } else { do 
conventional boolean scoring }

If he cannot do the optimization (because the Filter is alone in the 
BooleanQuery), he could just fall back to the standard query logic (which 
implicitly uses the current ConstantScoreQuery algorithm).
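
The instanceof dispatch and the fallback could look roughly like the sketch
below. This is a toy model and not Paul's patch or Lucene's BooleanQuery:
ToyQuery, ToyFilter, FixedBitsFilter and ToyBooleanQuery are hypothetical
names, only required clauses are modelled, and a score of 0 is assumed to mean
"no match".

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy stand-in, NOT Lucene's Query: score of a doc, 0 means "no match".
interface ToyQuery {
    float score(int doc);
}

// A filter is also a query; used as a query it scores every match 1.0.
abstract class ToyFilter implements ToyQuery {
    abstract BitSet bits();

    @Override
    public float score(int doc) {
        return bits().get(doc) ? 1.0f : 0.0f;
    }
}

// Hypothetical filter backed by a fixed set of document numbers.
class FixedBitsFilter extends ToyFilter {
    private final BitSet bits = new BitSet();

    FixedBitsFilter(int... docs) {
        for (int d : docs) {
            bits.set(d);
        }
    }

    @Override
    BitSet bits() {
        return bits;
    }
}

// Conjunction of required clauses only, to keep the sketch short.
class ToyBooleanQuery implements ToyQuery {
    private final List<ToyQuery> required = new ArrayList<>();

    ToyBooleanQuery add(ToyQuery clause) {
        required.add(clause);
        return this;
    }

    @Override
    public float score(int doc) {
        // The filter optimization only applies if some clause really scores.
        boolean hasScoringClause = false;
        for (ToyQuery clause : required) {
            if (!(clause instanceof ToyFilter)) {
                hasScoringClause = true;
            }
        }
        float sum = 0f;
        for (ToyQuery clause : required) {
            if (hasScoringClause && clause instanceof ToyFilter) {
                // Filter path: restrict the match set, contribute no score.
                if (!((ToyFilter) clause).bits().get(doc)) {
                    return 0f;
                }
            } else {
                // Conventional path; also the fallback when only filters exist.
                float s = clause.score(doc);
                if (s == 0f) {
                    return 0f;
                }
                sum += s;
            }
        }
        return sum;
    }
}

class InstanceofDispatchDemo {
    public static void main(String[] args) {
        ToyQuery text = doc -> doc < 3 ? 2.5f : 0f; // stands in for a scoring query
        ToyQuery mixed = new ToyBooleanQuery().add(text).add(new FixedBitsFilter(1, 2, 5));
        for (int doc = 0; doc < 6; doc++) {
            System.out.println("doc " + doc + " score " + mixed.score(doc));
        }
    }
}
```

In the mixed case the filter never touches the score, and a BooleanQuery that
holds only filters falls back to their constant-score behaviour.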

But before I start implementing more in order to remove the deprecated class, 
I wanted to hear some more ideas, maybe something completely different (e.g. 
every query can also automatically be a filter): only one superclass Query 
and no Filter anymore; every query clause can implement a filter and/or a 
query, always with a fallback to the other side (if no filter implementation 
is provided, a filter is provided by an algorithm like QueryFilter; if no 
weight/rewrite is provided, a constant score weight is provided).

Just ideas...





[jira] Issue Comment Edited: (LUCENE-1518) Merge Query and Filter classes

2009-01-12 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663134#action_12663134
 ] 

thetaphi edited comment on LUCENE-1518 at 1/12/09 2:41 PM:


In my opinion, both approaches could be combined. I do not know how the 
scoring and the whole BooleanQuery work; I only did the merging of the API. If 
we have consensus that this is a good idea, I could remove the rest of the 
deprecated ConstantScoreQuery. The new code then works without any problems 
and is backwards compatible as before, but with no optimization.

Then Paul could adapt his patch so that it does not create new methods for 
adding Filter clauses to BooleanQueries, but instead just makes a distinction 
in the overall BooleanQuery logic like this:

if (clause.query instanceof Filter) { do only filter optimization from Paul's 
patch } else { do conventional boolean scoring }

If he cannot do the optimization (because the Filter is alone in the 
BooleanQuery), he could just fall back to the standard query logic (which 
implicitly uses the current ConstantScoreQuery algorithm).

But before I start implementing more in order to remove the deprecated class, 
I wanted to hear some more ideas, maybe something completely different (e.g. 
every query can also automatically be a filter): only one superclass Query 
and no Filter anymore; every query clause can implement a filter and/or a 
query, always with a fallback to the other side (if no filter implementation 
is provided, a filter is provided by an algorithm like QueryFilter; if no 
weight/rewrite is provided, a constant score weight is provided).
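
The "single superclass with a fallback to the other side" idea might be
sketched as follows. Clause, ScoringOnlyClause and FilterOnlyClause are
hypothetical names, not Lucene classes, and the sketch assumes a subclass
overrides at least one of the two methods (overriding neither would recurse
forever).

```java
import java.util.BitSet;

// Hypothetical single superclass, NOT a Lucene class. Each side has a default
// that is derived from the other side.
abstract class Clause {
    final int maxDoc;

    Clause(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    // Default scoring side: constant score 1.0 for every doc in the filter.
    float score(int doc) {
        return docIdSet().get(doc) ? 1.0f : 0.0f;
    }

    // Default filter side: collect every doc with a nonzero score, roughly
    // the job the text assigns to "an algorithm like QueryFilter".
    BitSet docIdSet() {
        BitSet set = new BitSet(maxDoc);
        for (int d = 0; d < maxDoc; d++) {
            if (score(d) > 0f) {
                set.set(d);
            }
        }
        return set;
    }
}

// Provides scoring only; the filter side comes from the fallback.
class ScoringOnlyClause extends Clause {
    ScoringOnlyClause(int maxDoc) {
        super(maxDoc);
    }

    @Override
    float score(int doc) {
        return doc % 3 == 0 ? 0.5f * (doc + 1) : 0f;
    }
}

// Provides a filter only; the scoring side comes from the fallback.
class FilterOnlyClause extends Clause {
    FilterOnlyClause(int maxDoc) {
        super(maxDoc);
    }

    @Override
    BitSet docIdSet() {
        BitSet set = new BitSet(maxDoc);
        set.set(1);
        set.set(4);
        return set;
    }
}

class FallbackDemo {
    public static void main(String[] args) {
        System.out.println(new ScoringOnlyClause(6).docIdSet());
        System.out.println(new FilterOnlyClause(6).score(4));
    }
}
```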

Just ideas...

 Merge Query and Filter classes
 --

 Key: LUCENE-1518
 URL: https://issues.apache.org/jira/browse/LUCENE-1518
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.4
Reporter: Uwe Schindler
 Attachments: LUCENE-1518.patch


 This issue presents a patch that merges Queries and Filters such that the
 new Filter class extends Query. This would make it possible to use every
 Filter as a Query.
 The new abstract Filter class would contain all methods of
 ConstantScoreQuery, and ConstantScoreQuery would be deprecated. Somebody who
 implements the Filter's getDocIdSet()/bits() methods has nothing more to do
 and can use the Filter as a normal Query.
 I do not want to completely convert Filters to ConstantScoreQueries. The idea
 is to combine Queries and Filters in such a way that every Filter can
 automatically be used wherever a Query can be used (e.g. alone, as a search
 query without any other constraint). For that, the abstract Query methods
 must be implemented and return a default Weight for Filters, which is the
 current ConstantScoreQuery logic. If the Filter is used as a real filter
 (where the API wants a Filter), the getDocIdSet() part can be used directly
 and the Weight is not needed (as is the case currently, too). The constant
 score default implementation is only used when the Filter is used as a Query
 (e.g. as the direct parameter to Searcher.search()). For the special case of
 BooleanQueries combining Filters and Queries, the idea is to optimize the
 BooleanQuery logic so that it detects whether a BooleanClause is a Filter
 (using instanceof) and then uses the Filter API directly, avoiding the
 overhead of ConstantScoreQuery (see LUCENE-1345).
 Here are some ideas on how to implement Searcher.search() with Query and
 Filter:
 - User runs Searcher.search() using a Filter as the only parameter. As