[ 
https://issues.apache.org/jira/browse/LUCENE-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495118#comment-13495118
 ] 

David Smiley commented on LUCENE-4548:
--------------------------------------

Uwe: Thanks tremendously for helping me understand some of these things.

I never looked at FilteredQuery before, but now that I have, I like it a whole 
lot.  I like that I can compose them and pick the "filter strategy".  That 
pretty much addresses one of my concerns about how to compose them given 
different algorithms, and the solution looks very well designed.

It's especially interesting to me that Filter logic isn't in IndexSearcher 
anymore.  Not that I knew before, but what this basically tells me is that 
Lucene internally just deals with Queries, not Filters.  That's fantastic.  
Based on your remarks, it seems there is a lot of cleanup potential.  There is 
a lot of stuff to confuse people, like me, and that was a big part of my 
concern.

RE OpenBitSet -- it's neither deprecated nor marked as internal.  FixedBitSet, 
on the other hand, is marked as internal.
                
> BooleanFilter should optionally pass down further restricted acceptDocs in 
> the MUST case (and acceptDocs in general)
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4548
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4548
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Uwe Schindler
>         Attachments: LUCENE-4548.patch
>
>
> Spin-off from dev@lao:
> {quote}
> bq. I am about to write a Filter that only operates on a set of documents 
> that have already passed other filter(s).  It's rather expensive, since it 
> has to use DocValues to examine a value and then determine if it's a match.  
> So it scales O(n) where n is the number of documents it must see.  The 2nd 
> arg of getDocIdSet is Bits acceptDocs.  Unfortunately Bits doesn't have an 
> int iterator, but I can deal with that by checking whether it extends 
> DocIdSet.
> bq. I'm looking at BooleanFilter which I want to use and I notice that it 
> passes null to filter.getDocIdSet for acceptDocs, and it justifies this with 
> the following comment:
> bq. // we dont pass acceptDocs, we will filter at the end using an additional 
> filter
> The idea of passing the already built bits for the MUST case is a good one 
> and can be implemented easily.
> The reason the acceptDocs were not passed down is the new way filters work 
> in Lucene 4.0, and to optimize caching. The acceptDocs are the only thing 
> that changes when deletions are applied, and filters are required to handle 
> them separately: whenever something is able to cache (e.g. 
> CachingWrapperFilter), the acceptDocs are not cached, so the underlying 
> filters get a null acceptDocs to produce the full bitset, and the filtering 
> is done when CachingWrapperFilter gets the up-to-date acceptDocs. For this 
> case, though, it does not matter if the first filter clause does not get 
> acceptDocs; later MUST clauses can of course get them (they are not 
> deletion-specific)!
> Can you open an issue to optimize the MUST case (possibly MUST_NOT, too)?
> Another thing that could help here: You can stop using BooleanFilter if you 
> can apply the filters sequentially (only MUST clauses) by wrapping with 
> multiple FilteredQuery: new FilteredQuery(new FilteredQuery(originalQuery, 
> clause1), clause2). If the DocIdSets enable bits() and the FilteredQuery 
> autodetection decides to use random access filters, the acceptdocs are also 
> passed down from the outside to the inner, removing the documents filtered 
> out.
> {quote}
> Maybe BooleanFilter should have 2 modes (boolean ctor argument): passing down 
> the acceptDocs to every filter (for the case where Filter calculation is 
> expensive and acceptDocs help to limit the calculations) or not passing them 
> down (if the filter is cheap and the repeated acceptDocs bit checks in every 
> single filter would cost more than they save, e.g. when the Filter is only a 
> cached bitset). The first mode would also optimize the MUST/MUST_NOT case by 
> passing down the further restricted acceptDocs to later filters (just like 
> FilteredQuery does).
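
To make the two proposed modes concrete, here is a minimal sketch of how
MUST clauses could be intersected with and without passing acceptDocs down.
It is a toy model built on java.util.BitSet, not Lucene code: SimpleFilter,
CountingFilter, intersectMust and the passDown flag are all hypothetical
names invented for illustration, and the actual patch may well look different.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model only: SimpleFilter, CountingFilter and intersectMust are
// hypothetical names, not Lucene classes or methods.
class AcceptDocsSketch {

    /** Stand-in for a filter; acceptDocs == null means "no restriction". */
    interface SimpleFilter {
        BitSet getDocIdSet(int maxDoc, BitSet acceptDocs);
    }

    /** Per-document filter that counts how many documents it had to examine. */
    static final class CountingFilter implements SimpleFilter {
        private final IntPredicate matcher;
        int examined = 0;

        CountingFilter(IntPredicate matcher) {
            this.matcher = matcher;
        }

        @Override
        public BitSet getDocIdSet(int maxDoc, BitSet acceptDocs) {
            BitSet result = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (acceptDocs != null && !acceptDocs.get(doc)) {
                    continue; // already ruled out, skip the per-document check
                }
                examined++;
                if (matcher.test(doc)) {
                    result.set(doc);
                }
            }
            return result;
        }
    }

    /**
     * ANDs all MUST clauses. With passDown == true, the first clause sees
     * acceptDocs and each later clause sees the bits that survived so far.
     * With passDown == false, every clause sees null and acceptDocs is only
     * applied at the end.
     */
    static BitSet intersectMust(List<? extends SimpleFilter> clauses, int maxDoc,
                                BitSet acceptDocs, boolean passDown) {
        BitSet current = null; // null until the first clause has run
        for (SimpleFilter clause : clauses) {
            BitSet restriction = null;
            if (passDown) {
                restriction = (current != null) ? current : acceptDocs;
            }
            BitSet bits = clause.getDocIdSet(maxDoc, restriction);
            if (current == null) {
                current = bits;
            } else {
                current.and(bits);
            }
        }
        if (current == null) { // no clauses: every document matches
            current = new BitSet(maxDoc);
            current.set(0, maxDoc);
        }
        if (acceptDocs != null) {
            current.and(acceptDocs); // final filtering, a no-op when passed down
        }
        return current;
    }

    public static void main(String[] args) {
        int maxDoc = 1000;
        for (boolean passDown : new boolean[] {false, true}) {
            CountingFilter cheap = new CountingFilter(doc -> doc % 10 == 0);
            CountingFilter costly = new CountingFilter(doc -> doc % 20 == 0);
            BitSet hits = intersectMust(Arrays.asList(cheap, costly), maxDoc, null, passDown);
            System.out.println("passDown=" + passDown + " hits=" + hits.cardinality()
                    + " costlyExamined=" + costly.examined);
        }
    }
}
```

In this toy setup both modes return the same 50 hits, but the second clause
examines 1000 documents without pass-down and only 100 with it. The price is
one extra bit check per document in every clause, which is the trade-off the
description mentions for cheap filters such as cached bitsets.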

