[ 
https://issues.apache.org/jira/browse/LUCENE-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-4548:
----------------------------------

    Attachment: LUCENE-4548.patch

Here is a patch that demonstrates the idea and can be used for perf testing. 
The default is unchanged; if you use new BooleanFilter(true), you get an 
instance that passes the acceptDocs down to every filter clause and, in the 
MUST case, also further restricts the acceptDocs with the current bit set.

In any case, if you have *only* MUST clauses, don't use BooleanFilter, as it is 
much more expensive than chaining the filters with FilteredQuery.
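
To make the MUST case concrete, here is a minimal, Lucene-free sketch of the idea. It is not the code from the patch: Clause, CountingClause and intersectMust are hypothetical names, and java.util.BitSet stands in for Lucene's Bits/DocIdSet. With passDown enabled, each later MUST clause only examines documents that survived the earlier clauses.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.function.IntPredicate;

public class AcceptDocsSketch {

    /** Hypothetical stand-in for a filter clause; acceptDocs == null means "consider every document". */
    public interface Clause {
        BitSet docIdSet(int maxDoc, BitSet acceptDocs);
    }

    /** A clause with a per-document check that counts how many documents it had to examine. */
    public static final class CountingClause implements Clause {
        private final IntPredicate matches;
        public int examined = 0;

        public CountingClause(IntPredicate matches) {
            this.matches = matches;
        }

        @Override
        public BitSet docIdSet(int maxDoc, BitSet acceptDocs) {
            BitSet result = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (acceptDocs != null && !acceptDocs.get(doc)) {
                    continue; // skipped without running the (expensive) check
                }
                examined++;
                if (matches.test(doc)) {
                    result.set(doc);
                }
            }
            return result;
        }
    }

    /** Intersects MUST clauses; if passDown is set, each clause sees the bits accumulated so far. */
    public static BitSet intersectMust(List<? extends Clause> clauses, int maxDoc,
                                       BitSet acceptDocs, boolean passDown) {
        BitSet current = null; // running intersection of all clauses seen so far
        for (Clause clause : clauses) {
            BitSet restriction = null;
            if (passDown) {
                // First clause gets the plain acceptDocs, later ones the narrowed set.
                restriction = (current != null) ? current : acceptDocs;
            }
            BitSet bits = clause.docIdSet(maxDoc, restriction);
            if (current == null) {
                current = (BitSet) bits.clone();
            } else {
                current.and(bits);
            }
        }
        if (current == null) {
            return new BitSet(maxDoc); // no clauses: match nothing
        }
        if (acceptDocs != null) {
            current.and(acceptDocs); // final filtering step, needed when nothing was passed down
        }
        return current;
    }

    public static void main(String[] args) {
        int maxDoc = 100;
        BitSet liveDocs = new BitSet(maxDoc);
        liveDocs.set(0, maxDoc);
        liveDocs.clear(0); // pretend document 0 is deleted
        for (boolean passDown : new boolean[] {false, true}) {
            CountingClause first = new CountingClause(doc -> doc % 10 == 0);
            CountingClause second = new CountingClause(doc -> doc % 20 == 0);
            BitSet hits = intersectMust(Arrays.asList(first, second), maxDoc, liveDocs, passDown);
            System.out.println("passDown=" + passDown + " hits=" + hits
                    + " second clause examined " + second.examined + " docs");
        }
    }
}
```

Both modes return the same hits; only the number of documents the second clause has to examine differs, which is the effect worth measuring in the perf tests.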
                
> BooleanFilter should optionally pass down further restricted acceptDocs in 
> the MUST case (and acceptDocs in general)
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4548
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4548
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Uwe Schindler
>         Attachments: LUCENE-4548.patch
>
>
> Spin-off from dev@lao:
> {quote}
> bq. I am about to write a Filter that only operates on a set of documents 
> that have already passed other filter(s).  It's rather expensive, since it 
> has to use DocValues to examine a value and then determine if it's a match.  
> So it scales O(n) where n is the number of documents it must see.  The 2nd 
> arg of getDocIdSet is Bits acceptDocs.  Unfortunately Bits doesn't have an 
> int iterator, but I can deal with that by checking whether it extends DocIdSet.
> bq. I'm looking at BooleanFilter which I want to use and I notice that it 
> passes null to filter.getDocIdSet for acceptDocs, and it justifies this with 
> the following comment:
> bq. // we dont pass acceptDocs, we will filter at the end using an additional 
> filter
> Passing down the already built bits for the MUST case is a good idea and 
> can be implemented easily.
> The reason the acceptDocs were not passed down is the new way filters 
> work in Lucene 4.0, and to optimize caching. Accept docs are the only 
> thing that changes when deletions are applied, and filters are required to 
> handle them separately: whenever something is able to cache (e.g. 
> CachingWrapperFilter), the acceptDocs are not cached, so the underlying 
> filters get a null acceptDocs to produce the full bitset, and the filtering is 
> done when CachingWrapperFilter gets the "up-to-date" acceptDocs. For this 
> case, however, it does not matter that the first filter clause gets no 
> acceptDocs; later MUST clauses can of course get them (they are not 
> deletion-specific)!
> Can you open an issue to optimize the MUST case (possibly MUST_NOT, too)?
> Another thing that could help here: You can stop using BooleanFilter if you 
> can apply the filters sequentially (only MUST clauses) by wrapping with 
> multiple FilteredQuery: new FilteredQuery(new FilteredQuery(originalQuery, 
> clause1), clause2). If the DocIdSets support bits() and the FilteredQuery 
> autodetection decides to use random access filters, the acceptDocs are also 
> passed down from the outer query to the inner one, removing the documents 
> already filtered out.
> {quote}
> Maybe BooleanFilter should have 2 modes (boolean ctor argument): passing down 
> the acceptDocs to every filter (for the case where Filter calculation is 
> expensive and accept docs help to limit the calculations), or not passing 
> them down (for the case where the filter is cheap and the repeated acceptDocs 
> bit checks in every single filter cost more than they save, e.g. when the 
> Filter is only a cached bitset). The first mode would also optimize the 
> MUST/MUST_NOT case by passing the further restricted acceptDocs down to later 
> filters (just like FilteredQuery does).
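
The caching argument in the quoted discussion can be illustrated with a small, Lucene-free model. This is only a sketch under assumptions: CachedFilterSketch is a hypothetical class, java.util.BitSet stands in for the cached DocIdSet, and the real CachingWrapperFilter is not reproduced here. The full bitset is computed once without any acceptDocs, and the current acceptDocs are applied on every lookup, so deletions never invalidate the cache.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

public class CachedFilterSketch {
    private final IntPredicate matches;
    private final int maxDoc;
    private BitSet cached;           // deletion-independent result, computed once
    public int computations = 0;     // how often the full bitset was built

    public CachedFilterSketch(IntPredicate matches, int maxDoc) {
        this.matches = matches;
        this.maxDoc = maxDoc;
    }

    /** Returns the matching docs restricted to acceptDocs (null means no restriction). */
    public BitSet docIdSet(BitSet acceptDocs) {
        if (cached == null) {
            // The underlying filter is evaluated as if acceptDocs were null.
            computations++;
            cached = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (matches.test(doc)) {
                    cached.set(doc);
                }
            }
        }
        // Deletions are applied afterwards, on a copy, so the cache stays reusable.
        BitSet result = (BitSet) cached.clone();
        if (acceptDocs != null) {
            result.and(acceptDocs);
        }
        return result;
    }

    public static void main(String[] args) {
        CachedFilterSketch evenDocs = new CachedFilterSketch(doc -> doc % 2 == 0, 10);
        BitSet liveDocs = new BitSet(10);
        liveDocs.set(0, 10);
        System.out.println("before delete: " + evenDocs.docIdSet(liveDocs));
        liveDocs.clear(4); // a deletion changes only the acceptDocs
        System.out.println("after delete:  " + evenDocs.docIdSet(liveDocs));
        System.out.println("bitset built " + evenDocs.computations + " time(s)");
    }
}
```

This is the cheap-filter case from the second mode: the per-lookup cost is a single bitset intersection, so pushing acceptDocs into the filter itself would add bit checks without saving any work.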

