[ https://issues.apache.org/jira/browse/LUCENE-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494858#comment-13494858 ]
Eks Dev commented on LUCENE-4548:
---------------------------------

...would be to nuke Filters completely from Lucene ...

User +1

Filter is conceptually nothing more than no-scoring and a possibility to
have an implementation that can be cached. From the user API point of view,
there is really no need to bother users with the Filter abstraction. Both of
these are just attributes of the query (do you need to score this clause, or
would you like to have it cached).

> BooleanFilter should optionally pass down further restricted acceptDocs in the MUST case (and acceptDocs in general)
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4548
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4548
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Uwe Schindler
>         Attachments: LUCENE-4548.patch
>
>
> Spin-off from dev@lao:
> {quote}
> bq. I am about to write a Filter that only operates on a set of documents
> that have already passed other filter(s). It's rather expensive, since it
> has to use DocValues to examine a value and then determine if it's a match.
> So it scales O(n) where n is the number of documents it must see. The 2nd
> arg of getDocIdSet is Bits acceptDocs. Unfortunately Bits doesn't have an
> int iterator but I can deal with that seeing if it extends DocIdSet.
> bq. I'm looking at BooleanFilter which I want to use and I notice that it
> passes null to filter.getDocIdSet for acceptDocs, and it justifies this with
> the following comment:
> bq. // we dont pass acceptDocs, we will filter at the end using an additional
> filter
> The idea of passing the already built bits for the MUST case is a good one
> and can be implemented easily.
> The reason the acceptDocs were not passed down is the new way filters work
> in Lucene 4.0, and to optimize caching.
> Accept docs are the only thing that changes when deletions are applied, and
> filters are required to handle them separately: whenever something is able
> to cache (e.g. CachingWrapperFilter), the acceptDocs are not cached, so the
> underlying filters get a null acceptDocs to produce the full bitset, and the
> filtering is done when CachingWrapperFilter gets the "up-to-date"
> acceptDocs. But for this case it does not matter if the first filter clause
> does not get acceptDocs; later MUST clauses can of course get them (they
> are not deletion-specific)!
> Can you open an issue to optimize the MUST case (possibly MUST_NOT, too)?
> Another thing that could help here: you can stop using BooleanFilter if you
> can apply the filters sequentially (only MUST clauses) by wrapping with
> multiple FilteredQuery: new FilteredQuery(new FilteredQuery(originalQuery,
> clause1), clause2). If the DocIdSets enable bits() and the FilteredQuery
> autodetection decides to use random access filters, the acceptDocs are also
> passed down from the outside to the inner, removing the documents filtered
> out.
> {quote}
> Maybe BooleanFilter should have 2 modes (Boolean ctor argument): passing
> down the acceptDocs to every filter (for the case where Filter calculation
> is expensive and accept docs help to limit the calculations) or not passing
> them down (if the filter is cheap and the multiple acceptDocs bit checks for
> every single filter would be more expensive; skipping them is then more
> effective, e.g. when the Filter is only a cached bitset). The first mode
> would also optimize the MUST/MUST_NOT case to pass down the further
> restricted acceptDocs on later filters (just like FilteredQuery does).

--
This message is automatically generated by JIRA.
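The two modes discussed in the issue can be illustrated without Lucene at all. The following is only a rough sketch under stated assumptions: DocFilter, CountingFilter and andAll are hypothetical names made up for this example (they are not Lucene's Filter, BooleanFilter or any other Lucene API), and java.util.BitSet stands in for both the filter results and acceptDocs. It models MUST clauses only and counts how many documents each clause has to examine.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.function.IntPredicate;

// Illustrative model only: DocFilter, CountingFilter and andAll are
// hypothetical names, not Lucene classes or methods.
class AcceptDocsSketch {

    /** Stand-in for a filter: returns the matching docs in [0, maxDoc). */
    interface DocFilter {
        // acceptDocs == null means "examine every document".
        BitSet matches(int maxDoc, BitSet acceptDocs);
    }

    /** Evaluates a per-document predicate and counts the docs it examined. */
    static final class CountingFilter implements DocFilter {
        private final IntPredicate predicate;
        int examined = 0;

        CountingFilter(IntPredicate predicate) {
            this.predicate = predicate;
        }

        @Override
        public BitSet matches(int maxDoc, BitSet acceptDocs) {
            BitSet result = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (acceptDocs != null && !acceptDocs.get(doc)) {
                    continue; // excluded up front, predicate is not run
                }
                examined++;
                if (predicate.test(doc)) {
                    result.set(doc);
                }
            }
            return result;
        }
    }

    /**
     * Intersects all MUST clauses. With passDown == true, the first clause
     * sees acceptDocs and each later clause sees the bits built so far;
     * with passDown == false every clause sees null and acceptDocs is
     * applied once at the end.
     */
    static BitSet andAll(List<? extends DocFilter> mustClauses, int maxDoc,
                         BitSet acceptDocs, boolean passDown) {
        BitSet result = null;
        for (DocFilter clause : mustClauses) {
            BitSet restriction = null;
            if (passDown) {
                restriction = (result != null) ? result : acceptDocs;
            }
            BitSet bits = clause.matches(maxDoc, restriction);
            if (result == null) {
                result = (BitSet) bits.clone(); // never mutate a clause's own bits
            } else {
                result.and(bits);
            }
        }
        if (result == null) {
            result = new BitSet(maxDoc); // no clauses: match nothing in this sketch
        }
        if (acceptDocs != null) {
            result.and(acceptDocs); // final filtering, as in the non-pass-down mode
        }
        return result;
    }

    public static void main(String[] args) {
        for (boolean passDown : new boolean[] {false, true}) {
            CountingFilter cheap = new CountingFilter(d -> d % 10 == 0);
            CountingFilter costly = new CountingFilter(d -> d % 20 == 0);
            BitSet hits = andAll(Arrays.asList(cheap, costly), 1000, null, passDown);
            System.out.println("passDown=" + passDown
                    + " hits=" + hits.cardinality()
                    + " costlyExamined=" + costly.examined);
        }
    }
}
```

In this toy setup both modes return the same 50 documents, but with passDown enabled the second clause examines 100 documents instead of 1000. That is the trade-off described in the issue: passing restricted bits down pays off when a clause is expensive per document, while the extra bit check per document is wasted work when the clause is just a cached bitset.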