[ https://issues.apache.org/jira/browse/LUCENE-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926281#comment-13926281 ]
Uwe Schindler commented on LUCENE-5495: --------------------------------------- bq. In FilteredQuery, depending on the FilterStrategy, iterator() checked/called after bits(). I think if bits() is optional, and iterator() is not, then checking bits() first actually does make sense, otherwise, since iterator() impl is mandatory, bits() impl would be ignored. Maybe I am missing something... - RandomAccessFilterStrategy first calls iterator() and cancels collection if iterator is null. Then it asks for bits() and uses them, if available, otherwise falls back to good old LeapFrog approach. RandomAccessFilterStrategy only chooses to use random access, if the useRandomAccess() method returns true. In general it only does this if the iterator is not too sparse (it checks first filter doc). This can be configured by the user with other heuristics (e.g. cost() function). - LeapFrogFilterStrategy always uses iterator() (it's scorer is also the fall back for all other cases, where bits() returns null) - QueryFirstFilterStrategy first calls bits(), but if those are null it falls back to the iterator() The default is RandomAccessFilterStrategy. bq. This patch can be think of fixing a shortcoming in BooleanFilter. Are we discouraging use of BooleanFiilter? I just complained about the algo. bits() must be purely optional. If it returns null, you *must* also check the iterator(). If the iterator() also returns null, no documents match. But your patch should in no case try to "emulate" the iterator by the BitsDocIdSetIterator! iterator() is mandatory and is used as fallback if the bits() return null. But definitely not the other way round. iterator() has to be implemented, otherwise its not a valid filter. The UOE by the Facet filters is intentional, because those should never ever be used as a filter in queries. Because of the way how FilteredQuery or ConstantScoreQuery works, the user will get the UOE. I know this is a hack, but [~mikemccand] did this intentionally (earlier versions used the same scorer you added in your patch to emulate, but that made users use that filters and slow down their queries). In my opinion, if we want to improve this, it should use a strategy like FilteredQuery, too. We can improve maybe depending on cases like: - all filters are AND'ed together and some support random access -> use the same approach like FilteredQuery and pass down the bits of those filter's output as acceptDocs input for the next filter - some flters support random access, few of them only iterators. In that case the filter with the iterator could drive the query and use bits() of other filters to filter out some documents. The result can be handled as FixedBitSet, but improved that bits() is used partially (if available). - many more strategies... But on the other hand, as Mike says: Filters should be Queries without score and be handled by BooleanQuery, BooleanFilter will go away then (or subsumed inside some BooleanQuery optimizations choosen automatically depending on cost factors,...). We have a GSoC issue open. This is one of the big priorities fixing the broken and unintuitive APIs around Query+Filter. > Boolean Filter does not handle FilterClauses with only bits() implemented > ------------------------------------------------------------------------- > > Key: LUCENE-5495 > URL: https://issues.apache.org/jira/browse/LUCENE-5495 > Project: Lucene - Core > Issue Type: Bug > Components: core/search > Affects Versions: 4.6.1 > Reporter: John Wang > Attachments: LUCENE-5495.patch, LUCENE-5495.patch > > > Some Filter implementations produce DocIdSets without the iterator() > implementation, such as o.a.l.facet.range.Range.getFilter(). > Currently, such filters cannot be added to a BooleanFilter because > BooleanFilter expects all FilterClauses with Filters that have iterator() > implemented. > This patch improves the behavior by taking Filters with bits() implemented > and treat them separately. > This behavior would be faster in the case for Filters with a forward index as > the underlying data structure, where there would be no need to scan the index > to build an iterator. > See attached unit test, which fails without this patch. -- This message was sent by Atlassian JIRA (v6.2#6252) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org