[ 
https://issues.apache.org/jira/browse/LUCENE-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926281#comment-13926281
 ] 

Uwe Schindler commented on LUCENE-5495:
---------------------------------------

bq. In FilteredQuery, depending on the FilterStrategy, iterator() 
checked/called after bits(). I think if bits() is optional, and iterator() is 
not, then checking bits() first actually does make sense, otherwise, since 
iterator() impl is mandatory, bits() impl would be ignored. Maybe I am missing 
something...

- RandomAccessFilterStrategy first calls iterator() and cancels collection if 
iterator is null. Then it asks for bits() and uses them, if available, 
otherwise falls back to good old LeapFrog approach. RandomAccessFilterStrategy 
only chooses to use random access, if the useRandomAccess() method returns 
true. In general it only does this if the iterator is not too sparse (it checks 
first filter doc). This can be configured by the user with other heuristics 
(e.g. cost() function).
- LeapFrogFilterStrategy always uses iterator() (it's scorer is also the fall 
back for all other cases, where bits() returns null)
- QueryFirstFilterStrategy first calls bits(), but if those are null it falls 
back to the iterator()

The default is RandomAccessFilterStrategy.

bq. This patch can be think of fixing a shortcoming in BooleanFilter. Are we 
discouraging use of BooleanFiilter?

I just complained about the algo. bits() must be purely optional. If it returns 
null, you *must* also check the iterator(). If the iterator() also returns 
null, no documents match.

But your patch should in no case try to "emulate" the iterator by the 
BitsDocIdSetIterator! iterator() is mandatory and is used as fallback if the 
bits() return null. But definitely not the other way round. iterator() has to 
be implemented, otherwise its not a valid filter.

The UOE by the Facet filters is intentional, because those should never ever be 
used as a filter in queries. Because of the way how FilteredQuery or 
ConstantScoreQuery works, the user will get the UOE. I know this is a hack, but 
[~mikemccand] did this intentionally (earlier versions used the same scorer you 
added in your patch to emulate, but that made users use that filters and slow 
down their queries).

In my opinion, if we want to improve this, it should use a strategy like 
FilteredQuery, too. We can improve maybe depending on cases like:
- all filters are AND'ed together and some support random access -> use the 
same approach like FilteredQuery and pass down the bits of those filter's 
output as acceptDocs input for the next filter
- some flters support random access, few of them only iterators. In that case 
the filter with the iterator could drive the query and use bits() of other 
filters to filter out some documents. The result can be handled as FixedBitSet, 
but improved that bits() is used partially (if available).
- many more strategies...

But on the other hand, as Mike says: Filters should be Queries without score 
and be handled by BooleanQuery, BooleanFilter will go away then (or subsumed 
inside some BooleanQuery optimizations choosen automatically depending on cost 
factors,...). We have a GSoC issue open. This is one of the big priorities 
fixing the broken and unintuitive APIs around Query+Filter.

> Boolean Filter does not handle FilterClauses with only bits() implemented
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-5495
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5495
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 4.6.1
>            Reporter: John Wang
>         Attachments: LUCENE-5495.patch, LUCENE-5495.patch
>
>
> Some Filter implementations produce DocIdSets without the iterator() 
> implementation, such as o.a.l.facet.range.Range.getFilter().
> Currently, such filters cannot be added to a BooleanFilter because 
> BooleanFilter expects all FilterClauses with Filters that have iterator() 
> implemented.
> This patch improves the behavior by taking Filters with bits() implemented 
> and treat them separately.
> This behavior would be faster in the case for Filters with a forward index as 
> the underlying data structure, where there would be no need to scan the index 
> to build an iterator.
> See attached unit test, which fails without this patch.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to