[
https://issues.apache.org/jira/browse/LUCENE-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926281#comment-13926281
]
Uwe Schindler commented on LUCENE-5495:
---------------------------------------
bq. In FilteredQuery, depending on the FilterStrategy, iterator()
checked/called after bits(). I think if bits() is optional, and iterator() is
not, then checking bits() first actually does make sense, otherwise, since
iterator() impl is mandatory, bits() impl would be ignored. Maybe I am missing
something...
- RandomAccessFilterStrategy first calls iterator() and cancels collection if
iterator is null. Then it asks for bits() and uses them, if available,
otherwise falls back to good old LeapFrog approach. RandomAccessFilterStrategy
only chooses to use random access, if the useRandomAccess() method returns
true. In general it only does this if the iterator is not too sparse (it checks
first filter doc). This can be configured by the user with other heuristics
(e.g. cost() function).
- LeapFrogFilterStrategy always uses iterator() (it's scorer is also the fall
back for all other cases, where bits() returns null)
- QueryFirstFilterStrategy first calls bits(), but if those are null it falls
back to the iterator()
The default is RandomAccessFilterStrategy.
bq. This patch can be think of fixing a shortcoming in BooleanFilter. Are we
discouraging use of BooleanFiilter?
I just complained about the algo. bits() must be purely optional. If it returns
null, you *must* also check the iterator(). If the iterator() also returns
null, no documents match.
But your patch should in no case try to "emulate" the iterator by the
BitsDocIdSetIterator! iterator() is mandatory and is used as fallback if the
bits() return null. But definitely not the other way round. iterator() has to
be implemented, otherwise its not a valid filter.
The UOE by the Facet filters is intentional, because those should never ever be
used as a filter in queries. Because of the way how FilteredQuery or
ConstantScoreQuery works, the user will get the UOE. I know this is a hack, but
[~mikemccand] did this intentionally (earlier versions used the same scorer you
added in your patch to emulate, but that made users use that filters and slow
down their queries).
In my opinion, if we want to improve this, it should use a strategy like
FilteredQuery, too. We can improve maybe depending on cases like:
- all filters are AND'ed together and some support random access -> use the
same approach like FilteredQuery and pass down the bits of those filter's
output as acceptDocs input for the next filter
- some flters support random access, few of them only iterators. In that case
the filter with the iterator could drive the query and use bits() of other
filters to filter out some documents. The result can be handled as FixedBitSet,
but improved that bits() is used partially (if available).
- many more strategies...
But on the other hand, as Mike says: Filters should be Queries without score
and be handled by BooleanQuery, BooleanFilter will go away then (or subsumed
inside some BooleanQuery optimizations choosen automatically depending on cost
factors,...). We have a GSoC issue open. This is one of the big priorities
fixing the broken and unintuitive APIs around Query+Filter.
> Boolean Filter does not handle FilterClauses with only bits() implemented
> -------------------------------------------------------------------------
>
> Key: LUCENE-5495
> URL: https://issues.apache.org/jira/browse/LUCENE-5495
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 4.6.1
> Reporter: John Wang
> Attachments: LUCENE-5495.patch, LUCENE-5495.patch
>
>
> Some Filter implementations produce DocIdSets without the iterator()
> implementation, such as o.a.l.facet.range.Range.getFilter().
> Currently, such filters cannot be added to a BooleanFilter because
> BooleanFilter expects all FilterClauses with Filters that have iterator()
> implemented.
> This patch improves the behavior by taking Filters with bits() implemented
> and treat them separately.
> This behavior would be faster in the case for Filters with a forward index as
> the underlying data structure, where there would be no need to scan the index
> to build an iterator.
> See attached unit test, which fails without this patch.
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]