[
https://issues.apache.org/jira/browse/LUCENE-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925676#comment-13925676
]
Michael McCandless commented on LUCENE-5495:
--------------------------------------------
Unfortunately, it's intentional that the filters returned by
Range.getFilter are not "general purpose" and throw UOE from their
iterator methods.
The Range.getFilter javadocs state that you must either 1) do post
filtering (use FilteredQuery with QUERY_FIRST_FILTER_STRATEGY), or 2)
pass the Filter to DrillSideways (which is careful to do "post
filtering").
The problem is that these filters can in general be very costly,
e.g. backed by a "costly" expression like a Haversine distance
computation.
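To make "post filtering" concrete, here is a minimal sketch in plain
Java. It deliberately does not use Lucene's classes: PostFilterSketch,
postFilter and costlyChecks are hypothetical names, and the Haversine
function only stands in for whatever costly expression backs the
filter. The point is that the costly check runs once per query hit,
not once per document in the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical illustration only; none of these names are Lucene APIs.
final class PostFilterSketch {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Counts how often the costly expression is evaluated.
    static int costlyChecks = 0;

    // Great-circle distance in kilometres between two lat/lon points (degrees).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        costlyChecks++;
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        // Clamp to guard against rounding pushing sqrt(a) slightly above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // "Query first": the query supplies the candidate docs, and the
    // random-access filter is consulted only for those candidates.
    static List<Integer> postFilter(int[] queryHits, IntPredicate accept) {
        List<Integer> out = new ArrayList<>();
        for (int doc : queryHits) {
            if (accept.test(doc)) {
                out.add(doc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up per-document coordinates, indexed by doc id.
        double[] lat = {0.0, 0.1, 45.0, 0.2, 60.0};
        double[] lon = {0.0, 0.1, 90.0, 0.2, 10.0};
        int[] queryHits = {1, 2, 3}; // docs matched by the (cheap) query
        List<Integer> hits = postFilter(queryHits,
                doc -> haversineKm(0.0, 0.0, lat[doc], lon[doc]) < 100.0);
        System.out.println(hits + " after " + costlyChecks + " costly checks");
    }
}
```

With five documents in this toy "index" and three query hits, the
distance is computed three times rather than five.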
Really, this is all one giant hack/workaround, because Lucene is
unable to properly/generally handle the "post filter" use case
(something Solr has had for some time). I think we should fix that;
i.e., we need some way for a Filter to express that 1) it's random-access
(supports Bits), and 2) it's very costly. This is the mirror image
case to "random access filter down low", which we do for random-access
filters that have very low cost.
Ideally, we would absorb BooleanFilter and FilteredQuery into
BooleanQuery, e.g. so you can BQ.add(Filter) and then BooleanQuery
works out which filters/queries should be applied "random access down
low", "random access up high", "leap frog" or even "use a temporary
bit set" (like MultiTermQueryWrapperFilter, BooleanFilter). These all
should just be implementation details on how the hits are matched,
worked out by BooleanQuery, not by the user having to invoke cryptic
options across three classes. It's crazy we have such code
duplication across these classes today.
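As a rough sketch of the kind of decision BooleanQuery could make on
the user's behalf, assuming each clause can report only the two
properties mentioned above (random access, and costly or not), the
choice might look like the following. StrategySketch, Strategy and
choose are hypothetical names, not existing Lucene APIs, and the
"temporary bit set" case is left out to keep it short.

```java
// Hypothetical illustration only; not an existing Lucene API.
final class StrategySketch {
    enum Strategy {
        RANDOM_ACCESS_DOWN_LOW, // cheap bits check applied early, like live docs
        RANDOM_ACCESS_UP_HIGH,  // costly bits check applied last ("post filter")
        LEAP_FROG               // no random access: advance iterators in lockstep
    }

    // Picks how a clause should be applied from the two properties
    // the clause is assumed to expose.
    static Strategy choose(boolean randomAccess, boolean costly) {
        if (!randomAccess) {
            return Strategy.LEAP_FROG;
        }
        return costly ? Strategy.RANDOM_ACCESS_UP_HIGH
                      : Strategy.RANDOM_ACCESS_DOWN_LOW;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, true));   // e.g. a distance-range filter
        System.out.println(choose(true, false));  // e.g. a cached bit set
        System.out.println(choose(false, false)); // e.g. a postings-backed filter
    }
}
```

A real implementation would presumably use a numeric cost estimate
rather than a boolean, but the shape of the decision is the same.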
I think we should also have a "random access up high" for queries
(LUCENE-5460); if we had that and BQ could be trusted to do the right
thing we could e.g. rewrite a PhraseQuery("a b c") into +a +b +c
+positions(a b c) to solve LUCENE-1252.
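To illustrate the rewrite idea outside Lucene, here is a small sketch
that treats a document as a list of tokens. The names
(PhraseRewriteSketch, containsAllTerms, positionsMatch) are
hypothetical; the term conjunction plays the role of "+a +b +c" and
the position check plays the role of "+positions(a b c)", evaluated
only for documents that already passed the conjunction.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; documents are plain token lists here.
final class PhraseRewriteSketch {
    // Counts how often the (more expensive) position check runs.
    static int positionChecks = 0;

    // Stand-in for "+a +b +c": every phrase term occurs somewhere in the doc.
    static boolean containsAllTerms(List<String> docTokens, List<String> phrase) {
        return docTokens.containsAll(phrase);
    }

    // Stand-in for "+positions(a b c)": the terms occur adjacently, in order.
    static boolean positionsMatch(List<String> docTokens, List<String> phrase) {
        positionChecks++;
        return Collections.indexOfSubList(docTokens, phrase) >= 0;
    }

    // The position check is applied "up high", after the conjunction.
    static boolean matches(List<String> docTokens, List<String> phrase) {
        return containsAllTerms(docTokens, phrase) && positionsMatch(docTokens, phrase);
    }

    public static void main(String[] args) {
        List<String> phrase = Arrays.asList("a", "b", "c");
        System.out.println(matches(Arrays.asList("x", "a", "b", "c"), phrase));
        System.out.println(matches(Arrays.asList("a", "c", "b"), phrase));
        System.out.println(matches(Arrays.asList("a", "b"), phrase));
        System.out.println("position checks: " + positionChecks);
    }
}
```

The third document lacks "c", so it is rejected by the conjunction
and never reaches the position check.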
> Boolean Filter does not handle FilterClauses with only bits() implemented
> -------------------------------------------------------------------------
>
> Key: LUCENE-5495
> URL: https://issues.apache.org/jira/browse/LUCENE-5495
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: 4.6.1
> Reporter: John Wang
> Attachments: LUCENE-5495.patch, LUCENE-5495.patch
>
>
> Some Filter implementations produce DocIdSets without the iterator()
> implementation, such as o.a.l.facet.range.Range.getFilter().
> Currently, such filters cannot be added to a BooleanFilter because
> BooleanFilter expects every FilterClause to wrap a Filter that has
> iterator() implemented.
> This patch improves the behavior by taking Filters with bits() implemented
> and treating them separately.
> This behavior would be faster in the case of Filters with a forward index as
> the underlying data structure, where there would be no need to scan the index
> to build an iterator.
> See attached unit test, which fails without this patch.