Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

Michael McCandless Thu, 15 Oct 2009 14:47:16 -0700

Well, you could wrap the C | D filter as a Query (using
ConstantScoreQuery), and then add that as a SHOULD clause on your
toplevel BooleanQuery?


Mike

On Thu, Oct 15, 2009 at 5:42 PM, Shaun Senecal <[email protected]> wrote:
> At first I thought so, yes, but then I realised that the query I wanted to
> execute was A | B | C | D and in reality I was executing (A | B) & (C | D).
> I guess my unit tests were missing some cases and don't currently catch
> this.
>
>
>
> On Thu, Oct 15, 2009 at 11:59 PM, Michael McCandless <
> [email protected]> wrote:
>
>> You should be able to do exactly what you were doing on 2.4, right?
>> (By setting the rewrite method).
>>
>> Mike
>>
>> On Thu, Oct 15, 2009 at 8:30 AM, Shaun Senecal <[email protected]>
>> wrote:
>> > Thanks for the explanation Mike.  It looks like I have no choice but to
>> move
>> > any queries which throw TooManyClauses to be Filters. Sadly, this means a
>> > max query time of 6s under load unless I can find a way to rewrite the
>> query
>> > to span a Query and a Filter.
>> >
>> >
>> > Thanks again
>> >
>> >
>> >
>> > On Thu, Oct 15, 2009 at 6:52 PM, Michael McCandless <
>> > [email protected]> wrote:
>> >
>> >> On Thu, Oct 15, 2009 at 4:57 AM, Shaun Senecal <[email protected]
>> >
>> >> wrote:
>> >>
>> >> > Up to Lucene 2.4, this has been working out for us.  However, in
>> >> > Lucene 2.9 this breaks since rewrite() now returns a
>> >> > ConstantScoreQuery.
>> >>
>> >> You can get back to the 2.4 behavior by calling
>> >> prefixQuery.setRewriteMethod(prefixQuery.SCORING_BOOLEAN_QUERY_REWRITE)
>> >> before calling rewrite().
>> >>
>> >> > Is there a way I can know that a ConstantScoreQuery will match at
>> >> > least 1 term (if not, I dont want to add it to the BooleanQuery)?
>> >>
>> >> There is a new method in 2.9: MultiTermQuery.getTotalNumberOfTerms(),
>> >> which returns how many terms were visited during rewrite.  Would that
>> >> work?
>> >>
>> >> > My understanding is that Lucene will apply the Filter (C | D) first,
>> >> > limiting the result set, then apply the Query (A | B).  Is this
>> >> > correct?
>> >>
>> >> Actually the filter & query clauses are AND'd in a sort of leapfrog
>> >> fashion, taking turns skipping up to the other's doc ID and only
>> >> accepting a doc ID when they both skip to the same point.  But this
>> >> (the mechanics of how Lucene takes a filter into account) is an
>> >> implementation detail and is likely to change.
>> >>
>> >> > If so, the end result is essentially the query: (A | B) & (C | D)
>> >>
>> >> Except that C, D contribute no scoring information, if scoring
>> >> matters.  If scoring doesn't matter, entirely (even for A, B), you
>> >> should use a collector that does not call score() at all to save CPU.
>> >>
>> >> Mike
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: [email protected]
>> >> For additional commands, e-mail: [email protected]
>> >>
>> >>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: PrefixQueries on large indexes (4M+ Documents) using a partial Query partial Filter solution

Reply via email to