Re: speed of BooleanQueries on 2.9

Michael McCandless Wed, 15 Jul 2009 10:27:57 -0700

But, that query can't accept a minNumberShouldMatch -- are you really
setting that?  (You get 0 results if you set it, because the top
boolean query has a single required clause).  Maybe you set it only on
the inner large OR-query?  (But then I don't see the ~2 on that inner
clause).
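
A rough way to picture that rule is the toy model below. It is only a sketch, not Lucene code: the class MinShouldMatchSketch and its matches() method are made-up names, and it assumes that the minimum-should-match count is taken over optional (SHOULD) clauses only, which is why a query whose only clause is required can never satisfy it.

```java
public class MinShouldMatchSketch {

    /**
     * Toy model of a boolean query evaluated against one document.
     *
     * @param requiredMatched one flag per required (+) clause: did it match?
     * @param optionalMatched one flag per optional clause: did it match?
     * @param minShouldMatch  how many optional clauses must match
     */
    public static boolean matches(boolean[] requiredMatched,
                                  boolean[] optionalMatched,
                                  int minShouldMatch) {
        // Every required clause has to match.
        for (boolean matched : requiredMatched) {
            if (!matched) {
                return false;
            }
        }
        // Only optional clauses count toward minShouldMatch (assumption of this sketch).
        int optionalHits = 0;
        for (boolean matched : optionalMatched) {
            if (matched) {
                optionalHits++;
            }
        }
        return optionalHits >= minShouldMatch;
    }

    public static void main(String[] args) {
        boolean[] singleRequired = {true};   // top-level query: one required clause
        boolean[] noOptional = {};           // ... and no optional clauses at all

        System.out.println(matches(singleRequired, noOptional, 0)); // prints "true"
        System.out.println(matches(singleRequired, noOptional, 2)); // prints "false"
    }
}
```

With no optional clauses at the top level, any minimum above zero is unreachable, so every document is rejected.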


I've tested a 21 term OR query, with allowDocsOutOfOrder true,
numHits=200 on a Wikipedia index that matches 10M docs and I'm seeing
the same perf on trunk & 2.4.
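
For anyone who wants to repeat this kind of comparison, a minimal timing harness might look like the sketch below. It is an assumption-laden illustration, not the code used for the numbers above: SearchTimingSketch and averageNanos() are hypothetical names, and the Runnable is a stand-in for whatever search call is being measured (for example the ixSearcher.search(...) call quoted further down in this thread). The warm-up runs are excluded from the timing.

```java
public class SearchTimingSketch {

    /**
     * Runs the task warmupRuns times without timing it, then runs it
     * timedRuns times and returns the average wall-clock nanoseconds per run.
     */
    public static long averageNanos(Runnable task, int warmupRuns, int timedRuns) {
        if (timedRuns <= 0) {
            throw new IllegalArgumentException("timedRuns must be positive");
        }
        // Warm-up: let caches fill and the JIT compile hot paths before measuring.
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / timedRuns;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with the search call under test.
        final long[] sink = {0};
        Runnable fakeSearch = () -> {
            for (int i = 0; i < 100_000; i++) {
                sink[0] += i;
            }
        };
        long avg = averageNanos(fakeSearch, 100, 1000);
        System.out.println("average ns per run: " + avg);
    }
}
```

Running the same harness against both versions, with the same index and the same query, keeps the comparison like for like.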

Mike

On Wed, Jul 15, 2009 at 11:41 AM, eks dev<eks...@yahoo.co.uk> wrote:
>
> Sorry for the confusion, here is the exact query that runs forever with
> setAllowDocsOutOfOrder.
> You can see it in the stack trace taken while "stuck":
> o.a.l.search.TopScoreDocCollector$OutOfOrderTopScoreDocCollector.collect(Unknown Source)
>
>
> Query: +(((NAME:maria NAME:marae^0.25171682 NAME:marai^0.2365632 
> NAME:marao^0.2365632 NAME:marau^0.2365632 NAME:marea^0.2834352 
> NAME:marei^0.25171682 NAME:mareo^0.25171682 NAME:mareu^0.25171682 
> NAME:marie^0.28577283 NAME:marieh^0.2451648 NAME:mariha^0.2583552 
> NAME:mariu^0.27189124 NAME:marja^0.2834352 NAME:marje^0.2673408 
> NAME:marji^0.25171682 NAME:marjo^0.25171682 NAME:marju^0.25171682 
> NAME:marla^0.2673408 NAME:marle^0.25171682 NAME:marli^0.2365632 
> NAME:marlo^0.2365632 NAME:maroa^0.2673408 NAME:maroe^0.25171682 
> NAME:maroi^0.2365632 NAME:marou^0.2365632 NAME:marua^0.2673408 
> NAME:marue^0.25171682 NAME:marui^0.2365632 NAME:maruo^0.2365632 
> NAME:marye^0.2673408 NAME:maryi^0.25171682 NAME:maryo^0.25171682 
> NAME:meria^0.2787888 NAME:miria^0.25835523 NAME:moria^0.25835523 
> NAME:muria^0.25835523 NAME:naria^0.27648002 NAME:narie^0.25392002 
> NAME:neria^0.25392002) (NAME:piekarski NAME:bekarski^0.19200002 
> NAME:beugarski^0.20281483 NAME:blacharski^0.19200002
>  NAME:lekarski^0.19200002 NAME:pecarski^0.21294187 NAME:peikarski^0.27648002 
> NAME:pekarska^0.20172001 NAME:pekarski^0.22446752 NAME:pekarskj^0.21294187 
> NAME:pekarsky^0.21294187 NAME:pickarske^0.21168004 NAME:pickarski^0.22073482 
> NAME:piekalski^0.23941332 NAME:piekanski^0.23941332 NAME:piekaraka^0.22533335 
> NAME:piekarsci^0.29205337 NAME:piekarska^0.28421336 
> NAME:piekarskie^0.25392002 NAME:piekarsky^0.29205337 
> NAME:piekarzcyk^0.23232001 NAME:piekarzki^0.29205337 NAME:piekaski^0.24843001 
> NAME:piekavska^0.22533335 NAME:piekorski^0.28421336 NAME:pielarski^0.22997928 
> NAME:pierarski^0.22997928 NAME:pierkarski^0.24661335 
> NAME:piesarski^0.22997928 NAME:pietarski^0.22997928 
> NAME:pietkarski^0.24661335 NAME:pikarski^0.23232001 NAME:piowarski^0.20281483 
> NAME:pirkarski^0.22073482 NAME:plocharski^0.21168004 NAME:pokarski^0.20172001 
> NAME:polikarski^0.20172001 NAME:pukarski^0.20172001 NAME:pyekarska^0.26508 
> NAME:siekarski^0.20281483))^2.0)
>
>
>
>
>
> ----- Original Message ----
>> From: Michael McCandless <luc...@mikemccandless.com>
>> To: java-user@lucene.apache.org
>> Sent: Wednesday, 15 July, 2009 17:16:23
>> Subject: Re: speed of BooleanQueries on 2.9
>>
>> So now I'm confused.  Since your query has required (+) clauses, the
>> setAllowDocsOutOfOrder should have no effect, on either 2.4 or trunk.
>>
>> BooleanQuery only uses BooleanScorer when there are no required terms,
>> and allowDocsOutOfOrder is true.  So I can't explain why you see this
>> setting changing anything on this query...
>>
>> Mike
>>
>> On Tue, Jul 14, 2009 at 7:04 PM, eks dev wrote:
>> >
>> > I do not know exactly why, but when I call
>> > BooleanQuery.setAllowDocsOutOfOrder(true); I have the problem, while with
>> > setAllowDocsOutOfOrder(false); there are no problems whatsoever.
>> >
>> > Not really a scientific method for finding such a bug, but it does the job
>> > and makes me happy.
>> >
>> > Empirically: "deprecated methods are not to be taken as thoroughly tested,
>> > as they have a short life expectancy"
>> >
>> >
>> >
>> >
>> >
>> > ----- Original Message ----
>> >> From: eks dev
>> >> To: java-user@lucene.apache.org
>> >> Sent: Wednesday, 15 July, 2009 0:24:43
>> >> Subject: Re: speed of BooleanQueries on 2.9
>> >>
>> >>
>> >> Mike, we are definitely hitting something with this one!
>> >>
>> >> We had a report from our QA chaps that our servers got stuck (the limit is
>> >> 180 seconds per request). We average 14 requests per second. It has
>> >> nothing to do with gc(), as we can repeat it with a freshly restarted
>> >> searcher.
>> >>
>> >> - It happens on less than 0.1% of queries, with not much of a pattern, but
>> >> it is repeatable on our index.
>> >> It is always a combination of two expanded tokens (we use
>> >> minimumNumberShouldMatch):
>> >>
>> >> (+(t1 [up to 40 expansions]) +(t2 [up to 40 expansions of t2]))
>> >> All tokens have a boost set, and minNumShouldMatch is set to two.
>> >>
>> >> I cannot provide a self-contained test, nor the index (it contains
>> >> sensitive data and is rather big, ~5G).
>> >>
>> >> I can repeat this test on t1 and t2 with 40 expansions each. Even if I take
>> >> the most frequent tokens in the collection it runs well under one second,
>> >> but these two particular tokens with their "expansions" make it run
>> >> forever.
>> >>
>> >> And yes, if I run t1 plus expansions only, it runs super fast; the same for
>> >> t2.
>> >>
>> >> java 1.4U14, tried with 1.6U6, no changes...
>> >>
>> >> will report if I dig something out
>> >>
>> >> partial stack trace while "stuck", cpu is on max:
>> >>
>> >>
>> >> org.apache.lucene.search.TopScoreDocCollector$OutOfOrderTopScoreDocCollector.collect(Unknown Source)
>> >> org.apache.lucene.search.BooleanScorer.score(Unknown Source)
>> >> org.apache.lucene.search.BooleanScorer.score(Unknown Source)
>> >> org.apache.lucene.search.IndexSearcher.search(Unknown Source)
>> >> org.apache.lucene.search.IndexSearcher.search(Unknown Source)
>> >> org.apache.lucene.search.Searcher.search(Unknown Source)
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> ----- Original Message ----
>> >> > From: eks dev
>> >> > To: java-user@lucene.apache.org
>> >> > Sent: Monday, 13 July, 2009 13:28:45
>> >> > Subject: Re: speed of BooleanQueries on 2.9
>> >> >
>> >> > Hi Mike,
>> >> >
>> >> > getMaxNumOfCandidates() in the test was 200; the index is optimised and
>> >> > read-only.
>> >> >
>> >> > We found (due to an error in our warm-up code, funny) that only this
>> >> > Query runs slower on 2.9.
>> >> >
>> >> > A hint where to look could be that this Query contains the two most
>> >> > frequent tokens in two particular fields,
>> >> > NAME:hans and ZIPS:berlin (the index has ca. 80 million very short
>> >> > documents and 3 million unique terms).
>> >> >
>> >> > But all of this *could be just wrong measurement*; I just could not spend
>> >> > more time to get to the bottom of this. We moved forward as we got better
>> >> > overall performance (a sweet 10% on average) on a much bigger real query
>> >> > log from our regression test.
>> >> >
>> >> > Anyhow, I just wanted to throw it out, maybe it triggers some synapses :)
>> >> > If false alarm, sorry.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > ----- Original Message ----
>> >> > > From: Michael McCandless
>> >> > > To: java-user@lucene.apache.org
>> >> > > Sent: Monday, 13 July, 2009 11:50:48
>> >> > > Subject: Re: speed of BooleanQueries on 2.9
>> >> > >
>> >> > > This is not expected; 2.9 has had a number of changes that ought to
>> >> > > reduce CPU cost of searching.  If this holds up we definitely need to
>> >> > > get to the root cause.
>> >> > >
>> >> > > Did your test exclude the warmup query for both 2.4.1 & 2.9?  How many
>> >> > > segments in the index?  What is the actual value of
>> >> > > getMaxNumOfCandidates()?  If you simplify the query down (eg just do
>> >> > > the NAME clause or the ZIPS clause, alone) are those also 4X slower?
>> >> > >
>> >> > > Mike
>> >> > >
>> >> > > On Sun, Jul 12, 2009 at 12:53 PM, eks dev wrote:
>> >> > > >
>> >> > > > Is it possible that the same BooleanQuery on 2.9 runs significantly
>> >> > > > slower than on 2.4?
>> >> > > >
>> >> > > > We see some strange effects where the following query runs approx. 4
>> >> > > > (ouch!) times slower on 2.9; the test executes the same Query 1000
>> >> > > > times. But! If I run a test from a real Query log with mixed Queries, I
>> >> > > > get almost the same results (?!), even slightly faster on 2.9!?
>> >> > > >
>> >> > > >
>> >> > > > Query:
>> >> > > > +((NAME:hans NAME:hahns^0.23232001 NAME:hams^0.27648002
>> >> > > > NAME:hamz^0.25392 NAME:hanas^0.18722998 NAME:hanbs^0.18722998
>> >> > > > NAME:hanfs^0.18722998 NAME:hangs^0.18722998 NAME:hanhs^0.24030754
>> >> > > > NAME:hanis^0.18722998 NAME:hanjs^0.18722998 NAME:hanks^0.18722998
>> >> > > > NAME:hanms^0.18722998 NAME:hanos^0.18722998 NAME:hanrs^0.18722998
>> >> > > > NAME:hansb^0.20172001 NAME:hansd^0.20172001 NAME:hansf^0.20172001
>> >> > > > NAME:hansg^0.20172001 NAME:hansi^0.20172001 NAME:hansj^0.20172001
>> >> > > > NAME:hansk^0.20172001 NAME:hansl^0.20172001 NAME:hansn^0.20172001
>> >> > > > NAME:hanso^0.20172001 NAME:hansp^0.20172001 NAME:hanst^0.20172001
>> >> > > > NAME:hansu^0.20172001 NAME:hansw^0.20172001 NAME:hansy^0.20172001
>> >> > > > NAME:hansz^0.20172001 NAME:hants^0.18722998 NAME:hanus^0.18722998
>> >> > > > NAME:hanws^0.18722998 NAME:hehns^0.20172001 NAME:hens^0.2736075
>> >> > > > NAME:hins^0.24843 NAME:hons^0.24843 NAME:huhns^0.1801875
>> >> > > > NAME:huns^0.24843)^2.0)
>> >> > > > +(((ZIPS:berlin ZIPS:barlin^0.28227 ZIPS:berien^0.25947002
>> >> > > > ZIPS:berling^0.23232001 ZIPS:perlin^0.26133335))^1.2)
>> >> > > >
>> >> > > > The question is just to get some hints where I should look...
>> >> > > >
>> >> > > > Both fields are without norms, omitTf(true), RAMDirectory, using
>> >> > > > TopDocs top = ixSearcher.search(q, null, getMaxNumOfCandidates());
>> >> > > > and BooleanQuery.setAllowDocsOutOfOrder(true);
>> >> > > >
>> >> > > > Maybe we made some mistakes in measuring, but we did simple timing
>> >> > > > on the search() method... strange. I would bet it is something we did,
>> >> > > > but I cannot see where...
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > > ---------------------------------------------------------------------
>> >> > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> >> > > > For additional commands, e-mail: java-user-h...@lucene.apache.org