Re: speed of BooleanQueries on 2.9

eks dev Wed, 15 Jul 2009 10:50:29 -0700


1. Please forget minNumberShouldMatch; it is NOT set on this particular query 
(minNumberShouldMatch is determined dynamically, depending on the semantics of 
the user query... sometimes it triggers, sometimes not). 
This exact query causes the search to take longer than 180 seconds with 
allowDocsOutOfOrder = true, and less than 70 ms with false. It is repeatable 
(?!). No gc() effects involved... On 2.4 it does not happen; it works fine 
with both true and false for allowDocsOutOfOrder.


2. Re your test: that is exactly what makes me wonder. We also see average 
performance almost 10% better on 2.9 (even on this index, when we exclude these 
stuck searches), but on this particular index our customer's QA managed to 
find these "stuck requests". 

3. If I change the tokens involved, in an identically structured query, it runs 
fine => the problem is somehow term-dependent (bah!)
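
Since the problem looks term-dependent, one way to narrow it down would be to shrink the list of expanded terms until only the ones that still trigger the stuck search remain. Below is a minimal sketch of such a reduction in plain Java. The class name TermBisect and the stillSlow predicate are hypothetical; in a real run the predicate would rebuild the query from the remaining terms, execute it with allowDocsOutOfOrder = true, and report whether it exceeded some time limit. It assumes the predicate is true for the full list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class TermBisect {

    /**
     * Greedy reduction: repeatedly drops chunks of terms as long as the
     * predicate (e.g. "the query built from these terms is still stuck")
     * keeps holding. The result is a subset from which no single term can
     * be removed without the predicate turning false.
     */
    static List<String> shrink(List<String> terms, Predicate<List<String>> stillSlow) {
        List<String> current = new ArrayList<>(terms);
        int chunk = Math.max(1, current.size() / 2);
        while (true) {
            boolean removed = false;
            for (int i = 0; i < current.size(); ) {
                int end = Math.min(i + chunk, current.size());
                // Candidate = current list without the chunk [i, end).
                List<String> candidate = new ArrayList<>(current.subList(0, i));
                candidate.addAll(current.subList(end, current.size()));
                if (!candidate.isEmpty() && stillSlow.test(candidate)) {
                    current = candidate; // chunk was irrelevant, leave it out
                    removed = true;
                } else {
                    i = end;             // chunk is needed, move past it
                }
            }
            if (!removed) {
                if (chunk == 1) {
                    return current;      // nothing more can be dropped
                }
                chunk = Math.max(1, chunk / 2);
            }
        }
    }

    public static void main(String[] args) {
        // Toy predicate standing in for a timed search: "slow" only while
        // both of these two terms are still present.
        List<String> terms = Arrays.asList("maria", "marae", "marai", "piekarski", "pekarski");
        Predicate<List<String>> stillSlow =
                l -> l.contains("marai") && l.contains("piekarski");
        System.out.println(shrink(terms, stillSlow));
    }
}
```

Each call to the predicate costs one search, so with a time limit of a few seconds per attempt the whole reduction over ~80 terms should stay manageable even when it has to be run remotely.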

Please understand that I do not have direct access to this index, which makes 
debug cycles slightly longer. Typically I give them some jars and they run 
them and send me the logs back... Sorry for the inaccuracies in my description, 
but I am sure there is a problem in Lucene... We tried it with Luke as well, on 
a freshly built index, and we see exactly the same behavior (so no bugs in our 
app that could cause it, except maybe wrong Lucene usage somewhere).
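
Because the index can only be exercised remotely, a small probe that times a search under a hard limit and logs the result could be shipped in one of those jars. The following is only a sketch under assumptions: StuckSearchProbe and timeWithLimit are hypothetical names, and the two Callables in main are stand-ins. In a real run each would set BooleanQuery.setAllowDocsOutOfOrder(...) and then call ixSearcher.search(q, null, getMaxNumOfCandidates()) as in the original report. It uses lambdas, so it assumes a newer JVM than the ones mentioned in this thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class StuckSearchProbe {

    /** Elapsed milliseconds for the task, or -1 if it exceeded limitMillis. */
    static long timeWithLimit(Callable<?> task, long limitMillis) throws Exception {
        // Daemon worker, so a search that never returns cannot keep the JVM alive.
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "search-probe");
            t.setDaemon(true);
            return t;
        });
        long start = System.nanoTime();
        Future<?> result = pool.submit(task);
        try {
            result.get(limitMillis, TimeUnit.MILLISECONDS);
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt only; a CPU-bound loop may ignore it
            return -1;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-ins for the real search call under each setting.
        Callable<Object> fastSearch = () -> "ok";
        Callable<Object> stuckSearch = () -> { Thread.sleep(60_000); return "never"; };

        System.out.println("in-order:     " + timeWithLimit(fastSearch, 500) + " ms");
        System.out.println("out-of-order: " + timeWithLimit(stuckSearch, 500)
                + " ms (-1 = hit the limit)");
    }
}
```

Note that cancelling only interrupts the worker thread; a scorer spinning at full CPU would most likely keep running until the JVM exits, so the probe is meant for one-shot test runs rather than for a live server.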
  

Hard, but please stay with me, we will fix one ugly bug :)

 





----- Original Message ----
> From: Michael McCandless <luc...@mikemccandless.com>
> To: java-user@lucene.apache.org
> Sent: Wednesday, 15 July, 2009 19:27:24
> Subject: Re: speed of BooleanQueries on 2.9
> 
> But, that query can't accept a minNumberShouldMatch -- are you really
> setting that?  (You get 0 results if you set it, because the top
> boolean query has a single required clause).  Maybe you set it only on
> the inner large OR-query?  (But then I don't see the ~2 on that inner
> clause).
> 
> I've tested a 21 term OR query, with allowDocsOutOfOrder true,
> numHits=200 on a Wikipedia index that matches 10M docs and I'm seeing
> the same perf on trunk & 2.4.
> 
> Mike
> 
> On Wed, Jul 15, 2009 at 11:41 AM, eks dev wrote:
> >
> > sorry for confusion, here is exact query that runs forever with
> > setAllowDocsOutOfOrder:
> > You see it on stack trace taken while "stuck"
> > o.a.l.search.TopScoreDocCollector$OutOfOrderTopScoreDocCollector.collect(Unknown Source)
> >
> >
> > Query: +(((NAME:maria NAME:marae^0.25171682 NAME:marai^0.2365632
> > NAME:marao^0.2365632 NAME:marau^0.2365632 NAME:marea^0.2834352
> > NAME:marei^0.25171682 NAME:mareo^0.25171682 NAME:mareu^0.25171682
> > NAME:marie^0.28577283 NAME:marieh^0.2451648 NAME:mariha^0.2583552
> > NAME:mariu^0.27189124 NAME:marja^0.2834352 NAME:marje^0.2673408
> > NAME:marji^0.25171682 NAME:marjo^0.25171682 NAME:marju^0.25171682
> > NAME:marla^0.2673408 NAME:marle^0.25171682 NAME:marli^0.2365632
> > NAME:marlo^0.2365632 NAME:maroa^0.2673408 NAME:maroe^0.25171682
> > NAME:maroi^0.2365632 NAME:marou^0.2365632 NAME:marua^0.2673408
> > NAME:marue^0.25171682 NAME:marui^0.2365632 NAME:maruo^0.2365632
> > NAME:marye^0.2673408 NAME:maryi^0.25171682 NAME:maryo^0.25171682
> > NAME:meria^0.2787888 NAME:miria^0.25835523 NAME:moria^0.25835523
> > NAME:muria^0.25835523 NAME:naria^0.27648002 NAME:narie^0.25392002
> > NAME:neria^0.25392002) (NAME:piekarski NAME:bekarski^0.19200002
> > NAME:beugarski^0.20281483 NAME:blacharski^0.19200002 NAME:lekarski^0.19200002
> > NAME:pecarski^0.21294187 NAME:peikarski^0.27648002 NAME:pekarska^0.20172001
> > NAME:pekarski^0.22446752 NAME:pekarskj^0.21294187 NAME:pekarsky^0.21294187
> > NAME:pickarske^0.21168004 NAME:pickarski^0.22073482 NAME:piekalski^0.23941332
> > NAME:piekanski^0.23941332 NAME:piekaraka^0.22533335 NAME:piekarsci^0.29205337
> > NAME:piekarska^0.28421336 NAME:piekarskie^0.25392002 NAME:piekarsky^0.29205337
> > NAME:piekarzcyk^0.23232001 NAME:piekarzki^0.29205337 NAME:piekaski^0.24843001
> > NAME:piekavska^0.22533335 NAME:piekorski^0.28421336 NAME:pielarski^0.22997928
> > NAME:pierarski^0.22997928 NAME:pierkarski^0.24661335 NAME:piesarski^0.22997928
> > NAME:pietarski^0.22997928 NAME:pietkarski^0.24661335 NAME:pikarski^0.23232001
> > NAME:piowarski^0.20281483 NAME:pirkarski^0.22073482 NAME:plocharski^0.21168004
> > NAME:pokarski^0.20172001 NAME:polikarski^0.20172001 NAME:pukarski^0.20172001
> > NAME:pyekarska^0.26508 NAME:siekarski^0.20281483))^2.0)
> >
> >
> >
> >
> >
> > ----- Original Message ----
> >> From: Michael McCandless 
> >> To: java-user@lucene.apache.org
> >> Sent: Wednesday, 15 July, 2009 17:16:23
> >> Subject: Re: speed of BooleanQueries on 2.9
> >>
> >> So now I'm confused.  Since your query has required (+) clauses, the
> >> setAllowDocsOutOfOrder should have no effect, on either 2.4 or trunk.
> >>
> >> BooleanQuery only uses BooleanScorer when there are no required terms,
> >> and allowDocsOutOfOrder is true.  So I can't explain why you see this
> >> setting changing anything on this query...
> >>
> >> Mike
> >>
> >> On Tue, Jul 14, 2009 at 7:04 PM, eks dev wrote:
> >> >
> >> > I do not know exactly why, but
> >> > when I BooleanQuery.setAllowDocsOutOfOrder(true); I have the problem, but with
> >> > setAllowDocsOutOfOrder(false);  no problems whatsoever
> >> >
> >> > not really scientific method to find such bug, but does the job and makes me
> >> > happy.
> >> >
> >> > Empirical, "deprecated methods are not to be taken as thoroughly tested, as
> >> > they have short life expectancy"
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > ----- Original Message ----
> >> >> From: eks dev
> >> >> To: java-user@lucene.apache.org
> >> >> Sent: Wednesday, 15 July, 2009 0:24:43
> >> >> Subject: Re: speed of BooleanQueries on 2.9
> >> >>
> >> >>
> >> >> Mike, we are definitely hitting something with this one!
> >> >>
> >> >> we had report from our QA chaps that our servers got stuck (limit is on 180
> >> >> Seconds Request)... We are on average 14 Requests per second.... has nothing to
> >> >> do with gc() as
> >> >> we can repeat it with freshly restarted searcher.
> >> >>
> >> >> - it happens on a less than 0.1% of queries, not much of a pattern, repeatable
> >> >> on our index...
> >> >> it is always combination of two expanded tokens (we use
> >> >> minimumNumberShouldMatch)...
> >> >>
> >> >> (+(t1 [up to 40 expansions]) +(t2 [up to 40 expansions of t2]))
> >> >> all tokens are with set boost, and  minNumShouldMatch is set to two
> >> >>
> >> >> I cannot provide self-contained test, nor index (contains sensitive data and is
> >> >> rather big, ~5G)
> >> >>
> >> >> I can repeat this test on t1 and t2 with 40 expansions each. even if I take the
> >> >> most frequent tokens in collection it runs well under one second...but these two
> >> >> particular tokens with their "expansions" are making it run forever...
> >> >>
> >> >> and yes, if I run t1 plus expansions only, it runs super fast, the same for t2
> >> >>
> >> >> java 1.4U14, tried with 1.6U6, no changes...
> >> >>
> >> >> will report if I dig something out
> >> >>
> >> >> partial stack trace while "stuck", cpu is on max:
> >> >>
> >> >>
> >> >> org.apache.lucene.search.TopScoreDocCollector$OutOfOrderTopScoreDocCollector.collect(Unknown Source)
> >> >> org.apache.lucene.search.BooleanScorer.score(Unknown Source)
> >> >> org.apache.lucene.search.BooleanScorer.score(Unknown Source)
> >> >> org.apache.lucene.search.IndexSearcher.search(Unknown Source)
> >> >> org.apache.lucene.search.IndexSearcher.search(Unknown Source)
> >> >> org.apache.lucene.search.Searcher.search(Unknown Source)
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> ----- Original Message ----
> >> >> > From: eks dev
> >> >> > To: java-user@lucene.apache.org
> >> >> > Sent: Monday, 13 July, 2009 13:28:45
> >> >> > Subject: Re: speed of BooleanQueries on 2.9
> >> >> >
> >> >> > Hi Mike,
> >> >> >
> >> >> > getMaxNumOfCandidates() in test was 200, Index is optimised and read-only
> >> >> >
> >> >> > We found (due to an error in our warm-up code, funny) that only this Query runs
> >> >> > slower on 2.9.
> >> >> >
> >> >> > A hint where to look could be that this Query contains the two most frequent
> >> >> > tokens in two particular fields
> >> >> > NAME:hans and ZIPS:berlin (index has ca 80Mio very short documents, 3Mio unique
> >> >> > terms)
> >> >> >
> >> >> > But all of this *could be just wrong measurement*, I just could not spend more
> >> >> > time to get to the bottom of this. We moved forward as we got overall better
> >> >> > average performance (sweet 10% in average) on much bigger real query log from
> >> >> > our regression test.
> >> >> >
> >> >> > Anyhow I just wanted to throw it out, maybe it triggers some synapses :) If
> >> >> > false alarm, sorry.
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > ----- Original Message ----
> >> >> > > From: Michael McCandless
> >> >> > > To: java-user@lucene.apache.org
> >> >> > > Sent: Monday, 13 July, 2009 11:50:48
> >> >> > > Subject: Re: speed of BooleanQueries on 2.9
> >> >> > >
> >> >> > > This is not expected; 2.9 has had a number of changes that ought to
> >> >> > > reduce CPU cost of searching.  If this holds up we definitely need to
> >> >> > > get to the root cause.
> >> >> > >
> >> >> > > Did your test exclude the warmup query for both 2.4.1 & 2.9?  How many
> >> >> > > segments in the index?  What is the actual value of
> >> >> > > getMaxNumOfCandidates()?  If you simplify the query down (eg just do
> >> >> > > the NAME clause or the ZIPS clause, alone) are those also 4X slower?
> >> >> > >
> >> >> > > Mike
> >> >> > >
> >> >> > > On Sun, Jul 12, 2009 at 12:53 PM, eks dev wrote:
> >> >> > > >
> >> >> > > > Is it possible that the same BooleanQuery on 2.9 runs significantly slower
> >> >> > > > than on 2.4?
> >> >> > > >
> >> >> > > > we have some strange effects where the following query runs approx 4(ouch!)
> >> >> > > > times slower on 2.9, test done by 1000 times executing the same Query... But! if
> >> >> > > > I run test from some real Query log with mixed Queries, I get almost the same
> >> >> > > > results (?!), even slightly faster on 2.9 !?
> >> >> > > >
> >> >> > > >
> >> >> > > > Query:
> >> >> > > > +((NAME:hans NAME:hahns^0.23232001 NAME:hams^0.27648002 NAME:hamz^0.25392
> >> >> > > > NAME:hanas^0.18722998 NAME:hanbs^0.18722998 NAME:hanfs^0.18722998
> >> >> > > > NAME:hangs^0.18722998 NAME:hanhs^0.24030754 NAME:hanis^0.18722998
> >> >> > > > NAME:hanjs^0.18722998 NAME:hanks^0.18722998 NAME:hanms^0.18722998
> >> >> > > > NAME:hanos^0.18722998 NAME:hanrs^0.18722998 NAME:hansb^0.20172001
> >> >> > > > NAME:hansd^0.20172001 NAME:hansf^0.20172001 NAME:hansg^0.20172001
> >> >> > > > NAME:hansi^0.20172001 NAME:hansj^0.20172001 NAME:hansk^0.20172001
> >> >> > > > NAME:hansl^0.20172001 NAME:hansn^0.20172001 NAME:hanso^0.20172001
> >> >> > > > NAME:hansp^0.20172001 NAME:hanst^0.20172001 NAME:hansu^0.20172001
> >> >> > > > NAME:hansw^0.20172001 NAME:hansy^0.20172001 NAME:hansz^0.20172001
> >> >> > > > NAME:hants^0.18722998 NAME:hanus^0.18722998 NAME:hanws^0.18722998
> >> >> > > > NAME:hehns^0.20172001 NAME:hens^0.2736075 NAME:hins^0.24843 NAME:hons^0.24843
> >> >> > > > NAME:huhns^0.1801875 NAME:huns^0.24843)^2.0)
> >> >> > > > +(((ZIPS:berlin ZIPS:barlin^0.28227 ZIPS:berien^0.25947002
> >> >> > > > ZIPS:berling^0.23232001 ZIPS:perlin^0.26133335))^1.2)
> >> >> > > >
> >> >> > > > The question is just to get some hints where I should look...
> >> >> > > >
> >> >> > > > Both fields are without norms, omitTf(true) , RAMDirectory, using
> >> >> > > > TopDocs top = ixSearcher.search(q, null, getMaxNumOfCandidates());
> >> >> > > > and BooleanQuery.setAllowDocsOutOfOrder(true);
> >> >> > > >
> >> >> > > > maybe we made some mistakes on measuring, but we did simple timing here on
> >> >> > > > search() method... strange. I would bet it is something we did, but I cannot see
> >> >> > > > where ...
> >> >> > > >
> >> >> > > >
> >> >> > > >
> >> >> > > >
> >> >> > > >
> >> >> > > > ---------------------------------------------------------------------
> >> >> > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> >> > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >> >> > > >
> >> >> > > >
> >> >> > >




