So now I'm confused. Since your query has required (+) clauses, the setAllowDocsOutOfOrder should have no effect, on either 2.4 or trunk.
BooleanQuery only uses BooleanScorer when there are no required terms, and allowDocsOutOfOrder is true. So I can't explain why you see this setting changing anything on this query... Mike On Tue, Jul 14, 2009 at 7:04 PM, eks dev<eks...@yahoo.co.uk> wrote: > > I do not know exactly why, but > when I BooleanQuery.setAllowDocsOutOfOrder(true); I have the problem, but > with setAllowDocsOutOfOrder(false); no problems whatsoever > > not really scientific method to find such bug, but does the job and makes me > happy. > > Empirical, "deprecated methods are not to be taken as thoroughly tested, as > they have short life expectancy" > > > > > > ----- Original Message ---- >> From: eks dev <eks...@yahoo.co.uk> >> To: java-user@lucene.apache.org >> Sent: Wednesday, 15 July, 2009 0:24:43 >> Subject: Re: speed of BooleanQueries on 2.9 >> >> >> Mike, we are definitely hitting something with this one! >> >> we had report from our QA chaps that our servers got stuck (limit is on 180 >> Seconds Request)... We are on average 14 Requsts per second.... has nothing >> to >> do with gc() as >> we can repeat it with freshly restarted searcher. >> >> - it happens on a less than 0.1% of queries, not much of a pattern, >> repeatable >> on our index... >> it is always combination of two expanded tokens (we use >> minimumNooShouldMatch)... >> >> (+(t1 [up to 40 expansions]) +(t2 [up to 40 expansions of t2])) >> all tokens are with set boost, and minNumShouldMatch is set to two >> >> I cannot provide self-contained test, nor index (contains sensitive data and >> is >> rather big, ~5G) >> >> I can repeat this test on t1 and t2 with 40 expansions each. even if I take >> the >> most frequent tokens in collection it runs well under one second...but these >> two >> particular tokens with their "expansions" are making it run forever... >> >> and yes, if I run t1 plus expansions only, it runs super fast, the same for >> t2 >> >> java 1.4U14, tried wit 1.6U6, no changes... >> >> will report if I dig something out >> >> partial stack trace while "stuck", cpu is on max: >> >> org.apache.lucene.search.TopScoreDocCollector$OutOfOrderTopScoreDocCollector.collect(Unknown >> Source) >> org.apache.lucene.search.BooleanScorer.score(Unknown Source) >> org.apache.lucene.search.BooleanScorer.score(Unknown Source) >> org.apache.lucene.search.IndexSearcher.search(Unknown Source) >> org.apache.lucene.search.IndexSearcher.search(Unknown Source) >> org.apache.lucene.search.Searcher.search(Unknown Source) >> >> >> >> >> >> ----- Original Message ---- >> > From: eks dev >> > To: java-user@lucene.apache.org >> > Sent: Monday, 13 July, 2009 13:28:45 >> > Subject: Re: speed of BooleanQueries on 2.9 >> > >> > Hi Mike, >> > >> > getMaxNumOfCandidates() in test was 200, Index is optimised and read-only >> > >> > We found (due to an error in our warm-up code, funny) that only this Query >> runs >> > slower on 2.9. >> > >> > A hint where to look could be that this Query cointains two, the most >> > frequent >> >> > tokens in two particular fields >> > NAME:hans and ZIPS:berlin (index has ca 80Mio very short documents, 3Mio >> unique >> > terms) >> > >> > But all of this *could be just wrong measurement*, I just could not spend >> > more >> >> > time to get to the bottom of this. We moved forward as we got overall >> > better >> > average performance (sweet 10% in average) on much bigger real query log >> > from >> > our regression test. >> > >> > Anyhow I just wanted to throw it out, maybe it triggers some synapses :) If >> > false alarm, sorry. >> > >> > >> > >> > >> > >> > ----- Original Message ---- >> > > From: Michael McCandless >> > > To: java-user@lucene.apache.org >> > > Sent: Monday, 13 July, 2009 11:50:48 >> > > Subject: Re: speed of BooleanQueries on 2.9 >> > > >> > > This is not expected; 2.9 has had a number of changes that ought to >> > > reduce CPU cost of searching. If this holds up we definitely need to >> > > get to the root cause. >> > > >> > > Did your test exclude the warmup query for both 2.4.1 & 2.9? How many >> > > segments in the index? What is the actual value of >> > > getMaxNumOfCandidates()? If you simplify the query down (eg just do >> > > the NAME clause or the ZIPSS clause, alone) are those also 4X slower? >> > > >> > > Mike >> > > >> > > On Sun, Jul 12, 2009 at 12:53 PM, eks devwrote: >> > > > >> > > > Is it possible that the same BooleanQuery on 2.9 runs significantly >> > > > slower >> >> > > than on 2.4? >> > > > >> > > > we have some strange effects where the following query runs approx >> 4(ouch!) >> > > times slower on 2.9, test done by 1000 times executing the same Query... >> But! >> > if >> > > I run test from some real Query log with mixed Queries, I get almost the >> same >> > > results (?!), even slightly faster on 2.9 !? >> > > > >> > > > >> > > > Query: >> > > > +((NAME:hans NAME:hahns^0.23232001 NAME:hams^0.27648002 >> > > > NAME:hamz^0.25392 >> > > NAME:hanas^0.18722998 NAME:hanbs^0.18722998 NAME:hanfs^0.18722998 >> > > NAME:hangs^0.18722998 NAME:hanhs^0.24030754 NAME:hanis^0.18722998 >> > > NAME:hanjs^0.18722998 NAME:hanks^0.18722998 NAME:hanms^0.18722998 >> > > NAME:hanos^0.18722998 NAME:hanrs^0.18722998 NAME:hansb^0.20172001 >> > > NAME:hansd^0.20172001 NAME:hansf^0.20172001 NAME:hansg^0.20172001 >> > > NAME:hansi^0.20172001 NAME:hansj^0.20172001 NAME:hansk^0.20172001 >> > > NAME:hansl^0.20172001 NAME:hansn^0.20172001 NAME:hanso^0.20172001 >> > > NAME:hansp^0.20172001 NAME:hanst^0.20172001 NAME:hansu^0.20172001 >> > > NAME:hansw^0.20172001 NAME:hansy^0.20172001 NAME:hansz^0.20172001 >> > > NAME:hants^0.18722998 NAME:hanus^0.18722998 NAME:hanws^0.18722998 >> > > NAME:hehns^0.20172001 NAME:hens^0.2736075 NAME:hins^0.24843 >> NAME:hons^0.24843 >> > > NAME:huhns^0.1801875 NAME:huns^0.24843)^2.0) >> > > > +(((ZIPS:berlin ZIPS:barlin^0.28227 ZIPS:berien^0.25947002 >> > > ZIPS:berling^0.23232001 ZIPS:perlin^0.26133335))^1.2) >> > > > >> > > > The question is just to get some hints where I should look... >> > > > >> > > > Both fealds are without norms, omitTf(true) , RAMDirectory, using >> > > > TopDocs top = ixSearcher.search(q, null, getMaxNumOfCandidates()); >> > > > and BooleanQuery.setAllowDocsOutOfOrder(true); >> > > > >> > > > maybe we made some mistakes on measuring, but we did simple timing >> > > > here on >> >> > > search() method... strange. I would bet it is something we did, but I >> > > cannot >> >> > see >> > > where ... >> > > > >> > > > >> > > > >> > > > >> > > > >> > > > --------------------------------------------------------------------- >> > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> > > > For additional commands, e-mail: java-user-h...@lucene.apache.org >> > > > >> > > > >> > > >> > > --------------------------------------------------------------------- >> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> > > For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org