Yeah, thanks Uwe - I had read it quickly, but was just rereading and it was sinking in. I hadn't cross correlated the issues yet.
Makes perfect sense. Very nice catch. >Maybe we need some change to CachingWrapperFilter that caches the DocIdSets >as before, but optionally would wrap it into an OpenBitSet, if it is not an >instance of OBS. So - do we address this for 2.9. This is a nasty performance trap, is it not? We recommend that you use both of these classes together - if it can work like this though ... Seems like a bug to me. - mark Uwe Schindler wrote: > See my mail about the CachingWrapperFilter and QueryWrapperFilter, I think > it explains this behaviour (and Thomas ran some warming queries before). > > ----- > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > >> -----Original Message----- >> From: Mark Miller [mailto:markrmil...@gmail.com] >> Sent: Wednesday, September 16, 2009 7:23 PM >> To: java-user@lucene.apache.org >> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1? >> >> Nevermind. I see advance wasn't around in 2.4. This is part of the >> DocIdSetIterator changes. >> >> Anyway - either these are just not comparable runs, or there is a major >> bug (which seems unlikely). >> >> Just to keep pointing out the obvious: >> >> 2.4 calls doc 195,000 times >> 2.9 calls docId 1.4 million times >> >> That just doesn't jive. >> >> Mark Miller wrote: >> >>> Notice that while DisjunctionScorer.advance and >>> DisjuntionScorer.advanceAfterCurrent appear to be called >>> in 2.9, in 2.4, I am only seeing DisjuntionScorer.advanceAfterCurrent >>> called. >>> >>> Can someone explain that? >>> >>> Mark Miller wrote: >>> >>> >>>> Something is very odd about this if they both cover the same search and >>>> the environ for both is identical. Even if one search was done twice, >>>> and we divide the numbers for the new api by 2 - its still *very* odd. >>>> >>>> With 2.4, ScorerDocQueue.topDoc is called half a million times. >>>> With 2.9, its called over 4 million times. >>>> >>>> Huh? >>>> >>>> Thomas Becker wrote: >>>> >>>> >>>> >>>>> No it's only a single segment. But two calls. One doing a getHitsCount >>>>> >> first and >> >>>>> the other doing the actual search. I'll paste both methods below if >>>>> >> someone's >> >>>>> interested. >>>>> >>>>> Will dig into lucene's sources and compare 2.4 search behaviour for my >>>>> >> case with >> >>>>> 2.9 tomorrow. It was about time to get more into lucene-core sources >>>>> >> anyhow. :) >> >>>>> See you tomorrow guys and thanks a lot again! It's a pleasure. >>>>> >>>>> public int getHitsCount(String query, Filter filter) throws >>>>> LuceneServiceException { >>>>> log.debug("getHitsCount('{}, {}')", query, filter); >>>>> if (StringUtils.isBlank(query)) { >>>>> log.warn("getHitsCount: empty lucene query"); >>>>> return 0; >>>>> } >>>>> long startTimeMillis = System.currentTimeMillis(); >>>>> int count = 0; >>>>> >>>>> if (indexSearcher == null) { >>>>> return 0; >>>>> } >>>>> >>>>> BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT); >>>>> Query q = null; >>>>> try { >>>>> q = createQuery(query); >>>>> TopScoreDocCollector tsdc = >>>>> >> TopScoreDocCollector.create(1, true); >> >>>>> indexSearcher.search(q, filter, tsdc); >>>>> count = tsdc.getTotalHits(); >>>>> log.info("getHitsCount: count = {}",count); >>>>> } catch (ParseException ex) { >>>>> throw new LuceneServiceException("invalid lucene >>>>> > query:" > >> + query, ex); >> >>>>> } catch (IOException e) { >>>>> throw new LuceneServiceException(" indexSearcher >>>>> > could > >> be corrupted", e); >> >>>>> } finally { >>>>> long durationMillis = System.currentTimeMillis() - >>>>> >> startTimeMillis; >> >>>>> if (durationMillis > slowQueryLimit) { >>>>> log.warn("getHitsCount: Slow query: {} ms, >>>>> >> query={}", durationMillis, query); >> >>>>> } >>>>> log.debug("getHitsCount: query took {} ms", >>>>> >> durationMillis); >> >>>>> } >>>>> return count; >>>>> } >>>>> >>>>> public List<Document> search(String query, Filter filter, Sort sort, >>>>> >> int from, >> >>>>> int size) throws LuceneServiceException { >>>>> log.debug("{} search('{}', {}, {}, {}, {})", new Object[] { >>>>> >> indexAlias, query, >> >>>>> filter, sort, from, size }); >>>>> long startTimeMillis = System.currentTimeMillis(); >>>>> >>>>> List<Document> docs = new ArrayList<Document>(); >>>>> if (indexSearcher == null) { >>>>> return docs; >>>>> } >>>>> Query q = null; >>>>> try { >>>>> if (query == null) { >>>>> log.warn("search: lucene query is null..."); >>>>> return docs; >>>>> } >>>>> q = createQuery(query); >>>>> BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT); >>>>> if (size < 0 || size > maxNumHits) { >>>>> // set hard limit for numHits >>>>> size = maxNumHits; >>>>> if (log.isDebugEnabled()) >>>>> log.debug("search: Size set to >>>>> > hardlimit: {} > >> for query: {} with filter: >> >>>>> {}", new Object[] { size, query, filter }); >>>>> } >>>>> TopFieldCollector collector = >>>>> >> TopFieldCollector.create(sort, size + from, >> >>>>> true, false, false, true); >>>>> indexSearcher.search(q, filter, collector); >>>>> if(size > collector.getTotalHits()) >>>>> size = collector.getTotalHits(); >>>>> if (size > 100000) >>>>> log.info("search: size: {} bigger than >>>>> > 100.000 for > >> query: {} with filter: >> >>>>> {}", new Object[] { size, query, filter }); >>>>> TopDocs td = collector.topDocs(from, size); >>>>> ScoreDoc[] scoreDocs = td.scoreDocs; >>>>> for (ScoreDoc scoreDoc : scoreDocs) { >>>>> docs.add(indexSearcher.doc(scoreDoc.doc)); >>>>> } >>>>> } catch (ParseException e) { >>>>> log.warn("search: ParseException: {}", >>>>> > e.getMessage()); > >>>>> if (log.isDebugEnabled()) >>>>> log.warn("search: ParseException: ", e); >>>>> return Collections.emptyList(); >>>>> } catch (IOException e) { >>>>> log.warn("search: IOException: ", e); >>>>> return Collections.emptyList(); >>>>> } finally { >>>>> long durationMillis = System.currentTimeMillis() - >>>>> >> startTimeMillis; >> >>>>> if (durationMillis > slowQueryLimit) { >>>>> log.warn("search: Slow query: {} ms, >>>>> > query={}, > >> indexUsed={}", >> >>>>> new Object[] { >>>>> > durationMillis, query, > >>>>> indexSearcher.getIndexReader().directory() }); >>>>> } >>>>> log.debug("search: query took {} ms", >>>>> > durationMillis); > >>>>> } >>>>> return docs; >>>>> } >>>>> >>>>> >>>>> Uwe Schindler wrote: >>>>> >>>>> >>>>> >>>>> >>>>>>>> http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png >>>>>>>> >>>>>>>> Have to verify that the last one is not by accident more than one >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> request. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Will >>>>>>>> do the run again and then post the required info. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> The last figure shows, that IndexSearcher.searchWithFilter was >>>>>>> >> called >> >>>>>>> twice >>>>>>> in contrast to the first figure, where IndexSearcher.search was >>>>>>> >> called >> >>>>>>> only >>>>>>> once. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> I forgot, searchWithFilter it is called per segment in 2.9. If it was >>>>>> >> only >> >>>>>> one search, you must have two segments and therefore no optimized >>>>>> >> index for >> >>>>>> this to be correct? >>>>>> >>>>>> Uwe >>>>>> >>>>>> >>>>>> --------------------------------------------------------------------- >>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> >> -- >> - Mark >> >> http://www.lucidimagination.com >> >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- - Mark http://www.lucidimagination.com --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org