RE: lucene 2.9.0RC4 slower than 2.4.1?

Uwe Schindler Wed, 16 Sep 2009 10:38:35 -0700

I am currently preparing a patch for CachingWrapperFilter that has a boolean
ctor parameter (useOpenBitSetCache) and the method getDocIdSet will then do
what QueryWrapperFilter did before LUCENE-1427.


I would not do this in QueryWrapperFilter like before, because it would
slowdown MultiTermQuery if no caching is used and the constantScore/Boolean
rewrite mode is active.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Wednesday, September 16, 2009 7:33 PM
> To: java-user@lucene.apache.org
> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
> 
> Yeah, thanks Uwe - I had read it quickly, but was just rereading and it
> was sinking in. I hadn't cross correlated the issues yet.
> 
> Makes perfect sense. Very nice catch.
> 
> >Maybe we need some change to CachingWrapperFilter that caches the
> DocIdSets
> >as before, but optionally would wrap it into an OpenBitSet, if it is not
> an
> >instance of OBS.
> 
> So - do we address this for 2.9. This is a nasty performance trap, is it
> not?
> 
> We recommend that you use both of these classes together - if it can
> work like this though ...
> 
> Seems like a bug to me.
> 
> - mark
> 
> Uwe Schindler wrote:
> > See my mail about the CachingWrapperFilter and QueryWrapperFilter, I
> think
> > it explains this behaviour (and Thomas ran some warming queries before).
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> >
> >> -----Original Message-----
> >> From: Mark Miller [mailto:markrmil...@gmail.com]
> >> Sent: Wednesday, September 16, 2009 7:23 PM
> >> To: java-user@lucene.apache.org
> >> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
> >>
> >> Nevermind. I see advance wasn't around in 2.4. This is part of the
> >> DocIdSetIterator changes.
> >>
> >> Anyway - either these are just not comparable runs, or there is a major
> >> bug (which seems unlikely).
> >>
> >> Just to keep pointing out the obvious:
> >>
> >> 2.4 calls doc 195,000 times
> >> 2.9 calls docId 1.4 million times
> >>
> >> That just doesn't jive.
> >>
> >> Mark Miller wrote:
> >>
> >>> Notice that while DisjunctionScorer.advance and
> >>> DisjuntionScorer.advanceAfterCurrent appear to be called
> >>> in 2.9, in 2.4, I am only seeing DisjuntionScorer.advanceAfterCurrent
> >>> called.
> >>>
> >>> Can someone explain that?
> >>>
> >>> Mark Miller wrote:
> >>>
> >>>
> >>>> Something is very odd about this if they both cover the same search
> and
> >>>> the environ for both is identical. Even if one search was done twice,
> >>>> and we divide the numbers for the new api by 2 - its still *very*
> odd.
> >>>>
> >>>> With 2.4, ScorerDocQueue.topDoc is called half a million times.
> >>>> With 2.9, its called over 4 million times.
> >>>>
> >>>> Huh?
> >>>>
> >>>> Thomas Becker wrote:
> >>>>
> >>>>
> >>>>
> >>>>> No it's only a single segment. But two calls. One doing a
> getHitsCount
> >>>>>
> >> first and
> >>
> >>>>> the other doing the actual search. I'll paste both methods below if
> >>>>>
> >> someone's
> >>
> >>>>> interested.
> >>>>>
> >>>>> Will dig into lucene's sources and compare 2.4 search behaviour for
> my
> >>>>>
> >> case with
> >>
> >>>>> 2.9 tomorrow. It was about time to get more into lucene-core sources
> >>>>>
> >> anyhow. :)
> >>
> >>>>> See you tomorrow guys and thanks a lot again! It's a pleasure.
> >>>>>
> >>>>>         public int getHitsCount(String query, Filter filter) throws
> >>>>> LuceneServiceException {
> >>>>>                 log.debug("getHitsCount('{}, {}')", query, filter);
> >>>>>                 if (StringUtils.isBlank(query)) {
> >>>>>                         log.warn("getHitsCount: empty lucene
query");
> >>>>>                         return 0;
> >>>>>                 }
> >>>>>                 long startTimeMillis = System.currentTimeMillis();
> >>>>>                 int count = 0;
> >>>>>
> >>>>>                 if (indexSearcher == null) {
> >>>>>                         return 0;
> >>>>>                 }
> >>>>>
> >>>>>                 BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
> >>>>>                 Query q = null;
> >>>>>                 try {
> >>>>>                         q = createQuery(query);
> >>>>>                         TopScoreDocCollector tsdc =
> >>>>>
> >> TopScoreDocCollector.create(1, true);
> >>
> >>>>>                         indexSearcher.search(q, filter, tsdc);
> >>>>>                         count = tsdc.getTotalHits();
> >>>>>                         log.info("getHitsCount: count = {}",count);
> >>>>>                 } catch (ParseException ex) {
> >>>>>                         throw new LuceneServiceException("invalid
lucene
> >>>>>
> > query:"
> >
> >> + query, ex);
> >>
> >>>>>                 } catch (IOException e) {
> >>>>>                         throw new LuceneServiceException("
indexSearcher
> >>>>>
> > could
> >
> >> be corrupted", e);
> >>
> >>>>>                 } finally {
> >>>>>                         long durationMillis =
System.currentTimeMillis() -
> >>>>>
> >> startTimeMillis;
> >>
> >>>>>                         if (durationMillis > slowQueryLimit) {
> >>>>>                                 log.warn("getHitsCount: Slow query:
{} ms,
> >>>>>
> >> query={}", durationMillis, query);
> >>
> >>>>>                         }
> >>>>>                         log.debug("getHitsCount: query took {} ms",
> >>>>>
> >> durationMillis);
> >>
> >>>>>                 }
> >>>>>                 return count;
> >>>>>         }
> >>>>>
> >>>>>         public List<Document> search(String query, Filter filter,
Sort
> sort,
> >>>>>
> >> int from,
> >>
> >>>>> int size) throws LuceneServiceException {
> >>>>>                 log.debug("{} search('{}', {}, {}, {}, {})", new
> Object[] {
> >>>>>
> >> indexAlias, query,
> >>
> >>>>> filter, sort, from, size });
> >>>>>                 long startTimeMillis = System.currentTimeMillis();
> >>>>>
> >>>>>                 List<Document> docs = new ArrayList<Document>();
> >>>>>                 if (indexSearcher == null) {
> >>>>>                         return docs;
> >>>>>                 }
> >>>>>                 Query q = null;
> >>>>>                 try {
> >>>>>                         if (query == null) {
> >>>>>                                 log.warn("search: lucene query is
null...");
> >>>>>                                 return docs;
> >>>>>                         }
> >>>>>                         q = createQuery(query);
> >>>>>
BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
> >>>>>                         if (size < 0 || size > maxNumHits) {
> >>>>>                                 // set hard limit for numHits
> >>>>>                                 size = maxNumHits;
> >>>>>                                 if (log.isDebugEnabled())
> >>>>>                                         log.debug("search: Size set
to
> >>>>>
> > hardlimit: {}
> >
> >> for query: {} with filter:
> >>
> >>>>> {}", new Object[] { size, query, filter });
> >>>>>                         }
> >>>>>                         TopFieldCollector collector =
> >>>>>
> >> TopFieldCollector.create(sort, size + from,
> >>
> >>>>> true, false, false, true);
> >>>>>                         indexSearcher.search(q, filter, collector);
> >>>>>                         if(size > collector.getTotalHits())
> >>>>>                                 size = collector.getTotalHits();
> >>>>>                         if (size > 100000)
> >>>>>                                 log.info("search: size: {} bigger
than
> >>>>>
> > 100.000 for
> >
> >> query: {} with filter:
> >>
> >>>>> {}", new Object[] { size, query, filter });
> >>>>>                         TopDocs td = collector.topDocs(from, size);
> >>>>>                         ScoreDoc[] scoreDocs = td.scoreDocs;
> >>>>>                         for (ScoreDoc scoreDoc : scoreDocs) {
> >>>>>
docs.add(indexSearcher.doc(scoreDoc.doc));
> >>>>>                         }
> >>>>>                 } catch (ParseException e) {
> >>>>>                         log.warn("search: ParseException: {}",
> >>>>>
> > e.getMessage());
> >
> >>>>>                         if (log.isDebugEnabled())
> >>>>>                                 log.warn("search: ParseException: ",
e);
> >>>>>                         return Collections.emptyList();
> >>>>>                 } catch (IOException e) {
> >>>>>                         log.warn("search: IOException: ", e);
> >>>>>                         return Collections.emptyList();
> >>>>>                 } finally {
> >>>>>                         long durationMillis =
System.currentTimeMillis() -
> >>>>>
> >> startTimeMillis;
> >>
> >>>>>                         if (durationMillis > slowQueryLimit) {
> >>>>>                                 log.warn("search: Slow query: {} ms,
> >>>>>
> > query={},
> >
> >> indexUsed={}",
> >>
> >>>>>                                                 new Object[] {
> >>>>>
> > durationMillis, query,
> >
> >>>>> indexSearcher.getIndexReader().directory() });
> >>>>>                         }
> >>>>>                         log.debug("search: query took {} ms",
> >>>>>
> > durationMillis);
> >
> >>>>>                 }
> >>>>>                 return docs;
> >>>>>         }
> >>>>>
> >>>>>
> >>>>> Uwe Schindler wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>> http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png
> >>>>>>>>
> >>>>>>>> Have to verify that the last one is not by accident more than one
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>> request.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> Will
> >>>>>>>> do the run again and then post the required info.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>> The last figure shows, that IndexSearcher.searchWithFilter was
> >>>>>>>
> >> called
> >>
> >>>>>>> twice
> >>>>>>> in contrast to the first figure, where IndexSearcher.search was
> >>>>>>>
> >> called
> >>
> >>>>>>> only
> >>>>>>> once.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>> I forgot, searchWithFilter it is called per segment in 2.9. If it
> was
> >>>>>>
> >> only
> >>
> >>>>>> one search, you must have two segments and therefore no optimized
> >>>>>>
> >> index for
> >>
> >>>>>> this to be correct?
> >>>>>>
> >>>>>> Uwe
> >>>>>>
> >>>>>>
> >>>>>> -------------------------------------------------------------------
> --
> >>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >> --
> >> - Mark
> >>
> >> http://www.lucidimagination.com
> >>
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
> 
> 
> --
> - Mark
> 
> http://www.lucidimagination.com
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

RE: lucene 2.9.0RC4 slower than 2.4.1?

Reply via email to