Re: lucene 2.9.0RC4 slower than 2.4.1?

Mark Miller Wed, 16 Sep 2009 10:34:06 -0700

Yeah, thanks Uwe - I had read it quickly, but was just rereading and it
was sinking in. I hadn't cross correlated the issues yet.


Makes perfect sense. Very nice catch.

>Maybe we need some change to CachingWrapperFilter that caches the DocIdSets
>as before, but optionally would wrap it into an OpenBitSet, if it is not an
>instance of OBS.

So - do we address this for 2.9. This is a nasty performance trap, is it
not?

We recommend that you use both of these classes together - if it can
work like this though ...

Seems like a bug to me.

- mark

Uwe Schindler wrote:
> See my mail about the CachingWrapperFilter and QueryWrapperFilter, I think
> it explains this behaviour (and Thomas ran some warming queries before).
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>   
>> -----Original Message-----
>> From: Mark Miller [mailto:markrmil...@gmail.com]
>> Sent: Wednesday, September 16, 2009 7:23 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: lucene 2.9.0RC4 slower than 2.4.1?
>>
>> Nevermind. I see advance wasn't around in 2.4. This is part of the
>> DocIdSetIterator changes.
>>
>> Anyway - either these are just not comparable runs, or there is a major
>> bug (which seems unlikely).
>>
>> Just to keep pointing out the obvious:
>>
>> 2.4 calls doc 195,000 times
>> 2.9 calls docId 1.4 million times
>>
>> That just doesn't jive.
>>
>> Mark Miller wrote:
>>     
>>> Notice that while DisjunctionScorer.advance and
>>> DisjuntionScorer.advanceAfterCurrent appear to be called
>>> in 2.9, in 2.4, I am only seeing DisjuntionScorer.advanceAfterCurrent
>>> called.
>>>
>>> Can someone explain that?
>>>
>>> Mark Miller wrote:
>>>
>>>       
>>>> Something is very odd about this if they both cover the same search and
>>>> the environ for both is identical. Even if one search was done twice,
>>>> and we divide the numbers for the new api by 2 - its still *very* odd.
>>>>
>>>> With 2.4, ScorerDocQueue.topDoc is called half a million times.
>>>> With 2.9, its called over 4 million times.
>>>>
>>>> Huh?
>>>>
>>>> Thomas Becker wrote:
>>>>
>>>>
>>>>         
>>>>> No it's only a single segment. But two calls. One doing a getHitsCount
>>>>>           
>> first and
>>     
>>>>> the other doing the actual search. I'll paste both methods below if
>>>>>           
>> someone's
>>     
>>>>> interested.
>>>>>
>>>>> Will dig into lucene's sources and compare 2.4 search behaviour for my
>>>>>           
>> case with
>>     
>>>>> 2.9 tomorrow. It was about time to get more into lucene-core sources
>>>>>           
>> anyhow. :)
>>     
>>>>> See you tomorrow guys and thanks a lot again! It's a pleasure.
>>>>>
>>>>>   public int getHitsCount(String query, Filter filter) throws
>>>>> LuceneServiceException {
>>>>>           log.debug("getHitsCount('{}, {}')", query, filter);
>>>>>           if (StringUtils.isBlank(query)) {
>>>>>                   log.warn("getHitsCount: empty lucene query");
>>>>>                   return 0;
>>>>>           }
>>>>>           long startTimeMillis = System.currentTimeMillis();
>>>>>           int count = 0;
>>>>>
>>>>>           if (indexSearcher == null) {
>>>>>                   return 0;
>>>>>           }
>>>>>
>>>>>           BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
>>>>>           Query q = null;
>>>>>           try {
>>>>>                   q = createQuery(query);
>>>>>                   TopScoreDocCollector tsdc =
>>>>>           
>> TopScoreDocCollector.create(1, true);
>>     
>>>>>                   indexSearcher.search(q, filter, tsdc);
>>>>>                   count = tsdc.getTotalHits();
>>>>>                   log.info("getHitsCount: count = {}",count);
>>>>>           } catch (ParseException ex) {
>>>>>                   throw new LuceneServiceException("invalid lucene
>>>>>           
> query:"
>   
>> + query, ex);
>>     
>>>>>           } catch (IOException e) {
>>>>>                   throw new LuceneServiceException(" indexSearcher
>>>>>           
> could
>   
>> be corrupted", e);
>>     
>>>>>           } finally {
>>>>>                   long durationMillis = System.currentTimeMillis() -
>>>>>           
>> startTimeMillis;
>>     
>>>>>                   if (durationMillis > slowQueryLimit) {
>>>>>                           log.warn("getHitsCount: Slow query: {} ms,
>>>>>           
>> query={}", durationMillis, query);
>>     
>>>>>                   }
>>>>>                   log.debug("getHitsCount: query took {} ms",
>>>>>           
>> durationMillis);
>>     
>>>>>           }
>>>>>           return count;
>>>>>   }
>>>>>
>>>>>   public List<Document> search(String query, Filter filter, Sort sort,
>>>>>           
>> int from,
>>     
>>>>> int size) throws LuceneServiceException {
>>>>>           log.debug("{} search('{}', {}, {}, {}, {})", new Object[] {
>>>>>           
>> indexAlias, query,
>>     
>>>>> filter, sort, from, size });
>>>>>           long startTimeMillis = System.currentTimeMillis();
>>>>>
>>>>>           List<Document> docs = new ArrayList<Document>();
>>>>>           if (indexSearcher == null) {
>>>>>                   return docs;
>>>>>           }
>>>>>           Query q = null;
>>>>>           try {
>>>>>                   if (query == null) {
>>>>>                           log.warn("search: lucene query is null...");
>>>>>                           return docs;
>>>>>                   }
>>>>>                   q = createQuery(query);
>>>>>                   BooleanQuery.setMaxClauseCount(MAXCLAUSECOUNT);
>>>>>                   if (size < 0 || size > maxNumHits) {
>>>>>                           // set hard limit for numHits
>>>>>                           size = maxNumHits;
>>>>>                           if (log.isDebugEnabled())
>>>>>                                   log.debug("search: Size set to
>>>>>           
> hardlimit: {}
>   
>> for query: {} with filter:
>>     
>>>>> {}", new Object[] { size, query, filter });
>>>>>                   }
>>>>>                   TopFieldCollector collector =
>>>>>           
>> TopFieldCollector.create(sort, size + from,
>>     
>>>>> true, false, false, true);
>>>>>                   indexSearcher.search(q, filter, collector);
>>>>>                   if(size > collector.getTotalHits())
>>>>>                           size = collector.getTotalHits();
>>>>>                   if (size > 100000)
>>>>>                           log.info("search: size: {} bigger than
>>>>>           
> 100.000 for
>   
>> query: {} with filter:
>>     
>>>>> {}", new Object[] { size, query, filter });
>>>>>                   TopDocs td = collector.topDocs(from, size);
>>>>>                   ScoreDoc[] scoreDocs = td.scoreDocs;
>>>>>                   for (ScoreDoc scoreDoc : scoreDocs) {
>>>>>                           docs.add(indexSearcher.doc(scoreDoc.doc));
>>>>>                   }
>>>>>           } catch (ParseException e) {
>>>>>                   log.warn("search: ParseException: {}",
>>>>>           
> e.getMessage());
>   
>>>>>                   if (log.isDebugEnabled())
>>>>>                           log.warn("search: ParseException: ", e);
>>>>>                   return Collections.emptyList();
>>>>>           } catch (IOException e) {
>>>>>                   log.warn("search: IOException: ", e);
>>>>>                   return Collections.emptyList();
>>>>>           } finally {
>>>>>                   long durationMillis = System.currentTimeMillis() -
>>>>>           
>> startTimeMillis;
>>     
>>>>>                   if (durationMillis > slowQueryLimit) {
>>>>>                           log.warn("search: Slow query: {} ms,
>>>>>           
> query={},
>   
>> indexUsed={}",
>>     
>>>>>                                           new Object[] {
>>>>>           
> durationMillis, query,
>   
>>>>> indexSearcher.getIndexReader().directory() });
>>>>>                   }
>>>>>                   log.debug("search: query took {} ms",
>>>>>           
> durationMillis);
>   
>>>>>           }
>>>>>           return docs;
>>>>>   }
>>>>>
>>>>>
>>>>> Uwe Schindler wrote:
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>>>> http://ankeschwarzer.de/tmp/lucene_29_newapi_mmap_singlereq.png
>>>>>>>>
>>>>>>>> Have to verify that the last one is not by accident more than one
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>> request.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> Will
>>>>>>>> do the run again and then post the required info.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>> The last figure shows, that IndexSearcher.searchWithFilter was
>>>>>>>               
>> called
>>     
>>>>>>> twice
>>>>>>> in contrast to the first figure, where IndexSearcher.search was
>>>>>>>               
>> called
>>     
>>>>>>> only
>>>>>>> once.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> I forgot, searchWithFilter it is called per segment in 2.9. If it was
>>>>>>             
>> only
>>     
>>>>>> one search, you must have two segments and therefore no optimized
>>>>>>             
>> index for
>>     
>>>>>> this to be correct?
>>>>>>
>>>>>> Uwe
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>
>>>>>           
>>>>         
>>>
>>>       
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>     
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>   


-- 
- Mark

http://www.lucidimagination.com




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: lucene 2.9.0RC4 slower than 2.4.1?

Reply via email to