Shai Erera wrote:

No - I didn't try to populate an index with real data and run real queries
(what is "real" after all?). I know from my experience of indexes with
several million documents that some queries return several hundred thousand
results (one query even hit 2.5 M documents). This is typical in search:
users type on average 2.3 terms per query. In such cases the chances of
hitting a query with a huge result set are not that small (I'm not saying
this is the most common case though; I agree that most searches don't
process that many documents).

Agreed: many queries do hit a great many results.  But I agree with Paul:
it's not clear how that "typically" translates into the number of ScoreDocs
that actually get created.

However, this change will improve performance from an algorithmic point of
view: you allocate at most numRequestedHits+1 ScoreDocs, no matter how many
documents your query processes.
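
For concreteness, here is a minimal standalone sketch of the allocation
bound Shai describes - this is not Lucene's actual HitQueue code, and the
class and field names are made up - in which the collector owns
numRequestedHits + 1 ScoreDoc objects in total and recycles the evicted
entry as the candidate for the next hit:

import java.util.Comparator;
import java.util.PriorityQueue;

// Standalone stand-in for Lucene's ScoreDoc, just for this sketch.
final class ScoreDoc {
    int doc;
    float score;
    ScoreDoc(int doc, float score) { this.doc = doc; this.score = score; }
}

final class TopNCollector {
    private final int numHits;
    // Min-heap whose head is the current "worst" hit: lowest score,
    // with the larger doc id losing ties.
    private final PriorityQueue<ScoreDoc> pq;
    // The single spare object that is refilled for every competitive hit.
    private ScoreDoc spare = new ScoreDoc(-1, Float.NEGATIVE_INFINITY);

    TopNCollector(int numHits) {
        this.numHits = numHits;
        Comparator<ScoreDoc> worstFirst = (a, b) ->
            a.score != b.score ? Float.compare(a.score, b.score)
                               : Integer.compare(b.doc, a.doc);
        this.pq = new PriorityQueue<>(numHits, worstFirst);
    }

    void collect(int doc, float score) {
        if (pq.size() < numHits) {
            // Filling up: these, plus the spare, are the only allocations
            // the collector ever makes - numHits + 1 in total.
            pq.add(new ScoreDoc(doc, score));
            return;
        }
        ScoreDoc bottom = pq.peek();
        if (score > bottom.score || (score == bottom.score && doc < bottom.doc)) {
            // Competitive hit: reuse the spare instead of allocating.
            spare.doc = doc;
            spare.score = score;
            pq.poll();
            pq.add(spare);
            spare = bottom;  // the evicted entry becomes the next spare
        }
        // Non-competitive hit: nothing is allocated or inserted.
    }
}

The real patch presumably does this inside Lucene's own priority queue
rather than java.util.PriorityQueue; the only point of the sketch is the
allocation count.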

It's definitely a good step forward: not creating extra garbage in hot
spots is worthwhile, so I think we should make this change.  Still, I'm
wondering how much it helps in practice.

I think benchmarking on "real" use cases (vs synthetic tests) is
worthwhile: it keeps you focused on what really counts, in the end.

In this particular case there are at least 2 things it could show us:

  * How many ScoreDocs really get created, or, what percentage of hits
    actually results in an insertion into the PQ (see the counting
    sketch below)?

  * How much does this save, as a percentage of the overall time spent
    searching?
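
One crude way to get the first number would be a pair of counters bumped
by the collector (the names here are hypothetical, not Lucene API), with
the second number coming from a plain wall-clock comparison against an
unpatched build on the same query set:

// Hypothetical counters a collector could increment while searching.
final class CollectorStats {
    long totalHits;      // every document the collector saw
    long pqInsertions;   // hits competitive enough to enter the queue

    void onHit()       { totalHits++; }
    void onInsertion() { pqInsertions++; }

    String report() {
        double pct = totalHits == 0 ? 0.0 : 100.0 * pqInsertions / totalHits;
        return String.format("hits=%d, PQ insertions=%d (%.2f%% of hits)",
                             totalHits, pqInsertions, pct);
    }
}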

Mike
