Re: [jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

Jake Mannix Mon, 02 Nov 2009 15:13:41 -0800

Ok, and how many of those users are also running on indices with hundreds of
segments?


  -jake

On Mon, Nov 2, 2009 at 3:10 PM, Mark Miller <markrmil...@gmail.com> wrote:

> There are plenty of Lucene users that do go 1000 in. We've been calling it
> "deep paging" at LI. I like that name :)
>
> - Mark
>
> http://www.lucidimagination.com (mobile)
>
>
> On Nov 2, 2009, at 6:04 PM, "Jake Mannix (JIRA)" <j...@apache.org> wrote:
>
>
>>   [
>> https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772741#action_12772741
>>  ]
>>
>> Jake Mannix commented on LUCENE-1997:
>> -------------------------------------
>>
>> Is the current concern about memory?  I'm more concerned with the
>> weird "java ghosts" that are flying around, sometimes swaying results by
>> 20-40%...  the memory could only be an issue on a setup with hundreds of
>> segments and sorting the top 1000 values (do we really try to optimize for
>> this performance case?).  In the normal case (no more than tens of segments,
>> and the top 10 or 100 hits), we're talking about what, 100-1000 PQ entries?
>>
>>  Explore performance of multi-PQ vs single-PQ sorting API
>>> --------------------------------------------------------
>>>
>>>               Key: LUCENE-1997
>>>               URL: https://issues.apache.org/jira/browse/LUCENE-1997
>>>           Project: Lucene - Java
>>>        Issue Type: Improvement
>>>        Components: Search
>>>  Affects Versions: 2.9
>>>          Reporter: Michael McCandless
>>>          Assignee: Michael McCandless
>>>       Attachments: LUCENE-1997.patch, LUCENE-1997.patch,
>>> LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch,
>>> LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch
>>>
>>>
>>> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev,
>>> where a simpler (non-segment-based) comparator API is proposed that
>>> gathers results into multiple PQs (one per segment) and then merges
>>> them in the end.
>>> I started from John's multi-PQ code and worked it into
>>> contrib/benchmark so that we could run perf tests.  Then I generified
>>> the Python script I use for running search benchmarks (in
>>> contrib/benchmark/sortBench.py).
>>> The script first creates indexes with 1M docs (based on
>>> SortableSingleDocSource, and based on wikipedia, if available).  Then
>>> it runs various combinations:
>>>  * Index with 20 balanced segments vs index with the "normal" log
>>>   segment size
>>>  * Queries with different numbers of hits (only for wikipedia index)
>>>  * Different top N
>>>  * Different sorts (by title, for wikipedia, and by random string,
>>>   random int, and country for the random index)
>>> For each test, 7 search rounds are run and the best QPS is kept.  The
>>> script runs singlePQ then multiPQ, records the best QPS for each, and
>>> produces a table (in Jira format) as output.
>>>
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>
>>

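For readers following the thread, here is a rough sketch of the two strategies the quoted issue compares, and of why memory grows with segment count in the multi-PQ case. It is not Lucene's comparator API and not John's patch: the function names, the use of plain integers as sort values, and the `heapq`-based queues are all assumptions made for illustration. Each segment is modeled as a list of sort values, and "top N" means the N smallest.

```python
import heapq
import random


def top_n_single_pq(segments, n):
    """One bounded queue shared by all segments (holds at most n entries)."""
    heap = []  # max-heap simulated by storing negated values
    for seg in segments:
        for v in seg:
            if len(heap) < n:
                heapq.heappush(heap, -v)
            elif v < -heap[0]:
                # v beats the current worst kept value, so swap it in
                heapq.heapreplace(heap, -v)
    return sorted(-x for x in heap)


def top_n_multi_pq(segments, n):
    """One bounded queue per segment, merged at the end.

    Returns the merged top n and the total number of entries held
    across all per-segment queues (at most len(segments) * n).
    """
    per_segment = [heapq.nsmallest(n, seg) for seg in segments]
    entries = sum(len(p) for p in per_segment)
    merged = list(heapq.merge(*per_segment))[:n]
    return merged, entries


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical index: 20 segments of 5000 docs, sorted by a random int
    segments = [[random.randrange(10**6) for _ in range(5000)] for _ in range(20)]
    single = top_n_single_pq(segments, 100)
    multi, entries = top_n_multi_pq(segments, 100)
    print(single == multi, entries)
```

Both functions return the same top N; the difference is that the multi-PQ version holds up to (number of segments) x N entries before the merge, which is the quantity the memory question above is about.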