Re: [jira] Commented: (LUCENE-1997) Explore performance of multi-PQ vs single-PQ sorting API

Mark Miller Tue, 03 Nov 2009 18:17:42 -0800

Jake Mannix wrote:
> Um, according to Mike's latest numbers, multiPQ is actually *faster*
> at 1000 hits sometimes.  In fact, all of the most recent tests have
> shown no clear winner either way in terms of QPS.  Sometimes (look at
> Yonik's linux numbers), multiPQ is nearly across the board faster. 
The numbers keep changing, and they don't look too appealing on
Yonik's JDK 7 runs. But who knows. I'm not taking any of them as gospel yet.


>
> Either way, I think it's been made abundantly clear that any
> advantages singlePQ has in terms of QPS performance degrade to near
> zero (if not reverse to be negative) as the number of hits increases
> higher and higher.  I could show you what it looks like on a 10 or 20
> million document index if you'd like to see that as well, but I think
> that's pretty clear from the difference from 1M to 5M.
>
>   -jake
>
> On Tue, Nov 3, 2009 at 12:56 PM, Mark Miller <markrmil...@gmail.com
> <mailto:markrmil...@gmail.com>> wrote:
>
>     The memory issue is just one example of something that's somewhat
>     worse - I don't see it as a deciding factor. If things were
>     clarified to be decidedly faster with multi queue, and not 50%
>     worse at 1000 hits, I'd be for the change, more memory or not. 
>
>
>     - Mark
>
>     http://www.lucidimagination.com (mobile)
>
>     On Nov 3, 2009, at 12:42 PM, Jake Mannix <jake.man...@gmail.com
>     <mailto:jake.man...@gmail.com>> wrote:
>
>>     Mark, I'm not stuck on single examples, I'm thinking about all of
>>     lucene land: what tiny fraction of people need the combined
>>     intersection of
>>
>>       a) many many segments
>>
>>     AND
>>
>>       b) deep paging
>>
>>     AND
>>
>>       c) high QPS
>>
>>     AND
>>
>>       d) can't handle another 40MB of RAM usage.
>>
>>     Only people in the intersection of all of those bitsets would
>>     possibly have a problem with the memory requirements of multiPQ. 
>>
>>     On Tue, Nov 3, 2009 at 12:32 PM, Mark Miller
>>     <markrmil...@gmail.com <mailto:markrmil...@gmail.com>> wrote:
>>
>>         You're obviously too stuck on single examples. We have to
>>         consider everyone in lucene land.
>>
>>         I'm against two APIs. A custom search is advanced - it's not
>>         worth the baggage of maintaining two APIs or being limited by
>>         the APIs and back compat when moving forward.
>>
>>         If the advantage of the second API is just going to be that
>>         it's simpler, I'm not for it currently.
>>
>>         - Mark
>>
>>         http://www.lucidimagination.com (mobile)
>>
>>         On Nov 3, 2009, at 10:51 AM, "Jake Mannix (JIRA)"
>>         <j...@apache.org <mailto:j...@apache.org>> wrote:
>>
>>
>>             [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773114#action_12773114 ]
>>
>>             Jake Mannix commented on LUCENE-1997:
>>             -------------------------------------
>>
>>             bq. Since each approach has distinct advantages, why not
>>             offer both ("simple" and "expert") comparator extensions
>>             APIs?
>>
>>             +1 from me on this one, as long as the simpler one is
>>             around.  I'll bet we'll find that we regret keeping the
>>             "expert" one by 3.2 or so though, but I'll take any
>>             compromise which gets the simpler API in there.
>>
>>             bq. Don't forget that this is multiplied by however many
>>             queries are currently in flight.
>>
>>             Sure, so if you're running with 100 queries per second on
>>             a single shard (pretty fast!), with 100 segments, and you
>>             want to do sorting by value on the top 1000 values (how
>>             far down the long tail of extreme cases are we at now?
>>              Do librarians hit their search servers with 100 QPS and
>>             have indices poorly built with hundreds of segments and
>>             can't take downtime to *ever* optimize?), we're now
>>             talking about 40MB.
>>
>>             *Forty megabytes*.  On a beefy machine which is supposed
>>             to be handling 100QPS across an index big enough to need
>>             100 segments.  How much heap would such a machine already
>>             be allocating?  4GB?  6?  More?
>>
>>             We're talking about less than 1% of the heap being used
>>             by the multiPQ approach in comparison to singlePQ.
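For reference, the 40MB figure in the comment above works out if each
queue entry costs about 4 bytes. That per-entry size is an assumption
inferred from the numbers in the comment, not something stated there,
and treating 100 QPS as 100 queries in flight at once follows the
comment's own worst-case framing. A rough sketch of the arithmetic in
Python (the function and constant names are illustrative only):

```python
# Back-of-the-envelope estimate of multiPQ queue memory.
# ASSUMPTION: 4 bytes per queue entry, inferred from 40MB / 10M entries.
BYTES_PER_ENTRY = 4


def multi_pq_entries(queries_in_flight, segments, top_n):
    """One queue of top_n entries per segment, per in-flight query."""
    return queries_in_flight * segments * top_n


entries = multi_pq_entries(queries_in_flight=100, segments=100, top_n=1000)
total_bytes = entries * BYTES_PER_ENTRY

heap_bytes = 4 * 2**30  # the 4GB heap mentioned in the comment
share = total_bytes / heap_bytes

print(entries)            # 10000000
print(total_bytes / 1e6)  # 40.0 (MB)
print(f"{share:.2%}")     # 0.93%
```

With a larger heap (6GB or more, as the comment suggests) the share
only shrinks further.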
>>
>>                 Explore performance of multi-PQ vs single-PQ sorting API
>>                 --------------------------------------------------------
>>
>>                               Key: LUCENE-1997
>>                               URL:
>>                 https://issues.apache.org/jira/browse/LUCENE-1997
>>                           Project: Lucene - Java
>>                        Issue Type: Improvement
>>                        Components: Search
>>                  Affects Versions: 2.9
>>                          Reporter: Michael McCandless
>>                          Assignee: Michael McCandless
>>                       Attachments: LUCENE-1997.patch,
>>                 LUCENE-1997.patch, LUCENE-1997.patch,
>>                 LUCENE-1997.patch, LUCENE-1997.patch,
>>                 LUCENE-1997.patch, LUCENE-1997.patch,
>>                 LUCENE-1997.patch, LUCENE-1997.patch
>>
>>
>>                 Spinoff from recent "lucene 2.9 sorting algorithm"
>>                 thread on java-dev,
>>                 where a simpler (non-segment-based) comparator API is
>>                 proposed that
>>                 gathers results into multiple PQs (one per segment)
>>                 and then merges
>>                 them in the end.
>>                 I started from John's multi-PQ code and worked it into
>>                 contrib/benchmark so that we could run perf tests.
>>                  Then I generified
>>                 the Python script I use for running search benchmarks (in
>>                 contrib/benchmark/sortBench.py).
>>                 The script first creates indexes with 1M docs (based on
>>                 SortableSingleDocSource, and based on wikipedia, if
>>                 available).  Then
>>                 it runs various combinations:
>>                  * Index with 20 balanced segments vs index with the
>>                 "normal" log
>>                   segment size
>>                  * Queries with different numbers of hits (only for
>>                 wikipedia index)
>>                  * Different top N
>>                  * Different sorts (by title, for wikipedia, and by
>>                 random string,
>>                   random int, and country for the random index)
>>                 For each test, 7 search rounds are run and the best
>>                 QPS is kept.  The
>>                 script runs singlePQ then multiPQ, and records the
>>                 resulting best QPS
>>                 for each, and produces a table (in Jira format) as output.
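To make the two strategies in the issue description concrete, here is
a minimal Python sketch of the idea: one bounded queue per segment,
merged at the end, versus a single queue over all segments. It sorts
plain integers in ascending order with the standard heapq module. It
is not Lucene's comparator API, and the function names are made up for
illustration.

```python
import heapq
from itertools import islice


def top_n_multi_pq(segments, n):
    """Collect a top-n per segment, then merge the per-segment results."""
    # heapq.nsmallest returns each segment's n smallest values, sorted.
    per_segment = [heapq.nsmallest(n, seg) for seg in segments]
    # heapq.merge lazily merges already-sorted inputs; keep the first n.
    return list(islice(heapq.merge(*per_segment), n))


def top_n_single_pq(segments, n):
    """Collect a single top-n across all segments."""
    return heapq.nsmallest(n, (value for seg in segments for value in seg))


# Three toy "segments" of sort values (hypothetical data).
segments = [[5, 1, 9], [3, 7], [2, 8, 6, 4]]

multi = top_n_multi_pq(segments, 4)
single = top_n_single_pq(segments, 4)
print(multi)   # [1, 2, 3, 4]
print(single)  # [1, 2, 3, 4]
```

Both functions return the same hits. The trade-off under discussion is
that the multi-queue version holds up to n entries per segment instead
of n in total, in exchange for a simpler per-segment comparison.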
>>
>>
>>             -- 
>>             This message is automatically generated by JIRA.
>>             -
>>             You can reply to this email to add a comment to the issue
>>             online.
>>
>>
>>             
>> ---------------------------------------------------------------------
>>             To unsubscribe, e-mail:
>>             java-dev-unsubscr...@lucene.apache.org
>>             <mailto:java-dev-unsubscr...@lucene.apache.org>
>>             For additional commands, e-mail:
>>             java-dev-h...@lucene.apache.org
>>             <mailto:java-dev-h...@lucene.apache.org>
>>
>>
>


-- 
- Mark

http://www.lucidimagination.com




