Jake Mannix wrote:
> Um, according to Mike's latest numbers, multiPQ is actually *faster*
> at 1000 hits sometimes. In fact, all of the most recent tests have
> shown no clear winner either way in terms of QPS. Sometimes (look at
> Yonik's linux numbers), multiPQ is nearly across the board faster.

The numbers keep changing, and they don't look too appealing on Yonik's JDK 7 runs. But who knows. I'm not taking any of them as gospel yet.
>
> Either way, I think it's been made abundantly clear that any
> advantages singlePQ has in terms of QPS performance degrade to near
> zero (if not reverse to be negative) as the number of hits
> increases. I could show you what it looks like on a 10 or 20
> million document index if you'd like to see that as well, but I think
> that's pretty clear from the difference from 1M to 5M.
>
> -jake
>
> On Tue, Nov 3, 2009 at 12:56 PM, Mark Miller <markrmil...@gmail.com> wrote:
>
> The memory issue is just one example of something that's somewhat
> worse - I don't see it as a deciding factor. If things were
> clarified to be decidedly faster with multi queue, and not 50%
> worse at 1000 hits, I'd be for the change, more memory or not.
>
> - Mark
>
> http://www.lucidimagination.com (mobile)
>
> On Nov 3, 2009, at 12:42 PM, Jake Mannix <jake.man...@gmail.com> wrote:
>
>> Mark, I'm not stuck on single examples, I'm thinking about all of
>> lucene land: what tiny fraction of people need the combined
>> intersection of
>>
>> a) many many segments
>>
>> AND
>>
>> b) deep paging
>>
>> AND
>>
>> c) high QPS
>>
>> AND
>>
>> d) can't handle another 40MB of RAM usage.
>>
>> Only people in the intersection of all of those bitsets would
>> possibly have a problem with the memory requirements of multiPQ.
>>
>> On Tue, Nov 3, 2009 at 12:32 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>
>> You're obviously too stuck on single examples. We have to
>> consider everyone in lucene land.
>>
>> I'm against 2 APIs. A custom search is advanced - it's not
>> worth the baggage of maintaining two APIs or being limited by
>> the APIs and back compat when moving forward.
>>
>> If the advantage of the second API is just going to be that it's
>> simpler, I'm not for it currently.
>>
>> - Mark
>>
>> http://www.lucidimagination.com (mobile)
>>
>> On Nov 3, 2009, at 10:51 AM, "Jake Mannix (JIRA)" <j...@apache.org> wrote:
>>
>> [ https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773114#action_12773114 ]
>>
>> Jake Mannix commented on LUCENE-1997:
>> -------------------------------------
>>
>> bq. Since each approach has distinct advantages, why not
>> offer both ("simple" and "expert") comparator extension
>> APIs?
>>
>> +1 from me on this one, as long as the simpler one is
>> around. I'll bet we'll find that we regret keeping the
>> "expert" one by 3.2 or so though, but I'll take any
>> compromise which gets the simpler API in there.
>>
>> bq. Don't forget that this is multiplied by however many
>> queries are currently in flight.
>>
>> Sure, so if you're running with 100 queries per second on
>> a single shard (pretty fast!), with 100 segments, and you
>> want to do sorting by value on the top 1000 values (how
>> far down the long tail of extreme cases are we at now?
>> Do librarians hit their search servers with 100 QPS and
>> have indices poorly built with hundreds of segments and
>> can't take downtime to *ever* optimize?), we're now
>> talking about 40MB.
>>
>> *Forty megabytes*. On a beefy machine which is supposed
>> to be handling 100 QPS across an index big enough to need
>> 100 segments. How much heap would such a machine already
>> be allocating? 4GB? 6? More?
>>
>> We're talking about less than 1% of the heap being
>> used by the multiPQ approach in comparison to singlePQ.
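The 40MB figure in the comment above can be reproduced with back-of-the-envelope arithmetic. The sketch below is only an illustration: it assumes about 4 bytes per queue slot (the thread does not state a per-entry size) and treats "100 QPS" as 100 queries in flight at once, which is the worst case. The function name and parameters are hypothetical, not part of Lucene.

```python
def pq_bytes(queries_in_flight, queues_per_query, top_n, bytes_per_slot=4):
    """Rough upper bound on priority-queue memory.

    bytes_per_slot=4 is an assumption made for this sketch; the thread
    does not say how large a queue entry is.
    """
    return queries_in_flight * queues_per_query * top_n * bytes_per_slot


# multiPQ: one top-N queue per segment, per in-flight query.
multi = pq_bytes(queries_in_flight=100, queues_per_query=100, top_n=1000)
# singlePQ: one top-N queue per in-flight query.
single = pq_bytes(queries_in_flight=100, queues_per_query=1, top_n=1000)

heap_bytes = 4 * 1024 ** 3  # the 4GB heap floated in the comment

print(f"multiPQ queues:  {multi / 1e6:.1f} MB")
print(f"singlePQ queues: {single / 1e6:.1f} MB")
print(f"multiPQ share of a 4GB heap: {100 * multi / heap_bytes:.2f}%")
```

Under these assumptions the multiPQ queues come to 40,000,000 bytes, which stays under 1% of a 4GB heap; a larger per-slot size would scale the figure linearly.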
>>
>> Explore performance of multi-PQ vs single-PQ sorting API
>> --------------------------------------------------------
>>
>> Key: LUCENE-1997
>> URL: https://issues.apache.org/jira/browse/LUCENE-1997
>> Project: Lucene - Java
>> Issue Type: Improvement
>> Components: Search
>> Affects Versions: 2.9
>> Reporter: Michael McCandless
>> Assignee: Michael McCandless
>> Attachments: LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch,
>>              LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch,
>>              LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch
>>
>> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev,
>> where a simpler (non-segment-based) comparator API is proposed that
>> gathers results into multiple PQs (one per segment) and then merges
>> them in the end.
>>
>> I started from John's multi-PQ code and worked it into
>> contrib/benchmark so that we could run perf tests. Then I generified
>> the Python script I use for running search benchmarks (in
>> contrib/benchmark/sortBench.py).
>>
>> The script first creates indexes with 1M docs (based on
>> SortableSingleDocSource, and based on wikipedia, if available). Then
>> it runs various combinations:
>>
>> * Index with 20 balanced segments vs index with the "normal" log
>>   segment size
>> * Queries with different numbers of hits (only for wikipedia index)
>> * Different top N
>> * Different sorts (by title, for wikipedia, and by random string,
>>   random int, and country for the random index)
>>
>> For each test, 7 search rounds are run and the best QPS is kept. The
>> script runs singlePQ then multiPQ, and records the resulting best QPS
>> for each and produces a table (in Jira format) as output.
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
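For readers who have not followed the earlier thread, the two strategies compared in the issue summary can be sketched in a few lines. This is a toy illustration, not the Lucene implementation or John's patch: segments are plain lists of integers, "sorting" means largest value first, and the function names are made up for the example.

```python
import heapq
from itertools import islice


def top_n_single_pq(segments, n):
    """singlePQ: one bounded min-heap shared across all segments."""
    heap = []
    for segment in segments:
        for value in segment:
            if len(heap) < n:
                heapq.heappush(heap, value)
            elif value > heap[0]:
                # Evict the current smallest of the top n.
                heapq.heapreplace(heap, value)
    return sorted(heap, reverse=True)


def top_n_multi_pq(segments, n):
    """multiPQ: collect a top n per segment, then merge them at the end."""
    # heapq.nlargest returns each segment's top n in descending order.
    per_segment = [heapq.nlargest(n, segment) for segment in segments]
    # Merge the descending lists and keep only the overall top n.
    return list(islice(heapq.merge(*per_segment, reverse=True), n))


segments = [[5, 1, 9], [7, 3], [8, 2, 6]]
print(top_n_single_pq(segments, 3))
print(top_n_multi_pq(segments, 3))
```

Both functions return the same top n; the difference is that the multi-queue version holds up to n entries per segment until the final merge, which is where the extra memory discussed in this thread comes from.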
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>

--
- Mark

http://www.lucidimagination.com