Ok, and how many of those users are also running on indices with hundreds of segments?
-jake

On Mon, Nov 2, 2009 at 3:10 PM, Mark Miller <markrmil...@gmail.com> wrote:

> There are plenty of Lucene users that do go 1000 in. We've been calling it
> "deep paging" at LI. I like that name :)
>
> - Mark
>
> http://www.lucidimagination.com (mobile)
>
> On Nov 2, 2009, at 6:04 PM, "Jake Mannix (JIRA)" <j...@apache.org> wrote:
>
>> [
>> https://issues.apache.org/jira/browse/LUCENE-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772741#action_12772741
>> ]
>>
>> Jake Mannix commented on LUCENE-1997:
>> -------------------------------------
>>
>> The current concern is to do with the memory? I'm more concerned with the
>> weird "java ghosts" that are flying around, sometimes swaying results by
>> 20-40%... the memory could only be an issue on a setup with hundreds of
>> segments and sorting the top 1000 values (do we really try to optimize for
>> this performance case?). In the normal case (no more than tens of segments,
>> and the top 10 or 100 hits), we're talking about what, 100-1000 PQ entries?
>>
>> Explore performance of multi-PQ vs single-PQ sorting API
>>> --------------------------------------------------------
>>>
>>> Key: LUCENE-1997
>>> URL: https://issues.apache.org/jira/browse/LUCENE-1997
>>> Project: Lucene - Java
>>> Issue Type: Improvement
>>> Components: Search
>>> Affects Versions: 2.9
>>> Reporter: Michael McCandless
>>> Assignee: Michael McCandless
>>> Attachments: LUCENE-1997.patch, LUCENE-1997.patch,
>>> LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch,
>>> LUCENE-1997.patch, LUCENE-1997.patch, LUCENE-1997.patch
>>>
>>>
>>> Spinoff from recent "lucene 2.9 sorting algorithm" thread on java-dev,
>>> where a simpler (non-segment-based) comparator API is proposed that
>>> gathers results into multiple PQs (one per segment) and then merges
>>> them in the end.
>>> I started from John's multi-PQ code and worked it into
>>> contrib/benchmark so that we could run perf tests. Then I generified
>>> the Python script I use for running search benchmarks (in
>>> contrib/benchmark/sortBench.py).
>>> The script first creates indexes with 1M docs (based on
>>> SortableSingleDocSource, and based on wikipedia, if available). Then
>>> it runs various combinations:
>>> * Index with 20 balanced segments vs index with the "normal" log
>>> segment size
>>> * Queries with different numbers of hits (only for wikipedia index)
>>> * Different top N
>>> * Different sorts (by title, for wikipedia, and by random string,
>>> random int, and country for the random index)
>>> For each test, 7 search rounds are run and the best QPS is kept. The
>>> script runs singlePQ then multiPQ, and records the resulting best QPS
>>> for each and produces table (in Jira format) as output.
>>>
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> You can reply to this email to add a comment to the issue online.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>>
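
For anyone following the thread without the patch at hand, here is a minimal sketch of the two strategies being compared and of the memory arithmetic raised above. It is not Lucene's API and not the code in LUCENE-1997.patch: the function names, the use of plain integers as sort keys, and the ascending sort order are all assumptions made for illustration. Each "segment" is just a list of values.

```python
import heapq


def single_pq_top_n(segments, n):
    """Collect the n smallest values across all segments with one bounded heap."""
    heap = []  # max-heap via negation; never holds more than n entries
    for segment in segments:
        for value in segment:
            if len(heap) < n:
                heapq.heappush(heap, -value)
            elif value < -heap[0]:
                # value beats the current worst kept entry, so swap it in
                heapq.heapreplace(heap, -value)
    return sorted(-v for v in heap)


def multi_pq_top_n(segments, n):
    """Keep a separate top-n per segment, then merge them at the end.

    Returns the merged top n and the total number of entries held
    across all per-segment queues (at most len(segments) * n).
    """
    per_segment = [heapq.nsmallest(n, segment) for segment in segments]
    entries = sum(len(top) for top in per_segment)
    merged = list(heapq.merge(*per_segment))[:n]
    return merged, entries


if __name__ == "__main__":
    segments = [[5, 1, 9], [3, 7], [2, 8, 6, 4]]
    print(single_pq_top_n(segments, 3))
    print(multi_pq_top_n(segments, 3))
```

Both functions return the same top n. The difference is that the multi-PQ variant holds up to (number of segments) x (top N) entries before the merge, which is where the "hundreds of segments and the top 1000" worry comes from, while the single-PQ variant never holds more than N.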