[ https://issues.apache.org/jira/browse/SOLR-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232781#comment-14232781 ]
Per Steffensen commented on SOLR-6810: -------------------------------------- SOLR-6795, SOLR-6796, SOLR-6812 and SOLR-6813 was all just preparation for this patch. It makes {{dqa.forceSkipGetIds}} (old {{disturb.singlePass}}) work for all queries (ever fires during the test-suite). So we have a complete feature now (when those issues are corrected). > Faster searching limited but high rows across many shards all with many hits > ---------------------------------------------------------------------------- > > Key: SOLR-6810 > URL: https://issues.apache.org/jira/browse/SOLR-6810 > Project: Solr > Issue Type: Improvement > Components: search > Reporter: Per Steffensen > Labels: distributed_search, performance > Attachments: branch_5x_rev1642874.patch > > > Searching "limited but high rows across many shards all with many hits" is > slow > E.g. > * Query from outside client: q=something&rows=1000 > * Resulting in sub-requests to each shard something a-la this > ** 1) q=something&rows=1000&fl=id,score > ** 2) Request the full documents with ids in the global-top-1000 found among > the top-1000 from each shard > What does the subject mean > * "limited but high rows" means 1000 in the example above > * "many shards" means 200-1000 in our case > * "all with many hits" means that each of the shards have a significant > number of hits on the query > The problem grows on all three factors above > Doing such a query on our system takes between 5 min to 1 hour - depending on > a lot of things. It ought to be much faster, so lets make it. > Profiling show that the problem is that it takes lots of time to access the > store to get id’s for (up to) 1000 docs (value of rows parameter) per shard. > Having 1000 shards its up to 1 mio ids that has to be fetched. There is > really no good reason to ever read information from store for more than the > overall top-1000 documents, that has to be returned to the client. > For further detail see mail-thread "Slow searching limited but high rows > across many shards all with high hits" started 13/11-2014 on > dev@lucene.apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org