On Thu, 2014-11-13 at 11:58 +0100, Per Steffensen wrote:
> Searching "limited but high rows across many shards all with high hits"
> is slow
> E.g.
> * Query from outside client: q=content:something&rows=1000
> * Resulting in sub-requests to each shard something a-la this
> ** 1) q=content&rows=1000&fl=id,score
> ** 2) Request the full documents with ids in the global-top-1000 found
> among the top-1000 from each shard

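For reference, the two-phase flow quoted above can be sketched roughly like this. It is a toy in-memory model, not Solr's actual code: the names shard_top_ids and two_phase_search, the dict-based shards and the precomputed "score" field are all assumptions made for illustration.

```python
import heapq


def shard_top_ids(shard_docs, term, rows):
    """Phase 1 on a single shard: return up to `rows` (score, id) pairs, best first.

    Assumption: each document carries a precomputed "score"; a real shard
    would compute the score for the query instead.
    """
    hits = [
        (doc["score"], doc_id)
        for doc_id, doc in shard_docs.items()
        if term in doc["content"]
    ]
    return heapq.nlargest(rows, hits)


def two_phase_search(shards, term, rows):
    """Coordinator: collect ids and scores from every shard, then fetch documents."""
    # Phase 1: every shard reports its local top `rows` (only id and score).
    candidates = []
    for shard_no, shard_docs in enumerate(shards):
        for score, doc_id in shard_top_ids(shard_docs, term, rows):
            candidates.append((score, doc_id, shard_no))

    # Merge: the coordinator sees up to len(shards) * rows candidates
    # and keeps only the global top `rows`.
    global_top = heapq.nlargest(rows, candidates)

    # Phase 2: fetch full documents only for the ids in the global top,
    # each from the shard that reported it.
    return [(doc_id, shards[shard_no][doc_id]) for _, doc_id, shard_no in global_top]
```

In this model the coordinator has to merge up to (number of shards) x rows candidates in phase 1, even though only rows documents are fetched in phase 2.
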
What is the core problem? The two-phase request system (https://issues.apache.org/jira/browse/SOLR-5768 seems to solve that)? That the IDs in the second phase are sent to all shards? Something else entirely?

> Doing such a query on our system takes between 5 min to 1 hour -
> depending on a lot of things. We have profiled and made our own PoC
> solution that brings the response-time down to between 5 secs and 1
> minute (about a factor 60 faster) - and not requiring nearly as many
> resources from the system while performing the search.

Can you outline what you are doing?

Related to that, why are you running 50+ shards on each machine, when you're doing search across all shards? Why not fewer shards/machine and less distribution overhead?

- Toke Eskildsen, State and University Library, Denmark
