On Thu, 2014-11-13 at 11:58 +0100, Per Steffensen wrote:
> Searching "limited but high rows across many shards all with high hits" 
> is slow
> E.g.
> * Query from outside client: q=content:something&rows=1000
> * Resulting in sub-requests to each shard something a-la this
> ** 1) q=content&rows=1000&fl=id,score
> ** 2) Request the full documents with ids in the global-top-1000 found 
> among the top-1000 from each shard

What is the core problem? Is it the two-phase request system
(https://issues.apache.org/jira/browse/SOLR-5768 seems to solve that)?
Is it that the IDs in the second phase are sent to all shards? Or
something else entirely?
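
To make sure we are talking about the same thing, here is a rough sketch
of the two-phase flow as described in the quoted text. It is a toy
in-memory model and not Solr code: the Shard type and every function name
are made up for illustration, and phase 2 asks every shard for the winning
IDs, which is one of the possible problems I am asking about.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical in-memory shard: doc id -> (score, stored document).
Shard = Dict[str, Tuple[float, str]]
Hit = Tuple[float, str]  # (score, doc id)


def phase1_top_ids(shard: Shard, rows: int) -> List[Hit]:
    """Phase 1: a shard returns only (score, id) for its local top `rows`."""
    hits = ((score, doc_id) for doc_id, (score, _doc) in shard.items())
    return heapq.nlargest(rows, hits)


def merge_global_top(per_shard: List[List[Hit]], rows: int) -> List[Hit]:
    """The coordinator merges all shard lists into the global top `rows`."""
    return heapq.nlargest(rows, (hit for hits in per_shard for hit in hits))


def phase2_fetch(shards: List[Shard], global_top: List[Hit]) -> List[str]:
    """Phase 2: fetch the full documents for the winning ids.

    Every shard is asked for all winning ids in this sketch.
    """
    wanted = {doc_id for _score, doc_id in global_top}
    docs: Dict[str, str] = {}
    for shard in shards:
        for doc_id in shard.keys() & wanted:
            docs[doc_id] = shard[doc_id][1]
    # Return the documents in global score order.
    return [docs[doc_id] for _score, doc_id in global_top]


def distributed_search(shards: List[Shard], rows: int) -> List[str]:
    per_shard = [phase1_top_ids(shard, rows) for shard in shards]
    return phase2_fetch(shards, merge_global_top(per_shard, rows))
```

Note that phase 1 moves up to rows entries per shard to the coordinator,
even though only rows entries in total survive the merge.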

> Doing such a query on our system takes between 5 min to 1 hour - 
> depending on a lot of things. We have profiled and made our own PoC 
> solution that brings the response-time down to between 5 secs and 1 
> minute (about a factor 60 faster) - and not requiring nearly as many 
> resources from the system while performing the search.

Can you outline what you are doing?

Related to that, why are you running 50+ shards on each machine, when
you're doing search across all shards? Why not fewer shards/machine and
less distribution overhead?
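
To put a rough number on that overhead: assuming every shard has at least
rows matching documents, the coordinator receives shards * rows (id, score)
entries in the first phase. The helper name below is made up, and the
figure of 5 shards is only a hypothetical comparison point, not a
recommendation for your setup.

```python
def phase1_entries(shards: int, rows: int) -> int:
    """Entries sent to the coordinator in phase 1.

    Assumes every shard has at least `rows` hits for the query.
    """
    return shards * rows


# Per machine, with rows=1000 as in the example query:
many_shards = phase1_entries(50, 1000)  # 50 shards on the machine
few_shards = phase1_entries(5, 1000)    # hypothetical 5 shards instead
```

With 50 shards that is 50,000 entries per machine for a single query,
against 5,000 with 5 shards, before any documents are fetched.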

- Toke Eskildsen, State and University Library, Denmark


