Hi
* Using Solr 4.4.0
* Up to 1000 shards total - spread across about 20-40 Solr-servers on
20-40 machines
Searching "limited but high rows across many shards all with high hits"
is slow
E.g.
* Query from outside client: q=content:something&rows=1000
* This results in sub-requests to each shard, roughly like this:
** 1) q=content:something&rows=1000&fl=id,score
** 2) Request the full documents with ids in the global-top-1000 found
among the top-1000 from each shard
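
To make the two steps concrete, here is a minimal sketch of the coordinator-side merge as I understand it. It is not Solr's actual code; the names merge_top_rows and ids_to_fetch and the shape of the input are made up for illustration. Note that with 1000 shards and rows=1000, step 1 can hand the coordinator up to 1000 * 1000 = 1,000,000 id/score pairs just to pick 1000 winners.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical shape of one hit in a shard's step-1 response: (score, document id).
Hit = Tuple[float, str]


def merge_top_rows(shard_hits: Dict[str, List[Hit]], rows: int) -> List[Tuple[str, str, float]]:
    """Step 1 merge: pick the global top `rows` hits from the per-shard top lists.

    Returns (shard, doc_id, score) tuples ordered by descending score.
    """
    # Flatten every shard's candidates; each shard contributes at most `rows` hits.
    candidates = (
        (score, doc_id, shard)
        for shard, hits in shard_hits.items()
        for score, doc_id in hits
    )
    best = heapq.nlargest(rows, candidates, key=lambda c: c[0])
    return [(shard, doc_id, score) for score, doc_id, shard in best]


def ids_to_fetch(winners: List[Tuple[str, str, float]]) -> Dict[str, List[str]]:
    """Step 2 planning: group the winning ids by the shard that holds them."""
    by_shard: Dict[str, List[str]] = {}
    for shard, doc_id, _score in winners:
        by_shard.setdefault(shard, []).append(doc_id)
    return by_shard
```

Only the shards that appear in the result of ids_to_fetch need a second request for the full documents.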
Interpretation
* "limited but high rows" means rows=1000, as in the example above
* "many shards" means 200-1000 in our case
* "all with high hits" means that each of the shards has a significant
number of hits on the query (q-param)
Doing such a query on our system takes between 5 minutes and 1 hour,
depending on a lot of things. We have profiled and made our own PoC
solution that brings the response-time down to between 5 seconds and 1
minute (about a factor 60 faster), and it does not require nearly as many
resources from the system while performing the search. Of course we want
to have a solution going into production. We have to either mature our
PoC solution and use that, or adopt an existing solution from the newest
Solr release.
Do any of you know if there is a solution to this "problem" in the
newest Solr release?
Regards, Per Steffensen