On Tue, 2014-11-18 at 11:26 +0100, Per Steffensen wrote:
> It is likely (since we are not routing on anything that has to do with
> the "content" text-field) that the overall-top-1000 is fairly evenly
> distributed among the 1000 shards

Streaming in Heliosearch might work out of the box:
http://heliosearch.org/streaming-aggregation-for-solrcloud/#CloudSolrStream
Caveat: I haven't used streaming, so I can't say for sure and don't know
how/if it handles early termination, which would be a prerequisite for
speedup in your setup.

[Detailed description of solution]

[SOLR-5798]

> Hope you get the idea, and why it makes us perform much much better?!

Yes, I got it. We discussed it a bit at the office and it seems like a
really fine idea, new to Solr. As Solr is often used for log processing
these days, the number of setups with many shards and non-trivial
request sizes is growing: Your solution would help others.

The obvious next step would be a JIRA. However, I know that you have had
very limited success there, even for simple patches. 

General JIRA-handling might be a relevant topic for another thread, but
I don't have the energy for that discussion right now.


Of course, the concrete speed-up factor is highly dependent on how long
it takes to resolve IDs. You state speeds of 10, 30, 60 minutes without
the patch and a factor 60 speedup. As I understand it, the real
difference is whether ~1000*#shards IDs are resolved or only 1000.
With 50 shards or 50.000 ID-lookups per machine, that puts your worst
case resolve-time at 50.000 IDs / (60 min * 60 s/min) ~= 14 IDs/s and
the best case (10 min total) at ~83 IDs/s per machine.
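
As a quick sanity check of that arithmetic, here is a small Python sketch.
It assumes, as above, 50 shards and top-1000, and that the entire request
time goes to resolving IDs (which overstates the cost per ID, since the
search itself also takes time). The function and variable names are made up
for the illustration and have nothing to do with Solr's code.

```python
def ids_per_second(ids_resolved: int, minutes: float) -> float:
    """Average resolve rate if the whole request time went to ID lookups."""
    return ids_resolved / (minutes * 60)


TOP_N = 1000   # rows requested
SHARDS = 50    # assumed shard count, as in the estimate above

# Without the patch, each shard's top-1000 is resolved: 1000 * 50 lookups.
ids_without_patch = TOP_N * SHARDS

# The three total request times stated for the unpatched setup.
for minutes in (10, 30, 60):
    rate = ids_per_second(ids_without_patch, minutes)
    print(f"{minutes:>2} min total: {rate:.1f} IDs/s")
```

The slower the per-ID resolve rate, the more there is to gain from only
resolving the final 1000 IDs.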

(guessing spinning drives here)

With a setup with faster ID-resolving, the benefits from your patch
might be too small for top-1000 to be really interesting as ID-resolving
would not take up as much of the overall processing time. But it would
make it possible to scale that number up (top-10000 or above).

- Toke Eskildsen, State and University Library, Denmark


