On Tue, 2014-11-18 at 11:26 +0100, Per Steffensen wrote:
> It is likely (since we are not routing on anything that has to do with
> the "content" text-field) that the overall-top-1000 is fairly evenly
> distributed among the 1000 shards
Streaming in Heliosearch might work out of the box:
http://heliosearch.org/streaming-aggregation-for-solrcloud/#CloudSolrStream
Caveat: I haven't used streaming, so I can't say for sure, and I don't know how or if it handles early termination, which would be a prerequisite for a speedup in your setup.

[Detailed description of solution] [SOLR-5798]

> Hope you get the idea, and why it makes us perform much much better?!

Yes, I got it. We discussed it a bit at the office, and it seems like a really fine idea, new to Solr. As Solr is often used for log processing these days, the number of setups with many shards and non-trivial request sizes is growing, so your solution would help others.

The obvious next step would be a JIRA. However, I know that you have had very limited success there, even with simple patches. General JIRA handling might be a relevant topic for another thread, but I don't have the energy for that discussion right now.

Of course, the concrete speedup factor depends heavily on how long it takes to resolve IDs. You state response times of 10, 30 and 60 minutes without the patch, and a factor 60 speedup. As I understand it, the real difference is whether ~1000*#shards IDs are resolved or only 1000. With 50 shards (or 50.000 ID lookups) per machine, that puts your worst-case resolve rate (60 min total) at 50.000 IDs / (60 min * 60 s/min) ~= 14 IDs/s and the best case (10 min total) at ~83 IDs/s per machine (guessing spinning drives here).

With faster ID resolution, the benefit of your patch might be too small for top-1000 to be really interesting, as ID resolution would take up a smaller share of the overall processing time. But it would make it possible to scale that number up (top-10000 or above).

- Toke Eskildsen, State and University Library, Denmark
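To make the back-of-envelope numbers above concrete, here is a minimal Python sketch under stated assumptions: a synthetic toy cluster in which each shard returns its local top-k hits as (score, shard, doc) tuples, and resolving a hit to its external ID is the expensive step. All function names are hypothetical, and this is neither Solr's code path nor the SOLR-5798 patch. It only illustrates resolving ~k*#shards IDs versus resolving just the merged global top-k, plus the lookups-per-second arithmetic.

```python
import heapq
import random

# Hypothetical model (illustration only): each shard returns its local
# top-k as (score, shard_no, internal_doc_no) tuples. Turning a hit into
# its external ID (a stored-field lookup) is assumed to be the costly step.

def make_shard_results(num_shards, k, seed=42):
    """Synthetic per-shard top-k lists with random scores."""
    rng = random.Random(seed)
    results = []
    for shard in range(num_shards):
        hits = [(rng.random(), shard, doc) for doc in range(k)]
        hits.sort(reverse=True)  # local top-k, best first
        results.append(hits)
    return results

def lookups_resolve_all(shard_results):
    """Baseline: every shard resolves IDs for all k of its hits."""
    return sum(len(hits) for hits in shard_results)

def global_top_k(shard_results, k):
    """Merge per-shard lists by score; only these hits need an ID lookup."""
    return heapq.nlargest(k, (hit for hits in shard_results for hit in hits))

def ids_per_second(lookups, minutes):
    """Back-of-envelope lookup rate for a given wall-clock time."""
    return lookups / (minutes * 60)

if __name__ == "__main__":
    k, num_shards = 1000, 50
    results = make_shard_results(num_shards, k)
    print("lookups, resolve all hits:", lookups_resolve_all(results))
    print("lookups, global top-k only:", len(global_top_k(results, k)))
    # 50 shards * 1000 IDs per machine, as in the text above
    per_machine = 50 * 1000
    print(f"60 min: {ids_per_second(per_machine, 60):.1f} IDs/s")
    print(f"10 min: {ids_per_second(per_machine, 10):.1f} IDs/s")
```

With these toy parameters the baseline resolves 50,000 IDs while the merged approach resolves 1,000, and the rates come out at about 13.9 and 83.3 IDs/s, matching the estimate above. heapq.nlargest is used for brevity; a streaming k-way merge of the already sorted shard lists would also work.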