Per Steffensen [st...@liace.dk] wrote:
> Yes, something third. The problem is that the "main query"
> (STAGE_EXECUTE_QUERY) asks each shard for id and score
> of the top 1000 documents. When you have 1000 shards that is
> asking for a total of 1 mio id/score's across 20 machines
> - 50.000 for each machine. The problem is that fetching 50.000
> id's from store is slow.
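
To make sure I read the numbers in the quote correctly, here is a minimal Python sketch of the arithmetic and of the merge step. It is not Solr code: `ids_requested` and `merge_top` are hypothetical names chosen for illustration, and the toy scores are invented. The second part shows why each shard has to be asked for the full top `rows`: all of the final hits may come from one shard.

```python
import heapq


def ids_requested(rows, shards, machines):
    """Return (total id/score pairs requested, pairs per machine)."""
    total = rows * shards
    return total, total // machines


def merge_top(shard_results, rows):
    """Merge per-shard (score, doc_id) lists into the global top `rows`."""
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(rows, all_hits)


# Numbers from the quote: top 1000 from each of 1000 shards on 20 machines.
total, per_machine = ids_requested(rows=1000, shards=1000, machines=20)
print(total, per_machine)  # 1000000 50000

# Toy case (invented data): three shards, top 2 wanted,
# and both winners happen to live in the first shard.
toy_shards = [
    [(0.9, "a1"), (0.8, "a2")],
    [(0.3, "b1"), (0.2, "b2")],
    [(0.1, "c1"), (0.05, "c2")],
]
print(merge_top(toy_shards, rows=2))  # [(0.9, 'a1'), (0.8, 'a2')]
```
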

I understand that the request is for rows * #shards IDs+score in total, but if you have presented your alternative, I have failed to see it.

Your third factoid: A high number of hits/shard suggests that all of the final top-1000 hits could originate from a single shard.

Toke: Why 50 shards/machine?
Per: 1 collection/month, with duplicates

I was about to suggest collapsing to 2 or 3 months/shard, but that would ruin a logistically nice setup. I understand why you want the many shards.

> So we have 24 collection, with 2 shards on each of 20+ Solr-servers
> = 960+ shards total. Carrying somewhere between 100 and 1000
> billion documents total.

5-50 billion records/server? That seems very high, but after hearing about many different Solr setups at Lucene/Solr Revolution, I try to adopt a "sounds insane, but it's probably correct" mindset.

Anyway: setup accepted, problem acknowledged, your possibly re-usable solution not understood.

- Toke Eskildsen