Per Steffensen [st...@liace.dk] wrote:
> Yes, something third. The problem is that the "main query"
> (STAGE_EXECUTE_QUERY) asks each shard for the ID and score
> of its top 1000 documents. With 1000 shards, that is
> a request for a total of 1 million ID/score pairs across 20 machines,
> i.e. 50,000 for each machine. The problem is that fetching 50,000
> IDs from the store is slow.
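The request volume in the first phase is simply rows * #shards. A minimal Python sketch of that arithmetic, assuming the shards are spread evenly over the machines (the function name `ids_requested` is hypothetical, not part of Solr):

```python
def ids_requested(rows: int, shards: int, machines: int) -> tuple[int, int]:
    """Return (total ID/score pairs requested, pairs per machine) for the
    first distributed phase, assuming every shard is asked for `rows` hits
    and the shards are distributed evenly over `machines`."""
    total = rows * shards
    return total, total // machines


total, per_machine = ids_requested(rows=1000, shards=1000, machines=20)
print(total, per_machine)  # 1000000 50000
```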

I understand that the request is for rows * #shards IDs+scores in total, but if
you have presented your alternative, I have missed it. Your third
factoid, a high number of hits/shard, suggests that it is possible for all of
the final top-1000 hits to originate from a single shard.
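That possibility is why each shard must be asked for the full `rows`: if any shard returns fewer, the merged result can miss true top hits. A minimal Python sketch of a generic distributed top-k merge illustrating this (a simplification under stated assumptions, not Solr's actual implementation; `shard_top`, `merge_top` and the toy data are hypothetical):

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, document ID)


def shard_top(hits: List[Hit], rows: int) -> List[Hit]:
    # Phase one: each shard returns only its own best `rows` hits.
    return heapq.nlargest(rows, hits)


def merge_top(shard_results: Dict[str, List[Hit]], rows: int) -> List[Hit]:
    # The coordinator merges all shard lists and keeps the global best `rows`.
    all_hits = (hit for hits in shard_results.values() for hit in hits)
    return heapq.nlargest(rows, all_hits)


# Toy data: every one of the true top-3 hits lives on shard "a".
shards = {
    "a": [(10.0, "a1"), (9.0, "a2"), (8.0, "a3")],
    "b": [(3.0, "b1"), (2.0, "b2"), (1.0, "b3")],
}
rows = 3
full = merge_top({s: shard_top(h, rows) for s, h in shards.items()}, rows)
# Asking each shard for fewer than `rows` hits loses true top hits:
truncated = merge_top({s: shard_top(h, 1) for s, h in shards.items()}, rows)
print(full)       # [(10.0, 'a1'), (9.0, 'a2'), (8.0, 'a3')]
print(truncated)  # [(10.0, 'a1'), (3.0, 'b1')]
```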

Toke: Why 50 shards/machine?
Per: 1 collection/month, with duplicates

I was about to suggest collapsing to 2 or 3 months/shard, but that would
ruin a logistically nice setup. I understand why you want the many shards.

> So we have 24 collections, with 2 shards on each of 20+ Solr servers
> = 960+ shards total. Carrying somewhere between 100 and 1000
> billion documents total.

5-50 billion records/server? That seems very high, but after hearing about many
different Solr setups at Lucene/Solr Revolution, I try to adopt a "sounds
insane, but it's probably correct" mindset.
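The figures quoted above can be cross-checked: 24 collections with 2 shards per server gives 48 shards/machine (close to the 50 mentioned) and 960 shards on 20 servers, and 100 to 1000 billion documents over 20 servers gives 5 to 50 billion per server. A minimal Python sketch, assuming exactly 20 servers and an even spread (the function name `cluster_figures` is hypothetical):

```python
def cluster_figures(collections: int, shards_per_collection_per_server: int,
                    servers: int, total_docs: int) -> tuple[int, int, int]:
    """Return (shards per machine, total shards, documents per server),
    assuming shards and documents are spread evenly over `servers`."""
    shards_per_machine = collections * shards_per_collection_per_server
    total_shards = shards_per_machine * servers
    docs_per_server = total_docs // servers
    return shards_per_machine, total_shards, docs_per_server


for total_docs in (100 * 10**9, 1000 * 10**9):
    print(cluster_figures(24, 2, 20, total_docs))
# (48, 960, 5000000000)
# (48, 960, 50000000000)
```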


Anyway, setup accepted, problem acknowledged, your possibly re-usable solution 
not understood.

- Toke Eskildsen
