We're about to stand up an elasticsearch cluster and we're facing the task 
of determining the correct number of shards to allocate for our single 
index.  We have 30 servers (16cpus, 48gb memory), each of which hosting one 
node.  The concern is only with query performance.  Indexing performance is 
not important.

In our research on elasticsearch, it seems that one of the biggest for 
query performance is being able to fit the entire index in memory.   
However, we've also seen where having less shards / more replicas will also 
increase query performance.

I see two paths, both assuming a single shard per node:

OPTION 1:  allocate 30 primary shards -  the shards of the index will be 
small enough to fit in OS cache on each node, but every query will have to 
hit 30 shards.  ( there's no great way to utilize "routing" here.   Assume 
each query will hit all shards.)

   OR

OPTION 2:  allocate 10 primary shards, with 2 replicas -  the shards of the 
index will NOT be small enough to fit in OS cache on each node, but a query 
will only have to hit 10 of the shards, distributed among the 30 nodes.


So, generally speaking, which is more important to query performance?  
Being able to fit the entire index in OS cache?   Or less shards / more 
replicas?

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/325f920a-8676-44c4-b272-05381c3078ff%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to