Really helpful answer! When you say 'invoke warmers', do you mean simply setting index.warmer.enabled = true? Also, in terms of ordering, should warmers be enabled before or after an explicit optimize + refresh, in a scenario where we need the index 100% ready for search before continuing?
e.g.:

1) adminClient.indices().prepareOptimize(index).setMaxNumSegments(1).setForce(true).execute().actionGet();
2) adminClient.indices().prepareRefresh(index).execute().actionGet(); // Need to do this explicitly so we can wait for it to finish before proceeding.
3) set refresh_interval = 1, index.warmer.enabled = true

On Thursday, 17 July 2014 17:35:54 UTC+1, Jörg Prante wrote:
>
> The 30m docs may have characteristics (volume, term freqs, mappings) so ES
> limits are reached within your specific configuration. This is hard to
> guess without knowing more facts.
>
> Besides improving merge configuration, you might be able to sacrifice
> indexing time by assigning limited daily indexing time windows to your
> clients.
>
> The indexing process can then be divided into steps:
>
> - connect to cluster
> - create index with n shards and replica level 0
> - create mappings
> - disable refresh rate
> - start bulk index
> - stop bulk index
> - optimize to segment num 1
> - enable refresh rate
> - add replica levels in order to handle maximum search workload
> - invoke warmers
> - disconnect from cluster
>
> After the clients have completed indexing, you have a fully optimized
> cluster, on which you can put full search load with aggregations etc. with
> the highest performance, but while searching you should keep the indexing
> silent (or set it even to read only).
>
> You do not need to scale vertically by adding hardware to the existing
> servers. Scaling horizontally by adding nodes on more servers for the
> replicas is the method ES was designed for. Adding nodes will drastically
> improve the search capabilities with regard to facets/aggregations.
>
> Jörg
>
> On Thu, Jul 17, 2014 at 5:56 PM, jnortey <jeremy...@gmail.com> wrote:
>
>> At the moment, we're able to bulk index data at a rate faster than we
>> actually need. Indexing is not as important to us as being able to quickly
>> search for data. Once we start reaching ~30 million documents indexed, we
>> start to see performance decreasing in our search queries. What are the
>> best techniques for sacrificing indexing time in order to improve search
>> performance?
>>
>> A bit more info:
>>
>> - We have the resources to improve our hardware (memory, CPU, etc) but
>> we'd like to maximize the improvements that can be made programmatically or
>> using properties before going for hardware increases.
>>
>> - Our searches make very heavy use of faceting and aggregations.
>>
>> - When we run the optimize query, we see *significant* improvements in
>> our search times (between 50% and 80% improvements), but as documented,
>> this is usually a pretty expensive operation. Is there a way to sacrifice
>> indexing time in order to have Elasticsearch index the data more
>> efficiently? (I guess sort of mimicking the optimization behavior at index
>> time)

-- 
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a6d345be-c408-4d7c-a794-5ade13826048%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.