Re: Trading index performance for search performance

2014-11-10 Thread Ben George
Really helpful answer! When you say 'invoke warmers', do you mean simply
setting index.warmer.enabled = true? Also, in terms of ordering, should
warmers be enabled before or after an explicit optimize + refresh in a
scenario where we need the index 100% ready for search before continuing?

e.g.:
1)
adminClient.indices().prepareOptimize(index).setMaxNumSegments(1).setForce(true).execute().actionGet();
2) adminClient.indices().prepareRefresh(index).execute().actionGet(); //
Need to do this explicitly so we can wait for it to finish before
proceeding.
3) set refresh_interval = 1s and index.warmer.enabled = true


  






Re: Trading index performance for search performance

2014-11-10 Thread joergpra...@gmail.com
Yes, I mean index.warmer.enabled = true. This is the global switch for
enabling/disabling warmers.

If you have configured warmers at index creation time - see the description
at

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-warmers.html

- and warmers are enabled by the global switch, then you should disable
warmers before bulk indexing and re-enable them after bulk indexing. This
is described in the documentation as well: it can be handy when doing
initial bulk indexing to disable pre-registered warmers, to make indexing
faster and less expensive, and then to enable them again afterwards.

You should re-enable warmers after index optimization and after index
refresh.
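
For example, with the 1.x Java client that ordering could look roughly like
this (an untested sketch; "myindex", the warmer name, the field and the
aggregation are just placeholders):

// assumes the usual imports: QueryBuilders, AggregationBuilders, ImmutableSettings
// register a warmer up front (this can also be done through the warmer API
// described at the link above)
client.admin().indices().preparePutWarmer("agg_warmer")
    .setSearchRequest(client.prepareSearch("myindex")
        .setQuery(QueryBuilders.matchAllQuery())
        .addAggregation(AggregationBuilders.terms("by_field").field("field")))
    .execute().actionGet();

// before bulk indexing: switch warmers off for the index
client.admin().indices().prepareUpdateSettings("myindex")
    .setSettings(ImmutableSettings.settingsBuilder()
        .put("index.warmer.enabled", false).build())
    .execute().actionGet();

// ... bulk indexing ...

// after bulk indexing: optimize, refresh, then re-enable warmers
client.admin().indices().prepareOptimize("myindex").setMaxNumSegments(1).setForce(true).execute().actionGet();
client.admin().indices().prepareRefresh("myindex").execute().actionGet();
client.admin().indices().prepareUpdateSettings("myindex")
    .setSettings(ImmutableSettings.settingsBuilder()
        .put("index.warmer.enabled", true).build())
    .execute().actionGet();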

Jörg







Trading index performance for search performance

2014-07-17 Thread jnortey
At the moment, we're able to bulk index data at a rate faster than we
actually need. Indexing is not as important to us as being able to quickly
search for data. Once we start reaching ~30 million documents indexed, we
start to see decreasing performance in our search queries. What are the
best techniques for sacrificing indexing time in order to improve search
performance?


A bit more info:

- We have the resources to improve our hardware (memory, CPU, etc) but we'd 
like to maximize the improvements that can be made programmatically or 
using properties before going for hardware increases.

- Our searches make very heavy use of faceting and aggregations.

- When we run the optimize call, we see *significant* improvements in our
search times (between 50% and 80%), but as documented, this is usually a
pretty expensive operation. Is there a way to sacrifice indexing time in
order to have Elasticsearch index the data more efficiently? (I guess sort
of mimicking the optimization behavior at index time.)



Re: Trading index performance for search performance

2014-07-17 Thread Nikolas Everett
It might be useful to fiddle with the merge configuration
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-merge.html
to try to end up with fewer segments.  That'll reduce the IO cost of the
underlying Lucene operations that filter your query before the
aggregations.  One win is to make sure you aren't oversubscribing.  So if
you are going for maximum speed, have one shard per server, maybe even one
fewer.  If you are going for maximum throughput (like, total queries) then
have fewer total copies of the data than you have servers.  So if you have
5 shards with 2 replicas, you'd need at least 15 servers.


Cutting that number down really helped my throughput, but it might have
been because my workload is different.
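
On the merge side, a settings update along these lines is roughly the knob
to turn (an untested sketch; the index name and values are placeholders,
not recommendations - see the merge module docs above for what each setting
does):

// lower segments_per_tier / max_merge_at_once makes Lucene keep fewer
// segments per shard, at the cost of more merge I/O while indexing
client.admin().indices().prepareUpdateSettings("myindex")
    .setSettings(ImmutableSettings.settingsBuilder()
        .put("index.merge.policy.segments_per_tier", 5)
        .put("index.merge.policy.max_merge_at_once", 5)
        .put("index.merge.policy.max_merged_segment", "10gb")
        .build())
    .execute().actionGet();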

Nik







Re: Trading index performance for search performance

2014-07-17 Thread joergpra...@gmail.com
The 30m docs may have characteristics (volume, term frequencies, mappings)
such that ES limits are reached within your specific configuration. This is
hard to guess without knowing more facts.

Besides improving the merge configuration, you might be able to sacrifice
indexing time by assigning limited daily indexing time windows to your
clients.

The indexing process can then be divided into these steps (a rough Java
sketch follows below):

- connect to cluster
- create index with n shards and replica level 0
- create mappings
- disable refresh (refresh_interval = -1)
- start bulk indexing
- stop bulk indexing
- optimize to 1 segment
- re-enable refresh
- raise the replica level to handle the maximum search workload
- invoke warmers
- disconnect from cluster

After the clients have completed indexing, you have a fully optimized
cluster on which you can put full search load with aggregations etc. at the
highest performance, but while searching you should keep indexing silent
(or even set the index to read-only).
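
Putting the steps together, a rough (untested) sketch with the 1.x Java
transport client could look like this; cluster name, index name, type,
mapping and the single example document are all placeholders:

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class BulkLoad {
    public static void main(String[] args) {
        // connect to cluster
        TransportClient client = new TransportClient(ImmutableSettings.settingsBuilder()
                .put("cluster.name", "mycluster").build())
                .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));

        // create index with n shards, replica level 0, and the mappings
        client.admin().indices().prepareCreate("myindex")
                .setSettings(ImmutableSettings.settingsBuilder()
                        .put("index.number_of_shards", 5)
                        .put("index.number_of_replicas", 0)
                        .build())
                .addMapping("mytype",
                        "{\"mytype\":{\"properties\":{\"field\":{\"type\":\"string\"}}}}")
                .execute().actionGet();

        // disable refresh for the duration of the bulk load
        client.admin().indices().prepareUpdateSettings("myindex")
                .setSettings(ImmutableSettings.settingsBuilder()
                        .put("index.refresh_interval", "-1").build())
                .execute().actionGet();

        // bulk index (one tiny bulk here; real code would batch many docs)
        BulkRequestBuilder bulk = client.prepareBulk();
        bulk.add(client.prepareIndex("myindex", "mytype").setSource("{\"field\":\"value\"}"));
        bulk.execute().actionGet();

        // optimize to one segment, then re-enable refresh and add replicas
        client.admin().indices().prepareOptimize("myindex").setMaxNumSegments(1)
                .execute().actionGet();
        client.admin().indices().prepareUpdateSettings("myindex")
                .setSettings(ImmutableSettings.settingsBuilder()
                        .put("index.refresh_interval", "1s")
                        .put("index.number_of_replicas", 1)
                        .build())
                .execute().actionGet();

        // warmers (registered via the warmer API) run again once refresh resumes;
        // disconnect from cluster
        client.close();
    }
}

Raising number_of_replicas only after the optimize means the replica shards
are recovered from the already merged segments instead of doing the merge
work twice.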

You do not need to scale vertically by adding hardware to the existing
servers. Scaling horizontally by adding nodes on more servers for the
replicas is the method ES was designed for. Adding nodes will drastically
improve the search capabilities with regard to facets/aggregations.

Jörg







Re: Trading index performance for search performance

2014-07-17 Thread jnortey
Thanks to both of you for the advice. Unfortunately, setting daily indexing
windows isn't an option for us; however, I think I have a good idea of what
we should try next.


