Your 30m docs may have characteristics (volume, term frequencies, mappings)
that push ES against its limits within your specific configuration. This is
hard to guess without knowing more facts.

Besides improving the merge configuration, you might be able to sacrifice
indexing time by assigning limited daily indexing time windows to your
clients.

The indexing process can then be divided into steps (a sketch in Python
follows the list):

- connect to the cluster
- create the index with n shards and replica level 0
- create the mappings
- disable refresh (set the refresh interval to -1)
- start bulk indexing
- stop bulk indexing
- optimize (merge) the index down to a single segment
- re-enable refresh
- add replicas to handle the maximum search workload
- register warmers
- disconnect from the cluster
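
A minimal sketch of those steps with the Python elasticsearch client
(the index name "docs", the type/mapping, the shard and replica counts,
and the my_documents() generator are placeholders for illustration, not
anything from your setup):

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()                            # connect (localhost:9200)

es.indices.create(index="docs", body={
    "settings": {"number_of_shards": 5,         # n shards,
                 "number_of_replicas": 0},      # replica level 0
    "mappings": {"doc": {"properties": {
        "title": {"type": "string"}}}}          # your mappings here
})

es.indices.put_settings(index="docs",
    body={"refresh_interval": "-1"})            # disable refresh

actions = ({"_index": "docs", "_type": "doc", "_source": d}
           for d in my_documents())             # placeholder generator
helpers.bulk(es, actions)                       # bulk index

es.indices.optimize(index="docs",
    max_num_segments=1)                         # merge down to 1 segment

es.indices.put_settings(index="docs",
    body={"refresh_interval": "1s"})            # re-enable refresh
es.indices.put_settings(index="docs",
    body={"number_of_replicas": 2})             # replicas for search load

es.indices.put_warmer(index="docs", name="facet_warmer",
    body={"query": {"match_all": {}},           # warm the aggregations
          "aggs": {"titles": {"terms": {"field": "title"}}}})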

After the clients have completed indexing, you have a fully optimized
cluster on which you can put the full search load, aggregations and all, at
the highest performance. While searching, though, you should keep indexing
silent (or even set the indices to read-only).
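
For example, the read-only phase could look like this (same assumed
Python client and placeholder index name as above):

es.indices.put_settings(index="docs",
    body={"index.blocks.read_only": True})   # reject writes while searching
# ... serve the search workload ...
es.indices.put_settings(index="docs",
    body={"index.blocks.read_only": False})  # lift the block for the next window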

You do not need to scale vertically by adding hardware to the existing
servers. Scaling horizontally, by adding nodes on more servers for the
replicas, is the method ES was designed for. Adding nodes will drastically
improve search capabilities with regard to facets/aggregations.

Jörg


On Thu, Jul 17, 2014 at 5:56 PM, jnortey <jeremy.nor...@gmail.com> wrote:

> At the moment, we're able to bulk index data at a rate faster than we
> actually need. Indexing is not as important to us as being able to quickly
> search for data. Once we start reaching ~30 million documents indexed, we
> start to see performance decreasing in our search queries. What are the
> best techniques for sacrificing indexing time in order to improve search
> performance?
>
>
> A bit more info:
>
> - We have the resources to improve our hardware (memory, CPU, etc) but
> we'd like to maximize the improvements that can be made programmatically or
> using properties before going for hardware increases.
>
> - Our searches make very heavy use of faceting and aggregations.
>
> - When we run the optimize query, we see *significant* improvements in
> our search times (between 50% and 80% improvements), but as documented,
> this is usually a pretty expensive operation. Is there a way to sacrifice
> indexing time in order to have Elasticsearch index the data more
> efficiently? (I guess sort of mimicking the optimization behavior at index
> time)
>
