Re: Scaling strategies without shard splitting

joergpra...@gmail.com Fri, 17 Oct 2014 14:59:00 -0700

In my use case, I have indexed a union catalog for a few hundred libraries,
where each library can have its own search service and can add its own
catalog data that it does not want to share.


Elasticsearch offers far more flexibility and performance than Solr: the
cluster can be extended automatically by adding nodes (without configuration
changes), shards are rebalanced automatically, and there are index aliases
and shard over-allocation. Over-allocation is explained here:
http://elasticsearch-users.115913.n3.nabble.com/Over-allocation-of-shards-td3673978.html
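
To make the over-allocation idea concrete, here is a rough Python sketch of the arithmetic. It assumes a hypothetical index whose number of primary shards (`primaries`) is fixed at creation time, as it was in the Elasticsearch versions discussed in this thread, and a balancer that spreads shard copies as evenly as possible (real allocation also depends on disk watermarks and allocation settings). The helper name `shard_layout` is made up for illustration. Creating more shards than you have nodes today leaves room to add nodes later without a re-index, until every node holds a single copy.

```python
import math


def shard_layout(primaries: int, replicas: int, nodes: int) -> dict:
    """Hypothetical helper: estimate how shard copies of one index spread
    over a cluster, assuming an even distribution by the balancer."""
    if nodes < 1:
        raise ValueError("need at least one node")
    total_copies = primaries * (1 + replicas)
    return {
        "total_copies": total_copies,
        # Under an even spread, no node holds more than this many copies.
        "max_copies_per_node": math.ceil(total_copies / nodes),
        # Beyond this many nodes, some nodes hold no copy of this index,
        # so adding nodes no longer spreads its load further.
        "max_useful_nodes": total_copies,
        # A replica is never allocated on the same node as its primary,
        # so all replicas need at least this many nodes to be assigned.
        "min_nodes_for_replicas": 1 + replicas,
    }
```

For example, `shard_layout(10, 1, 4)` describes 10 primaries with one replica each on 4 nodes: 20 copies, at most 5 per node, and room to grow to 20 nodes before the index stops benefiting from new machines.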

With index aliases, I do not have to do evil things like shard splitting:
no index copy is required, and no full re-index.

That is, I can distribute a library catalog index across the machines and
address an "index view" for each library by assigning index aliases (e.g.
collection names or library identifiers), with term filters, to the parts
of the catalog each library is interested in. Index updates come from a
single primary database plus data packages that the libraries can upload.
If the volume of input data exceeds capacity, I can simply start a new
node without touching the configuration.
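
A minimal Python sketch of such per-library "index views", assuming a shared catalog index in which every document carries a library identifier field. The index name, the field name `library_id`, the alias names and the helper `library_alias_actions` are all hypothetical. The helper only builds the JSON body for Elasticsearch's `POST /_aliases` endpoint: one `add` action per library, each with a `term` filter, so that searches through an alias see only that library's documents.

```python
import json


def library_alias_actions(index: str, libraries: dict,
                          field: str = "library_id") -> dict:
    """Hypothetical helper: build a POST /_aliases body that maps each
    alias name in `libraries` to a filtered view of one shared index."""
    actions = []
    for alias, library_id in sorted(libraries.items()):
        actions.append({
            "add": {
                "index": index,
                "alias": alias,
                # Searches through this alias only match documents whose
                # `field` equals this library's identifier.
                "filter": {"term": {field: library_id}},
            }
        })
    return {"actions": actions}


body = library_alias_actions("catalog", {"lib_a": "A", "lib_b": "B"})
print(json.dumps(body, indent=2))
```

The printed body could then be sent to the cluster's `_aliases` endpoint with any HTTP client. Because an alias is only metadata, adding a library or changing its filter does not copy or re-index any documents.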

Also, releasing new index versions is a snap with Elasticsearch. The index
names carry timestamp information (e.g. ddMMyyHH), and it is easy to
organize index versions as rolling windows, with the latest index being
the current one to search. Old indices are dropped once they are no longer
needed.
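
A rough Python sketch of such a rolling window, assuming hypothetical index names of the form `<prefix>ddMMyyHH` (e.g. `catalog_17101414`) and a search alias that should always point at the newest one. The helper `plan_rollover` and its parameters are made up for illustration. It returns a `POST /_aliases` body that moves the alias from the current index to the newest one (remove and add actions in a single `_aliases` request are applied atomically), plus the indices outside the window, which could then be deleted with `DELETE /<index>`. Note that a day-first stamp like ddMMyyHH does not sort chronologically as a string, so the sketch parses it first.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def plan_rollover(indices: List[str], prefix: str, alias: str,
                  current: Optional[str], keep: int) -> Tuple[dict, List[str]]:
    """Hypothetical helper: point `alias` at the newest timestamped index
    and list the indices that fall outside a window of `keep` versions."""
    if keep < 1:
        raise ValueError("keep must be at least 1")

    def stamp(name: str) -> datetime:
        # Day-first names are not in time order as strings; parse them.
        return datetime.strptime(name[len(prefix):], "%d%m%y%H")

    # Newest first; names without the prefix are ignored.
    ordered = sorted((n for n in indices if n.startswith(prefix)),
                     key=stamp, reverse=True)
    if not ordered:
        raise ValueError("no indices match the prefix")
    newest = ordered[0]

    actions = []
    if current != newest:
        if current is not None:
            actions.append({"remove": {"index": current, "alias": alias}})
        actions.append({"add": {"index": newest, "alias": alias}})
    # Everything after the first `keep` indices is outside the window.
    return {"actions": actions}, ordered[keep:]


swap, to_delete = plan_rollover(
    ["catalog_30091423", "catalog_01101408", "catalog_17101414"],
    prefix="catalog_", alias="catalog", current="catalog_01101408", keep=2)
```

Here a plain string sort would wrongly rank `catalog_30091423` (30 Sep 2014) as newest; parsing the stamp selects `catalog_17101414`, keeps the previous version as a fallback, and marks `catalog_30091423` for deletion.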

Jörg


On Mon, Oct 13, 2014 at 8:12 PM, Ian Rose <ianr...@fullstory.com> wrote:

> Hi -
>
> My team has used Solr in its single-node configuration (without
> SolrCloud) for a few years now.  In our current product we are now looking
> at transitioning to SolrCloud, but before we make that leap I also wanted
> to take a good look at whether Elasticsearch would be a better fit for
> our needs.  Although ES has some nice advantages (such as automatic shard
> rebalancing), I'm trying to figure out how to live in a world without shard
> splitting.  In brief, our situation is as follows:
>
>  - We use one index ("collection" in Solr) per customer.
>  - The indexes are going to vary quite a bit in size, following something
> like a power-law distribution with many small indexes (let's guess < 250k
> documents), some medium-sized indexes (up to a few million documents) and a
> few large indexes (hundreds of millions of documents).
>  - So the number of shards required per index will vary greatly, and will
> be hard to predict accurately at creation time.
>
> How do people generally approach this kind of problem?  Do you just make a
> best guess at the appropriate number of shards for each new index and then
> do a full re-index (with more shards) if the number of documents grows
> bigger than expected?
>
> Thanks!
> - Ian
>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/ded96e32-e1f1-4d09-8356-7367c86b1166%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/ded96e32-e1f1-4d09-8356-7367c86b1166%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
