Thanks Jorg, you've put it a lot better than I did! :)

On 12 December 2014 at 10:38, joergpra...@gmail.com <joergpra...@gmail.com>
wrote:

> The reason is the shard addressing. While it is possible to imagine a
> distributed system that queries all nodes and asks "please give me the
> next free allocation for this doc", this is very slow. It would allow shard
> splitting by simply adding shards and iterating over nodes, but this does
> not scale - the time for indexing and search grows linearly
> (if not quadratically) with the number of nodes, and Elasticsearch
> scalability would be severely broken.
>
> When a new doc arrives, a hash is computed to address the shard where it
> should reside in O(1) time. This is fast and scales with the number of
> nodes. The price to pay is a fixed number of shards, set once, as the
> divisor in the formula that distributes docs by doc ID.
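The O(1) routing described here can be sketched as follows (illustrative Python only; md5 stands in for the murmur3 hash Elasticsearch actually uses, and the shard count is a made-up example):

```python
# Minimal sketch of hash-based shard routing: O(1), no node round-trips.
# md5 is just a stand-in here; Elasticsearch hashes the routing value
# with murmur3, but the modulo-by-shard-count idea is the same.
import hashlib

NUM_SHARDS = 5  # fixed at index creation time

def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a doc ID to a shard number via hash modulo shard count."""
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Every node computes the same shard for the same doc ID, independently,
# so no cluster-wide coordination is needed per document.
print(shard_for("doc-42"))  # deterministic, always in range 0..4
```

Because the result depends only on the doc ID and the fixed divisor, any node can route any document without asking the others.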
>
> If the hash computation were changed during cluster life to add shards,
> everything would be messed up - ES would no longer find docs. To remedy
> this, you have to reindex, which recomputes the hashes for the higher
> number of shards.
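A small sketch makes the breakage concrete (again illustrative, with md5 in place of murmur3): if the divisor changes from 5 to 6, the routing formula sends most existing docs to a different shard than the one they were indexed on, so lookups miss.

```python
# Sketch: changing the shard count changes where most docs are routed,
# so existing docs can no longer be found without a full reindex.
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

doc_ids = [f"doc-{i}" for i in range(1000)]
moved = sum(1 for d in doc_ids if shard_for(d, 5) != shard_for(d, 6))
print(f"{moved} of 1000 docs would be looked up on the wrong shard")
# roughly 5/6 of the docs land on a different shard after the change
```

This is why reindexing (recomputing every hash under the new divisor) is the only safe remedy.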
>
> There are systems that allow shard growth in a limited way. By using a
> hash ring algorithm, you could address e.g. only every third shard and
> leave the next two slots in the hash ring free, maybe for adding shards
> later; see e.g. Project Voldemort
> http://www.project-voldemort.com/voldemort/design.html
>
> But this would mean two things: there would be a limited number of shard
> splits (only a fixed number of shards can be reserved), and it would be up
> to the user to manage the very sensitive decision of when to activate the
> reserved shards and to instruct ES to use a more complex algorithm to look
> up docs. I can say there is no general criterion for shard overflow. Look
> at the shard allocation deciders: there are many good criteria, but in the
> end the user gets the headache. And this is because it is an anti-pattern:
> the promise is that shard splitting will solve a problem, but you get a
> new, bigger one.
>
> The Elasticsearch philosophy is that it should scale and should be easy to
> use. Having no headaches around the shard count, once it is set, is easy.
>
> Jörg
>
>
>
> On Fri, Dec 12, 2014 at 9:31 AM, Mark Walkom <markwal...@gmail.com> wrote:
>>
>> As I found out yesterday, the problem with shard splitting in ES is that
>> the algorithms used to distribute data across shards during indexing are
>> based on a predetermined hash. So if you suddenly alter the hash you may
>> end up with shards that are overloaded compared to others.
>>
>> Maybe a dev can confirm/clarify this, but that was the understanding I
>> took away as to why shard splitting isn't done within ES.
>>
>> On 12 December 2014 at 02:25, Kevin Burton <burtona...@gmail.com> wrote:
>>
>>>
>>> It seems to me that most people arguing this have trivial scalability
>>> requirements. Not trying to be rude by saying that, btw. But shard
>>> splitting is really the only way to scale from 250GB indexed to 500TB
>>> indexed.
>>>
>>> On Thursday, December 11, 2014 4:58:42 PM UTC-8, Andrew Selden wrote:
>>>>
>>>> I would agree that shard splitting is not the best approach. Much
>>>> better to design for expansion by building in layers of indirection into
>>>> your application through the techniques of over-sharding, index aliasing,
>>>> and multiple indices.
>>>>
>>>
>>> Yes.. all those are lame attempts at shard splitting.
>>>
>>> Over-sharding is wasteful. It might not have a significant performance
>>> impact in practice if you only have a few extra shards, but if you only
>>> add a few you're not going to be able to increase your capacity much.
>>>
>>> Using multiple indexes is just a way to cheat by adding more shards in a
>>> roundabout fashion; your runtime query performance will suffer because of
>>> this.
>>>
>>>>
>>>> First, you can allocate more shards than you need when you create the
>>>> index. If you need 5 shards today, but think you might need 10 shards in 6
>>>> months, then just create the index with 10 shards. We call this
>>>> over-sharding. There really is no penalty to doing this within reason.
>>>>
>>>
>>> So you've only given yourself a 2x overhead in capacity.  That's not
>>> very elastic.
>>>
>>> With shard splitting you can go from 2x to 10x to 100x without any of
>>> the wasted IO of over-sharding.
>>>
>>>
>>>> Searching against 1 index with 50 shards is exactly the same as
>>>> searching against 50 indices with one shard.
>>>>
>>>
>>> No, it's not. If the shards are on the same box you're paying a
>>> performance cost there. If the indexes are small and fit in memory you
>>> won't feel it that much.
>>>
>>>
>>>> Second, as others have mentioned, use multiple indices and hide them
>>>> away behind an alias.
>>>>
>>>
>>> If each index has say 20 shards, and you have 10 indexes, then you have
>>> 200 shards to run your query against.  This means queries that use all
>>> these indexes will get slower and slower.
>>>
>>> The ideal situation is to shard split so that when you need more shards,
>>> you just split.
>>>
>>> If ES had this feature today, no one would be arguing against shard
>>> splitting. It would just be common practice.  The only issue is that ES
>>> hasn't implemented it yet so it's not a viable solution.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/c35d0b14-46a0-4baf-b06e-b5bb3ff43e5f%40googlegroups.com
>>> <https://groups.google.com/d/msgid/elasticsearch/c35d0b14-46a0-4baf-b06e-b5bb3ff43e5f%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>

