As I found out yesterday, the problem with shard splitting in ES is that
there are algorithms used to distribute data across shards during indexing
that are based on a pre-determined hash. So if you suddenly alter the hash,
you may end up with shards that are overloaded compared to others.

Maybe a dev can confirm/clarify this, but that was the understanding I took
away of why shard splitting isn't done within ES.
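
To make the hash concern concrete, here is a minimal Python sketch. It assumes
documents are placed by hashing the document ID and taking the result modulo
the number of shards. The names shard_for and moved_fraction are hypothetical,
and md5 is only a stand-in hash for illustration, not a claim about the
function ES uses internally.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard as hash(doc_id) mod num_shards.

    Assumption: md5 is a stand-in hash chosen for this sketch only.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard changes when the shard count changes."""
    moved = sum(
        1 for d in doc_ids
        if shard_for(d, old_shards) != shard_for(d, new_shards)
    )
    return moved / len(doc_ids)


# Hypothetical document IDs, used only to exercise the functions.
doc_ids = [f"doc-{i}" for i in range(10_000)]

print("5 -> 6 shards, fraction moved: ", moved_fraction(doc_ids, 5, 6))
print("5 -> 10 shards, fraction moved:", moved_fraction(doc_ids, 5, 10))
```

Under that assumption, going from 5 to 6 shards should relocate roughly five
documents in six, while doubling from 5 to 10 relocates about half, and each
of those either stays on shard s or moves to shard s + 5. Either way, the
existing data no longer sits where the new hash placement expects it.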

On 12 December 2014 at 02:25, Kevin Burton <burtona...@gmail.com> wrote:

>
> It seems to me that most people arguing this have trivial scalability
> requirements.  Not trying to be rude by saying that btw.  But shard
> splitting is really the only way to scale from 250GB indexed to 500TB
> indexed.
>
> On Thursday, December 11, 2014 4:58:42 PM UTC-8, Andrew Selden wrote:
>>
>> I would agree that shard splitting is not the best approach. Much better
>> to design for expansion by building in layers of indirection into your
>> application through the techniques of over-sharding, index aliasing, and
>> multiple indices.
>>
>
> Yes.. all those are lame attempts at shard splitting.
>
> Over-sharding is wasteful. It might not have a significant performance
> impact in practice if you only have a few shards, but if you only add a few
> you're not going to be able to increase your capacity.
>
> Using multiple indexes is just a way to cheat by adding more shards in a
> roundabout fashion, and your runtime query performance will suffer because
> of this.
>
>>
>> First, you can allocate more shards than you need when you create the
>> index. If you need 5 shards today, but think you might need 10 shards in 6
>> months, then just create the index with 10 shards. We call this
>> over-sharding. There really is no penalty to doing this within reason.
>>
>
> So you've only given yourself a 2x overhead in capacity.  That's not very
> elastic.
>
> With shard splitting you can go from 2x to 10x to 100x without any wasted
> IO in over-indexing.
>
>
>> Searching against 1 index with 50 shards is exactly the same as searching
>> against 50 indices with one shard.
>>
>
> No it's not.. if the shards are on the same box you're paying a
> performance cost there.. If the indexes are small and fit in memory you
> won't feel it that much.
>
>
>> Second, as others have mentioned, use multiple indices and hide them away
>> behind an alias.
>>
>
> If each index has say 20 shards, and you have 10 indexes, then you have
> 200 shards to run your query against.  This means queries that use all
> these indexes will get slower and slower.
>
> The ideal situation is to shard split so that when you need more shards,
> you just split.
>
> If ES had this feature today, no one would be arguing against shard
> splitting. It would just be common practice.  The only issue is that ES
> hasn't implemented it yet so it's not a viable solution.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/c35d0b14-46a0-4baf-b06e-b5bb3ff43e5f%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/c35d0b14-46a0-4baf-b06e-b5bb3ff43e5f%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

