As I found out yesterday, the problem with shard splitting in ES is that the algorithms used to distribute documents across shards during indexing are based on a pre-determined hash. So if you suddenly alter the hash you may end up with shards that are overloaded compared to others.
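A rough sketch of why this matters, assuming the documented routing rule shard = hash(routing) % number_of_primary_shards. Here zlib.crc32 is only a stand-in for whatever hash ES really uses, and shard_for / moved_fraction are made-up helper names for illustration:

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    # Stand-in hash: crc32 is an assumption for illustration only,
    # not the hash function ES uses internally.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    # Fraction of ids whose computed shard changes when the shard count changes.
    moved = sum(
        1 for d in doc_ids
        if shard_for(d, old_shards) != shard_for(d, new_shards)
    )
    return moved / len(doc_ids)


doc_ids = [f"doc-{i}" for i in range(10_000)]

# Going from 5 to 6 shards sends most ids to a different shard number.
print(moved_fraction(doc_ids, 5, 6))

# Doubling from 5 to 10 leaves roughly half of the ids where they were.
print(moved_fraction(doc_ids, 5, 10))
```

The point is only that the shard number is a function of the shard count, so changing the count on a live index means documents no longer route to the shard that actually holds them, unless the data is moved or reindexed to match.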
Maybe a dev can confirm/clarify this, but that was the understanding I took away for not doing shard splitting within ES.

On 12 December 2014 at 02:25, Kevin Burton <burtona...@gmail.com> wrote:
>
> It seems to me that most people arguing this have trivial scalability
> requirements. Not trying to be rude by saying that btw. But shard
> splitting is really the only way to scale from 250GB indexed to 500TB
> indexed.
>
> On Thursday, December 11, 2014 4:58:42 PM UTC-8, Andrew Selden wrote:
>>
>> I would agree that shard splitting is not the best approach. Much better
>> to design for expansion by building in layers of indirection into your
>> application through the techniques of over-sharding, index aliasing, and
>> multiple indices.
>>
>
> Yes.. all those are lame attempts at shard splitting.
>
> Over-sharding is wasteful. It might not have a significant performance
> impact in practice if you only have a few shards, but if you only add a few
> you're not going to be able to increase your capacity.
>
> Using multiple indexes is just a way to cheat by adding more shards in a
> roundabout fashion; your runtime query performance will suffer because of
> this.
>
>>
>> First, you can allocate more shards than you need when you create the
>> index. If you need 5 shards today, but think you might need 10 shards in 6
>> months, then just create the index with 10 shards. We call this
>> over-sharding. There really is no penalty to doing this within reason.
>>
>
> So you've only given yourself a 2x overhead in capacity. That's not very
> elastic.
>
> With shard splitting you can go from 2x to 10x to 100x without any wasted
> IO in over-indexing.
>
>
>> Searching against 1 index with 50 shards is exactly the same as searching
>> against 50 indices with one shard.
>>
>
> No it's not.. if the shards are on the same box you're paying a
> performance cost there.. If the indexes are small and fit in memory you
> won't feel it that much.
>
>
>> Second, as others have mentioned, use multiple indices and hide them away
>> behind an alias.
>>
>
> If each index has say 20 shards, and you have 10 indexes, then you have
> 200 shards to run your query against. This means queries that use all
> these indexes will get slower and slower.
>
> The ideal situation is to shard split so that when you need more shards,
> you just split.
>
> If ES had this feature today, no one would be arguing against shard
> splitting. It would just be common practice. The only issue is that ES
> hasn't implemented it yet so it's not a viable solution.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/c35d0b14-46a0-4baf-b06e-b5bb3ff43e5f%40googlegroups.com
>
> For more options, visit https://groups.google.com/d/optout.