On Thu, Oct 9, 2014 at 6:34 PM, Kevin Burton <burtona...@gmail.com> wrote:
> On Wednesday, October 8, 2014 12:07:30 AM UTC-7, Jörg Prante wrote:
>>
>> With ES, you can go up to the bandwidth limit the OS allows for write
>> I/O (if you disable throttling etc.)
>>
>> This means, if you write to one shard, it can be as fast as writing to
>> thousands of shards in parallel in summary. There is an OS limit for file
>> system buffers, so the more shards, the more RAM is recommended.
>>
>> If the OS restricts file descriptor limits and you want to write to shards,
>> you can estimate you need a peak of 100-200 file descriptors for active
>> merges etc. (this varies from version to version and from setting to
>> setting). ES does not impose a limit here.
>>
>> These resource demands and custom configurations are not related to the
>> choice of SSD or HDD.
>
> There's no way that's true. For example, if you are on HDD, your
> index is not in memory, AND you're serving queries (which is a realistic
> use case), then there is NO way you can write at the full I/O bandwidth of the
> disk. It's just physically impossible.
>
> If ES has been able to solve that problem then they could win a Nobel
> prize :-p
>
> SSD is 2-3 orders of magnitude faster than HDD... so yes, it is related
> to the choice of SSD or HDD.
>
> ... it's entirely possible I'm misinterpreting what you're saying though.

Yeah. You _could_ disable throttling and write faster, but that'd be unwise if you need query performance to stay constant.

You asked about the load on my shards. It's not at all evenly distributed; the update and query rate varies a ton depending on the index. There are per-index overheads just for having that many indexes, but from my perspective they are pretty small. Some stuff isn't quite right, though. For example, Elasticsearch distributes the memory used to buffer writes evenly across indexes that have received writes in the past few minutes. Actually serving searches across all the shards is another story.
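For reference, the store throttling mentioned above is a dynamic cluster setting in the 1.x-era API. A hedged sketch (assumes a cluster reachable at localhost:9200; check your version's docs before relying on exact setting names):

```shell
# Disable merge I/O throttling cluster-wide (ES 1.x era setting).
# Faster bulk writes, at the cost of query latency during heavy merging.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.store.throttle.type": "none"
  }
}'

# Or keep throttling but raise the ceiling instead of removing it entirely:
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.store.throttle.type": "merge",
    "indices.store.throttle.max_bytes_per_sec": "100mb"
  }
}'
```

Being a transient setting, it resets on full cluster restart, which is handy for a one-off bulk load.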
For the most part we lay things out so that each search request hits 1/3 - 2 of our servers. We're probably more concerned about that than most people, though, because we frequently perform actions that have high-ish per-shard overhead, like the phrase suggester. Oversubscribing is OK for indexes that don't get much query traffic, but otherwise we make sure everything is spread out as evenly as possible. We do that by cranking up the *index* allocation balance factor and by adding *total_shards_per_node = 1* to our highest-traffic indexes. Cranking up the index allocation balance spreads the shards of each index out pretty evenly, but sometimes Elasticsearch will smash them together during "exciting" situations, like when a node drops out. *total_shards_per_node* is a hard limit, so even when things get exciting it'll keep those shards away from each other.

If you're wondering about write speed, it's only loosely related to shards. The two things you can do to speed up writing are to make sure that all the shards you are writing to are spread evenly across your nodes, and to crank up the number of segments that are allowed to exist. More segments lower the write amplification factor, both for I/O and for CPU usage, BUT they cost more to search. If you plan to do a ton of writes at once and then never again, you can crank up the segments during the write and then run an optimize to squash the segments back together. That works great if you never update any of the documents in the index; it's even OK to index more stuff into the index afterwards. It starts to be annoying if you do lots of updates (I do), because you get into situations where your big segments have lots of deleted documents in them. Then you have to merge them, and that has lots of overhead. If they get big enough, the merge policy will sometimes refuse to pick them for merging too! It's kind of a degenerate case, though.

Looping back around to shards for write performance: the number of segments allowed is a per-shard thing.
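As a concrete sketch of the knobs above (hedged: assumes a hypothetical index named `myindex`, a node at localhost:9200, and the 1.x tiered merge policy setting names):

```shell
# Hard cap: at most one shard of this index per node, even when allocation
# gets "exciting" after a node drops out.
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index.routing.allocation.total_shards_per_node": 1
}'

# Allow more segments per size tier before merging kicks in (default 10).
# More segments = less write amplification, but slower searches.
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{
  "index.merge.policy.segments_per_tier": 30
}'

# After a write-once bulk load, squash everything back down with optimize
# (renamed _forcemerge in later versions):
curl -XPOST 'http://localhost:9200/myindex/_optimize?max_num_segments=1'
```

The index-balance factor is the cluster-level `cluster.routing.allocation.balance.index` setting; raising it makes the allocator weight per-index spread more heavily than raw per-node shard counts.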
It's set on the whole index, but it applies per shard. Say the max segments per size tier is 10 and the number of shards is 5; then you get 50 segments per tier. So if you double the number of shards, you double the number of segments per tier, and writes are faster. The trouble is that there is no command to squash shards together. So maybe the better solution is to double the number of segments per tier and then run optimize, because squashing segments together is optimize's job. OTOH, see the caveats about optimize above.

Nik

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPmjWd2kEhe%2B-YabODeFA-6cLY9BZrn7WC0K7CVuo689RoHMAg%40mail.gmail.com.