On Thu, Oct 9, 2014 at 6:34 PM, Kevin Burton <burtona...@gmail.com> wrote:

>
>
> On Wednesday, October 8, 2014 12:07:30 AM UTC-7, Jörg Prante wrote:
>>
>> With ES, you can write up to the I/O bandwidth limit the OS allows (if
>> you disable throttling etc.)
>>
>> This means that writing to one shard can, in aggregate, be as fast as
>> writing to thousands of shards in parallel.  There is an OS limit on
>> file system buffers, so the more shards you have, the more RAM is
>> recommended.
>>
>> If the OS restricts file descriptor limits and you want to write to
>> shards, you can estimate a peak demand of 100-200 file descriptors for
>> active merges etc. (this varies from version to version and from
>> setting to setting).  ES itself does not impose a limit here.
>>
>> These resource demands and custom configurations are not related to the
>> choice of SSD or HDD.
>>
>>
> There's no way that's true... for example, if you are on HDD, and your
> index is not in memory, AND you're serving queries (which is a realistic
> use case), then there is NO way you can write at the full I/O bandwidth
> of the disk.  It's just physically impossible.
>
> If ES has been able to solve that problem then they could win a Nobel
> Prize :-p
>
> SSD is 2-3 orders of magnitude faster than HDD at random I/O ... so yes,
> it is related to the choice of SSD or HDD.
>
> ... it's entirely possible I'm misinterpreting what you're saying though.
>
>
Yeah.  You _could_ disable throttling and write faster.  But that'd be
stupid if you need query performance to stay constant.
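
For concreteness, here's a minimal sketch of what "disable throttling"
looks like through the Python client against a 1.x cluster (the setting
name *indices.store.throttle.type* is version-dependent, and "none" turns
store throttling off entirely):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # ES 1.x: turn off store (merge) throttling cluster-wide.  Transient,
    # so it doesn't survive a full cluster restart.  Only sensible for
    # bulk loads where query latency doesn't matter.
    es.cluster.put_settings(body={
        "transient": {"indices.store.throttle.type": "none"}
    })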

You asked about the load on my shards.  It's not at all evenly
distributed.  Update and query rates vary a ton depending on the index.

There are per-index overheads just for having that many indexes, but from
my perspective they are pretty small.  Some of the behavior isn't ideal;
for example, Elasticsearch distributes the memory used to buffer writes
evenly across indexes that have received writes in the past few minutes.

Actually serving searches across all the shards is another story though.
For the most part we lay things out so that each search request hits
anywhere from a third of our servers down to just two of them.  We're
probably more concerned about that than most people, though, because we
frequently perform actions with high-ish per-shard overhead, like the
phrase suggester.  Oversubscribing is OK for indexes that don't get much
query traffic, but otherwise we make sure everything is spread out as
evenly as possible.  We do that by cranking up the *index* allocation
factor and by adding *total_shards_per_node = 1* to our highest-traffic
indexes.  Cranking up the index allocation factor spreads the shards of
each index out pretty evenly, but sometimes Elasticsearch will smash them
together during "exciting" situations like when a node drops out.
*total_shards_per_node* is a hard limit, so even when things get exciting
it'll keep those shards away from each other.
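
If it helps, those two knobs look roughly like this through the Python
client (a sketch against a 1.x cluster; the "index allocation factor" is
*cluster.routing.allocation.balance.index*, the index name is made up, and
0.75 is just an illustrative value):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # Weight the balancer toward spreading each index's shards evenly
    # across nodes (1.x default is 0.55; higher favors per-index balance).
    es.cluster.put_settings(body={
        "persistent": {"cluster.routing.allocation.balance.index": 0.75}
    })

    # Hard limit: at most one shard of this index per node, even during
    # "exciting" situations like node drop-outs.
    es.indices.put_settings(index="high_traffic_index", body={
        "index.routing.allocation.total_shards_per_node": 1
    })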

If you're wondering about write speed, it's only kinda related to shards.
The two things you can do to speed up writing are to make sure that all
shards you are writing to are spread evenly across your nodes, and to
crank up the number of segments that are allowed to exist.  More segments
lower the write amplification factor for both I/O and CPU, BUT they cost
more to search.  If you plan to do a ton of writes at once and then never
again, you can crank up the segment count during the write and then run an
optimize to squash the segments back together.  That works great if you
never update any of the documents in the index.  It's even OK to index
more stuff into the index.  It starts to be annoying if you do lots of
updates (I do) because you get into situations where your big segments
have lots of deleted documents in them.  Then you have to merge them, and
that has lots of overhead.  If they get big enough, the merge policy will
sometimes refuse to pick them for merging at all!  That's kind of a
degenerate case though.
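
As a sketch of that load-then-optimize pattern (index name and numbers are
made up; *index.merge.policy.segments_per_tier* defaults to 10 in the
tiered merge policy, and *optimize* is the 1.x API, renamed forcemerge in
later versions):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # Before a big one-time bulk load: allow more segments per tier so
    # merges run less often (less write amplification, slower searches).
    es.indices.put_settings(index="bulk_index", body={
        "index.merge.policy.segments_per_tier": 30
    })

    # ... do the bulk load here ...

    # Afterwards: put the setting back and squash everything down.
    es.indices.put_settings(index="bulk_index", body={
        "index.merge.policy.segments_per_tier": 10
    })
    es.indices.optimize(index="bulk_index", max_num_segments=1)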

Looping back around to shards for write performance: the number of
segments allowed is a per-shard thing.  It's set on the whole index but it
applies per shard.  Say the max segments per size tier is 10 and the
number of shards is 5.  Then you get 50 segments per tier.  So if you
double the number of shards, you double the number of segments per tier,
and writes get faster.  The trouble is that there is no command to squash
shards together.  So maybe the better solution is to double the number of
segments per tier and then run optimize, because squashing segments back
together is optimize's job.  OTOH see the caveats about optimize above.
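
You can see that the budget really is per shard by pulling apart the
_segments API, something like this (index name assumed; the response nests
shard copies under each index):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # Each shard copy reports its own segment count, so the effective
    # segment count for the index is shards x segments-per-shard.
    resp = es.indices.segments(index="bulk_index")
    for index_name, index_data in resp["indices"].items():
        for shard_id, copies in index_data["shards"].items():
            for copy in copies:
                print(index_name, "shard", shard_id,
                      "segments:", copy["num_search_segments"])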

Nik
