Thanks John,
I was leaning towards '2 is not quite enough' for parity, but wanted to
get a 2nd opinion. The level of detail and discussion in your answer is
very helpful, much appreciated!
Mark
On 05/04/18 08:25, John Dickinson wrote:
The answer always starts with "it depends...". It depends on your hardware, where
it's physically located, the durability you need, the access patterns, and so on.
There have been whole PhD dissertations on the right way to calculate
durability. Two parity segments aren't exactly equivalent to three replicas,
because in the EC case you've also got to figure in the chance of failing to
retrieve all of the necessary remaining segments to satisfy a read request[1].
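For a rough feel for those numbers, here's a back-of-the-envelope sketch. It assumes
independent drive failures with a single made-up per-drive failure probability over a
repair window, which real clusters violate in all sorts of ways, so treat it as a way
to explore the trade-off rather than a verdict:

    # Naive model: each of the n drives holding a replica/fragment fails
    # independently with probability p during the repair window.
    from math import comb

    def p_unrecoverable(n, k, p):
        # Probability that fewer than k of the n copies/fragments survive.
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(n - k + 1, n + 1))

    p = 0.01  # made-up per-drive failure probability for the window
    print(p_unrecoverable(3, 1, p))    # 3 replicas: need any 1 of 3
    print(p_unrecoverable(12, 10, p))  # 10+2: need any 10 of 12 fragments
    print(p_unrecoverable(14, 10, p))  # 10+4: need any 10 of 14 fragments

The real analyses also fold in rebuild times, correlated failures, and the read
availability issue above, which is where the dissertations come in.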
In your case, using 3 or 4 parity segments will probably get you better durability and
availability than a 3x replica system and still use less overall drive space[2]. My
company's product has three "canned" EC policy settings to make it simpler for
customers to choose. We've got 4+3, 8+4, and 15+4 settings, and we steer people to one of
them based on how many servers are in their cluster.
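Boiled down to a toy rule, the shape of that steering looks something like this (the
real logic has more inputs than just server count):

    # Toy policy picker: choose the largest canned policy that still lets
    # you put at most one fragment on each server, so losing m servers
    # only ever costs you m fragments.
    CANNED = [(15, 4), (8, 4), (4, 3)]  # (k, m), largest first

    def pick_policy(num_servers):
        for k, m in CANNED:
            if num_servers >= k + m:   # one fragment per server is possible
                return (k, m)
        return None                    # too few servers; consider replication

    for servers in (5, 7, 12, 20):
        print(servers, "servers ->", pick_policy(servers))
    # 5 -> None, 7 -> (4, 3), 12 -> (8, 4), 20 -> (15, 4)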
Note that there's nothing special about the m=4 examples in Swift's docs, at
least in the sense of recommending 4 parity segments as better than 3 or 5 (or any other
number).
In your case, you'll want to take into account how many drives you can lose and
how many servers you can lose. Suppose you have a 10+4 scheme and two servers
and 12 drives in each server. You'll be able to lose 4 drives, yes, but if
either server goes down, you'll not be able to access your data because each
server will have 7 fragments (on seven disks). However, if you had 6 servers
with 4 drives each, for the same total of 24 drives, you could still lose four
drives, and you could also lose a whole server (or even two, as long as the failed
pair holds no more than 4 fragments between them) and still be able to read your data[3].
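If you want to sanity-check that placement math for other layouts, here's a quick
sketch that spreads the k+m fragments as evenly as possible across servers and counts
how many whole servers you can lose in the worst case:

    # Worst-case server-loss tolerance for a k+m policy spread evenly
    # across num_servers servers (at most one fragment per drive assumed).
    def worst_case_server_losses(k, m, num_servers):
        n = k + m
        base, extra = divmod(n, num_servers)
        per_server = [base + 1] * extra + [base] * (num_servers - extra)
        per_server.sort(reverse=True)          # lose the fullest servers first
        survivors, lost = n, 0
        for frags in per_server:
            if survivors - frags < k:          # a read needs k fragments
                break
            survivors -= frags
            lost += 1
        return per_server, lost

    print(worst_case_server_losses(10, 4, 2))  # ([7, 7], 0): can't lose a server
    print(worst_case_server_losses(10, 4, 6))  # ([3, 3, 2, 2, 2, 2], 1)

With 6 servers the guaranteed tolerance is one server (the two unluckiest servers hold
3 fragments each); losing two only works out if they're among the servers holding 2
fragments apiece.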
Another consideration is how much storage overhead you can tolerate. Increasing the
data segments lowers the overhead, but increasing the parity segments
improves your durability and availability (up to the limits of your physical
hardware failure domains).
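Putting numbers on that: the raw storage multiplier for a k+m policy is (k+m)/k,
versus 3.0x for triple replication.

    # Raw storage used per byte of user data, for a few k+m layouts.
    for k, m in [(4, 3), (8, 4), (10, 4), (15, 4), (18, 3)]:
        print(f"{k}+{m}: {(k + m) / k:.2f}x (vs 3.00x for 3 replicas)")
    # 4+3: 1.75x, 8+4: 1.50x, 10+4: 1.40x, 15+4: 1.27x, 18+3: 1.17x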
Finally, and probably most simply, you'll want to take into account the
increased CPU and network cost for a particular EC scheme. A 3x replica write
needs 3 network connections, and a read needs 1. For an EC policy, a write
needs k+m connections, and a read needs k. If you're using something really
large like an 18+3 scheme, you're looking at a 7x overhead in network
requirements when compared to a 3x replica policy. The increased socket
management and packet shuffling can add significant burden to your proxy
servers[4]. Good news on the CPU though. The EC algorithms are old and well
tuned, especially when using libraries like jerasure or isa-l, and CPUs are
really fast. Erasure code policies do not add significant overhead from the
encode/decode steps.
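Counting proxy-to-storage connections per client request shows where that 7x comes
from (this counts only the happy path, with no handoffs or retries):

    # Connections the proxy opens per request, replica vs EC.
    def connections(policy):
        if policy == "3x":
            return {"write": 3, "read": 1}
        k, m = policy                  # EC: write all k+m fragments, read any k
        return {"write": k + m, "read": k}

    print(connections("3x"))       # {'write': 3, 'read': 1}
    print(connections((10, 4)))    # {'write': 14, 'read': 10}
    print(connections((18, 3)))    # {'write': 21, 'read': 18}
    # 21 write connections vs 3 is the 7x write fan-out mentioned above.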
So, in summary, it's complicated, there isn't a "right" answer, and it
depends a lot on everything else about your cluster. But you've got this! You'll do
great, and keep asking questions.
I hope all this helps.