Thanks John,

I was leaning towards '2 is not quite enough' for parity, but wanted to get a 2nd opinion. The level of detail and discussion in your answer is very helpful, much appreciated!

Mark


On 05/04/18 08:25, John Dickinson wrote:
The answer always starts with "it depends...". It depends on your hardware, 
where it's physically located, the durability you need, the access patterns, etc.

There have been whole PhD dissertations on the right way to calculate 
durability. Two parity segments aren't exactly equivalent to three replicas 
because in the EC case you've also got to figure in the chance of failing to 
retrieve all of the necessary remaining segments to satisfy a read request[1].
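
As a very rough illustration of why it's not an apples-to-apples comparison 
(and nowhere near the rigor of those dissertations), you can compare the chance 
of losing more than m fragments of an EC object against losing every copy of a 
replicated one, assuming independent fragment failures with a made-up 
probability p within a repair window:

from math import comb

def p_unreadable_ec(k, m, p):
    """Chance that more than m of the k+m fragments are lost, assuming each
    fragment fails independently with probability p. Deliberately crude: it
    ignores repair times, correlated failures, and the read-path effects
    mentioned above."""
    n = k + m
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(m + 1, n + 1))

def p_unreadable_replica(copies, p):
    """Chance that every replica is lost."""
    return p**copies

p = 0.02  # hypothetical per-fragment loss probability
print(p_unreadable_ec(10, 2, p))   # k=10, m=2
print(p_unreadable_ec(10, 4, p))   # k=10, m=4
print(p_unreadable_replica(3, p))  # 3x replicas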

In your case, using 3 or 4 parity segments will probably get you better durability and 
availability than a 3x replica system and still use less overall drive space[2]. My 
company's product has three "canned" EC policy settings to make it simpler for 
customers to choose. We've got 4+3, 8+4, and 15+4 settings, and we steer people to one of 
them based on how many servers are in their cluster.
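
For the drive-space point, the raw storage overhead of a k+m policy is just 
(k+m)/k bytes on disk per byte stored, so all three of those canned settings 
come in well under a 3x replica policy. A quick back-of-the-envelope check 
(nothing Swift-specific here, just the arithmetic):

# Storage overhead: bytes on disk per byte of user data.
for name, (k, m) in {"4+3": (4, 3), "8+4": (8, 4), "15+4": (15, 4)}.items():
    print(f"{name}: {(k + m) / k:.2f}x")   # 1.75x, 1.50x, 1.27x
print("3x replicas: 3.00x")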

Note that there's nothing special about the m=4 examples in Swift's docs, at 
least in the sense of recommending 4 parity as better than 3 or 5 (or any other 
number).

In your case, you'll want to take into account how many drives you can lose and 
how many servers you can lose. Suppose you have a 10+4 scheme and two servers 
with 12 drives in each server. You'll be able to lose 4 drives, yes, but if 
either server goes down, you'll not be able to access your data, because each 
server will only hold 7 fragments (on seven disks) and you need 10 to read. 
However, if you had 6 servers with 4 drives each, for the same total of 24 
drives, you could still lose four drives, as before, but you could also lose an 
entire server (and, depending on where the fragments landed, possibly even two) 
and still be able to read your data[3].
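
To make that server arithmetic concrete, here's a little sketch. It's a 
simplification: it assumes the ring spreads the 14 fragments as evenly as 
possible across servers, that any k of them are enough to serve a read, and 
that the "worst case" is losing whichever servers hold the most fragments:

def fragments_per_server(total_fragments, servers):
    """Spread fragments as evenly as possible (a stand-in for the ring)."""
    base, extra = divmod(total_fragments, servers)
    return [base + 1] * extra + [base] * (servers - extra)

def readable_after_losing(k, layout, lost):
    """Worst case: the lost servers are the ones holding the most fragments."""
    remaining = sorted(layout)[:len(layout) - lost]
    return sum(remaining) >= k

k, m = 10, 4
for servers in (2, 6):
    layout = fragments_per_server(k + m, servers)
    for lost in (1, 2):
        ok = readable_after_losing(k, layout, lost)
        print(f"{servers} servers {layout}, lose {lost} server(s):",
              "readable" if ok else "not readable in the worst case")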

Another consideration is how much overhead you want to have. Increasing the 
number of data segments lowers the storage overhead, but increasing the number 
of parity segments improves your durability and availability (up to the limits 
of your physical hardware failure domains).
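
Pulling the two back-of-the-envelope numbers above together shows which way 
each knob moves things (same crude independence assumption as before, and p is 
still made up):

from math import comb

def overhead(k, m):
    return (k + m) / k

def p_loss(k, m, p=0.02):
    """Crude independent-failure model, same caveats as earlier."""
    n = k + m
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(m + 1, n + 1))

# Growing k (m fixed) shrinks overhead but doesn't improve durability;
# growing m (k fixed) improves durability but costs overhead.
for k, m in [(4, 4), (8, 4), (15, 4), (8, 2), (8, 6)]:
    print(f"{k}+{m}: overhead {overhead(k, m):.2f}x, p(loss) ~ {p_loss(k, m):.1e}")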

Finally, and probably most simply, you'll want to take into account the 
increased CPU and network cost for a particular EC scheme. A 3x replica write 
needs 3 network connections, and a read needs 1. For an EC policy, a write 
needs k+m connections, and a read needs k. If you're using something really 
large like an 18+3 scheme, you're looking at a 7x overhead in network 
requirements on writes (21 connections versus 3) compared to a 3x replica 
policy. The increased socket management and packet shuffling can add a 
significant burden to your proxy servers[4]. Good news on the CPU, though: the 
EC algorithms are old and well tuned, especially when using libraries like 
jerasure or ISA-L, and CPUs are really fast. Erasure code policies do not add 
significant overhead from the encode/decode steps.
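
To see where that 7x comes from, just count connections per client request 
under the rule of thumb above (this only counts proxy-to-object-server 
connections and ignores everything else on the wire):

# Connections per client request: an N-replica policy needs N on write and
# 1 on read; an EC k+m policy needs k+m on write and k on read.
k, m, replicas = 18, 3, 3

print((k + m) / replicas)  # writes: 21 vs 3 connections, a 7.0x increase
print(k / 1)               # reads: 18 vs 1 connection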

So, in summary, it's complicated, there isn't a "right" answer, and it depends 
a lot on everything else about your cluster. But you've got this! You'll do 
great, and keep asking questions.

I hope all this helps.




_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
