> Yeah, I saw erasure encoding mentioned a little while ago, but that's
> likely not to be around by the time I'm going to deploy things.
> Nevermind that super bleeding edge isn't my style when it comes to
> production systems. ^o^

> And at something like 600 disks, that would still have to be a mighty high
> level of replication to combat failure statistics...

Not sure if I understand correctly, but it looks like it currently is a
RAID 01 kind of solution: each failure domain is effectively a RAID 0, and
that failure domain is mirrored to X replicas.
With a replica count of 3 you could be unlucky and have 3 disks fail in
three different failure domains at the same time, and if you have enough
disks in the cluster, chances are this will happen at some point.
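
To put a rough number on that, here is a back-of-the-envelope sketch. All
figures (3 failure domains of 200 OSDs each, ~100 PGs per OSD, replica
count 3 with one copy per domain) are made-up assumptions, not taken from
any real cluster:

    # Back-of-the-envelope sketch, not Ceph code. Assumed numbers: 3 failure
    # domains of 200 OSDs each (~600 disks total), ~100 PGs per OSD, replica
    # count 3 with one copy per failure domain.
    osds_per_domain = 200
    domains = 3
    pgs = osds_per_domain * 100               # ~20,000 PGs cluster-wide

    # With the "raid 01" style placement each PG ends up on a (more or less)
    # independent random triple of OSDs, one per domain, so roughly every PG
    # contributes one "fatal" triple: lose exactly those three disks at the
    # same time and all copies of that PG are gone.
    fatal_triples = pgs
    all_triples = osds_per_domain ** domains  # 8,000,000 possible triples

    # Chance that one simultaneous triple failure (one disk in each domain)
    # hits a fatal triple:
    print(fatal_triples / all_triples)        # ~0.0025, i.e. roughly 1 in 400

Under those assumptions roughly 1 in 400 such triple-failure events would
lose data, so a big cluster that lives long enough is indeed likely to hit
it eventually.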

It would make sense to be able to create a RAID 10 kind of solution
instead, where disk 1 in failure domain 1 has the same content as disk 1
in failure domains 2 and 3.
That way the PGs on one OSD would be exactly mirrored to a fixed OSD in
each of the other failure domains.
This would require more uniform hardware and you lose flexibility, but you
gain a lot of reliability.
Without knowing anything about the code base, I *think* it should be
pretty trivial to change the code to support this, and it would be a very
small change compared to erasure coding.
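
As a sketch of how much the pinned layout would help, continuing the same
made-up numbers as above: with disk-to-disk mirroring the only fatal
triples are the ones where the same disk index fails in all three domains,
so the fatal-triple count drops from roughly the PG count to the number of
disks per domain.

    # Continuing the assumed numbers from the sketch above (nothing
    # Ceph-specific).
    osds_per_domain = 200
    domains = 3
    pgs = osds_per_domain * 100

    # "raid 01" style: roughly one fatal triple per PG.
    p_striped = pgs / osds_per_domain ** domains               # ~2.5e-3

    # "raid 10" style: disk i in domain 1 mirrors disk i in domains 2 and 3,
    # so only the 200 "same index in every domain" triples are fatal.
    p_mirrored = osds_per_domain / osds_per_domain ** domains  # ~2.5e-5

    print(p_striped / p_mirrored)   # ~100x fewer fatal disk combinations

So under those assumptions the mirrored layout is about two orders of
magnitude less exposed to a simultaneous triple failure, which is the
"win a lot of reliability" part.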

(I looked a bit at the CRUSH map bucket types, but it *seems* that all
bucket types will still stripe the PGs across all nodes within a failure
domain.)

Cheers,
Robert van Leeuwen
