Hi list,
     I am thinking about the possibility of adding some primitives to CRUSH to meet the following user stories:
A. "Same host", "Same rack"
        To balance between availability and performance, one may want a rule like: 3 replicas, where replica 1 and replica 2 are in the same rack while replica 3 resides in another rack. This is common because a typical datacenter deployment usually has much less uplink bandwidth than backbone bandwidth.

More aggressive users may even want the same host, since the most common failure is disk failure, and we usually have several disks (which also means several OSDs) residing in the same physical machine. If we can place replica 1 and replica 2 on the same host but replica 3 somewhere else, it will not only reduce replication traffic but also save a lot of time and bandwidth when a disk failure happens and recovery takes place.
B."local"
         Although we cannot mount RBD volumes to where a OSD running at, but 
QEMU canbe used. This scenarios is really common in cloud computing. We have a 
large amount of compute-nodes, just plug in some disks      and make the 
machines reused for Ceph cluster. To reduce network traffic and latency , if it 
is possible to have some placement-group-maybe 3 PG for a compute-node. Define 
the rules like: primary copy of the PG      should (if possible) reside in 
localhost, the second replica should go different places
        
        By doing this, a significant amount of network bandwidth and an RTT can be saved. What's more, since reads always go to the primary, they would benefit a lot from such a mechanism.
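
A rough sketch of how B might look with today's rule language, borrowing the trick from the ssd-primary style rules (take a specific bucket for the first emit, then spread the rest). "compute-node-1" is a hypothetical host bucket, and a separate rule (and pool) per compute node would be needed:

    rule local_primary_node1 {
            ruleset 2
            type replicated
            min_size 3
            max_size 3
            step take compute-node-1              # hypothetical local host bucket
            step choose firstn 1 type osd         # pin the primary to this host
            step emit
            step take default
            step chooseleaf firstn -1 type host   # remaining (size - 1) replicas
            step emit
    }

This clearly does not scale (one rule and pool per node, and the second pass could in principle land on the local host again unless the hierarchy separates them), which is why a real "prefer local" primitive would be much nicer.
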

It looks to me that A is simpler, while B seems much more complex. Hoping for input.

Xiaoxi