Mingliang Liu created HDFS-14786:
------------------------------------

             Summary: A new block placement policy tolerating availability zone 
failure
                 Key: HDFS-14786
                 URL: https://issues.apache.org/jira/browse/HDFS-14786
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: block placement
            Reporter: Mingliang Liu


{{NetworkTopology}} assumes a 3-layer "/datacenter/rack/host" topology. The 
default block placement policies are rack-aware for better fault tolerance. 
Newer policies such as {{BlockPlacementPolicyRackFaultTolerant}} try to spread 
replicas across as many racks as possible, which tolerates more simultaneous 
rack failures. [HADOOP-8470] introduced {{NetworkTopologyWithNodeGroup}}, which 
adds another layer under rack, i.e. a 4-layer "/datacenter/rack/host/nodegroup" 
topology. With that, replicas within a rack can be placed in different node 
groups for better isolation.

Existing block placement policies tolerate rack failure because they place 
replicas on at least two racks. Still, all replicas may end up in the same 
datacenter even when the cluster topology spans multiple datacenters. In other 
words, failures at layers above the rack are not well tolerated.

Meanwhile, more deployments in public clouds are using multiple availability 
zones (AZs) for high availability, since the inter-AZ latency seems affordable 
in many cases. Within a single AZ, some cloud providers such as AWS support 
[partitioned placement 
groups|https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-partition],
 which are essentially different racks. A simple network topology mapped to 
HDFS is then the 3-layer "/availabilityzone/rack/host".

To tolerate zone failure, this JIRA proposes a new block placement policy that 
makes a best effort to spread replicas across as many AZs as possible, then 
across as many racks as possible, and as evenly as possible.

For example, with 3 replicas, racks are chosen as follows:
# 1 AZ: fall back to {{BlockPlacementPolicyRackFaultTolerant}} and spread 
replicas across as many racks as possible
# 2 AZs: randomly choose one rack in one AZ and two racks in the other AZ
# 3 AZs: randomly choose one rack in each AZ
# 4 AZs: randomly choose three AZs and one rack in each of them
After racks are chosen, hosts are chosen randomly, honoring local storage, 
favored nodes, excluded nodes, storage types, etc.
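
The rack selection in the examples above can be sketched as a round-robin over 
shuffled AZs. This is only an illustration under stated assumptions, not the 
proposed implementation: the class {{AzRackChooser}} and the method 
{{chooseRacks}} are hypothetical names that do not exist in HDFS, racks are 
plain strings such as "/az1/rack1", and host selection is left out. With a 
single AZ the sketch simply spreads over racks instead of delegating to 
{{BlockPlacementPolicyRackFaultTolerant}}. If the topology has fewer racks than 
replicas, it returns fewer racks, and a real policy would have to reuse racks.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch; not part of the HDFS BlockPlacementPolicy API. */
final class AzRackChooser {

  /**
   * Picks up to {@code replicas} distinct racks, spreading them over as many
   * AZs as possible and as evenly as possible.
   *
   * @param racksByAz AZ name mapped to the racks in that AZ (assumed layout)
   */
  static List<String> chooseRacks(Map<String, List<String>> racksByAz,
                                  int replicas, Random random) {
    // Shuffle the AZ order so that no AZ is systematically favored.
    List<String> azs = new ArrayList<>(racksByAz.keySet());
    Collections.shuffle(azs, random);

    // Shuffle a private copy of each AZ's rack list; the input is not modified.
    Map<String, List<String>> remaining = new LinkedHashMap<>();
    for (String az : azs) {
      List<String> racks = new ArrayList<>(racksByAz.get(az));
      Collections.shuffle(racks, random);
      remaining.put(az, racks);
    }

    // Round-robin: each pass takes at most one unused rack from every AZ.
    List<String> chosen = new ArrayList<>();
    boolean progressed = true;
    while (chosen.size() < replicas && progressed) {
      progressed = false;
      for (String az : azs) {
        List<String> racks = remaining.get(az);
        if (chosen.size() < replicas && !racks.isEmpty()) {
          chosen.add(racks.remove(racks.size() - 1));
          progressed = true;
        }
      }
    }
    return chosen;
  }

  public static void main(String[] args) {
    // Two AZs with two racks each: one AZ contributes one rack, the other two.
    Map<String, List<String>> topology = new LinkedHashMap<>();
    topology.put("az1", List.of("/az1/rack1", "/az1/rack2"));
    topology.put("az2", List.of("/az2/rack1", "/az2/rack2"));
    System.out.println(chooseRacks(topology, 3, new Random()));
  }
}
```

Because every pass visits each AZ at most once, the number of racks chosen in 
any two AZs differs by at most one as long as those AZs still have unused 
racks, which matches the 2 AZ, 3 AZ and 4 AZ cases listed above.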

Data may become imbalanced if the topology is very uneven across AZs. This 
does not seem to be a problem, since infrastructure provisioning in public 
cloud is more flexible than in 1P.


