[ https://issues.apache.org/jira/browse/HADOOP-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13606171#comment-13606171 ]
Junping Du commented on HADOOP-8468:
------------------------------------

Jan,
Thanks for the questions. You don't need multiple clusters, each with a dedicated HDFS. It also makes sense to set up purely compute-only clusters that share the same HDFS cluster, by separating the TaskTracker (or NodeManager) and the DataNode into different VMs. NodeGroup-awareness here helps to guarantee nodeGroup-level (physical host) locality. So you can power off or suspend your compute cluster without affecting other clusters. Given this, you don't have to suspend your HDFS cluster to free resources for other applications.
In the other case, if you want to suspend a virtual cluster (HDFS included), I would recommend stopping the HDFS service before you suspend the cluster and starting it again after you resume the cluster. That avoids the data re-replication caused by the DNs' heartbeat outage, and there is no need for an extra storage tier for persistence.

> Umbrella of enhancements to support different failure and locality topologies
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-8468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8468
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha, io
>    Affects Versions: 1.0.0, 2.0.0-alpha
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: HADOOP-8468-total.patch, HADOOP-8468-total-v3.patch,
> HVE_Hadoop World Meetup 2012.pptx, HVE User Guide on branch-1(draft ).pdf,
> Proposal for enchanced failure and locality topologies.pdf, Proposal for
> enchanced failure and locality topologies (revised-1.0).pdf
>
>
> The current Hadoop network topology (described in earlier issues such as
> HADOOP-692) worked well for the classic three-tier network it was designed
> for. However, it does not take into account other failure models or changes
> in the infrastructure that can affect network bandwidth efficiency, such as
> virtualization.
> A virtualized platform has the following characteristics that the Hadoop
> topology should not ignore when scheduling tasks, placing replicas,
> balancing, or fetching blocks for reading:
> 1. VMs on the same physical host are affected by the same hardware failure.
> In order to match the reliability of a physical deployment, replication of
> data across two virtual machines on the same host should be avoided.
> 2. The network between VMs on the same physical host has higher throughput
> and lower latency and does not consume any physical switch bandwidth.
> Thus, we propose to make the Hadoop network topology extensible and
> introduce a new level in the hierarchical topology, a node group level,
> which maps well onto an infrastructure that is based on a virtualized
> environment.
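
To illustrate the two points in the description, here is a minimal Python sketch of a hierarchy with an extra node group level. It is not taken from the attached patches. The path layout `/rack/nodegroup/vm` and the names `distance`, `node_group`, and `violates_host_separation` are assumptions made for illustration. Distance is counted as the number of hops from each node up to the closest common ancestor, so VMs on the same physical host come out closer than VMs that only share a rack.

```python
from typing import Sequence


def _parts(path: str) -> list:
    """Split a topology path such as '/rack1/ng1/vm1' into its levels."""
    return path.strip("/").split("/")


def distance(a: str, b: str) -> int:
    """Sum of hops from each node up to their closest common ancestor.

    Assumed path layout (hypothetical): /rack/nodegroup/vm
    """
    pa, pb = _parts(a), _parts(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def node_group(path: str) -> str:
    """Return the node group (physical host) prefix of a VM path."""
    return path.rsplit("/", 1)[0]


def violates_host_separation(replicas: Sequence[str]) -> bool:
    """True if any two replicas sit on VMs in the same node group."""
    groups = [node_group(p) for p in replicas]
    return len(groups) != len(set(groups))


if __name__ == "__main__":
    vm1, vm2 = "/rack1/ng1/vm1", "/rack1/ng1/vm2"   # same physical host
    vm3, vm4 = "/rack1/ng2/vm3", "/rack2/ng3/vm4"   # other host, other rack
    print(distance(vm1, vm2), distance(vm1, vm3), distance(vm1, vm4))
    print(violates_host_separation([vm1, vm2]))
    print(violates_host_separation([vm1, vm3, vm4]))
```

Under these assumptions, a placement policy would reject a replica set for which `violates_host_separation` returns true (point 1), and a reader or scheduler would prefer the candidate with the smallest `distance` (point 2).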