[jira] [Commented] (HDFS-4945) A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS
[ https://issues.apache.org/jira/browse/HDFS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698281#comment-13698281 ]

Konstantin Shvachko commented on HDFS-4945:
---

Yonghwan, the ideas sound interesting. But it looks to me like a new file system rather than a new feature of HDFS. Do you plan to replace HDFS or evolve it?

I've been working on the [Giraffa|http://code.google.com/a/apache-extras.org/p/giraffa/source/browse/?name=trunk] project. Is it similar to your ideas?

You said "we" on several occasions. Who do you mean?

> A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS
> --
>
> Key: HDFS-4945
> URL: https://issues.apache.org/jira/browse/HDFS-4945
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: auto-failover
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Yonghwan Kim
> Labels: documentation
>
> See the following comment for detailed description.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4945) A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS
[ https://issues.apache.org/jira/browse/HDFS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696782#comment-13696782 ]

Uma Maheswara Rao G commented on HDFS-4945:
---

Suresh has already asked most of the questions needed to clarify this feature. I have one more question:

{quote}
When each fragment has k replicas, the file system can tolerate up to floor(k/2 - 1) faulty NameNodes.
{quote}

How and where will you manage the metadata that describes these fragments?

Regards,
Uma
[jira] [Commented] (HDFS-4945) A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS
[ https://issues.apache.org/jira/browse/HDFS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696278#comment-13696278 ]

Yonghwan Kim commented on HDFS-4945:
---

Dear Suresh Srinivas,

Thank you for your helpful comments.

1. As you said, in HDFS 2.0 an NFS share is needed so that the two NNs can share edit logs. However, we simply thought that the NFS share could become a new SPOF, so some way of guaranteeing its fault tolerance (e.g. RAID or network multiplexing) would be needed. We used the phrase 'reliable sophisticated storage' to refer to that fault-tolerance technique. In fact, the latest version of HDFS 2.0 (maybe 2.0.5) adopts QJM (HDFS-3077), and one of its requirements is "No requirement for special hardware". Likewise, we want to propose a new architecture that uses only commodity hardware.

2 & 3. I deeply apologize for the lack of detailed information. I am writing a technical document (a paper) covering the background, related work, design, and a proof of consistency, but it is not complete yet. Please wait a while; I will add some information here before we finish writing the paper.

4. So far we have only tested the system's overhead for synchronizing among NNs, using some dummy data. Once the document is complete, we will do our utmost to implement a prototype. (We have already proved that our system is consistent in theory.)

The Affects Version value was my mistake. I'll edit it to HDFS-1623.

Thanks. Sorry for my poor English.
[jira] [Commented] (HDFS-4945) A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS
[ https://issues.apache.org/jira/browse/HDFS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696269#comment-13696269 ]

Suresh Srinivas commented on HDFS-4945:
---

Some comments:
# I have a hard time understanding "two NameNodes have to share a highly-reliable sophisticated storage" - what is sophisticated about having an NFS mount point, currently used for storing FSImage?
# For the sake of completeness, please include information about federation, which can support multiple NameNodes in a cluster.
# Please add a design document.
# Do you have a working prototype?

Why is Affects Version/s set to HDFS-1623?
[jira] [Commented] (HDFS-4945) A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS
[ https://issues.apache.org/jira/browse/HDFS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696268#comment-13696268 ]

Suresh Srinivas commented on HDFS-4945:
---

[~chaika1015] I have moved the long description to a subsequent comment.
[jira] [Commented] (HDFS-4945) A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS
[ https://issues.apache.org/jira/browse/HDFS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696267#comment-13696267 ]

Suresh Srinivas commented on HDFS-4945:
---

Recently, Hadoop has attracted much attention from engineers and researchers as an emerging and effective framework for Big Data. HDFS (Hadoop Distributed File System) can manage huge amounts of data while guaranteeing high performance and reliability using only commodity hardware. However, HDFS requires a single master node, called the NameNode, to manage the entire namespace (or all the i-nodes) of a file system. This causes a SPOF (Single Point Of Failure) problem because the file system becomes inaccessible when the NameNode fails (HDFS-2064). It also creates an efficiency bottleneck, since every access request to the file system has to contact the NameNode.

Hadoop 2.0 resolves the SPOF problem by introducing manual failover based on two NameNodes, Active and Standby. However, it still has the efficiency bottleneck, since all access requests have to contact the Active NameNode in ordinary execution. It may also lose the advantage of using commodity hardware, since the two NameNodes have to share a highly-reliable sophisticated storage.

We here propose a new HDFS architecture to resolve all the problems mentioned above. The proposed architecture has the following features and advantages.

1. Multiple NameNodes (not restricted to two) can be utilized to improve availability. The entire namespace of a file system is partitioned into several fragments, and replicas of each fragment are dispersed among the NameNodes. When each fragment has k replicas, the file system can tolerate up to floor(k/2 - 1) faulty NameNodes.

2. Multiple NameNodes can be utilized to improve performance. The performance bottleneck caused by a single NameNode can be circumvented by assigning different NameNodes to different fragments as the primary ones (or the entry points).

3. The highly-reliable storage shared by the NameNodes is removed by introducing a message-based consistency mechanism among the NameNodes. The architecture requires only commodity hardware.
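
As a rough illustration of features 1 and 2 in the description above, the following Python sketch evaluates the stated bound floor(k/2 - 1) for a few replica counts and shows one way fragments might be spread over NameNodes so that different NameNodes act as primaries. The round-robin placement, the function names, and the NameNode labels (nn0, nn1, ...) are assumptions made purely for illustration; the proposal does not say how fragments are placed or how primaries are chosen.

```python
from __future__ import annotations


def tolerated_faults(k: int) -> int:
    """Bound quoted in the proposal: floor(k/2 - 1) faulty NameNodes for k replicas."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # For integer k, floor(k/2 - 1) equals k // 2 - 1.
    # Clamping at 0 is our assumption: the raw formula gives -1 for k = 1.
    return max(0, k // 2 - 1)


def place_fragments(num_fragments: int, namenodes: list[str], k: int) -> dict[int, list[str]]:
    """Hypothetical round-robin placement (not taken from the proposal).

    Fragment i is replicated on k consecutive NameNodes; the first NameNode
    in each list plays the role of the primary (entry point) for that fragment.
    """
    n = len(namenodes)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of NameNodes")
    return {
        i: [namenodes[(i + j) % n] for j in range(k)]
        for i in range(num_fragments)
    }


if __name__ == "__main__":
    # Evaluate the quoted bound for a range of replica counts.
    for k in range(2, 8):
        print(f"k={k}: tolerates {tolerated_faults(k)} faulty NameNode(s)")

    # Spread 5 fragments over 5 NameNodes with 4 replicas each.
    nodes = ["nn0", "nn1", "nn2", "nn3", "nn4"]
    for fragment, replicas in place_fragments(5, nodes, k=4).items():
        print(f"fragment {fragment}: primary={replicas[0]}, replicas={replicas}")
```

Read literally, the bound means that at least four replicas per fragment are needed before a single faulty NameNode can be tolerated, and six before two can be. In the placement sketch each fragment gets a different primary, which is the load-spreading effect that feature 2 describes.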