[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410396#comment-15410396 ]

Amithsha commented on HDFS-1052:

By default I list both namenodes' IPs in the configuration, so why can't the client code loop over them when searching for a block location, like this: if getBlockLocation() finds nothing on the first NN, go to the second NN?

> HDFS scalability with multiple namenodes
>
> Key: HDFS-1052
> URL: https://issues.apache.org/jira/browse/HDFS-1052
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: namenode
> Affects Versions: 0.22.0
> Reporter: Suresh Srinivas
> Assignee: Suresh Srinivas
> Fix For: 0.23.0
>
> Attachments: Block pool proposal.pdf, HDFS-1052.3.patch, HDFS-1052.4.patch, HDFS-1052.5.patch, HDFS-1052.6.patch, HDFS-1052.patch, Mulitple Namespaces5.pdf, high-level-design.pdf
>
> HDFS currently uses a single namenode that limits scalability of the cluster.
> This jira proposes an architecture to scale the nameservice horizontally using multiple namenodes.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
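The fallback lookup suggested in this comment could be sketched roughly as follows. It only illustrates the idea being asked about; elsewhere in this thread Suresh explains that federation clients go directly to the namespace of interest through client-side mount tables instead of searching. Everything here (`FakeNameNode`, `find_block_locations`, the paths and datanode names) is a hypothetical stand-in, not a Hadoop API.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class FakeNameNode:
    """Hypothetical in-memory stand-in for one namenode's namespace."""

    def __init__(self, name: str, blocks: Dict[str, List[str]]) -> None:
        self.name = name
        self._blocks = blocks  # path -> datanodes holding the file's blocks

    def get_block_locations(self, path: str) -> Optional[List[str]]:
        # None means "this namespace does not contain the path".
        return self._blocks.get(path)


def find_block_locations(
    namenodes: Sequence[FakeNameNode], path: str
) -> Tuple[str, List[str]]:
    """Ask each configured namenode in turn and return the first hit."""
    for nn in namenodes:
        locations = nn.get_block_locations(path)
        if locations is not None:
            return nn.name, locations
    raise FileNotFoundError(path)


# Made-up example data: two namenodes, each owning one project subtree.
nn1 = FakeNameNode("NN1", {"/project/project1/part-0": ["dn1", "dn2"]})
nn2 = FakeNameNode("NN2", {"/project/project2/part-0": ["dn3"]})
```

In this sketch a miss costs one extra call per namenode tried, which is the search overhead that a direct path-to-namespace mapping avoids.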
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410387#comment-15410387 ]

Amithsha commented on HDFS-1052:

Hi all,

From the docs and examples I can see that NameNode Federation reduces the pressure on a single namenode by scaling the nameservice horizontally. At the same time, I want to know what happens if I have a directory structure like this:

/project/project1
/project/project2

where project1 is on NN1 and project2 is on NN2:

/project/project1 - NN1
/project/project2 - NN2

If an MR job runs on top of /project, how will it detect the block locations of /project/project1 and /project/project2?

1) Do I need to specify the HDFS URLs of /project/project1 and /project/project2?
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069477#comment-13069477 ]

vamshi commented on HDFS-1052:

Suresh, I went through the design document (NameNode Federation). What I was trying to ask is this: once the NameNode is distributed among multiple nodes, each NameNode has its own unique namespace. When a client then requests a block location, the group of NameNodes has to be searched for that block's details (metadata). For this lookup/search of metadata, can we use distributed hashing? Please let me know something about it. Thank you.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069603#comment-13069603 ]

Suresh Srinivas commented on HDFS-1052:

> For this lookup/searching of metadata, can we use Distributed Hashing? please let me know some thing regarding it.

The client goes directly to the namespace of interest. To make this easier, HADOOP-7257 adds client-side mount tables. This eliminates the need to look up or search across a group of namenodes and lets the client go directly to a namespace.
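To make the mount-table idea concrete, here is a minimal sketch of how a client-side table could map a path to a namespace without contacting any namenode. It is an illustration under stated assumptions, not the HADOOP-7257 implementation: the `resolve` function, the longest-prefix rule, and the `nn1.example.com` / `nn2.example.com` URIs are all hypothetical.

```python
from typing import Dict


def resolve(mount_table: Dict[str, str], path: str) -> str:
    """Map a client-visible path to a namenode URI via the longest matching mount point.

    Mount points are assumed to be absolute paths without a trailing slash.
    """
    matches = [
        mount_point
        for mount_point in mount_table
        if path == mount_point or path.startswith(mount_point + "/")
    ]
    if not matches:
        raise KeyError(f"no mount point covers {path}")
    mount_point = max(matches, key=len)  # most specific mount point wins
    # Append whatever follows the mount point to the target URI.
    return mount_table[mount_point] + path[len(mount_point):]


# Hypothetical table: each project subtree lives in a different namespace.
MOUNT_TABLE = {
    "/project/project1": "hdfs://nn1.example.com:8020/project/project1",
    "/project/project2": "hdfs://nn2.example.com:8020/project/project2",
}

target = resolve(MOUNT_TABLE, "/project/project2/data/part-0")
```

Because the table is held on the client, resolution is a local dictionary lookup; the request then goes straight to the namenode that owns the path.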
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068242#comment-13068242 ]

vamshi commented on HDFS-1052:

Hi Suresh, a few days ago I had this same idea of a distributed NameNode in Hadoop, and I am impressed with your proposal. Can I use a Distributed Hash Table with a cluster of NameNodes to serve client requests? Thank you.
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068750#comment-13068750 ]

Suresh Srinivas commented on HDFS-1052:

vamshi, I am not sure what you mean by this. Have you looked at the design document? Federation is multiple namenodes, each supporting a namespace. Client-side mount tables provide a unified view of all the namespaces. Maybe you can ping me with the details of what you are trying to do.
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051037#comment-13051037 ]

Hudson commented on HDFS-1052:

Integrated in Hadoop-Hdfs-trunk #699 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/699/])
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050118#comment-13050118 ]

Hudson commented on HDFS-1052:

Integrated in Hadoop-Hdfs-trunk-Commit #746 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/746/])
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037354#comment-13037354 ]

Hudson commented on HDFS-1052:

Integrated in Hadoop-Hdfs-trunk #673 (See [https://builds.apache.org/hudson/job/Hadoop-Hdfs-trunk/673/])
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033553#comment-13033553 ]

Hudson commented on HDFS-1052:

Integrated in Hadoop-Mapreduce-trunk #679 (See [https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk/679/])
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031891#comment-13031891 ]

Hudson commented on HDFS-1052:

Integrated in Hadoop-Mapreduce-trunk-Commit #663 (See [https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/663/])
    MAPREDUCE-2467. HDFS-1052 changes break the raid contrib module in MapReduce. (suresh srinivas via mahadev)
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027934#comment-13027934 ]

Suresh Srinivas commented on HDFS-1052:

HDFS-1871 fixed the build failures. Todd, are you still seeing the problem?
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027942#comment-13027942 ]

Todd Lipcon commented on HDFS-1052:

Yes, in the raid contrib - see MAPREDUCE-2465
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027130#comment-13027130 ]

Konstantin Boudnik commented on HDFS-1052:

It'd be nice to have the test plan, if there is one, attached to the JIRA. Thanks.
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026208#comment-13026208 ]

Hadoop QA commented on HDFS-1052:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12477624/HDFS-1052.6.patch
  against trunk revision 1097329.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 348 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these core unit tests:
                  org.apache.hadoop.hdfs.server.namenode.TestBackupNode
                  org.apache.hadoop.hdfs.server.namenode.TestBlocksWithNotEnoughRacks
                  org.apache.hadoop.hdfs.TestDatanodeBlockScanner
                  org.apache.hadoop.hdfs.TestDecommission
                  org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
                  org.apache.hadoop.hdfs.TestFileConcurrentReader

    +1 contrib tests. The patch passed contrib unit tests.

    +1 system test framework. The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/429//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/429//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/429//console

This message is automatically generated.
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026215#comment-13026215 ]

Hadoop QA commented on HDFS-1052:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12477624/HDFS-1052.6.patch
  against trunk revision 1097329.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 348 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these core unit tests:
                  org.apache.hadoop.hdfs.server.namenode.TestBackupNode
                  org.apache.hadoop.hdfs.TestDatanodeBlockScanner
                  org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
                  org.apache.hadoop.hdfs.TestFileConcurrentReader

    +1 contrib tests. The patch passed contrib unit tests.

    +1 system test framework. The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/427//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/427//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/427//console

This message is automatically generated.
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026715#comment-13026715 ]

Suresh Srinivas commented on HDFS-1052:

TestFileConcurrentReader is a known failure. TestBackupNode, TestDatanodeBlockScanner and TestDFSStorageStateRecovery pass on my machine. If these tests continue to fail, I will create a separate jira to address them.
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026847#comment-13026847 ]

dhruba borthakur commented on HDFS-1052:

+1. Although the benchmarks are single-node benchmarks, it looks good to go into trunk.
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025897#comment-13025897 ]

Suresh Srinivas commented on HDFS-1052:

Some benchmark results:

h3. TestDFSIO read tests
*Without federation:*
{noformat}
- TestDFSIO - : read
Date & time: Wed Apr 27 02:04:24 PDT 2011
Number of files: 1000
Total MBytes processed: 3.0
Throughput mb/sec: 43.62329251162561
Average IO rate mb/sec: 44.619869232177734
IO rate std deviation: 5.060306158158443
Test exec time sec: 959.943
{noformat}

*With federation:*
{noformat}
- TestDFSIO - : read
Date & time: Wed Apr 27 02:43:10 PDT 2011
Number of files: 1000
Total MBytes processed: 3.0
Throughput mb/sec: 45.657513857055456
Average IO rate mb/sec: 46.72107696533203
IO rate std deviation: 5.455125923399539
Test exec time sec: 924.922
{noformat}

h3. TestDFSIO write tests
*Without federation:*
{noformat}
- TestDFSIO - : write
Date & time: Wed Apr 27 01:47:50 PDT 2011
Number of files: 1000
Total MBytes processed: 3.0
Throughput mb/sec: 35.940755259031015
Average IO rate mb/sec: 38.236236572265625
IO rate std deviation: 5.929484960036511
Test exec time sec: 1266.624
{noformat}

*With federation:*
{noformat}
- TestDFSIO - : write
Date & time: Wed Apr 27 02:27:12 PDT 2011
Number of files: 1000
Total MBytes processed: 3.0
Throughput mb/sec: 42.17884674597227
Average IO rate mb/sec: 43.11423873901367
IO rate std deviation: 5.357057259968647
Test exec time sec: 1135.298
{noformat}
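For readers comparing the two TestDFSIO runs, the relative differences can be worked out with a few lines of arithmetic. The sketch below uses only the throughput and execution-time figures reported in this message; the helper name `percent_change` is an illustrative choice, and because each configuration was run once, small differences may simply be run-to-run noise.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Throughput in mb/sec, without federation -> with federation.
read_throughput = percent_change(43.62329251162561, 45.657513857055456)
write_throughput = percent_change(35.940755259031015, 42.17884674597227)

# Test execution time in seconds, without federation -> with federation.
read_time = percent_change(959.943, 924.922)
write_time = percent_change(1266.624, 1135.298)

for label, value in [
    ("read throughput", read_throughput),
    ("write throughput", write_throughput),
    ("read exec time", read_time),
    ("write exec time", write_time),
]:
    print(f"{label}: {value:+.1f}%")
```

On these numbers, read throughput is roughly 4.7% higher and write throughput roughly 17.4% higher with federation, so the patch shows no throughput regression in this single comparison.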
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025908#comment-13025908 ]

Suresh Srinivas commented on HDFS-1052:

BTW, just to clarify: the above benchmarks were run on trunk, with and without the federation patch.
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022892#comment-13022892 ]

Hadoop QA commented on HDFS-1052:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12477028/HDFS-1052.4.patch
  against trunk revision 1095789.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 344 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    -1 javac. The patch appears to cause tar ant target to fail.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these core unit tests:

    -1 contrib tests. The patch failed contrib unit tests.

    -1 system test framework. The patch failed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/400//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/400//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/400//console

This message is automatically generated.
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022974#comment-13022974 ]

Hadoop QA commented on HDFS-1052:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12477039/HDFS-1052.5.patch
  against trunk revision 1095830.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 348 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these core unit tests:
                  org.apache.hadoop.hdfs.server.namenode.TestBackupNode
                  org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
                  org.apache.hadoop.hdfs.TestFileConcurrentReader

    +1 contrib tests. The patch passed contrib unit tests.

    +1 system test framework. The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/401//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/401//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/401//console

This message is automatically generated.
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13023079#comment-13023079 ] Hadoop QA commented on HDFS-1052:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12477039/HDFS-1052.5.patch
against trunk revision 1095830.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 348 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these core unit tests:
        org.apache.hadoop.hdfs.server.namenode.TestBackupNode
        org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
        org.apache.hadoop.hdfs.TestFileAppend4
        org.apache.hadoop.hdfs.TestLargeBlock
        org.apache.hadoop.hdfs.TestWriteConfigurationToDFS
    +1 contrib tests. The patch passed contrib unit tests.
    +1 system test framework. The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/402//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/402//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/402//console

This message is automatically generated.
[jira] [Commented] (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022409#comment-13022409 ] Hadoop QA commented on HDFS-1052:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12476941/HDFS-1052.patch
against trunk revision 1095461.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 322 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    -1 javac. The patch appears to cause tar ant target to fail.
    -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these core unit tests:
    -1 contrib tests. The patch failed contrib unit tests.
    -1 system test framework. The patch failed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/393//testReport/
Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/393//console

This message is automatically generated.
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905343#action_12905343 ] Tsz Wo (Nicholas), SZE commented on HDFS-1052:

I have created a "Federation Branch" JIRA version. Please select it for the related JIRAs.
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900605#action_12900605 ] ryan rawson commented on HDFS-1052:

Have you considered merely increasing the heap size? Switch to a no-pause garbage collector; one of them can be found here: http://www.managedruntime.org/

Right now a machine can have 256 GB of RAM for ~ $10,000. That is a 4x increase over what we have now.

Added bonus: no additional complexity!
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12900910#action_12900910 ] Suresh Srinivas commented on HDFS-1052:

ryan, this is discussed in the proposal already. Let me summarize:
# Increasing the namenode heap does not increase the namenode throughput.
# Currently the NN takes 30 mins to start up with a 50G heap. With the larger heap, the startup time would go to 2.5 hrs. There are a couple of jiras improving the NN startup time. Even with those, the startup time would be 1 hr for such a large heap.
# While debugging memory leaks in the NN, I could not get a lot of tools to work with a heap dump of 40G, especially jhat. I am not sure how well the tools can support a 250G heap dump.
# This solution does not work for installations where the NN needs to support more than 4x scaling. This is needed in clusters that might want to store smaller files instead of depending on large files to reduce the object count.

The solution proposed here does not preclude one from using a single namenode and vertically scaling it.

I am also curious about your experience and the challenges of running a namenode with such a large heap. We could have that discussion offline.
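The startup figures quoted above are consistent with a simple linear extrapolation: 30 minutes at 50G, scaled by 250/50 = 5, gives 150 minutes, or 2.5 hrs. A minimal sketch of that arithmetic follows. The linear model and the class and method names are assumptions for illustration only; the comment does not say how the 2.5 hr figure was derived.

```java
/** Back-of-the-envelope estimate; ASSUMES startup time grows linearly with heap size. */
final class StartupEstimate {

    /** Scales a measured startup time by the ratio of the target heap to the measured heap. */
    static double estimateMinutes(double measuredMinutes, double measuredHeapGb, double targetHeapGb) {
        return measuredMinutes * (targetHeapGb / measuredHeapGb);
    }

    public static void main(String[] args) {
        // 30 minutes at 50G is the figure quoted in the thread; 250G is the larger heap it mentions.
        double minutes = estimateMinutes(30.0, 50.0, 250.0);
        System.out.println(minutes + " minutes, i.e. " + (minutes / 60.0) + " hours");
    }
}
```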
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887683#action_12887683 ] Guilin Sun commented on HDFS-1052:

Thanks Suresh. So in this proposal you mean that we just make the DataNodes shared by several NameNodes, and from the client's point of view, they are totally dependent, different HDFS clusters?
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887692#action_12887692 ] Guilin Sun commented on HDFS-1052:

Sorry, I meant independent.
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887889#action_12887889 ] Suresh Srinivas commented on HDFS-1052:

It is a single cluster made of multiple independent namenodes/namespaces all sharing the same set of datanodes.
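As a rough illustration of "independent namenodes sharing the same datanodes", the sketch below models one datanode that keeps a separate set of blocks per namenode and reports to each namenode only its own blocks. This is a toy model, not HDFS code: the class, the method names, and the namenode labels `nn1`/`nn2` are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of one datanode serving several independent namenodes. */
final class SharedDataNode {
    // One set of block IDs per namenode; the string keys are illustrative namenode names.
    private final Map<String, Set<Long>> blocksByNamenode = new HashMap<>();

    /** Records a block on behalf of the given namenode. */
    void storeBlock(String namenode, long blockId) {
        blocksByNamenode.computeIfAbsent(namenode, k -> new HashSet<>()).add(blockId);
    }

    /** Each namenode is told only about the blocks that belong to its own namespace. */
    Set<Long> blockReportFor(String namenode) {
        return Collections.unmodifiableSet(
                blocksByNamenode.getOrDefault(namenode, Collections.emptySet()));
    }

    public static void main(String[] args) {
        SharedDataNode dn = new SharedDataNode();
        dn.storeBlock("nn1", 1001L);
        dn.storeBlock("nn2", 1001L); // same ID under a different namenode: no conflict
        dn.storeBlock("nn2", 1002L);
        System.out.println("nn1 sees " + dn.blockReportFor("nn1"));
        System.out.println("nn2 sees " + dn.blockReportFor("nn2"));
    }
}
```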
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887225#action_12887225 ] Konstantin Boudnik commented on HDFS-1052:

It seems to me that the multiple-namenode approach just begs for namenode autodiscovery, doesn't it? If a DN has to track all NNs in the cluster, it adds complexity to the already puzzling configuration management and puts yet another error-prone element into it.
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886386#action_12886386 ] Suresh Srinivas commented on HDFS-1052:

Guilin,
# An application could choose to use one of the namenodes as the default file system in its configuration. In that case /a/b/c will be resolved relative to that namespace.
# There is a proposal in HDFS-1053 for client-side mount tables, where a client can define its namespace and how it maps to the server-side namespaces. In that case /a/b/c will be resolved in the context of the client-side mount table.
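The two resolution modes described above can be sketched as a longest-prefix lookup that falls back to a default file system. This is only an illustration of the idea, not the HDFS-1053 design: the class, the method names, and the `hdfs://nn*.example.com` URIs are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical client-side mount table: maps path prefixes to namenode URIs. */
final class MountTable {
    // Mount points keyed by path prefix; entries are illustrative only.
    private final Map<String, String> mounts = new TreeMap<>();
    // Namenode used when no mount point matches (the "default file system" case).
    private final String defaultFs;

    MountTable(String defaultFs) {
        this.defaultFs = defaultFs;
    }

    void addMount(String prefix, String namenodeUri) {
        mounts.put(prefix, namenodeUri);
    }

    /** Returns the namenode URI whose mount point is the longest matching prefix of the path. */
    String resolve(String path) {
        String best = null;
        for (String prefix : mounts.keySet()) {
            // Match whole path components only, so "/a" does not match "/ab".
            boolean matches = path.equals(prefix) || path.startsWith(prefix + "/");
            if (matches && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? defaultFs : mounts.get(best);
    }

    public static void main(String[] args) {
        MountTable table = new MountTable("hdfs://nn1.example.com:8020");
        table.addMount("/a", "hdfs://nn2.example.com:8020");
        table.addMount("/a/b", "hdfs://nn3.example.com:8020");
        System.out.println("/a/b/c -> " + table.resolve("/a/b/c"));
        System.out.println("/a/x   -> " + table.resolve("/a/x"));
        System.out.println("/tmp/y -> " + table.resolve("/tmp/y"));
    }
}
```

With no mount entries at all, every path resolves to the default namenode, which corresponds to the first option in the comment.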
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886384#action_12886384 ] Suresh Srinivas commented on HDFS-1052:

Min, yes, a distributed namespace could be another way to solve this problem. However, it is a much more complicated solution to develop, takes much longer, and involves a lot of changes to the system. This does not fit the timeline in which we need a solution to namenode scalability.
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12885898#action_12885898 ] Guilin Sun commented on HDFS-1052:

Hi Suresh, I have a small question: how does a client find the correct NameNode for a path such as /a/b/c in your proposal? Thanks!
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870569#action_12870569 ] Min Zhou commented on HDFS-1052:

I don't think multiple namespaces are a good solution for this issue. The datasets stored on our cluster are shared by many departments of our company. If these datasets are separated into a number of namespaces, there is no data sharing. If we put them in one namespace managed by a single NameNode, however, the scalability is limited by the NameNode's memory. Why don't we employ a distributed metadata management approach like dynamic subtree partitioning (Ceph) or hash-based partitioning (Lustre)?

Min
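For readers unfamiliar with the term, hash-based partitioning in its simplest form assigns each path to a metadata server by hashing the full path. The sketch below shows only that generic idea; it is not the algorithm used by Lustre, Ceph, or HDFS, and the class name and server count are hypothetical.

```java
/** Generic illustration of hash-based metadata partitioning; not any real system's algorithm. */
final class HashPartitioner {

    /** Maps a full path to a server index in the range [0, serverCount). */
    static int serverFor(String path, int serverCount) {
        if (serverCount <= 0) {
            throw new IllegalArgumentException("serverCount must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(path.hashCode(), serverCount);
    }

    public static void main(String[] args) {
        String[] paths = {"/project/project1/part-0", "/project/project1/part-1", "/project/project2/part-0"};
        for (String path : paths) {
            // Files in the same directory may land on different servers.
            System.out.println(path + " -> server " + serverFor(path, 4));
        }
    }
}
```

The trade-off hinted at in the thread is visible here: a single namespace is preserved, but a directory's entries are scattered across servers, so operations such as listing or renaming a directory touch many of them.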
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851499#action_12851499 ] Doug Cutting commented on HDFS-1052:

If I follow the document, the initial implementation can confine its changes to the datanode: the namenode need not initially be aware of block pools, only the datanode. Is that correct? If so, I like that simplification.
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851650#action_12851650 ] Sanjay Radia commented on HDFS-1052:

Do you mean initial implementation or first patch? The client protocol to the DNs needs to include the block pool id if the DNs manage multiple block pools. We should change all wire protocols to include the block pool id and have the NN simply set the block pool id to zero in the reply to getBlockLocations().
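One way to picture the suggestion above is a block identifier that carries the block pool id alongside the block id, with zero as the value a namenode that is not yet pool-aware would fill in. The sketch below is an assumption-laden illustration: the class name, the `int` type for the pool id, and the `fromLegacy` helper are hypothetical and are not taken from the patch.

```java
import java.util.Objects;

/** Hypothetical wire-level block identifier that includes a block pool id. */
final class PooledBlockId {
    /** Pool id a not-yet-pool-aware namenode would send, per the comment above. */
    static final int DEFAULT_POOL_ID = 0;

    private final int blockPoolId; // ASSUMPTION: an int; the thread does not specify the type
    private final long blockId;

    PooledBlockId(int blockPoolId, long blockId) {
        this.blockPoolId = blockPoolId;
        this.blockId = blockId;
    }

    /** Wraps a plain block id in the default pool, as a single-namenode cluster would. */
    static PooledBlockId fromLegacy(long blockId) {
        return new PooledBlockId(DEFAULT_POOL_ID, blockId);
    }

    int getBlockPoolId() { return blockPoolId; }

    long getBlockId() { return blockId; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PooledBlockId)) return false;
        PooledBlockId other = (PooledBlockId) o;
        return blockPoolId == other.blockPoolId && blockId == other.blockId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(blockPoolId, blockId);
    }

    @Override
    public String toString() {
        return "pool=" + blockPoolId + ",block=" + blockId;
    }

    public static void main(String[] args) {
        PooledBlockId a = new PooledBlockId(1, 42L);
        PooledBlockId b = new PooledBlockId(2, 42L);
        // The same block id in two pools names two different blocks on a datanode.
        System.out.println(a + " equals " + b + "? " + a.equals(b));
        System.out.println("legacy: " + fromLegacy(42L));
    }
}
```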
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848562#action_12848562 ] ryan rawson commented on HDFS-1052:

This sounds great! Also as part of the architecture can you explain how you will improve availability?
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848568#action_12848568 ] Jeff Hammerbacher commented on HDFS-1052:

Ryan: see Sanjay's comment at https://issues.apache.org/jira/browse/HDFS-1051?focusedCommentId=12848235page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12848235
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848702#action_12848702 ] dhruba borthakur commented on HDFS-1052:

Hi Ryan, I think scalability of the NN is not directly related to the availability of the NN, is it?
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848964#action_12848964 ] Sanjay Radia commented on HDFS-1052:

There are no failover notions amongst the multiple NNs in this proposal (i.e., the multiple NNs are *not* m+k failover); failover is separable and complementary to this proposal.

However, multiple namenodes in a cluster will help with availability in the sense that they allow one to isolate, say, production apps from non-production apps by giving them different namespaces and hence different NNs. Further, with this proposal one is likely to run multiple smaller NNs, so startup will be faster, which of course helps the availability of the system.
[jira] Commented: (HDFS-1052) HDFS scalability with multiple namenodes
[ https://issues.apache.org/jira/browse/HDFS-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847652#action_12847652 ] Suresh Srinivas commented on HDFS-1052:

Will post the proposal document in a couple of days.