[ https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870951#comment-13870951 ]
Daryn Sharp commented on HDFS-5477:
-----------------------------------

Yes, it is true that the BM will have a large heap. Multiple federated BMs to divide the load have been considered as an additional feature after the initial work is complete. As a datapoint, in our clusters roughly 60% of the heap is blocks and 40% is namespace.

Offloading the blocks into a BM as a service has two primary benefits:

1) Allowing the namespace to scale much further due to the reduced heap. The end goal is for the BM(s) to run on separate host(s), which rules out shared memory. We need to offload the BM memory requirements to free up more memory for the NN. An RPC server with optimized calls, ideally requiring 1-2 RPCs per operation, seems like the most straightforward approach. Perhaps it could be pluggable/configurable to use alternate proxy implementations, although the initial implementation would either create a proxy or not, to be compatible.

2) Removing a lot of unnecessary locking of the namespace. You may be surprised (or maybe not) how the datanode manager, heartbeat manager, replication monitor, etc. all lock the namespace. The namespace lock appears to be misused as essentially an "operational lock" to prevent safemode or HA transitions during an operation. (I do plan to try to tackle this independently, because lease renewals and token operations all lock the namespace even though they neither update the namespace nor generate edits.)

The hope is that the reduced latency from concurrent read/write operations in the namespace via finer-grained locking will offset the added latency of calls to the BM.

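As a rough illustration of the "either create a proxy or not" idea, the sketch below picks between an in-process block service and a proxy from a single configuration switch. Everything in it is an assumption made for illustration: the BlockService interface, both implementations, the factory, and the sketch.blockmanager.remote key are hypothetical names, not HDFS classes or settings, and the "remote" proxy only forwards to a local object and counts calls where a real one would issue RPCs to a BM on another host.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical block-service contract; not an actual HDFS interface. */
interface BlockService {
    /** Allocates a new block for the file and returns its id. */
    long allocateBlock(String path);

    /** Returns the number of blocks currently tracked for the file. */
    int blockCount(String path);
}

/** Keeps block state on the caller's heap, like an embedded BM. */
class InProcessBlockService implements BlockService {
    private final Map<String, Integer> blocksPerFile = new HashMap<>();
    private long nextBlockId = 1;

    @Override
    public synchronized long allocateBlock(String path) {
        blocksPerFile.merge(path, 1, Integer::sum);
        return nextBlockId++;
    }

    @Override
    public synchronized int blockCount(String path) {
        return blocksPerFile.getOrDefault(path, 0);
    }
}

/**
 * Stand-in for an RPC client. A real proxy would serialize each call and
 * send it to a BM on another host; this one forwards to a backend object
 * and counts the round trips it would have made.
 */
class RemoteBlockServiceProxy implements BlockService {
    private final BlockService backend;
    private int rpcCount = 0;

    RemoteBlockServiceProxy(BlockService backend) {
        this.backend = backend;
    }

    @Override
    public synchronized long allocateBlock(String path) {
        rpcCount++;
        return backend.allocateBlock(path);
    }

    @Override
    public synchronized int blockCount(String path) {
        rpcCount++;
        return backend.blockCount(path);
    }

    synchronized int rpcCount() {
        return rpcCount;
    }
}

class BlockServiceFactory {
    /** Hypothetical configuration key, invented for this sketch. */
    static final String REMOTE_KEY = "sketch.blockmanager.remote";

    /** Creates a proxy only when the switch is set; defaults to in-process. */
    static BlockService create(Properties conf) {
        BlockService local = new InProcessBlockService();
        boolean remote = Boolean.parseBoolean(conf.getProperty(REMOTE_KEY, "false"));
        return remote ? new RemoteBlockServiceProxy(local) : local;
    }
}

class BlockServiceDemo {
    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(BlockServiceFactory.REMOTE_KEY, "true");
        BlockService bm = BlockServiceFactory.create(conf);
        long id = bm.allocateBlock("/user/a/file");
        System.out.println("allocated block " + id
            + ", count=" + bm.blockCount("/user/a/file"));
    }
}
```

Because callers depend only on the interface, the namespace code would not need to know which variant it received, and the call counter gives a crude way to check how many round trips an operation costs against the 1-2 RPCs per operation target.
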
> Block manager as a service
> --------------------------
>
>                 Key: HDFS-5477
>                 URL: https://issues.apache.org/jira/browse/HDFS-5477
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: Proposal.pdf, Proposal.pdf, Standalone BM.pdf, Standalone BM.pdf
>
>
> The block manager needs to evolve towards having the ability to run as a
> standalone service to improve NN vertical and horizontal scalability. The
> goal is reducing the memory footprint of the NN proper to support larger
> namespaces, and improve overall performance by decoupling the block manager
> from the namespace and its lock. Ideally, a distinct BM will be transparent
> to clients and DNs.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)