[ https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870951#comment-13870951 ]

Daryn Sharp commented on HDFS-5477:
-----------------------------------

Yes, it is true that the BM will have a large heap.  Multiple federated BMs to 
divide the load have been considered as a follow-on feature once the initial 
work is complete.

As a data point, in our clusters roughly 60% of the heap is blocks and 40% is 
namespace.  Offloading the blocks into a BM as a service has two primary 
benefits:

1) Allowing the namespace to scale much further due to reduced heap usage.
The end goal is for the BM(s) to run on separate host(s), which rules out 
shared memory.  We need to offload the BM memory requirements to free up more 
memory for the NN.  An RPC server with optimized calls, ideally requiring 1-2 
RPCs per operation, seems like the most straightforward approach.  Perhaps it 
could be pluggable/configurable to use alternate proxy implementations, 
although for compatibility the initial implementation would simply either 
create a proxy or not.

2) Removing a lot of unnecessary locking of the namespace.
You may be surprised (or maybe not) that the datanode manager, heartbeat 
manager, replication monitor, etc. all lock the namespace.  The namespace lock 
appears to be misused as essentially an "operational lock" to prevent safemode 
or HA transitions during an operation.  (I do plan to tackle this 
independently, because lease renewals and token operations all lock the 
namespace even though they neither update the namespace nor generate edits.)
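Taking the 60%/40% heap split above at face value, simple arithmetic gives a rough upper bound on the namespace headroom: if block metadata moves out of the NN entirely, the namespace could in principle grow to fill the heap it used to share, i.e. 1 / 0.40 = 2.5 times its current size. This is an idealized sketch: it assumes no GC headroom and no block-related state left behind in the NN, and the function name is hypothetical.

```python
def namespace_headroom(block_fraction: float) -> float:
    """Hypothetical helper: factor by which namespace memory could grow in
    the same heap once block metadata (block_fraction of the heap) moves to
    a separate BM. Idealized: ignores GC headroom and any BM state kept in
    the NN."""
    if not 0.0 <= block_fraction < 1.0:
        raise ValueError("block_fraction must be in [0, 1)")
    namespace_fraction = 1.0 - block_fraction
    return 1.0 / namespace_fraction


print(f"{namespace_headroom(0.60):.1f}x")  # 2.5x
```

In practice the realized gain would be smaller, since the NN still needs some memory for BM proxies, caches, and collector headroom.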
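One way to picture the pluggable arrangement from item 1, where the NN either creates a proxy to a remote BM or keeps an in-process one, is a single interface with two implementations chosen by configuration. This is a minimal sketch under stated assumptions, not the HDFS-5477 design: `BlockManagerService`, `InProcessBlockManager`, `RemoteBlockManagerProxy`, `make_block_manager`, and the `bm.remote.enabled` key are all hypothetical, and the RPC transport is simulated by a plain function. The interface is deliberately coarse-grained so that each namespace operation costs one RPC, in line with the 1-2 RPCs/operation goal.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class BlockManagerService(ABC):
    """Hypothetical coarse-grained BM interface: each namespace operation
    maps to one call, so a remote implementation costs one RPC."""

    @abstractmethod
    def allocate_block(self, file_id: int, replication: int) -> int:
        ...


class InProcessBlockManager(BlockManagerService):
    """Block metadata lives in the NN's own heap (today's arrangement)."""

    def __init__(self) -> None:
        self._next_block_id = 1
        self.blocks: dict = {}  # block_id -> (file_id, replication)

    def allocate_block(self, file_id: int, replication: int) -> int:
        block_id = self._next_block_id
        self._next_block_id += 1
        self.blocks[block_id] = (file_id, replication)
        return block_id


class RemoteBlockManagerProxy(BlockManagerService):
    """Client-side proxy for a BM on another host. The RPC transport is
    simulated here by a callable (method_name, *args) -> result."""

    def __init__(self, transport: Callable[..., Any]) -> None:
        self._transport = transport
        self.rpc_count = 0

    def allocate_block(self, file_id: int, replication: int) -> int:
        self.rpc_count += 1  # one round trip per operation
        return self._transport("allocate_block", file_id, replication)


def make_block_manager(conf: dict, transport=None) -> BlockManagerService:
    """Either create a proxy or not, based on a hypothetical config key."""
    if conf.get("bm.remote.enabled", False):
        if transport is None:
            raise ValueError("remote BM requested but no transport given")
        return RemoteBlockManagerProxy(transport)
    return InProcessBlockManager()


# Usage: a standalone BM "server" sits behind the simulated transport.
server = InProcessBlockManager()


def transport(method: str, *args: Any) -> Any:
    return getattr(server, method)(*args)


local_bm = make_block_manager({})
remote_bm = make_block_manager({"bm.remote.enabled": True}, transport)

print(local_bm.allocate_block(file_id=42, replication=3))   # 1
print(remote_bm.allocate_block(file_id=42, replication=3))  # 1
print(remote_bm.rpc_count)                                  # 1
```

Because callers only see `BlockManagerService`, clients and DNs need not know which variant the NN picked; a real proxy would also need retries, failover, and batching that this sketch omits.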

The hope is that the reduced latency from concurrent read/write operations in 
the namespace, enabled by finer-grained locking, will offset the added latency 
of calls to the BM.
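The split described in item 2, where the namespace lock stops doubling as an "operational lock", can be sketched as two separate locks: one guarding the namespace and one guarding only HA/safemode state. This is a hypothetical illustration (`NameNodeSketch`, `HAState`, and `renew_lease` are invented names, and only HA state is modeled) showing a lease renewal that completes even while the namespace lock is held.

```python
import threading
from enum import Enum


class HAState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class NameNodeSketch:
    """Hypothetical NN fragment that separates the namespace lock from an
    'operational' lock guarding only HA state transitions."""

    def __init__(self) -> None:
        self.namespace_lock = threading.Lock()  # guards the directory tree
        self._op_lock = threading.Lock()        # guards HA state only
        self._state = HAState.ACTIVE
        self._leases = {}  # holder -> last renewal time

    def transition_to(self, state: HAState) -> None:
        with self._op_lock:
            self._state = state

    def renew_lease(self, holder: str, now: float) -> bool:
        # Lease renewal neither updates the namespace nor generates edits,
        # so it takes only the operational lock, never namespace_lock.
        with self._op_lock:
            if self._state is not HAState.ACTIVE:
                return False
            self._leases[holder] = now
            return True


nn = NameNodeSketch()
with nn.namespace_lock:  # e.g. a long namespace write in progress
    print(nn.renew_lease("client-1", now=100.0))  # True, does not block
nn.transition_to(HAState.STANDBY)
print(nn.renew_lease("client-1", now=101.0))      # False
```

A real implementation would want a shared/exclusive lock here, so that renewals do not serialize against each other while transitions still exclude them; Python's standard library has no reader-writer lock, so a plain `threading.Lock` stands in.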

> Block manager as a service
> --------------------------
>
>                 Key: HDFS-5477
>                 URL: https://issues.apache.org/jira/browse/HDFS-5477
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: Proposal.pdf, Proposal.pdf, Standalone BM.pdf, 
> Standalone BM.pdf
>
>
> The block manager needs to evolve towards having the ability to run as a 
> standalone service to improve NN vertical and horizontal scalability.  The 
> goal is reducing the memory footprint of the NN proper to support larger 
> namespaces, and improve overall performance by decoupling the block manager 
> from the namespace and its lock.  Ideally, a distinct BM will be transparent 
> to clients and DNs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
