[ http://issues.apache.org/jira/browse/HADOOP-814?page=all ]
dhruba borthakur updated HADOOP-814:
------------------------------------
Attachment: (was: heartbeatlock.patch)
> Increase dfs scalability by optimizing locking on namenode.
> -----------------------------------------------------------
>
> Key: HADOOP-814
> URL: http://issues.apache.org/jira/browse/HADOOP-814
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Reporter: dhruba borthakur
> Assigned To: dhruba borthakur
> Attachments: heartbeatlock2.patch
>
>
> The current dfs namenode encounters locking bottlenecks when the number of
> datanodes is large. The namenode uses a single global lock to protect access
> to its data structures. One key area is heartbeat processing: the lower the
> cost of processing a heartbeat, the more nodes HDFS can support. A simple
> change to the current locking model can increase scalability.
> Here are the details:
> Case 1: Currently we have three locks: the global lock (on FSNamesystem), the
> heartbeat lock and the datanodeMap lock. The following function is called
> when a heartbeat is received by the Namenode:
> public synchronized FSNamesystem.gotHeartbeat() {   ........ (A)
>   synchronized (heartbeats) {                        ........ (B)
>     synchronized (datanodeMap) {                     ........ (C)
>       ...
>     }
>   }
> }
> In the above piece of code, statement (A) acquires the global FSNamesystem
> lock. This synchronization can be safely removed (remove updateStats too).
> This means that a heartbeat from a datanode can be processed without holding
> the global FSNamesystem lock.
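
The effect of Case 1 can be illustrated with a minimal Java sketch. This is not the actual namenode code: the class HeartbeatSketch, the NodeInfo record, the lastUpdate() helper and the method parameters are all hypothetical stand-ins, and registering an unknown node on its first heartbeat is a simplification. The only point shown is that once the method-level "synchronized" is dropped, a heartbeat completes even while another thread holds the global lock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the namenode; not the real FSNamesystem class.
class HeartbeatSketch {
    // Hypothetical per-datanode record holding only the last heartbeat time.
    static final class NodeInfo {
        long lastUpdate;
    }

    // The heartbeat lock.
    private final Object heartbeats = new Object();
    // The monitor of this map plays the role of the datanodeMap lock.
    private final Map<String, NodeInfo> datanodeMap = new HashMap<>();

    // No "synchronized" on the method: the global lock (the monitor of
    // this object) is never taken on the heartbeat path.
    void gotHeartbeat(String nodeId, long now) {
        synchronized (heartbeats) {          // (B)
            synchronized (datanodeMap) {     // (C)
                // Simplification: an unknown node is registered on the spot.
                NodeInfo info =
                    datanodeMap.computeIfAbsent(nodeId, k -> new NodeInfo());
                info.lastUpdate = now;
            }
        }
    }

    // Returns the last heartbeat time, or -1 for an unknown node.
    long lastUpdate(String nodeId) {
        synchronized (datanodeMap) {
            NodeInfo info = datanodeMap.get(nodeId);
            return info == null ? -1L : info.lastUpdate;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HeartbeatSketch ns = new HeartbeatSketch();
        // Hold the global lock here, as a long namespace operation would.
        synchronized (ns) {
            Thread t = new Thread(() -> ns.gotHeartbeat("dn1", 42L));
            t.start();
            t.join();   // completes although the global lock is still held
        }
        System.out.println(ns.lastUpdate("dn1"));
    }
}
```

If gotHeartbeat() were declared synchronized as in (A), the join() in main would never return, because the heartbeat thread would wait for the monitor that main is holding.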
> Case 2: A thread called the heartbeatCheck thread periodically traverses all
> known Datanodes to determine if any of them has timed out. It is of the
> following form:
> void FSNamesystem.heartbeatCheck() {
>   synchronized (this) {               ........... (D)
>     synchronized (heartbeats) {
>       ............. (E)
>     }
>   }
> }
> This thread acquires the global FSNamesystem lock in statement (D). The
> synchronization in statement (D) can be removed. Instead, the loop can check
> whether any nodes are dead, and acquire the global FSNamesystem lock only if
> a dead node is found.
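
One possible shape for the Case 2 change is sketched below in Java. It is an illustration under stated assumptions, not the contents of the attached patch: the class HeartbeatCheckSketch, the expiryMillis timeout, the map of last heartbeat times and the re-check under the global lock are all hypothetical choices. The scan takes only the heartbeat lock, and the global lock is acquired only when the scan finds a node that looks dead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the namenode; not the real FSNamesystem class.
class HeartbeatCheckSketch {
    // The heartbeat lock.
    private final Object heartbeats = new Object();
    // Node id -> time of last heartbeat; guarded by the heartbeat lock.
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    // Hypothetical timeout after which a node counts as dead.
    private final long expiryMillis;

    HeartbeatCheckSketch(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    void gotHeartbeat(String nodeId, long now) {
        synchronized (heartbeats) {
            lastHeartbeat.put(nodeId, now);
        }
    }

    private boolean isDead(long last, long now) {
        return now - last > expiryMillis;
    }

    // Returns the ids of the nodes removed in this pass.
    List<String> heartbeatCheck(long now) {
        // Pass 1: scan under the heartbeat lock only; no global lock.
        List<String> suspects = new ArrayList<>();
        synchronized (heartbeats) {
            for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
                if (isDead(e.getValue(), now)) {
                    suspects.add(e.getKey());
                }
            }
        }
        List<String> removed = new ArrayList<>();
        if (suspects.isEmpty()) {
            return removed;   // common case: the global lock is never taken
        }
        // Pass 2: a dead node was found, so only now take the global lock,
        // in the same order as before (global lock, then heartbeat lock).
        synchronized (this) {
            synchronized (heartbeats) {
                for (String id : suspects) {
                    Long last = lastHeartbeat.get(id);
                    // Re-check: a heartbeat may have arrived between passes.
                    if (last != null && isDead(last, now)) {
                        lastHeartbeat.remove(id);
                        removed.add(id);
                    }
                }
            }
        }
        return removed;
    }
}
```

The re-check in the second pass is a design choice of this sketch: the heartbeat lock is released between the two passes, so a suspect node may have reported in again before the global lock was obtained.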
> Fixing the above two cases may allow HDFS to scale to a higher number of
> nodes.
>