[ http://issues.apache.org/jira/browse/HADOOP-814?page=all ]

Doug Cutting updated HADOOP-814:
--------------------------------

           Status: Resolved  (was: Patch Available)
    Fix Version/s: 0.10.0
       Resolution: Fixed

I just committed this.  Thanks, Dhruba!

> Increase dfs scalability by optimizing locking on namenode.
> -----------------------------------------------------------
>
>                 Key: HADOOP-814
>                 URL: http://issues.apache.org/jira/browse/HADOOP-814
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>             Fix For: 0.10.0
>
>         Attachments: heartbeatlock3.patch
>
>
> The current dfs namenode encounters locking bottlenecks when the number of 
> datanodes is large. The namenode uses a single global lock to protect access 
> to its data structures. One key area is heartbeat processing. The lower the 
> cost of processing a heartbeat, the more nodes HDFS can support. A simple 
> change to the current locking model can increase scalability. 
> Here are the details:
> Case 1: Currently we have three locks: the global lock (on FSNamesystem), the 
> heartbeat lock, and the datanodeMap lock. The following function is called 
> when a heartbeat is received by the Namenode:
> public synchronized FSNamesystem.gotHeartbeat() {      ........ (A)
>   synchronized (heartbeat) {                           ........ (B)
>     synchronized (datanodeMap) {                       ........ (C)
>       ...
>     }
>   }
> }
> In the code above, statement (A) acquires the global FSNamesystem lock. This 
> synchronization can be safely removed (remove updateStats too). This means 
> that a heartbeat from a datanode can be processed without holding the global 
> FSNamesystem lock.
> Case 2: A thread called the heartbeatCheck thread periodically traverses all 
> known Datanodes to determine whether any of them has timed out. It has the 
> following form:
> void FSNamesystem.heartbeatCheck() {
>   synchronized (this) {                                ........ (D)
>     synchronized (heartbeats) {                        ........ (E)
>       ...
>     }
>   }
> }
> This thread acquires the global FSNamesystem lock in statement (D). 
> Statement (D) can be removed. Instead, the loop can check whether any nodes 
> are dead. Only if a dead node is found does it acquire the global 
> FSNamesystem lock.
> It is possible that fixing the above two cases will allow HDFS to scale to a 
> larger number of nodes.
>  
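
The two cases in the description above can be sketched in Java as follows. This is only a minimal illustration of the proposed locking pattern, not the code in heartbeatlock3.patch. The class NamesystemSketch, the DatanodeInfo record, the expiryMillis parameter, the explicit "now" timestamps and the globalLockAcquisitions counter are all hypothetical names introduced for the example. The sketch also assumes, for brevity, that an unknown datanode is registered on its first heartbeat.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for FSNamesystem; not the actual Hadoop class.
class NamesystemSketch {
    // Hypothetical per-datanode record holding the last heartbeat time.
    static final class DatanodeInfo {
        final String id;
        long lastHeartbeat; // guarded by the heartbeats lock
        DatanodeInfo(String id, long now) { this.id = id; this.lastHeartbeat = now; }
    }

    private final Map<String, DatanodeInfo> datanodeMap = new HashMap<>();
    private final List<DatanodeInfo> heartbeats = new ArrayList<>();
    private final long expiryMillis;
    private int globalLockAcquisitions = 0; // instrumentation for the example only

    NamesystemSketch(long expiryMillis) { this.expiryMillis = expiryMillis; }

    // Case 1: the method is no longer synchronized on `this`, so a heartbeat
    // is processed while holding only the two fine-grained locks.
    void gotHeartbeat(String id, long now) {
        synchronized (heartbeats) {
            synchronized (datanodeMap) {
                DatanodeInfo node = datanodeMap.get(id);
                if (node == null) { // assumption: register on first heartbeat
                    node = new DatanodeInfo(id, now);
                    datanodeMap.put(id, node);
                    heartbeats.add(node);
                } else {
                    node.lastHeartbeat = now;
                }
            }
        }
    }

    // Case 2: scan under the heartbeats lock only, and take the global lock
    // only after a dead node has been found. Returns the ids that were removed.
    List<String> heartbeatCheck(long now) {
        List<String> removed = new ArrayList<>();
        while (true) {
            DatanodeInfo dead = null;
            synchronized (heartbeats) {
                for (DatanodeInfo node : heartbeats) {
                    if (now - node.lastHeartbeat > expiryMillis) { dead = node; break; }
                }
            }
            if (dead == null) {
                return removed; // common case: the global lock was never taken
            }
            // Same lock order everywhere: global, then heartbeats, then datanodeMap.
            synchronized (this) {
                globalLockAcquisitions++;
                synchronized (heartbeats) {
                    synchronized (datanodeMap) {
                        // Re-check: a heartbeat may have arrived after the scan.
                        if (now - dead.lastHeartbeat > expiryMillis && heartbeats.remove(dead)) {
                            datanodeMap.remove(dead.id);
                            removed.add(dead.id);
                        }
                    }
                }
            }
        }
    }

    synchronized int globalLockAcquisitions() { return globalLockAcquisitions; }
}
```

In this sketch a heartbeatCheck pass that finds no expired node never touches the global lock, and a pass that finds one takes the global lock once per dead node. The re-check under the global lock is a design choice of the sketch: it guards against a heartbeat that arrives between the scan and the lock acquisition.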

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
