DFS Scalability: a BlockReport that returns a large number of
blocks to be deleted causes the Datanode to lose connectivity to the Namenode
---------------------------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-994
                 URL: https://issues.apache.org/jira/browse/HADOOP-994
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
            Reporter: dhruba borthakur


The Datanode periodically sends a block report RPC to the Namenode. The reply 
to this RPC lists the blocks that the Datanode should invalidate. The Datanode 
then deletes all the corresponding block files. This deletion is done inline 
by the heartbeat thread in the Datanode, so if the number of files to be 
deleted is large, the Datanode stops sending heartbeats for the entire 
duration of the deletion. The Namenode then declares the Datanode "dead" and 
starts re-replicating its blocks.
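
One way out (a minimal sketch, not the actual Datanode code; the class and 
method names below are hypothetical) would be to hand the invalidation list 
to a single background worker, so the heartbeat thread returns immediately:

    import java.io.File;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class AsyncBlockDeleter {
        // One worker thread keeps deletions serialized against the disk
        // while the heartbeat thread stays responsive.
        private final ExecutorService deleter =
            Executors.newSingleThreadExecutor();

        // Called from the heartbeat thread with the blocks the Namenode
        // asked us to invalidate; returns immediately instead of
        // unlinking the files inline.
        public void invalidateBlocks(List<File> blockFiles) {
            deleter.submit(() -> {
                for (File f : blockFiles) {
                    if (!f.delete()) {
                        System.err.println("failed to delete " + f);
                    }
                }
            });
        }
    }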

In the observed case, the block report returned 1669 blocks to be invalidated. 
The Datanode was running on a RAID5 ext3 filesystem and had 4 active tasks 
running on it. Deleting these 1669 files took about 30 minutes. Wow! The 
average disk service time during this period was less than 10 ms, and the 
Datanode was using about 30% CPU.
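
Back-of-the-envelope, using the numbers above:

    1800 s / 1669 files  ~=  1.08 s per file deleted
    1.08 s / 0.010 s per disk service  ~=  100x

So each unlink is taking roughly two orders of magnitude longer than a single 
disk service, which points at per-file filesystem overhead (ext3 metadata 
work, for example) rather than a saturated disk.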

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
