[ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672824#action_12672824 ]

Raghu Angadi commented on HADOOP-4584:
--------------------------------------

Regarding a more complete solution, a much simpler fix could be:
 
  * BlockReport does not list the directories at all; it just sends the
    in-memory list of blocks.
  * Any mysteriously missing blocks would be caught by DataBlockScanner
    (default scan period: 3 weeks).
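A minimal sketch of the proposal, assuming a DataNode keeps an in-memory map from block id to block file. The class and method names (`InMemoryBlockReportSketch`, `blockReport`, `scanForMissingBlocks`) are hypothetical and are not Hadoop's FSDataset or DataBlockScanner API; the scan here only checks that each file still exists, which is a simplification of what a real block scanner verifies.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Hadoop's FSDataset/DataBlockScanner code:
// block reports come from an in-memory map, and a separate slow scan
// forgets entries whose block file has disappeared from disk.
class InMemoryBlockReportSketch {
    // In-memory record of blocks this node believes it holds: id -> file.
    private final Map<Long, File> blocks = new ConcurrentHashMap<>();

    void addBlock(long blockId, File blockFile) {
        blocks.put(blockId, blockFile);
    }

    /** Builds a block report from memory only; no directory listing. */
    long[] blockReport() {
        return blocks.keySet().stream().mapToLong(Long::longValue).sorted().toArray();
    }

    /**
     * Periodic background check (the role the comment assigns to DataBlockScanner):
     * returns and forgets blocks whose file no longer exists, so later
     * reports stop listing them. Only existence is checked in this sketch.
     */
    List<Long> scanForMissingBlocks() {
        List<Long> missing = new ArrayList<>();
        for (Map.Entry<Long, File> e : blocks.entrySet()) {
            if (!e.getValue().exists()) {
                missing.add(e.getKey());
            }
        }
        // Remove after iterating so the loop above sees a stable view.
        for (Long id : missing) {
            blocks.remove(id);
        }
        return missing;
    }
}
```

The trade-off is visible in the sketch: a block file that vanishes keeps appearing in `blockReport()` until the next scan runs, which is the slowness listed under Cons.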

Pros:
   * The simplest fix, with maximum benefit.
   * A good precursor to eventually removing block reports or making them
     drastically less frequent.

Cons:
    * DataBlockScanner is rather slow (though the scan period can be configured).
         ** But we have rarely seen blocks disappear; more likely they are
            truncated or corrupt.

This is the preferred fix for this jira.

Separating heartbeats from block reports and deletions, as the attached patch
does, could still be useful. I am +1 on having that too, but it is not a requirement.
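A minimal sketch of separating heartbeats from slow work, along the lines of the two-thread split in the issue description: one thread only sends heartbeats (and any finished block report), while a worker thread does the slow scanning and command execution. All names here (`SplitHeartbeatSketch`, `requestBlockReport`, the counters) are hypothetical, and the NameNode RPCs are replaced by counters; this is not the attached patch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not Hadoop's DataNode code.
class SplitHeartbeatSketch {
    // Slow work (disk scans, NameNode commands) handed to the worker thread.
    final BlockingQueue<Runnable> slowWork = new LinkedBlockingQueue<>();
    // Most recent block report finished by the worker and not yet sent.
    final AtomicReference<long[]> pendingReport = new AtomicReference<>();
    final AtomicInteger heartbeatsSent = new AtomicInteger();
    final AtomicInteger reportsSent = new AtomicInteger();
    private volatile boolean running = true;

    /** Sends a heartbeat every intervalMillis and never waits on slow work. */
    Thread startHeartbeatThread(long intervalMillis) {
        Thread t = new Thread(() -> {
            while (running) {
                heartbeatsSent.incrementAndGet();      // stand-in for the heartbeat RPC
                if (pendingReport.getAndSet(null) != null) {
                    reportsSent.incrementAndGet();     // stand-in for the block report RPC
                }
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "heartbeat");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Runs queued slow work one item at a time, off the heartbeat thread. */
    Thread startWorkerThread() {
        Thread t = new Thread(() -> {
            while (running) {
                try {
                    Runnable work = slowWork.poll(50, TimeUnit.MILLISECONDS);
                    if (work != null) {
                        work.run();
                    }
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "block-report-worker");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Queues a simulated block-report generation that takes delayMillis. */
    void requestBlockReport(long[] blocks, long delayMillis) {
        slowWork.add(() -> {
            try {
                Thread.sleep(delayMillis);             // stand-in for a slow directory scan
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            pendingReport.set(blocks.clone());
        });
    }

    void stop() {
        running = false;
    }
}
```

Because the heartbeat loop only reads a finished report through an `AtomicReference`, a report that takes minutes to build delays that report, not the heartbeats.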


> Slow generation of blockReport at DataNode causes delay of sending heartbeat 
> to NameNode
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>
>         Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 
> 4584.patch
>
>
> Sometimes, due to disk or other problems, a datanode takes minutes or tens
> of minutes to generate a block report. This prevents the datanode from
> sending a heartbeat to the NameNode every 3 seconds. In the worst case, the
> NameNode detects lost heartbeats and wrongly decides that the datanode is
> dead.
> It would be nice to have two threads instead. One thread scans data
> directories, generates block reports, and executes the requests sent by the
> NameNode; the other sends heartbeats and block reports and picks up the
> requests from the NameNode. With these two threads, sending heartbeats will
> not be delayed by a slow block report or by slow execution of NameNode
> requests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.