[jira] Commented: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode

Konstantin Shvachko (JIRA) Tue, 24 Feb 2009 11:57:24 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676391#action_12676391
 ]


Konstantin Shvachko commented on HADOOP-4584:
---------------------------------------------

So what is wrong with just going with the original proposal in this jira, that 
is: prepare block reports in a separate thread without delaying heartbeats and 
other commands, and send them from {{offerService()}} as soon as they are 
ready? This seems to be the mission declared by the issue; changing block 
reports to be memory based is an add-on, which is not required to solve the 
problem stated.
I understand Dhruba's concerns about reliability. I can add that memory-based 
reports can also slow down the removal of unnecessary blocks from disk, which 
may be critical if the data-node is close to running out of disk space.
My approach would be to drop the in-memory block report part and commit the 
rest. The in-memory reports can be discussed in a subsequent issue.
I think that would be enough of a change by itself, because there may be a 
dangerous race condition between {{blockReceived()}} and {{blockReport()}} if 
it is not done right, as we have seen before.
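
To make the proposal concrete, here is a minimal Java sketch of the idea, not 
the actual DataNode code or any of the attached patches. The class 
OfferServiceSketch, the Result holder, scanBlocks() and the method parameters 
are all hypothetical names used for illustration; only the name offerService() 
comes from the discussion above. A background thread prepares the report while 
the service loop keeps sending heartbeats and sends the report once it is ready.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OfferServiceSketch {
    /** Hypothetical holder for what the loop "sent"; stands in for RPCs to the name-node. */
    static final class Result {
        int heartbeatsSent;
        int reportsSent;
        List<Long> lastReport;
    }

    /** Stand-in for a slow directory scan (assumption: a real scan walks the data directories). */
    static List<Long> scanBlocks(long scanMillis) throws InterruptedException {
        Thread.sleep(scanMillis);               // simulate slow disks
        List<Long> blockIds = new ArrayList<>();
        for (long id = 1; id <= 3; id++) {
            blockIds.add(id);                   // made-up block ids
        }
        return blockIds;
    }

    /** Service loop: heartbeats are never blocked by report generation. */
    static Result offerService(int iterations, long heartbeatMillis, long scanMillis)
            throws Exception {
        ExecutorService scanner = Executors.newSingleThreadExecutor();
        Result result = new Result();
        try {
            Callable<List<Long>> scan = () -> scanBlocks(scanMillis);
            Future<List<Long>> pending = scanner.submit(scan);
            for (int i = 0; i < iterations; i++) {
                result.heartbeatsSent++;        // heartbeat goes out on every iteration
                if (pending != null && pending.isDone()) {
                    result.lastReport = pending.get();  // report is ready: send it now
                    result.reportsSent++;
                    pending = null;             // a real loop would schedule the next scan
                }
                Thread.sleep(heartbeatMillis);
            }
        } finally {
            scanner.shutdownNow();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Result r = offerService(20, 10, 30);
        System.out.println("heartbeats=" + r.heartbeatsSent + " reports=" + r.reportsSent);
    }
}
```

The sketch deliberately ignores the {{blockReceived()}} / {{blockReport()}} 
race: a block received while the scan is running may or may not appear in the 
report, and handling that correctly is outside what this example shows.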

> Slow generation of blockReport at DataNode causes delay of sending heartbeat 
> to NameNode
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>
>         Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch, 
> 4584.patch, 4584.patch
>
>
> Sometimes, due to disk or other problems, a datanode takes minutes or tens 
> of minutes to generate a block report. This prevents the datanode from 
> sending a heartbeat to the NameNode every 3 seconds. In the worst case, the 
> NameNode detects a lost heartbeat and wrongly decides that the datanode is 
> dead.
> It would be nice to have two threads instead. One thread scans the data 
> directories, generates the block report, and executes the requests sent by 
> the NameNode; the other sends heartbeats and block reports and picks up 
> requests from the NameNode. With these two threads, the sending of 
> heartbeats will not be delayed by a slow block report or slow execution of 
> NameNode requests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
