[jira] Commented: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode

Suresh Srinivas (JIRA) Wed, 08 Apr 2009 13:56:36 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697199#action_12697199
 ]


Suresh Srinivas commented on HADOOP-4584:
-----------------------------------------

I will take care of the other comments.
bq. 1. The default scan period is one hour (same as before). I think it should 
be much less frequent (maybe every 6 to 24 hours).
I wanted to retain the old behavior of scanning a directory every hour for 
now. I will change it to 6 hours if no one expresses concerns.

bq. 2. Since there is no throttling of the directory scan, it is better to 
randomize the start time. Datanodes are usually started at the same time, so 
the whole cluster could slow down at once.
How about randomizing the start delay between 0 and the directory scan period?
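
A minimal sketch of that idea, not the code in the patch: pick the delay before 
the first scan uniformly in [0, period) and then scan at a fixed rate. The names 
ScanScheduleSketch, randomInitialDelayMs and start are hypothetical and exist 
only for illustration.

```java
import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the DataNode's actual scanner code.
class ScanScheduleSketch {
    // Returns a delay in [0, periodMs). floorMod keeps the result non-negative;
    // the modulo bias is negligible for periods this small relative to 2^64.
    static long randomInitialDelayMs(long periodMs, Random rng) {
        if (periodMs <= 0) {
            throw new IllegalArgumentException("periodMs must be positive");
        }
        return Math.floorMod(rng.nextLong(), periodMs);
    }

    // Runs `scan` every periodMs, with the first run offset by a random delay
    // so that datanodes started together do not all scan at the same moment.
    static ScheduledExecutorService start(Runnable scan, long periodMs) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        long delay = randomInitialDelayMs(periodMs, new Random());
        ses.scheduleAtFixedRate(scan, delay, periodMs, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

With a 6-hour period, each datanode would begin its first scan at some point in 
the first 6 hours after startup and repeat every 6 hours from there.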

bq. 5. At patchfile:834: it updates the generation stamp with 'diskGS' without 
moving the meta file from the prev directory to memBlock's directory. Could 
that result in block and meta files in different directories?
I am not sure if I should be moving files. I think it is better to use the meta 
file only if it exists in the same directory as the block file. Otherwise, 
update the GS to the grandfather generation stamp.
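
Roughly, the rule I have in mind looks like the sketch below. This is an 
illustration and not the patch itself: GenStampSketch and resolveGenStamp are 
hypothetical names, and the code assumes meta files are named 
<block file name>_<generation stamp>.meta and that the grandfather generation 
stamp is 0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the code in the patch.
class GenStampSketch {
    // Assumption: the grandfather generation stamp is 0.
    static final long GRANDFATHER_GENERATION_STAMP = 0L;

    // blockName: name of the block file, e.g. "blk_123".
    // namesInSameDir: names of the files in the block file's own directory.
    // Returns the generation stamp parsed from a matching meta file in that
    // directory, or the grandfather stamp if there is none. Meta files in any
    // other directory are never consulted and nothing is moved.
    static long resolveGenStamp(String blockName, String[] namesInSameDir) {
        // Assumption: meta files are named <blockName>_<genStamp>.meta.
        Pattern metaName =
            Pattern.compile(Pattern.quote(blockName) + "_(\\d{1,18})\\.meta");
        long genStamp = GRANDFATHER_GENERATION_STAMP;
        for (String name : namesInSameDir) {
            Matcher m = metaName.matcher(name);
            if (m.matches()) {
                // If several meta files match, keep the highest stamp.
                genStamp = Math.max(genStamp, Long.parseLong(m.group(1)));
            }
        }
        return genStamp;
    }
}
```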


> Slow generation of blockReport at DataNode causes delay of sending heartbeat 
> to NameNode
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.20.0
>
>         Attachments: 4584.brthread.2.patch, 4584.brthread.3.patch, 
> 4584.brthread.3.patch, 4584.brthread.3.patch, 4584.brthread.3.patch, 
> 4584.brthread.3.patch, 4584.brthread.4.patch, 4584.brthread.4.patch, 
> 4584.brthread.4.patch, 4584.hbthread.patch, 4584.patch, 4584.patch, 
> 4584.patch, 4584.patch, 4584.patch, 4584.patch, Design.pdf, Design.pdf
>
>
> Sometimes, due to disk or other problems, a datanode takes minutes or tens 
> of minutes to generate a block report. This prevents the datanode from 
> sending a heartbeat to the NameNode every 3 seconds. In the worst case, the 
> NameNode detects a lost heartbeat and wrongly decides that the datanode is 
> dead.
> It would be nice to have two threads instead. One thread scans the data 
> directories, generates the block report, and executes the requests sent by 
> the NameNode; the other sends heartbeats and block reports and picks up 
> requests from the NameNode. With these two threads, sending heartbeats is 
> not delayed by a slow block report or slow execution of NameNode requests.
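
The two-thread split proposed in the description could be sketched roughly as 
follows. This is only an illustration under simplifying assumptions, not the 
DataNode code: TwoThreadSketch and its members are hypothetical names, the 
"report" is a plain long[] of block ids, and "sending" is reduced to 
incrementing a counter.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the two-thread split; not the DataNode code.
class TwoThreadSketch {
    final AtomicInteger heartbeatsSent = new AtomicInteger();
    final AtomicInteger reportsSent = new AtomicInteger();
    // Hand-off slot: the scanner thread fills it, the sender thread drains it.
    private final AtomicReference<long[]> pendingReport = new AtomicReference<>();
    private final ScheduledExecutorService sender =
        Executors.newSingleThreadScheduledExecutor();
    private final ExecutorService scanner = Executors.newSingleThreadExecutor();

    void start(long heartbeatMs, Callable<long[]> slowScan) {
        // Thread 1: the possibly slow directory scan runs off the sender thread.
        scanner.submit(() -> {
            pendingReport.set(slowScan.call());
            return null;
        });
        // Thread 2: sends a heartbeat on every tick and forwards a finished
        // report if one is waiting. It never blocks on the scan.
        sender.scheduleAtFixedRate(() -> {
            heartbeatsSent.incrementAndGet();
            long[] report = pendingReport.getAndSet(null);
            if (report != null) {
                reportsSent.incrementAndGet();
            }
        }, 0, heartbeatMs, TimeUnit.MILLISECONDS);
    }

    void stop() {
        sender.shutdownNow();
        scanner.shutdownNow();
    }
}
```

In this sketch the sender thread keeps ticking while the scan is still running, 
and the report goes out on the first tick after the scan completes.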

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
