[ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676910#action_12676910 ]
Konstantin Shvachko commented on HADOOP-4584:
---------------------------------------------
As I said, I propose moving in-memory block reports into a separate issue.
Does anybody disagree with that?
As for the heartbeat thread, I would like to propose an alternative to Suresh's
approach and discuss the pros and cons of the two.
# Now we have a single thread (call it the offerService thread) which does all
three operations: sending heartbeats and processing the commands returned by the
name-node, sending blockReceived, and sending blockReport.
# Suresh's current proposal is to move heartbeats into a new thread (the
heartbeat thread), which also means creating a queue of commands returned by the
name-node, to be processed later by the offerService thread.
# My proposal is to move block report preparation into a new thread (the
blockReport thread), which wakes up once an hour and prepares a block report.
Once the report is ready, the offerService thread sends it to the name-node.
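A minimal Java sketch of proposal (3), assuming a hypothetical `BlockReportHandoff` class (this is not Hadoop's actual DataNode code): the blockReport thread runs the slow scan and publishes the finished report through an `AtomicReference`, and the offerService loop calls `takeReadyReport()` on each iteration, sending a report only when one is ready. The `scanner` supplier stands in for the directory scan and is an assumption.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of proposal (3): a dedicated blockReport thread composes
 * the (possibly slow) block report; the offerService thread only picks up a
 * finished report. Names are illustrative, not Hadoop's DataNode API.
 */
class BlockReportHandoff {
    // Most recently composed report, waiting for the offerService thread.
    private final AtomicReference<List<Long>> readyReport = new AtomicReference<>();
    // Assumed stand-in for the slow scan of the data directories (block ids).
    private final Supplier<List<Long>> scanner;

    BlockReportHandoff(Supplier<List<Long>> scanner) {
        this.scanner = scanner;
    }

    /** One pass of the blockReport thread: compose a report and publish it. */
    void composeOnce() {
        List<Long> report = scanner.get(); // may take minutes; offerService keeps running
        readyReport.set(report);           // a newer report replaces an unsent one
    }

    /** Called from the offerService loop: returns a ready report or null. */
    List<Long> takeReadyReport() {
        return readyReport.getAndSet(null); // each report is handed over at most once
    }

    /** Starts a daemon blockReport thread that composes a report every periodMillis. */
    Thread start(long periodMillis) {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    composeOnce();
                    Thread.sleep(periodMillis); // e.g. one hour in proposal (3)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly on shutdown
            }
        }, "blockReport");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

With this split, the offerService loop never blocks on the scan: it heartbeats, sends blockReceived, and calls `takeReadyReport()`, which returns immediately whether or not a report is ready.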
I think the last proposal (3) may have an advantage over (2), because in (2) we
still delay blockReceived and the processing of commands from the name-node
until the block report has been composed.
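To illustrate the delay in (2), here is a hypothetical single-iteration sketch (the class, method, and command names are illustrative only, not Hadoop's API): the heartbeat thread keeps enqueuing commands from name-node replies, but the offerService thread drains the queue only after the slow report composition returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Hypothetical sketch of proposal (2): a heartbeat thread enqueues name-node
 * commands, and the offerService thread processes them between its other
 * tasks. Names are illustrative, not Hadoop's DataNode API.
 */
class HeartbeatCommandQueue {
    // Commands returned by the name-node in heartbeat replies.
    final BlockingQueue<String> commands = new LinkedBlockingQueue<>();
    // Order in which the offerService thread actually handled its work.
    final List<String> processed = new ArrayList<>();

    /** Heartbeat thread: after a heartbeat, enqueue the returned command, if any. */
    void onHeartbeatReply(String command) {
        if (command != null) {
            commands.add(command);
        }
    }

    /** One offerService iteration: compose the report (slow), then drain commands. */
    void offerServiceIteration(Runnable composeBlockReport) {
        composeBlockReport.run();   // commands queued meanwhile have to wait
        processed.add("blockReport");
        String cmd;
        while ((cmd = commands.poll()) != null) {
            processed.add(cmd);     // handled only after composition finished
        }
    }
}
```

Heartbeats themselves are no longer blocked here, but any command that arrives during composition is handled only after `"blockReport"`, which is the drawback described above.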
> Slow generation of blockReport at DataNode causes delay of sending heartbeat
> to NameNode
> ----------------------------------------------------------------------------------------
>
> Key: HADOOP-4584
> URL: https://issues.apache.org/jira/browse/HADOOP-4584
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Reporter: Hairong Kuang
> Assignee: Suresh Srinivas
> Fix For: 0.20.0
>
> Attachments: 4584.patch, 4584.patch, 4584.patch, 4584.patch,
> 4584.patch, 4584.patch
>
>
> Sometimes, due to disk or other problems, the datanode takes minutes or tens
> of minutes to generate a block report. This prevents the datanode from
> sending a heartbeat to the NameNode every 3 seconds. In the worst case, the
> NameNode detects lost heartbeats and wrongly decides that the datanode is
> dead.
> It would be nice to have two threads instead. One thread scans the data
> directories, generates block reports, and executes the requests sent by the
> NameNode; another thread sends heartbeats and block reports and picks up the
> requests from the NameNode. With these two threads, sending heartbeats will
> not be delayed by a slow block report or by slow execution of NameNode
> requests.
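The two-thread design in the issue description can be sketched with two hypothetical queues, assuming the illustrative class `TwoThreadDataNode` (not Hadoop code): requests flow from the sender thread to the worker thread, and finished block reports flow back. The sender never performs slow work, so its heartbeat cadence does not depend on the scan.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Hypothetical sketch of the two-thread design from the issue description.
 * Sender thread: heartbeats, sends ready block reports, picks up requests.
 * Worker thread: scans directories, builds reports, executes requests.
 * All names are illustrative, not Hadoop's DataNode API.
 */
class TwoThreadDataNode {
    final BlockingQueue<String> requests = new LinkedBlockingQueue<>();    // sender -> worker
    final BlockingQueue<List<Long>> reports = new LinkedBlockingQueue<>(); // worker -> sender

    /** One sender iteration; returns the number of RPCs it would issue. */
    int senderIteration(String requestFromHeartbeatReply) {
        int rpcs = 1;                                // the heartbeat itself
        if (requestFromHeartbeatReply != null) {
            requests.add(requestFromHeartbeatReply); // picked up, not executed here
        }
        if (reports.poll() != null) {
            rpcs++;                                  // send the ready block report
        }
        return rpcs;
    }

    /** One worker iteration; returns how many requests it executed. */
    int workerIteration(List<Long> scanResult) {
        int executed = 0;
        while (requests.poll() != null) {
            executed++;                              // execute the NameNode request
        }
        reports.add(scanResult);                     // hand the (slow) scan result over
        return executed;
    }
}
```

Here `scanResult` stands in for the output of the directory scan; in a real datanode the worker would produce it itself, and each method would run in a loop on its own thread.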