[
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980067#comment-16980067
]
Xiaoqiao He commented on HDFS-14997:
------------------------------------
Thanks [~elgoiri] for the detailed review.
{quote}Do we need to do take and then poll?{quote}
Actually, I want to wait, if necessary, until an element becomes available in
the #BlockingQueue. In my opinion, there is not much difference between take +
poll and take alone.
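To make the comparison concrete, here is a minimal sketch of the two options, assuming a plain java.util.concurrent.BlockingQueue. The class and method names (TakeThenPollSketch, takeThenPoll, takeOnly) are hypothetical and are not taken from the patch; the sketch only illustrates that take() blocks until an element arrives while poll() returns null immediately on an empty queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration only; not the code from HDFS-14997.
class TakeThenPollSketch {

  /** Blocks for the first element, then drains whatever is already queued. */
  static <T> List<T> takeThenPoll(BlockingQueue<T> queue)
      throws InterruptedException {
    List<T> batch = new ArrayList<>();
    batch.add(queue.take());            // waits until an element is available
    T next;
    while ((next = queue.poll()) != null) {  // non-blocking; null when empty
      batch.add(next);
    }
    return batch;
  }

  /** Blocks for exactly one element; the caller loops to get the next one. */
  static <T> T takeOnly(BlockingQueue<T> queue) throws InterruptedException {
    return queue.take();
  }

  /** Convenience factory so callers can build a queue for the sketch. */
  static <T> BlockingQueue<T> newQueue() {
    return new LinkedBlockingQueue<>();
  }
}
```

Both variants wait when the queue is empty. The only visible difference in this sketch is that takeThenPoll hands back a batch per wake-up, while takeOnly hands back one element per call.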
{quote}If it is interrupted, shouldn't shouldRun() return false so no need to
break?{quote}
I think it is friendlier to break explicitly when only CommandProcessingThread
is interrupted but the DataNode process is not.
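A minimal sketch of this reasoning, under stated assumptions: the class CommandWorkerSketch, its running flag, and its enqueue method are hypothetical stand-ins, and shouldRun() here only imitates a process-wide check. It is not the code in the patch. It shows why an explicit break matters when the worker thread alone is interrupted while the process-wide flag is still true.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the code from HDFS-14997.
class CommandWorkerSketch extends Thread {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  // Stand-in for a process-wide "keep running" flag (assumption).
  private volatile boolean running = true;
  final AtomicInteger processed = new AtomicInteger();

  /** Called by the producer (for example a heartbeat loop); returns quickly. */
  void enqueue(Runnable command) throws InterruptedException {
    queue.put(command);
  }

  boolean shouldRun() {
    return running;
  }

  @Override
  public void run() {
    while (shouldRun()) {
      try {
        Runnable command = queue.take();  // blocks until a command arrives
        command.run();
        processed.incrementAndGet();
      } catch (InterruptedException e) {
        // Only this thread was interrupted; shouldRun() is still true,
        // so without an explicit break the loop would continue.
        Thread.currentThread().interrupt();
        break;
      }
    }
  }
}
```

In this sketch the producer only pays the cost of queue.put, so a slow command delays the worker thread rather than the caller that enqueued it.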
{quote}processCommand(DatanodeCommand[] cmds) is kind of repeated now. Should
we merge the new and the old together?{quote}
Sorry, I don't get this point.
The other comments are addressed; please help check [^HDFS-14997.003.patch].
Please correct me if I have misunderstood anything.
I checked the failed unit tests and ran them locally; they all seem to pass
except #TestRedudantBlocks. I will follow up.
> BPServiceActor process command from NameNode asynchronously
> -----------------------------------------------------------
>
> Key: HDFS-14997
> URL: https://issues.apache.org/jira/browse/HDFS-14997
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Xiaoqiao He
> Assignee: Xiaoqiao He
> Priority: Major
> Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch,
> HDFS-14997.003.patch
>
>
> There are two core functions in the #BPServiceActor main process flow:
> report (#sendHeartbeat, #blockReport, #cacheReport) and #processCommand. If
> processCommand takes a long time, it blocks the report flow. Meanwhile,
> processCommand can take a long time (over 1000s in the worst case I have
> seen) when the IO load of the DataNode is very high. Since some IO operations
> run under #datasetLock, processing some commands (such as #DNA_INVALIDATE)
> has to wait a long time to acquire #datasetLock. In such cases, #heartbeat
> is not sent to the NameNode in time, which triggers other disasters.
> I propose to make #processCommand asynchronous so that it does not block
> #BPServiceActor from sending heartbeats back to the NameNode under high IO load.
> Notes:
> 1. Lifeline could be one effective solution; however, some old branches do
> not support this feature.
> 2. IO operations under #datasetLock are another issue; I think we should
> solve it in another JIRA.