Xiaoqiao He created HDFS-14997:
----------------------------------

             Summary: BPServiceActor process command from NameNode 
asynchronously
                 Key: HDFS-14997
                 URL: https://issues.apache.org/jira/browse/HDFS-14997
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: datanode
            Reporter: Xiaoqiao He
            Assignee: Aiphago


There are two core functions in the #BPServiceActor main process flow: 
reporting (#sendHeartbeat, #blockReport, #cacheReport) and #processCommand. If 
#processCommand takes a long time, it blocks the report flow. In practice, 
#processCommand can take a very long time (over 1000s in the worst case I have 
seen) when the IO load of the DataNode is very high. Some IO operations are 
performed under #datasetLock, so processing certain commands (such as 
#DNA_INVALIDATE) has to wait a long time to acquire #datasetLock. In that 
case, #heartbeat is not sent to the NameNode in time, which triggers other 
failures.
I propose making #processCommand asynchronous so that it does not block 
#BPServiceActor from sending heartbeats back to the NameNode under high IO 
load.
Notes:
1. Lifeline could be one effective solution; however, some old branches do not 
support this feature.
2. IO operations under #datasetLock are a separate issue; I think we should 
solve that in another JIRA.
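
To make the proposal concrete, here is a minimal sketch of one possible 
approach, assuming a single worker thread that drains a queue of commands. 
All names here (AsyncCommandSketch, Command, CommandProcessingThread) are 
hypothetical and are not the actual DataNode classes; a CountDownLatch stands 
in for a contended #datasetLock. The point it illustrates is only that the 
reporting loop can enqueue a command and keep sending heartbeats while the 
command is still blocked.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class AsyncCommandSketch {

    /** Hypothetical stand-in for a command received from the NameNode. */
    interface Command {
        void run() throws Exception;
    }

    /** Hypothetical worker: processes queued commands one by one, in arrival order. */
    static final class CommandProcessingThread extends Thread {
        private final BlockingQueue<Command> queue = new LinkedBlockingQueue<>();
        private volatile boolean running = true;

        CommandProcessingThread() {
            setName("command-processor");
            setDaemon(true);
        }

        /** Called by the reporting thread; returns immediately. */
        void enqueue(Command cmd) {
            queue.add(cmd);
        }

        void shutdown() {
            running = false;
            interrupt();
        }

        @Override
        public void run() {
            while (running) {
                try {
                    queue.take().run(); // may block for a long time under IO load
                } catch (InterruptedException e) {
                    // Woken up by shutdown(); the loop condition decides whether to exit.
                } catch (Exception e) {
                    System.err.println("Command failed: " + e);
                }
            }
        }
    }

    /**
     * Simulates the reporting loop sending heartbeats while a command is
     * stuck. Returns the number of heartbeats that had been sent by the time
     * the blocked command finally ran, or -1 if it never ran.
     */
    static int heartbeatsSentWhileCommandBlocked(int heartbeats)
            throws InterruptedException {
        CommandProcessingThread worker = new CommandProcessingThread();
        worker.start();

        CountDownLatch lockReleased = new CountDownLatch(1); // stands in for the lock
        CountDownLatch commandDone = new CountDownLatch(1);
        AtomicInteger sent = new AtomicInteger();
        AtomicInteger seenByCommand = new AtomicInteger(-1);

        // The command blocks until the "lock" is released.
        worker.enqueue(() -> {
            lockReleased.await();
            seenByCommand.set(sent.get());
            commandDone.countDown();
        });

        // The reporting loop is not blocked by the pending command.
        for (int i = 0; i < heartbeats; i++) {
            sent.incrementAndGet(); // stands in for sending one heartbeat
        }

        lockReleased.countDown();
        commandDone.await(5, TimeUnit.SECONDS);
        worker.shutdown();
        return seenByCommand.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Heartbeats sent while the command was blocked: "
                + heartbeatsSentWhileCommandBlocked(3));
    }
}
```

A single worker keeps commands in the order they were received, which seems 
the safest default; whether the real change should use one thread or several 
is a design decision for the patch.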



