[ https://issues.apache.org/jira/browse/HDFS-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Ma updated HDFS-16115:
-----------------------------
    Description:
It is an improvement issue. Actually, the issue has two sub-issues:

1- The BPServiceActor thread handles commands from the NameNode asynchronously (CommandProcessThread handles the commands). If any exception or error in CommandProcessThread causes the thread to fail and stop, BPServiceActor is not aware of it and keeps putting commands from the NameNode into queues to be handled by CommandProcessThread, even though CommandProcessThread is already dead.

2- The second sub-issue builds on the first one. If CommandProcessThread dies owing to a non-fatal error such as "can not create native thread", which is caused by too many threads existing on the node, this kind of problem should be tolerated rather than simply shutting down the thread without ever recovering automatically, because such non-fatal errors can probably recover by themselves soon. Currently, the DataNode BPServiceActor cannot return to normal even after the non-fatal error has been eliminated.

  was:
It is an improvement issue. Actually the issue has two sub issues:

1- BPServiceActor thread handle commands from NameNode in asynchronous way (CommandProcessThread handle commands), so if there are any exception or errors happens in thread CommandProcessThread resulting the thread fails and stop, of which BPServiceActor cannot aware and still keep put commands from namenode into queues waiting to be handled by CommandProcessThread, actually CommandProcessThread was dead already.
2- the second sub issue is based on the first one, if CommandProcessThread fails owing to some non-fatal error like "can not create native thread" which is caused by too many threads existed on the node, this kind of problem should be given much tolerance instead of simply shutdown the thread and never recover automatically, because the non-fatal error mention above may recover soon by itself, currently, Datanode BPServiceActor cannot turn to normal even when the non-fatal error was eliminated.

> Asynchronously handle BPServiceActor command mechanism may result in
> BPServiceActor never fails even CommandProcessingThread is closed with fatal
> error.
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-16115
>                 URL: https://issues.apache.org/jira/browse/HDFS-16115
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 3.3.1
>            Reporter: Daniel Ma
>            Priority: Critical
>             Fix For: 3.3.1
>
>         Attachments: 0001-HDFS-16115.patch
>
>
> It is an improvement issue. Actually, the issue has two sub-issues:
> 1- The BPServiceActor thread handles commands from the NameNode
> asynchronously (CommandProcessThread handles the commands). If any exception
> or error in CommandProcessThread causes the thread to fail and stop,
> BPServiceActor is not aware of it and keeps putting commands from the
> NameNode into queues to be handled by CommandProcessThread, even though
> CommandProcessThread is already dead.
> 2- The second sub-issue builds on the first one. If CommandProcessThread
> dies owing to a non-fatal error such as "can not create native thread",
> which is caused by too many threads existing on the node, this kind of
> problem should be tolerated rather than simply shutting down the thread
> without ever recovering automatically, because such non-fatal errors can
> probably recover by themselves soon.
> Currently, the DataNode BPServiceActor cannot return to normal even after
> the non-fatal error has been eliminated.
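
The two sub-issues can be illustrated with a minimal Java sketch. It is not taken from the Hadoop code base or from the attached patch: the class SupervisedCommandQueue and all of its methods are hypothetical names chosen for illustration. The producer checks whether the consumer thread is still alive before queueing a command and starts a replacement if it has died (sub-issue 1), and the consumer loop survives an exception thrown by a single command (sub-issue 2). In Java, failure to create a native thread surfaces as java.lang.OutOfMemoryError, which is an Error and is therefore not caught by the loop in this sketch; it ends the thread, and the next enqueue() call replaces it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/**
 * Hypothetical sketch, not Hadoop code: a command queue whose producer
 * notices a dead consumer thread and starts a replacement.
 */
final class SupervisedCommandQueue<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final Consumer<T> handler;
    private volatile boolean running = true;
    private volatile Thread worker;   // started lazily on first enqueue
    private int restarts;             // guarded by "this"

    SupervisedCommandQueue(Consumer<T> handler) {
        this.handler = handler;
    }

    /** Producer side: check worker liveness before queueing a command. */
    synchronized void enqueue(T command) {
        if (!running) {
            throw new IllegalStateException("queue has been stopped");
        }
        if (worker == null || !worker.isAlive()) {
            if (worker != null) {
                restarts++;           // the previous worker died unexpectedly
            }
            worker = new Thread(this::processLoop, "command-processor");
            worker.setDaemon(true);
            worker.start();
        }
        queue.add(command);
    }

    /** Consumer side: one failed command must not end the loop. */
    private void processLoop() {
        while (running) {
            try {
                T command = queue.poll(100, TimeUnit.MILLISECONDS);
                if (command != null) {
                    handler.accept(command);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            } catch (Exception e) {
                // Treated as recoverable in this sketch: log and continue.
                System.err.println("command failed, continuing: " + e);
            }
            // A java.lang.Error is deliberately not caught here; it ends
            // this thread, and the next enqueue() call replaces it.
        }
    }

    boolean isWorkerAlive() {
        Thread t = worker;
        return t != null && t.isAlive();
    }

    synchronized int restartCount() {
        return restarts;
    }

    synchronized void stop() {
        running = false;
        if (worker != null) {
            worker.interrupt();
        }
    }
}
```

Checking liveness on the producer side keeps the sketch small. A real fix could equally use an uncaught-exception handler or a periodic health check, and would need to decide which errors count as recoverable.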