[ https://issues.apache.org/jira/browse/HDFS-17397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830417#comment-17830417 ]
ASF GitHub Bot commented on HDFS-17397:
---------------------------------------

Hexiaoqiao commented on PR #6591:
URL: https://github.com/apache/hadoop/pull/6591#issuecomment-2017607444

@xleoken Thanks for your proposal. I am not sure this is the proper solution for your case, as @ZanderXu mentioned. IIUC, you expect to fail fast when there is a network issue between the client and the first DataNode while writing data to the pipeline, right? IMO, it is difficult to decide when to do that, because:
a. Sometimes we cannot determine that the problem lies between the client and the first DataNode using only the time cost of the ACK. A timeout between DataNodes in the pipeline could also cause the client to time out while waiting.
b. If we fail fast and recover the pipeline, the time cost could be even larger: re-creating the pipeline and transferring data takes more time when more than 10MB has already been written.
We have discussed this case several times. I think we need to split it into two steps: first report metrics back to the client, then improve the strategy (fail fast, switch DN, or some other approach based on the different metrics). FYI.
Thanks again.

> Choose another DN as soon as possible, when encountering network issues
> -----------------------------------------------------------------------
>
>                 Key: HDFS-17397
>                 URL: https://issues.apache.org/jira/browse/HDFS-17397
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: xleoken
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: hadoop.png
>
>
> Choose another DN as soon as possible, when encountering network issues.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
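
To illustrate point (a) and the two-step idea from the comment, here is a minimal sketch in Java. It is not HDFS code: the class, method, and enum names are hypothetical, and it assumes a simplified model in which the ACK time seen by the client is just the sum of per-hop latencies (hop 0 is client to first DataNode, later hops are DataNode to DataNode). Under that assumption, two different failures produce the same client-side ACK time, and only per-hop metrics reported back to the client let it choose between replacing the first DataNode and replacing a downstream one.

```java
import java.util.Arrays;

/** Hypothetical sketch; none of these names exist in HDFS. */
final class PipelineAckSketch {

    /** Hypothetical actions a client could take once it has per-hop metrics. */
    enum Action { KEEP_WAITING, REPLACE_FIRST_DN, REPLACE_DOWNSTREAM_DN }

    /** Simplified model: the client-observed ACK time is the sum of all hop latencies. */
    static long totalAckMillis(long[] hopMillis) {
        return Arrays.stream(hopMillis).sum();
    }

    /** Index of the slowest hop; hop 0 is client to first DataNode. */
    static int slowestHop(long[] hopMillis) {
        int idx = 0;
        for (int i = 1; i < hopMillis.length; i++) {
            if (hopMillis[i] > hopMillis[idx]) {
                idx = i;
            }
        }
        return idx;
    }

    /** Step two of the proposal: pick a strategy from the reported per-hop metrics. */
    static Action decide(long[] hopMillis, long thresholdMillis) {
        if (totalAckMillis(hopMillis) < thresholdMillis) {
            return Action.KEEP_WAITING;
        }
        return slowestHop(hopMillis) == 0
                ? Action.REPLACE_FIRST_DN
                : Action.REPLACE_DOWNSTREAM_DN;
    }

    public static void main(String[] args) {
        long[] slowClientLink = {900, 50, 50}; // slow hop: client to first DataNode
        long[] slowDnToDnLink = {50, 900, 50}; // slow hop: first to second DataNode

        // The total ACK time alone cannot tell the two cases apart.
        System.out.println(totalAckMillis(slowClientLink)); // 1000
        System.out.println(totalAckMillis(slowDnToDnLink)); // 1000

        // With per-hop metrics, the decision differs.
        System.out.println(decide(slowClientLink, 500)); // REPLACE_FIRST_DN
        System.out.println(decide(slowDnToDnLink, 500)); // REPLACE_DOWNSTREAM_DN
    }
}
```

The sketch ignores point (b): a real strategy would also have to weigh the cost of rebuilding the pipeline and re-transferring data already written before choosing to replace a DataNode.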