[ https://issues.apache.org/jira/browse/HDFS-17397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822331#comment-17822331 ]
ASF GitHub Bot commented on HDFS-17397:
---------------------------------------

xleoken commented on code in PR #6591:
URL: https://github.com/apache/hadoop/pull/6591#discussion_r1508372843

##########
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java:
##########
@@ -1182,10 +1182,12 @@ public void run() {
           if (begin != null) {
             long duration = Time.monotonicNowNanos() - begin;
             if (TimeUnit.NANOSECONDS.toMillis(duration) > dfsclientSlowLogThresholdMs) {
-              LOG.info("Slow ReadProcessor read fields for block " + block
+              final String msg = "Slow ReadProcessor read fields for block " + block
                   + " took " + TimeUnit.NANOSECONDS.toMillis(duration) + "ms (threshold="
                   + dfsclientSlowLogThresholdMs + "ms); ack: " + ack
-                  + ", targets: " + Arrays.asList(targets));
+                  + ", targets: " + Arrays.asList(targets);
+              LOG.warn(msg);
+              throw new IOException(msg);

Review Comment:
Welcome @ZanderXu

> How to identify this case

When the client takes longer than `dfsclientSlowLogThresholdMs` to read an ack.

> Which datanode should be marked as a bad or slow DN

Datanodes that are in a poor network environment.

> Maybe Datastreamer can identify this case and recovery it through PipelineRecovery

The core issue is that when the response time between the client and the DN exceeds `dfsclientSlowLogThresholdMs`, the client only prints a log and takes no action. We should print the log and throw an `IOException`.

> but I don't think your modification is a good solution.

Maybe you're right, but this may be the simplest modification. With this patch, we solved the slow-DN problem in our production environment.

1. With the patch applied, once `dfsclientSlowLogThresholdMs` is exceeded the client immediately selects a new DN to complete the write, so that client writes are kept, as far as possible, from hanging on interactions with slow DNs.
2. These slow nodes show up in the HDFS JMX metrics; once monitoring detects them, the operations team follows up with its own handling.

> Choose another DN as soon as possible, when encountering network issues
> -----------------------------------------------------------------------
>
>                 Key: HDFS-17397
>                 URL: https://issues.apache.org/jira/browse/HDFS-17397
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: xleoken
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: hadoop.png
>
>
> Choose another DN as soon as possible, when encountering network issues.
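
For readers without the Hadoop source at hand, the "log, then throw" behavior proposed in the review comment above can be sketched in isolation. This is a minimal illustration, not the actual `DataStreamer` code: the class `SlowAckCheck`, the method `checkAckLatency`, and the 30000 ms threshold used in `main` are hypothetical names and values chosen for the example, and `System.err` stands in for the real logger.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone sketch; not part of Hadoop.
class SlowAckCheck {

  /**
   * Compares the time spent reading an ack against a threshold.
   * If the threshold is exceeded, a warning is printed and an IOException
   * is thrown, instead of only logging as the pre-patch code did.
   */
  static void checkAckLatency(long beginNanos, long nowNanos,
                              long thresholdMs, String block) throws IOException {
    long tookMs = TimeUnit.NANOSECONDS.toMillis(nowNanos - beginNanos);
    if (tookMs > thresholdMs) {
      final String msg = "Slow ReadProcessor read fields for block " + block
          + " took " + tookMs + "ms (threshold=" + thresholdMs + "ms)";
      System.err.println("WARN " + msg); // stand-in for LOG.warn(msg)
      throw new IOException(msg);        // caller is expected to react to this
    }
  }

  public static void main(String[] args) {
    long thresholdMs = 30000L; // assumed value, for illustration only
    long begin = 0L;
    try {
      // 10 ms ack read: under the threshold, nothing happens.
      checkAckLatency(begin, TimeUnit.MILLISECONDS.toNanos(10), thresholdMs, "blk_1");
      // 40 s ack read: over the threshold, an IOException is thrown.
      checkAckLatency(begin, TimeUnit.SECONDS.toNanos(40), thresholdMs, "blk_1");
    } catch (IOException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```

The sketch only shows the threshold check itself; how the client reacts to the exception (selecting a new DN, as described in the comment) is outside its scope.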