[ https://issues.apache.org/jira/browse/HDFS-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuanxin Zhu updated HDFS-16293: ------------------------------- Attachment: (was: HDFS-16293.03.patch) > Client sleeps and holds 'dataQueue' when DataNodes are congested > ---------------------------------------------------------------- > > Key: HDFS-16293 > URL: https://issues.apache.org/jira/browse/HDFS-16293 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client > Affects Versions: 3.2.2, 3.3.1, 3.2.3 > Reporter: Yuanxin Zhu > Priority: Major > Attachments: HDFS-16293.01-branch-3.2.2.patch, HDFS-16293.01.patch, > HDFS-16293.02.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > When I open the ECN and use Terasort(500G data,8 DataNodes,76 vcores/DN) for > testing, DataNodes are congested(HDFS-8008). The client enters the sleep > state after receiving the ACK for many times, but does not release the > 'dataQueue'. The ResponseProcessor thread needs the 'dataQueue' to execute > 'ackQueue.getFirst()', so the ResponseProcessor will wait for the client to > release the 'dataQueue', which is equivalent to that the ResponseProcessor > thread also enters sleep, resulting in ACK delay.MapReduce tasks can be > delayed by tens of minutes or even hours. > The DataStreamer thread can first execute 'one = dataQueue. getFirst()', > release 'dataQueue', and then judge whether to execute 'backOffIfNecessary()' > according to 'one.isHeartbeatPacket()' > -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org