[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
[ https://issues.apache.org/jira/browse/HDFS-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun updated HDFS-16293:
    Fix Version/s: 3.3.2 (was: 3.3.3)

> Client sleeps and holds 'dataQueue' when DataNodes are congested
>
>                 Key: HDFS-16293
>                 URL: https://issues.apache.org/jira/browse/HDFS-16293
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 3.2.2, 3.3.1, 3.2.3
>            Reporter: Yuanxin Zhu
>            Assignee: Yuanxin Zhu
>            Priority: Major
>             Fix For: 3.4.0, 3.3.2
>
>         Attachments: HDFS-16293.01-branch-3.2.2.patch, HDFS-16293.01.patch,
> HDFS-16293.02.patch, HDFS-16293.03.patch, HDFS-16293.04.patch,
> HDFS-16293.05.patch, HDFS-16293.06.patch, HDFS-16293.07.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With ECN enabled, I ran Terasort (500G of data, 8 DataNodes, 76 vcores per
> DataNode) for testing, and the DataNodes became congested (see HDFS-8008).
> After receiving ACKs from the congested DataNodes, the client's DataStreamer
> thread repeatedly goes to sleep, but it does not release 'dataQueue' while
> sleeping. The ResponseProcessor thread needs 'dataQueue' to execute
> 'ackQueue.getFirst()', so it has to wait until the client releases
> 'dataQueue'. In effect, the ResponseProcessor thread sleeps as well, which
> delays ACK processing. MapReduce tasks can be delayed by tens of minutes or
> even hours.
> Instead, the DataStreamer thread could first execute
> 'one = dataQueue.getFirst()', release 'dataQueue', and then decide whether
> to call 'backOffIfNecessary()' based on 'one.isHeartbeatPacket()'.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
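The program below is a minimal, hypothetical Java model of the contention described in this issue. It is not the real DataStreamer or ResponseProcessor code. 'StreamerModel', 'Packet', 'backOff()', 'ackWaitMillis()' and 'measure()' are made-up names. 'backOff()' simply sleeps for a fixed 300 ms; the real 'backOffIfNecessary()' computes a congestion back-off instead. The program compares two orderings:

- sleeping inside 'synchronized (dataQueue)', as reported;
- the reporter's suggested order: read the first packet, release the monitor, then back off unless the packet is a heartbeat.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the HDFS-16293 lock contention. StreamerModel, Packet,
// backOff() and ackWaitMillis() are illustrative names, not Hadoop APIs.
public class StreamerModel {
    static final class Packet {
        private final boolean heartbeat;
        Packet(boolean heartbeat) { this.heartbeat = heartbeat; }
        boolean isHeartbeatPacket() { return heartbeat; }
    }

    final Deque<Packet> dataQueue = new ArrayDeque<>();
    final Deque<Packet> ackQueue = new ArrayDeque<>();
    final CountDownLatch backingOff = new CountDownLatch(1);
    final long backOffMillis;

    StreamerModel(long backOffMillis) { this.backOffMillis = backOffMillis; }

    // Stand-in for backOffIfNecessary(): assumes the DataNodes are congested
    // and always sleeps for a fixed time.
    void backOff() throws InterruptedException {
        backingOff.countDown(); // tell the demo that the sleep is starting
        Thread.sleep(backOffMillis); // Thread.sleep does not release monitors
    }

    // Reported ordering: the back-off sleep runs inside synchronized (dataQueue).
    Packet nextPacketHoldingLock() throws InterruptedException {
        synchronized (dataQueue) {
            Packet one = dataQueue.isEmpty() ? new Packet(true) : dataQueue.getFirst();
            if (!one.isHeartbeatPacket()) {
                backOff();
            }
            return one;
        }
    }

    // Suggested ordering: take the packet, release the monitor, then back off.
    Packet nextPacketReleasingLock() throws InterruptedException {
        Packet one;
        synchronized (dataQueue) {
            one = dataQueue.isEmpty() ? new Packet(true) : dataQueue.getFirst();
        }
        if (!one.isHeartbeatPacket()) {
            backOff();
        }
        return one;
    }

    // Stand-in for ResponseProcessor: it must take the same 'dataQueue' monitor
    // before reading ackQueue. Returns how long it waited, in milliseconds.
    long ackWaitMillis() {
        long start = System.nanoTime();
        synchronized (dataQueue) {
            ackQueue.peekFirst(); // peekFirst() so an empty queue does not throw
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Starts a streamer thread, waits until it begins backing off, then
    // measures how long the ACK side is blocked on the 'dataQueue' monitor.
    static long measure(boolean holdLock) throws InterruptedException {
        StreamerModel m = new StreamerModel(300);
        m.dataQueue.add(new Packet(false)); // one regular data packet
        Thread streamer = new Thread(() -> {
            try {
                if (holdLock) {
                    m.nextPacketHoldingLock();
                } else {
                    m.nextPacketReleasingLock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        streamer.start();
        m.backingOff.await();
        long waited = m.ackWaitMillis();
        streamer.join();
        return waited;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("lock held during back-off:     ACK side waited " + measure(true) + " ms");
        System.out.println("lock released before back-off: ACK side waited " + measure(false) + " ms");
    }
}
```

In the first case, running main() should show the ACK side blocked for roughly the whole back-off time. In the second case it should barely block. Exact numbers depend on the machine's scheduler.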
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Takanobu Asanuma updated HDFS-16293:
    Fix Version/s: 3.4.0, 3.3.3
    Resolution: Fixed
    Status: Resolved (was: Patch Available)
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Yuanxin Zhu updated HDFS-16293:
    Attachment: HDFS-16293.07.patch
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Yuanxin Zhu updated HDFS-16293:
    Attachment: HDFS-16293.06.patch
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Yuanxin Zhu updated HDFS-16293:
    Attachment: HDFS-16293.05.patch
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Yuanxin Zhu updated HDFS-16293:
    Attachment: HDFS-16293.04.patch
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Yuanxin Zhu updated HDFS-16293:
    Attachment: HDFS-16293.03.patch
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Yuanxin Zhu updated HDFS-16293:
    Attachment: (was: HDFS-16293.03.patch)
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Yuanxin Zhu updated HDFS-16293:
    Attachment: HDFS-16293.03.patch
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Yuanxin Zhu updated HDFS-16293:
    Attachment: HDFS-16293.02.patch
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Yuanxin Zhu updated HDFS-16293:
    Attachment: (was: HDFS-16293.02.patch)
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Yuanxin Zhu updated HDFS-16293:
    Attachment: HDFS-16293.02.patch
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Yuanxin Zhu updated HDFS-16293:
    Affects Version/s: 3.3.1, 3.2.3
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Yuanxin Zhu updated HDFS-16293:
    Attachment: HDFS-16293.01-branch-3.2.2.patch
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Yuanxin Zhu updated HDFS-16293:
    Attachment: HDFS-16293.01.patch
    Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Yuanxin Zhu updated HDFS-16293:
    Description: When I open the ECN and use Terasort(500G data,8 DataNodes,76 vcores/DN) for testing, DataNodes are congested(HDFS-8008). The client enters the sleep state after receiving the ACK for many times, but does not release the 'dataQueue'. The ResponseProcessor thread needs the 'dataQueue' to execute 'ackQueue.getFirst()', so the ResponseProcessor will wait for the client to release the 'dataQueue', which is equivalent to that the ResponseProcessor thread also enters sleep, resulting in ACK delay. MapReduce tasks can be delayed by tens of minutes or even hours. The DataStreamer thread can first execute 'one = dataQueue.getFirst()', release 'dataQueue', and then judge whether to execute 'backOffIfNecessary()' according to 'one.isHeartbeatPacket()'
    was: When I open the ECN and use Terasort(500G,8 DataNodes,76 vcores/DN) for testing, DataNodes are congested(HDFS-8008). The client enters the sleep state after receiving the ACK for many times, but does not release the 'dataQueue'. The ResponseProcessor thread needs the 'dataQueue' to execute 'ackQueue.getFirst()', so the ResponseProcessor will wait for the client to release the 'dataQueue', which is equivalent to that the ResponseProcessor thread also enters sleep, resulting in ACK delay. MapReduce tasks can be delayed by tens of minutes or even hours. The DataStreamer thread can first execute 'one = dataQueue.getFirst()', release 'dataQueue', and then judge whether to execute 'backOffIfNecessary()' according to 'one.isHeartbeatPacket()'
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Yuanxin Zhu updated HDFS-16293:
    Description: When I open the ECN and use Terasort(500G,8 DataNodes,76 vcores/DN) for testing, DataNodes are congested(HDFS-8008). The client enters the sleep state after receiving the ACK for many times, but does not release the 'dataQueue'. The ResponseProcessor thread needs the 'dataQueue' to execute 'ackQueue.getFirst()', so the ResponseProcessor will wait for the client to release the 'dataQueue', which is equivalent to that the ResponseProcessor thread also enters sleep, resulting in ACK delay. MapReduce tasks can be delayed by tens of minutes or even hours. The DataStreamer thread can first execute 'one = dataQueue.getFirst()', release 'dataQueue', and then judge whether to execute 'backOffIfNecessary()' according to 'one.isHeartbeatPacket()'
    was: When I open the ECN and use Terasort for testing, DataNodes are congested(HDFS-8008). The client enters the sleep state after receiving the ACK for many times, but does not release the 'dataQueue'. The ResponseProcessor thread needs the 'dataQueue' to execute 'ackQueue.getFirst()', so the ResponseProcessor will wait for the client to release the 'dataQueue', which is equivalent to that the ResponseProcessor thread also enters sleep, resulting in ACK delay. MapReduce tasks can be delayed by tens of minutes or even hours. The DataStreamer thread can first execute 'one = dataQueue.getFirst()', release 'dataQueue', and then judge whether to execute 'backOffIfNecessary()' according to 'one.isHeartbeatPacket()'
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
Yuanxin Zhu updated HDFS-16293:
    Summary: Client sleeps and holds 'dataQueue' when DataNodes are congested (was: Client sleep and hold 'dataQueue' when DataNodes are congested)