[jira] [Comment Edited] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.
[ https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090748#comment-17090748 ]

Toshihiko Uchida edited comment on HDFS-14353 at 4/23/20, 5:09 PM:
---

[~ayushtkn] Thanks!

> what I remember the xmitsInProgress shouldn't go negative

Right.

> The <= change might not verify the said issue

Let me explain my suggestion, which I believe is also [~maobaolong]'s intention, in more detail. The waitFor line makes sure that the DataNodes have completed all EC reconstruction tasks:
- Each DataNode runs the DataXceiverServer thread in the dataXceiverServer threadGroup;
- The thread runs DataXceiver threads in the same threadGroup when it receives/sends data;
- After all EC reconstruction tasks finish, the DataXceiverServer thread should be the only thread belonging to the threadGroup (i.e., curDn.getXceiverCount() == 1).

The reason for <= 1 is that the thread is not running on a dead DataNode, which was shut down to trigger EC reconstruction.

I have attached a patch that negates curDn.getXceiverCount() > 1. Please kindly review.

was (Author: touchida):

[~ayushtkn] Thanks!

> what I remember the xmitsInProgress shouldn't go negative

Yes.

> The <= change might not verify the said issue

Let me explain my suggestion, which I believe is also [~maobaolong]'s intention, in more detail. The waitFor line makes sure that the DataNodes have completed all EC reconstruction tasks:
- Each DataNode runs the DataXceiverServer thread in the dataXceiverServer threadGroup;
- The thread runs DataXceiver threads in the same threadGroup when it receives/sends data;
- After all EC reconstruction tasks finish, the DataXceiverServer thread should be the only thread belonging to the threadGroup (i.e., curDn.getXceiverCount() == 1).

The reason for <= 1 is that the thread is not running on a dead DataNode, which was shut down to trigger EC reconstruction.

I have attached a patch that negates curDn.getXceiverCount() > 1. Please kindly review.
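The wait condition described in this comment can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual test code: the class `XceiverWaitSketch`, the helper `waitFor`, and the simulated counts are all hypothetical stand-ins, and each `IntSupplier` plays the role of one DataNode's `getXceiverCount()`.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;

class XceiverWaitSketch {

  /** Polls check every intervalMs until it is true or timeoutMs elapses. */
  static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        throw new TimeoutException("condition not met in " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  /**
   * True when no node reports more than one xceiver thread. A live idle node
   * reports 1 (only the server thread); a dead node reports 0, which is why
   * the check is "<= 1" rather than "== 1".
   */
  static boolean reconstructionFinished(IntSupplier... xceiverCounts) {
    for (IntSupplier count : xceiverCounts) {
      if (count.getAsInt() > 1) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    // Simulated node that starts with the server thread plus two workers.
    AtomicInteger busyNode = new AtomicInteger(3);
    Thread worker = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      busyNode.set(1); // work done; only the server thread remains
    });
    worker.start();

    // Nodes: one idle live node (1), one dead node (0), one busy node.
    waitFor(() -> reconstructionFinished(() -> 1, () -> 0, busyNode::get),
        50, 5000);
    worker.join();
    System.out.println("all reconstruction tasks finished");
  }
}
```

With a strict `== 1` check, the dead node's count of 0 would keep the condition false until the timeout, which is the case the `<= 1` form is meant to cover.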
> Erasure Coding: metrics xmitsInProgress become to negative.
> ---
>
> Key: HDFS-14353
> URL: https://issues.apache.org/jira/browse/HDFS-14353
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, erasure-coding
> Affects Versions: 3.3.0
> Reporter: maobaolong
> Assignee: maobaolong
> Priority: Major
> Attachments: HDFS-14353.001.patch, HDFS-14353.002.patch,
> HDFS-14353.003.patch, HDFS-14353.004.patch, HDFS-14353.005.patch,
> HDFS-14353.006.patch, HDFS-14353.007.patch, HDFS-14353.008.patch,
> HDFS-14353.009.patch, HDFS-14353.010.patch, HDFS-14353.010.patch,
> screenshot-1.png

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.
[ https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081406#comment-17081406 ]

Brahma Reddy Battula edited comment on HDFS-14353 at 4/11/20, 6:22 PM:
---

[~elgoiri] since this was reverted, can I remove the fix version for this Jira?

was (Author: brahmareddy):

Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a blocker.

> Erasure Coding: metrics xmitsInProgress become to negative.
> ---
>
> Key: HDFS-14353
> URL: https://issues.apache.org/jira/browse/HDFS-14353
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, erasure-coding
> Affects Versions: 3.3.0
> Reporter: maobaolong
> Assignee: maobaolong
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14353.001.patch, HDFS-14353.002.patch,
> HDFS-14353.003.patch, HDFS-14353.004.patch, HDFS-14353.005.patch,
> HDFS-14353.006.patch, HDFS-14353.007.patch, HDFS-14353.008.patch,
> HDFS-14353.009.patch, screenshot-1.png
[jira] [Comment Edited] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.
[ https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847740#comment-16847740 ]

Íñigo Goiri edited comment on HDFS-14353 at 5/24/19 5:22 PM:
---

[~maobaolong] can you take a look? It looks like the new wait for the number of xceivers is not happening.
Actually, as [~ayushtkn] mentioned, this failed in our builds... my bad here.
I'm reverting this.

was (Author: elgoiri):

[~maobaolong] can you take a look? It looks like the new wait for the number of xceivers is not happening.

> Erasure Coding: metrics xmitsInProgress become to negative.
> ---
>
> Key: HDFS-14353
> URL: https://issues.apache.org/jira/browse/HDFS-14353
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, erasure-coding
> Affects Versions: 3.3.0
> Reporter: maobaolong
> Assignee: maobaolong
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14353.001.patch, HDFS-14353.002.patch,
> HDFS-14353.003.patch, HDFS-14353.004.patch, HDFS-14353.005.patch,
> HDFS-14353.006.patch, HDFS-14353.007.patch, HDFS-14353.008.patch,
> HDFS-14353.009.patch, screenshot-1.png
[jira] [Comment Edited] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.
[ https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829882#comment-16829882 ]

maobaolong edited comment on HDFS-14353 at 4/30/19 2:19 AM:
---

[~elgoiri] I guess he will not reply to you these days, because I see that his last activity was on 21/Dec/18 19:28, and his last activity on Hadoop was on 09/May/18 18:38.

was (Author: maobaolong):

[~elgoiri] I guess he will not reply to you these days, because I see that his last activity was on 21/Dec/18 19:28.

> Erasure Coding: metrics xmitsInProgress become to negative.
> ---
>
> Key: HDFS-14353
> URL: https://issues.apache.org/jira/browse/HDFS-14353
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, erasure-coding
> Affects Versions: 3.3.0
> Reporter: maobaolong
> Assignee: maobaolong
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14353.001.patch, HDFS-14353.002.patch,
> HDFS-14353.003.patch, HDFS-14353.004.patch, HDFS-14353.005.patch,
> screenshot-1.png
[jira] [Comment Edited] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.
[ https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789208#comment-16789208 ]

maobaolong edited comment on HDFS-14353 at 3/11/19 6:47 AM:
---

I think the suspect code is around the classes ErasureCodingWorker and StripedBlockReconstructor.

{code:java}
public void processErasureCodingTasks(
    Collection<BlockECReconstructionInfo> ecTasks) {
  for (BlockECReconstructionInfo reconInfo : ecTasks) {
    int xmitsSubmitted = 0;
    try {
      StripedReconstructionInfo stripedReconInfo =
          new StripedReconstructionInfo(
              reconInfo.getExtendedBlock(),
              reconInfo.getErasureCodingPolicy(),
              reconInfo.getLiveBlockIndices(),
              reconInfo.getSourceDnInfos(),
              reconInfo.getTargetDnInfos(),
              reconInfo.getTargetStorageTypes(),
              reconInfo.getTargetStorageIDs());
      // It may throw IllegalArgumentException from task#stripedReader
      // constructor.
      final StripedBlockReconstructor task =
          new StripedBlockReconstructor(this, stripedReconInfo);
      if (task.hasValidTargets()) {
        // See HDFS-12044. We increase xmitsInProgress even the task is only
        // enqueued, so that
        // 1) NN will not send more tasks than what DN can execute and
        // 2) DN will not throw away reconstruction tasks, and instead keeps
        //    an unbounded number of tasks in the executor's task queue.
        xmitsSubmitted = Math.max((int)(task.getXmits() * xmitWeight), 1);
        getDatanode().incrementXmitsInProcess(xmitsSubmitted);
        stripedReconstructionPool.submit(task);
      } else {
        LOG.warn("No missing internal block. Skip reconstruction for task:{}",
            reconInfo);
      }
    } catch (Throwable e) {
      getDatanode().decrementXmitsInProgress(xmitsSubmitted);
      LOG.warn("Failed to reconstruct striped block {}",
          reconInfo.getExtendedBlock().getLocalBlock(), e);
    }
  }
}
{code}

was (Author: maobaolong):

I think the suspect code is around the classes ErasureCodingWorker and StripedBlockReconstructor.

> Erasure Coding: metrics xmitsInProgress become to negative.
> ---
>
> Key: HDFS-14353
> URL: https://issues.apache.org/jira/browse/HDFS-14353
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, erasure-coding
> Affects Versions: 3.3.0
> Reporter: maobaolong
> Priority: Major
> Attachments: screenshot-1.png
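As a hedged illustration of how a counter like xmitsInProgress could end up negative, the sketch below models one possibility: the amount released when a task completes differs from the weighted amount charged when it was submitted. The thread does not confirm this as the cause. The class `XmitsCounterSketch`, the method `runOneTask`, and the values 6 and 0.5 are hypothetical; only the `Math.max((int)(xmits * xmitWeight), 1)` expression is taken from the quoted code.

```java
import java.util.concurrent.atomic.AtomicInteger;

class XmitsCounterSketch {
  // Stand-in for the DataNode's xmitsInProgress counter.
  private final AtomicInteger xmitsInProgress = new AtomicInteger();

  /** Weighted amount charged on submit, never less than 1 (as in the quoted code). */
  static int weightedXmits(int xmits, float xmitWeight) {
    return Math.max((int) (xmits * xmitWeight), 1);
  }

  void increment(int delta) {
    xmitsInProgress.addAndGet(delta);
  }

  void decrement(int delta) {
    xmitsInProgress.addAndGet(-delta);
  }

  int get() {
    return xmitsInProgress.get();
  }

  /**
   * Hypothetical task lifecycle: charge the weighted amount on submit, then
   * release either the same weighted amount or the raw xmits on completion.
   * Returns the counter value after the task has finished.
   */
  static int runOneTask(int xmits, float xmitWeight, boolean releaseWeighted) {
    XmitsCounterSketch counter = new XmitsCounterSketch();
    int submitted = weightedXmits(xmits, xmitWeight);
    counter.increment(submitted);
    counter.decrement(releaseWeighted ? submitted : xmits);
    return counter.get();
  }

  public static void main(String[] args) {
    // Symmetric accounting: the counter returns to zero.
    System.out.println(runOneTask(6, 0.5f, true));  // prints 0
    // Asymmetric accounting: 3 charged, 6 released.
    System.out.println(runOneTask(6, 0.5f, false)); // prints -3
  }
}
```

The point of the sketch is only that every decrement path needs to release exactly what the matching increment charged. Whether the reconstructor's completion path in Hadoop does so is something to verify against the actual source.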