[jira] [Comment Edited] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.

2020-04-23 Thread Toshihiko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090748#comment-17090748
 ] 

Toshihiko Uchida edited comment on HDFS-14353 at 4/23/20, 5:09 PM:
---

[~ayushtkn] Thanks!
> what I remember the xmitsInProgress shouldn't go negative
Right.
> The <= change might not verify the said issue
Let me explain my suggestion, which I believe is also what [~maobaolong] 
intended, in more detail.
The waitFor line makes sure that the DataNodes have completed all EC 
reconstruction tasks:
- Each DataNode runs the DataXceiverServer thread in the dataXceiverServer 
threadGroup;
- The thread runs DataXceiver threads in the same threadGroup when it 
receives/sends data;
- After all EC reconstruction tasks finish, the DataXceiverServer thread should 
be the only thread belonging to the threadGroup (i.e., curDn.getXceiverCount() 
== 1).

The reason for <= 1 rather than == 1 is that the DataXceiverServer thread is 
not running on the dead DataNode, which was shut down to trigger EC 
reconstruction.

I have attached a patch that negates curDn.getXceiverCount() > 1.
Please kindly review.
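
The wait condition described above can be sketched as follows. This is only a minimal, self-contained illustration and not the actual test code: the class XceiverWaitSketch, its methods, and the use of IntSupplier as a stand-in for curDn.getXceiverCount() are all assumptions made for the example.

```java
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;

/** Hypothetical stand-in for the test's wait; not the actual Hadoop test code. */
class XceiverWaitSketch {

  /** True when every DataNode reports at most one thread in its xceiver group. */
  static boolean allReconstructionsDone(List<IntSupplier> xceiverCounts) {
    for (IntSupplier count : xceiverCounts) {
      // Live DataNode: only the DataXceiverServer thread remains (count == 1).
      // Dead DataNode: that thread is not running, so the count may be below 1.
      if (count.getAsInt() > 1) {
        return false;
      }
    }
    return true;
  }

  /** Polls the check until it holds (true) or the timeout elapses (false). */
  static boolean waitFor(BooleanSupplier check, long intervalMs, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```

With == 1 the check would never pass while a shut-down DataNode is in the list, which is why the sketch tests the negation of > 1.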


was (Author: touchida):
[~ayushtkn] Thanks!
> what I remember the xmitsInProgress shouldn't go negative
Yes.
> The <= change might not verify the said issue
Let me explain my suggestion, which would also be [~maobaolong]'s intention, in 
more detail.
The waitFor line makes sure that DataNodes completed all EC reconstruction 
tasks:
- Each DataNode runs the DataXceiverServer thread in the dataXceiverServer 
threadGroup;
- The thread runs DataXceiver threads in the same threadGroup when it 
receives/sends data;
- After all EC reconstruction tasks finish, the DataXceiverServer thread should 
be the only thread belonging to the threadGroup (i.e., curDn.getXceiverCount() 
== 1).

The reason for <= 1 is that the thread is not running on a dead DataNode, which 
was shutdown to cause EC reconstruction.

Attached the patch to negate curDn.getXceiverCount() > 1.
Please kindly review.

> Erasure Coding: metrics xmitsInProgress become to negative.
> ---
>
> Key: HDFS-14353
> URL: https://issues.apache.org/jira/browse/HDFS-14353
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, erasure-coding
> Affects Versions: 3.3.0
> Reporter: maobaolong
> Assignee: maobaolong
> Priority: Major
> Attachments: HDFS-14353.001.patch, HDFS-14353.002.patch, 
> HDFS-14353.003.patch, HDFS-14353.004.patch, HDFS-14353.005.patch, 
> HDFS-14353.006.patch, HDFS-14353.007.patch, HDFS-14353.008.patch, 
> HDFS-14353.009.patch, HDFS-14353.010.patch, HDFS-14353.010.patch, 
> screenshot-1.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.

2020-04-11 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081406#comment-17081406
 ] 

Brahma Reddy Battula edited comment on HDFS-14353 at 4/11/20, 6:22 PM:
---

[~elgoiri] As this has been reverted, can I remove the fix version for this Jira?


was (Author: brahmareddy):
Bulk update: moved all 3.3.0 non-blocker issues, please move back if it is a 
blocker.



> Erasure Coding: metrics xmitsInProgress become to negative.
> ---
>
> Key: HDFS-14353
> URL: https://issues.apache.org/jira/browse/HDFS-14353
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, erasure-coding
> Affects Versions: 3.3.0
> Reporter: maobaolong
> Assignee: maobaolong
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14353.001.patch, HDFS-14353.002.patch, 
> HDFS-14353.003.patch, HDFS-14353.004.patch, HDFS-14353.005.patch, 
> HDFS-14353.006.patch, HDFS-14353.007.patch, HDFS-14353.008.patch, 
> HDFS-14353.009.patch, screenshot-1.png
>
>







[jira] [Comment Edited] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.

2019-05-24 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847740#comment-16847740
 ] 

Íñigo Goiri edited comment on HDFS-14353 at 5/24/19 5:22 PM:
-

[~maobaolong] can you take a look?
It looks like the new wait for the number of xceivers is not happening.
As [~ayushtkn] mentioned, this actually failed in our builds... my 
bad here.
I'm reverting this.


was (Author: elgoiri):
[~maobaolong] can you take a look?
It looks like the new wait for the number of xceivers is not happening.

> Erasure Coding: metrics xmitsInProgress become to negative.
> ---
>
> Key: HDFS-14353
> URL: https://issues.apache.org/jira/browse/HDFS-14353
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, erasure-coding
> Affects Versions: 3.3.0
> Reporter: maobaolong
> Assignee: maobaolong
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14353.001.patch, HDFS-14353.002.patch, 
> HDFS-14353.003.patch, HDFS-14353.004.patch, HDFS-14353.005.patch, 
> HDFS-14353.006.patch, HDFS-14353.007.patch, HDFS-14353.008.patch, 
> HDFS-14353.009.patch, screenshot-1.png
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Comment Edited] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.

2019-04-29 Thread maobaolong (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829882#comment-16829882
 ] 

maobaolong edited comment on HDFS-14353 at 4/30/19 2:19 AM:


[~elgoiri] I guess he will not reply to you any time soon, because I found 
that his last activity was 21/Dec/18 19:28.

And his last activity on Hadoop was 09/May/18 18:38.


was (Author: maobaolong):
[~elgoiri] I guess he will not reply you these day, because i find his last 
activity is 21/Dec/18 19:28.

> Erasure Coding: metrics xmitsInProgress become to negative.
> ---
>
> Key: HDFS-14353
> URL: https://issues.apache.org/jira/browse/HDFS-14353
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, erasure-coding
> Affects Versions: 3.3.0
> Reporter: maobaolong
> Assignee: maobaolong
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14353.001.patch, HDFS-14353.002.patch, 
> HDFS-14353.003.patch, HDFS-14353.004.patch, HDFS-14353.005.patch, 
> screenshot-1.png
>
>







[jira] [Comment Edited] (HDFS-14353) Erasure Coding: metrics xmitsInProgress become to negative.

2019-03-11 Thread maobaolong (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789208#comment-16789208
 ] 

maobaolong edited comment on HDFS-14353 at 3/11/19 6:47 AM:


I think the suspect code is around the classes ErasureCodingWorker and 
StripedBlockReconstructor.

{code:java}
public void processErasureCodingTasks(
    Collection<BlockECReconstructionInfo> ecTasks) {
  for (BlockECReconstructionInfo reconInfo : ecTasks) {
    int xmitsSubmitted = 0;
    try {
      StripedReconstructionInfo stripedReconInfo =
          new StripedReconstructionInfo(
              reconInfo.getExtendedBlock(), reconInfo.getErasureCodingPolicy(),
              reconInfo.getLiveBlockIndices(), reconInfo.getSourceDnInfos(),
              reconInfo.getTargetDnInfos(), reconInfo.getTargetStorageTypes(),
              reconInfo.getTargetStorageIDs());
      // It may throw IllegalArgumentException from task#stripedReader
      // constructor.
      final StripedBlockReconstructor task =
          new StripedBlockReconstructor(this, stripedReconInfo);
      if (task.hasValidTargets()) {
        // See HDFS-12044. We increase xmitsInProgress even the task is only
        // enqueued, so that
        //   1) NN will not send more tasks than what DN can execute and
        //   2) DN will not throw away reconstruction tasks, and instead keeps
        //      an unbounded number of tasks in the executor's task queue.
        xmitsSubmitted = Math.max((int)(task.getXmits() * xmitWeight), 1);
        getDatanode().incrementXmitsInProcess(xmitsSubmitted);
        stripedReconstructionPool.submit(task);
      } else {
        LOG.warn("No missing internal block. Skip reconstruction for task:{}",
            reconInfo);
      }
    } catch (Throwable e) {
      getDatanode().decrementXmitsInProgress(xmitsSubmitted);
      LOG.warn("Failed to reconstruct striped block {}",
          reconInfo.getExtendedBlock().getLocalBlock(), e);
    }
  }
}
{code}
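
As an illustration of how a counter like xmitsInProgress could become negative, the sketch below models one hypothetical scenario: the amount added on submission is the weighted value from the code above, while the amount subtracted on completion is the unweighted xmits. This thread does not confirm that this is the actual cause. The class XmitsAccountingSketch and all of its methods are invented for the example and are not Hadoop APIs; only the weighting expression is taken from the snippet above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of the xmitsInProgress counter; not Hadoop's DataNode. */
class XmitsAccountingSketch {
  private final AtomicInteger xmitsInProgress = new AtomicInteger();

  /** Same weighting expression as in processErasureCodingTasks above. */
  static int weightedXmits(int xmits, double xmitWeight) {
    return Math.max((int) (xmits * xmitWeight), 1);
  }

  void increment(int delta) {
    xmitsInProgress.addAndGet(delta);
  }

  void decrement(int delta) {
    xmitsInProgress.addAndGet(-delta);
  }

  int get() {
    return xmitsInProgress.get();
  }

  /**
   * Simulates one task: adds the weighted xmits on submission, then releases
   * either the same weighted amount or (assumed mismatch) the raw xmits.
   * Returns the counter value after the task has finished.
   */
  static int afterOneTask(int xmits, double xmitWeight, boolean releaseWeighted) {
    XmitsAccountingSketch dn = new XmitsAccountingSketch();
    int submitted = weightedXmits(xmits, xmitWeight);
    dn.increment(submitted);
    dn.decrement(releaseWeighted ? submitted : xmits);
    return dn.get();
  }
}
```

In this model the counter returns to zero only when the released amount equals the submitted amount; with xmits = 6 and xmitWeight = 0.5, releasing the raw value leaves the counter below zero.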



was (Author: maobaolong):
The suspect code i think is around the class ErasureCodingWorker and 
StripedBlockReconstructor.

> Erasure Coding: metrics xmitsInProgress become to negative.
> ---
>
> Key: HDFS-14353
> URL: https://issues.apache.org/jira/browse/HDFS-14353
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode, erasure-coding
> Affects Versions: 3.3.0
> Reporter: maobaolong
> Priority: Major
> Attachments: screenshot-1.png
>
>



