[jira] [Work logged] (HDFS-16544) EC decoding failed due to invalid buffer

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16544?focusedWorklogId=758991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758991
 ]

ASF GitHub Bot logged work on HDFS-16544:
-

Author: ASF GitHub Bot
Created on: 20/Apr/22 06:25
Start Date: 20/Apr/22 06:25
Worklog Time Spent: 10m 
  Work Description: liubingxing commented on PR #4179:
URL: https://github.com/apache/hadoop/pull/4179#issuecomment-1103516579

   Thanks @tasanuma for the merge, and thanks @jojochuang.




Issue Time Tracking
---

Worklog Id: (was: 758991)
Time Spent: 1h 10m  (was: 1h)

> EC decoding failed due to invalid buffer
> 
>
> Key: HDFS-16544
> URL: https://issues.apache.org/jira/browse/HDFS-16544
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In [HDFS-16538|https://issues.apache.org/jira/browse/HDFS-16538], we 
> found an EC file decoding bug that occurs when more than one data block read 
> fails. Now we have found another bug, triggered by #StatefulStripeReader.decode.
> If we read an EC file whose {*}length is more than one stripe{*}, and the file 
> has *one data block* and *the first parity block* corrupted, this error will 
> happen.
> {code:java}
> org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not allowing null
>     at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132)
>     at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:48)
>     at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
>     at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
>     at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435)
>     at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
>     at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392)
>     at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315)
>     at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408)
>     at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918)
> {code}
>  
> Let's say we use EC(6+3), and data block[0] and the first parity block[6] 
> are corrupted.
>  # The readers for block[0] and block[6] will be closed after reading the 
> first stripe of the EC file;
>  # When the client reads the second stripe of the EC file, it triggers 
> #prepareParityChunk for block[6];
>  # The decodeInputs[6] will not be constructed because the reader for 
> block[6] was closed.
>  
> {code:java}
> boolean prepareParityChunk(int index) {
>   Preconditions.checkState(index >= dataBlkNum
>       && alignedStripe.chunks[index] == null);
>   if (readerInfos[index] != null && readerInfos[index].shouldSkip) {
>     alignedStripe.chunks[index] = new StripingChunk(StripingChunk.MISSING);
>     // we have failed the block reader before
>     return false;
>   }
>   final int parityIndex = index - dataBlkNum;
>   ByteBuffer buf = dfsStripedInputStream.getParityBuffer().duplicate();
>   buf.position(cellSize * parityIndex);
>   buf.limit(cellSize * parityIndex + (int) alignedStripe.range.spanInBlock);
>   decodeInputs[index] =
>       new ECChunk(buf.slice(), 0, (int) alignedStripe.range.spanInBlock);
>   alignedStripe.chunks[index] =
>       new StripingChunk(decodeInputs[index].getBuffer());
>   return true;
> }
> {code}
>  
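> One possible direction for a fix, shown here only as an illustrative sketch 
> (not necessarily the committed patch): allocate the decode input buffer 
> before the early-return branch, so decodeInputs[index] is never null even 
> when the reader was already closed after the first stripe.
> {code:java}
> boolean prepareParityChunk(int index) {
>   Preconditions.checkState(index >= dataBlkNum
>       && alignedStripe.chunks[index] == null);
>   // Build the decode input first, so decodeInputs[index] is never null.
>   final int parityIndex = index - dataBlkNum;
>   ByteBuffer buf = dfsStripedInputStream.getParityBuffer().duplicate();
>   buf.position(cellSize * parityIndex);
>   buf.limit(cellSize * parityIndex + (int) alignedStripe.range.spanInBlock);
>   decodeInputs[index] =
>       new ECChunk(buf.slice(), 0, (int) alignedStripe.range.spanInBlock);
>   if (readerInfos[index] != null && readerInfos[index].shouldSkip) {
>     // The block reader failed before; mark the chunk missing, but the
>     // buffer allocated above remains available as a decode target.
>     alignedStripe.chunks[index] = new StripingChunk(StripingChunk.MISSING);
>     return false;
>   }
>   alignedStripe.chunks[index] =
>       new StripingChunk(decodeInputs[index].getBuffer());
>   return true;
> }
> {code}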



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16544) EC decoding failed due to invalid buffer

2022-04-19 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16544.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.4
 Assignee: qinyuren
   Resolution: Fixed

> EC decoding failed due to invalid buffer
> 
>
> Key: HDFS-16544
> URL: https://issues.apache.org/jira/browse/HDFS-16544
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In [HDFS-16538|https://issues.apache.org/jira/browse/HDFS-16538], we 
> found an EC file decoding bug that occurs when more than one data block read 
> fails. Now we have found another bug, triggered by #StatefulStripeReader.decode.
> If we read an EC file whose {*}length is more than one stripe{*}, and the file 
> has *one data block* and *the first parity block* corrupted, this error will 
> happen.
> {code:java}
> org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not allowing null
>     at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132)
>     at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:48)
>     at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
>     at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
>     at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435)
>     at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
>     at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392)
>     at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315)
>     at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408)
>     at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918)
> {code}
>  
> Let's say we use EC(6+3), and data block[0] and the first parity block[6] 
> are corrupted.
>  # The readers for block[0] and block[6] will be closed after reading the 
> first stripe of the EC file;
>  # When the client reads the second stripe of the EC file, it triggers 
> #prepareParityChunk for block[6];
>  # The decodeInputs[6] will not be constructed because the reader for 
> block[6] was closed.
>  
> {code:java}
> boolean prepareParityChunk(int index) {
>   Preconditions.checkState(index >= dataBlkNum
>       && alignedStripe.chunks[index] == null);
>   if (readerInfos[index] != null && readerInfos[index].shouldSkip) {
>     alignedStripe.chunks[index] = new StripingChunk(StripingChunk.MISSING);
>     // we have failed the block reader before
>     return false;
>   }
>   final int parityIndex = index - dataBlkNum;
>   ByteBuffer buf = dfsStripedInputStream.getParityBuffer().duplicate();
>   buf.position(cellSize * parityIndex);
>   buf.limit(cellSize * parityIndex + (int) alignedStripe.range.spanInBlock);
>   decodeInputs[index] =
>       new ECChunk(buf.slice(), 0, (int) alignedStripe.range.spanInBlock);
>   alignedStripe.chunks[index] =
>       new StripingChunk(decodeInputs[index].getBuffer());
>   return true;
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16544) EC decoding failed due to invalid buffer

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16544?focusedWorklogId=758977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758977
 ]

ASF GitHub Bot logged work on HDFS-16544:
-

Author: ASF GitHub Bot
Created on: 20/Apr/22 06:04
Start Date: 20/Apr/22 06:04
Worklog Time Spent: 10m 
  Work Description: tasanuma merged PR #4179:
URL: https://github.com/apache/hadoop/pull/4179




Issue Time Tracking
---

Worklog Id: (was: 758977)
Time Spent: 1h  (was: 50m)

> EC decoding failed due to invalid buffer
> 
>
> Key: HDFS-16544
> URL: https://issues.apache.org/jira/browse/HDFS-16544
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: qinyuren
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In [HDFS-16538|https://issues.apache.org/jira/browse/HDFS-16538], we 
> found an EC file decoding bug that occurs when more than one data block read 
> fails. Now we have found another bug, triggered by #StatefulStripeReader.decode.
> If we read an EC file whose {*}length is more than one stripe{*}, and the file 
> has *one data block* and *the first parity block* corrupted, this error will 
> happen.
> {code:java}
> org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not allowing null
>     at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132)
>     at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:48)
>     at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
>     at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
>     at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435)
>     at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
>     at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392)
>     at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315)
>     at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408)
>     at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918)
> {code}
>  
> Let's say we use EC(6+3), and data block[0] and the first parity block[6] 
> are corrupted.
>  # The readers for block[0] and block[6] will be closed after reading the 
> first stripe of the EC file;
>  # When the client reads the second stripe of the EC file, it triggers 
> #prepareParityChunk for block[6];
>  # The decodeInputs[6] will not be constructed because the reader for 
> block[6] was closed.
>  
> {code:java}
> boolean prepareParityChunk(int index) {
>   Preconditions.checkState(index >= dataBlkNum
>       && alignedStripe.chunks[index] == null);
>   if (readerInfos[index] != null && readerInfos[index].shouldSkip) {
>     alignedStripe.chunks[index] = new StripingChunk(StripingChunk.MISSING);
>     // we have failed the block reader before
>     return false;
>   }
>   final int parityIndex = index - dataBlkNum;
>   ByteBuffer buf = dfsStripedInputStream.getParityBuffer().duplicate();
>   buf.position(cellSize * parityIndex);
>   buf.limit(cellSize * parityIndex + (int) alignedStripe.range.spanInBlock);
>   decodeInputs[index] =
>       new ECChunk(buf.slice(), 0, (int) alignedStripe.range.spanInBlock);
>   alignedStripe.chunks[index] =
>       new StripingChunk(decodeInputs[index].getBuffer());
>   return true;
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16528) Reconfigure slow peer enable for Namenode

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16528?focusedWorklogId=758971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758971
 ]

ASF GitHub Bot logged work on HDFS-16528:
-

Author: ASF GitHub Bot
Created on: 20/Apr/22 05:41
Start Date: 20/Apr/22 05:41
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on code in PR #4186:
URL: https://github.com/apache/hadoop/pull/4186#discussion_r853744159


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java:
##
@@ -366,6 +365,21 @@ public class DatanodeManager {
         DFSConfigKeys.DFS_NAMENODE_BLOCKS_PER_POSTPONEDBLOCKS_RESCAN_KEY_DEFAULT);
   }
 
+  /**
+   * Determines whether slow peer tracker should be enabled. If
+   * dataNodePeerStatsEnabledVal is true, slow peer tracker is initialized.
+   *
+   * @param conf The configuration to use while initializing slowPeerTracker.
+   * @param timer Timer object for slowPeerTracker.
+   * @param dataNodePeerStatsEnabledVal To determine whether slow peer
+   *          tracking should be enabled.
+   */
+  public void initSlowPeerTracker(Configuration conf, Timer timer,
+      boolean dataNodePeerStatsEnabledVal) {
+    this.dataNodePeerStatsEnabled = dataNodePeerStatsEnabledVal;
+    this.slowPeerTracker = dataNodePeerStatsEnabled ?
+        new SlowPeerTracker(conf, timer) : null;

Review Comment:
   Done, please take a look @tomscut





Issue Time Tracking
---

Worklog Id: (was: 758971)
Time Spent: 1h 50m  (was: 1h 40m)

> Reconfigure slow peer enable for Namenode
> -
>
> Key: HDFS-16528
> URL: https://issues.apache.org/jira/browse/HDFS-16528
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> HDFS-16396 provides reconfig options for several configs associated with 
> slownodes in the Datanode. Similarly, HDFS-16287 and HDFS-16327 have added 
> some slownode-related configs as reconfig options in the Namenode.
> The purpose of this Jira is to add DFS_DATANODE_PEER_STATS_ENABLED_KEY as a 
> reconfigurable option for the Namenode (similar to how HDFS-16396 has 
> included it for the Datanode).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16528) Reconfigure slow peer enable for Namenode

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16528?focusedWorklogId=758968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758968
 ]

ASF GitHub Bot logged work on HDFS-16528:
-

Author: ASF GitHub Bot
Created on: 20/Apr/22 05:27
Start Date: 20/Apr/22 05:27
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on code in PR #4186:
URL: https://github.com/apache/hadoop/pull/4186#discussion_r853737690


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java:
##
@@ -366,6 +365,21 @@ public class DatanodeManager {
         DFSConfigKeys.DFS_NAMENODE_BLOCKS_PER_POSTPONEDBLOCKS_RESCAN_KEY_DEFAULT);
   }
 
+  /**
+   * Determines whether slow peer tracker should be enabled. If
+   * dataNodePeerStatsEnabledVal is true, slow peer tracker is initialized.
+   *
+   * @param conf The configuration to use while initializing slowPeerTracker.
+   * @param timer Timer object for slowPeerTracker.
+   * @param dataNodePeerStatsEnabledVal To determine whether slow peer
+   *          tracking should be enabled.
+   */
+  public void initSlowPeerTracker(Configuration conf, Timer timer,
+      boolean dataNodePeerStatsEnabledVal) {
+    this.dataNodePeerStatsEnabled = dataNodePeerStatsEnabledVal;
+    this.slowPeerTracker = dataNodePeerStatsEnabled ?
+        new SlowPeerTracker(conf, timer) : null;

Review Comment:
   Let me get back to this in a while.





Issue Time Tracking
---

Worklog Id: (was: 758968)
Time Spent: 1h 40m  (was: 1.5h)

> Reconfigure slow peer enable for Namenode
> -
>
> Key: HDFS-16528
> URL: https://issues.apache.org/jira/browse/HDFS-16528
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> HDFS-16396 provides reconfig options for several configs associated with 
> slownodes in the Datanode. Similarly, HDFS-16287 and HDFS-16327 have added 
> some slownode-related configs as reconfig options in the Namenode.
> The purpose of this Jira is to add DFS_DATANODE_PEER_STATS_ENABLED_KEY as a 
> reconfigurable option for the Namenode (similar to how HDFS-16396 has 
> included it for the Datanode).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16528) Reconfigure slow peer enable for Namenode

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16528?focusedWorklogId=758964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758964
 ]

ASF GitHub Bot logged work on HDFS-16528:
-

Author: ASF GitHub Bot
Created on: 20/Apr/22 05:18
Start Date: 20/Apr/22 05:18
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on code in PR #4186:
URL: https://github.com/apache/hadoop/pull/4186#discussion_r853733962


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java:
##
@@ -2406,27 +2412,49 @@ String reconfigureSlowNodesParameters(final DatanodeManager datanodeManager,
     namesystem.writeLock();
     String result;
     try {
-      if (property.equals(DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_KEY)) {
-        boolean enable = (newVal == null ? DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_DEFAULT :
+      switch (property) {
+      case DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_KEY: {
+        boolean enable = (newVal == null ?
+            DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_DEFAULT :
             Boolean.parseBoolean(newVal));
         result = Boolean.toString(enable);
         datanodeManager.setAvoidSlowDataNodesForReadEnabled(enable);
-      } else if (property.equals(
-          DFS_NAMENODE_BLOCKPLACEMENTPOLICY_EXCLUDE_SLOW_NODES_ENABLED_KEY)) {
+        break;
+      }
+      case DFS_NAMENODE_BLOCKPLACEMENTPOLICY_EXCLUDE_SLOW_NODES_ENABLED_KEY: {
         boolean enable = (newVal == null ?
             DFS_NAMENODE_BLOCKPLACEMENTPOLICY_EXCLUDE_SLOW_NODES_ENABLED_DEFAULT :
             Boolean.parseBoolean(newVal));
         result = Boolean.toString(enable);
         bm.setExcludeSlowNodesEnabled(enable);
-      } else if (property.equals(DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_KEY)) {
+        break;
+      }
+      case DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_KEY: {
         int maxSlowpeerCollectNodes = (newVal == null ?
             DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_DEFAULT :
             Integer.parseInt(newVal));
         result = Integer.toString(maxSlowpeerCollectNodes);
         datanodeManager.setMaxSlowpeerCollectNodes(maxSlowpeerCollectNodes);
-      } else {
-        throw new IllegalArgumentException("Unexpected property " +
-            property + " in reconfigureSlowNodesParameters");
+        break;
+      }
+      case DFS_DATANODE_PEER_STATS_ENABLED_KEY: {
+        Timer timer = new Timer();
+        if (newVal != null && !newVal.equalsIgnoreCase("true") &&
+            !newVal.equalsIgnoreCase("false")) {
+          throw new ReconfigurationException(property, newVal,
+              getConf().get(property),
+              new NumberFormatException(newVal + " is not boolean value"));

Review Comment:
   Sounds good, let me change this. Thanks @tomscut 





Issue Time Tracking
---

Worklog Id: (was: 758964)
Time Spent: 1.5h  (was: 1h 20m)

> Reconfigure slow peer enable for Namenode
> -
>
> Key: HDFS-16528
> URL: https://issues.apache.org/jira/browse/HDFS-16528
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> HDFS-16396 provides reconfig options for several configs associated with 
> slownodes in the Datanode. Similarly, HDFS-16287 and HDFS-16327 have added 
> some slownode-related configs as reconfig options in the Namenode.
> The purpose of this Jira is to add DFS_DATANODE_PEER_STATS_ENABLED_KEY as a 
> reconfigurable option for the Namenode (similar to how HDFS-16396 has 
> included it for the Datanode).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16533) COMPOSITE_CRC failed between replicated file and striped file.

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16533?focusedWorklogId=758956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758956
 ]

ASF GitHub Bot logged work on HDFS-16533:
-

Author: ASF GitHub Bot
Created on: 20/Apr/22 04:26
Start Date: 20/Apr/22 04:26
Worklog Time Spent: 10m 
  Work Description: jojochuang commented on code in PR #4155:
URL: https://github.com/apache/hadoop/pull/4155#discussion_r853715096


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FileChecksumHelper.java:
##
@@ -316,18 +317,22 @@ FileChecksum makeCompositeCrcResult() throws IOException {
             "Added blockCrc 0x{} for block index {} of size {}",
             Integer.toString(blockCrc, 16), i, block.getBlockSize());
       }
-
-      // NB: In some cases the located blocks have their block size adjusted
-      // explicitly based on the requested length, but not all cases;
-      // these numbers may or may not reflect actual sizes on disk.
-      long reportedLastBlockSize =
-          blockLocations.getLastLocatedBlock().getBlockSize();
-      long consumedLastBlockLength = reportedLastBlockSize;
-      if (length - sumBlockLengths < reportedLastBlockSize) {
-        LOG.warn(
-            "Last block length {} is less than reportedLastBlockSize {}",
-            length - sumBlockLengths, reportedLastBlockSize);
-        consumedLastBlockLength = length - sumBlockLengths;
+      LocatedBlock nextBlock = locatedBlocks.get(i);
+      long consumedLastBlockLength = Math.min(length - sumBlockLengths,
+          nextBlock.getBlockSize());
+      LocatedBlock lastBlock = blockLocations.getLastLocatedBlock();
+      if (nextBlock.equals(lastBlock)) {

Review Comment:
   Could you elaborate on what this check is? Looking at the test case, I 
assume these few lines distinguish replicated vs striped blocks. Am I right? 
How about turning them into a helper method that is more readable?
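
   For example, the comparison could move into a small named helper (a
   sketch only; the method name isLastBlockOfFile is illustrative, not
   part of the patch):

{code:java}
// True when the block being consumed is the file's last located block; in
// that case the reported block size may exceed the remaining request length.
private static boolean isLastBlockOfFile(LocatedBlock nextBlock,
    LocatedBlock lastBlock) {
  return nextBlock.equals(lastBlock);
}
{code}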





Issue Time Tracking
---

Worklog Id: (was: 758956)
Time Spent: 1.5h  (was: 1h 20m)

> COMPOSITE_CRC failed between replicated file and striped file.
> --
>
> Key: HDFS-16533
> URL: https://issues.apache.org/jira/browse/HDFS-16533
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-16533.001.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> After testing the COMPOSITE_CRC with some random lengths on a replicated 
> file and a striped file that holds the same data, it failed. The reproduce 
> step is like this:
> {code:java}
> @Test(timeout = 9)
> public void testStripedAndReplicatedFileChecksum2() throws Exception {
>   int abnormalSize = (dataBlocks * 2 - 2) * blockSize +
>       (int) (blockSize * 0.5);
>   prepareTestFiles(abnormalSize, new String[] {stripedFile1, replicatedFile});
>   int loopNumber = 100;
>   while (loopNumber-- > 0) {
>     int verifyLength = ThreadLocalRandom.current()
>         .nextInt(10, abnormalSize);
>     FileChecksum stripedFileChecksum1 = getFileChecksum(stripedFile1,
>         verifyLength, false);
>     FileChecksum replicatedFileChecksum = getFileChecksum(replicatedFile,
>         verifyLength, false);
>     if (checksumCombineMode.equals(ChecksumCombineMode.COMPOSITE_CRC.name())) {
>       Assert.assertEquals(stripedFileChecksum1, replicatedFileChecksum);
>     } else {
>       Assert.assertNotEquals(stripedFileChecksum1, replicatedFileChecksum);
>     }
>   }
> }
> {code}
> After tracing the root cause, `FileChecksumHelper#makeCompositeCrcResult` 
> may compute an incorrect `consumedLastBlockLength` when updating the 
> checksum for the last block of the requested length, which may not be the 
> last block in the file.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16528) Reconfigure slow peer enable for Namenode

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16528?focusedWorklogId=758914&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758914
 ]

ASF GitHub Bot logged work on HDFS-16528:
-

Author: ASF GitHub Bot
Created on: 20/Apr/22 01:38
Start Date: 20/Apr/22 01:38
Worklog Time Spent: 10m 
  Work Description: tomscut commented on code in PR #4186:
URL: https://github.com/apache/hadoop/pull/4186#discussion_r853655682


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java:
##
@@ -366,6 +365,21 @@ public class DatanodeManager {
         DFSConfigKeys.DFS_NAMENODE_BLOCKS_PER_POSTPONEDBLOCKS_RESCAN_KEY_DEFAULT);
   }
 
+  /**
+   * Determines whether slow peer tracker should be enabled. If
+   * dataNodePeerStatsEnabledVal is true, slow peer tracker is initialized.
+   *
+   * @param conf The configuration to use while initializing slowPeerTracker.
+   * @param timer Timer object for slowPeerTracker.
+   * @param dataNodePeerStatsEnabledVal To determine whether slow peer
+   *          tracking should be enabled.
+   */
+  public void initSlowPeerTracker(Configuration conf, Timer timer,
+      boolean dataNodePeerStatsEnabledVal) {
+    this.dataNodePeerStatsEnabled = dataNodePeerStatsEnabledVal;
+    this.slowPeerTracker = dataNodePeerStatsEnabled ?
+        new SlowPeerTracker(conf, timer) : null;

Review Comment:
   If `this.slowPeerTracker` is set to null directly, it may cause an NPE. 
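
   One way to avoid the NPE, sketched under the assumption of a no-op
   `SlowPeerDisabledTracker` subclass (the class name is illustrative, not
   confirmed by this PR): fall back to a disabled tracker instead of null,
   so call sites never need a null check.

{code:java}
public void initSlowPeerTracker(Configuration conf, Timer timer,
    boolean dataNodePeerStatsEnabledVal) {
  this.dataNodePeerStatsEnabled = dataNodePeerStatsEnabledVal;
  // A disabled tracker accepts report calls as no-ops and returns empty
  // results, so no caller can dereference null.
  this.slowPeerTracker = dataNodePeerStatsEnabled
      ? new SlowPeerTracker(conf, timer)
      : new SlowPeerDisabledTracker(conf, timer);  // assumed no-op subclass
}
{code}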





Issue Time Tracking
---

Worklog Id: (was: 758914)
Time Spent: 1h 20m  (was: 1h 10m)

> Reconfigure slow peer enable for Namenode
> -
>
> Key: HDFS-16528
> URL: https://issues.apache.org/jira/browse/HDFS-16528
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> HDFS-16396 provides reconfig options for several configs associated with 
> slownodes in the Datanode. Similarly, HDFS-16287 and HDFS-16327 have added 
> some slownode-related configs as reconfig options in the Namenode.
> The purpose of this Jira is to add DFS_DATANODE_PEER_STATS_ENABLED_KEY as a 
> reconfigurable option for the Namenode (similar to how HDFS-16396 has 
> included it for the Datanode).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16528) Reconfigure slow peer enable for Namenode

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16528?focusedWorklogId=758913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758913
 ]

ASF GitHub Bot logged work on HDFS-16528:
-

Author: ASF GitHub Bot
Created on: 20/Apr/22 01:37
Start Date: 20/Apr/22 01:37
Worklog Time Spent: 10m 
  Work Description: tomscut commented on code in PR #4186:
URL: https://github.com/apache/hadoop/pull/4186#discussion_r853655682


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java:
##
@@ -366,6 +365,21 @@ public class DatanodeManager {
         DFSConfigKeys.DFS_NAMENODE_BLOCKS_PER_POSTPONEDBLOCKS_RESCAN_KEY_DEFAULT);
   }
 
+  /**
+   * Determines whether slow peer tracker should be enabled. If
+   * dataNodePeerStatsEnabledVal is true, slow peer tracker is initialized.
+   *
+   * @param conf The configuration to use while initializing slowPeerTracker.
+   * @param timer Timer object for slowPeerTracker.
+   * @param dataNodePeerStatsEnabledVal To determine whether slow peer
+   *          tracking should be enabled.
+   */
+  public void initSlowPeerTracker(Configuration conf, Timer timer,
+      boolean dataNodePeerStatsEnabledVal) {
+    this.dataNodePeerStatsEnabled = dataNodePeerStatsEnabledVal;
+    this.slowPeerTracker = dataNodePeerStatsEnabled ?
+        new SlowPeerTracker(conf, timer) : null;

Review Comment:
   If this.slowPeerTracker is set to null, it may cause an NPE. 





Issue Time Tracking
---

Worklog Id: (was: 758913)
Time Spent: 1h 10m  (was: 1h)

> Reconfigure slow peer enable for Namenode
> -
>
> Key: HDFS-16528
> URL: https://issues.apache.org/jira/browse/HDFS-16528
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> HDFS-16396 provides reconfig options for several configs associated with 
> slownodes in the Datanode. Similarly, HDFS-16287 and HDFS-16327 have added 
> some slownode-related configs as reconfig options in the Namenode.
> The purpose of this Jira is to add DFS_DATANODE_PEER_STATS_ENABLED_KEY as a 
> reconfigurable option for the Namenode (similar to how HDFS-16396 has 
> included it for the Datanode).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16528) Reconfigure slow peer enable for Namenode

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16528?focusedWorklogId=758912&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758912
 ]

ASF GitHub Bot logged work on HDFS-16528:
-

Author: ASF GitHub Bot
Created on: 20/Apr/22 01:30
Start Date: 20/Apr/22 01:30
Worklog Time Spent: 10m 
  Work Description: tomscut commented on code in PR #4186:
URL: https://github.com/apache/hadoop/pull/4186#discussion_r853653286


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java:
##
@@ -2406,27 +2412,49 @@ String reconfigureSlowNodesParameters(final DatanodeManager datanodeManager,
     namesystem.writeLock();
     String result;
     try {
-      if (property.equals(DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_KEY)) {
-        boolean enable = (newVal == null ? DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_DEFAULT :
+      switch (property) {
+      case DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_KEY: {
+        boolean enable = (newVal == null ?
+            DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_DEFAULT :
             Boolean.parseBoolean(newVal));
         result = Boolean.toString(enable);
         datanodeManager.setAvoidSlowDataNodesForReadEnabled(enable);
-      } else if (property.equals(
-          DFS_NAMENODE_BLOCKPLACEMENTPOLICY_EXCLUDE_SLOW_NODES_ENABLED_KEY)) {
+        break;
+      }
+      case DFS_NAMENODE_BLOCKPLACEMENTPOLICY_EXCLUDE_SLOW_NODES_ENABLED_KEY: {
         boolean enable = (newVal == null ?
             DFS_NAMENODE_BLOCKPLACEMENTPOLICY_EXCLUDE_SLOW_NODES_ENABLED_DEFAULT :
             Boolean.parseBoolean(newVal));
         result = Boolean.toString(enable);
         bm.setExcludeSlowNodesEnabled(enable);
-      } else if (property.equals(DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_KEY)) {
+        break;
+      }
+      case DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_KEY: {
         int maxSlowpeerCollectNodes = (newVal == null ?
             DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_DEFAULT :
             Integer.parseInt(newVal));
         result = Integer.toString(maxSlowpeerCollectNodes);
         datanodeManager.setMaxSlowpeerCollectNodes(maxSlowpeerCollectNodes);
-      } else {
-        throw new IllegalArgumentException("Unexpected property " +
-            property + " in reconfigureSlowNodesParameters");
+        break;
+      }
+      case DFS_DATANODE_PEER_STATS_ENABLED_KEY: {
+        Timer timer = new Timer();
+        if (newVal != null && !newVal.equalsIgnoreCase("true") &&
+            !newVal.equalsIgnoreCase("false")) {
+          throw new ReconfigurationException(property, newVal,
+              getConf().get(property),
+              new NumberFormatException(newVal + " is not boolean value"));

Review Comment:
   Hi @virajjasani, should this throw an IllegalArgumentException instead?
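
   For reference, a strict boolean check might look like this (a sketch
   only; the helper name parseStrictBoolean is made up for illustration):

{code:java}
private static boolean parseStrictBoolean(String property, String newVal,
    boolean defaultVal) {
  if (newVal == null) {
    return defaultVal;  // an unset value falls back to the default
  }
  if (newVal.equalsIgnoreCase("true") || newVal.equalsIgnoreCase("false")) {
    return Boolean.parseBoolean(newVal);
  }
  // An invalid value is an argument problem, not a number-format problem.
  throw new IllegalArgumentException(
      newVal + " is not a boolean value for " + property);
}
{code}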





Issue Time Tracking
---

Worklog Id: (was: 758912)
Time Spent: 1h  (was: 50m)

> Reconfigure slow peer enable for Namenode
> -
>
> Key: HDFS-16528
> URL: https://issues.apache.org/jira/browse/HDFS-16528
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> HDFS-16396 provides reconfig options for several configs associated with 
> slownodes in the Datanode. Similarly, HDFS-16287 and HDFS-16327 have added 
> some slownode-related configs as reconfig options in the Namenode.
> The purpose of this Jira is to add DFS_DATANODE_PEER_STATS_ENABLED_KEY as a 
> reconfigurable option for the Namenode (similar to how HDFS-16396 has 
> included it for the Datanode).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16547) [SBN read] Namenode in safe mode should not be transfered to observer state

2022-04-19 Thread tomscut (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tomscut updated HDFS-16547:
---
Summary: [SBN read] Namenode in safe mode should not be transfered to 
observer state  (was: [SBN read] Namenode in safe mode should not be transfer 
to observer state)

> [SBN read] Namenode in safe mode should not be transfered to observer state
> ---
>
> Key: HDFS-16547
> URL: https://issues.apache.org/jira/browse/HDFS-16547
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, when a Namenode is in safe mode (while starting up, or after 
> entering safe mode manually), we can transfer this Namenode to Observer by 
> command. This Observer node may receive many requests and then throw a 
> SafemodeException, which causes unnecessary failover on the client.
> So a Namenode in safe mode should not be transferred to the observer state.
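>
> A guard along these lines would reject the transition up front (an 
> illustrative sketch only; the method signature and field names are assumed, 
> not necessarily the committed patch):
> {code:java}
> public synchronized void transitionToObserver(StateChangeRequestInfo req)
>     throws IOException {
>   if (namesystem.isInSafeMode()) {
>     // Refuse the transition so clients never fail over to an observer
>     // that would only throw SafeModeException.
>     throw new ServiceFailedException(
>         "Cannot transition to observer: NameNode is in safe mode.");
>   }
>   // ... proceed with the normal HA state transition ...
> }
> {code}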



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16521) DFS API to retrieve slow datanodes

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16521?focusedWorklogId=758849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758849
 ]

ASF GitHub Bot logged work on HDFS-16521:
-

Author: ASF GitHub Bot
Created on: 19/Apr/22 22:30
Start Date: 19/Apr/22 22:30
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4107:
URL: https://github.com/apache/hadoop/pull/4107#issuecomment-1103229191

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  5s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  buf  |   0m  1s |  |  buf was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m 48s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  28m 17s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   7m  0s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   6m 31s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 37s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 38s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   3m  0s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   3m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   8m  8s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 47s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 56s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m 50s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  cc  |   6m 50s |  |  the patch passed  |
   | -1 :x: |  javac  |   6m 50s | 
[/results-compile-javac-hadoop-hdfs-project-jdkUbuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4107/5/artifact/out/results-compile-javac-hadoop-hdfs-project-jdkUbuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04.txt)
 |  hadoop-hdfs-project-jdkUbuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 generated 1 new + 651 unchanged - 0 
fixed = 652 total (was 651)  |
   | +1 :green_heart: |  compile  |   6m 19s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  cc  |   6m 19s |  |  the patch passed  |
   | -1 :x: |  javac  |   6m 19s | 
[/results-compile-javac-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4107/5/artifact/out/results-compile-javac-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt)
 |  hadoop-hdfs-project-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 
with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 generated 1 new + 
629 unchanged - 0 fixed = 630 total (was 629)  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 19s |  |  hadoop-hdfs-project: The 
patch generated 0 new + 456 unchanged - 1 fixed = 456 total (was 457)  |
   | +1 :green_heart: |  mvnsite  |   3m 27s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m 37s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   3m 28s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   8m 58s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m 58s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 27s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 392m 41s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4107/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green

[jira] [Work logged] (HDFS-16547) [SBN read] Namenode in safe mode should not be transfer to observer state

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16547?focusedWorklogId=758633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758633
 ]

ASF GitHub Bot logged work on HDFS-16547:
-

Author: ASF GitHub Bot
Created on: 19/Apr/22 17:28
Start Date: 19/Apr/22 17:28
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4201:
URL: https://github.com/apache/hadoop/pull/4201#issuecomment-1102907082

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 40s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m 41s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 25s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 43s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 49s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 49s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 17s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 26s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  2s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 33s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 249m 29s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 13s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 360m  3s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4201 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 17d48942fd53 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 
23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 0ed4aa1dcaeb267708033f3867e8b9b2ee463944 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/1/testReport/ |
   | Max. process+thread count | 3300 (vs. ulimit of 5500) |
   | modules | C: hadoop-

[jira] [Work logged] (HDFS-16544) EC decoding failed due to invalid buffer

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16544?focusedWorklogId=758475&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758475
 ]

ASF GitHub Bot logged work on HDFS-16544:
-

Author: ASF GitHub Bot
Created on: 19/Apr/22 13:43
Start Date: 19/Apr/22 13:43
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4179:
URL: https://github.com/apache/hadoop/pull/4179#issuecomment-1102674037

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 54s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m 42s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  28m 21s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   6m 50s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   6m 27s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 31s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 50s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m  9s |  |  trunk passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 32s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m 36s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m 48s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 26s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m 41s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   6m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m 21s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   6m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 24s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 42s |  |  the patch passed with JDK 
Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  9s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m 24s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m 54s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 27s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 370m 32s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4179/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  6s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 527m 12s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
   |   | hadoop.hdfs.TestClientProtocolForPipelineRecovery |
   |   | hadoop.hdfs.TestReplaceDatanodeFailureReplication |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4179/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4179 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 542720fc08b8 4.15.0-166-generic #174-Ubuntu SMP Wed Dec 8 
19:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 2b6adbcb61fa76d0147dfb1365ccb3a2ca3360a6 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Mul

[jira] [Updated] (HDFS-16547) [SBN read] Namenode in safe mode should not be transfer to observer state

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16547:
--
Labels: pull-request-available  (was: )

> [SBN read] Namenode in safe mode should not be transfer to observer state
> -
>
> Key: HDFS-16547
> URL: https://issues.apache.org/jira/browse/HDFS-16547
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, when a Namenode is in safe mode (while starting up, or after 
> entering safe mode manually), we can transfer this Namenode to Observer by 
> command. This Observer node may receive many requests and then throw a 
> SafemodeException, which causes unnecessary failover on the client.
> So a Namenode in safe mode should not be transferred to the observer state.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16547) [SBN read] Namenode in safe mode should not be transfer to observer state

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16547?focusedWorklogId=758400&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758400
 ]

ASF GitHub Bot logged work on HDFS-16547:
-

Author: ASF GitHub Bot
Created on: 19/Apr/22 11:27
Start Date: 19/Apr/22 11:27
Worklog Time Spent: 10m 
  Work Description: tomscut opened a new pull request, #4201:
URL: https://github.com/apache/hadoop/pull/4201

   JIRA: HDFS-16547.
   
   Currently, when a Namenode is in safe mode (while starting up, or after 
entering safe mode manually), we can transfer this Namenode to Observer by 
command. This Observer node may receive many requests and then throw a 
SafemodeException, which causes unnecessary failover on the client.
   
   So a Namenode in safe mode should not be transferred to the observer state.




Issue Time Tracking
---

Worklog Id: (was: 758400)
Remaining Estimate: 0h
Time Spent: 10m

> [SBN read] Namenode in safe mode should not be transfer to observer state
> -
>
> Key: HDFS-16547
> URL: https://issues.apache.org/jira/browse/HDFS-16547
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, when a Namenode is in safe mode (while starting up, or after 
> entering safe mode manually), we can transfer this Namenode to Observer by 
> command. This Observer node may receive many requests and then throw a 
> SafemodeException, which causes unnecessary failover on the client.
> So a Namenode in safe mode should not be transferred to the observer state.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16547) [SBN read] Namenode in safe mode should not be transfer to observer state

2022-04-19 Thread tomscut (Jira)
tomscut created HDFS-16547:
--

 Summary: [SBN read] Namenode in safe mode should not be transfer 
to observer state
 Key: HDFS-16547
 URL: https://issues.apache.org/jira/browse/HDFS-16547
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut


Currently, when a Namenode is in safe mode (while starting up, or after 
entering safe mode manually), we can transfer this Namenode to Observer by 
command. This Observer node may receive many requests and then throw a 
SafemodeException, which causes unnecessary failover on the client.

So a Namenode in safe mode should not be transferred to the observer state.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16531) Avoid setReplication logging an edit record if old replication equals the new value

2022-04-19 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16531:
-
Fix Version/s: 3.2.4
   3.3.4

> Avoid setReplication logging an edit record if old replication equals the new 
> value
> ---
>
> Key: HDFS-16531
> URL: https://issues.apache.org/jira/browse/HDFS-16531
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I recently came across a NN log where about 800k setRep calls were made, 
> setting the replication from 3 to 3, i.e. leaving it unchanged.
> Even in a case like this, we log an edit record and an audit log entry, and 
> perform some quota checks, etc.
> I believe it should be possible to avoid some of the work if we check for 
> oldRep == newRep and jump out of the method early, as sketched below.
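>
> The early exit could be as simple as the following sketch (the surrounding 
> names are assumed for illustration, not the actual FSDirAttrOp code):
> {code:java}
> final short oldRepl = file.getFileReplication();
> if (oldRepl == replication) {
>   // Unchanged replication: skip the quota update, the edit record and
>   // the block-manager work, and return immediately.
>   return true;
> }
> {code}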



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16355) Improve the description of dfs.block.scanner.volume.bytes.per.second

2022-04-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-16355:
--
Fix Version/s: 3.4.0

> Improve the description of dfs.block.scanner.volume.bytes.per.second
> 
>
> Key: HDFS-16355
> URL: https://issues.apache.org/jira/browse/HDFS-16355
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, hdfs
>Affects Versions: 3.3.1
>Reporter: guophilipse
>Assignee: guophilipse
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The datanode block scanner will be disabled if 
> `dfs.block.scanner.volume.bytes.per.second` is configured to be less than or 
> equal to zero; we can improve the description to state this explicitly.
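>
> The behavior the improved description should spell out, as a sketch (the 
> default value shown here is an assumption; check hdfs-default.xml):
> {code:java}
> long scannerBytesPerSec = conf.getLong(
>     "dfs.block.scanner.volume.bytes.per.second", 1048576L);
> // A value less than or equal to zero disables the volume block scanner.
> boolean scannerEnabled = scannerBytesPerSec > 0;
> {code}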



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16501) Print the exception when reporting a bad block

2022-04-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HDFS-16501.
---
Resolution: Fixed

> Print the exception when reporting a bad block
> --
>
> Key: HDFS-16501
> URL: https://issues.apache.org/jira/browse/HDFS-16501
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: image-2022-03-10-19-27-31-622.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> !image-2022-03-10-19-27-31-622.png|width=847,height=27!
> Currently, the VolumeScanner will find a bad block and report it to the 
> namenode without printing the reason why the block is bad. It would be 
> better to print the exception in the log file, as sketched below.
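>
> A sketch of the suggested change (the names are assumed for illustration): 
> pass the exception to the logger so the cause lands in the log file:
> {code:java}
> private void reportBadBlock(ExtendedBlock block, IOException cause) {
>   // The trailing exception argument makes SLF4J print the stack trace,
>   // so the log shows why the block is considered bad.
>   LOG.warn("VolumeScanner found bad block {}", block, cause);
>   try {
>     datanode.reportBadBlocks(block);
>   } catch (IOException e) {
>     LOG.warn("Cannot report bad block {} to the NameNode", block, e);
>   }
> }
> {code}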



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11041) Unable to unregister FsDatasetState MBean if DataNode is shutdown twice

2022-04-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-11041:
--
Fix Version/s: 3.4.0

> Unable to unregister FsDatasetState MBean if DataNode is shutdown twice
> ---
>
> Key: HDFS-11041
> URL: https://issues.apache.org/jira/browse/HDFS-11041
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Trivial
> Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.3
>
> Attachments: HDFS-11041.01.patch, HDFS-11041.02.patch, 
> HDFS-11041.03.patch
>
>
> I saw an error message like the following in some tests
> {noformat}
> 2016-10-21 04:09:03,900 [main] WARN  util.MBeans 
> (MBeans.java:unregister(114)) - Error unregistering 
> Hadoop:service=DataNode,name=FSDatasetState-33cd714c-0b1a-471f-8efe-f431d7d874bc
> javax.management.InstanceNotFoundException: 
> Hadoop:service=DataNode,name=FSDatasetState-33cd714c-0b1a-471f-8efe-f431d7d874bc
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415)
>   at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546)
>   at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:112)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:2127)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:2016)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1985)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1962)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1936)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1929)
>   at 
> org.apache.hadoop.hdfs.TestDatanodeReport.testDatanodeReport(TestDatanodeReport.java:144)
> {noformat}
> The test shuts down the datanode and then shuts down the cluster, which shuts 
> the datanode down twice. Resetting the FsDatasetSpi reference in DataNode to 
> null resolves the issue.
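A standalone sketch of the idempotent-shutdown idea described above (stand-in types, not the actual DataNode code):

{code:java}
// Null out the dataset reference on the first shutdown so a second
// call becomes a no-op instead of unregistering the MBean twice.
class DataNodeShutdownSketch {
  private Object fsDataset = new Object(); // stand-in for FsDatasetSpi

  synchronized void shutdown() {
    if (fsDataset == null) {
      return; // already shut down; skip MBean unregistration
    }
    // ... unregister the FSDatasetState MBean, close volumes, etc. ...
    fsDataset = null;
  }
}
{code}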



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16501) Print the exception when reporting a bad block

2022-04-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-16501:
--
Fix Version/s: 3.3.3
   (was: 3.3.4)

> Print the exception when reporting a bad block
> --
>
> Key: HDFS-16501
> URL: https://issues.apache.org/jira/browse/HDFS-16501
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: image-2022-03-10-19-27-31-622.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> !image-2022-03-10-19-27-31-622.png|width=847,height=27!
> Currently, the VolumeScanner finds a bad block and reports it to the namenode 
> without printing the reason why the block is bad. It would be better to print 
> the exception in the log file.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16428) Source path with storagePolicy cause wrong typeConsumed while rename

2022-04-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-16428:
--
Fix Version/s: 3.4.0

> Source path with storagePolicy cause wrong typeConsumed while rename
> 
>
> Key: HDFS-16428
> URL: https://issues.apache.org/jira/browse/HDFS-16428
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: lei w
>Assignee: lei w
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.3
>
> Attachments: example.txt
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> When computing quota in a rename operation, we use the storage policy of the 
> target directory to compute the source quota usage. This causes a wrong 
> typeConsumed value when a storage policy was set on the source path. I 
> provided a unit test to demonstrate this situation.
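A rough sketch of the distinction at issue (the enum and method are hypothetical; the real logic lives in the NameNode's quota verification for rename):

{code:java}
// The storage-type quota charged for the source file should be derived
// from the policy in effect on the *source* path, not the target's.
class RenameQuotaSketch {
  enum Policy { HOT, ALL_SSD }

  long ssdBytesConsumed(long fileBytes, int replication, Policy srcPolicy) {
    // ALL_SSD places every replica on SSD; HOT uses DISK only.
    return srcPolicy == Policy.ALL_SSD ? fileBytes * replication : 0L;
  }
}
{code}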



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads

2022-04-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-16422:
--
Fix Version/s: 3.4.0

> Fix thread safety of EC decoding during concurrent preads
> -
>
> Key: HDFS-16422
> URL: https://issues.apache.org/jira/browse/HDFS-16422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient, ec, erasure-coding
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.3
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Reading data from an erasure-coded file with missing replicas (internal blocks 
> of a block group) causes online reconstruction: the data units are read and 
> decoded into the missing target data. Each DFSStripedInputStream object holds 
> a RawErasureDecoder object, and when we do preads concurrently, 
> RawErasureDecoder.decode is invoked concurrently too. RawErasureDecoder.decode 
> is not thread safe, so we occasionally get wrong data from pread.
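A standalone sketch of one way to serialize the decode calls (the lock placement is an assumption for illustration, not the committed patch):

{code:java}
// Each DFSStripedInputStream shares one decoder across concurrent
// preads; since decode() mutates decoder state, guard it with a lock.
class SharedDecoderSketch {
  private final Object decodeLock = new Object();

  void decode(byte[][] inputs, int[] erasedIndexes, byte[][] outputs) {
    synchronized (decodeLock) {
      // Concurrent callers must not interleave inside the decoder.
      doDecode(inputs, erasedIndexes, outputs);
    }
  }

  private void doDecode(byte[][] in, int[] erased, byte[][] out) {
    // ... stand-in for the non-thread-safe RawErasureDecoder.decode ...
  }
}
{code}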



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress

2022-04-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-16507:
--
Fix Version/s: 3.4.0

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: tomscut
>Assignee: tomscut
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL 
> exception. It looks like edit logs that are still in progress are being purged.
> According to the analysis, I suspect that the in-progress editlog to be purged 
> (after the SNN checkpoint) was not finalized (see HDFS-14317) before the ANN 
> rolled its own edits. 
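> As a hedged illustration only, a guard of this shape would avoid purging 
> in-progress segments (names are stand-ins, not the actual FileJournalManager 
> code):
> {code:java}
> // Never purge an edit log segment that is still in progress, even if
> // its first txid falls below the purge threshold.
> class PurgeGuardSketch {
>   static final class Segment {
>     final long firstTxId;
>     final boolean inProgress;
>     Segment(long firstTxId, boolean inProgress) {
>       this.firstTxId = firstTxId;
>       this.inProgress = inProgress;
>     }
>   }
> 
>   /** An in-progress segment is always kept, regardless of its txid. */
>   static boolean shouldPurge(Segment s, long minTxIdToKeep) {
>     return !s.inProgress && s.firstTxId < minTxIdToKeep;
>   }
> }
> {code}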
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
>     org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
>     
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
>     
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
>     
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAs(Subject.java:422)
>     
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>     
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>     org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>     
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>     
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>     org.eclipse.jetty.server.Server.handle(Server.java:539)
>     org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>     
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>     
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>     org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
>     
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>     
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>     
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJ

[jira] [Updated] (HDFS-16437) ReverseXML processor doesn't accept XML files without the SnapshotDiffSection.

2022-04-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-16437:
--
Fix Version/s: 3.4.0

> ReverseXML processor doesn't accept XML files without the SnapshotDiffSection.
> --
>
> Key: HDFS-16437
> URL: https://issues.apache.org/jira/browse/HDFS-16437
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1, 3.3.0
>Reporter: yanbin.zhang
>Assignee: yanbin.zhang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.3
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> In a cluster environment without snapshots, converting the generated XML back 
> into an fsimage reports an error.
> {code:java}
> // code placeholder
> [test@test001 ~]$ hdfs oiv -p ReverseXML -i fsimage_0257220.xml 
> -o fsimage_0257220
> OfflineImageReconstructor failed: FSImage XML ended prematurely, without 
> including section(s) SnapshotDiffSection
> java.io.IOException: FSImage XML ended prematurely, without including 
> section(s) SnapshotDiffSection
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.processXml(OfflineImageReconstructor.java:1765)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.run(OfflineImageReconstructor.java:1842)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(OfflineImageViewerPB.java:211)
>         at 
> org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.main(OfflineImageViewerPB.java:149)
> 22/01/25 15:56:52 INFO util.ExitUtil: Exiting with status 1: ExitException 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16544) EC decoding failed due to invalid buffer

2022-04-19 Thread qinyuren (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qinyuren updated HDFS-16544:

Description: 
In [HDFS-16538|https://issues.apache.org/jira/browse/HDFS-16538], we 
found an EC file decoding bug that occurs if more than one data block read failed. 

We have now found another bug, triggered by #StatefulStripeReader.decode.

If we read an EC file whose {*}length is more than one stripe{*}, and this file 
has *one data block* and *the first parity block* corrupted, this error will 
happen.
{code:java}
org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not 
allowing null    at 
org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132)
    at 
org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:48)
    at 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
    at 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
    at 
org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435)
    at 
org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
    at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392)
    at 
org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315)
    at 
org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918) 
{code}
 

Let's say we use ec(6+3) and the data block[0] and the first parity block[6] 
are corrupted.
 # The readers for block[0] and block[6] will be closed after reading the first 
stripe of the EC file;
 # When the client reads the second stripe of the EC file, it triggers 
#prepareParityChunk for block[6]; 
 # The decodeInputs[6] will not be constructed because the reader for block[6] 
was closed.

 
{code:java}
boolean prepareParityChunk(int index) {
  Preconditions.checkState(index >= dataBlkNum
  && alignedStripe.chunks[index] == null);
  if (readerInfos[index] != null && readerInfos[index].shouldSkip) {
alignedStripe.chunks[index] = new StripingChunk(StripingChunk.MISSING);
// we have failed the block reader before
return false;
  }
  final int parityIndex = index - dataBlkNum;
  ByteBuffer buf = dfsStripedInputStream.getParityBuffer().duplicate();
  buf.position(cellSize * parityIndex);
  buf.limit(cellSize * parityIndex + (int) alignedStripe.range.spanInBlock);
  decodeInputs[index] =
  new ECChunk(buf.slice(), 0, (int) alignedStripe.range.spanInBlock);
  alignedStripe.chunks[index] =
  new StripingChunk(decodeInputs[index].getBuffer());
  return true;
} {code}
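For contrast with the code above, a hedged sketch of the invariant that gets violated (purely illustrative, not the committed fix): every decode-input slot the decoder will read must have been prepared.

{code:java}
// Fail fast when an input slot is still null because its block reader
// was closed while reading an earlier stripe.
class DecodeInputCheckSketch {
  static void checkPrepared(Object[] decodeInputs, int[] neededIndexes) {
    for (int i : neededIndexes) {
      if (decodeInputs[i] == null) {
        throw new IllegalStateException(
            "decode input " + i + " was never prepared");
      }
    }
  }
}
{code}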
 

  was:
In [HDFS-16538|https://issues.apache.org/jira/browse/HDFS-16538], we 
found an EC file decoding bug that occurs if more than one data block read failed. 

We have now found another bug, triggered by #StatefulStripeReader.decode.

If we read an EC file whose {*}length is more than one stripe{*}, and this file 
has *one data block* and *the first parity block* corrupted, this error will 
happen.
{code:java}
org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not 
allowing null    at 
org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132)
    at 
org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:48)
    at 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
    at 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
    at 
org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435)
    at 
org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
    at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392)
    at 
org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315)
    at 
org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918) 
{code}
 

Let's say we use ec(6+3) and the data block[0] and the first parity block[6] 
are corrupted.
 # The readers for block[0] and block[6] will be closed after reading the first 
stripe of the EC file;
 # When the client reads the second stripe of the EC file, it triggers 
#prepareParityChunk for block[6]; 
 # The decodeInputs[6] will not be constructed due to the reader for block[6] 
being closed.

 
{code:java}
boolean prepareParityChunk(int index) {
  Preconditions.checkState(index >= dataBlkNum
  && alignedStripe.chunks[index] == null);
  if (readerInfos[index] != null && readerInfos[index].shouldSkip) {
alignedStr

[jira] [Work logged] (HDFS-16538) EC decoding failed due to not enough valid inputs

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16538?focusedWorklogId=758345&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758345
 ]

ASF GitHub Bot logged work on HDFS-16538:
-

Author: ASF GitHub Bot
Created on: 19/Apr/22 08:32
Start Date: 19/Apr/22 08:32
Worklog Time Spent: 10m 
  Work Description: liubingxing commented on PR #4167:
URL: https://github.com/apache/hadoop/pull/4167#issuecomment-1102290130

   @tasanuma Thanks for the review and merge. I found another bug related to 
EC decoding in 
[HDFS-16538](https://issues.apache.org/jira/browse/HDFS-16538); please 
take a look. Thank you very much.




Issue Time Tracking
---

Worklog Id: (was: 758345)
Time Spent: 1h 20m  (was: 1h 10m)

>  EC decoding failed due to not enough valid inputs
> --
>
> Key: HDFS-16538
> URL: https://issues.apache.org/jira/browse/HDFS-16538
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, we found this error when #StripeReader.readStripe() has more 
> than one failed block read.
> We use the EC policy ec(6+3) in our cluster.
> {code:java}
> Caused by: org.apache.hadoop.HadoopIllegalArgumentException: No enough valid 
> inputs are provided, not recoverable
>         at 
> org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkInputBuffers(ByteBufferDecodingState.java:119)
>         at 
> org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:47)
>         at 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
>         at 
> org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
>         at 
> org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:462)
>         at 
> org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
>         at 
> org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:406)
>         at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:327)
>         at 
> org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:420)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:892)
>         at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
>         at java.base/java.io.DataInputStream.read(DataInputStream.java:149) 
> {code}
>  
> {code:java}
> while (!futures.isEmpty()) {
>   try {
> StripingChunkReadResult r = StripedBlockUtil
> .getNextCompletedStripedRead(service, futures, 0);
> dfsStripedInputStream.updateReadStats(r.getReadStats());
> DFSClient.LOG.debug("Read task returned: {}, for stripe {}",
> r, alignedStripe);
> StripingChunk returnedChunk = alignedStripe.chunks[r.index];
> Preconditions.checkNotNull(returnedChunk);
> Preconditions.checkState(returnedChunk.state == StripingChunk.PENDING);
> if (r.state == StripingChunkReadResult.SUCCESSFUL) {
>   returnedChunk.state = StripingChunk.FETCHED;
>   alignedStripe.fetchedChunksNum++;
>   updateState4SuccessRead(r);
>   if (alignedStripe.fetchedChunksNum == dataBlkNum) {
> clearFutures();
> break;
>   }
> } else {
>   returnedChunk.state = StripingChunk.MISSING;
>   // close the corresponding reader
>   dfsStripedInputStream.closeReader(readerInfos[r.index]);
>   final int missing = alignedStripe.missingChunksNum;
>   alignedStripe.missingChunksNum++;
>   checkMissingBlocks();
>   readDataForDecoding();
>   readParityChunks(alignedStripe.missingChunksNum - missing);
> } {code}
> This error can be triggered by #StatefulStripeReader.decode.
> The reason is that:
>  # If more than one *data block* read fails, 
> #readDataForDecoding will be called multiple times;
>  # The *decodeInputs array* will be initialized repeatedly;
>  # The *parity data* in the *decodeInputs array*, previously filled by 
> #readParityChunks, will be set to null.
>  
>  
>  
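A standalone sketch of the init-once pattern the three points above suggest (field and method names are assumptions, not the actual StripeReader code):

{code:java}
// Initialize the decode inputs a single time, so parity chunks read by
// readParityChunks earlier are not wiped by a later re-initialization.
class DecodeInputsInitSketch {
  private Object[] decodeInputs; // stand-in for ECChunk[]

  void prepareDecodeInputs(int width) {
    if (decodeInputs == null) { // guard: never re-allocate the array
      decodeInputs = new Object[width];
    }
    // ... fill newly read data chunks without clearing parity slots ...
  }
}
{code}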



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16544) EC decoding failed due to invalid buffer

2022-04-19 Thread qinyuren (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qinyuren updated HDFS-16544:

Description: 
In [HDFS-16538|https://issues.apache.org/jira/browse/HDFS-16538], we 
found an EC file decoding bug that occurs if more than one data block read failed. 

We have now found another bug, triggered by #StatefulStripeReader.decode.

If we read an EC file whose {*}length is more than one stripe{*}, and this file 
has *one data block* and *the first parity block* corrupted, this error will 
happen.
{code:java}
org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not 
allowing null    at 
org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132)
    at 
org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:48)
    at 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
    at 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
    at 
org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435)
    at 
org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
    at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392)
    at 
org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315)
    at 
org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918) 
{code}
 

Let's say we use ec(6+3) and the data block[0] and the first parity block[6] 
are corrupted.
 # The readers for block[0] and block[6] will be closed after reading the first 
stripe of the EC file;
 # When the client reads the second stripe of the EC file, it triggers 
#prepareParityChunk for block[6]; 
 # The decodeInputs[6] will not be constructed due to the reader for block[6] 
being closed.

 
{code:java}
boolean prepareParityChunk(int index) {
  Preconditions.checkState(index >= dataBlkNum
  && alignedStripe.chunks[index] == null);
  if (readerInfos[index] != null && readerInfos[index].shouldSkip) {
alignedStripe.chunks[index] = new StripingChunk(StripingChunk.MISSING);
// we have failed the block reader before
return false;
  }
  final int parityIndex = index - dataBlkNum;
  ByteBuffer buf = dfsStripedInputStream.getParityBuffer().duplicate();
  buf.position(cellSize * parityIndex);
  buf.limit(cellSize * parityIndex + (int) alignedStripe.range.spanInBlock);
  decodeInputs[index] =
  new ECChunk(buf.slice(), 0, (int) alignedStripe.range.spanInBlock);
  alignedStripe.chunks[index] =
  new StripingChunk(decodeInputs[index].getBuffer());
  return true;
} {code}
 

  was:
In [HDFS-16538|https://issues.apache.org/jira/browse/HDFS-16538], we 
found an EC file decoding bug that occurs if more than one data block read failed. 

We have now found another bug, triggered by #StatefulStripeReader.decode.

If we read an EC file whose {*}length is more than one stripe{*}, and this file 
has *one data block* and *the first parity block* corrupted, this error will 
happen.
{code:java}
org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not 
allowing null    at 
org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132)
    at 
org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:48)
    at 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
    at 
org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
    at 
org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435)
    at 
org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
    at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392)
    at 
org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315)
    at 
org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918) 
{code}


> EC decoding failed due to invalid buffer
> 
>
> Key: HDFS-16544
> URL: https://issues.apache.org/jira/browse/HDFS-16544
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: qinyuren
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In [HDFS-16538|https://issues.apache.org/jira/browse/HDFS-16538], we 
> found an EC file decoding bug that occurs if more than one data block read failed. 
> Currently, we found ano

[jira] [Updated] (HDFS-14750) RBF: Improved isolation for downstream name nodes. {Dynamic}

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-14750:
--
Labels: pull-request-available  (was: )

> RBF: Improved isolation for downstream name nodes. {Dynamic}
> 
>
> Key: HDFS-14750
> URL: https://issues.apache.org/jira/browse/HDFS-14750
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: CR Hota
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This Jira tracks the work on dynamic allocation of resources in routers 
> for downstream HDFS clusters. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-14750) RBF: Improved isolation for downstream name nodes. {Dynamic}

2022-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14750?focusedWorklogId=758337&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758337
 ]

ASF GitHub Bot logged work on HDFS-14750:
-

Author: ASF GitHub Bot
Created on: 19/Apr/22 07:56
Start Date: 19/Apr/22 07:56
Worklog Time Spent: 10m 
  Work Description: kokonguyen191 opened a new pull request, #4199:
URL: https://github.com/apache/hadoop/pull/4199

   ### Description of PR
   Add a `DynamicRouterRpcFairnessPolicyController` class that resizes permit 
capacity periodically based on traffic to namespaces.
   
   ### How was this patch tested?
   Unit tests and local deployment.
   
   ### For code changes:
   - [x] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
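
   To make the idea concrete, a rough sketch of periodic permit resizing under stated assumptions (names and structure are placeholders, not the PR's actual controller):

   ```java
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.Semaphore;

   class DynamicPermitSketch {
     private final int totalPermits;
     private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

     DynamicPermitSketch(int totalPermits) { this.totalPermits = totalPermits; }

     /** Called on a timer with observed per-namespace call counts. */
     void resize(Map<String, Long> callCounts) {
       long total = callCounts.values().stream().mapToLong(Long::longValue).sum();
       if (total == 0) {
         return;
       }
       callCounts.forEach((ns, count) -> {
         // Give each namespace a share proportional to recent traffic,
         // with a floor of one permit. (A real controller must drain or
         // transfer permits rather than swap semaphores wholesale.)
         int share = (int) Math.max(1L, totalPermits * count / total);
         permits.put(ns, new Semaphore(share));
       });
     }
   }
   ```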




Issue Time Tracking
---

Worklog Id: (was: 758337)
Remaining Estimate: 0h
Time Spent: 10m

> RBF: Improved isolation for downstream name nodes. {Dynamic}
> 
>
> Key: HDFS-14750
> URL: https://issues.apache.org/jira/browse/HDFS-14750
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: CR Hota
>Assignee: CR Hota
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This Jira tracks the work on dynamic allocation of resources in routers 
> for downstream HDFS clusters. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org