[jira] [Work logged] (HDFS-16544) EC decoding failed due to invalid buffer
[ https://issues.apache.org/jira/browse/HDFS-16544?focusedWorklogId=758991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758991 ]

ASF GitHub Bot logged work on HDFS-16544:
-
Author: ASF GitHub Bot
Created on: 20/Apr/22 06:25
Start Date: 20/Apr/22 06:25
Worklog Time Spent: 10m
Work Description: liubingxing commented on PR #4179:
URL: https://github.com/apache/hadoop/pull/4179#issuecomment-1103516579
Thanks @tasanuma for the merge, and thanks @jojochuang.

Issue Time Tracking
---
Worklog Id: (was: 758991)
Time Spent: 1h 10m (was: 1h)

> EC decoding failed due to invalid buffer
> 
> Key: HDFS-16544
> URL: https://issues.apache.org/jira/browse/HDFS-16544
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: erasure-coding
> Reporter: qinyuren
> Assignee: qinyuren
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> In [HDFS-16538|https://issues.apache.org/jira/browse/HDFS-16538], we found an EC file decoding bug that occurs when more than one data block read fails.
> We have now found another bug, triggered by StatefulStripeReader#decode.
> If we read an EC file whose *length is more than one stripe*, and the file has *one data block* and *the first parity block* corrupted, this error occurs:
> {code:java}
> org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not allowing null
>     at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132)
>     at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:48)
>     at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
>     at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
>     at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435)
>     at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
>     at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392)
>     at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315)
>     at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408)
>     at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918)
> {code}
>
> Let's say we use EC(6+3) and data block[0] and the first parity block[6] are corrupted.
> # The readers for block[0] and block[6] are closed after reading the first stripe of the EC file;
> # When the client reads the second stripe of the EC file, it triggers prepareParityChunk for block[6].
> # decodeInputs[6] is not constructed because the reader for block[6] was closed.
>
> {code:java}
> boolean prepareParityChunk(int index) {
>   Preconditions.checkState(index >= dataBlkNum
>       && alignedStripe.chunks[index] == null);
>   if (readerInfos[index] != null && readerInfos[index].shouldSkip) {
>     alignedStripe.chunks[index] = new StripingChunk(StripingChunk.MISSING);
>     // we have failed the block reader before
>     return false;
>   }
>   final int parityIndex = index - dataBlkNum;
>   ByteBuffer buf = dfsStripedInputStream.getParityBuffer().duplicate();
>   buf.position(cellSize * parityIndex);
>   buf.limit(cellSize * parityIndex + (int) alignedStripe.range.spanInBlock);
>   decodeInputs[index] =
>       new ECChunk(buf.slice(), 0, (int) alignedStripe.range.spanInBlock);
>   alignedStripe.chunks[index] =
>       new StripingChunk(decodeInputs[index].getBuffer());
>   return true;
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
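To see the failure mode in isolation, the sketch below is a simplified, standalone model of the validation that trips here. It does not use the real Hadoop classes; the class name, buffer size, and exception type are placeholders. It only shows why a parity chunk that is never constructed reaches the decoder as a null buffer and fails the non-null check.

{code:java}
import java.nio.ByteBuffer;

public class NullDecodeInputDemo {
  // Mirrors the spirit of ByteBufferDecodingState's buffer validation: every
  // buffer handed to the raw decoder must be non-null.
  static void checkBuffers(ByteBuffer[] buffers) {
    for (ByteBuffer b : buffers) {
      if (b == null) {
        throw new IllegalArgumentException("Invalid buffer found, not allowing null");
      }
    }
  }

  public static void main(String[] args) {
    // EC(6+3): 6 data units + 3 parity units.
    ByteBuffer[] decodeInputs = new ByteBuffer[9];
    for (int i = 0; i < 9; i++) {
      // Index 6 models the first parity block whose reader was closed after the
      // first stripe, so prepareParityChunk() returned early and never filled it.
      decodeInputs[i] = (i == 6) ? null : ByteBuffer.allocate(64 * 1024);
    }
    checkBuffers(decodeInputs); // throws: Invalid buffer found, not allowing null
  }
}
{code}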
[jira] [Resolved] (HDFS-16544) EC decoding failed due to invalid buffer
[ https://issues.apache.org/jira/browse/HDFS-16544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takanobu Asanuma resolved HDFS-16544.
-
Fix Version/s: 3.4.0
               3.2.4
               3.3.4
Assignee: qinyuren
Resolution: Fixed

> EC decoding failed due to invalid buffer
> 
> Key: HDFS-16544
> URL: https://issues.apache.org/jira/browse/HDFS-16544

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16544) EC decoding failed due to invalid buffer
[ https://issues.apache.org/jira/browse/HDFS-16544?focusedWorklogId=758977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758977 ]

ASF GitHub Bot logged work on HDFS-16544:
-
Author: ASF GitHub Bot
Created on: 20/Apr/22 06:04
Start Date: 20/Apr/22 06:04
Worklog Time Spent: 10m
Work Description: tasanuma merged PR #4179:
URL: https://github.com/apache/hadoop/pull/4179

Issue Time Tracking
---
Worklog Id: (was: 758977)
Time Spent: 1h (was: 50m)

> EC decoding failed due to invalid buffer
> 
> Key: HDFS-16544
> URL: https://issues.apache.org/jira/browse/HDFS-16544

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16528) Reconfigure slow peer enable for Namenode
[ https://issues.apache.org/jira/browse/HDFS-16528?focusedWorklogId=758971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758971 ] ASF GitHub Bot logged work on HDFS-16528: - Author: ASF GitHub Bot Created on: 20/Apr/22 05:41 Start Date: 20/Apr/22 05:41 Worklog Time Spent: 10m Work Description: virajjasani commented on code in PR #4186: URL: https://github.com/apache/hadoop/pull/4186#discussion_r853744159 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java: ## @@ -366,6 +365,21 @@ public class DatanodeManager { DFSConfigKeys.DFS_NAMENODE_BLOCKS_PER_POSTPONEDBLOCKS_RESCAN_KEY_DEFAULT); } + /** + * Determines whether slow peer tracker should be enabled. If dataNodePeerStatsEnabledVal is + * true, slow peer tracker is initialized. + * + * @param conf The configuration to use while initializing slowPeerTracker. + * @param timer Timer object for slowPeerTracker. + * @param dataNodePeerStatsEnabledVal To determine whether slow peer tracking should be enabled. + */ + public void initSlowPeerTracker(Configuration conf, Timer timer, + boolean dataNodePeerStatsEnabledVal) { +this.dataNodePeerStatsEnabled = dataNodePeerStatsEnabledVal; +this.slowPeerTracker = dataNodePeerStatsEnabled ? +new SlowPeerTracker(conf, timer) : null; Review Comment: Done, please take a look @tomscut Issue Time Tracking --- Worklog Id: (was: 758971) Time Spent: 1h 50m (was: 1h 40m) > Reconfigure slow peer enable for Namenode > - > > Key: HDFS-16528 > URL: https://issues.apache.org/jira/browse/HDFS-16528 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > > HDFS-16396 provides reconfig options for several configs associated with > slownodes in Datanode. Similarly, HDFS-16287 and HDFS-16327 have added some > slownodes related configs as the reconfig options in Namenode. > The purpose of this Jira is to add DFS_DATANODE_PEER_STATS_ENABLED_KEY as > reconfigurable option for Namenode (similar to how HDFS-16396 has included it > for Datanode). -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16528) Reconfigure slow peer enable for Namenode
[ https://issues.apache.org/jira/browse/HDFS-16528?focusedWorklogId=758968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758968 ] ASF GitHub Bot logged work on HDFS-16528: - Author: ASF GitHub Bot Created on: 20/Apr/22 05:27 Start Date: 20/Apr/22 05:27 Worklog Time Spent: 10m Work Description: virajjasani commented on code in PR #4186: URL: https://github.com/apache/hadoop/pull/4186#discussion_r853737690 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java: ## @@ -366,6 +365,21 @@ public class DatanodeManager { DFSConfigKeys.DFS_NAMENODE_BLOCKS_PER_POSTPONEDBLOCKS_RESCAN_KEY_DEFAULT); } + /** + * Determines whether slow peer tracker should be enabled. If dataNodePeerStatsEnabledVal is + * true, slow peer tracker is initialized. + * + * @param conf The configuration to use while initializing slowPeerTracker. + * @param timer Timer object for slowPeerTracker. + * @param dataNodePeerStatsEnabledVal To determine whether slow peer tracking should be enabled. + */ + public void initSlowPeerTracker(Configuration conf, Timer timer, + boolean dataNodePeerStatsEnabledVal) { +this.dataNodePeerStatsEnabled = dataNodePeerStatsEnabledVal; +this.slowPeerTracker = dataNodePeerStatsEnabled ? +new SlowPeerTracker(conf, timer) : null; Review Comment: Let me get back to this in a while. Issue Time Tracking --- Worklog Id: (was: 758968) Time Spent: 1h 40m (was: 1.5h) > Reconfigure slow peer enable for Namenode > - > > Key: HDFS-16528 > URL: https://issues.apache.org/jira/browse/HDFS-16528 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > HDFS-16396 provides reconfig options for several configs associated with > slownodes in Datanode. Similarly, HDFS-16287 and HDFS-16327 have added some > slownodes related configs as the reconfig options in Namenode. > The purpose of this Jira is to add DFS_DATANODE_PEER_STATS_ENABLED_KEY as > reconfigurable option for Namenode (similar to how HDFS-16396 has included it > for Datanode). -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16528) Reconfigure slow peer enable for Namenode
[ https://issues.apache.org/jira/browse/HDFS-16528?focusedWorklogId=758964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758964 ] ASF GitHub Bot logged work on HDFS-16528: - Author: ASF GitHub Bot Created on: 20/Apr/22 05:18 Start Date: 20/Apr/22 05:18 Worklog Time Spent: 10m Work Description: virajjasani commented on code in PR #4186: URL: https://github.com/apache/hadoop/pull/4186#discussion_r853733962 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java: ## @@ -2406,27 +2412,49 @@ String reconfigureSlowNodesParameters(final DatanodeManager datanodeManager, namesystem.writeLock(); String result; try { - if (property.equals(DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_KEY)) { -boolean enable = (newVal == null ? DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_DEFAULT : + switch (property) { + case DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_KEY: { +boolean enable = (newVal == null ? +DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_DEFAULT : Boolean.parseBoolean(newVal)); result = Boolean.toString(enable); datanodeManager.setAvoidSlowDataNodesForReadEnabled(enable); - } else if (property.equals( -DFS_NAMENODE_BLOCKPLACEMENTPOLICY_EXCLUDE_SLOW_NODES_ENABLED_KEY)) { +break; + } + case DFS_NAMENODE_BLOCKPLACEMENTPOLICY_EXCLUDE_SLOW_NODES_ENABLED_KEY: { boolean enable = (newVal == null ? DFS_NAMENODE_BLOCKPLACEMENTPOLICY_EXCLUDE_SLOW_NODES_ENABLED_DEFAULT : Boolean.parseBoolean(newVal)); result = Boolean.toString(enable); bm.setExcludeSlowNodesEnabled(enable); - } else if (property.equals(DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_KEY)) { +break; + } + case DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_KEY: { int maxSlowpeerCollectNodes = (newVal == null ? DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_DEFAULT : Integer.parseInt(newVal)); result = Integer.toString(maxSlowpeerCollectNodes); datanodeManager.setMaxSlowpeerCollectNodes(maxSlowpeerCollectNodes); - } else { -throw new IllegalArgumentException("Unexpected property " + -property + " in reconfigureSlowNodesParameters"); +break; + } + case DFS_DATANODE_PEER_STATS_ENABLED_KEY: { +Timer timer = new Timer(); +if (newVal != null && !newVal.equalsIgnoreCase("true") && !newVal.equalsIgnoreCase( +"false")) { + throw new ReconfigurationException(property, newVal, getConf().get(property), + new NumberFormatException(newVal + " is not boolean value")); Review Comment: Sounds good, let me change this. Thanks @tomscut Issue Time Tracking --- Worklog Id: (was: 758964) Time Spent: 1.5h (was: 1h 20m) > Reconfigure slow peer enable for Namenode > - > > Key: HDFS-16528 > URL: https://issues.apache.org/jira/browse/HDFS-16528 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > HDFS-16396 provides reconfig options for several configs associated with > slownodes in Datanode. Similarly, HDFS-16287 and HDFS-16327 have added some > slownodes related configs as the reconfig options in Namenode. > The purpose of this Jira is to add DFS_DATANODE_PEER_STATS_ENABLED_KEY as > reconfigurable option for Namenode (similar to how HDFS-16396 has included it > for Datanode). -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
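On the validation question raised in the review above: Boolean.parseBoolean silently maps anything that is not "true" to false, so rejecting bad input needs an explicit check. The sketch below is illustrative only; the class name and the example property key are placeholders, not the code that was eventually merged.

{code:java}
public final class StrictBooleanParse {
  public static boolean parse(String property, String newVal, boolean defaultVal) {
    if (newVal == null) {
      return defaultVal; // null means "reset to the default" during reconfiguration
    }
    if (newVal.equalsIgnoreCase("true")) {
      return true;
    }
    if (newVal.equalsIgnoreCase("false")) {
      return false;
    }
    throw new IllegalArgumentException(newVal + " is not a boolean value for " + property);
  }

  public static void main(String[] args) {
    System.out.println(parse("dfs.datanode.peer.stats.enabled", "TRUE", false)); // true
    parse("dfs.datanode.peer.stats.enabled", "maybe", false); // throws IllegalArgumentException
  }
}
{code}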
[jira] [Work logged] (HDFS-16533) COMPOSITE_CRC failed between replicated file and striped file.
[ https://issues.apache.org/jira/browse/HDFS-16533?focusedWorklogId=758956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758956 ] ASF GitHub Bot logged work on HDFS-16533: - Author: ASF GitHub Bot Created on: 20/Apr/22 04:26 Start Date: 20/Apr/22 04:26 Worklog Time Spent: 10m Work Description: jojochuang commented on code in PR #4155: URL: https://github.com/apache/hadoop/pull/4155#discussion_r853715096 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FileChecksumHelper.java: ## @@ -316,18 +317,22 @@ FileChecksum makeCompositeCrcResult() throws IOException { "Added blockCrc 0x{} for block index {} of size {}", Integer.toString(blockCrc, 16), i, block.getBlockSize()); } - - // NB: In some cases the located blocks have their block size adjusted - // explicitly based on the requested length, but not all cases; - // these numbers may or may not reflect actual sizes on disk. - long reportedLastBlockSize = - blockLocations.getLastLocatedBlock().getBlockSize(); - long consumedLastBlockLength = reportedLastBlockSize; - if (length - sumBlockLengths < reportedLastBlockSize) { -LOG.warn( -"Last block length {} is less than reportedLastBlockSize {}", -length - sumBlockLengths, reportedLastBlockSize); -consumedLastBlockLength = length - sumBlockLengths; + LocatedBlock nextBlock = locatedBlocks.get(i); + long consumedLastBlockLength = Math.min(length - sumBlockLengths, + nextBlock.getBlockSize()); + LocatedBlock lastBlock = blockLocations.getLastLocatedBlock(); + if (nextBlock.equals(lastBlock)) { Review Comment: Could you elaborate what this check is? Looking at the test case I assume these few lines distinguish replicated vs striped blocks. Am I right? How about turning them into a helper method that is more readable? Issue Time Tracking --- Worklog Id: (was: 758956) Time Spent: 1.5h (was: 1h 20m) > COMPOSITE_CRC failed between replicated file and striped file. > -- > > Key: HDFS-16533 > URL: https://issues.apache.org/jira/browse/HDFS-16533 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, hdfs-client >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Attachments: HDFS-16533.001.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > After testing the COMPOSITE_CRC with some random length between replicated > file and striped file which has same data with replicated file, it failed. > Reproduce step like this: > {code:java} > @Test(timeout = 9) > public void testStripedAndReplicatedFileChecksum2() throws Exception { > int abnormalSize = (dataBlocks * 2 - 2) * blockSize + > (int) (blockSize * 0.5); > prepareTestFiles(abnormalSize, new String[] {stripedFile1, replicatedFile}); > int loopNumber = 100; > while (loopNumber-- > 0) { > int verifyLength = ThreadLocalRandom.current() > .nextInt(10, abnormalSize); > FileChecksum stripedFileChecksum1 = getFileChecksum(stripedFile1, > verifyLength, false); > FileChecksum replicatedFileChecksum = getFileChecksum(replicatedFile, > verifyLength, false); > if (checksumCombineMode.equals(ChecksumCombineMode.COMPOSITE_CRC.name())) > { > Assert.assertEquals(stripedFileChecksum1, replicatedFileChecksum); > } else { > Assert.assertNotEquals(stripedFileChecksum1, replicatedFileChecksum); > } > } > } {code} > And after tracing the root cause, `FileChecksumHelper#makeCompositeCrcResult` > maybe compute an error `consumedLastBlockLength` when updating checksum for > the last block of the fixed length which maybe not the last block in the file. 
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
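To make the review point concrete, the helper below is hypothetical (names and signature are invented for illustration, not the actual FileChecksumHelper patch). It captures the intent of the changed lines: the bytes consumed from a block when checksumming the first requestedLength bytes of a file are capped by both the block size and the bytes still remaining in the requested range.

{code:java}
public final class ConsumedLengthSketch {
  public static long consumedLength(long requestedLength, long bytesAlreadyCovered,
      long blockSize) {
    long remaining = requestedLength - bytesAlreadyCovered;
    // Never count more than the block holds, and never more than is still needed.
    return Math.min(remaining, blockSize);
  }

  public static void main(String[] args) {
    // A 300-byte request over 128-byte blocks: 128, 128, then only 44 from the third block.
    System.out.println(consumedLength(300, 0, 128));   // 128
    System.out.println(consumedLength(300, 128, 128)); // 128
    System.out.println(consumedLength(300, 256, 128)); // 44
  }
}
{code}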
[jira] [Work logged] (HDFS-16528) Reconfigure slow peer enable for Namenode
[ https://issues.apache.org/jira/browse/HDFS-16528?focusedWorklogId=758914&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758914 ] ASF GitHub Bot logged work on HDFS-16528: - Author: ASF GitHub Bot Created on: 20/Apr/22 01:38 Start Date: 20/Apr/22 01:38 Worklog Time Spent: 10m Work Description: tomscut commented on code in PR #4186: URL: https://github.com/apache/hadoop/pull/4186#discussion_r853655682 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java: ## @@ -366,6 +365,21 @@ public class DatanodeManager { DFSConfigKeys.DFS_NAMENODE_BLOCKS_PER_POSTPONEDBLOCKS_RESCAN_KEY_DEFAULT); } + /** + * Determines whether slow peer tracker should be enabled. If dataNodePeerStatsEnabledVal is + * true, slow peer tracker is initialized. + * + * @param conf The configuration to use while initializing slowPeerTracker. + * @param timer Timer object for slowPeerTracker. + * @param dataNodePeerStatsEnabledVal To determine whether slow peer tracking should be enabled. + */ + public void initSlowPeerTracker(Configuration conf, Timer timer, + boolean dataNodePeerStatsEnabledVal) { +this.dataNodePeerStatsEnabled = dataNodePeerStatsEnabledVal; +this.slowPeerTracker = dataNodePeerStatsEnabled ? +new SlowPeerTracker(conf, timer) : null; Review Comment: If `this.slowPeerTracker` is set to null directly, may cause NPE. Issue Time Tracking --- Worklog Id: (was: 758914) Time Spent: 1h 20m (was: 1h 10m) > Reconfigure slow peer enable for Namenode > - > > Key: HDFS-16528 > URL: https://issues.apache.org/jira/browse/HDFS-16528 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > HDFS-16396 provides reconfig options for several configs associated with > slownodes in Datanode. Similarly, HDFS-16287 and HDFS-16327 have added some > slownodes related configs as the reconfig options in Namenode. > The purpose of this Jira is to add DFS_DATANODE_PEER_STATS_ENABLED_KEY as > reconfigurable option for Namenode (similar to how HDFS-16396 has included it > for Datanode). -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
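As a side note on the NPE concern in the review above, one common pattern is the "null object": instead of assigning null when peer stats are disabled, assign a tracker whose methods do nothing, so call sites never need a null check. The sketch below is generic; the interfaces are illustrative placeholders, not Hadoop's actual SlowPeerTracker API or the approach this PR settled on.

{code:java}
import java.util.Collections;
import java.util.Map;

interface PeerTrackerLike {
  void addReport(String slowNode, String reportingNode);
  Map<String, Integer> getReportCounts();
}

final class NoOpPeerTracker implements PeerTrackerLike {
  @Override
  public void addReport(String slowNode, String reportingNode) {
    // intentionally empty: peer-stats tracking is disabled
  }

  @Override
  public Map<String, Integer> getReportCounts() {
    return Collections.emptyMap();
  }
}
{code}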
[jira] [Work logged] (HDFS-16528) Reconfigure slow peer enable for Namenode
[ https://issues.apache.org/jira/browse/HDFS-16528?focusedWorklogId=758913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758913 ] ASF GitHub Bot logged work on HDFS-16528: - Author: ASF GitHub Bot Created on: 20/Apr/22 01:37 Start Date: 20/Apr/22 01:37 Worklog Time Spent: 10m Work Description: tomscut commented on code in PR #4186: URL: https://github.com/apache/hadoop/pull/4186#discussion_r853655682 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java: ## @@ -366,6 +365,21 @@ public class DatanodeManager { DFSConfigKeys.DFS_NAMENODE_BLOCKS_PER_POSTPONEDBLOCKS_RESCAN_KEY_DEFAULT); } + /** + * Determines whether slow peer tracker should be enabled. If dataNodePeerStatsEnabledVal is + * true, slow peer tracker is initialized. + * + * @param conf The configuration to use while initializing slowPeerTracker. + * @param timer Timer object for slowPeerTracker. + * @param dataNodePeerStatsEnabledVal To determine whether slow peer tracking should be enabled. + */ + public void initSlowPeerTracker(Configuration conf, Timer timer, + boolean dataNodePeerStatsEnabledVal) { +this.dataNodePeerStatsEnabled = dataNodePeerStatsEnabledVal; +this.slowPeerTracker = dataNodePeerStatsEnabled ? +new SlowPeerTracker(conf, timer) : null; Review Comment: If this.slowPeerTracker is set to null, may cause NPE. Issue Time Tracking --- Worklog Id: (was: 758913) Time Spent: 1h 10m (was: 1h) > Reconfigure slow peer enable for Namenode > - > > Key: HDFS-16528 > URL: https://issues.apache.org/jira/browse/HDFS-16528 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > HDFS-16396 provides reconfig options for several configs associated with > slownodes in Datanode. Similarly, HDFS-16287 and HDFS-16327 have added some > slownodes related configs as the reconfig options in Namenode. > The purpose of this Jira is to add DFS_DATANODE_PEER_STATS_ENABLED_KEY as > reconfigurable option for Namenode (similar to how HDFS-16396 has included it > for Datanode). -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16528) Reconfigure slow peer enable for Namenode
[ https://issues.apache.org/jira/browse/HDFS-16528?focusedWorklogId=758912&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758912 ] ASF GitHub Bot logged work on HDFS-16528: - Author: ASF GitHub Bot Created on: 20/Apr/22 01:30 Start Date: 20/Apr/22 01:30 Worklog Time Spent: 10m Work Description: tomscut commented on code in PR #4186: URL: https://github.com/apache/hadoop/pull/4186#discussion_r853653286 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java: ## @@ -2406,27 +2412,49 @@ String reconfigureSlowNodesParameters(final DatanodeManager datanodeManager, namesystem.writeLock(); String result; try { - if (property.equals(DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_KEY)) { -boolean enable = (newVal == null ? DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_DEFAULT : + switch (property) { + case DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_KEY: { +boolean enable = (newVal == null ? +DFS_NAMENODE_AVOID_SLOW_DATANODE_FOR_READ_DEFAULT : Boolean.parseBoolean(newVal)); result = Boolean.toString(enable); datanodeManager.setAvoidSlowDataNodesForReadEnabled(enable); - } else if (property.equals( -DFS_NAMENODE_BLOCKPLACEMENTPOLICY_EXCLUDE_SLOW_NODES_ENABLED_KEY)) { +break; + } + case DFS_NAMENODE_BLOCKPLACEMENTPOLICY_EXCLUDE_SLOW_NODES_ENABLED_KEY: { boolean enable = (newVal == null ? DFS_NAMENODE_BLOCKPLACEMENTPOLICY_EXCLUDE_SLOW_NODES_ENABLED_DEFAULT : Boolean.parseBoolean(newVal)); result = Boolean.toString(enable); bm.setExcludeSlowNodesEnabled(enable); - } else if (property.equals(DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_KEY)) { +break; + } + case DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_KEY: { int maxSlowpeerCollectNodes = (newVal == null ? DFS_NAMENODE_MAX_SLOWPEER_COLLECT_NODES_DEFAULT : Integer.parseInt(newVal)); result = Integer.toString(maxSlowpeerCollectNodes); datanodeManager.setMaxSlowpeerCollectNodes(maxSlowpeerCollectNodes); - } else { -throw new IllegalArgumentException("Unexpected property " + -property + " in reconfigureSlowNodesParameters"); +break; + } + case DFS_DATANODE_PEER_STATS_ENABLED_KEY: { +Timer timer = new Timer(); +if (newVal != null && !newVal.equalsIgnoreCase("true") && !newVal.equalsIgnoreCase( +"false")) { + throw new ReconfigurationException(property, newVal, getConf().get(property), + new NumberFormatException(newVal + " is not boolean value")); Review Comment: Hi @virajjasani , should here throw an IllegalArgumentException? Issue Time Tracking --- Worklog Id: (was: 758912) Time Spent: 1h (was: 50m) > Reconfigure slow peer enable for Namenode > - > > Key: HDFS-16528 > URL: https://issues.apache.org/jira/browse/HDFS-16528 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > HDFS-16396 provides reconfig options for several configs associated with > slownodes in Datanode. Similarly, HDFS-16287 and HDFS-16327 have added some > slownodes related configs as the reconfig options in Namenode. > The purpose of this Jira is to add DFS_DATANODE_PEER_STATS_ENABLED_KEY as > reconfigurable option for Namenode (similar to how HDFS-16396 has included it > for Datanode). -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16547) [SBN read] Namenode in safe mode should not be transfered to observer state
[ https://issues.apache.org/jira/browse/HDFS-16547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tomscut updated HDFS-16547: --- Summary: [SBN read] Namenode in safe mode should not be transfered to observer state (was: [SBN read] Namenode in safe mode should not be transfer to observer state) > [SBN read] Namenode in safe mode should not be transfered to observer state > --- > > Key: HDFS-16547 > URL: https://issues.apache.org/jira/browse/HDFS-16547 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently, when a Namenode is in safemode(under starting or enter safemode > manually), we can transfer this Namenode to Observer by command. This > Observer node may receive many requests and then throw a SafemodeException, > this causes unnecessary failover on the client. > So Namenode in safe mode should not be transfer to observer state. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16521) DFS API to retrieve slow datanodes
[ https://issues.apache.org/jira/browse/HDFS-16521?focusedWorklogId=758849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758849 ] ASF GitHub Bot logged work on HDFS-16521: - Author: ASF GitHub Bot Created on: 19/Apr/22 22:30 Start Date: 19/Apr/22 22:30 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on PR #4107: URL: https://github.com/apache/hadoop/pull/4107#issuecomment-1103229191 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 5s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | buf | 0m 1s | | buf was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 48s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 28m 17s | | trunk passed | | +1 :green_heart: | compile | 7m 0s | | trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | compile | 6m 31s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 37s | | trunk passed | | +1 :green_heart: | mvnsite | 3m 38s | | trunk passed | | +1 :green_heart: | javadoc | 3m 0s | | trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 3m 39s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 8m 8s | | trunk passed | | +1 :green_heart: | shadedclient | 23m 47s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 27s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 56s | | the patch passed | | +1 :green_heart: | compile | 6m 50s | | the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | cc | 6m 50s | | the patch passed | | -1 :x: | javac | 6m 50s | [/results-compile-javac-hadoop-hdfs-project-jdkUbuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4107/5/artifact/out/results-compile-javac-hadoop-hdfs-project-jdkUbuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04.txt) | hadoop-hdfs-project-jdkUbuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 generated 1 new + 651 unchanged - 0 fixed = 652 total (was 651) | | +1 :green_heart: | compile | 6m 19s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | cc | 6m 19s | | the patch passed | | -1 :x: | javac | 6m 19s | [/results-compile-javac-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4107/5/artifact/out/results-compile-javac-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07.txt) | hadoop-hdfs-project-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 generated 1 new + 629 unchanged - 0 fixed = 630 total (was 629) | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. 
| | +1 :green_heart: | checkstyle | 1m 19s | | hadoop-hdfs-project: The patch generated 0 new + 456 unchanged - 1 fixed = 456 total (was 457) | | +1 :green_heart: | mvnsite | 3m 27s | | the patch passed | | +1 :green_heart: | javadoc | 2m 37s | | the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 3m 28s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 8m 58s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 58s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 27s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 392m 41s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4107/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green
[jira] [Work logged] (HDFS-16547) [SBN read] Namenode in safe mode should not be transfer to observer state
[ https://issues.apache.org/jira/browse/HDFS-16547?focusedWorklogId=758633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758633 ] ASF GitHub Bot logged work on HDFS-16547: - Author: ASF GitHub Bot Created on: 19/Apr/22 17:28 Start Date: 19/Apr/22 17:28 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on PR #4201: URL: https://github.com/apache/hadoop/pull/4201#issuecomment-1102907082 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 40s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 38m 41s | | trunk passed | | +1 :green_heart: | compile | 1m 44s | | trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | compile | 1m 38s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 25s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 43s | | trunk passed | | +1 :green_heart: | javadoc | 1m 23s | | trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 1m 49s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 49s | | trunk passed | | +1 :green_heart: | shadedclient | 23m 17s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 25s | | the patch passed | | +1 :green_heart: | compile | 1m 26s | | the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javac | 1m 26s | | the patch passed | | +1 :green_heart: | compile | 1m 19s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 19s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 2s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 27s | | the patch passed | | +1 :green_heart: | javadoc | 0m 58s | | the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 1m 33s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 20s | | the patch passed | | +1 :green_heart: | shadedclient | 22m 33s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 249m 29s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 1m 13s | | The patch does not generate ASF License warnings. 
| | | | 360m 3s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4201 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 17d48942fd53 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 0ed4aa1dcaeb267708033f3867e8b9b2ee463944 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/1/testReport/ | | Max. process+thread count | 3300 (vs. ulimit of 5500) | | modules | C: hadoop-
[jira] [Work logged] (HDFS-16544) EC decoding failed due to invalid buffer
[ https://issues.apache.org/jira/browse/HDFS-16544?focusedWorklogId=758475&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758475 ] ASF GitHub Bot logged work on HDFS-16544: - Author: ASF GitHub Bot Created on: 19/Apr/22 13:43 Start Date: 19/Apr/22 13:43 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on PR #4179: URL: https://github.com/apache/hadoop/pull/4179#issuecomment-1102674037 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 54s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 42s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 28m 21s | | trunk passed | | +1 :green_heart: | compile | 6m 50s | | trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | compile | 6m 27s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 31s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 50s | | trunk passed | | +1 :green_heart: | javadoc | 2m 9s | | trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 2m 32s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 6m 36s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 48s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 26s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 16s | | the patch passed | | +1 :green_heart: | compile | 6m 41s | | the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javac | 6m 41s | | the patch passed | | +1 :green_heart: | compile | 6m 21s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 6m 21s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 16s | | the patch passed | | +1 :green_heart: | mvnsite | 2m 24s | | the patch passed | | +1 :green_heart: | javadoc | 1m 42s | | the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 2m 9s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 6m 24s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 54s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 27s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 370m 32s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4179/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 1m 6s | | The patch does not generate ASF License warnings. 
| | | | 527m 12s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots | | | hadoop.hdfs.TestClientProtocolForPipelineRecovery | | | hadoop.hdfs.TestReplaceDatanodeFailureReplication | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4179/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4179 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 542720fc08b8 4.15.0-166-generic #174-Ubuntu SMP Wed Dec 8 19:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 2b6adbcb61fa76d0147dfb1365ccb3a2ca3360a6 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Mul
[jira] [Updated] (HDFS-16547) [SBN read] Namenode in safe mode should not be transfer to observer state
[ https://issues.apache.org/jira/browse/HDFS-16547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16547: -- Labels: pull-request-available (was: ) > [SBN read] Namenode in safe mode should not be transfer to observer state > - > > Key: HDFS-16547 > URL: https://issues.apache.org/jira/browse/HDFS-16547 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, when a Namenode is in safemode(under starting or enter safemode > manually), we can transfer this Namenode to Observer by command. This > Observer node may receive many requests and then throw a SafemodeException, > this causes unnecessary failover on the client. > So Namenode in safe mode should not be transfer to observer state. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16547) [SBN read] Namenode in safe mode should not be transfer to observer state
[ https://issues.apache.org/jira/browse/HDFS-16547?focusedWorklogId=758400&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758400 ] ASF GitHub Bot logged work on HDFS-16547: - Author: ASF GitHub Bot Created on: 19/Apr/22 11:27 Start Date: 19/Apr/22 11:27 Worklog Time Spent: 10m Work Description: tomscut opened a new pull request, #4201: URL: https://github.com/apache/hadoop/pull/4201 JIRA: HDFS-16547. Currently, when a Namenode is in safemode(under starting or enter safemode manually), we can transfer this Namenode to Observer by command. This Observer node may receive many requests and then throw a SafemodeException, this causes unnecessary failover on the client. So Namenode in safe mode should not be transfer to observer state. Issue Time Tracking --- Worklog Id: (was: 758400) Remaining Estimate: 0h Time Spent: 10m > [SBN read] Namenode in safe mode should not be transfer to observer state > - > > Key: HDFS-16547 > URL: https://issues.apache.org/jira/browse/HDFS-16547 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: tomscut >Assignee: tomscut >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Currently, when a Namenode is in safemode(under starting or enter safemode > manually), we can transfer this Namenode to Observer by command. This > Observer node may receive many requests and then throw a SafemodeException, > this causes unnecessary failover on the client. > So Namenode in safe mode should not be transfer to observer state. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16547) [SBN read] Namenode in safe mode should not be transfer to observer state
tomscut created HDFS-16547:
--

Summary: [SBN read] Namenode in safe mode should not be transfer to observer state
Key: HDFS-16547
URL: https://issues.apache.org/jira/browse/HDFS-16547
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: tomscut
Assignee: tomscut

Currently, when a Namenode is in safe mode (while starting up, or after entering safe mode manually), we can transfer this Namenode to Observer by command. This Observer node may then receive many requests and throw a SafeModeException, which causes unnecessary failover on the client.

So a Namenode in safe mode should not be transferred to the observer state.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
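A minimal sketch of the kind of guard this issue asks for follows. It is illustrative only: the class, the safe-mode supplier, and the exception type are placeholders, not the actual Hadoop patch.

{code:java}
import java.io.IOException;
import java.util.function.BooleanSupplier;

class ObserverTransitionGuard {
  private final BooleanSupplier inSafeMode;

  ObserverTransitionGuard(BooleanSupplier inSafeMode) {
    this.inSafeMode = inSafeMode;
  }

  void checkBeforeTransitionToObserver() throws IOException {
    if (inSafeMode.getAsBoolean()) {
      // Refuse the transition instead of letting an observer in safe mode hand
      // SafeModeExceptions to readers and trigger needless client failover.
      throw new IOException("NameNode is in safe mode, refusing transitionToObserver");
    }
  }
}
{code}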
[jira] [Updated] (HDFS-16531) Avoid setReplication logging an edit record if old replication equals the new value
[ https://issues.apache.org/jira/browse/HDFS-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen O'Donnell updated HDFS-16531: - Fix Version/s: 3.2.4 3.3.4 > Avoid setReplication logging an edit record if old replication equals the new > value > --- > > Key: HDFS-16531 > URL: https://issues.apache.org/jira/browse/HDFS-16531 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.4 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > I recently came across a NN log where about 800k setRep calls were made, > setting the replication from 3 to 3 - ie leaving it unchanged. > Even in a case like this, we log an edit record, an audit log, and perform > some quota checks etc. > I believe it should be possible to avoid some of the work if we check for > oldRep == newRep and jump out of the method early. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
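The optimization described in HDFS-16531 amounts to an early exit when the requested replication equals the current one. The sketch below uses a toy in-memory map instead of the real namespace; all names are placeholders rather than the actual FSDirAttrOp/FSEditLog code.

{code:java}
import java.util.HashMap;
import java.util.Map;

class ReplicationSetter {
  private final Map<String, Short> replicationByPath = new HashMap<>();
  private int editLogRecords = 0;

  boolean setReplication(String src, short newReplication) {
    short oldReplication = replicationByPath.getOrDefault(src, (short) 3);
    if (oldReplication == newReplication) {
      // Nothing changes: skip quota work and do not write an edit-log record.
      return true;
    }
    replicationByPath.put(src, newReplication);
    editLogRecords++; // stands in for logging a setReplication edit record
    return true;
  }

  int editLogRecordsWritten() {
    return editLogRecords;
  }
}
{code}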
[jira] [Updated] (HDFS-16355) Improve the description of dfs.block.scanner.volume.bytes.per.second
[ https://issues.apache.org/jira/browse/HDFS-16355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HDFS-16355:
--
Fix Version/s: 3.4.0

> Improve the description of dfs.block.scanner.volume.bytes.per.second
> 
> Key: HDFS-16355
> URL: https://issues.apache.org/jira/browse/HDFS-16355
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: documentation, hdfs
> Affects Versions: 3.3.1
> Reporter: guophilipse
> Assignee: guophilipse
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> The datanode block scanner will be disabled if `dfs.block.scanner.volume.bytes.per.second` is configured to a value less than or equal to zero; we can improve the description.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
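For context, a non-positive value disables the volume scanner entirely, while a positive value is the per-volume scan throttle in bytes per second. The snippet below only illustrates setting the key programmatically with the standard Configuration API; in practice the value would normally live in hdfs-site.xml.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class BlockScannerConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // A value <= 0 disables the DataNode block scanner; a positive value is the
    // per-volume scan throttle in bytes per second.
    conf.setLong("dfs.block.scanner.volume.bytes.per.second", 0L);
    System.out.println(conf.getLong("dfs.block.scanner.volume.bytes.per.second", 1048576L));
  }
}
{code}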
[jira] [Resolved] (HDFS-16501) Print the exception when reporting a bad block
[ https://issues.apache.org/jira/browse/HDFS-16501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HDFS-16501. --- Resolution: Fixed > Print the exception when reporting a bad block > -- > > Key: HDFS-16501 > URL: https://issues.apache.org/jira/browse/HDFS-16501 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: qinyuren >Assignee: qinyuren >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.3 > > Attachments: image-2022-03-10-19-27-31-622.png > > Time Spent: 1h 10m > Remaining Estimate: 0h > > !image-2022-03-10-19-27-31-622.png|width=847,height=27! > Currently, volumeScanner will find bad block and report it to namenode > without printing the reason why the block is a bad block. I think we should > be better print the exception in log file. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11041) Unable to unregister FsDatasetState MBean if DataNode is shutdown twice
[ https://issues.apache.org/jira/browse/HDFS-11041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDFS-11041: -- Fix Version/s: 3.4.0 > Unable to unregister FsDatasetState MBean if DataNode is shutdown twice > --- > > Key: HDFS-11041 > URL: https://issues.apache.org/jira/browse/HDFS-11041 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Trivial > Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.3 > > Attachments: HDFS-11041.01.patch, HDFS-11041.02.patch, > HDFS-11041.03.patch > > > I saw error message like the following in some tests > {noformat} > 2016-10-21 04:09:03,900 [main] WARN util.MBeans > (MBeans.java:unregister(114)) - Error unregistering > Hadoop:service=DataNode,name=FSDatasetState-33cd714c-0b1a-471f-8efe-f431d7d874bc > javax.management.InstanceNotFoundException: > Hadoop:service=DataNode,name=FSDatasetState-33cd714c-0b1a-471f-8efe-f431d7d874bc > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415) > at > com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546) > at org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:112) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:2127) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:2016) > at > org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1985) > at > org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1962) > at > org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1936) > at > org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1929) > at > org.apache.hadoop.hdfs.TestDatanodeReport.testDatanodeReport(TestDatanodeReport.java:144) > {noformat} > The test shuts down datanode, and then shutdown cluster, which shuts down the > a datanode twice. Resetting the FsDatasetSpi reference in DataNode to null > resolves the issue. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
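The fix described above ("resetting the FsDatasetSpi reference ... to null resolves the issue") amounts to making shutdown idempotent. The sketch below shows that generic pattern; the class and field names are placeholders, not the actual DataNode/FsDatasetImpl code.

{code:java}
class ShutdownOnce {
  private Object datasetMBeanHandle; // stands in for the registered MBean handle

  ShutdownOnce(Object mBeanHandle) {
    this.datasetMBeanHandle = mBeanHandle;
  }

  synchronized void shutdown() {
    if (datasetMBeanHandle == null) {
      return; // already shut down; a second call becomes a no-op
    }
    unregister(datasetMBeanHandle);
    datasetMBeanHandle = null; // clear the reference so we never unregister twice
  }

  private void unregister(Object handle) {
    // MBeans.unregister(...) in the real code; omitted in this sketch
  }
}
{code}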
[jira] [Updated] (HDFS-16501) Print the exception when reporting a bad block
[ https://issues.apache.org/jira/browse/HDFS-16501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDFS-16501: -- Fix Version/s: 3.3.3 (was: 3.3.4) > Print the exception when reporting a bad block > -- > > Key: HDFS-16501 > URL: https://issues.apache.org/jira/browse/HDFS-16501 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: qinyuren >Assignee: qinyuren >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.3 > > Attachments: image-2022-03-10-19-27-31-622.png > > Time Spent: 1h 10m > Remaining Estimate: 0h > > !image-2022-03-10-19-27-31-622.png|width=847,height=27! > Currently, volumeScanner will find bad block and report it to namenode > without printing the reason why the block is a bad block. I think we should > be better print the exception in log file. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16428) Source path with storagePolicy cause wrong typeConsumed while rename
[ https://issues.apache.org/jira/browse/HDFS-16428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDFS-16428: -- Fix Version/s: 3.4.0 > Source path with storagePolicy cause wrong typeConsumed while rename > > > Key: HDFS-16428 > URL: https://issues.apache.org/jira/browse/HDFS-16428 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode >Reporter: lei w >Assignee: lei w >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.3 > > Attachments: example.txt > > Time Spent: 2.5h > Remaining Estimate: 0h > > When computing quota in a rename operation, we use the storage policy of the target > directory to compute the source's quota usage. This causes a wrong typeConsumed > value when the source path has its own storage policy set. I provided a unit > test to demonstrate this situation. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
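To make the failure mode concrete, here is a small sketch with hypothetical types and helpers (not the NameNode code): the per-storage-type usage charged against the source should be computed under the policy that actually governs the source path; measuring the source under the target directory's policy is what skews typeConsumed.

{code:java}
/**
 * Illustrative sketch (hypothetical types, not FSDirRenameOp): measure the
 * source under its own effective storage policy when computing the quota
 * delta for a rename.
 */
class RenameQuotaSketch {

  /** Simplified stand-in for a storage policy: bytes -> per-type byte counts. */
  interface StoragePolicy {
    long[] typeUsageFor(long fileBytes, short replication);
  }

  static long[] usageChargedToSource(long fileBytes, short replication,
      StoragePolicy sourcePolicy, StoragePolicy targetDirPolicy) {
    // Buggy variant described above: the target directory's policy is applied
    // to the source files, so typeConsumed is charged against the wrong types.
    // return targetDirPolicy.typeUsageFor(fileBytes, replication);

    // Intended behaviour: the source is measured under its own policy.
    return sourcePolicy.typeUsageFor(fileBytes, replication);
  }
}
{code}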
[jira] [Updated] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads
[ https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDFS-16422: -- Fix Version/s: 3.4.0 > Fix thread safety of EC decoding during concurrent preads > - > > Key: HDFS-16422 > URL: https://issues.apache.org/jira/browse/HDFS-16422 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfsclient, ec, erasure-coding >Affects Versions: 3.3.0, 3.3.1 >Reporter: daimin >Assignee: daimin >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.3 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Reading data from an erasure-coded file with missing replicas (internal blocks of a > block group) triggers online reconstruction: the available data units are read > and decoded into the missing target data. Each DFSStripedInputStream > object has a RawErasureDecoder object, and when we do preads concurrently, > RawErasureDecoder.decode is invoked concurrently too. > RawErasureDecoder.decode is not thread safe, so as a result we occasionally get wrong > data from pread. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
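One straightforward way to remove such a race, shown here as a sketch with a hypothetical wrapper type (the actual HDFS-16422 patch may differ), is to serialize decode() calls that share the stream's single decoder instance; the alternative is to give each pread its own decoder at the cost of extra allocations.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;

/**
 * Sketch only (hypothetical wrapper): serialize access to a shared,
 * non-thread-safe decoder so concurrent preads cannot corrupt each other's
 * reconstruction state.
 */
class SynchronizedDecoderSketch {

  /** Stands in for the relevant part of the RawErasureDecoder interface. */
  interface RawDecoder {
    void decode(ByteBuffer[] inputs, int[] erasedIndexes, ByteBuffer[] outputs)
        throws IOException;
  }

  private final Object lock = new Object();
  private final RawDecoder delegate;

  SynchronizedDecoderSketch(RawDecoder delegate) {
    this.delegate = delegate;
  }

  void decode(ByteBuffer[] inputs, int[] erasedIndexes, ByteBuffer[] outputs)
      throws IOException {
    synchronized (lock) {   // at most one in-flight decode per stream
      delegate.decode(inputs, erasedIndexes, outputs);
    }
  }
}
{code}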
[jira] [Updated] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress
[ https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDFS-16507: -- Fix Version/s: 3.4.0 > [SBN read] Avoid purging edit log which is in progress > -- > > Key: HDFS-16507 > URL: https://issues.apache.org/jira/browse/HDFS-16507 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: tomscut >Assignee: tomscut >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.3 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL > exception. It looks like an edit log that is still in progress is being purged. > According to the analysis, I suspect that the in-progress edit log to be > purged (after the SNN checkpoint) is not finalized (see HDFS-14317) before the ANN > rolls its own edits. > The stack: > {code:java} > java.lang.Thread.getStackTrace(Thread.java:1552) > org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032) > > org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185) > > org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623) > > org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388) > > org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620) > > org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512) > org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177) > > org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515) > javax.servlet.http.HttpServlet.service(HttpServlet.java:710) > javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) 
> > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > org.eclipse.jetty.server.Server.handle(Server.java:539) > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) > > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > > org.eclipse.jetty.util.thread.QueuedThreadPool.runJ
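The guard the title asks for can be illustrated with simplified types. This is a sketch, not the FileJournalManager code: when purging logs older than a transaction id, a segment that is still in progress is skipped outright, since it has no finalized last transaction id yet.

{code:java}
import java.util.Iterator;
import java.util.List;

/**
 * Sketch only (simplified types): never purge an in-progress edit log
 * segment, even if its transactions fall below the purge threshold.
 */
class EditLogPurgeSketch {

  static final class Segment {
    final long lastTxId;        // meaningful only for a finalized segment
    final boolean inProgress;

    Segment(long lastTxId, boolean inProgress) {
      this.lastTxId = lastTxId;
      this.inProgress = inProgress;
    }
  }

  static void purgeLogsOlderThan(List<Segment> segments, long minTxIdToKeep) {
    for (Iterator<Segment> it = segments.iterator(); it.hasNext();) {
      Segment s = it.next();
      if (s.inProgress) {
        continue;               // never purge a segment still being written
      }
      if (s.lastTxId < minTxIdToKeep) {
        it.remove();            // stands in for deleting the finalized file
      }
    }
  }
}
{code}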
[jira] [Updated] (HDFS-16437) ReverseXML processor doesn't accept XML files without the SnapshotDiffSection.
[ https://issues.apache.org/jira/browse/HDFS-16437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDFS-16437: -- Fix Version/s: 3.4.0 > ReverseXML processor doesn't accept XML files without the SnapshotDiffSection. > -- > > Key: HDFS-16437 > URL: https://issues.apache.org/jira/browse/HDFS-16437 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.1, 3.3.0 >Reporter: yanbin.zhang >Assignee: yanbin.zhang >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.3 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > In a cluster environment without snapshots, if you try to convert the generated > XML back to an fsimage, an error is reported. > {code:java} > // code placeholder > [test@test001 ~]$ hdfs oiv -p ReverseXML -i fsimage_0257220.xml > -o fsimage_0257220 > OfflineImageReconstructor failed: FSImage XML ended prematurely, without > including section(s) SnapshotDiffSection > java.io.IOException: FSImage XML ended prematurely, without including > section(s) SnapshotDiffSection > at > org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.processXml(OfflineImageReconstructor.java:1765) > at > org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageReconstructor.run(OfflineImageReconstructor.java:1842) > at > org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(OfflineImageViewerPB.java:211) > at > org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.main(OfflineImageViewerPB.java:149) > 22/01/25 15:56:52 INFO util.ExitUtil: Exiting with status 1: ExitException > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
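The natural remedy is to stop treating a section that a snapshot-free cluster never writes as mandatory. The following is a simplified sketch of that idea only; the real OfflineImageReconstructor bookkeeping, and the section list the actual fix relaxes, may differ.

{code:java}
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch only (simplified, hypothetical method): sections that may be
 * legitimately absent are skipped in the completeness check instead of
 * aborting the ReverseXML run.
 */
class SectionCheckSketch {
  private static final Set<String> OPTIONAL_SECTIONS =
      new HashSet<>(Arrays.asList("SnapshotDiffSection"));

  static void verifySectionsSeen(Set<String> expected, Set<String> seen)
      throws IOException {
    for (String section : expected) {
      if (!seen.contains(section) && !OPTIONAL_SECTIONS.contains(section)) {
        throw new IOException("FSImage XML ended prematurely, without"
            + " including section(s) " + section);
      }
    }
  }
}
{code}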
[jira] [Updated] (HDFS-16544) EC decoding failed due to invalid buffer
[ https://issues.apache.org/jira/browse/HDFS-16544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] qinyuren updated HDFS-16544: Description: In [HDFS-16538|http://https//issues.apache.org/jira/browse/HDFS-16538] , we found an EC file decoding bug if more than one data block read failed. Currently, we found another bug trigger by #StatefulStripeReader.decode. If we read an EC file which {*}length more than one stripe{*}, and this file have *one data block* and *the first parity block* corrupted, this error will happen. {code:java} org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not allowing null at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132) at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.(ByteBufferDecodingState.java:48) at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86) at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170) at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435) at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94) at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392) at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315) at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918) {code} Let's say we use ec(6+3) and the data block[0] and the first parity block[6] are corrupted. # The readers for block[0] and block[6] will be closed after reading the first stripe of an EC file; # When the client reading the second stripe of the EC file, it will trigger #prepareParityChunk for block[6]. # The decodeInputs[6] will not be constructed because the reader for block[6] was closed. {code:java} boolean prepareParityChunk(int index) { Preconditions.checkState(index >= dataBlkNum && alignedStripe.chunks[index] == null); if (readerInfos[index] != null && readerInfos[index].shouldSkip) { alignedStripe.chunks[index] = new StripingChunk(StripingChunk.MISSING); // we have failed the block reader before return false; } final int parityIndex = index - dataBlkNum; ByteBuffer buf = dfsStripedInputStream.getParityBuffer().duplicate(); buf.position(cellSize * parityIndex); buf.limit(cellSize * parityIndex + (int) alignedStripe.range.spanInBlock); decodeInputs[index] = new ECChunk(buf.slice(), 0, (int) alignedStripe.range.spanInBlock); alignedStripe.chunks[index] = new StripingChunk(decodeInputs[index].getBuffer()); return true; } {code} was: In [HDFS-16538|http://https//issues.apache.org/jira/browse/HDFS-16538] , we found an EC file decoding bug if more than one data block read failed. Currently, we found another bug trigger by #StatefulStripeReader.decode. If we read an EC file which {*}length more than one stripe{*}, and this file have *one data block* and *the first parity block* corrupted, this error will happen. 
{code:java} org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not allowing null at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132) at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.(ByteBufferDecodingState.java:48) at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86) at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170) at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435) at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94) at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392) at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315) at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918) {code} Let's say we use ec(6+3) and the data block[0] and the first parity block[6] are corrupted. # The readers for block[0] and block[6] will be closed after reading the first stripe of an EC file; # When the client reading the second stripe of the EC file, it will trigger #prepareParityChunk for block[6]. # The decodeInputs[6] will not be constructed due to the reader for block[6] was closed. {code:java} boolean prepareParityChunk(int index) { Preconditions.checkState(index >= dataBlkNum && alignedStripe.chunks[index] == null); if (readerInfos[index] != null && readerInfos[index].shouldSkip) { alignedStr
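For context on the stack trace above, the check that fails can be distilled into a few lines. This is an illustration of the contract only, not the HDFS-16544 fix: every buffer slot handed to the raw decoder must be non-null and of the expected length, and leaving decodeInputs[6] unconstructed is what ultimately feeds the decoder a null slot.

{code:java}
import java.nio.ByteBuffer;

/**
 * Distilled illustration of the contract enforced before decoding (the real
 * code lives in ByteBufferDecodingState and throws
 * HadoopIllegalArgumentException).
 */
final class DecodeBufferCheckSketch {
  private DecodeBufferCheckSketch() {
  }

  static void checkBuffers(ByteBuffer[] buffers, int expectedLen) {
    for (ByteBuffer buf : buffers) {
      if (buf == null) {
        throw new IllegalArgumentException(
            "Invalid buffer found, not allowing null");
      }
      if (buf.remaining() != expectedLen) {
        throw new IllegalArgumentException(
            "Invalid buffer, not of length " + expectedLen);
      }
    }
  }
}
{code}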
[jira] [Work logged] (HDFS-16538) EC decoding failed due to not enough valid inputs
[ https://issues.apache.org/jira/browse/HDFS-16538?focusedWorklogId=758345&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758345 ] ASF GitHub Bot logged work on HDFS-16538: - Author: ASF GitHub Bot Created on: 19/Apr/22 08:32 Start Date: 19/Apr/22 08:32 Worklog Time Spent: 10m Work Description: liubingxing commented on PR #4167: URL: https://github.com/apache/hadoop/pull/4167#issuecomment-1102290130 @tasanuma Thanks for the review and merged. I found another bug related to EC decoding in [HDFS-16538](http://https//issues.apache.org/jira/browse/HDFS-16538) , Please take a look. Thanks you very much. Issue Time Tracking --- Worklog Id: (was: 758345) Time Spent: 1h 20m (was: 1h 10m) > EC decoding failed due to not enough valid inputs > -- > > Key: HDFS-16538 > URL: https://issues.apache.org/jira/browse/HDFS-16538 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: qinyuren >Assignee: qinyuren >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.4 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, we found this error if the #StripeReader.readStripe() have more > than one block read failed. > We use the EC policy ec(6+3) in our cluster. > {code:java} > Caused by: org.apache.hadoop.HadoopIllegalArgumentException: No enough valid > inputs are provided, not recoverable > at > org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkInputBuffers(ByteBufferDecodingState.java:119) > at > org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.(ByteBufferDecodingState.java:47) > at > org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86) > at > org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170) > at > org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:462) > at > org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94) > at > org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:406) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:327) > at > org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:420) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:892) > at java.base/java.io.DataInputStream.read(DataInputStream.java:149) > at java.base/java.io.DataInputStream.read(DataInputStream.java:149) > {code} > > {code:java} > while (!futures.isEmpty()) { > try { > StripingChunkReadResult r = StripedBlockUtil > .getNextCompletedStripedRead(service, futures, 0); > dfsStripedInputStream.updateReadStats(r.getReadStats()); > DFSClient.LOG.debug("Read task returned: {}, for stripe {}", > r, alignedStripe); > StripingChunk returnedChunk = alignedStripe.chunks[r.index]; > Preconditions.checkNotNull(returnedChunk); > Preconditions.checkState(returnedChunk.state == StripingChunk.PENDING); > if (r.state == StripingChunkReadResult.SUCCESSFUL) { > returnedChunk.state = StripingChunk.FETCHED; > alignedStripe.fetchedChunksNum++; > updateState4SuccessRead(r); > if (alignedStripe.fetchedChunksNum == dataBlkNum) { > clearFutures(); > break; > } > } else { > returnedChunk.state = StripingChunk.MISSING; > // close the corresponding reader > dfsStripedInputStream.closeReader(readerInfos[r.index]); > final int missing = alignedStripe.missingChunksNum; > alignedStripe.missingChunksNum++; > checkMissingBlocks(); > readDataForDecoding(); > 
readParityChunks(alignedStripe.missingChunksNum - missing); > } {code} > This error can be triggered by #StatefulStripeReader.decode. > The reason is that: > # If more than one *data block* read fails, > #readDataForDecoding will be called multiple times; > # The *decodeInputs array* will be re-initialized on each of those calls. > # The *parity data* in the *decodeInputs array* that was previously filled by > #readParityChunks will be set to null. > > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
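The "initialize once" idea behind fixing that re-initialization can be sketched with standalone, simplified types (the actual HDFS-16538 patch changes StripeReader directly and may take a different route): if the decode-input array is allocated at most once per stripe, repeated readDataForDecoding() calls can no longer wipe out parity cells that readParityChunks() already filled.

{code:java}
import java.nio.ByteBuffer;

/**
 * Sketch only (simplified, not the StripeReader code): allocate the decode
 * inputs at most once per stripe, so later calls on the same stripe reuse the
 * same array instead of discarding cells already read.
 */
class StripeDecodeInputsSketch {
  private final int numAllUnits;    // e.g. 9 for RS(6,3): 6 data + 3 parity
  private ByteBuffer[] decodeInputs;

  StripeDecodeInputsSketch(int numAllUnits) {
    this.numAllUnits = numAllUnits;
  }

  ByteBuffer[] getOrCreateDecodeInputs(int cellSize) {
    if (decodeInputs == null) {               // first call for this stripe
      decodeInputs = new ByteBuffer[numAllUnits];
      for (int i = 0; i < numAllUnits; i++) {
        decodeInputs[i] = ByteBuffer.allocate(cellSize);
      }
    }
    return decodeInputs;    // later calls see the buffers already filled
  }
}
{code}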
[jira] [Updated] (HDFS-16544) EC decoding failed due to invalid buffer
[ https://issues.apache.org/jira/browse/HDFS-16544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] qinyuren updated HDFS-16544: Description: In [HDFS-16538|http://https//issues.apache.org/jira/browse/HDFS-16538] , we found an EC file decoding bug if more than one data block read failed. Currently, we found another bug trigger by #StatefulStripeReader.decode. If we read an EC file which {*}length more than one stripe{*}, and this file have *one data block* and *the first parity block* corrupted, this error will happen. {code:java} org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not allowing null at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132) at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.(ByteBufferDecodingState.java:48) at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86) at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170) at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435) at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94) at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392) at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315) at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918) {code} Let's say we use ec(6+3) and the data block[0] and the first parity block[6] are corrupted. # The readers for block[0] and block[6] will be closed after reading the first stripe of an EC file; # When the client reading the second stripe of the EC file, it will trigger #prepareParityChunk for block[6]. # The decodeInputs[6] will not be constructed due to the reader for block[6] was closed. {code:java} boolean prepareParityChunk(int index) { Preconditions.checkState(index >= dataBlkNum && alignedStripe.chunks[index] == null); if (readerInfos[index] != null && readerInfos[index].shouldSkip) { alignedStripe.chunks[index] = new StripingChunk(StripingChunk.MISSING); // we have failed the block reader before return false; } final int parityIndex = index - dataBlkNum; ByteBuffer buf = dfsStripedInputStream.getParityBuffer().duplicate(); buf.position(cellSize * parityIndex); buf.limit(cellSize * parityIndex + (int) alignedStripe.range.spanInBlock); decodeInputs[index] = new ECChunk(buf.slice(), 0, (int) alignedStripe.range.spanInBlock); alignedStripe.chunks[index] = new StripingChunk(decodeInputs[index].getBuffer()); return true; } {code} was: In [HDFS-16538|http://https//issues.apache.org/jira/browse/HDFS-16538] , we found an EC file decoding bug if more than one data block read failed. Currently, we found another bug trigger by #StatefulStripeReader.decode. If we read an EC file which {*}length more than one stripe{*}, and this file have *one data block* and *the first parity block* corrupted, this error will happen. 
{code:java} org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not allowing null at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132) at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.(ByteBufferDecodingState.java:48) at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86) at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170) at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435) at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94) at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392) at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315) at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918) {code} > EC decoding failed due to invalid buffer > > > Key: HDFS-16544 > URL: https://issues.apache.org/jira/browse/HDFS-16544 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: qinyuren >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > In [HDFS-16538|http://https//issues.apache.org/jira/browse/HDFS-16538] , we > found an EC file decoding bug if more than one data block read failed. > Currently, we found ano
[jira] [Updated] (HDFS-14750) RBF: Improved isolation for downstream name nodes. {Dynamic}
[ https://issues.apache.org/jira/browse/HDFS-14750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-14750: -- Labels: pull-request-available (was: ) > RBF: Improved isolation for downstream name nodes. {Dynamic} > > > Key: HDFS-14750 > URL: https://issues.apache.org/jira/browse/HDFS-14750 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This Jira tracks the work around dynamic allocation of resources in routers > for downstream hdfs clusters. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-14750) RBF: Improved isolation for downstream name nodes. {Dynamic}
[ https://issues.apache.org/jira/browse/HDFS-14750?focusedWorklogId=758337&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758337 ] ASF GitHub Bot logged work on HDFS-14750: - Author: ASF GitHub Bot Created on: 19/Apr/22 07:56 Start Date: 19/Apr/22 07:56 Worklog Time Spent: 10m Work Description: kokonguyen191 opened a new pull request, #4199: URL: https://github.com/apache/hadoop/pull/4199 ### Description of PR Add a `DynamicRouterRpcFairnessPolicyController` class that resizes permit capacity periodically based on traffic to namespaces. ### How was this patch tested? Unit tests and local deployment. ### For code changes: - [x] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? Issue Time Tracking --- Worklog Id: (was: 758337) Remaining Estimate: 0h Time Spent: 10m > RBF: Improved isolation for downstream name nodes. {Dynamic} > > > Key: HDFS-14750 > URL: https://issues.apache.org/jira/browse/HDFS-14750 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: CR Hota >Assignee: CR Hota >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This Jira tracks the work around dynamic allocation of resources in routers > for downstream hdfs clusters. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
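The PR description above is terse, so here is a rough sketch of the general idea as described, with an entirely hypothetical class (this is not the DynamicRouterRpcFairnessPolicyController in PR #4199): track per-nameservice call volume and periodically redistribute a fixed pool of handler permits in proportion to recent traffic, so one busy downstream namenode cannot monopolize the router.

{code:java}
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/**
 * Sketch only: a fixed permit pool split across nameservices, with the split
 * recomputed periodically from observed call counts.
 */
class DynamicPermitControllerSketch {
  private final int totalPermits;
  private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> callCounts = new ConcurrentHashMap<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  DynamicPermitControllerSketch(int totalPermits,
      Collection<String> nameservices, long refreshSeconds) {
    this.totalPermits = totalPermits;
    int even = Math.max(1, totalPermits / Math.max(1, nameservices.size()));
    for (String ns : nameservices) {
      permits.put(ns, new Semaphore(even));       // start with an even split
      callCounts.put(ns, new LongAdder());
    }
    scheduler.scheduleAtFixedRate(this::resize, refreshSeconds,
        refreshSeconds, TimeUnit.SECONDS);
  }

  /** RPC path: acquire a permit before invoking the downstream namenode. */
  boolean acquirePermit(String ns) {
    callCounts.get(ns).increment();
    return permits.get(ns).tryAcquire();
  }

  void releasePermit(String ns) {
    permits.get(ns).release();
  }

  /** Periodic resize: split the pool in proportion to recent call counts. */
  private void resize() {
    Map<String, Long> recent = new HashMap<>();
    long total = 0;
    for (Map.Entry<String, LongAdder> e : callCounts.entrySet()) {
      long n = e.getValue().sumThenReset();
      recent.put(e.getKey(), n);
      total += n;
    }
    if (total == 0) {
      return;                  // no traffic since the last resize; keep the split
    }
    for (Map.Entry<String, Long> e : recent.entrySet()) {
      int share = (int) Math.max(1, totalPermits * e.getValue() / total);
      // Simplification: swap in a fresh semaphore; a real controller would
      // grow or shrink capacity in place so in-flight calls keep their permits.
      permits.put(e.getKey(), new Semaphore(share));
    }
  }
}
{code}

Each nameservice always keeps at least one permit in this sketch, so a quiet namespace is never starved entirely; how aggressively to rebalance, and how to shrink capacity safely while calls are in flight, are exactly the design questions such a controller has to answer.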