[jira] [Commented] (HDFS-17453) IncrementalBlockReport can have race condition with Edit Log Tailer
[ https://issues.apache.org/jira/browse/HDFS-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834472#comment-17834472 ]

ASF GitHub Bot commented on HDFS-17453:
---
goiri commented on code in PR #6708:
URL: https://github.com/apache/hadoop/pull/6708#discussion_r1554465107

## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestIncrementalBlockReports.java:
## @@ -215,4 +229,95 @@ public void testReplaceReceivedBlock() throws InterruptedException, IOException
      cluster = null;
    }
  }

  @Test
  public void testIBRRaceCondition() throws Exception {
    cluster.shutdown();
    Configuration conf = new Configuration();
    HAUtil.setAllowStandbyReads(conf, true);
    conf.setInt(DFSConfigKeys.DFS_HA_TAILEDITS_PERIOD_KEY, 1);
    cluster = new MiniDFSCluster.Builder(conf)
        .nnTopology(MiniDFSNNTopology.simpleHATopology())
        .numDataNodes(3)
        .build();
    try {
      cluster.waitActive();
      cluster.transitionToActive(0);

      NameNode nn1 = cluster.getNameNode(0);
      NameNode nn2 = cluster.getNameNode(1);
      FileSystem fs = HATestUtil.configureFailoverFs(cluster, conf);
      List<InvocationOnMock> ibrsToStandby = new ArrayList<>();
      List<DatanodeProtocolClientSideTranslatorPB> spies = new ArrayList<>();
      Phaser ibrPhaser = new Phaser(1);
      for (DataNode dn : cluster.getDataNodes()) {
        DatanodeProtocolClientSideTranslatorPB nnSpy =
            InternalDataNodeTestUtils.spyOnBposToNN(dn, nn2);
        doAnswer((inv) -> {
          for (StorageReceivedDeletedBlocks srdb :
              inv.getArgument(2, StorageReceivedDeletedBlocks[].class)) {
            for (ReceivedDeletedBlockInfo block : srdb.getBlocks()) {
              if (block.getStatus().equals(BlockStatus.RECEIVED_BLOCK)) {
                ibrPhaser.arriveAndDeregister();
              }
            }
          }
          ibrsToStandby.add(inv);
          return null;
        }).when(nnSpy).blockReceivedAndDeleted(
            any(DatanodeRegistration.class),
            anyString(),
            any(StorageReceivedDeletedBlocks[].class));
        spies.add(nnSpy);
      }

      Thread.sleep(1000);

Review Comment:
   Can we do better than sleep?
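The usual Hadoop answer to this review comment is `GenericTestUtils.waitFor`, which polls a condition instead of sleeping a fixed amount. A minimal standalone sketch of the same poll-until-true idea (hypothetical helper, not the Hadoop API):

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for GenericTestUtils.waitFor: poll a condition at a
// short interval instead of a blind Thread.sleep(1000).
public class WaitFor {
  // Returns true once check passes, false if timeoutMs elapses first.
  public static boolean await(BooleanSupplier check, long intervalMs, long timeoutMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!check.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);  // brief pause between polls
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }
}
```

In the test above, the condition could poll the `ibrPhaser` or the spy invocation count, so the test proceeds as soon as the IBRs arrive rather than always paying the full second.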
## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingDataNodeMessages.java:
## @@ -95,16 +95,27 @@ void removeAllMessagesForDatanode(DatanodeDescriptor dn) {
  void enqueueReportedBlock(DatanodeStorageInfo storageInfo, Block block,
      ReplicaState reportedState) {
+   long genStamp = block.getGenerationStamp();
+   Queue<ReportedBlockInfo> queue = null;
    if (BlockIdManager.isStripedBlockID(block.getBlockId())) {
      Block blkId = new Block(BlockIdManager.convertToStripedID(block
          .getBlockId()));
-     getBlockQueue(blkId).add(
-         new ReportedBlockInfo(storageInfo, new Block(block), reportedState));
+     queue = getBlockQueue(blkId);
    } else {
      block = new Block(block);
-     getBlockQueue(block).add(
-         new ReportedBlockInfo(storageInfo, block, reportedState));
+     queue = getBlockQueue(block);
    }
+   // We only want the latest non-future reported block to be queued for each
+   // DataNode. Otherwise, there can be a race condition that causes an old
+   // reported block to be kept in the queue until the SNN switches to ANN and
+   // the old reported block will be processed and marked as corrupt by the ANN.
+   // See HDFS-17453
+   int size = queue.size();
+   if (queue.removeIf(rbi -> rbi.storageInfo.equals(storageInfo) &&

Review Comment:
   We could make this more robust to nulls with:
   ```
   void enqueueReportedBlock(DatanodeStorageInfo storageInfo, Block block,
       ReplicaState reportedState) {
     if (storageInfo == null || block == null || reportedState == null) {
       return;
     }
     ...
     if (queue.removeIf(rbi -> storageInfo.equals(rbi.storageInfo) &&
     ...
   ```
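The dedup-on-enqueue idea under discussion, with the reviewer's null guard folded in, can be sketched roughly as follows. The names here are illustrative stand-ins, not the actual Hadoop classes (`storageId` stands in for `DatanodeStorageInfo`, and the real fix also filters out future generation stamps):

```java
import java.util.Objects;
import java.util.Queue;

// Simplified sketch of the HDFS-17453 fix: keep only the latest queued
// report per DataNode storage, so a stale report cannot linger until failover.
public class PendingQueueSketch {
  public static final class ReportedBlock {
    public final String storageId;  // stands in for DatanodeStorageInfo
    public final long genStamp;
    public ReportedBlock(String storageId, long genStamp) {
      this.storageId = storageId;
      this.genStamp = genStamp;
    }
  }

  public static void enqueue(Queue<ReportedBlock> queue, ReportedBlock rb) {
    if (queue == null || rb == null || rb.storageId == null) {
      return;  // null guard suggested in review
    }
    // Drop any older (or equal) queued report from the same storage before
    // enqueueing, so only the latest reported state per storage survives.
    queue.removeIf(old -> Objects.equals(old.storageId, rb.storageId)
        && old.genStamp <= rb.genStamp);
    queue.add(rb);
  }
}
```

Putting the `null` checks at the top of the method, as the reviewer suggests, keeps the `removeIf` predicate simple and avoids an NPE inside the lambda.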
[jira] [Commented] (HDFS-17453) IncrementalBlockReport can have race condition with Edit Log Tailer
[ https://issues.apache.org/jira/browse/HDFS-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834471#comment-17834471 ]

ASF GitHub Bot commented on HDFS-17453:
---
dannytbecker commented on PR #6708:
URL: https://github.com/apache/hadoop/pull/6708#issuecomment-2040837894

@kihwal Could you take a look at my PR? I think it addresses the issue you mentioned here https://issues.apache.org/jira/browse/HDFS-14941?focusedCommentId=17140156=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17140156

> IncrementalBlockReport can have race condition with Edit Log Tailer
> ---
>
> Key: HDFS-17453
> URL: https://issues.apache.org/jira/browse/HDFS-17453
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: auto-failover, ha, hdfs, namenode
> Affects Versions: 3.3.0, 3.3.1, 2.10.2, 3.3.2, 3.3.5, 3.3.4, 3.3.6
> Reporter: Danny Becker
> Assignee: Danny Becker
> Priority: Major
> Labels: pull-request-available
>
> h2. Summary
> There is a race condition between IncrementalBlockReports (IBR) and EditLogTailer in Standby NameNode (SNN) which can lead to leaked IBRs and false corrupt blocks after HA Failover. The race condition occurs when the SNN loads the edit logs before it receives the block reports from DataNode (DN).
> h2. Example
> In the following example there is a block (b1) with 3 generation stamps (gs1, gs2, gs3).
> # SNN1 loads edit logs for b1gs1 and b1gs2.
> # DN1 sends the IBR for b1gs1 to SNN1.
> # SNN1 will determine that the reported block b1gs1 from DN1 is corrupt and it will be queued for later.
> [BlockManager.java|https://github.com/apache/hadoop/blob/6ed73896f6e8b4b7c720eff64193cb30b3e77fb2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3447C1-L3464C6]
> {code:java}
> BlockToMarkCorrupt c = checkReplicaCorrupt(
>     block, reportedState, storedBlock, ucState, dn);
> if (c != null) {
>   if (shouldPostponeBlocksFromFuture) {
>     // If the block is an out-of-date generation stamp or state,
>     // but we're the standby, we shouldn't treat it as corrupt,
>     // but instead just queue it for later processing.
>     // Storing the reported block for later processing, as that is what
>     // comes from the IBR / FBR and hence what we should use to compare
>     // against the memory state.
>     // See HDFS-6289 and HDFS-15422 for more context.
>     queueReportedBlock(storageInfo, block, reportedState,
>         QUEUE_REASON_CORRUPT_STATE);
>   } else {
>     toCorrupt.add(c);
>   }
>   return storedBlock;
> } {code}
> # DN1 sends IBRs for b1gs2 and b1gs3 to SNN1.
> # SNN1 processes b1gs2 and updates the blocks map.
> # SNN1 queues b1gs3 for later because it determines that b1gs3 is a future genstamp.
> # SNN1 loads the b1gs3 edit logs and processes the queued reports for b1.
> # SNN1 processes b1gs1 first and puts it back in the queue.
> # SNN1 processes b1gs3 next and updates the blocks map.
> # Later, SNN1 becomes the Active NameNode (ANN) during an HA Failover.
> # SNN1 will catch up to the latest edit logs, then process all queued block reports to become the ANN.
> # ANN1 will process b1gs1 and mark it as corrupt.
> If the example above happens for every DN which stores b1, then when the HA failover happens, b1 will be incorrectly marked as corrupt. This will be fixed when the first DN sends a FullBlockReport or an IBR for b1.
> h2.
Logs from Active Cluster
> I added the following logs to confirm this issue in an active cluster:
> {code:java}
> BlockToMarkCorrupt c = checkReplicaCorrupt(
>     block, reportedState, storedBlock, ucState, dn);
> if (c != null) {
>   DatanodeStorageInfo storedStorageInfo = storedBlock.findStorageInfo(dn);
>   LOG.info("Found corrupt block {} [{}, {}] from DN {}. Stored block {} from DN {}",
>       block, reportedState.name(), ucState.name(), storageInfo, storedBlock,
>       storedStorageInfo);
>   if (storageInfo.equals(storedStorageInfo) &&
>       storedBlock.getGenerationStamp() > block.getGenerationStamp()) {
>     LOG.info("Stored Block {} from the same DN {} has a newer GenStamp.",
>         storedBlock, storedStorageInfo);
>   }
>   if (shouldPostponeBlocksFromFuture) {
>     // If the block is an out-of-date generation stamp or state,
>     // but we're the standby, we shouldn't treat it as corrupt,
>     // but instead just queue it for later processing.
>     // Storing the reported block for later processing, as that is what
>     // comes from the IBR / FBR and hence what we should use to compare
>     // against the memory state.
>     // See HDFS-6289 and HDFS-15422 for more context.
>
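The corrupt-vs-postpone decision in the BlockManager snippet above boils down to comparing generation stamps plus the NameNode's HA state. This is a loose, illustrative reduction (invented names, not Hadoop's API) of why the same stale report that the standby merely queues becomes a false corrupt replica once the node is active:

```java
// Minimal sketch of the decision quoted above: an out-of-date reported
// generation stamp is postponed on the standby but marked corrupt on the
// active. Illustrative only; the real check also considers replica state.
public class ReplicaCheckSketch {
  public enum Action { UPDATE, POSTPONE, MARK_CORRUPT }

  public static Action decide(long reportedGenStamp, long storedGenStamp,
      boolean isStandby) {
    if (reportedGenStamp > storedGenStamp) {
      // Future genstamp: the standby has not tailed those edits yet.
      return Action.POSTPONE;
    }
    if (reportedGenStamp < storedGenStamp) {
      // Stale report: the standby queues it, the active treats it as
      // corrupt -- the path that falsely corrupts b1gs1 after failover.
      return isStandby ? Action.POSTPONE : Action.MARK_CORRUPT;
    }
    return Action.UPDATE;
  }
}
```

The race exists because the queued b1gs1 report survives long enough to be re-evaluated with `isStandby == false`; dropping superseded reports at enqueue time removes that window.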
[jira] [Commented] (HDFS-17453) IncrementalBlockReport can have race condition with Edit Log Tailer
[ https://issues.apache.org/jira/browse/HDFS-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834470#comment-17834470 ]

ASF GitHub Bot commented on HDFS-17453:
---
dannytbecker commented on PR #6708:
URL: https://github.com/apache/hadoop/pull/6708#issuecomment-2040837517

@sodonnel Could you take a look at my PR. I think it addresses the issue you mentioned here https://issues.apache.org/jira/browse/HDFS-15422?focusedCommentId=17287194=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17287194
[jira] [Commented] (HDFS-17453) IncrementalBlockReport can have race condition with Edit Log Tailer
[ https://issues.apache.org/jira/browse/HDFS-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834468#comment-17834468 ]

ASF GitHub Bot commented on HDFS-17453:
---
dannytbecker commented on PR #6708:
URL: https://github.com/apache/hadoop/pull/6708#issuecomment-2040836033

@goiri I have added the unit test which reproduces the exact race condition. I confirmed this by running the unit test against a branch without the fix, and it caught the false corrupt replicas.
[jira] [Commented] (HDFS-17455) Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt
[ https://issues.apache.org/jira/browse/HDFS-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834427#comment-17834427 ]

ASF GitHub Bot commented on HDFS-17455:
---
hadoop-yetus commented on PR #6710:
URL: https://github.com/apache/hadoop/pull/6710#issuecomment-2040538624

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 32s | | Docker mode activated. |
|||| _ Prechecks _ ||
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ ||
| +0 :ok: | mvndep | 14m 39s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 32m 28s | | trunk passed |
| +1 :green_heart: | compile | 5m 31s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | compile | 5m 22s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | checkstyle | 1m 26s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 27s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 57s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 2m 35s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| -1 :x: | spotbugs | 2m 59s | [/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6710/1/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html) | hadoop-hdfs-project/hadoop-hdfs-client in trunk has 1 extant spotbugs warnings. |
| +1 :green_heart: | shadedclient | 37m 40s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ ||
| +0 :ok: | mvndep | 0m 33s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 2s | | the patch passed |
| +1 :green_heart: | compile | 5m 19s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javac | 5m 19s | | the patch passed |
| +1 :green_heart: | compile | 5m 13s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | javac | 5m 13s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 14s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6710/1/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 1 new + 33 unchanged - 0 fixed = 34 total (was 33) |
| +1 :green_heart: | mvnsite | 2m 3s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 34s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 2m 7s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 5m 55s | | the patch passed |
| +1 :green_heart: | shadedclient | 35m 55s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ ||
| +1 :green_heart: | unit | 2m 28s | | hadoop-hdfs-client in the patch passed. |
| +1 :green_heart: | unit | 226m 4s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 46s | | The patch does not generate ASF License warnings. |
| | | 403m 33s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6710/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6710 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 2bcd94340db8 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 52f7ba8eccaf03420ed317659c19193ed895ddd4 |
| Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
[jira] [Commented] (HDFS-17449) Ill-formed decommission host name and port pair would trigger IndexOutOfBound error
[ https://issues.apache.org/jira/browse/HDFS-17449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834361#comment-17834361 ]

ASF GitHub Bot commented on HDFS-17449:
---
hadoop-yetus commented on PR #6691:
URL: https://github.com/apache/hadoop/pull/6691#issuecomment-2040042875

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 46s | | Docker mode activated. |
|||| _ Prechecks _ ||
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ ||
| +1 :green_heart: | mvninstall | 50m 4s | | trunk passed |
| +1 :green_heart: | compile | 1m 23s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | compile | 1m 17s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | checkstyle | 1m 12s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 24s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 9s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 1m 38s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 3m 19s | | trunk passed |
| +1 :green_heart: | shadedclient | 41m 8s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ ||
| +1 :green_heart: | mvninstall | 1m 10s | | the patch passed |
| +1 :green_heart: | compile | 1m 16s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javac | 1m 16s | | the patch passed |
| +1 :green_heart: | compile | 1m 5s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | javac | 1m 5s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 1s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 14s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 55s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 1m 34s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 3m 16s | | the patch passed |
| +1 :green_heart: | shadedclient | 41m 8s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ ||
| +1 :green_heart: | unit | 259m 43s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 46s | | The patch does not generate ASF License warnings. |
| | | 417m 25s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6691/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6691 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux ca4eabab33a4 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / e841403d0a1bc44ad1fd0820f5c7949001d8511c |
| Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6691/2/testReport/ |
| Max. process+thread count | 3046 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6691/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

> Ill-formed decommission host name and
[jira] [Commented] (HDFS-17454) Fix namenode fsck swallows the exception stacktrace, this can help us to troubleshooting log.
[ https://issues.apache.org/jira/browse/HDFS-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834353#comment-17834353 ]

ASF GitHub Bot commented on HDFS-17454:
---
hadoop-yetus commented on PR #6709:
URL: https://github.com/apache/hadoop/pull/6709#issuecomment-2040032072

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 51s | | Docker mode activated. |
|||| _ Prechecks _ ||
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ ||
| +1 :green_heart: | mvninstall | 51m 5s | | trunk passed |
| +1 :green_heart: | compile | 1m 31s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | compile | 1m 16s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | checkstyle | 1m 21s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 27s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 11s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 1m 44s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 3m 33s | | trunk passed |
| +1 :green_heart: | shadedclient | 42m 46s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ ||
| +1 :green_heart: | mvninstall | 1m 20s | | the patch passed |
| +1 :green_heart: | compile | 1m 22s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javac | 1m 22s | | the patch passed |
| +1 :green_heart: | compile | 1m 15s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | javac | 1m 15s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 9s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 19s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 55s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 |
| +1 :green_heart: | javadoc | 1m 37s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| +1 :green_heart: | spotbugs | 3m 29s | | the patch passed |
| +1 :green_heart: | shadedclient | 42m 17s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ ||
| +1 :green_heart: | unit | 268m 55s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 54s | | The patch does not generate ASF License warnings. |
| | | 432m 20s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6709/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6709 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 5ea6cfbdebcf 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 41fa5cb805156f5df8e0c60106c982d713d5c040 |
| Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6709/1/testReport/ |
| Max. process+thread count | 3016 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6709/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0
[jira] [Commented] (HDFS-17455) Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt
[ https://issues.apache.org/jira/browse/HDFS-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834304#comment-17834304 ]

ASF GitHub Bot commented on HDFS-17455:
---
haiyang1987 opened a new pull request, #6710:
URL: https://github.com/apache/hadoop/pull/6710

### Description of PR
https://issues.apache.org/jira/browse/HDFS-17455

When the client reads data and connects to the DataNode, an InvalidBlockTokenException is thrown because the DataNode access token is invalid at that moment. The subsequent call to the fetchBlockAt method then throws java.lang.IndexOutOfBoundsException, causing the read to fail.

**Root cause:**
- The HDFS file contains only one RBW block, with a block data size of 2048KB.
- The client opens this file and seeks to the offset of 1024KB to read data.
- The DFSInputStream#getBlockReader method connects to the DataNode; because the DataNode access token is invalid at that moment, it throws InvalidBlockTokenException, and the subsequent call to DFSInputStream#fetchBlockAt throws java.lang.IndexOutOfBoundsException.

```
private synchronized DatanodeInfo blockSeekTo(long target) throws IOException {
  if (target >= getFileLength()) {
    // the target is smaller than fileLength (completeBlockSize +
    // lastBlockBeingWrittenLength); here target is 1024 and getFileLength() is 2048
    throw new IOException("Attempted to read past end of file");
  }
  ...
  while (true) {
    ...
    try {
      blockReader = getBlockReader(targetBlock, offsetIntoBlock,
          targetBlock.getBlockSize() - offsetIntoBlock, targetAddr,
          storageType, chosenNode);
      if (connectFailedOnce) {
        DFSClient.LOG.info("Successfully connected to " + targetAddr +
            " for " + targetBlock.getBlock());
      }
      return chosenNode;
    } catch (IOException ex) {
      ...
      } else if (refetchToken > 0 && tokenRefetchNeeded(ex, targetAddr)) {
        refetchToken--;
        // Here will catch InvalidBlockTokenException.
        fetchBlockAt(target);
      } else {
      ...
} } } } private LocatedBlock fetchBlockAt(long offset, long length, boolean useCache) throws IOException { maybeRegisterBlockRefresh(); synchronized(infoLock) { // Here the locatedBlocks only contains one locatedBlock, at this time the offset is 1024 and fileLength is 0, // so the targetBlockIdx is -2 int targetBlockIdx = locatedBlocks.findBlock(offset); if (targetBlockIdx < 0) { // block is not cached targetBlockIdx = LocatedBlocks.getInsertIndex(targetBlockIdx); // Here the targetBlockIdx is 1; useCache = false; } if (!useCache) { // fetch blocks final LocatedBlocks newBlocks = (length == 0) ? dfsClient.getLocatedBlocks(src, offset) : dfsClient.getLocatedBlocks(src, offset, length); if (newBlocks == null || newBlocks.locatedBlockCount() == 0) { throw new EOFException("Could not find target position " + offset); } // Update the LastLocatedBlock, if offset is for last block. if (offset >= locatedBlocks.getFileLength()) { setLocatedBlocksFields(newBlocks, getLastBlockLength(newBlocks)); } else { locatedBlocks.insertRange(targetBlockIdx, newBlocks.getLocatedBlocks()); } } // Here the locatedBlocks only contains one locatedBlock, so will throw java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1 return locatedBlocks.get(targetBlockIdx); } } ``` The client exception: ``` java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1 at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64) at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70) at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266) at java.base/java.util.Objects.checkIndex(Objects.java:359) at java.base/java.util.ArrayList.get(ArrayList.java:427) at org.apache.hadoop.hdfs.protocol.LocatedBlocks.get(LocatedBlocks.java:87) at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockAt(DFSInputStream.java:569) at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockAt(DFSInputStream.java:540) at 
org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:704) at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:884) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:957) at
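The index arithmetic above can be reproduced in isolation. The following is a hedged sketch (hypothetical class name; `LocatedBlocks.findBlock` is modeled here with `Collections.binarySearch`, which uses the same negative insertion-point encoding) showing why a one-entry cached block list plus a computed insert index of 1 yields `Index 1 out of bounds for length 1`:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the fetchBlockAt index arithmetic; not Hadoop code.
public class FetchBlockAtSketch {

  // Stand-in for LocatedBlocks.findBlock: a binary search over block start
  // offsets that returns -(insertionPoint) - 1 when the offset is not found.
  public static int findBlock(List<Long> blockStartOffsets, long offset) {
    return Collections.binarySearch(blockStartOffsets, offset);
  }

  // Mirrors LocatedBlocks.getInsertIndex: decode the insertion point.
  public static int getInsertIndex(int binarySearchResult) {
    return -(binarySearchResult + 1);
  }

  public static void main(String[] args) {
    // One cached block starting at offset 0; the client seeks to 1024.
    List<Long> cached = List.of(0L);
    int raw = findBlock(cached, 1024L);   // -2: not found, insertion point 1
    int target = getInsertIndex(raw);     // 1
    // If the refreshed block list still has only one entry, get(1) throws.
    try {
      cached.get(target);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("thrown for index " + target + ", size " + cached.size());
    }
  }
}
```

The fix therefore has to re-check the index (or re-derive it) after the block list is refreshed, rather than trusting the insertion point computed against the stale list.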
[jira] [Updated] (HDFS-17455) Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt
[ https://issues.apache.org/jira/browse/HDFS-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-17455: -- Labels: pull-request-available (was: ) > Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt > - > > Key: HDFS-17455 > URL: https://issues.apache.org/jira/browse/HDFS-17455 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Haiyang Hu > Assignee: Haiyang Hu > Priority: Major > Labels: pull-request-available
[jira] [Updated] (HDFS-17455) Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt
[ https://issues.apache.org/jira/browse/HDFS-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haiyang Hu updated HDFS-17455: -- Description: When the client reads data and connects to a datanode whose access token has become invalid, an InvalidBlockTokenException is thrown. The subsequent call to the fetchBlockAt method then throws a java.lang.IndexOutOfBoundsException, causing the read to fail.
*Root cause:*
* The HDFS file contains only one RBW block, with a block data size of 2048KB.
* The client opens this file and seeks to the offset of 1024KB to read data.
* The DFSInputStream#getBlockReader method connects to the datanode; because the access token is invalid at this time, it throws InvalidBlockTokenException, and the subsequent call to DFSInputStream#fetchBlockAt throws java.lang.IndexOutOfBoundsException.
{code:java}
private synchronized DatanodeInfo blockSeekTo(long target)
    throws IOException {
  if (target >= getFileLength()) {
    // Not taken here: target (1024) is smaller than getFileLength()
    // (completeBlockSize + lastBlockBeingWrittenLength = 2048)
    throw new IOException("Attempted to read past end of file");
  }
  ...
  while (true) {
    ...
    try {
      blockReader = getBlockReader(targetBlock, offsetIntoBlock,
          targetBlock.getBlockSize() - offsetIntoBlock, targetAddr,
          storageType, chosenNode);
      if (connectFailedOnce) {
        DFSClient.LOG.info("Successfully connected to " + targetAddr +
            " for " + targetBlock.getBlock());
      }
      return chosenNode;
    } catch (IOException ex) {
      ...
      } else if (refetchToken > 0 && tokenRefetchNeeded(ex, targetAddr)) {
        refetchToken--;
        // The InvalidBlockTokenException is caught here.
        fetchBlockAt(target);
      } else {
        ...
      }
    }
  }
}

private LocatedBlock fetchBlockAt(long offset, long length, boolean useCache)
    throws IOException {
  maybeRegisterBlockRefresh();
  synchronized (infoLock) {
    // locatedBlocks contains only one locatedBlock; at this time the offset
    // is 1024 and fileLength is 0, so targetBlockIdx is -2
    int targetBlockIdx = locatedBlocks.findBlock(offset);
    if (targetBlockIdx < 0) { // block is not cached
      targetBlockIdx = LocatedBlocks.getInsertIndex(targetBlockIdx);
      // Here targetBlockIdx is 1
      useCache = false;
    }
    if (!useCache) { // fetch blocks
      final LocatedBlocks newBlocks = (length == 0)
          ? dfsClient.getLocatedBlocks(src, offset)
          : dfsClient.getLocatedBlocks(src, offset, length);
      if (newBlocks == null || newBlocks.locatedBlockCount() == 0) {
        throw new EOFException("Could not find target position " + offset);
      }
      // Update the LastLocatedBlock, if offset is for last block.
      if (offset >= locatedBlocks.getFileLength()) {
        setLocatedBlocksFields(newBlocks, getLastBlockLength(newBlocks));
      } else {
        locatedBlocks.insertRange(targetBlockIdx, newBlocks.getLocatedBlocks());
      }
    }
    // locatedBlocks still contains only one locatedBlock, so this throws
    // java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1
    return locatedBlocks.get(targetBlockIdx);
  }
}
{code}
The client exception:
{code:java}
java.lang.IndexOutOfBoundsException: Index 1 out of bounds for length 1
    at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
    at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
    at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
    at java.base/java.util.Objects.checkIndex(Objects.java:359)
    at java.base/java.util.ArrayList.get(ArrayList.java:427)
    at org.apache.hadoop.hdfs.protocol.LocatedBlocks.get(LocatedBlocks.java:87)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockAt(DFSInputStream.java:569)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockAt(DFSInputStream.java:540)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:704)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:884)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:957)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:804)
{code}
The datanode exception:
{code:java}
2024-03-27 15:56:35,477 WARN datanode.DataNode (DataXceiver.java:checkAccess(1487)) [DataXceiver for client DFSClient_NONMAPREDUCE_475786505_1 at /xxx [Sending block BP-xxx:blk_1138933918_65194340]] - Block token verification failed: op=READ_BLOCK, remoteAddress=/XXX, message=Can't re-compute password for block_token_identifier
{code}
[jira] [Created] (HDFS-17455) Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt
Haiyang Hu created HDFS-17455: - Summary: Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt Key: HDFS-17455 URL: https://issues.apache.org/jira/browse/HDFS-17455 Project: Hadoop HDFS Issue Type: Bug Reporter: Haiyang Hu Assignee: Haiyang Hu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17449) Ill-formed decommission host name and port pair would trigger IndexOutOfBound error
[ https://issues.apache.org/jira/browse/HDFS-17449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834189#comment-17834189 ] ASF GitHub Bot commented on HDFS-17449: --- teamconfx commented on code in PR #6691: URL: https://github.com/apache/hadoop/pull/6691#discussion_r1553156626 ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/util/HostsFileWriter.java: ## @@ -106,9 +106,14 @@ public void initOutOfServiceHosts(List<String> decommissionHostNameAndPorts,
     for (String hostNameAndPort : decommissionHostNameAndPorts) {
       DatanodeAdminProperties dn = new DatanodeAdminProperties();
       String[] hostAndPort = hostNameAndPort.split(":");
-      dn.setHostName(hostAndPort[0]);
-      dn.setPort(Integer.parseInt(hostAndPort[1]));
-      dn.setAdminState(AdminStates.DECOMMISSIONED);
+      try {
+        dn.setHostName(hostAndPort[0]);
+        dn.setPort(Integer.parseInt(hostAndPort[1]));
+        dn.setAdminState(AdminStates.DECOMMISSIONED);
+      } catch (Exception e) {
+        throw new IllegalArgumentException("The decommission host name and port format is "
+            + "invalid. The format should be in <host>:<port>, not " + hostNameAndPort, e);
+      }
Review Comment: I've made the change accordingly. Thanks for the advice! > Ill-formed decommission host name and port pair would trigger IndexOutOfBound > error > --- > > Key: HDFS-17449 > URL: https://issues.apache.org/jira/browse/HDFS-17449 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: ConfX > Priority: Major > Labels: pull-request-available > > h2. What happened: > Got IndexOutOfBound when trying to run > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatusAfterDNRestart > with namenode host provider set to > org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager. > h2.
Buggy code:
> In HostsFileWriter.java:
> {code:java}
> String[] hostAndPort = hostNameAndPort.split(":"); // hostNameAndPort might be invalid
> dn.setHostName(hostAndPort[0]);
> dn.setPort(Integer.parseInt(hostAndPort[1])); // here IndexOutOfBound might be thrown
> dn.setAdminState(AdminStates.DECOMMISSIONED);
> {code}
> h2. StackTrace:
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
> at org.apache.hadoop.hdfs.util.HostsFileWriter.initOutOfServiceHosts(HostsFileWriter.java:110)
> {code}
> h2. How to reproduce:
> (1) Set {{dfs.namenode.hosts.provider.classname}} to {{org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager}}
> (2) Run test: {{org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatusAfterDNRestart}}
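The PR's guard can be shown as a self-contained sketch. The helper below is hypothetical (it is not the actual HostsFileWriter code) and illustrates validating the result of `split(":")` before indexing into it, which is what prevents the `ArrayIndexOutOfBoundsException`:

```java
// Hypothetical sketch of host:port validation; not the actual Hadoop code.
public class HostPortSketch {

  // Parse the port out of a "host:port" pair, rejecting ill-formed input
  // with IllegalArgumentException instead of letting array indexing or
  // Integer.parseInt blow up with an unhelpful exception.
  public static int parsePort(String hostNameAndPort) {
    String[] hostAndPort = hostNameAndPort.split(":");
    if (hostAndPort.length != 2) {
      throw new IllegalArgumentException(
          "The decommission host name and port format is invalid. "
              + "The format should be host:port, not " + hostNameAndPort);
    }
    try {
      return Integer.parseInt(hostAndPort[1]);
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Invalid port in " + hostNameAndPort, e);
    }
  }

  // Convenience predicate used below to exercise the validation.
  public static boolean isValid(String hostNameAndPort) {
    try {
      parsePort(hostNameAndPort);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(parsePort("dn1.example.com:9866")); // well-formed pair
    System.out.println(isValid("dn1.example.com"));        // missing port -> false
  }
}
```

Wrapping the whole assignment block in one try/catch, as the PR does, has the same effect while also covering failures inside the setters.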
[jira] [Commented] (HDFS-17454) Fix namenode fsck swallows the exception stacktrace, this can help us to troubleshooting log.
[ https://issues.apache.org/jira/browse/HDFS-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834180#comment-17834180 ] ASF GitHub Bot commented on HDFS-17454: --- xiaojunxiang2023 opened a new pull request, #6709: URL: https://github.com/apache/hadoop/pull/6709 When I used `hdfs fsck /xxx.txt -move`, it reported an error, but I couldn't know the reason, because the exception stack trace is not appended to the LOG. Original code: ![image](https://github.com/apache/hadoop/assets/65019264/3fb94da0-5a9e-4363-a941-67772b9420c1) After the fix, we can see the exception stack trace: ![image](https://github.com/apache/hadoop/assets/65019264/1a6cfad7-b78c-456e-a8f4-df41a215bf20) > Fix namenode fsck swallows the exception stacktrace, this can help us to > troubleshooting log. > - > > Key: HDFS-17454 > URL: https://issues.apache.org/jira/browse/HDFS-17454 > Project: Hadoop HDFS > Issue Type: Improvement > Affects Versions: 3.3.6 > Reporter: xiaojunxiang > Priority: Minor > Attachments: image-2024-04-05-15-40-37-147.png, > image-2024-04-05-15-41-38-420.png > > > When I used `hdfs fsck /xxx.txt -move`, it reported an error, but I couldn't know the reason, because the exception stack trace is not appended to the LOG. Original code: > !image-2024-04-05-15-40-37-147.png! > > After the fix, we can see the exception stack trace: > !image-2024-04-05-15-41-38-420.png!
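The pattern being fixed is a common one: logging only the exception's message discards the stack trace. A hedged sketch of the before/after, using `java.util.logging` for self-containment (the actual NamenodeFsck code uses its own LOG field and message text, and the method names here are hypothetical):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of the logging fix; not the actual NamenodeFsck code.
public class FsckLogSketch {
  private static final Logger LOG = Logger.getLogger("NamenodeFsck");

  // Builds the message the "before" code would log: the exception's
  // message only, with the stack trace swallowed.
  public static String swallowedMessage(Exception e) {
    return "fsck error: " + e.getMessage();
  }

  // Before: only the message string reaches the log.
  public static void logSwallowed(Exception e) {
    LOG.warning(swallowedMessage(e));
  }

  // After: passing the Throwable as the last argument makes the logging
  // framework append the full stack trace to the log record.
  public static void logWithStackTrace(Exception e) {
    LOG.log(Level.WARNING, "fsck error", e);
  }

  public static void main(String[] args) {
    Exception e = new java.io.IOException("simulated block copy failure");
    logSwallowed(e);       // message only
    logWithStackTrace(e);  // message plus stack trace
  }
}
```

With SLF4J/Log4j, the equivalent change is `LOG.warn("fsck error", e)` instead of `LOG.warn("fsck error: " + e)`.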
[jira] [Updated] (HDFS-17454) Fix namenode fsck swallows the exception stacktrace, this can help us to troubleshooting log.
[ https://issues.apache.org/jira/browse/HDFS-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-17454: -- Labels: pull-request-available (was: ) > Fix namenode fsck swallows the exception stacktrace, this can help us to > troubleshooting log. > - > > Key: HDFS-17454 > URL: https://issues.apache.org/jira/browse/HDFS-17454 > Project: Hadoop HDFS > Issue Type: Improvement > Affects Versions: 3.3.6 > Reporter: xiaojunxiang > Priority: Minor > Labels: pull-request-available > Attachments: image-2024-04-05-15-40-37-147.png, > image-2024-04-05-15-41-38-420.png > > > When I used `hdfs fsck /xxx.txt -move`, it reported an error, but I couldn't know the reason, because the exception stack trace is not appended to the LOG. Original code: > !image-2024-04-05-15-40-37-147.png! > > After the fix, we can see the exception stack trace: > !image-2024-04-05-15-41-38-420.png!
[jira] [Updated] (HDFS-17454) Fix namenode fsck swallows the exception stacktrace, this can help us to troubleshooting log.
[ https://issues.apache.org/jira/browse/HDFS-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaojunxiang updated HDFS-17454: Description: When I used `hdfs fsck /xxx.txt -move`, it reported an error, but I couldn't know the reason, because the exception stack trace is not appended to the LOG. Original code: !image-2024-04-05-15-40-37-147.png! After the fix, we can see the exception stack trace: !image-2024-04-05-15-41-38-420.png! was: When I used `hdfs fsck /xxx.txt -move`, missing error, but I can;t kown the reason, because the exception stacktrace doesn't append to LOG, original code: !image-2024-04-05-15-40-37-147.png! When I fix it, look, we can see the exception stacktrace: !image-2024-04-05-15-41-38-420.png! > Fix namenode fsck swallows the exception stacktrace, this can help us to > troubleshooting log. > - > > Key: HDFS-17454 > URL: https://issues.apache.org/jira/browse/HDFS-17454 > Project: Hadoop HDFS > Issue Type: Improvement > Affects Versions: 3.3.6 > Reporter: xiaojunxiang > Priority: Minor > Attachments: image-2024-04-05-15-40-37-147.png, > image-2024-04-05-15-41-38-420.png > > > When I used `hdfs fsck /xxx.txt -move`, it reported an error, but I couldn't know the reason, because the exception stack trace is not appended to the LOG. Original code: > !image-2024-04-05-15-40-37-147.png! > > After the fix, we can see the exception stack trace: > !image-2024-04-05-15-41-38-420.png!
[jira] [Updated] (HDFS-17454) Fix namenode fsck swallows the exception stacktrace, this can help us to troubleshooting log.
[ https://issues.apache.org/jira/browse/HDFS-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiaojunxiang updated HDFS-17454: Affects Version/s: 3.3.6 > Fix namenode fsck swallows the exception stacktrace, this can help us to > troubleshooting log. > - > > Key: HDFS-17454 > URL: https://issues.apache.org/jira/browse/HDFS-17454 > Project: Hadoop HDFS > Issue Type: Improvement > Affects Versions: 3.3.6 > Reporter: xiaojunxiang > Priority: Minor > Attachments: image-2024-04-05-15-40-37-147.png, > image-2024-04-05-15-41-38-420.png > > > When I used `hdfs fsck /xxx.txt -move`, it reported an error, but I couldn't know the reason, because the exception stack trace is not appended to the LOG. Original code: > !image-2024-04-05-15-40-37-147.png! > > After the fix, we can see the exception stack trace: > !image-2024-04-05-15-41-38-420.png!
[jira] [Created] (HDFS-17454) Fix namenode fsck swallows the exception stacktrace, this can help us to troubleshooting log.
xiaojunxiang created HDFS-17454: --- Summary: Fix namenode fsck swallows the exception stacktrace, this can help us to troubleshooting log. Key: HDFS-17454 URL: https://issues.apache.org/jira/browse/HDFS-17454 Project: Hadoop HDFS Issue Type: Improvement Reporter: xiaojunxiang Attachments: image-2024-04-05-15-40-37-147.png, image-2024-04-05-15-41-38-420.png When I used `hdfs fsck /xxx.txt -move`, it reported an error, but I couldn't know the reason, because the exception stack trace is not appended to the LOG. Original code: !image-2024-04-05-15-40-37-147.png! After the fix, we can see the exception stack trace: !image-2024-04-05-15-41-38-420.png!
[jira] [Commented] (HDFS-17397) Choose another DN as soon as possible, when encountering network issues
[ https://issues.apache.org/jira/browse/HDFS-17397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834168#comment-17834168 ] ASF GitHub Bot commented on HDFS-17397: --- xleoken commented on PR #6591: URL: https://github.com/apache/hadoop/pull/6591#issuecomment-2039122938 cc @Hexiaoqiao @ZanderXu > Choose another DN as soon as possible, when encountering network issues > --- > > Key: HDFS-17397 > URL: https://issues.apache.org/jira/browse/HDFS-17397 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: xleoken > Priority: Minor > Labels: pull-request-available > Attachments: hadoop.png > > > Choose another DN as soon as possible, when encountering network issues.