[ https://issues.apache.org/jira/browse/HDFS-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840102#comment-17840102 ]
ASF GitHub Bot commented on HDFS-17488: --------------------------------------- hadoop-yetus commented on PR #6759: URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2072325438 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |:----:|----------:|--------:|:--------:|:-------:| | +0 :ok: | reexec | 0m 20s | | Docker mode activated. | |||| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | |||| _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 32m 39s | | trunk passed | | +1 :green_heart: | compile | 0m 43s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 0m 40s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | checkstyle | 0m 36s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 45s | | trunk passed | | +1 :green_heart: | javadoc | 0m 40s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 5s | | trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | spotbugs | 1m 44s | | trunk passed | | +1 :green_heart: | shadedclient | 20m 49s | | branch has no errors when building and testing our client artifacts. | |||| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 33s | | the patch passed | | +1 :green_heart: | compile | 0m 37s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 0m 37s | | the patch passed | | +1 :green_heart: | compile | 0m 33s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | javac | 0m 33s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 30s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 36s | | the patch passed | | +1 :green_heart: | javadoc | 0m 29s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 0m 59s | | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | +1 :green_heart: | spotbugs | 1m 44s | | the patch passed | | +1 :green_heart: | shadedclient | 21m 4s | | patch has no errors when building and testing our client artifacts. | |||| _ Other Tests _ | | -1 :x: | unit | 204m 49s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 28s | | The patch does not generate ASF License warnings. | | | | 293m 24s | | | | Reason | Tests | |-------:|:------| | Failed junit tests | hadoop.hdfs.server.datanode.TestBPOfferService | | Subsystem | Report/Notes | |----------:|:-------------| | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6759 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 258959f150db 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 07f35e4bb9fa7af0e940cb91c970bd4eaf5c5ae0 | | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/3/testReport/ | | Max. process+thread count | 4188 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/3/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > DN can fail IBRs with NPE when a volume is removed > -------------------------------------------------- > > Key: HDFS-17488 > URL: https://issues.apache.org/jira/browse/HDFS-17488 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs > Reporter: Felix N > Assignee: Felix N > Priority: Major > Labels: pull-request-available > > > Error logs > {code:java} > 2024-04-22 15:46:33,422 [BP-1842952724-10.22.68.249-1713771988830 > heartbeating to localhost/127.0.0.1:64977] ERROR datanode.DataNode > (BPServiceActor.java:run(922)) - Exception in BPOfferService for Block pool > BP-1842952724-10.22.68.249-1713771988830 (Datanode Uuid > 1659ffaf-1a80-4a8e-a542-643f6bd97ed4) service to localhost/127.0.0.1:64977 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolClientSideTranslatorPB.java:246) > at > org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager.sendIBRs(IncrementalBlockReportManager.java:218) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:749) > at > org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:920) > at java.lang.Thread.run(Thread.java:748) {code} > The root cause is in BPOfferService#notifyNamenodeBlock, happens when it's > called on a block belonging to a volume already removed prior. Because the > volume was already removed > > {code:java} > private void notifyNamenodeBlock(ExtendedBlock block, BlockStatus status, > String delHint, String storageUuid, boolean isOnTransientStorage) { > checkBlock(block); > final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo( > block.getLocalBlock(), status, delHint); > final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid); > > // storage == null here because it's already removed earlier. > for (BPServiceActor actor : bpServices) { > actor.getIbrManager().notifyNamenodeBlock(info, storage, > isOnTransientStorage); > } > } {code} > so IBRs with a null storage are now pending. > The reason why notifyNamenodeBlock can trigger on such blocks is up in > DirectoryScanner#reconcile > {code:java} > public void reconcile() throws IOException { > LOG.debug("reconcile start DirectoryScanning"); > scan(); > // If a volume is removed here after scan() already finished running, > // diffs is stale and checkAndUpdate will run on a removed volume > // HDFS-14476: run checkAndUpdate with batch to avoid holding the lock too > // long > int loopCount = 0; > synchronized (diffs) { > for (final Map.Entry<String, ScanInfo> entry : diffs.getEntries()) { > dataset.checkAndUpdate(entry.getKey(), entry.getValue()); > ... > } {code} > Inside checkAndUpdate, memBlockInfo is null because all the block meta in > memory is removed during the volume removal, but diskFile still exists. Then > DataNode#notifyNamenodeDeletedBlock (and further down the line, > notifyNamenodeBlock) is called on this block. > -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org