Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-05-11 Thread via GitHub


hadoop-yetus commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2105622801

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | -1 :x: |  patch  |   0m 55s |  |  
https://github.com/apache/hadoop/pull/6759 does not apply to trunk. Rebase 
required? Wrong Branch? See 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.  
|
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6759 |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/7/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-05-11 Thread via GitHub


ZanderXu commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2105617264

   Merged. Thanks @kokonguyen191 for your contribution and thanks @Hexiaoqiao 
@haiyang1987 @hfutatzhanghb for your review. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-05-11 Thread via GitHub


ZanderXu merged PR #6759:
URL: https://github.com/apache/hadoop/pull/6759


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-05-08 Thread via GitHub


hadoop-yetus commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2101828246

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 02s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m 02s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 02s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m 02s |  |  markdownlint was not available.  
|
   | +0 :ok: |  spotbugs  |   0m 00s |  |  spotbugs executables are not 
available.  |
   | +1 :green_heart: |  @author  |   0m 01s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 00s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 14s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  86m 40s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  38m 18s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   5m 51s |  |  trunk passed  |
   | -1 :x: |  mvnsite  |   4m 27s | 
[/branch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/6/artifact/out/branch-mvnsite-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in trunk failed.  |
   | +1 :green_heart: |  javadoc  |  10m 13s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 157m 54s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 16s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   9m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  35m 37s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |  35m 37s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 01s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   5m 50s |  |  the patch passed  |
   | -1 :x: |  mvnsite  |   4m 30s | 
[/patch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/6/artifact/out/patch-mvnsite-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in the patch failed.  |
   | +1 :green_heart: |  javadoc  |  10m 18s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 167m 52s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   5m 28s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 512m 54s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6759 |
   | Optional Tests | dupname asflicense mvnsite codespell detsecrets 
markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs 
checkstyle |
   | uname | MINGW64_NT-10.0-17763 f9e61a0ebff0 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / b42a03e1d18816a1bf2f65e82508800c8f885785 |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/6/testReport/
 |
   | modules | C: hadoop-common-project/hadoop-common 
hadoop-hdfs-project/hadoop-hdfs U: . |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/6/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-05-06 Thread via GitHub


haiyang1987 commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2097227087

   LGTM.
   
   Hi @Hexiaoqiao @hfutatzhanghb any other comments?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-05-05 Thread via GitHub


kokonguyen191 commented on code in PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#discussion_r1590504507


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java:
##
@@ -185,6 +185,8 @@ public class DataNodeMetrics {
   private MutableCounterLong numProcessedCommands;
   @Metric("Rate of processed commands of all BPServiceActors")
   private MutableRate processedCommandsOp;
+  @Metric("Number of blocks in IBRs that failed due to null storage")
+  private MutableCounterLong nullStorageBlockReports;

Review Comment:
   Added



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-05-05 Thread via GitHub


haiyang1987 commented on code in PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#discussion_r1590485852


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java:
##
@@ -185,6 +185,8 @@ public class DataNodeMetrics {
   private MutableCounterLong numProcessedCommands;
   @Metric("Rate of processed commands of all BPServiceActors")
   private MutableRate processedCommandsOp;
+  @Metric("Number of blocks in IBRs that failed due to null storage")
+  private MutableCounterLong nullStorageBlockReports;

Review Comment:
   can add metrics to Metrics.md ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-28 Thread via GitHub


kokonguyen191 commented on code in PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#discussion_r1582488078


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java:
##
@@ -324,6 +325,10 @@ private void notifyNamenodeBlock(ExtendedBlock block, 
BlockStatus status,
 final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
 block.getLocalBlock(), status, delHint);
 final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
+if (storage == null) {

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-28 Thread via GitHub


hfutatzhanghb commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2081770902

   > @kokonguyen191 Thanks for your report and contributions. Sorry didn't get 
this issue completely. As description, you mentioned that `storage == null` as 
following. I wonder why NPE not throw at `getPerStorageIBR(storage).put(rdbi);` 
first which at 
org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager#addRDBI.
   > 
   > ```
   > private void notifyNamenodeBlock(ExtendedBlock block, BlockStatus status,
   > String delHint, String storageUuid, boolean isOnTransientStorage) {
   >   checkBlock(block);
   >   final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
   >   block.getLocalBlock(), status, delHint);
   >   final DatanodeStorage storage = 
dn.getFSDataset().getStorage(storageUuid);
   >   
   >   // storage == null here because it's already removed earlier.
   > 
   >   for (BPServiceActor actor : bpServices) {
   > actor.getIbrManager().notifyNamenodeBlock(info, storage,
   > isOnTransientStorage);
   >   }
   > } 
   > ```
   > 
   > Another side, you mentioned that this issue is triggered by volume 
removed, so how the logic between volume removed to NPE. Thanks again.
   
   @Hexiaoqiao Hi, sir. please take a look at the description of 
https://github.com/apache/hadoop/pull/6730 . I found the same problem and 
explained in that PR's description.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-28 Thread via GitHub


hadoop-yetus commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2081754060

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 02s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  spotbugs  |   0m 01s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  codespell  |   0m 01s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 01s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 01s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  | 111m 47s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   7m 33s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   6m 01s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   8m 01s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   7m 10s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 179m 43s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   5m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 16s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   4m 16s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 01s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   2m 48s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   5m 19s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   4m 32s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 198m 13s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   6m 53s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 523m 16s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6759 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | MINGW64_NT-10.0-17763 96a75b329453 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / 1f166985eedcfea6aedcc62046908f86ed9827fa |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/5/testReport/
 |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/5/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-28 Thread via GitHub


kokonguyen191 commented on code in PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#discussion_r1582197257


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java:
##
@@ -324,6 +325,10 @@ private void notifyNamenodeBlock(ExtendedBlock block, 
BlockStatus status,
 final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
 block.getLocalBlock(), status, delHint);
 final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
+if (storage == null) {

Review Comment:
   Good idea. Better to stop it here rather than let the code propagate further 
down. I'll change it later once I charge my laptop.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-28 Thread via GitHub


kokonguyen191 commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2081512148

   Hi @Hexiaoqiao, thanks for the review. For your first question, 
`getPerStorageIBR(storage)` doesn't do a null check on `storage` and assigns a 
new `PerStorageIBG` object to `null`. So it pretty much just considers `null` 
to be a new storage. For your second question, the NPE in IBRs is caused by a 
block report with `null` storage. The condition for this to happen is for a 
storage to be removed after `DirectoryScanner` has just finished a scan but 
hasn't started processing the blocks yet, so the scan result is now stale and 
contains removed storage.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-28 Thread via GitHub


Hexiaoqiao commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2081497703

   @kokonguyen191 Thanks for your report and contributions. Sorry didn't get 
this issue completely.
   As description, you mentioned that `storage == null` as following. I wonder 
why NPE not throw at `getPerStorageIBR(storage).put(rdbi);` first which at 
org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager#addRDBI.
   
   ```
   private void notifyNamenodeBlock(ExtendedBlock block, BlockStatus status,
   String delHint, String storageUuid, boolean isOnTransientStorage) {
 checkBlock(block);
 final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
 block.getLocalBlock(), status, delHint);
 final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
 
 // storage == null here because it's already removed earlier.
   
 for (BPServiceActor actor : bpServices) {
   actor.getIbrManager().notifyNamenodeBlock(info, storage,
   isOnTransientStorage);
 }
   } 
   ```
   
   Another side, you mentioned that this issue is triggered by volume removed, 
so how the logic between volume removed to NPE. Thanks again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-28 Thread via GitHub


haiyang1987 commented on code in PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#discussion_r1582071262


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java:
##
@@ -324,6 +325,10 @@ private void notifyNamenodeBlock(ExtendedBlock block, 
BlockStatus status,
 final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
 block.getLocalBlock(), status, delHint);
 final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
+if (storage == null) {

Review Comment:
   Thanks @kokonguyen191 for your work.
   
   there is a small question. when the execution reaches this, can we directly 
record `dnMetrics.incrNullStorageBlockReports()` and return? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-24 Thread via GitHub


hadoop-yetus commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2074474125

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 30s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  44m 26s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 50s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 21s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  36m  1s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 10s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  9s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   1m  9s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  1s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 17s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  36m  7s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 232m 56s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 46s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 374m 52s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestLargeBlockReport |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6759 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 1f3c0fcbc898 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 1f166985eedcfea6aedcc62046908f86ed9827fa |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/4/testReport/ |
   | Max. process+thread count | 4094 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/4/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically 

Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-23 Thread via GitHub


hadoop-yetus commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2072325438

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 20s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 39s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   0m 36s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 45s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  5s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 44s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 49s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   1m 44s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m  4s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 204m 49s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 28s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 293m 24s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestBPOfferService |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6759 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 258959f150db 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 07f35e4bb9fa7af0e940cb91c970bd4eaf5c5ae0 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/3/testReport/ |
   | Max. process+thread count | 4188 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/3/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.

Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-23 Thread via GitHub


hadoop-yetus commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2072209318

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  12m 13s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  46m 32s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   1m 17s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m  9s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 25s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 18s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  36m 54s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 59s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 24s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  37m 30s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 227m 24s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 385m 19s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestBPOfferService |
   |   | hadoop.hdfs.server.datanode.TestLargeBlockReport |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6759 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 770ac906a345 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4b773c6a8b2014e2e21151e2d9163e871994912a |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/2/testReport/ |
   | Max. process+thread count | 3559 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/2/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 

Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-23 Thread via GitHub


kokonguyen191 commented on code in PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#discussion_r1575868360


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/IncrementalBlockReportManager.java:
##
@@ -304,4 +310,4 @@ void clearIBRs() {
   int getPendingIBRSize() {
 return pendingIBRs.size();
   }
-}
\ No newline at end of file
+}

Review Comment:
   Done. Actually took me quite a while to find the setting to disable it in IDE



##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java:
##
@@ -2745,14 +2745,18 @@ public void checkAndUpdate(String bpid, ScanInfo 
scanInfo)
   curDirScannerNotifyCount = 0;
   lastDirScannerNotifyTime = startTimeMs;
 }
-try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.VOLUME, bpid,
-vol.getStorageID())) {
+String storageUuid = vol.getStorageID();
+try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.VOLUME, 
bpid, storageUuid)) {
   memBlockInfo = volumeMap.get(bpid, blockId);
   if (memBlockInfo != null &&
   memBlockInfo.getState() != ReplicaState.FINALIZED) {
 // Block is not finalized - ignore the difference
 return;
   }
+  if (!storageMap.containsKey(storageUuid)) {

Review Comment:
   Moved the block up 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-23 Thread via GitHub


kokonguyen191 commented on code in PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#discussion_r1575867553


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/IncrementalBlockReportManager.java:
##
@@ -169,11 +169,17 @@ private synchronized StorageReceivedDeletedBlocks[] 
generateIBRs() {
 : pendingIBRs.entrySet()) {
   final PerStorageIBR perStorage = entry.getValue();
 
-// Send newly-received and deleted blockids to namenode
+  // Send newly-received and deleted blockids to namenode
   final ReceivedDeletedBlockInfo[] rdbi = perStorage.removeAll();
-  if (rdbi != null) {
-reports.add(new StorageReceivedDeletedBlocks(entry.getKey(), rdbi));
+  if (rdbi == null) {

Review Comment:
   I intentionally put it like this to reduce nesting and make the logic a bit 
easier to read



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-23 Thread via GitHub


ZanderXu commented on code in PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#discussion_r1575829376


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/IncrementalBlockReportManager.java:
##
@@ -169,11 +169,17 @@ private synchronized StorageReceivedDeletedBlocks[] 
generateIBRs() {
 : pendingIBRs.entrySet()) {
   final PerStorageIBR perStorage = entry.getValue();
 
-// Send newly-received and deleted blockids to namenode
+  // Send newly-received and deleted blockids to namenode
   final ReceivedDeletedBlockInfo[] rdbi = perStorage.removeAll();
-  if (rdbi != null) {
-reports.add(new StorageReceivedDeletedBlocks(entry.getKey(), rdbi));
+  if (rdbi == null) {

Review Comment:
   ```
   if (rdbi != null) {
 // Null storage, should not happen
 if (entry.getKey() == null) {
   dnMetrics.incrNullStorageBlockReports();
   continue;
 }
 reports.add(new StorageReceivedDeletedBlocks(entry.getKey(), rdbi));
   }
   ```



##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java:
##
@@ -2745,14 +2745,18 @@ public void checkAndUpdate(String bpid, ScanInfo 
scanInfo)
   curDirScannerNotifyCount = 0;
   lastDirScannerNotifyTime = startTimeMs;
 }
-try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.VOLUME, bpid,
-vol.getStorageID())) {
+String storageUuid = vol.getStorageID();
+try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.VOLUME, 
bpid, storageUuid)) {
   memBlockInfo = volumeMap.get(bpid, blockId);
   if (memBlockInfo != null &&
   memBlockInfo.getState() != ReplicaState.FINALIZED) {
 // Block is not finalized - ignore the difference
 return;
   }
+  if (!storageMap.containsKey(storageUuid)) {

Review Comment:
   How about moving this logic to line 2750?



##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/IncrementalBlockReportManager.java:
##
@@ -304,4 +310,4 @@ void clearIBRs() {
   int getPendingIBRSize() {
 return pendingIBRs.size();
   }
-}
\ No newline at end of file
+}

Review Comment:
   can rollback this change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-23 Thread via GitHub


kokonguyen191 commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2071495428

   Hi @ZanderXu @Hexiaoqiao, can you help me take a look whenever you're free? 
Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-22 Thread via GitHub


hadoop-yetus commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2071375244

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 02s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  spotbugs  |   0m 00s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  codespell  |   0m 00s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 01s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 00s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  | 118m 49s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   8m 15s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   6m 34s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   8m 28s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   7m 31s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 190m 44s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   6m 57s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 54s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   4m 54s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 00s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   3m 25s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   6m 07s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   4m 56s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 207m 49s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   9m 28s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 557m 52s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6759 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | MINGW64_NT-10.0-17763 9dbcaf4dc965 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / d5a87e15db1c70fb0c3974832c75504ff184b0d1 |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/1/testReport/
 |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/1/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



Re: [PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-22 Thread via GitHub


hadoop-yetus commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2069730206

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  12m 25s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  44m 20s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  checkstyle  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 24s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  6s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 46s |  |  trunk passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 18s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  35m 28s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  7s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  javac  |   1m  7s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  1s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 230 unchanged 
- 0 fixed = 231 total (was 230)  |
   | +1 :green_heart: |  mvnsite  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06  |
   | +1 :green_heart: |  spotbugs  |   3m 18s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  35m 53s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 235m 27s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 44s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 388m 51s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestBPOfferService |
   |   | hadoop.hdfs.server.datanode.TestLargeBlockReport |
   |   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.45 ServerAPI=1.45 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6759 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 34352d4fc452 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d5a87e15db1c70fb0c3974832c75504ff184b0d1 |
   | Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6759/1/testReport/ |
   | Max. 

[PR] HDFS-17488. DN can fail IBRs with NPE when a volume is removed [hadoop]

2024-04-22 Thread via GitHub


kokonguyen191 opened a new pull request, #6759:
URL: https://github.com/apache/hadoop/pull/6759

   ### Description of PR
   
   Error logs
   
   ```
   2024-04-22 15:46:33,422 [BP-1842952724-10.22.68.249-1713771988830 
heartbeating to localhost/127.0.0.1:64977] ERROR datanode.DataNode 
(BPServiceActor.java:run(922)) - Exception in BPOfferService for Block pool 
BP-1842952724-10.22.68.249-1713771988830 (Datanode Uuid 
1659ffaf-1a80-4a8e-a542-643f6bd97ed4) service to localhost/127.0.0.1:64977
   java.lang.NullPointerException
   at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolClientSideTranslatorPB.java:246)
   at 
org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager.sendIBRs(IncrementalBlockReportManager.java:218)
   at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:749)
   at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:920)
   at java.lang.Thread.run(Thread.java:748) 
   ```
   
   The root cause is in `BPOfferService#notifyNamenodeBlock`, happens when it's 
called on a block belonging to a volume already removed prior. Because the 
volume was already removed
   
   ```java
   private void notifyNamenodeBlock(ExtendedBlock block, BlockStatus status,
   String delHint, String storageUuid, boolean isOnTransientStorage) {
 checkBlock(block);
 final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
 block.getLocalBlock(), status, delHint);
 final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
 
 // storage == null here because it's already removed earlier.
   
 for (BPServiceActor actor : bpServices) {
   actor.getIbrManager().notifyNamenodeBlock(info, storage,
   isOnTransientStorage);
 }
   } 
   ```
   
   so IBRs with a null storage are now pending.
   
   The reason why notifyNamenodeBlock can trigger on such blocks is up in 
DirectoryScanner#reconcile
   ```java
 public void reconcile() throws IOException {
   LOG.debug("reconcile start DirectoryScanning");
   scan();
   
   // If a volume is removed here after scan() already finished running,
   // diffs is stale and checkAndUpdate will run on a removed volume
   
   // HDFS-14476: run checkAndUpdate with batch to avoid holding the lock 
too
   // long
   int loopCount = 0;
   synchronized (diffs) {
 for (final Map.Entry entry : diffs.getEntries()) {
   dataset.checkAndUpdate(entry.getKey(), entry.getValue());
   ...
 } 
   ```
   
   Inside `checkAndUpdate`, `memBlockInfo` is null because all the block meta 
in memory is removed during the volume removal, but `diskFile` still exists. 
Then `DataNode#notifyNamenodeDeletedBlock` (and further down the line, 
`notifyNamenodeBlock`) is called on this block.
   
   This patch effectively contains 2 parts:
   * Preventing processing IBRs with null storage
   * Preventing `FsDatasetImpl#checkAndUpdate` from running on a block 
belonging to a removed storage
   
   ### How was this patch tested?
   
   Unit tests. Partially on production cluster.
   
   ### For code changes:
   
   - [x] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org