[jira] [Work logged] (HDFS-16622) addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.

2022-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16622?focusedWorklogId=779952=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779952
 ]

ASF GitHub Bot logged work on HDFS-16622:
-

Author: ASF GitHub Bot
Created on: 09/Jun/22 13:35
Start Date: 09/Jun/22 13:35
Worklog Time Spent: 10m 
  Work Description: Hexiaoqiao commented on code in PR #4407:
URL: https://github.com/apache/hadoop/pull/4407#discussion_r893514408


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/IncrementalBlockReportManager.java:
##
@@ -251,12 +251,20 @@ synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
   DatanodeStorage storage) {
 // Make sure another entry for the same block is first removed.
 // There may only be one such entry.
+ReceivedDeletedBlockInfo removedInfo = null;
 for (PerStorageIBR perStorage : pendingIBRs.values()) {
-  if (perStorage.remove(rdbi.getBlock()) != null) {
+  removedInfo = perStorage.remove(rdbi.getBlock());
+  if (removedInfo != null) {
 break;
   }
 }
-getPerStorageIBR(storage).put(rdbi);
+if (removedInfo != null &&

Review Comment:
   @ZanderXu Thanks for the detailed information. It is an interesting case. 
IMO, this improvement makes sense. Would you mind adding a unit test to 
cover this case?





Issue Time Tracking
---

Worklog Id: (was: 779952)
Time Spent: 1h  (was: 50m)

> addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.
> -
>
> Key: HDFS-16622
> URL: https://issues.apache.org/jira/browse/HDFS-16622
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> In our production environment, we found a strange missing block. According 
> to the log, I suspect there is a bug in the function 
> addRDBI(ReceivedDeletedBlockInfo rdbi, DatanodeStorage storage) (line 250).
> The buggy code is in the for loop:
> {code:java}
>   synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
>       DatanodeStorage storage) {
>     // Make sure another entry for the same block is first removed.
>     // There may only be one such entry.
>     for (PerStorageIBR perStorage : pendingIBRs.values()) {
>       if (perStorage.remove(rdbi.getBlock()) != null) {
>         break;
>       }
>     }
>     getPerStorageIBR(storage).put(rdbi);
>   }
> {code}
> The GS (generation stamp) of the block in the removed 
> ReceivedDeletedBlockInfo may be greater than the GS of the block in rdbi. 
> The NN will invalidate the replica with the smaller GS when it completes 
> the block. 
> So if a block has only one replica, this logic can lead to a missing block. 
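
The overwrite described above can be reproduced with a small stand-alone model. This is only a sketch: `UnconditionalReplaceDemo`, `Info`, the string storage IDs, and the generation stamp values are hypothetical stand-ins, not the Hadoop classes, and the nested maps merely imitate `pendingIBRs`.

```java
import java.util.HashMap;
import java.util.Map;

public class UnconditionalReplaceDemo {
  // Hypothetical stand-in for ReceivedDeletedBlockInfo: block ID plus generation stamp.
  static final class Info {
    final long blockId;
    final long genStamp;

    Info(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  // storage ID -> (block ID -> pending entry); a simplified model of pendingIBRs.
  private final Map<String, Map<Long, Info>> pending = new HashMap<>();

  // Mirrors the quoted logic: drop any existing entry for the block, then always put.
  void add(Info info, String storageId) {
    for (Map<Long, Info> perStorage : pending.values()) {
      if (perStorage.remove(info.blockId) != null) {
        break;
      }
    }
    pending.computeIfAbsent(storageId, k -> new HashMap<>()).put(info.blockId, info);
  }

  // Returns the pending generation stamp for the block, or -1 if nothing is pending.
  long pendingGenStamp(String storageId, long blockId) {
    Map<Long, Info> perStorage = pending.get(storageId);
    Info info = perStorage == null ? null : perStorage.get(blockId);
    return info == null ? -1L : info.genStamp;
  }

  public static void main(String[] args) {
    UnconditionalReplaceDemo demo = new UnconditionalReplaceDemo();
    demo.add(new Info(1L, 1002L), "storage-1"); // entry with the larger GS arrives first
    demo.add(new Info(1L, 1001L), "storage-1"); // entry with the smaller GS arrives later
    System.out.println(demo.pendingGenStamp("storage-1", 1L)); // prints 1001
  }
}
```

In this model the later call always wins, so the pending entry ends up carrying the smaller generation stamp regardless of which replica is actually on disk.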



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16622) addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.

2022-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16622?focusedWorklogId=779121=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779121
 ]

ASF GitHub Bot logged work on HDFS-16622:
-

Author: ASF GitHub Bot
Created on: 07/Jun/22 14:24
Start Date: 07/Jun/22 14:24
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on code in PR #4407:
URL: https://github.com/apache/hadoop/pull/4407#discussion_r891298950


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/IncrementalBlockReportManager.java:
##
@@ -251,12 +251,20 @@ synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
   DatanodeStorage storage) {
 // Make sure another entry for the same block is first removed.
 // There may only be one such entry.
+ReceivedDeletedBlockInfo removedInfo = null;
 for (PerStorageIBR perStorage : pendingIBRs.values()) {
-  if (perStorage.remove(rdbi.getBlock()) != null) {
+  removedInfo = perStorage.remove(rdbi.getBlock());
+  if (removedInfo != null) {
 break;
   }
 }
-getPerStorageIBR(storage).put(rdbi);
+if (removedInfo != null &&

Review Comment:
   We encountered a case of concurrent CloseRecovery. The CloseRecovery with 
the smaller GS processed the block on storage first but was added to 
pendingIBRs later, while the CloseRecovery with the larger GS processed the 
block on storage later but was added to pendingIBRs earlier. As a result, the 
block with the larger GS is stored on disk, but the block with the smaller GS 
is reported to the NameNode. Unfortunately, the block had only this one valid 
replica, which led to the block going missing.
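
One way to guard against this ordering is to compare generation stamps before replacing a pending entry. The quoted diff is truncated at `if (removedInfo != null &&`, so the condition below is an assumption about the intent, not the patch itself. `GsAwarePendingIbr`, `Info`, and the string storage IDs are hypothetical stand-ins for the Hadoop types, and restoring the removed entry to its original storage is a choice made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class GsAwarePendingIbr {
  // Hypothetical stand-in for ReceivedDeletedBlockInfo: block ID plus generation stamp.
  static final class Info {
    final long blockId;
    final long genStamp;

    Info(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  // storage ID -> (block ID -> pending entry); a simplified model of pendingIBRs.
  private final Map<String, Map<Long, Info>> pending = new HashMap<>();

  // Assumed guard: an entry with a larger GS is never replaced by one with a smaller GS.
  synchronized void add(Info info, String storageId) {
    Info removed = null;
    String removedFrom = null;
    for (Map.Entry<String, Map<Long, Info>> entry : pending.entrySet()) {
      removed = entry.getValue().remove(info.blockId);
      if (removed != null) {
        removedFrom = entry.getKey();
        break;
      }
    }
    if (removed != null && removed.genStamp > info.genStamp) {
      // The pending entry is newer than the incoming one: restore it and ignore the update.
      pending.get(removedFrom).put(removed.blockId, removed);
      return;
    }
    pending.computeIfAbsent(storageId, k -> new HashMap<>()).put(info.blockId, info);
  }

  // Returns the pending generation stamp for the block, or -1 if nothing is pending.
  synchronized long pendingGenStamp(String storageId, long blockId) {
    Map<Long, Info> perStorage = pending.get(storageId);
    Info info = perStorage == null ? null : perStorage.get(blockId);
    return info == null ? -1L : info.genStamp;
  }

  public static void main(String[] args) {
    GsAwarePendingIbr ibr = new GsAwarePendingIbr();
    ibr.add(new Info(1L, 1002L), "storage-1"); // larger GS is queued first
    ibr.add(new Info(1L, 1001L), "storage-1"); // smaller GS arrives late and is ignored
    System.out.println(ibr.pendingGenStamp("storage-1", 1L)); // prints 1002
  }
}
```

With this guard the pending report keeps the larger generation stamp, which matches the replica that the comment says is stored on disk.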





Issue Time Tracking
---

Worklog Id: (was: 779121)
Time Spent: 50m  (was: 40m)




[jira] [Work logged] (HDFS-16622) addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.

2022-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16622?focusedWorklogId=779118=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779118
 ]

ASF GitHub Bot logged work on HDFS-16622:
-

Author: ASF GitHub Bot
Created on: 07/Jun/22 14:21
Start Date: 07/Jun/22 14:21
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on code in PR #4407:
URL: https://github.com/apache/hadoop/pull/4407#discussion_r891298950


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/IncrementalBlockReportManager.java:
##
@@ -251,12 +251,20 @@ synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
   DatanodeStorage storage) {
 // Make sure another entry for the same block is first removed.
 // There may only be one such entry.
+ReceivedDeletedBlockInfo removedInfo = null;
 for (PerStorageIBR perStorage : pendingIBRs.values()) {
-  if (perStorage.remove(rdbi.getBlock()) != null) {
+  removedInfo = perStorage.remove(rdbi.getBlock());
+  if (removedInfo != null) {
 break;
   }
 }
-getPerStorageIBR(storage).put(rdbi);
+if (removedInfo != null &&

Review Comment:
   We encountered a case of concurrent CloseRecovery. The CloseRecovery with 
the smaller GS processed the block on storage first but was added to 
pendingIBRs later, while the CloseRecovery with the larger GS processed the 
block on storage later but was added to pendingIBRs earlier. As a result, the 
block with the larger GS is stored on disk, but the block with the smaller GS 
is reported to the NameNode.





Issue Time Tracking
---

Worklog Id: (was: 779118)
Time Spent: 40m  (was: 0.5h)




[jira] [Work logged] (HDFS-16622) addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.

2022-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16622?focusedWorklogId=779101=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779101
 ]

ASF GitHub Bot logged work on HDFS-16622:
-

Author: ASF GitHub Bot
Created on: 07/Jun/22 13:41
Start Date: 07/Jun/22 13:41
Worklog Time Spent: 10m 
  Work Description: Hexiaoqiao commented on code in PR #4407:
URL: https://github.com/apache/hadoop/pull/4407#discussion_r891245808


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/IncrementalBlockReportManager.java:
##
@@ -251,12 +251,20 @@ synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
   DatanodeStorage storage) {
 // Make sure another entry for the same block is first removed.
 // There may only be one such entry.
+ReceivedDeletedBlockInfo removedInfo = null;
 for (PerStorageIBR perStorage : pendingIBRs.values()) {
-  if (perStorage.remove(rdbi.getBlock()) != null) {
+  removedInfo = perStorage.remove(rdbi.getBlock());
+  if (removedInfo != null) {
 break;
   }
 }
-getPerStorageIBR(storage).put(rdbi);
+if (removedInfo != null &&

Review Comment:
   My first feeling is that `pendingIBRs` should keep the freshest set of 
`rdbi`s to report to the NameNode. After this change, though, it would no 
longer hold the freshest data and would also be inconsistent with the block 
data on storage, right?





Issue Time Tracking
---

Worklog Id: (was: 779101)
Time Spent: 0.5h  (was: 20m)




[jira] [Work logged] (HDFS-16622) addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.

2022-06-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16622?focusedWorklogId=778582=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778582
 ]

ASF GitHub Bot logged work on HDFS-16622:
-

Author: ASF GitHub Bot
Created on: 06/Jun/22 11:27
Start Date: 06/Jun/22 11:27
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4407:
URL: https://github.com/apache/hadoop/pull/4407#issuecomment-1147344802

   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:-------:|:-------:|
   | +0 :ok: | reexec | 1m 3s | | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 1s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: | mvninstall | 40m 18s | | trunk passed |
   | +1 :green_heart: | compile | 1m 44s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
   | +1 :green_heart: | compile | 1m 32s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | +1 :green_heart: | checkstyle | 1m 21s | | trunk passed |
   | +1 :green_heart: | mvnsite | 1m 41s | | trunk passed |
   | +1 :green_heart: | javadoc | 1m 22s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
   | +1 :green_heart: | javadoc | 1m 40s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | +1 :green_heart: | spotbugs | 3m 48s | | trunk passed |
   | +1 :green_heart: | shadedclient | 26m 0s | | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 1m 24s | | the patch passed |
   | +1 :green_heart: | compile | 1m 32s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
   | +1 :green_heart: | javac | 1m 32s | | the patch passed |
   | +1 :green_heart: | compile | 1m 21s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | +1 :green_heart: | javac | 1m 21s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 1s | | The patch has no blanks issues. |
   | +1 :green_heart: | checkstyle | 1m 1s | | the patch passed |
   | +1 :green_heart: | mvnsite | 1m 29s | | the patch passed |
   | +1 :green_heart: | javadoc | 0m 59s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
   | +1 :green_heart: | javadoc | 1m 31s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | +1 :green_heart: | spotbugs | 3m 33s | | the patch passed |
   | +1 :green_heart: | shadedclient | 25m 41s | | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 395m 0s | | hadoop-hdfs in the patch passed. |
   | +1 :green_heart: | asflicense | 1m 2s | | The patch does not generate ASF License warnings. |
   | | | 512m 44s | | |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4407/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4407 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux e7745f582308 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 91f7ff3a9989a9a18398cf8c82b1e30492a86bad |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4407/1/testReport/ |
   | Max. process+thread count | 2066 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Work logged] (HDFS-16622) addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.

2022-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16622?focusedWorklogId=778501=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778501
 ]

ASF GitHub Bot logged work on HDFS-16622:
-

Author: ASF GitHub Bot
Created on: 06/Jun/22 02:53
Start Date: 06/Jun/22 02:53
Worklog Time Spent: 10m 
  Work Description: ZanderXu opened a new pull request, #4407:
URL: https://github.com/apache/hadoop/pull/4407

   JIRA: [HDFS-16622](https://issues.apache.org/jira/browse/HDFS-16622). 
addRDBI in IncrementalBlockReportManager may remove the block with bigger GS.
   I suspect there is a bug in the function addRDBI(ReceivedDeletedBlockInfo 
rdbi, DatanodeStorage storage) (line 250).
   The buggy code is in the for loop:

     synchronized void addRDBI(ReceivedDeletedBlockInfo rdbi,
         DatanodeStorage storage) {
       // Make sure another entry for the same block is first removed.
       // There may only be one such entry.
       for (PerStorageIBR perStorage : pendingIBRs.values()) {
         if (perStorage.remove(rdbi.getBlock()) != null) {
           break;
         }
       }
       getPerStorageIBR(storage).put(rdbi);
     }

   The GS (generation stamp) of the block in the removed 
ReceivedDeletedBlockInfo may be greater than the GS of the block in rdbi. The 
NN will invalidate the replica with the smaller GS when it completes the block.




Issue Time Tracking
---

Worklog Id: (was: 778501)
Remaining Estimate: 0h
Time Spent: 10m
