[jira] [Created] (HDFS-17504) DN process should exit when BPServiceActor exits

2024-04-28 Thread Zilong Zhu (Jira)
Zilong Zhu created HDFS-17504:
-

 Summary: DN process should exit when BPServiceActor exits
 Key: HDFS-17504
 URL: https://issues.apache.org/jira/browse/HDFS-17504
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Zilong Zhu


BPServiceActor is a very important thread. In a non-HA cluster, the exit of the 
BPServiceActor thread will cause the DN process to exit. However, in an HA 
cluster, this is not the case.
I found that HDFS-15651 causes the BPServiceActor thread to exit and changes the 
"runningState" from "RunningState.FAILED" to "RunningState.EXITED", which can be 
confusing during troubleshooting.
I believe that the DN process should exit when the flag of the BPServiceActor is 
set to RunningState.FAILED, because at that point the DN is unable to recover 
and re-establish a heartbeat connection with the ANN on its own.
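
A minimal sketch of the intended behavior (the exact hook point in 
BPServiceActor.run() and the use of ExitUtil are assumptions for illustration, 
not the actual patch):

{code:java}
// Sketch: where the actor gives up retrying, keep the state as FAILED
// instead of flipping it to EXITED, and take the whole DN process down,
// since the actor cannot re-register with the NameNode on its own.
if (runningState == RunningState.FAILED) {
  LOG.error("BPServiceActor for {} failed and cannot recover; "
      + "shutting down the DataNode process", this);
  // org.apache.hadoop.util.ExitUtil; whether the real patch would
  // terminate here is an assumption.
  ExitUtil.terminate(1, "BPServiceActor failed: " + this);
}
{code}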






[jira] [Assigned] (HDFS-17504) DN process should exit when BPServiceActor exits

2024-04-28 Thread Zilong Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zilong Zhu reassigned HDFS-17504:
-

Assignee: Zilong Zhu

> DN process should exit when BPServiceActor exits
> ---
>
> Key: HDFS-17504
> URL: https://issues.apache.org/jira/browse/HDFS-17504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>
> BPServiceActor is a very important thread. In a non-HA cluster, the exit of 
> the BPServiceActor thread will cause the DN process to exit. However, in an HA 
> cluster, this is not the case.
> I found that HDFS-15651 causes the BPServiceActor thread to exit and changes 
> the "runningState" from "RunningState.FAILED" to "RunningState.EXITED", which 
> can be confusing during troubleshooting.
> I believe that the DN process should exit when the flag of the BPServiceActor 
> is set to RunningState.FAILED, because at that point the DN is unable to 
> recover and re-establish a heartbeat connection with the ANN on its own.






[jira] [Commented] (HDFS-17488) DN can fail IBRs with NPE when a volume is removed

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841612#comment-17841612
 ] 

ASF GitHub Bot commented on HDFS-17488:
---

haiyang1987 commented on code in PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#discussion_r1582071262


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java:
##
@@ -324,6 +325,10 @@ private void notifyNamenodeBlock(ExtendedBlock block, 
BlockStatus status,
 final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
 block.getLocalBlock(), status, delHint);
 final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
+if (storage == null) {

Review Comment:
   Thanks @kokonguyen191 for your work.
   
   One small question: when execution reaches this point, can we directly 
record `dnMetrics.incrNullStorageBlockReports()` and return?
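
   A sketch of that early return (assuming the metric object is reachable via 
`dn.getMetrics()`; `incrNullStorageBlockReports()` is the metric named in the 
comment above, not an existing API):

   ```java
   final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
   if (storage == null) {
     // The volume backing this storage is already gone: count the event and
     // drop the notification instead of queueing an IBR with a null storage
     // that would NPE later in sendIBRs().
     dn.getMetrics().incrNullStorageBlockReports();
     return;
   }
   ```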





> DN can fail IBRs with NPE when a volume is removed
> --
>
> Key: HDFS-17488
> URL: https://issues.apache.org/jira/browse/HDFS-17488
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
>  
> Error logs
> {code:java}
> 2024-04-22 15:46:33,422 [BP-1842952724-10.22.68.249-1713771988830 
> heartbeating to localhost/127.0.0.1:64977] ERROR datanode.DataNode 
> (BPServiceActor.java:run(922)) - Exception in BPOfferService for Block pool 
> BP-1842952724-10.22.68.249-1713771988830 (Datanode Uuid 
> 1659ffaf-1a80-4a8e-a542-643f6bd97ed4) service to localhost/127.0.0.1:64977
> java.lang.NullPointerException
>     at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolClientSideTranslatorPB.java:246)
>     at 
> org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager.sendIBRs(IncrementalBlockReportManager.java:218)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:749)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:920)
>     at java.lang.Thread.run(Thread.java:748) {code}
> The root cause is in BPOfferService#notifyNamenodeBlock; it happens when the 
> method is called on a block belonging to a volume that was already removed. 
> Because the volume was already removed:
>  
> {code:java}
> private void notifyNamenodeBlock(ExtendedBlock block, BlockStatus status,
> String delHint, String storageUuid, boolean isOnTransientStorage) {
>   checkBlock(block);
>   final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
>   block.getLocalBlock(), status, delHint);
>   final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
>   
>   // storage == null here because it's already removed earlier.
>   for (BPServiceActor actor : bpServices) {
> actor.getIbrManager().notifyNamenodeBlock(info, storage,
> isOnTransientStorage);
>   }
> } {code}
> so IBRs with a null storage are now pending.
> The reason notifyNamenodeBlock can be triggered for such blocks lies in 
> DirectoryScanner#reconcile:
> {code:java}
>   public void reconcile() throws IOException {
>     LOG.debug("reconcile start DirectoryScanning");
>     scan();
> // If a volume is removed here after scan() already finished running,
> // diffs is stale and checkAndUpdate will run on a removed volume
>     // HDFS-14476: run checkAndUpdate with batch to avoid holding the lock too
>     // long
>     int loopCount = 0;
>     synchronized (diffs) {
>       for (final Map.Entry entry : diffs.getEntries()) {
>         dataset.checkAndUpdate(entry.getKey(), entry.getValue());        
>     ...
>   } {code}
> Inside checkAndUpdate, memBlockInfo is null because all the block metadata in 
> memory was removed during the volume removal, but diskFile still exists. Then 
> DataNode#notifyNamenodeDeletedBlock (and further down the line, 
> notifyNamenodeBlock) is called on this block.
>  






[jira] [Commented] (HDFS-17464) Improve some logs output in class FsDatasetImpl

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841614#comment-17841614
 ] 

ASF GitHub Bot commented on HDFS-17464:
---

haiyang1987 commented on code in PR #6724:
URL: https://github.com/apache/hadoop/pull/6724#discussion_r1582072571


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java:
##
@@ -2028,7 +2028,8 @@ private ReplicaInfo finalizeReplica(String bpid, 
ReplicaInfo replicaInfo)
   } else {
 FsVolumeImpl v = (FsVolumeImpl)replicaInfo.getVolume();
 if (v == null) {
-  throw new IOException("No volume for block " + replicaInfo);
+  throw new IOException("No volume for block " + replicaInfo +

Review Comment:
   `throw new IOException("No volume for bpid: " + bpid + " , block: " + 
replicaInfo););`



##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java:
##
@@ -1977,7 +1977,7 @@ public void finalizeBlock(ExtendedBlock b, boolean 
fsyncDir)
 b.getBlockPoolId(), getStorageUuidForLock(b))) {
   if (Thread.interrupted()) {
 // Don't allow data modifications from interrupted threads
-throw new IOException("Cannot finalize block from Interrupted Thread");
+throw new IOException("Cannot finalize block:" + b + "from Interrupted 
Thread");

Review Comment:
   ` throw new IOException("Cannot finalize block: " + b + " from Interrupted 
Thread");`





> Improve some logs output in class FsDatasetImpl
> ---
>
> Key: HDFS-17464
> URL: https://issues.apache.org/jira/browse/HDFS-17464
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Commented] (HDFS-17464) Improve some logs output in class FsDatasetImpl

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841613#comment-17841613
 ] 

ASF GitHub Bot commented on HDFS-17464:
---

haiyang1987 commented on code in PR #6724:
URL: https://github.com/apache/hadoop/pull/6724#discussion_r1582072552


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java:
##
@@ -2091,14 +2092,16 @@ public void unfinalizeBlock(ExtendedBlock b) throws 
IOException {
* @param info the replica that needs to be deleted
* @return true if data for the replica are deleted; false otherwise
*/
-  private boolean delBlockFromDisk(ReplicaInfo info) {
+  private boolean delBlockFromDisk(ReplicaInfo info, String bpid) {
 
 if (!info.deleteBlockData()) {
-  LOG.warn("Not able to delete the block data for replica " + info);
+  LOG.warn("Not able to delete the block data for replica " + info +

Review Comment:
   Can this be updated to the parameterized `warn("{}", arg)` format?
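
   For example, the first warning could become (a sketch of the parameterized 
form; SLF4J only formats the arguments when WARN is enabled):

   ```java
   LOG.warn("Not able to delete the block data for replica {}, bpid: {}",
       info, bpid);
   ```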



##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java:
##
@@ -2091,14 +2092,16 @@ public void unfinalizeBlock(ExtendedBlock b) throws 
IOException {
* @param info the replica that needs to be deleted
* @return true if data for the replica are deleted; false otherwise
*/
-  private boolean delBlockFromDisk(ReplicaInfo info) {
+  private boolean delBlockFromDisk(ReplicaInfo info, String bpid) {
 
 if (!info.deleteBlockData()) {
-  LOG.warn("Not able to delete the block data for replica " + info);
+  LOG.warn("Not able to delete the block data for replica " + info +
+  " bpid:" + bpid);
   return false;
 } else { // remove the meta file
   if (!info.deleteMetadata()) {
-LOG.warn("Not able to delete the meta data for replica " + info);
+LOG.warn("Not able to delete the meta data for replica " + info +

Review Comment:
   Same here.



##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java:
##
@@ -2016,7 +2016,7 @@ private ReplicaInfo finalizeReplica(String bpid, 
ReplicaInfo replicaInfo)
   if (volumeMap.get(bpid, replicaInfo.getBlockId()).getGenerationStamp()
   > replicaInfo.getGenerationStamp()) {
 throw new IOException("Generation Stamp should be monotonically "

Review Comment:
   ```
   throw new IOException("Generation Stamp should be monotonically "
   + "increased for bpid: " + bpid + " , block: " + replicaInfo);
 }
   ```





> Improve some logs output in class FsDatasetImpl
> ---
>
> Key: HDFS-17464
> URL: https://issues.apache.org/jira/browse/HDFS-17464
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Commented] (HDFS-17464) Improve some logs output in class FsDatasetImpl

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841623#comment-17841623
 ] 

ASF GitHub Bot commented on HDFS-17464:
---

hfutatzhanghb commented on code in PR #6724:
URL: https://github.com/apache/hadoop/pull/6724#discussion_r1582077047


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java:
##
@@ -2091,14 +2092,16 @@ public void unfinalizeBlock(ExtendedBlock b) throws 
IOException {
* @param info the replica that needs to be deleted
* @return true if data for the replica are deleted; false otherwise
*/
-  private boolean delBlockFromDisk(ReplicaInfo info) {
+  private boolean delBlockFromDisk(ReplicaInfo info, String bpid) {
 
 if (!info.deleteBlockData()) {
-  LOG.warn("Not able to delete the block data for replica " + info);
+  LOG.warn("Not able to delete the block data for replica " + info +

Review Comment:
   @haiyang1987, Sir, thanks a lot for reviewing. All review comments have been 
fixed. Please review again ~





> Improve some logs output in class FsDatasetImpl
> ---
>
> Key: HDFS-17464
> URL: https://issues.apache.org/jira/browse/HDFS-17464
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Commented] (HDFS-17502) Adjust the log format of the printStatistics() in FSEditLog.java

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841640#comment-17841640
 ] 

ASF GitHub Bot commented on HDFS-17502:
---

Hexiaoqiao commented on PR #6777:
URL: https://github.com/apache/hadoop/pull/6777#issuecomment-2081481103

   Thanks @yuw1 for your contribution. 
   Sorry, -1 from my side. Here we change the blank space to a comma, and 
another PR may change it back to a period, which is meaningless but costs 
resources.
   Welcome other PRs and contributions! Thanks again.




> Adjust the log format of the printStatistics() in FSEditLog.java
> 
>
> Key: HDFS-17502
> URL: https://issues.apache.org/jira/browse/HDFS-17502
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yu Wang
>Priority: Trivial
>  Labels: pull-request-available
>
> The current log format of printStatistics() is:
> {code:java}
> 2024-04-27 21:15:05,429 [main] INFO  namenode.FSEditLog 
> (FSEditLog.java:printStatistics(801)) - Number of transactions: 2 Total time 
> for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of 
> syncs: 3 SyncTimes(ms): 1 0{code}
> There are no separators between different keys, making it difficult to read. 
> The modified format is:
> {code:java}
> 2024-04-27 21:15:05,429 [main] INFO  namenode.FSEditLog 
> (FSEditLog.java:printStatistics(801)) - Number of transactions: 2, Total time 
> for transactions(ms): 2, Number of transactions batched in Syncs: 0, Number 
> of syncs: 3, SyncTimes(ms): 1 0 {code}
>  






[jira] [Commented] (HDFS-17502) Adjust the log format of the printStatistics() in FSEditLog.java

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841639#comment-17841639
 ] 

ASF GitHub Bot commented on HDFS-17502:
---

Hexiaoqiao closed pull request #6777: HDFS-17502. Adjust the log format of the 
printStatistics() in FSEditLog.java
URL: https://github.com/apache/hadoop/pull/6777




> Adjust the log format of the printStatistics() in FSEditLog.java
> 
>
> Key: HDFS-17502
> URL: https://issues.apache.org/jira/browse/HDFS-17502
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yu Wang
>Priority: Trivial
>  Labels: pull-request-available
>
> The current log format of printStatistics() is:
> {code:java}
> 2024-04-27 21:15:05,429 [main] INFO  namenode.FSEditLog 
> (FSEditLog.java:printStatistics(801)) - Number of transactions: 2 Total time 
> for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of 
> syncs: 3 SyncTimes(ms): 1 0{code}
> There are no separators between different keys, making it difficult to read. 
> The modified format is:
> {code:java}
> 2024-04-27 21:15:05,429 [main] INFO  namenode.FSEditLog 
> (FSEditLog.java:printStatistics(801)) - Number of transactions: 2, Total time 
> for transactions(ms): 2, Number of transactions batched in Syncs: 0, Number 
> of syncs: 3, SyncTimes(ms): 1 0 {code}
>  






[jira] [Commented] (HDFS-17497) Logic for committed blocks is mixed when computing file size

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841647#comment-17841647
 ] 

ASF GitHub Bot commented on HDFS-17497:
---

Hexiaoqiao commented on PR #6765:
URL: https://github.com/apache/hadoop/pull/6765#issuecomment-2081489723

   Great catch. I have not reviewed carefully yet, but I remember this has been 
discussed for a long time. IIRC, the client also checks the file length by 
requesting it from the DataNode which manages the incomplete block? Thanks.
   (Will try to review the PR later.)




> Logic for committed blocks is mixed when computing file size
> 
>
> Key: HDFS-17497
> URL: https://issues.apache.org/jira/browse/HDFS-17497
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> One HDFS file being written may contain multiple committed blocks, as follows 
> (assume one file contains three blocks):
> || ||Block 1||Block 2||Block 3||
> |Case 1|Complete|Commit|UnderConstruction|
> |Case 2|Complete|Commit|Commit|
> |Case 3|Commit|Commit|Commit|
>  
> But the logic for committed blocks is inconsistent when computing the file 
> size: it ignores the bytes of the last committed block but counts the bytes of 
> the other committed blocks.
> {code:java}
> public final long computeFileSize(boolean includesLastUcBlock,
> boolean usePreferredBlockSize4LastUcBlock) {
>   if (blocks.length == 0) {
> return 0;
>   }
>   final int last = blocks.length - 1;
>   //check if the last block is BlockInfoUnderConstruction
>   BlockInfo lastBlk = blocks[last];
>   long size = lastBlk.getNumBytes();
>   // the last committed block is not complete, so its bytes may be ignored.
>   if (!lastBlk.isComplete()) {
>  if (!includesLastUcBlock) {
>size = 0;
>  } else if (usePreferredBlockSize4LastUcBlock) {
>size = isStriped()?
>getPreferredBlockSize() *
>((BlockInfoStriped)lastBlk).getDataBlockNum() :
>getPreferredBlockSize();
>  }
>   }
>   // The bytes of other committed blocks are calculated into the file length.
>   for (int i = 0; i < last; i++) {
> size += blocks[i].getNumBytes();
>   }
>   return size;
> } {code}
> The bytes of a committed block will not change, so the bytes of the last 
> committed block should be counted into the file length too.
>  
> And the logic for committed blocks is inconsistent too when computing the file 
> length in DFSInputStream. Normally DFSInputStream does not need to get the 
> visible length for a committed block, regardless of whether the committed 
> block is the last block or not.
>  
> HDFS-10843 encountered one bug which was actually caused by a committed 
> block, but HDFS-10843 fixed that bug by updating the quota usage when 
> completing the block. The number of bytes of a committed block will no longer 
> change, so we should update the quota usage when the block is committed, 
> which can reduce the quota usage delta in time.
>  
> So there are some things we need to do:
>  * Unify the calculation logic for all committed blocks in 
> {{computeFileSize}} of {{INodeFile}}
>  * Unify the calculation logic for all committed blocks in {{getFileLength}} 
> of {{DFSInputStream}}
>  * Update quota usage when committing block
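
A minimal sketch of the first item (treating a COMMITTED last block like a 
complete one in INodeFile#computeFileSize; the exact predicate via 
getBlockUCState() is an assumption about how the patch might unify the logic):

{code:java}
final BlockInfo lastBlk = blocks[last];
long size = lastBlk.getNumBytes();
// A committed block's length is already final, so only a block that is
// still under construction needs the special handling below.
final boolean lastIsCommitted =
    lastBlk.getBlockUCState() == BlockUCState.COMMITTED;
if (!lastBlk.isComplete() && !lastIsCommitted) {
  if (!includesLastUcBlock) {
    size = 0;
  } else if (usePreferredBlockSize4LastUcBlock) {
    size = isStriped()
        ? getPreferredBlockSize() * ((BlockInfoStriped) lastBlk).getDataBlockNum()
        : getPreferredBlockSize();
  }
}
{code}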






[jira] [Commented] (HDFS-17503) Unreleased volume references because of OOM

2024-04-28 Thread Jian Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841650#comment-17841650
 ] 

Jian Zhang commented on HDFS-17503:
---

Can you describe in detail how the OOM came about?

> Unreleased volume references because of OOM
> ---
>
> Key: HDFS-17503
> URL: https://issues.apache.org/jira/browse/HDFS-17503
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>
> When BlockSender throws an error because of OOM, the volume reference obtained 
> by the thread is not released, which causes the thread trying to remove the 
> volume to wait and fall into an infinite loop.
> I found that HDFS-15963 catches the exception and releases the volume 
> reference, but it did not handle the case of thrown errors. I think "catch 
> (Throwable t)" should be used instead of "catch (IOException ioe)".
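
A minimal sketch of the proposed widening (the surrounding structure is assumed 
from the description of the HDFS-15963 fix, not copied from the actual code):

{code:java}
FsVolumeReference volumeRef = null;
try {
  volumeRef = volume.obtainReference();
  // ... allocate buffers and stream the block; an OutOfMemoryError raised
  // here is an Error, so "catch (IOException ioe)" would skip the cleanup.
} catch (Throwable t) {  // was: catch (IOException ioe)
  // Always release the volume reference so a concurrent removeVolumes()
  // does not wait forever on the reference count.
  IOUtils.cleanupWithLogger(LOG, volumeRef);
  throw new IOException("BlockSender failed", t);
}
{code}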






[jira] [Commented] (HDFS-17488) DN can fail IBRs with NPE when a volume is removed

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841651#comment-17841651
 ] 

ASF GitHub Bot commented on HDFS-17488:
---

Hexiaoqiao commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2081497703

   @kokonguyen191 Thanks for your report and contributions. Sorry, I didn't 
completely get this issue.
   As in the description, you mention `storage == null` in the following. I 
wonder why the NPE is not thrown first at `getPerStorageIBR(storage).put(rdbi);` 
in org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager#addRDBI.
   
   ```
   private void notifyNamenodeBlock(ExtendedBlock block, BlockStatus status,
   String delHint, String storageUuid, boolean isOnTransientStorage) {
 checkBlock(block);
 final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
 block.getLocalBlock(), status, delHint);
 final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
 
 // storage == null here because it's already removed earlier.
   
 for (BPServiceActor actor : bpServices) {
   actor.getIbrManager().notifyNamenodeBlock(info, storage,
   isOnTransientStorage);
 }
   } 
   ```
   
   On another note, you mentioned that this issue is triggered by a volume 
removal, so what is the logic path from the volume removal to the NPE? Thanks 
again.




> DN can fail IBRs with NPE when a volume is removed
> --
>
> Key: HDFS-17488
> URL: https://issues.apache.org/jira/browse/HDFS-17488
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
>  
> Error logs
> {code:java}
> 2024-04-22 15:46:33,422 [BP-1842952724-10.22.68.249-1713771988830 
> heartbeating to localhost/127.0.0.1:64977] ERROR datanode.DataNode 
> (BPServiceActor.java:run(922)) - Exception in BPOfferService for Block pool 
> BP-1842952724-10.22.68.249-1713771988830 (Datanode Uuid 
> 1659ffaf-1a80-4a8e-a542-643f6bd97ed4) service to localhost/127.0.0.1:64977
> java.lang.NullPointerException
>     at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolClientSideTranslatorPB.java:246)
>     at 
> org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager.sendIBRs(IncrementalBlockReportManager.java:218)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:749)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:920)
>     at java.lang.Thread.run(Thread.java:748) {code}
> The root cause is in BPOfferService#notifyNamenodeBlock; it happens when the 
> method is called on a block belonging to a volume that was already removed. 
> Because the volume was already removed:
>  
> {code:java}
> private void notifyNamenodeBlock(ExtendedBlock block, BlockStatus status,
> String delHint, String storageUuid, boolean isOnTransientStorage) {
>   checkBlock(block);
>   final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
>   block.getLocalBlock(), status, delHint);
>   final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
>   
>   // storage == null here because it's already removed earlier.
>   for (BPServiceActor actor : bpServices) {
> actor.getIbrManager().notifyNamenodeBlock(info, storage,
> isOnTransientStorage);
>   }
> } {code}
> so IBRs with a null storage are now pending.
> The reason notifyNamenodeBlock can be triggered for such blocks lies in 
> DirectoryScanner#reconcile:
> {code:java}
>   public void reconcile() throws IOException {
>     LOG.debug("reconcile start DirectoryScanning");
>     scan();
> // If a volume is removed here after scan() already finished running,
> // diffs is stale and checkAndUpdate will run on a removed volume
>     // HDFS-14476: run checkAndUpdate with batch to avoid holding the lock too
>     // long
>     int loopCount = 0;
>     synchronized (diffs) {
>       for (final Map.Entry entry : diffs.getEntries()) {
>         dataset.checkAndUpdate(entry.getKey(), entry.getValue());        
>     ...
>   } {code}
> Inside checkAndUpdate, memBlockInfo is null because all the block metadata in 
> memory was removed during the volume removal, but diskFile still exists. Then 
> DataNode#notifyNamenodeDeletedBlock (and further down the line, 
> notifyNamenodeBlock) is called on this block.
>  






[jira] [Commented] (HDFS-17484) Introduce redundancy.considerLoad.minLoad to avoid excluding nodes when they are not actually busy

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841652#comment-17841652
 ] 

ASF GitHub Bot commented on HDFS-17484:
---

Hexiaoqiao commented on PR #6758:
URL: https://github.com/apache/hadoop/pull/6758#issuecomment-2081503955

   How about turning off `dfs.namenode.redundancy.considerLoad` or turning up 
`dfs.namenode.redundancy.considerLoad.factor`? Thanks.




> Introduce redundancy.considerLoad.minLoad to avoid excluding nodes when 
> they are not actually busy
> -
>
> Key: HDFS-17484
> URL: https://issues.apache.org/jira/browse/HDFS-17484
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
>
> Currently, `dfs.namenode.redundancy.considerLoad` is true by default, and 
> `dfs.namenode.redundancy.considerLoad.factor` is 2.0 by default.
> Consider the situation below: when we are doing a stress test, we may deploy 
> an HDFS client onto a datanode. This HDFS client will prefer to write to its 
> local datanode and increase that machine's load. Suppose we have 3 datanodes 
> whose loads are as below: 5.0, 0.2, 0.3.
>  
> The node whose load equals 5.0 will be excluded when choosing datanodes for a 
> block. But a load of 5.0 is actually not slow for a machine with 80 CPU cores.
>  
> So we should add a new configuration entry, 
> `dfs.namenode.redundancy.considerLoad.minLoad`, to indicate the minimum load 
> above which considerLoad takes effect.
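
A minimal sketch of how the new entry could gate the existing check (the names 
of the surrounding variables are assumptions based on the description):

{code:java}
// In the block placement policy's "is this node too busy?" check:
final double factor = conf.getDouble(
    "dfs.namenode.redundancy.considerLoad.factor", 2.0);
final double minLoad = conf.getDouble(
    "dfs.namenode.redundancy.considerLoad.minLoad", 0.0);  // proposed entry

final double nodeLoad = node.getXceiverCount();
// Exclude the node only when it exceeds BOTH the relative threshold and the
// new absolute floor, so a load of 5.0 on an 80-core box stays usable.
final boolean tooBusy = nodeLoad > factor * avgLoad && nodeLoad > minLoad;
{code}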






[jira] [Commented] (HDFS-17488) DN can fail IBRs with NPE when a volume is removed

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841656#comment-17841656
 ] 

ASF GitHub Bot commented on HDFS-17488:
---

kokonguyen191 commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2081512148

   Hi @Hexiaoqiao, thanks for the review. For your first question, 
`getPerStorageIBR(storage)` doesn't do a null check on `storage` and assigns a 
new `PerStorageIBR` object to the `null` key, so it pretty much just treats 
`null` as a new storage. For your second question, the NPE in IBRs is caused by 
a block report with a `null` storage. The condition for this to happen is for a 
storage to be removed after `DirectoryScanner` has just finished a scan but 
hasn't started processing the blocks yet, so the scan result is now stale and 
contains the removed storage.
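
   A tiny self-contained illustration of the first point (a plain `HashMap` 
accepts a `null` key, so the `put` itself cannot NPE; the failure only surfaces 
later when the report is serialized):

   ```java
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   public class NullKeyDemo {
     public static void main(String[] args) {
       Map<Object, List<String>> perStorageIBR = new HashMap<>();
       // Equivalent of getPerStorageIBR(null).put(rdbi): the map simply
       // creates a bucket under the null key, so no NPE is thrown here.
       perStorageIBR.computeIfAbsent(null, k -> new ArrayList<>()).add("rdbi");
       System.out.println(perStorageIBR); // prints {null=[rdbi]}
     }
   }
   ```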




> DN can fail IBRs with NPE when a volume is removed
> --
>
> Key: HDFS-17488
> URL: https://issues.apache.org/jira/browse/HDFS-17488
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
>  
> Error logs
> {code:java}
> 2024-04-22 15:46:33,422 [BP-1842952724-10.22.68.249-1713771988830 
> heartbeating to localhost/127.0.0.1:64977] ERROR datanode.DataNode 
> (BPServiceActor.java:run(922)) - Exception in BPOfferService for Block pool 
> BP-1842952724-10.22.68.249-1713771988830 (Datanode Uuid 
> 1659ffaf-1a80-4a8e-a542-643f6bd97ed4) service to localhost/127.0.0.1:64977
> java.lang.NullPointerException
>     at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolClientSideTranslatorPB.java:246)
>     at 
> org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager.sendIBRs(IncrementalBlockReportManager.java:218)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:749)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:920)
>     at java.lang.Thread.run(Thread.java:748) {code}
> The root cause is in BPOfferService#notifyNamenodeBlock; it happens when the 
> method is called on a block belonging to a volume that was already removed. 
> Because the volume was already removed:
>  
> {code:java}
> private void notifyNamenodeBlock(ExtendedBlock block, BlockStatus status,
> String delHint, String storageUuid, boolean isOnTransientStorage) {
>   checkBlock(block);
>   final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
>   block.getLocalBlock(), status, delHint);
>   final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
>   
>   // storage == null here because it's already removed earlier.
>   for (BPServiceActor actor : bpServices) {
> actor.getIbrManager().notifyNamenodeBlock(info, storage,
> isOnTransientStorage);
>   }
> } {code}
> so IBRs with a null storage are now pending.
> The reason notifyNamenodeBlock can be triggered for such blocks lies in 
> DirectoryScanner#reconcile:
> {code:java}
>   public void reconcile() throws IOException {
>     LOG.debug("reconcile start DirectoryScanning");
>     scan();
> // If a volume is removed here after scan() already finished running,
> // diffs is stale and checkAndUpdate will run on a removed volume
>     // HDFS-14476: run checkAndUpdate with batch to avoid holding the lock too
>     // long
>     int loopCount = 0;
>     synchronized (diffs) {
>       for (final Map.Entry entry : diffs.getEntries()) {
>         dataset.checkAndUpdate(entry.getKey(), entry.getValue());        
>     ...
>   } {code}
> Inside checkAndUpdate, memBlockInfo is null because all the block metadata in 
> memory was removed during the volume removal, but diskFile still exists. Then 
> DataNode#notifyNamenodeDeletedBlock (and further down the line, 
> notifyNamenodeBlock) is called on this block.
>  






[jira] [Commented] (HDFS-17488) DN can fail IBRs with NPE when a volume is removed

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841657#comment-17841657
 ] 

ASF GitHub Bot commented on HDFS-17488:
---

kokonguyen191 commented on code in PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#discussion_r1582197257


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java:
##
@@ -324,6 +325,10 @@ private void notifyNamenodeBlock(ExtendedBlock block, 
BlockStatus status,
 final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
 block.getLocalBlock(), status, delHint);
 final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
+if (storage == null) {

Review Comment:
   Good idea. Better to stop it here rather than letting it propagate further 
down. I'll change it later once I charge my laptop.





> DN can fail IBRs with NPE when a volume is removed
> --
>
> Key: HDFS-17488
> URL: https://issues.apache.org/jira/browse/HDFS-17488
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
>  
> Error logs
> {code:java}
> 2024-04-22 15:46:33,422 [BP-1842952724-10.22.68.249-1713771988830 
> heartbeating to localhost/127.0.0.1:64977] ERROR datanode.DataNode 
> (BPServiceActor.java:run(922)) - Exception in BPOfferService for Block pool 
> BP-1842952724-10.22.68.249-1713771988830 (Datanode Uuid 
> 1659ffaf-1a80-4a8e-a542-643f6bd97ed4) service to localhost/127.0.0.1:64977
> java.lang.NullPointerException
>     at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolClientSideTranslatorPB.java:246)
>     at 
> org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager.sendIBRs(IncrementalBlockReportManager.java:218)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:749)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:920)
>     at java.lang.Thread.run(Thread.java:748) {code}
> The root cause is in BPOfferService#notifyNamenodeBlock; it happens when the 
> method is called on a block belonging to a volume that was already removed. 
> Because the volume was already removed:
>  
> {code:java}
> private void notifyNamenodeBlock(ExtendedBlock block, BlockStatus status,
> String delHint, String storageUuid, boolean isOnTransientStorage) {
>   checkBlock(block);
>   final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
>   block.getLocalBlock(), status, delHint);
>   final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
>   
>   // storage == null here because it's already removed earlier.
>   for (BPServiceActor actor : bpServices) {
> actor.getIbrManager().notifyNamenodeBlock(info, storage,
> isOnTransientStorage);
>   }
> } {code}
> so IBRs with a null storage are now pending.
> The reason notifyNamenodeBlock can be triggered for such blocks lies in 
> DirectoryScanner#reconcile:
> {code:java}
>   public void reconcile() throws IOException {
>     LOG.debug("reconcile start DirectoryScanning");
>     scan();
> // If a volume is removed here after scan() already finished running,
> // diffs is stale and checkAndUpdate will run on a removed volume
>     // HDFS-14476: run checkAndUpdate with batch to avoid holding the lock too
>     // long
>     int loopCount = 0;
>     synchronized (diffs) {
>       for (final Map.Entry entry : diffs.getEntries()) {
>         dataset.checkAndUpdate(entry.getKey(), entry.getValue());        
>     ...
>   } {code}
> Inside checkAndUpdate, memBlockInfo is null because all the block metadata in 
> memory was removed during the volume removal, but diskFile still exists. Then 
> DataNode#notifyNamenodeDeletedBlock (and further down the line, 
> notifyNamenodeBlock) is called on this block.
>  






[jira] [Commented] (HDFS-17502) Adjust the log format of the printStatistics() in FSEditLog.java

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841658#comment-17841658
 ] 

ASF GitHub Bot commented on HDFS-17502:
---

hadoop-yetus commented on PR #6777:
URL: https://github.com/apache/hadoop/pull/6777#issuecomment-2081516325

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 00s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  spotbugs  |   0m 00s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  codespell  |   0m 00s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 00s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m 00s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  90m 02s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   6m 34s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   5m 22s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   7m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   6m 26s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 150m 27s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 24s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   3m 24s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 00s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   2m 14s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   4m 23s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   3m 29s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 160m 51s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   5m 54s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 428m 20s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6777 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | MINGW64_NT-10.0-17763 07b0b771646c 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / e2c863cd48272a73ad9807633be015e12463a32d |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6777/2/testReport/
 |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6777/2/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Adjust the log format of the printStatistics() in FSEditLog.java
> 
>
> Key: HDFS-17502
> URL: https://issues.apache.org/jira/browse/HDFS-17502
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Yu Wang
>Priority: Trivial
>  Labels: pull-request-available
>
> The current log format of printStatistics() is:
> {code:java}
> 2024-04-27 21:15:05,429 [main] INFO  namenode.FSEditLog 
> (FSEditLog.java:printStatistics(801)) - Number of transactions: 2 Total time 
> for transactions(ms): 2 Number of transactions batched in Syncs: 0 Number of 
> syncs: 3 SyncTimes(ms): 1 0{code}
> There are no separators between different keys, making it difficult to read. 
> The modified format is:
> {code:java}
> 2024-04-27 21:15:05,429 [main] INFO  namenode.FSEditLog 
> (FSEditLog.java:printStatistics(801)) - Number of transactions: 2, Total time 
> for transactions(ms): 2, Number of transactions batched in Syncs: 0, Number 
> of syncs: 3, SyncTimes(ms): 1 0 {code}
>  




[jira] [Commented] (HDFS-17501) Add a parameter "redirectByIPAddress" to WebHDFS

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841659#comment-17841659
 ] 

ASF GitHub Bot commented on HDFS-17501:
---

hadoop-yetus commented on PR #6775:
URL: https://github.com/apache/hadoop/pull/6775#issuecomment-2081518185

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 01s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  spotbugs  |   0m 01s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  codespell  |   0m 01s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 01s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 00s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |   4m 00s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  | 107m 47s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  26m 49s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   5m 46s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |  21m 10s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |  18m 20s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 198m 16s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 47s |  |  Maven dependency ordering for patch  |
   | -1 :x: |  mvninstall  |   3m 26s | 
[/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6775/3/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch failed.  |
   | -1 :x: |  mvninstall  |   2m 40s | 
[/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6775/3/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch failed.  |
   | -1 :x: |  compile  |   4m 06s | 
[/patch-compile-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6775/3/artifact/out/patch-compile-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project in the patch failed.  |
   | -1 :x: |  javac  |   4m 06s | 
[/patch-compile-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6775/3/artifact/out/patch-compile-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project in the patch failed.  |
   | +1 :green_heart: |  blanks  |   0m 00s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   2m 57s |  |  the patch passed  |
   | -1 :x: |  mvnsite  |   3m 14s | 
[/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6775/3/artifact/out/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch failed.  |
   | -1 :x: |  mvnsite  |   2m 41s | 
[/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6775/3/artifact/out/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch failed.  |
   | +1 :green_heart: |  javadoc  |   9m 40s |  |  the patch passed  |
   | -1 :x: |  shadedclient  |  67m 59s |  |  patch has errors when building 
and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   4m 28s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 435m 20s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6775 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | MINGW64_NT-10.0-17763 2d9fa353a65a 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / 95424d4aa56cd8c239598d7943e4d37cceb71bb6 |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6775/3/testReport/
 |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-client 
hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6775/3/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.

[jira] [Updated] (HDFS-17452) DfsRouterAdmin RefreshCallQueue fails when authorization is enabled

2024-04-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17452:
--
Labels: pull-request-available  (was: )

> DfsRouterAdmin RefreshCallQueue fails when authorization is enabled
> ---
>
> Key: HDFS-17452
> URL: https://issues.apache.org/jira/browse/HDFS-17452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 3.3.6
>Reporter: Ananya Singh
>Assignee: Ananya Singh
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Commented] (HDFS-17452) DfsRouterAdmin RefreshCallQueue fails when authorization is enabled

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841662#comment-17841662
 ] 

ASF GitHub Bot commented on HDFS-17452:
---

AnanyaSingh2121 opened a new pull request, #6779:
URL: https://github.com/apache/hadoop/pull/6779

   
   
   
   ### Description of PR
   
   Adding the Kerberos principal key for the Router refreshCallQueue command.
   
   
   ### How was this patch tested?
   On a federated Hadoop cluster with Kerberos enabled, the command failed. 
After making the change locally and testing with `hadoop jar`, the command 
succeeded.
   
   
   ### For code changes:
   
   - [ ] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> DfsRouterAdmin RefreshCallQueue fails when authorization is enabled
> ---
>
> Key: HDFS-17452
> URL: https://issues.apache.org/jira/browse/HDFS-17452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 3.3.6
>Reporter: Ananya Singh
>Assignee: Ananya Singh
>Priority: Major
>







[jira] [Commented] (HDFS-17452) DfsRouterAdmin RefreshCallQueue fails when authorization is enabled

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841683#comment-17841683
 ] 

ASF GitHub Bot commented on HDFS-17452:
---

hadoop-yetus commented on PR #6779:
URL: https://github.com/apache/hadoop/pull/6779#issuecomment-2081570914

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   9m 51s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ branch-3.3.6 Compile Tests _ |
   | -1 :x: |  mvninstall  |  36m  4s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6779/1/artifact/out/branch-mvninstall-root.txt)
 |  root in branch-3.3.6 failed.  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  branch-3.3.6 passed  |
   | +1 :green_heart: |  checkstyle  |   0m 28s |  |  branch-3.3.6 passed  |
   | +1 :green_heart: |  mvnsite  |   0m 43s |  |  branch-3.3.6 passed  |
   | +1 :green_heart: |  javadoc  |   1m  1s |  |  branch-3.3.6 passed  |
   | +1 :green_heart: |  spotbugs  |   1m 23s |  |  branch-3.3.6 passed  |
   | +1 :green_heart: |  shadedclient  |  26m 13s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 30s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6779/1/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   0m 18s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 50s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m 48s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  16m 52s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 128m 38s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6779/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6779 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 09a281817141 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-3.3.6 / 92e2e2d2cec4b4b7ee542d63770a957c0b886a8f |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~18.04-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6779/1/testReport/ |
   | Max. process+thread count | 2099 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6779/1/console |
   | versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> DfsRouterAdmin RefreshCallQueue fails when authorization is enabled
> ---
>
> Key: HDFS-17452
> URL: https://issues.apache.org/jira/browse/HDFS-17452
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 3.3.6
>Reporter: Ananya Singh
>Assignee: Ananya Singh
>Priority: Major
>  Labels: pull-request-availa

[jira] [Commented] (HDFS-17497) Logic for committed blocks is mixed when computing file size

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841767#comment-17841767
 ] 

ASF GitHub Bot commented on HDFS-17497:
---

hadoop-yetus commented on PR #6765:
URL: https://github.com/apache/hadoop/pull/6765#issuecomment-2081636005

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 01s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  spotbugs  |   0m 01s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  codespell  |   0m 01s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 01s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 00s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  86m 48s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 57s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   4m 39s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   6m 25s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   5m 50s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 143m 08s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 27s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   3m 27s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 01s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   2m 19s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   4m 00s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   3m 25s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 154m 45s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   5m 13s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 412m 04s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6765 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | MINGW64_NT-10.0-17763 15fffafc3582 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / 0aa96155ae7aed9c69d8c0ede601fffd4bc8c17f |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6765/3/testReport/
 |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6765/3/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Logic for committed blocks is mixed when computing file size
> 
>
> Key: HDFS-17497
> URL: https://issues.apache.org/jira/browse/HDFS-17497
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> One in-writing HDFS file may contain multiple committed blocks, as follows 
> (assume one file contains three blocks):
> || ||Block 1||Block 2||Block 3||
> |Case 1|Complete|Commit|UnderConstruction|
> |Case 2|Complete|Commit|Commit|
> |Case 3|Commit|Commit|Commit|
>  
> But the logic for committed blocks is mixed when computing the file size: it 
> ignores the bytes of the last committed block yet counts the bytes of the 
> other committed blocks.
> {code:java}
> public final long computeFileSize(boolean includesLastUcBlock,
> boolean usePreferredBlockSize4LastUcBlock) {
>   if (blocks.length == 0) {
> return 0;
>   }
>   final int last = blocks.length - 1;
>   //check if the last block is BlockInfoUnderConstruction
>   BlockInfo lastBlk = blocks[last];
>   long size = lastBlk.getNumBytes();
>   // the last committed block is not complete, so its bytes may be ignored.
>   if (!lastBlk.isComplete()) {
>  if (!includesLastUcBlock) {
>size = 0;
>  } else if (usePreferredBlockSize4LastUcBlock) {
>size = isStriped()

[jira] [Commented] (HDFS-13603) Warmup NameNode EDEK thread retries continuously if there's an invalid key

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841777#comment-17841777
 ] 

ASF GitHub Bot commented on HDFS-13603:
---

hadoop-yetus commented on PR #6774:
URL: https://github.com/apache/hadoop/pull/6774#issuecomment-2081665080

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 02s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  spotbugs  |   0m 01s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  codespell  |   0m 01s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 01s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m 01s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 00s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |   6m 23s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  | 130m 49s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  61m 01s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   9m 30s |  |  trunk passed  |
   | -1 :x: |  mvnsite  |   6m 49s | 
[/branch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6774/2/artifact/out/branch-mvnsite-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in trunk failed.  |
   | +1 :green_heart: |  javadoc  |  23m 26s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 256m 54s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   3m 30s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |  18m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  56m 38s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |  56m 38s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 01s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   9m 10s |  |  the patch passed  |
   | -1 :x: |  mvnsite  |   6m 55s | 
[/patch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6774/2/artifact/out/patch-mvnsite-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in the patch failed.  |
   | +1 :green_heart: |  javadoc  |  22m 55s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 263m 03s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  asflicense  |   8m 50s | 
[/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6774/2/artifact/out/results-asflicense.txt)
 |  The patch generated 1 ASF License warnings.  |
   |  |   | 815m 43s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6774 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | MINGW64_NT-10.0-17763 5f2bdf72e508 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / f0e0386cb4e59ba263e6254215945b971a7d1bd0 |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6774/2/testReport/
 |
   | modules | C: hadoop-common-project/hadoop-common 
hadoop-common-project/hadoop-kms hadoop-hdfs-project/hadoop-hdfs U: . |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6774/2/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Warmup NameNode EDEK thread retries continuously if there's an invalid key 
> ---
>
> Key: HDFS-13603
> URL: https://issues.apache.org/jira/browse/HDFS-13603
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, namenode
>Affects Versions: 2.8.0
>Reporter: Antony Jay
>Priority: Major
>  Labels: pull-request-available
>
> https://issues.apache.org/jira/browse/HDFS-9405 adds a background thread to 
> pre-warm the EDEK cache. 
> However this fails and retr

[jira] [Commented] (HDFS-17496) DataNode supports more fine-grained dataset lock based on blockid

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841779#comment-17841779
 ] 

ASF GitHub Bot commented on HDFS-17496:
---

hadoop-yetus commented on PR #6764:
URL: https://github.com/apache/hadoop/pull/6764#issuecomment-2081668506

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 01s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  spotbugs  |   0m 00s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  codespell  |   0m 00s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 00s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 00s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  89m 34s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 58s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   4m 50s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   6m 44s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   6m 04s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 149m 03s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 36s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   3m 36s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m 01s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6764/3/artifact/out/blanks-eol.txt)
 |  The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   2m 29s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   4m 24s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   3m 33s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 161m 52s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   5m 24s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 427m 33s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6764 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | MINGW64_NT-10.0-17763 a2b679b71053 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / 45aeb0964fdd0a6c8675fa2342d8893a600ed4fb |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6764/3/testReport/
 |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6764/3/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> DataNode supports more fine-grained dataset lock based on blockid
> -
>
> Key: HDFS-17496
> URL: https://issues.apache.org/jira/browse/HDFS-17496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-04-23-16-17-07-057.png
>
>
> Recently, we used NVMe SSDs as volumes in datanodes and performed some stress 
> tests.
> We found that NVMe SSD and HDD disks achieve similar performance when 
> creating lots of small files, such as 10KB ones.
> This phenomenon is counterintuitive. After analyzing the metrics monitoring, 
> we found that the fsdataset lock became the bottleneck in high-concurrency 
> scenarios.
>  
> Currently, we have two lock levels, BLOCK_POOL and VOLUME. We can further 
> split the volume lock into DIR locks.
> A DIR lock is defined as follows: given a blockid, we can determine which 
> subdir the block will be placed in under the finalized dir, and we simply use 
> subdir[0-31]/subdir[0-31] as the name of the DIR lock (see the sketch below).
> More details, please refer 
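A minimal sketch of the blockid-to-DIR-lock mapping described above, assuming 
the 32x32 subdir layout that DatanodeUtil#idToBlockDir derives from the block 
id; the method name dirLockName is illustrative:
{code:java}
// Finalized replicas live under subdir[0-31]/subdir[0-31]; the two
// indices come from bits of the block id, so the same pair can serve
// as the name of the DIR lock.
static String dirLockName(long blockId) {
  int d1 = (int) ((blockId >> 16) & 0x1F); // first-level subdir, 0-31
  int d2 = (int) ((blockId >> 8) & 0x1F);  // second-level subdir, 0-31
  return "subdir" + d1 + "/subdir" + d2;
}
{code}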

[jira] [Commented] (HDFS-17384) [FGL] Replace the global lock with global FS Lock and global BM lock

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841789#comment-17841789
 ] 

ASF GitHub Bot commented on HDFS-17384:
---

hadoop-yetus commented on PR #6762:
URL: https://github.com/apache/hadoop/pull/6762#issuecomment-2081720138

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 14s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  spotbugs  |   0m 01s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  codespell  |   0m 01s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 01s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m 01s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m 01s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 00s |  |  The patch appears to 
include 47 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 18s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  87m 56s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  39m 10s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   7m 54s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |  15m 54s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |  14m 59s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 174m 23s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   2m 14s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   9m 51s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  36m 45s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |  36m 44s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 01s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   5m 53s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |  16m 04s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |  14m 57s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 183m 47s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   6m 03s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 552m 56s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6762 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint 
markdownlint |
   | uname | MINGW64_NT-10.0-17763 087b28dd158d 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / 14153f07aab8a229f128f503b2741d1c81a824b2 |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6762/2/testReport/
 |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs 
hadoop-hdfs-project/hadoop-hdfs-rbf hadoop-tools/hadoop-fs2img U: . |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6762/2/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> [FGL] Replace the global lock with global FS Lock and global BM lock
> 
>
> Key: HDFS-17384
> URL: https://issues.apache.org/jira/browse/HDFS-17384
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: FGL, pull-request-available
>
> First, we can replace the current global lock with two locks, global FS lock 
> and global BM lock.
> The global FS lock is used to make directory tree-related operations 
> thread-safe.
> The global BM lock is used to make block-related operations and DN-related 
> operations thread-safe.
>  
> For operations involving both the directory tree and blocks or DNs, both the 
> global FS lock and the global BM lock are acquired.
>  
> The lock order should be:
>  * The global FS lock
>  * The global BM lock
>  
> There are some special requirements for this ticket.
>  * End-user can choose to use global lock or fine-grained lock th

[jira] [Commented] (HDFS-17497) Logic for committed blocks is mixed when computing file size

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841797#comment-17841797
 ] 

ASF GitHub Bot commented on HDFS-17497:
---

ZanderXu commented on PR #6765:
URL: https://github.com/apache/hadoop/pull/6765#issuecomment-2081766984

   > IIRC, the client also checks the file length by requesting the DataNode 
which manages the incomplete block?
   
   Like other committed blocks, the client does not need to get the visible 
length from the DN if the last block is in the committed state.




> Logic for committed blocks is mixed when computing file size
> 
>
> Key: HDFS-17497
> URL: https://issues.apache.org/jira/browse/HDFS-17497
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> One in-writing HDFS file may contain multiple committed blocks, as follows 
> (assume one file contains three blocks):
> || ||Block 1||Block 2||Block 3||
> |Case 1|Complete|Commit|UnderConstruction|
> |Case 2|Complete|Commit|Commit|
> |Case 3|Commit|Commit|Commit|
>  
> But the logic for committed blocks is mixed when computing the file size: it 
> ignores the bytes of the last committed block yet counts the bytes of the 
> other committed blocks.
> {code:java}
> public final long computeFileSize(boolean includesLastUcBlock,
> boolean usePreferredBlockSize4LastUcBlock) {
>   if (blocks.length == 0) {
> return 0;
>   }
>   final int last = blocks.length - 1;
>   //check if the last block is BlockInfoUnderConstruction
>   BlockInfo lastBlk = blocks[last];
>   long size = lastBlk.getNumBytes();
>   // the last committed block is not complete, so its bytes may be ignored.
>   if (!lastBlk.isComplete()) {
>  if (!includesLastUcBlock) {
>size = 0;
>  } else if (usePreferredBlockSize4LastUcBlock) {
>size = isStriped()?
>getPreferredBlockSize() *
>((BlockInfoStriped)lastBlk).getDataBlockNum() :
>getPreferredBlockSize();
>  }
>   }
>   // The bytes of other committed blocks are calculated into the file length.
>   for (int i = 0; i < last; i++) {
> size += blocks[i].getNumBytes();
>   }
>   return size;
> } {code}
> The bytes of a committed block will not change any more, so the bytes of the 
> last committed block should be counted toward the file length too.
>  
> The logic for committed blocks is also mixed when computing the file length 
> in DFSInputStream. Normally DFSInputStream does not need to fetch the visible 
> length for a committed block, regardless of whether the committed block is 
> the last block or not.
>  
> -HDFS-10843- encountered one bug which was actually caused by a committed 
> block, but -HDFS-10843- fixed that bug by updating quota usage when 
> completing the block. The number of bytes of a committed block will no 
> longer change, so we should update the quota usage when the block is 
> committed, which reduces the pending quota delta sooner.
>  
> So there are some things we need to do (see the sketch after this list):
>  * Unify the calculation logic for all committed blocks in 
> {{computeFileSize}} of {{INodeFile}}
>  * Unify the calculation logic for all committed blocks in {{getFileLength}} 
> of {{DFSInputStream}}
>  * Update quota usage when committing the block
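A minimal sketch of the first item, assuming BlockInfo exposes isComplete() 
and getBlockUCState() as in current trunk: only a truly under-construction 
last block keeps the special casing, while a committed last block is counted 
like any other committed block:
{code:java}
public final long computeFileSize(boolean includesLastUcBlock,
    boolean usePreferredBlockSize4LastUcBlock) {
  if (blocks.length == 0) {
    return 0;
  }
  final int last = blocks.length - 1;
  BlockInfo lastBlk = blocks[last];
  long size = lastBlk.getNumBytes();
  // Only an under-construction last block has an unstable length; a
  // committed block's bytes will not change, so they always count.
  if (!lastBlk.isComplete()
      && lastBlk.getBlockUCState() != BlockUCState.COMMITTED) {
    if (!includesLastUcBlock) {
      size = 0;
    } else if (usePreferredBlockSize4LastUcBlock) {
      size = isStriped()
          ? getPreferredBlockSize()
              * ((BlockInfoStriped) lastBlk).getDataBlockNum()
          : getPreferredBlockSize();
    }
  }
  // The bytes of all other blocks count toward the file length.
  for (int i = 0; i < last; i++) {
    size += blocks[i].getNumBytes();
  }
  return size;
}
{code}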



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17484) Introduce redundancy.considerLoad.minLoad to avoiding excluding nodes when they are not busy actually

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841798#comment-17841798
 ] 

ASF GitHub Bot commented on HDFS-17484:
---

hadoop-yetus commented on PR #6758:
URL: https://github.com/apache/hadoop/pull/6758#issuecomment-2081769086

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 01s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  spotbugs  |   0m 00s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  codespell  |   0m 00s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 00s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m 00s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m 01s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 00s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  89m 24s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   6m 06s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   4m 51s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   6m 42s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   6m 12s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 149m 15s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 31s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   3m 31s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 00s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   2m 21s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   4m 11s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   3m 34s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 164m 34s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   5m 33s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 430m 26s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6758 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | MINGW64_NT-10.0-17763 360e5b8ecd9b 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / 0b75e7bd36f06728bcf31f08686c476d580270eb |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6758/4/testReport/
 |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6758/4/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Introduce redundancy.considerLoad.minLoad to avoiding excluding nodes when 
> they are not busy actually
> -
>
> Key: HDFS-17484
> URL: https://issues.apache.org/jira/browse/HDFS-17484
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
>
> Currently, `dfs.namenode.redundancy.considerLoad` is true by default, and 
> dfs.namenode.redundancy.considerLoad.factor is 2.0 by default.
> Consider the situation below. When we are doing a stress test, we may deploy 
> the hdfs client onto a datanode. This hdfs client will then prefer to write 
> to its local datanode and increase that machine's load. Suppose we have 3 
> datanodes whose loads are as follows: 5.0, 0.2, 0.3.
>  
> The node with load 5.0 will be excluded when choosing datanodes for a block. 
> But a load of 5.0 does not actually make a node slow on a machine with 
> 80 CPU cores.
>  
> So we had better add a new configuration entry, 
> `dfs.namenode.redundancy.considerLoad.minLoad`, to indicate the minimum factor 
> we will make considerL
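A minimal sketch of how such a floor could combine with the existing factor 
check; the names are illustrative, not the actual BlockPlacementPolicyDefault 
fields:
{code:java}
// A node is excluded as "too busy" only if its load is high relative to
// the cluster average AND above the new absolute minLoad floor, so
// lightly loaded clusters no longer exclude healthy nodes.
static boolean excludeAsBusy(double nodeLoad, double avgLoad,
    double considerLoadFactor, double minLoad) {
  return nodeLoad > considerLoadFactor * avgLoad && nodeLoad > minLoad;
}
{code}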

[jira] [Commented] (HDFS-17488) DN can fail IBRs with NPE when a volume is removed

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841801#comment-17841801
 ] 

ASF GitHub Bot commented on HDFS-17488:
---

hfutatzhanghb commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2081770902

   > @kokonguyen191 Thanks for your report and contributions. Sorry, I didn't 
get this issue completely. As described, you mentioned `storage == null` as 
below. I wonder why the NPE is not thrown first at 
`getPerStorageIBR(storage).put(rdbi);`, which is in 
org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager#addRDBI.
   > 
   > ```
   > private void notifyNamenodeBlock(ExtendedBlock block, BlockStatus status,
   > String delHint, String storageUuid, boolean isOnTransientStorage) {
   >   checkBlock(block);
   >   final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
   >   block.getLocalBlock(), status, delHint);
   >   final DatanodeStorage storage = 
dn.getFSDataset().getStorage(storageUuid);
   >   
   >   // storage == null here because it's already removed earlier.
   > 
   >   for (BPServiceActor actor : bpServices) {
   > actor.getIbrManager().notifyNamenodeBlock(info, storage,
   > isOnTransientStorage);
   >   }
   > } 
   > ```
   > 
   > On another note, you mentioned that this issue is triggered by a volume 
removal, so what is the logic from the volume removal to the NPE? Thanks again.
   
   @Hexiaoqiao Hi, sir. Please take a look at the description of 
https://github.com/apache/hadoop/pull/6730 . I found the same problem and 
explained it in that PR's description.




> DN can fail IBRs with NPE when a volume is removed
> --
>
> Key: HDFS-17488
> URL: https://issues.apache.org/jira/browse/HDFS-17488
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
>  
> Error logs
> {code:java}
> 2024-04-22 15:46:33,422 [BP-1842952724-10.22.68.249-1713771988830 
> heartbeating to localhost/127.0.0.1:64977] ERROR datanode.DataNode 
> (BPServiceActor.java:run(922)) - Exception in BPOfferService for Block pool 
> BP-1842952724-10.22.68.249-1713771988830 (Datanode Uuid 
> 1659ffaf-1a80-4a8e-a542-643f6bd97ed4) service to localhost/127.0.0.1:64977
> java.lang.NullPointerException
>     at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolClientSideTranslatorPB.java:246)
>     at 
> org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager.sendIBRs(IncrementalBlockReportManager.java:218)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:749)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:920)
>     at java.lang.Thread.run(Thread.java:748) {code}
> The root cause is in BPOfferService#notifyNamenodeBlock; it happens when the 
> method is called on a block belonging to a volume that was already removed. 
> Because the volume was already removed:
>  
> {code:java}
> private void notifyNamenodeBlock(ExtendedBlock block, BlockStatus status,
> String delHint, String storageUuid, boolean isOnTransientStorage) {
>   checkBlock(block);
>   final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
>   block.getLocalBlock(), status, delHint);
>   final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
>   
>   // storage == null here because it's already removed earlier.
>   for (BPServiceActor actor : bpServices) {
> actor.getIbrManager().notifyNamenodeBlock(info, storage,
> isOnTransientStorage);
>   }
> } {code}
> so IBRs with a null storage are now pending.
> The reason notifyNamenodeBlock can be triggered for such blocks is up in 
> DirectoryScanner#reconcile:
> {code:java}
>   public void reconcile() throws IOException {
>     LOG.debug("reconcile start DirectoryScanning");
>     scan();
> // If a volume is removed here after scan() already finished running,
> // diffs is stale and checkAndUpdate will run on a removed volume
>     // HDFS-14476: run checkAndUpdate with batch to avoid holding the lock too
>     // long
>     int loopCount = 0;
>     synchronized (diffs) {
>       for (final Map.Entry<String, ScanInfo> entry : diffs.getEntries()) {
>         dataset.checkAndUpdate(entry.getKey(), entry.getValue());        
>     ...
>   } {code}
> Inside checkAndUpdate, memBlockInfo is null because all the block meta in 
> memory is removed during the volume removal, but diskFile still exists. Then 
> DataNode#notifyNamenodeDeletedBlock (and further down the line, 
> notifyNamenodeBlock) is called on this block.
>  
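A minimal sketch of a defensive guard in notifyNamenodeBlock, assuming the 
goal is to drop the IBR rather than queue one with a null storage; the log 
message is illustrative:
{code:java}
final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
if (storage == null) {
  // The volume backing this replica was removed after the scan; an IBR
  // carrying a null storage would NPE later in sendIBRs, so skip it.
  LOG.warn("Skipping IBR for block {}: storage {} no longer exists",
      block, storageUuid);
  return;
}
for (BPServiceActor actor : bpServices) {
  actor.getIbrManager().notifyNamenodeBlock(info, storage,
      isOnTransientStorage);
}
{code}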



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

--

[jira] [Commented] (HDFS-17488) DN can fail IBRs with NPE when a volume is removed

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841794#comment-17841794
 ] 

ASF GitHub Bot commented on HDFS-17488:
---

hadoop-yetus commented on PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#issuecomment-2081754060

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 02s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  spotbugs  |   0m 01s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  codespell  |   0m 01s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 01s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 01s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  | 111m 47s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   7m 33s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   6m 01s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   8m 01s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   7m 10s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 179m 43s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   5m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 16s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   4m 16s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 01s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   2m 48s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   5m 19s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   4m 32s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 198m 13s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   6m 53s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 523m 16s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6759 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | MINGW64_NT-10.0-17763 96a75b329453 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / 1f166985eedcfea6aedcc62046908f86ed9827fa |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/5/testReport/
 |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6759/5/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> DN can fail IBRs with NPE when a volume is removed
> --
>
> Key: HDFS-17488
> URL: https://issues.apache.org/jira/browse/HDFS-17488
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
>  
> Error logs
> {code:java}
> 2024-04-22 15:46:33,422 [BP-1842952724-10.22.68.249-1713771988830 
> heartbeating to localhost/127.0.0.1:64977] ERROR datanode.DataNode 
> (BPServiceActor.java:run(922)) - Exception in BPOfferService for Block pool 
> BP-1842952724-10.22.68.249-1713771988830 (Datanode Uuid 
> 1659ffaf-1a80-4a8e-a542-643f6bd97ed4) service to localhost/127.0.0.1:64977
> java.lang.NullPointerException
>     at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolClientSideTranslatorPB.java:246)
>     at 
> org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager.sendIBRs(IncrementalBlockReportManager.java:218)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:749)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:920)
>     at java.lang.Thread.run(Thread.java:748) {code}
> The root cause is in BPOfferService#notifyNamenodeBlock; it happens 

[jira] [Commented] (HDFS-17488) DN can fail IBRs with NPE when a volume is removed

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841809#comment-17841809
 ] 

ASF GitHub Bot commented on HDFS-17488:
---

kokonguyen191 commented on code in PR #6759:
URL: https://github.com/apache/hadoop/pull/6759#discussion_r1582488078


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java:
##
@@ -324,6 +325,10 @@ private void notifyNamenodeBlock(ExtendedBlock block, 
BlockStatus status,
 final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
 block.getLocalBlock(), status, delHint);
 final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
+if (storage == null) {

Review Comment:
   Done





> DN can fail IBRs with NPE when a volume is removed
> --
>
> Key: HDFS-17488
> URL: https://issues.apache.org/jira/browse/HDFS-17488
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Felix N
>Assignee: Felix N
>Priority: Major
>  Labels: pull-request-available
>
>  
> Error logs
> {code:java}
> 2024-04-22 15:46:33,422 [BP-1842952724-10.22.68.249-1713771988830 
> heartbeating to localhost/127.0.0.1:64977] ERROR datanode.DataNode 
> (BPServiceActor.java:run(922)) - Exception in BPOfferService for Block pool 
> BP-1842952724-10.22.68.249-1713771988830 (Datanode Uuid 
> 1659ffaf-1a80-4a8e-a542-643f6bd97ed4) service to localhost/127.0.0.1:64977
> java.lang.NullPointerException
>     at 
> org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolClientSideTranslatorPB.java:246)
>     at 
> org.apache.hadoop.hdfs.server.datanode.IncrementalBlockReportManager.sendIBRs(IncrementalBlockReportManager.java:218)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:749)
>     at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:920)
>     at java.lang.Thread.run(Thread.java:748) {code}
> The root cause is in BPOfferService#notifyNamenodeBlock; it happens when the 
> method is called on a block belonging to a volume that was already removed. 
> Because the volume was already removed:
>  
> {code:java}
> private void notifyNamenodeBlock(ExtendedBlock block, BlockStatus status,
> String delHint, String storageUuid, boolean isOnTransientStorage) {
>   checkBlock(block);
>   final ReceivedDeletedBlockInfo info = new ReceivedDeletedBlockInfo(
>   block.getLocalBlock(), status, delHint);
>   final DatanodeStorage storage = dn.getFSDataset().getStorage(storageUuid);
>   
>   // storage == null here because it's already removed earlier.
>   for (BPServiceActor actor : bpServices) {
> actor.getIbrManager().notifyNamenodeBlock(info, storage,
> isOnTransientStorage);
>   }
> } {code}
> so IBRs with a null storage are now pending.
> The reason notifyNamenodeBlock can be triggered for such blocks is up in 
> DirectoryScanner#reconcile:
> {code:java}
>   public void reconcile() throws IOException {
>     LOG.debug("reconcile start DirectoryScanning");
>     scan();
> // If a volume is removed here after scan() already finished running,
> // diffs is stale and checkAndUpdate will run on a removed volume
>     // HDFS-14476: run checkAndUpdate with batch to avoid holding the lock too
>     // long
>     int loopCount = 0;
>     synchronized (diffs) {
>       for (final Map.Entry<String, ScanInfo> entry : diffs.getEntries()) {
>         dataset.checkAndUpdate(entry.getKey(), entry.getValue());        
>     ...
>   } {code}
> Inside checkAndUpdate, memBlockInfo is null because all the block meta in 
> memory is removed during the volume removal, but diskFile still exists. Then 
> DataNode#notifyNamenodeDeletedBlock (and further down the line, 
> notifyNamenodeBlock) is called on this block.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16993) Datanode supports configure TopN DatanodeNetworkCounts

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841812#comment-17841812
 ] 

ASF GitHub Bot commented on HDFS-16993:
---

huangzhaobo99 commented on code in PR #5597:
URL: https://github.com/apache/hadoop/pull/5597#discussion_r1582490565


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java:
##
@@ -2630,6 +2631,28 @@ public int getActiveTransferThreadCount() {
 
   @Override // DataNodeMXBean
   public Map<String, Map<String, Long>> getDatanodeNetworkCounts() {
+int maxDisplay = 
getConf().getInt(DFSConfigKeys.DFS_DATANODE_NETWORKERRORS_DISPLAY_TOPCOUNT,
+DFSConfigKeys.DFS_DATANODE_NETWORKERRORS_DISPLAY_TOPCOUNT_DEFAULT);
+if (maxDisplay >= 0) {

Review Comment:
   Can we first determine the size of the map? If it is less than N, we can 
return it directly.
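   A minimal sketch of the suggested short-circuit plus top-N selection, 
assuming the MXBean's host -> (counter -> value) map shape; the method and 
counter names are illustrative:
   ```java
   import java.util.Comparator;
   import java.util.LinkedHashMap;
   import java.util.Map;

   static Map<String, Map<String, Long>> topN(
       Map<String, Map<String, Long>> counts, int maxDisplay) {
     if (maxDisplay < 0 || counts.size() <= maxDisplay) {
       return counts; // fewer hosts than N: return directly, skip sorting
     }
     return counts.entrySet().stream()
         .sorted(Comparator.comparingLong(
             (Map.Entry<String, Map<String, Long>> e) ->
                 e.getValue().getOrDefault("networkErrors", 0L)).reversed())
         .limit(maxDisplay)
         .collect(LinkedHashMap::new,
             (m, e) -> m.put(e.getKey(), e.getValue()), Map::putAll);
   }
   ```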





> Datanode supports configure TopN DatanodeNetworkCounts
> --
>
> Key: HDFS-16993
> URL: https://issues.apache.org/jira/browse/HDFS-16993
> Project: Hadoop HDFS
>  Issue Type: Wish
>Affects Versions: 3.3.5
>Reporter: farmmamba
>Priority: Major
>  Labels: pull-request-available
>
> In our prod environment, we collect datanode metrics every 15s through 
> jmx_exporter. We found that the datanodenetworkerror metric generates a lot 
> of entries.
> For example, if we have a cluster with 1000 datanodes, every datanode may 
> generate 999 datanodenetworkerror metrics, so overall the datanodes will 
> generate 1000 * 999 = 999000 metrics. This is a very expensive 
> operation. In most scenarios, we only need the top N of them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17503) Unreleased volume references because of OOM

2024-04-28 Thread Zilong Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841815#comment-17841815
 ] 

Zilong Zhu commented on HDFS-17503:
---

[~Keepromise] It appears to occur when creating the BlockSender object. This is 
an intermittent issue that occurs in our production environment. If I manually 
throw an OOM error while creating the BlockSender object, it can cause volume 
references not to be released.

> Unreleased volume references because of OOM
> ---
>
> Key: HDFS-17503
> URL: https://issues.apache.org/jira/browse/HDFS-17503
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>
> When BlockSender throws an Error because of OOM, the volume reference 
> obtained by the thread is not released, which causes the thread trying to 
> remove the volume to wait and fall into an infinite loop.
> I found that HDFS-15963 catches the exception and releases the volume 
> reference, but it does not handle the case of Errors being thrown. I think 
> "catch (Throwable t)" should be used instead of "catch (IOException ioe)".
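A minimal sketch of the widened handler, following the HDFS-15963 pattern 
where a volume reference is held while the BlockSender is constructed; 
createBlockSender stands in for the actual constructor call, and the 
enclosing method is assumed to declare "throws IOException":
{code:java}
FsVolumeReference ref = volume.obtainReference();
BlockSender sender = null;
try {
  sender = createBlockSender(block); // may throw OutOfMemoryError
} catch (Throwable t) {              // was: catch (IOException ioe)
  IOUtils.cleanupWithLogger(LOG, sender);
  // Release the volume reference even for Errors such as OOM so that a
  // later removeVolumes() call does not wait forever on the reference.
  IOUtils.cleanupWithLogger(LOG, ref);
  throw t;
}
{code}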



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17367) Add PercentUsed for Different StorageTypes in JMX

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841822#comment-17841822
 ] 

ASF GitHub Bot commented on HDFS-17367:
---

zhtttylz commented on PR #6735:
URL: https://github.com/apache/hadoop/pull/6735#issuecomment-2081815007

   Thanks to @slfan1989 and @haiyang1987 for reviewing and merging this.




> Add PercentUsed for Different StorageTypes in JMX
> -
>
> Key: HDFS-17367
> URL: https://issues.apache.org/jira/browse/HDFS-17367
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: metrics, namenode
>Affects Versions: 3.5.0
>Reporter: Hualong Zhang
>Assignee: Hualong Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
>
> Currently, the NameNode only displays PercentUsed for the entire cluster. We 
> plan to add corresponding PercentUsed metrics for different StorageTypes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17456) Fix the dfsused statistics of datanode are incorrect when appending a file.

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841828#comment-17841828
 ] 

ASF GitHub Bot commented on HDFS-17456:
---

Hexiaoqiao commented on PR #6713:
URL: https://github.com/apache/hadoop/pull/6713#issuecomment-2081850789

   @fuchaohong Hi, IIUC the `dfsused` will be refreshed periodically. Have you 
met any issues with this metric? Thanks.




> Fix the dfsused statistics of datanode are incorrect when appending a file.
> ---
>
> Key: HDFS-17456
> URL: https://issues.apache.org/jira/browse/HDFS-17456
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.3.3
>Reporter: fuchaohong
>Priority: Major
>  Labels: pull-request-available
>
> In our production env, the namenode page showed that the datanode space had 
> been used up, but the actual datanode machine still had a lot of free space. 
> After troubleshooting, we found that the dfsused statistics of the datanode 
> are incorrect when appending to a file. The following is the dfsused after 
> each append of 100 bytes.
> |*Actual*|*Expected*|
> |0|0|
> |100|100|
> |300|200|
> |600|300|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17458) Remove unnecessary BP lock in ReplicaMap

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841830#comment-17841830
 ] 

ASF GitHub Bot commented on HDFS-17458:
---

Hexiaoqiao closed pull request #6717: HDFS-17458. Remove unnecessary BP lock in 
ReplicaMap.
URL: https://github.com/apache/hadoop/pull/6717




> Remove unnecessary BP lock in ReplicaMap
> 
>
> Key: HDFS-17458
> URL: https://issues.apache.org/jira/browse/HDFS-17458
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>
> In HDFS-16429 we made LightWeightResizableGSet thread safe, and in 
> HDFS-16511 we changed some methods in ReplicaMap to acquire the read lock 
> instead of the write lock.
> This PR tries to further remove the unnecessary Block_Pool read lock.
> Recently, I performed stress tests on datanodes to measure their read/write 
> operations per second.
> Before removing some locks, a datanode could only achieve ~2K write ops; 
> after optimizing, it can achieve more than 5K write ops.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17458) Remove unnecessary BP lock in ReplicaMap

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841829#comment-17841829
 ] 

ASF GitHub Bot commented on HDFS-17458:
---

Hexiaoqiao commented on PR #6717:
URL: https://github.com/apache/hadoop/pull/6717#issuecomment-2081852396

   @hfutatzhanghb As discussed above (and also confirmed by @zhangshuyan0), it 
is unsafe to remove the BP lock here, so I am closing this PR. Please feel 
free to reopen it if there is a more graceful improvement. Thanks again.




> Remove unnecessary BP lock in ReplicaMap
> 
>
> Key: HDFS-17458
> URL: https://issues.apache.org/jira/browse/HDFS-17458
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>
> In HDFS-16429 we made LightWeightResizableGSet thread safe, and in 
> HDFS-16511 we changed some methods in ReplicaMap to acquire the read lock 
> instead of the write lock.
> This PR tries to further remove the unnecessary Block_Pool read lock.
> Recently, I performed stress tests on datanodes to measure their read/write 
> operations per second.
> Before removing some locks, a datanode could only achieve ~2K write ops; 
> after optimizing, it can achieve more than 5K write ops.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17458) Remove unnecessary BP lock in ReplicaMap

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841831#comment-17841831
 ] 

ASF GitHub Bot commented on HDFS-17458:
---

hfutatzhanghb commented on PR #6717:
URL: https://github.com/apache/hadoop/pull/6717#issuecomment-2081853953

   > @hfutatzhanghb As discussed above (and also confirmed by @zhangshuyan0), 
it is unsafe to remove the BP lock here, so I am closing this PR. Please feel 
free to reopen it if there is a more graceful improvement. Thanks again.
   
   @Hexiaoqiao OK, sir. I have pushed another PR to optimize this. Please 
check https://github.com/apache/hadoop/pull/6764




> Remove unnecessary BP lock in ReplicaMap
> 
>
> Key: HDFS-17458
> URL: https://issues.apache.org/jira/browse/HDFS-17458
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>
> In HDFS-16429 we made LightWeightResizableGSet thread safe, and in 
> HDFS-16511 we changed some methods in ReplicaMap to acquire the read lock 
> instead of the write lock.
> This PR tries to further remove the unnecessary Block_Pool read lock.
> Recently, I performed stress tests on datanodes to measure their read/write 
> operations per second.
> Before removing some locks, a datanode could only achieve ~2K write ops; 
> after optimizing, it can achieve more than 5K write ops.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17462) RBF: NPE in Router concat when trg is an empty file.

2024-04-28 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-17462:
---
Summary: RBF: NPE in Router concat when trg is an empty file.  (was: NPE in 
Router concat when trg is an empty file.)

> RBF: NPE in Router concat when trg is an empty file.
> 
>
> Key: HDFS-17462
> URL: https://issues.apache.org/jira/browse/HDFS-17462
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.2, 3.3.6
>Reporter: NaihaoFan
>Priority: Minor
>  Labels: pull-request-available
>
> When the trg of a Router concat is an empty file, it will trigger an NPE in 
> the Router and the concat will fail.
> This is because when trg is an empty file, the NameNode returns a null 
> lastLocatedBlock in the response of getBlockLocations. The Router does not 
> null-check the returned lastLocatedBlock; instead it uses it directly to get 
> the block pool id.
> An empty trg file should be allowed in the Router concat, since this case is 
> supported by the NameNode's concat.
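A minimal sketch of the missing guard in the Router, with hypothetical helper 
names (resolveByBlockPool, resolveByMountTable) standing in for the Router's 
actual routing logic:
{code:java}
LocatedBlock lastBlock = blockLocations.getLastLocatedBlock();
if (lastBlock == null) {
  // trg is empty: the NameNode returned no last block, so fall back to
  // mount-table resolution instead of dereferencing null.
  return resolveByMountTable(trg);
}
return resolveByBlockPool(lastBlock.getBlock().getBlockPoolId());
{code}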



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-17462) RBF: NPE in Router concat when trg is an empty file.

2024-04-28 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He reassigned HDFS-17462:
--

Assignee: NaihaoFan

> RBF: NPE in Router concat when trg is an empty file.
> 
>
> Key: HDFS-17462
> URL: https://issues.apache.org/jira/browse/HDFS-17462
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.2, 3.3.6
>Reporter: NaihaoFan
>Assignee: NaihaoFan
>Priority: Minor
>  Labels: pull-request-available
>
> When the trg of a Router concat is an empty file, it will trigger an NPE in 
> the Router and the concat will fail.
> This is because when trg is an empty file, the NameNode returns a null 
> lastLocatedBlock in the response of getBlockLocations. The Router does not 
> null-check the returned lastLocatedBlock; instead it uses it directly to get 
> the block pool id.
> An empty trg file should be allowed in the Router concat, since this case is 
> supported by the NameNode's concat.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17476) fix: False positive "Observer Node is too far behind" due to long overflow.

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841851#comment-17841851
 ] 

ASF GitHub Bot commented on HDFS-17476:
---

hadoop-yetus commented on PR #6747:
URL: https://github.com/apache/hadoop/pull/6747#issuecomment-2081933949

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 00s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  spotbugs  |   0m 01s |  |  spotbugs executables are not 
available.  |
   | +0 :ok: |  codespell  |   0m 01s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 01s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m 00s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  85m 56s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   6m 03s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   4m 37s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   6m 15s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   5m 43s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 141m 45s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 20s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   3m 20s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 01s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   2m 16s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   3m 52s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   3m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 152m 10s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   5m 08s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 406m 30s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6747 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | MINGW64_NT-10.0-17763 5b0a86518b3f 3.4.10-87d57229.x86_64 
2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / 4208a984b11713148b6b6eba9f898d3d16c41fdd |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6747/2/testReport/
 |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6747/2/console
 |
   | versions | git=2.44.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> fix: False positive "Observer Node is too far behind" due to long overflow.
> ---
>
> Key: HDFS-17476
> URL: https://issues.apache.org/jira/browse/HDFS-17476
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HDFS-17476.patch, image-2024-04-18-10-57-10-481.png
>
>
> In the code GlobalStateIdContext#receiveRequestState(), if clientStateId is a 
> small negative number, then due to long overflow clientStateId - serverStateId 
> may be greater than
> (ESTIMATED_TRANSACTIONS_PER_SECOND
>                   * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime)
>                   * ESTIMATED_SERVER_TIME_MULTIPLIER),
> resulting in false positives that the Observer Node is too far behind.
> !image-2024-04-18-10-57-10-481.png|width=742,height=110!
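A minimal sketch of an overflow-safe version of the check, assuming the 
constant names from the description; the key point is to verify the client is 
actually ahead of the server before trusting the raw subtraction:
{code:java}
long threshold = ESTIMATED_TRANSACTIONS_PER_SECOND
    * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime)
    * ESTIMATED_SERVER_TIME_MULTIPLIER;
// clientStateId - serverStateId overflows when clientStateId is a small
// negative long; only treat the gap as meaningful when the client is
// genuinely ahead, which also makes the subtraction safe here.
boolean tooFarBehind = clientStateId > serverStateId
    && clientStateId - serverStateId > threshold;
{code}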



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17456) Fix the dfsused statistics of datanode are incorrect when appending a file.

2024-04-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841864#comment-17841864
 ] 

ASF GitHub Bot commented on HDFS-17456:
---

fuchaohong commented on PR #6713:
URL: https://github.com/apache/hadoop/pull/6713#issuecomment-2081977865

   > @fuchaohong Hi, IIUC the `dfsused` will be refreshed periodically. Have 
you met any issues with this metric? Thanks.
   
   @Hexiaoqiao Indeed, the dfsused metric is corrected periodically, but the 
cycle is very long. Currently, when there are many append operations, dfsused 
may grow abnormally and not reflect the actual usage accurately. Thanks.
   




> Fix the dfsused statistics of datanode are incorrect when appending a file.
> ---
>
> Key: HDFS-17456
> URL: https://issues.apache.org/jira/browse/HDFS-17456
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.3.3
>Reporter: fuchaohong
>Priority: Major
>  Labels: pull-request-available
>
> In our production env, the namenode page showed that the datanode space had 
> been used up, but the actual datanode machine still had a lot of free space. 
> After troubleshooting, we found that the dfsused statistics of the datanode 
> are incorrect when appending to a file. The following is the dfsused after 
> each append of 100 bytes.
> |*Actual*|*Expected*|
> |0|0|
> |100|100|
> |300|200|
> |600|300|



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org