[jira] [Commented] (HDFS-17514) RBF: Routers keep using cached stateID even when active NN returns unset header
[ https://issues.apache.org/jira/browse/HDFS-17514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845215#comment-17845215 ] ASF GitHub Bot commented on HDFS-17514: --- hadoop-yetus commented on PR #6804: URL: https://github.com/apache/hadoop/pull/6804#issuecomment-2103966017 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 00s | | No case conflicting files found. | | +0 :ok: | spotbugs | 0m 01s | | spotbugs executables are not available. | | +0 :ok: | codespell | 0m 01s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 01s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 00s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 00s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 85m 42s | | trunk passed | | +1 :green_heart: | compile | 4m 46s | | trunk passed | | +1 :green_heart: | checkstyle | 4m 23s | | trunk passed | | +1 :green_heart: | mvnsite | 4m 51s | | trunk passed | | +1 :green_heart: | javadoc | 4m 29s | | trunk passed | | +1 :green_heart: | shadedclient | 141m 10s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 2m 47s | | the patch passed | | +1 :green_heart: | compile | 2m 17s | | the patch passed | | +1 :green_heart: | javac | 2m 17s | | the patch passed | | +1 :green_heart: | blanks | 0m 00s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 59s | | the patch passed | | +1 :green_heart: | mvnsite | 2m 24s | | the patch passed | | +1 :green_heart: | javadoc | 2m 07s | | the patch passed | | +1 :green_heart: | shadedclient | 148m 54s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | asflicense | 5m 16s | | The patch does not generate ASF License warnings. | | | | 399m 15s | | | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/6804 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | MINGW64_NT-10.0-17763 505bedc9962d 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys | | Build tool | maven | | Personality | /c/hadoop/dev-support/bin/hadoop.sh | | git revision | trunk / c2452922760c6ef50b574a5f4a7ec523445da702 | | Default Java | Azul Systems, Inc.-1.8.0_332-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6804/1/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6804/1/console | | versions | git=2.44.0.windows.1 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. 
> RBF: Routers keep using cached stateID even when active NN returns unset > header > --- > > Key: HDFS-17514 > URL: https://issues.apache.org/jira/browse/HDFS-17514 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Simbarashe Dzinamarira >Assignee: Simbarashe Dzinamarira >Priority: Minor > Labels: pull-request-available > > When a namenode that had "dfs.namenode.state.context.enabled" set to true is > restarted with the configuration set to false, routers will keep using a > previously cached state ID. > Without RBF > * clients that fetched the old stateID could have stale reads even after > msyncing > * new clients will go to the active. > With RBF > * client that fetched the old stateID could have stale reads like above. > * New clients will also fetch the stale stateID and potentially have stale > reads > New clients that are created after the restart should not fetch the stale > state ID. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
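One way the desired behavior could be modeled is shown in the small, self-contained sketch below (hypothetical names, not the actual router code): the cached state ID is only kept while namenode responses actually carry one, and an unset header clears it so that new clients do not inherit a stale value.

{code:java}
// Hypothetical sketch of a per-namespace state ID cache that drops its value
// when the NN response header no longer carries a state ID.
import java.util.OptionalLong;
import java.util.concurrent.atomic.AtomicLong;

class CachedStateIdSketch {
  private static final long UNSET = Long.MIN_VALUE;
  private final AtomicLong stateId = new AtomicLong(UNSET);

  /** responseStateId is null when the NN response header had no state ID set. */
  void receiveResponseState(Long responseStateId) {
    if (responseStateId == null) {
      stateId.set(UNSET);                                   // NN stopped tracking state; forget the cached value
    } else {
      stateId.accumulateAndGet(responseStateId, Math::max); // never move backwards
    }
  }

  /** What a new client would be given; empty once the NN stops sending a state ID. */
  OptionalLong currentStateId() {
    long v = stateId.get();
    return v == UNSET ? OptionalLong.empty() : OptionalLong.of(v);
  }
}
{code}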
[jira] [Commented] (HDFS-17511) method storagespaceConsumedContiguous should use BlockInfo#getReplication to compute dsDelta
[ https://issues.apache.org/jira/browse/HDFS-17511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845210#comment-17845210 ] ASF GitHub Bot commented on HDFS-17511: --- skyskyhu commented on PR #6799: URL: https://github.com/apache/hadoop/pull/6799#issuecomment-2103926821 Hi @ChenSammi @jojochuang @ayushtkn , could you please help review this PR when you have free time~ Thanks a lot. > method storagespaceConsumedContiguous should use BlockInfo#getReplication to > compute dsDelta > > > Key: HDFS-17511 > URL: https://issues.apache.org/jira/browse/HDFS-17511 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: farmmamba >Assignee: farmmamba >Priority: Minor > Labels: pull-request-available > > As title says, we should use BlockInfo#getReplication to compute storage > space in method INodeFile#storagespaceConsumedContiguous. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
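A simplified, self-contained illustration of the proposed change (stand-in types, not the real INodeFile code): charge each block at its own replication, as BlockInfo#getReplication would report it, rather than multiplying every block by the file-level replication factor.

{code:java}
// Simplified sketch: compute consumed space from per-block replication.
class SketchBlock {
  final long numBytes;
  final short replication;   // stand-in for BlockInfo#getReplication()

  SketchBlock(long numBytes, short replication) {
    this.numBytes = numBytes;
    this.replication = replication;
  }
}

class StoragespaceSketch {
  // Analogue of INodeFile#storagespaceConsumedContiguous with the proposed change.
  static long storagespaceConsumed(SketchBlock[] blocks) {
    long space = 0;
    for (SketchBlock b : blocks) {
      space += b.numBytes * b.replication;   // each block's own dsDelta contribution
    }
    return space;
  }
}
{code}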
[jira] [Commented] (HDFS-17506) [FGL] Performance for phase 1
[ https://issues.apache.org/jira/browse/HDFS-17506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845185#comment-17845185 ] ASF GitHub Bot commented on HDFS-17506: --- ferhui commented on code in PR #6806: URL: https://github.com/apache/hadoop/pull/6806#discussion_r1596215333 ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/fgl/FSNLockBenchmarkThroughput.java: ## @@ -0,0 +1,269 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hdfs.server.namenode.fgl; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataOutputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.permission.FsPermission; +import org.apache.hadoop.hdfs.HdfsConfiguration; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.List; +import java.util.concurrent.Callable; +import java.util.concurrent.ExecutorService; +import java.util.concurrent.Executors; +import java.util.concurrent.Future; +import java.util.concurrent.ThreadLocalRandom; + +/** + * This class benchmarks the throughput of NN for both global-lock and fine-grained lock. + */ +public class FSNLockBenchmarkThroughput { + + private final int readWriteRatio; + private final int testingCount; + private final ExecutorService executorService; + private final FileSystem fileSystem; + + public FSNLockBenchmarkThroughput(FileSystem fileSystem, + int readWriteRatio, int testingCount, int concurrency) { +this.fileSystem = fileSystem; +this.readWriteRatio = readWriteRatio; +this.testingCount = testingCount; +this.executorService = Executors.newFixedThreadPool(concurrency); + } + + public void benchmark(String lockName) throws Exception { +System.out.println("Do benchmark for " + lockName); +Path basePath = new Path("/tmp/fsnlock/benchmark/throughput"); Review Comment: How about making the path as an input? if no input, can set a default path. > [FGL] Performance for phase 1 > - > > Key: HDFS-17506 > URL: https://issues.apache.org/jira/browse/HDFS-17506 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Priority: Major > Labels: pull-request-available > > Do some benchmark testing for phase 1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
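The review suggestion above could look roughly like the following sketch (a hypothetical helper, not part of the PR): accept the base path as an input and fall back to the current hard-coded default when none is supplied.

{code:java}
// Hypothetical sketch of making the benchmark base path configurable.
import org.apache.hadoop.fs.Path;

final class BenchmarkPathSketch {
  static final String DEFAULT_BASE_PATH = "/tmp/fsnlock/benchmark/throughput";

  static Path resolveBasePath(String userSupplied) {
    String base = (userSupplied == null || userSupplied.isEmpty())
        ? DEFAULT_BASE_PATH
        : userSupplied;
    return new Path(base);
  }
}
{code}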
[jira] [Commented] (HDFS-17509) RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file.
[ https://issues.apache.org/jira/browse/HDFS-17509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845184#comment-17845184 ] ASF GitHub Bot commented on HDFS-17509: --- ZanderXu commented on code in PR #6784: URL: https://github.com/apache/hadoop/pull/6784#discussion_r1596217290 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java: ## @@ -667,39 +667,28 @@ public void rename2(final String src, final String dst, public void concat(String trg, String[] src) throws IOException { rpcServer.checkOperation(NameNode.OperationCategory.WRITE); -// See if the src and target files are all in the same namespace -LocatedBlocks targetBlocks = getBlockLocations(trg, 0, 1); -if (targetBlocks == null) { - throw new IOException("Cannot locate blocks for target file - " + trg); -} -LocatedBlock lastLocatedBlock = targetBlocks.getLastLocatedBlock(); -String targetBlockPoolId = lastLocatedBlock.getBlock().getBlockPoolId(); -for (String source : src) { - LocatedBlocks sourceBlocks = getBlockLocations(source, 0, 1); - if (sourceBlocks == null) { -throw new IOException( -"Cannot located blocks for source file " + source); - } - String sourceBlockPoolId = - sourceBlocks.getLastLocatedBlock().getBlock().getBlockPoolId(); - if (!sourceBlockPoolId.equals(targetBlockPoolId)) { -throw new IOException("Cannot concatenate source file " + source -+ " because it is located in a different namespace" -+ " with block pool id " + sourceBlockPoolId -+ " from the target file with block pool id " -+ targetBlockPoolId); - } -} +// Concat only effects when all files in same namespace. +// And in router view, a file only exists in one RemoteLocation. Review Comment: we don't need to get NSId or BPId from the result of getFileInfo, you can refer to `invokeSequential` to loop all namespaces one by one. 1. Get all namespaces of the input path. 2. Send getFileInfo to each namespace one by one 3. The first namespace that the result of getFileInfo is not null is the one we need > RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file. > -- > > Key: HDFS-17509 > URL: https://issues.apache.org/jira/browse/HDFS-17509 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Priority: Minor > Labels: pull-request-available > > hdfs dfs -concat /tmp/merge /tmp/t1 /tmp/t2 > When /tmp/merge is a empty file, this command will throw NPE via DFSRouter. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
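Stripped of router internals, the approach described in the comment can be sketched as follows (all names here are hypothetical, not the actual RouterClientProtocol code): probe the candidate namespaces in order and take the first one whose getFileInfo result is non-null.

{code:java}
// Self-contained sketch of "first namespace where getFileInfo returns non-null".
import java.io.IOException;
import java.util.List;
import java.util.function.Function;

final class FirstNamespaceSketch {
  static <T> String resolve(List<String> namespaces,
      Function<String, T> getFileInfoInNamespace) throws IOException {
    for (String ns : namespaces) {
      if (getFileInfoInNamespace.apply(ns) != null) {
        return ns;   // the namespace that actually holds the file
      }
    }
    throw new IOException("File does not exist in any candidate namespace");
  }
}
{code}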
[jira] [Commented] (HDFS-17506) [FGL] Performance for phase 1
[ https://issues.apache.org/jira/browse/HDFS-17506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845182#comment-17845182 ] ASF GitHub Bot commented on HDFS-17506: --- ferhui commented on PR #6806: URL: https://github.com/apache/hadoop/pull/6806#issuecomment-2103801948 > > Thanks. What is the purpose of this PR? Provide a benchmark tool and each one can test the HDFS cluster performance? or just test the namenode performance locally? > > `FSNLockBenchmarkThroughput` is a tool, each one can do the performance tests through this tool after deploying FGL. `TestFSNLockBenchmarkThroughput` is a UT, it mocks a `MiniQJMHACluster` and do some performance tests locally. Got it, thanks. > [FGL] Performance for phase 1 > - > > Key: HDFS-17506 > URL: https://issues.apache.org/jira/browse/HDFS-17506 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Priority: Major > Labels: pull-request-available > > Do some benchmark testing for phase 1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17503) Unreleased volume references because of OOM
[ https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845179#comment-17845179 ] ASF GitHub Bot commented on HDFS-17503: --- zhuzilong2013 commented on PR #6782: URL: https://github.com/apache/hadoop/pull/6782#issuecomment-2103787150 Thanks @ZanderXu for your review and merge~ > Unreleased volume references because of OOM > --- > > Key: HDFS-17503 > URL: https://issues.apache.org/jira/browse/HDFS-17503 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zilong Zhu >Assignee: Zilong Zhu >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > When BlockSender throws an error because of OOM, the volume reference obtained > by the thread is not released, which causes the thread trying to remove the > volume to wait and fall into an infinite loop. > I found HDFS-15963 caught the exception and released the volume reference. But it did > not handle the case of throwing errors. I think "catch (Throwable t)" should > be used instead of "catch (IOException ioe)". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
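An illustrative, self-contained sketch of the point made in the issue (not the actual BlockSender code): the volume reference must be released even when an Error such as OutOfMemoryError is thrown, which catching only IOException does not guarantee.

{code:java}
// Sketch: release the volume reference on any Throwable, not just IOException.
import java.io.Closeable;
import java.io.IOException;

final class VolumeRefSketch {
  static void sendBlock(Closeable volumeReference, Runnable doSend) throws IOException {
    try {
      doSend.run();                       // may throw OutOfMemoryError, not only IOException
    } catch (Throwable t) {               // catching only IOException would let Errors escape
      throw new IOException("Failed to send block", t);
    } finally {
      volumeReference.close();            // reference released on every path
    }
  }
}
{code}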
[jira] [Commented] (HDFS-17509) RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file.
[ https://issues.apache.org/jira/browse/HDFS-17509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845174#comment-17845174 ] ASF GitHub Bot commented on HDFS-17509: --- LiuGuH commented on code in PR #6784: URL: https://github.com/apache/hadoop/pull/6784#discussion_r1596189673 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java: ## @@ -667,39 +667,28 @@ public void rename2(final String src, final String dst, public void concat(String trg, String[] src) throws IOException { rpcServer.checkOperation(NameNode.OperationCategory.WRITE); -// See if the src and target files are all in the same namespace -LocatedBlocks targetBlocks = getBlockLocations(trg, 0, 1); -if (targetBlocks == null) { - throw new IOException("Cannot locate blocks for target file - " + trg); -} -LocatedBlock lastLocatedBlock = targetBlocks.getLastLocatedBlock(); -String targetBlockPoolId = lastLocatedBlock.getBlock().getBlockPoolId(); -for (String source : src) { - LocatedBlocks sourceBlocks = getBlockLocations(source, 0, 1); - if (sourceBlocks == null) { -throw new IOException( -"Cannot located blocks for source file " + source); - } - String sourceBlockPoolId = - sourceBlocks.getLastLocatedBlock().getBlock().getBlockPoolId(); - if (!sourceBlockPoolId.equals(targetBlockPoolId)) { -throw new IOException("Cannot concatenate source file " + source -+ " because it is located in a different namespace" -+ " with block pool id " + sourceBlockPoolId -+ " from the target file with block pool id " -+ targetBlockPoolId); - } -} +// Concat only effects when all files in same namespace. +// And in router view, a file only exists in one RemoteLocation. Review Comment: >May we can use getFileInfo instead of getBlockLocation to fix this bug. BTW, getFileInfo is no needed if this path only mounts to one namespace. At the beginning , I consider use getFileInfo . But the problem is that HdfsFileStatus does not have any information about nameservices or blockpoolid, only LocatedBlock has. > RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file. > -- > > Key: HDFS-17509 > URL: https://issues.apache.org/jira/browse/HDFS-17509 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Priority: Minor > Labels: pull-request-available > > hdfs dfs -concat /tmp/merge /tmp/t1 /tmp/t2 > When /tmp/merge is a empty file, this command will throw NPE via DFSRouter. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-2139) Fast copy for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZanderXu updated HDFS-2139: --- Description: There is a need to perform fast file copy on HDFS. The fast copy mechanism for a file works as follows : 1) Query metadata for all blocks of the source file. 2) For each block 'b' of the file, find out its datanode locations. 3) For each block of the file, add an empty block to the namesystem for the destination file. 4) For each location of the block, instruct the datanode to make a local copy of that block. 5) Once each datanode has copied over its respective blocks, they report to the namenode about it. 6) Wait for all blocks to be copied and exit. This would speed up the copying process considerably by removing top of the rack data transfers. Note : An extra improvement, would be to instruct the datanode to create a hardlink of the block file if we are copying a block on the same datanode [~xuzq_zander]Provided a design doc https://docs.google.com/document/d/1uGHA2dXLldlNoaYF-4c63baYjCuft_T88wdvhwVgh6c/edit?usp=sharing was: There is a need to perform fast file copy on HDFS. The fast copy mechanism for a file works as follows : 1) Query metadata for all blocks of the source file. 2) For each block 'b' of the file, find out its datanode locations. 3) For each block of the file, add an empty block to the namesystem for the destination file. 4) For each location of the block, instruct the datanode to make a local copy of that block. 5) Once each datanode has copied over its respective blocks, they report to the namenode about it. 6) Wait for all blocks to be copied and exit. This would speed up the copying process considerably by removing top of the rack data transfers. Note : An extra improvement, would be to instruct the datanode to create a hardlink of the block file if we are copying a block on the same datanode [~xuzq_zander]Provided a design doc https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit?usp=sharing > Fast copy for HDFS. > --- > > Key: HDFS-2139 > URL: https://issues.apache.org/jira/browse/HDFS-2139 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Pritam Damania >Assignee: Rituraj >Priority: Major > Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, > HDFS-2139.patch, image-2022-08-11-11-48-17-994.png > > Original Estimate: 168h > Remaining Estimate: 168h > > There is a need to perform fast file copy on HDFS. The fast copy mechanism > for a file works as > follows : > 1) Query metadata for all blocks of the source file. > 2) For each block 'b' of the file, find out its datanode locations. > 3) For each block of the file, add an empty block to the namesystem for > the destination file. > 4) For each location of the block, instruct the datanode to make a local > copy of that block. > 5) Once each datanode has copied over its respective blocks, they > report to the namenode about it. > 6) Wait for all blocks to be copied and exit. > This would speed up the copying process considerably by removing top of > the rack data transfers. 
> Note : An extra improvement, would be to instruct the datanode to create a > hardlink of the block file if we are copying a block on the same datanode > [~xuzq_zander]Provided a design doc > https://docs.google.com/document/d/1uGHA2dXLldlNoaYF-4c63baYjCuft_T88wdvhwVgh6c/edit?usp=sharing -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
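The six-step flow above can be condensed into a short sketch (every name here is a stand-in; the real protocol details are in the linked design doc):

{code:java}
// High-level sketch of the fast-copy steps described in the issue.
import java.util.List;

interface FastCopySketch {
  List<String> getBlocks(String srcFile);                         // 1) blocks of the source file
  List<String> getLocations(String block);                        // 2) datanodes holding a block
  String addEmptyBlock(String dstFile);                           // 3) empty block for the destination
  void copyBlockLocally(String datanode, String src, String dst); // 4) local copy (or hardlink)
  boolean allBlocksReported(String dstFile);                      // 5) datanodes reported completion

  default void fastCopy(String srcFile, String dstFile) throws InterruptedException {
    for (String block : getBlocks(srcFile)) {
      String dstBlock = addEmptyBlock(dstFile);
      for (String datanode : getLocations(block)) {
        copyBlockLocally(datanode, block, dstBlock);              // avoids top-of-rack transfers
      }
    }
    while (!allBlocksReported(dstFile)) {                         // 6) wait for all blocks, then exit
      Thread.sleep(100);
    }
  }
}
{code}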
[jira] [Resolved] (HDFS-17503) Unreleased volume references because of OOM
[ https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZanderXu resolved HDFS-17503. - Fix Version/s: 3.5.0 Hadoop Flags: Reviewed Resolution: Fixed > Unreleased volume references because of OOM > --- > > Key: HDFS-17503 > URL: https://issues.apache.org/jira/browse/HDFS-17503 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zilong Zhu >Assignee: Zilong Zhu >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > When BlockSender throws an error because of OOM, the volume reference obtained > by the thread is not released, which causes the thread trying to remove the > volume to wait and fall into an infinite loop. > I found HDFS-15963 caught the exception and released the volume reference. But it did > not handle the case of throwing errors. I think "catch (Throwable t)" should > be used instead of "catch (IOException ioe)". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17503) Unreleased volume references because of OOM
[ https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845173#comment-17845173 ] ASF GitHub Bot commented on HDFS-17503: --- ZanderXu commented on PR #6782: URL: https://github.com/apache/hadoop/pull/6782#issuecomment-2103752898 Merged. Thanks @zhuzilong2013 for your contribution. > Unreleased volume references because of OOM > --- > > Key: HDFS-17503 > URL: https://issues.apache.org/jira/browse/HDFS-17503 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zilong Zhu >Assignee: Zilong Zhu >Priority: Major > Labels: pull-request-available > > When BlockSender throws an error because of OOM, the volume reference obtained > by the thread is not released, which causes the thread trying to remove the > volume to wait and fall into an infinite loop. > I found HDFS-15963 caught the exception and released the volume reference. But it did > not handle the case of throwing errors. I think "catch (Throwable t)" should > be used instead of "catch (IOException ioe)". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17503) Unreleased volume references because of OOM
[ https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845172#comment-17845172 ] ASF GitHub Bot commented on HDFS-17503: --- ZanderXu merged PR #6782: URL: https://github.com/apache/hadoop/pull/6782 > Unreleased volume references because of OOM > --- > > Key: HDFS-17503 > URL: https://issues.apache.org/jira/browse/HDFS-17503 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zilong Zhu >Assignee: Zilong Zhu >Priority: Major > Labels: pull-request-available > > When BlockSender throws an error because of OOM, the volume reference obtained > by the thread is not released, which causes the thread trying to remove the > volume to wait and fall into an infinite loop. > I found HDFS-15963 caught the exception and released the volume reference. But it did > not handle the case of throwing errors. I think "catch (Throwable t)" should > be used instead of "catch (IOException ioe)". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17509) RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file.
[ https://issues.apache.org/jira/browse/HDFS-17509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845171#comment-17845171 ] ASF GitHub Bot commented on HDFS-17509: --- ZanderXu commented on code in PR #6784: URL: https://github.com/apache/hadoop/pull/6784#discussion_r1596183555 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java: ## @@ -667,39 +667,28 @@ public void rename2(final String src, final String dst, public void concat(String trg, String[] src) throws IOException { rpcServer.checkOperation(NameNode.OperationCategory.WRITE); -// See if the src and target files are all in the same namespace -LocatedBlocks targetBlocks = getBlockLocations(trg, 0, 1); -if (targetBlocks == null) { - throw new IOException("Cannot locate blocks for target file - " + trg); -} -LocatedBlock lastLocatedBlock = targetBlocks.getLastLocatedBlock(); -String targetBlockPoolId = lastLocatedBlock.getBlock().getBlockPoolId(); -for (String source : src) { - LocatedBlocks sourceBlocks = getBlockLocations(source, 0, 1); - if (sourceBlocks == null) { -throw new IOException( -"Cannot located blocks for source file " + source); - } - String sourceBlockPoolId = - sourceBlocks.getLastLocatedBlock().getBlock().getBlockPoolId(); - if (!sourceBlockPoolId.equals(targetBlockPoolId)) { -throw new IOException("Cannot concatenate source file " + source -+ " because it is located in a different namespace" -+ " with block pool id " + sourceBlockPoolId -+ " from the target file with block pool id " -+ targetBlockPoolId); - } -} +// Concat only effects when all files in same namespace. +// And in router view, a file only exists in one RemoteLocation. Review Comment: > For a file in Router view with more than one nameservices, I think should thrown Exception for concat method. For multiple namespaces contain one same file case, RBF just return the file in the first namespace currently, such as: getBlockLocation, getFileInfo, etc. So If you want to thrown Exception for concat, maybe you need to modify all RPCs to throw Exception for this case. May we can use getFileInfo instead of getBlockLocation to fix this bug. BTW, getFileInfo is no needed if this path only mounts to one namespace. > RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file. > -- > > Key: HDFS-17509 > URL: https://issues.apache.org/jira/browse/HDFS-17509 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Priority: Minor > Labels: pull-request-available > > hdfs dfs -concat /tmp/merge /tmp/t1 /tmp/t2 > When /tmp/merge is a empty file, this command will throw NPE via DFSRouter. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17506) [FGL] Performance for phase 1
[ https://issues.apache.org/jira/browse/HDFS-17506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845170#comment-17845170 ] ASF GitHub Bot commented on HDFS-17506: --- ZanderXu commented on PR #6806: URL: https://github.com/apache/hadoop/pull/6806#issuecomment-2103741784 > Thanks. What is the purpose of this PR? Provide a benchmark tool and each one can test the HDFS cluster performance? or just test the namenode performance locally? `FSNLockBenchmarkThroughput` is a tool, each one can do the performance tests through this tool after deploying FGL. `TestFSNLockBenchmarkThroughput` is a UT, it mocks a `MiniQJMHACluster` and do some performance tests locally. > [FGL] Performance for phase 1 > - > > Key: HDFS-17506 > URL: https://issues.apache.org/jira/browse/HDFS-17506 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Priority: Major > Labels: pull-request-available > > Do some benchmark testing for phase 1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17506) [FGL] Performance for phase 1
[ https://issues.apache.org/jira/browse/HDFS-17506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845167#comment-17845167 ] ASF GitHub Bot commented on HDFS-17506: --- ferhui commented on PR #6806: URL: https://github.com/apache/hadoop/pull/6806#issuecomment-2103691546 Thanks. What is the purpose of this PR? Provide a benchmark tool and each one can test the HDFS cluster performance? or just test the namenode performance locally? > [FGL] Performance for phase 1 > - > > Key: HDFS-17506 > URL: https://issues.apache.org/jira/browse/HDFS-17506 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Priority: Major > Labels: pull-request-available > > Do some benchmark testing for phase 1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17438) RBF: The newest STANDBY and UNAVAILABLE nn should be the lowest priority.
[ https://issues.apache.org/jira/browse/HDFS-17438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845166#comment-17845166 ] ASF GitHub Bot commented on HDFS-17438: --- hadoop-yetus commented on PR #6655: URL: https://github.com/apache/hadoop/pull/6655#issuecomment-2103686670 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 01s | | No case conflicting files found. | | +0 :ok: | spotbugs | 0m 00s | | spotbugs executables are not available. | | +0 :ok: | codespell | 0m 00s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 00s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 00s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 00s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 3m 55s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 88m 40s | | trunk passed | | +1 :green_heart: | compile | 40m 29s | | trunk passed | | +1 :green_heart: | checkstyle | 5m 51s | | trunk passed | | -1 :x: | mvnsite | 4m 20s | [/branch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6655/7/artifact/out/branch-mvnsite-hadoop-common-project_hadoop-common.txt) | hadoop-common in trunk failed. | | +1 :green_heart: | javadoc | 9m 41s | | trunk passed | | +1 :green_heart: | shadedclient | 163m 28s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 2m 14s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 7m 49s | | the patch passed | | +1 :green_heart: | compile | 36m 52s | | the patch passed | | +1 :green_heart: | javac | 36m 52s | | the patch passed | | +1 :green_heart: | blanks | 0m 00s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 5m 49s | | the patch passed | | -1 :x: | mvnsite | 4m 19s | [/patch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6655/7/artifact/out/patch-mvnsite-hadoop-common-project_hadoop-common.txt) | hadoop-common in the patch failed. | | +1 :green_heart: | javadoc | 9m 04s | | the patch passed | | +1 :green_heart: | shadedclient | 166m 42s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | asflicense | 5m 28s | | The patch does not generate ASF License warnings. | | | | 521m 40s | | | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/6655 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | MINGW64_NT-10.0-17763 ae72a0e5a6ae 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys | | Build tool | maven | | Personality | /c/hadoop/dev-support/bin/hadoop.sh | | git revision | trunk / 50d9f7c20216e78abe2b5c89282dff62375a | | Default Java | Azul Systems, Inc.-1.8.0_332-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6655/7/testReport/ | | modules | C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-rbf U: . 
| | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6655/7/console | | versions | git=2.45.0.windows.1 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > RBF: The newest STANDBY and UNAVAILABLE nn should be the lowest priority. > - > > Key: HDFS-17438 > URL: https://issues.apache.org/jira/browse/HDFS-17438 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jian Zhang >Assignee: Jian Zhang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-17438.001.patch > > > At present, when the status of all namenodes in an ns in the router is the > same, the namenode which is the newest reported will be placed at the top of > the cache. when the client accesses the ns through the router, it will first > access the namenode. > If multiple namenodes in this route are in an active state, or if there are > namenodes wi
[jira] [Commented] (HDFS-17476) fix: False positive "Observer Node is too far behind" due to long overflow.
[ https://issues.apache.org/jira/browse/HDFS-17476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845162#comment-17845162 ] ASF GitHub Bot commented on HDFS-17476: --- hadoop-yetus commented on PR #6747: URL: https://github.com/apache/hadoop/pull/6747#issuecomment-2103618886 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 01s | | No case conflicting files found. | | +0 :ok: | spotbugs | 0m 00s | | spotbugs executables are not available. | | +0 :ok: | codespell | 0m 00s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 00s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 01s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 00s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 87m 35s | | trunk passed | | +1 :green_heart: | compile | 6m 26s | | trunk passed | | +1 :green_heart: | checkstyle | 4m 52s | | trunk passed | | +1 :green_heart: | mvnsite | 6m 36s | | trunk passed | | +1 :green_heart: | javadoc | 6m 08s | | trunk passed | | +1 :green_heart: | shadedclient | 144m 09s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 4m 28s | | the patch passed | | +1 :green_heart: | compile | 3m 23s | | the patch passed | | +1 :green_heart: | javac | 3m 23s | | the patch passed | | +1 :green_heart: | blanks | 0m 01s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 2m 18s | | the patch passed | | +1 :green_heart: | mvnsite | 3m 57s | | the patch passed | | +1 :green_heart: | javadoc | 3m 37s | | the patch passed | | +1 :green_heart: | shadedclient | 154m 32s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | asflicense | 5m 17s | | The patch does not generate ASF License warnings. | | | | 413m 25s | | | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/6747 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | MINGW64_NT-10.0-17763 cdbf5294217d 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys | | Build tool | maven | | Personality | /c/hadoop/dev-support/bin/hadoop.sh | | git revision | trunk / 5f6e761f92c501b59ae53a552e64b5d6f54f20c1 | | Default Java | Azul Systems, Inc.-1.8.0_332-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6747/3/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6747/3/console | | versions | git=2.44.0.windows.1 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > fix: False positive "Observer Node is too far behind" due to long overflow. 
> --- > > Key: HDFS-17476 > URL: https://issues.apache.org/jira/browse/HDFS-17476 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Jian Zhang >Assignee: Jian Zhang >Priority: Critical > Labels: pull-request-available > Attachments: HDFS-17476.patch, image-2024-04-18-10-57-10-481.png > > > In the code GlobalStateIdContext#receiveRequestState(), if clientStateId is a > small negative number, clientStateId-serverStateId may be greater than > (ESTIMATED_TRANSACTIONS_PER_SECOND due to overflow > * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime) > * ESTIMATED_SERVER_TIME_MULTIPLIER), > resulting in false positives that Observer Node is too far behind. > !image-2024-04-18-10-57-10-481.png|width=742,height=110! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
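The overflow can be reproduced with plain long arithmetic; the sketch below (hypothetical names, the real check lives in GlobalStateIdContext#receiveRequestState) shows the false positive and one overflow-safe variant, assuming serverStateId is a non-negative transaction ID.

{code:java}
// Demonstration of the wrap-around and an overflow-safe comparison.
class StateIdLagSketch {
  static boolean tooFarBehindBuggy(long clientStateId, long serverStateId, long threshold) {
    return clientStateId - serverStateId > threshold;   // can overflow for a very negative clientStateId
  }

  static boolean tooFarBehindSafe(long clientStateId, long serverStateId, long threshold) {
    if (clientStateId <= serverStateId) {
      return false;                                      // a stale or negative client ID is never "ahead"
    }
    return clientStateId - serverStateId > threshold;    // serverStateId >= 0 assumed, so no wrap-around here
  }

  public static void main(String[] args) {
    long client = Long.MIN_VALUE + 10;                   // the "small negative number" from the report
    long server = 1_000_000L;
    long threshold = 3_000_000L;
    System.out.println(tooFarBehindBuggy(client, server, threshold)); // true  -> false positive
    System.out.println(tooFarBehindSafe(client, server, threshold));  // false
  }
}
{code}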
[jira] [Commented] (HDFS-16368) DFSAdmin supports refresh topology info without restarting namenode
[ https://issues.apache.org/jira/browse/HDFS-16368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845156#comment-17845156 ] ASF GitHub Bot commented on HDFS-16368: --- hadoop-yetus commented on PR #3743: URL: https://github.com/apache/hadoop/pull/3743#issuecomment-2103598386 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 02s | | No case conflicting files found. | | +0 :ok: | spotbugs | 0m 01s | | spotbugs executables are not available. | | +0 :ok: | codespell | 0m 01s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 01s | | detect-secrets was not available. | | +0 :ok: | buf | 0m 01s | | buf was not available. | | +0 :ok: | buf | 0m 01s | | buf was not available. | | +1 :green_heart: | @author | 0m 00s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 00s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 2m 21s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 93m 28s | | trunk passed | | +1 :green_heart: | compile | 42m 01s | | trunk passed | | +1 :green_heart: | checkstyle | 6m 08s | | trunk passed | | -1 :x: | mvnsite | 4m 41s | [/branch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-3743/2/artifact/out/branch-mvnsite-hadoop-common-project_hadoop-common.txt) | hadoop-common in trunk failed. | | +1 :green_heart: | javadoc | 20m 57s | | trunk passed | | +1 :green_heart: | shadedclient | 191m 30s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 2m 28s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 16m 39s | | the patch passed | | +1 :green_heart: | compile | 38m 28s | | the patch passed | | +1 :green_heart: | cc | 38m 28s | | the patch passed | | +1 :green_heart: | javac | 38m 28s | | the patch passed | | -1 :x: | blanks | 0m 01s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-3743/2/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | +1 :green_heart: | checkstyle | 6m 24s | | the patch passed | | -1 :x: | mvnsite | 4m 46s | [/patch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-3743/2/artifact/out/patch-mvnsite-hadoop-common-project_hadoop-common.txt) | hadoop-common in the patch failed. | | +1 :green_heart: | javadoc | 21m 09s | | the patch passed | | +1 :green_heart: | shadedclient | 200m 45s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | asflicense | 5m 44s | | The patch does not generate ASF License warnings. 
| | | | 600m 09s | | | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/3743 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets cc buflint bufcompat | | uname | MINGW64_NT-10.0-17763 29a0d9562666 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys | | Build tool | maven | | Personality | /c/hadoop/dev-support/bin/hadoop.sh | | git revision | trunk / 2e5165fd3cb5b9e8ed8dfdade8e2b4874033a182 | | Default Java | Azul Systems, Inc.-1.8.0_332-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-3743/2/testReport/ | | modules | C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-rbf U: . | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-3743/2/console | | versions | git=2.44.0.windows.1 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > DFSAdmin supports refresh topology info without restarting namenode > > > Key: HDFS-16368 > URL: https://issues.apache.org/jira/browse/HDFS-16368 > Project: Hadoop HDFS > Issue Type: New Feature > Components: dfsadmin, namanode >Affects Versions:
[jira] [Commented] (HDFS-17438) RBF: The newest STANDBY and UNAVAILABLE nn should be the lowest priority.
[ https://issues.apache.org/jira/browse/HDFS-17438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845152#comment-17845152 ] ASF GitHub Bot commented on HDFS-17438: --- hadoop-yetus commented on PR #6655: URL: https://github.com/apache/hadoop/pull/6655#issuecomment-2103584427 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 01s | | No case conflicting files found. | | +0 :ok: | spotbugs | 0m 01s | | spotbugs executables are not available. | | +0 :ok: | codespell | 0m 01s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 01s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 00s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 00s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 2m 24s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 93m 05s | | trunk passed | | +1 :green_heart: | compile | 41m 31s | | trunk passed | | +1 :green_heart: | checkstyle | 6m 27s | | trunk passed | | -1 :x: | mvnsite | 4m 44s | [/branch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6655/6/artifact/out/branch-mvnsite-hadoop-common-project_hadoop-common.txt) | hadoop-common in trunk failed. | | +1 :green_heart: | javadoc | 10m 05s | | trunk passed | | +1 :green_heart: | shadedclient | 169m 20s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 2m 32s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 10m 02s | | the patch passed | | +1 :green_heart: | compile | 42m 17s | | the patch passed | | +1 :green_heart: | javac | 42m 17s | | the patch passed | | +1 :green_heart: | blanks | 0m 01s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 6m 43s | | the patch passed | | -1 :x: | mvnsite | 5m 00s | [/patch-mvnsite-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6655/6/artifact/out/patch-mvnsite-hadoop-common-project_hadoop-common.txt) | hadoop-common in the patch failed. | | +1 :green_heart: | javadoc | 10m 04s | | the patch passed | | +1 :green_heart: | shadedclient | 179m 56s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | asflicense | 6m 04s | | The patch does not generate ASF License warnings. | | | | 553m 14s | | | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/6655 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | MINGW64_NT-10.0-17763 d4e3488ee1e9 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys | | Build tool | maven | | Personality | /c/hadoop/dev-support/bin/hadoop.sh | | git revision | trunk / 50d9f7c20216e78abe2b5c89282dff62375a | | Default Java | Azul Systems, Inc.-1.8.0_332-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6655/6/testReport/ | | modules | C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs-rbf U: . 
| | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6655/6/console | | versions | git=2.45.0.windows.1 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > RBF: The newest STANDBY and UNAVAILABLE nn should be the lowest priority. > - > > Key: HDFS-17438 > URL: https://issues.apache.org/jira/browse/HDFS-17438 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jian Zhang >Assignee: Jian Zhang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-17438.001.patch > > > At present, when the status of all namenodes in an ns in the router is the > same, the namenode which is the newest reported will be placed at the top of > the cache. when the client accesses the ns through the router, it will first > access the namenode. > If multiple namenodes in this route are in an active state, or if there are > namenodes wi
[jira] [Commented] (HDFS-16993) Datanode supports configure TopN DatanodeNetworkCounts
[ https://issues.apache.org/jira/browse/HDFS-16993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845120#comment-17845120 ] ASF GitHub Bot commented on HDFS-16993: --- hadoop-yetus commented on PR #5597: URL: https://github.com/apache/hadoop/pull/5597#issuecomment-2103362606 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 02s | | No case conflicting files found. | | +0 :ok: | spotbugs | 0m 00s | | spotbugs executables are not available. | | +0 :ok: | codespell | 0m 00s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 01s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 01s | | xmllint was not available. | | +1 :green_heart: | @author | 0m 00s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 00s | | The patch appears to include 3 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 111m 38s | | trunk passed | | +1 :green_heart: | compile | 8m 16s | | trunk passed | | +1 :green_heart: | checkstyle | 6m 02s | | trunk passed | | +1 :green_heart: | mvnsite | 8m 51s | | trunk passed | | +1 :green_heart: | javadoc | 7m 24s | | trunk passed | | +1 :green_heart: | shadedclient | 190m 24s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 5m 57s | | the patch passed | | +1 :green_heart: | compile | 4m 21s | | the patch passed | | +1 :green_heart: | javac | 4m 21s | | the patch passed | | -1 :x: | blanks | 0m 00s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-5597/2/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | +1 :green_heart: | checkstyle | 2m 52s | | the patch passed | | +1 :green_heart: | mvnsite | 5m 18s | | the patch passed | | +1 :green_heart: | javadoc | 4m 42s | | the patch passed | | +1 :green_heart: | shadedclient | 194m 20s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | asflicense | 6m 32s | | The patch does not generate ASF License warnings. | | | | 530m 50s | | | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/5597 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint | | uname | MINGW64_NT-10.0-17763 f08ad7bac7e6 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys | | Build tool | maven | | Personality | /c/hadoop/dev-support/bin/hadoop.sh | | git revision | trunk / ecd02f00aa8adecb6f79a6422b6752d9e711fd60 | | Default Java | Azul Systems, Inc.-1.8.0_332-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-5597/2/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-5597/2/console | | versions | git=2.44.0.windows.1 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. 
> Datanode supports configure TopN DatanodeNetworkCounts > -- > > Key: HDFS-16993 > URL: https://issues.apache.org/jira/browse/HDFS-16993 > Project: Hadoop HDFS > Issue Type: Wish >Affects Versions: 3.3.5 >Reporter: farmmamba >Priority: Major > Labels: pull-request-available > > In our prod environment, we try to collect datanode metrics every 15s through > jmx_exporter. we found the datanodenetworkerror metric generates a lot. > for example, if we have a cluster with 1000 datanodes, every datanode may > generate 999 datanodenetworkerror metrics, and overall datanodes will > generate 1000 multiple 999 = 999000 metrics. This is a very expensive > operation. In most scenarios, we only need the topN of it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
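The "top N" idea can be illustrated with a small, self-contained sketch (hypothetical, not the actual DataNode metrics code): keep the full per-peer error map internally but only publish the N peers with the most errors.

{code:java}
// Sketch: select the N peers with the highest network-error counts.
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class TopNNetworkErrorsSketch {
  static Map<String, Long> topN(Map<String, Long> errorsPerPeer, int n) {
    return errorsPerPeer.entrySet().stream()
        .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
        .limit(n)
        .collect(Collectors.toMap(
            Map.Entry::getKey, Map.Entry::getValue,
            (a, b) -> a, LinkedHashMap::new));   // keep descending order in the result
  }
}
{code}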
[jira] [Commented] (HDFS-17276) The nn fetch editlog forbidden in kerberos environment
[ https://issues.apache.org/jira/browse/HDFS-17276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845069#comment-17845069 ] ASF GitHub Bot commented on HDFS-17276: --- hadoop-yetus commented on PR #6326: URL: https://github.com/apache/hadoop/pull/6326#issuecomment-2103056076 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| _ Prechecks _ | | +1 :green_heart: | dupname | 0m 02s | | No case conflicting files found. | | +0 :ok: | spotbugs | 0m 00s | | spotbugs executables are not available. | | +0 :ok: | codespell | 0m 01s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 01s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 00s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 00s | | The patch appears to include 3 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 127m 17s | | trunk passed | | +1 :green_heart: | compile | 10m 24s | | trunk passed | | +1 :green_heart: | checkstyle | 7m 33s | | trunk passed | | +1 :green_heart: | mvnsite | 10m 26s | | trunk passed | | +1 :green_heart: | javadoc | 9m 20s | | trunk passed | | +1 :green_heart: | shadedclient | 212m 35s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 6m 58s | | the patch passed | | +1 :green_heart: | compile | 5m 19s | | the patch passed | | +1 :green_heart: | javac | 5m 19s | | the patch passed | | +1 :green_heart: | blanks | 0m 00s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 3m 45s | | the patch passed | | +1 :green_heart: | mvnsite | 6m 33s | | the patch passed | | +1 :green_heart: | javadoc | 5m 14s | | the patch passed | | +1 :green_heart: | shadedclient | 226m 54s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | asflicense | 8m 28s | | The patch does not generate ASF License warnings. | | | | 609m 15s | | | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/6326 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | MINGW64_NT-10.0-17763 4cdb66d2f064 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys | | Build tool | maven | | Personality | /c/hadoop/dev-support/bin/hadoop.sh | | git revision | trunk / 74263653dbc9a16564d41d2a8bcd975d47d5d93f | | Default Java | Azul Systems, Inc.-1.8.0_332-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6326/2/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6326/2/console | | versions | git=2.44.0.windows.1 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. 
> The nn fetch editlog forbidden in kerberos environment > -- > > Key: HDFS-17276 > URL: https://issues.apache.org/jira/browse/HDFS-17276 > Project: Hadoop HDFS > Issue Type: Bug > Components: qjm, security >Affects Versions: 3.3.5, 3.3.6 >Reporter: kuper >Priority: Major > Labels: pull-request-available > Attachments: image-2023-12-06-20-21-03-557.png, > image-2023-12-06-20-21-46-825.png > > > * In a Kerberos environment, the namenode cannot fetch editlog from > journalnode because the request is rejected (403). > !image-2023-12-06-20-21-03-557.png! > * GetJournalEditServlet checks if the request's username meets the > requirements through the isValidRequestor function. After HDFS-16686 is > merged, remotePrincipal becomes ugi.getUserName(). > * In a Kerberos environment, ugi.getUserName() gets the > request.getRemoteUser() via DfsServlet's getUGI to get the username, and this > username is not a full name. > * Therefore, the obtained username is similar to namenode01 instead of > namenode01/hos...@realm.tld, which meansit fails to pass the isValidRequestor > check. !image-2023-12-06-20-21-46-825.png! > *reproduction* > * In the TestGetJournalEditServlet add testSecurityRequestNameNode > {code:java} > @Test > public void testSecurityRequestNameNode() throws IOException,
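The mismatch described above can be illustrated without any Hadoop classes (a hypothetical sketch, not the servlet code): a requestor check that compares against full Kerberos principals rejects a caller identified only by its short name, which is what the servlet ends up with after HDFS-16686 and why it answers 403.

{code:java}
// Self-contained illustration of short name vs. full principal in the check.
import java.util.Collections;
import java.util.Set;

final class RequestorCheckSketch {
  static boolean isValidRequestor(String remoteUser, Set<String> validFullPrincipals) {
    return validFullPrincipals.contains(remoteUser);
  }

  public static void main(String[] args) {
    Set<String> valid = Collections.singleton("namenode01/host01.example.com@EXAMPLE.TLD");
    System.out.println(isValidRequestor("namenode01/host01.example.com@EXAMPLE.TLD", valid)); // true
    System.out.println(isValidRequestor("namenode01", valid)); // false -> HTTP 403 from the servlet
  }
}
{code}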
[jira] [Commented] (HDFS-17509) RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file.
[ https://issues.apache.org/jira/browse/HDFS-17509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844966#comment-17844966 ] ASF GitHub Bot commented on HDFS-17509: --- LiuGuH commented on code in PR #6784: URL: https://github.com/apache/hadoop/pull/6784#discussion_r1595385718 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java: ## @@ -667,39 +667,28 @@ public void rename2(final String src, final String dst, public void concat(String trg, String[] src) throws IOException { rpcServer.checkOperation(NameNode.OperationCategory.WRITE); -// See if the src and target files are all in the same namespace -LocatedBlocks targetBlocks = getBlockLocations(trg, 0, 1); -if (targetBlocks == null) { - throw new IOException("Cannot locate blocks for target file - " + trg); -} -LocatedBlock lastLocatedBlock = targetBlocks.getLastLocatedBlock(); -String targetBlockPoolId = lastLocatedBlock.getBlock().getBlockPoolId(); -for (String source : src) { - LocatedBlocks sourceBlocks = getBlockLocations(source, 0, 1); - if (sourceBlocks == null) { -throw new IOException( -"Cannot located blocks for source file " + source); - } - String sourceBlockPoolId = - sourceBlocks.getLastLocatedBlock().getBlock().getBlockPoolId(); - if (!sourceBlockPoolId.equals(targetBlockPoolId)) { -throw new IOException("Cannot concatenate source file " + source -+ " because it is located in a different namespace" -+ " with block pool id " + sourceBlockPoolId -+ " from the target file with block pool id " -+ targetBlockPoolId); - } -} +// Concat only effects when all files in same namespace. +// And in router view, a file only exists in one RemoteLocation. Review Comment: Emmm, there is a scene . 1) If a file is already exist in two nameservices. And the add router mount. NS1 /user/test/file NS2/user/test/file 2) Add router mount. hdfs dfsrouteradmin -add /user/test NS1,NS2 /user/test -order RANDOM 3) getDestination hdfs dfsrouteradmin -getDestination /user/test/file Will return NS1,NS2 For a file in Router view with more than one nameservices, I think should thrown Exception for concat method. Look forward to your guidance , thanks ! @ZanderXu > RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file. > -- > > Key: HDFS-17509 > URL: https://issues.apache.org/jira/browse/HDFS-17509 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Priority: Minor > Labels: pull-request-available > > hdfs dfs -concat /tmp/merge /tmp/t1 /tmp/t2 > When /tmp/merge is a empty file, this command will throw NPE via DFSRouter. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17509) RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file.
[ https://issues.apache.org/jira/browse/HDFS-17509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844945#comment-17844945 ] ASF GitHub Bot commented on HDFS-17509: --- LiuGuH commented on code in PR #6784: URL: https://github.com/apache/hadoop/pull/6784#discussion_r1595273510 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java: ## @@ -667,39 +667,28 @@ public void rename2(final String src, final String dst, public void concat(String trg, String[] src) throws IOException { rpcServer.checkOperation(NameNode.OperationCategory.WRITE); -// See if the src and target files are all in the same namespace -LocatedBlocks targetBlocks = getBlockLocations(trg, 0, 1); -if (targetBlocks == null) { - throw new IOException("Cannot locate blocks for target file - " + trg); -} -LocatedBlock lastLocatedBlock = targetBlocks.getLastLocatedBlock(); -String targetBlockPoolId = lastLocatedBlock.getBlock().getBlockPoolId(); -for (String source : src) { - LocatedBlocks sourceBlocks = getBlockLocations(source, 0, 1); - if (sourceBlocks == null) { -throw new IOException( -"Cannot located blocks for source file " + source); - } - String sourceBlockPoolId = - sourceBlocks.getLastLocatedBlock().getBlock().getBlockPoolId(); - if (!sourceBlockPoolId.equals(targetBlockPoolId)) { -throw new IOException("Cannot concatenate source file " + source -+ " because it is located in a different namespace" -+ " with block pool id " + sourceBlockPoolId -+ " from the target file with block pool id " -+ targetBlockPoolId); - } -} +// Concat only effects when all files in same namespace. +// And in router view, a file only exists in one RemoteLocation. Review Comment: ![image](https://github.com/apache/hadoop/assets/6347715/f787e314-6f5e-41b2-9361-df5564e9fe52) Through the DFSRouter, regardless of the mount order (HASH, LOCAL, RANDOM, HASH_ALL, SPACE), a file cannot be written into two or more nameservices. > RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file. > -- > > Key: HDFS-17509 > URL: https://issues.apache.org/jira/browse/HDFS-17509 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Priority: Minor > Labels: pull-request-available > > hdfs dfs -concat /tmp/merge /tmp/t1 /tmp/t2 > When /tmp/merge is a empty file, this command will throw NPE via DFSRouter. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17509) RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file.
[ https://issues.apache.org/jira/browse/HDFS-17509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844944#comment-17844944 ] ASF GitHub Bot commented on HDFS-17509: --- LiuGuH commented on code in PR #6784: URL: https://github.com/apache/hadoop/pull/6784#discussion_r1595269674 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java: ## @@ -667,39 +667,28 @@ public void rename2(final String src, final String dst, public void concat(String trg, String[] src) throws IOException { rpcServer.checkOperation(NameNode.OperationCategory.WRITE); -// See if the src and target files are all in the same namespace -LocatedBlocks targetBlocks = getBlockLocations(trg, 0, 1); -if (targetBlocks == null) { - throw new IOException("Cannot locate blocks for target file - " + trg); -} -LocatedBlock lastLocatedBlock = targetBlocks.getLastLocatedBlock(); -String targetBlockPoolId = lastLocatedBlock.getBlock().getBlockPoolId(); -for (String source : src) { - LocatedBlocks sourceBlocks = getBlockLocations(source, 0, 1); - if (sourceBlocks == null) { -throw new IOException( -"Cannot located blocks for source file " + source); - } - String sourceBlockPoolId = - sourceBlocks.getLastLocatedBlock().getBlock().getBlockPoolId(); - if (!sourceBlockPoolId.equals(targetBlockPoolId)) { -throw new IOException("Cannot concatenate source file " + source -+ " because it is located in a different namespace" -+ " with block pool id " + sourceBlockPoolId -+ " from the target file with block pool id " -+ targetBlockPoolId); - } -} +// Concat only effects when all files in same namespace. +// And in router view, a file only exists in one RemoteLocation. Review Comment: Thanks for the review. > you can refer to this case: > > * /path mounts to NS1, NS2 and NS3 > * NS2 and NS3 contains /path > * `OrderedResolver` returns NS1, NS2 and NS3 > > for this case, we should proxy this `concat` to NS2 instead of NS1, right? This only happens when /path is a directory. For a file, it can only come from exactly one nameservice via the DFSRouter. Look forward to your guidance, thanks! @ZanderXu > RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file. > -- > > Key: HDFS-17509 > URL: https://issues.apache.org/jira/browse/HDFS-17509 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Priority: Minor > Labels: pull-request-available > > hdfs dfs -concat /tmp/merge /tmp/t1 /tmp/t2 > When /tmp/merge is a empty file, this command will throw NPE via DFSRouter. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17517) [FGL] Abstract lock mode to cover all RPCs
ZanderXu created HDFS-17517: --- Summary: [FGL] Abstract lock mode to cover all RPCs Key: HDFS-17517 URL: https://issues.apache.org/jira/browse/HDFS-17517 Project: Hadoop HDFS Issue Type: Sub-task Reporter: ZanderXu Assignee: ZanderXu There are many RPCs in the NameNode. Different RPCs have different processing logic for the input path, such as create, mkdir, and getFileInfo. Here we should abstract the locking modes used by resolvePath so that they cover all these RPCs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
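For illustration only, one possible shape of such an abstraction (all names below are hypothetical and not part of the FGL patch): each RPC declares a lock mode, and the resolvePath-style helper acquires locks accordingly.
{code:java}
// Hypothetical sketch of an abstracted lock mode for path resolution.
public class LockModeSketch {
  enum LockMode {
    GLOBAL_READ,   // read lock on the global/namespace lock
    GLOBAL_WRITE,  // write lock on the global/namespace lock
    INODE_READ,    // fine-grained read lock on the resolved iNode/subtree
    INODE_WRITE    // fine-grained write lock on the resolved iNode/subtree
  }

  static void resolvePath(String src, LockMode mode) {
    // A real implementation would acquire the locks implied by the mode before
    // resolving path components; this stub only shows the shape of the API.
    System.out.println("resolving " + src + " under " + mode);
  }

  public static void main(String[] args) {
    resolvePath("/user/foo", LockMode.INODE_READ);       // e.g. getFileInfo
    resolvePath("/user/foo/bar", LockMode.INODE_WRITE);  // e.g. create, mkdir
  }
}
{code}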
[jira] [Updated] (HDFS-17481) [FGL] ComputeFileSize in INodeFile should avoid BM lock
[ https://issues.apache.org/jira/browse/HDFS-17481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZanderXu updated HDFS-17481: Parent Issue: HDFS-17366 (was: HDFS-17385) > [FGL] ComputeFileSize in INodeFile should avoid BM lock > --- > > Key: HDFS-17481 > URL: https://issues.apache.org/jira/browse/HDFS-17481 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > > GetFileInfo is a very commonly used operation. The result FileStatus contains > the file size, which is computed by computeFileSize in INodeFile. > > Since it involves the state of blocks, it needs to hold the BM read lock. But > holding the BM read lock will impact performance, since getFileInfo is a > very commonly used RPC. > > As we all know, the block size may be changed only by the following operations: > # fsync > # updatePipeline > # commit block > # commitBlockSynchronization > # truncate > # updates block in FSEditLogLoader > # forceCompleteBlock in FSEditLogLoader > > All of the above operations are protected by the directory tree or global > write lock, so computeFileSize can be handled without holding the BM lock. > But the BR and IBR may change the block status from COMMITTED to COMPLETE even > though the block size is not changed. So computeFileSize cannot use > isComplete to compute the size of the last block. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
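As a rough illustration of the idea above (simplified, hypothetical types rather than the real INodeFile/BlockInfo classes), the size can be derived purely from the per-block lengths stored in the namespace, without taking the BM lock:
{code:java}
// Hedged sketch: sums stored per-block lengths without consulting BlockManager
// state, on the assumption that the last block's length only changes under the
// namespace (directory tree / global) write lock.
import java.util.Arrays;
import java.util.List;

public class ComputeFileSizeSketch {
  static final class BlockStub {
    final long numBytes;
    BlockStub(long numBytes) { this.numBytes = numBytes; }
  }

  static long computeFileSize(List<BlockStub> blocks) {
    long size = 0;
    for (BlockStub b : blocks) {
      size += b.numBytes; // rely on the stored length, not on COMMITTED/COMPLETE state
    }
    return size;
  }

  public static void main(String[] args) {
    List<BlockStub> blocks = Arrays.asList(new BlockStub(134217728L), new BlockStub(42L));
    System.out.println(computeFileSize(blocks)); // 134217770
  }
}
{code}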
[jira] [Commented] (HDFS-17509) RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file.
[ https://issues.apache.org/jira/browse/HDFS-17509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844943#comment-17844943 ] ASF GitHub Bot commented on HDFS-17509: --- ZanderXu commented on code in PR #6784: URL: https://github.com/apache/hadoop/pull/6784#discussion_r1595242159 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java: ## @@ -667,39 +667,28 @@ public void rename2(final String src, final String dst, public void concat(String trg, String[] src) throws IOException { rpcServer.checkOperation(NameNode.OperationCategory.WRITE); -// See if the src and target files are all in the same namespace -LocatedBlocks targetBlocks = getBlockLocations(trg, 0, 1); -if (targetBlocks == null) { - throw new IOException("Cannot locate blocks for target file - " + trg); -} -LocatedBlock lastLocatedBlock = targetBlocks.getLastLocatedBlock(); -String targetBlockPoolId = lastLocatedBlock.getBlock().getBlockPoolId(); -for (String source : src) { - LocatedBlocks sourceBlocks = getBlockLocations(source, 0, 1); - if (sourceBlocks == null) { -throw new IOException( -"Cannot located blocks for source file " + source); - } - String sourceBlockPoolId = - sourceBlocks.getLastLocatedBlock().getBlock().getBlockPoolId(); - if (!sourceBlockPoolId.equals(targetBlockPoolId)) { -throw new IOException("Cannot concatenate source file " + source -+ " because it is located in a different namespace" -+ " with block pool id " + sourceBlockPoolId -+ " from the target file with block pool id " -+ targetBlockPoolId); - } -} +// Concat only effects when all files in same namespace. +// And in router view, a file only exists in one RemoteLocation. Review Comment: Normally a file only exists in one namespace. But multiple namespaces may contain this file, and RBF returns the file in the first namespace to the client. This first namespace is determined by the `OrderedResolver`. You can refer to this case: - /path mounts to NS1, NS2 and NS3 - NS2 and NS3 contains /path - `OrderedResolver` returns NS1, NS2 and NS3 For this case, we should proxy this `concat` to NS2 instead of NS1, right? > RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file. > -- > > Key: HDFS-17509 > URL: https://issues.apache.org/jira/browse/HDFS-17509 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Priority: Minor > Labels: pull-request-available > > hdfs dfs -concat /tmp/merge /tmp/t1 /tmp/t2 > When /tmp/merge is a empty file, this command will throw NPE via DFSRouter. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17509) RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file.
[ https://issues.apache.org/jira/browse/HDFS-17509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844940#comment-17844940 ] ASF GitHub Bot commented on HDFS-17509: --- ZanderXu commented on code in PR #6784: URL: https://github.com/apache/hadoop/pull/6784#discussion_r1595242159 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java: ## @@ -667,39 +667,28 @@ public void rename2(final String src, final String dst, public void concat(String trg, String[] src) throws IOException { rpcServer.checkOperation(NameNode.OperationCategory.WRITE); -// See if the src and target files are all in the same namespace -LocatedBlocks targetBlocks = getBlockLocations(trg, 0, 1); -if (targetBlocks == null) { - throw new IOException("Cannot locate blocks for target file - " + trg); -} -LocatedBlock lastLocatedBlock = targetBlocks.getLastLocatedBlock(); -String targetBlockPoolId = lastLocatedBlock.getBlock().getBlockPoolId(); -for (String source : src) { - LocatedBlocks sourceBlocks = getBlockLocations(source, 0, 1); - if (sourceBlocks == null) { -throw new IOException( -"Cannot located blocks for source file " + source); - } - String sourceBlockPoolId = - sourceBlocks.getLastLocatedBlock().getBlock().getBlockPoolId(); - if (!sourceBlockPoolId.equals(targetBlockPoolId)) { -throw new IOException("Cannot concatenate source file " + source -+ " because it is located in a different namespace" -+ " with block pool id " + sourceBlockPoolId -+ " from the target file with block pool id " -+ targetBlockPoolId); - } -} +// Concat only effects when all files in same namespace. +// And in router view, a file only exists in one RemoteLocation. Review Comment: Normally a file only exists in one namespace. But multiple namespaces may contain this file, and RBF returns the file in the first namespace to the client. ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java: ## @@ -667,39 +667,28 @@ public void rename2(final String src, final String dst, public void concat(String trg, String[] src) throws IOException { rpcServer.checkOperation(NameNode.OperationCategory.WRITE); -// See if the src and target files are all in the same namespace -LocatedBlocks targetBlocks = getBlockLocations(trg, 0, 1); -if (targetBlocks == null) { - throw new IOException("Cannot locate blocks for target file - " + trg); -} -LocatedBlock lastLocatedBlock = targetBlocks.getLastLocatedBlock(); -String targetBlockPoolId = lastLocatedBlock.getBlock().getBlockPoolId(); -for (String source : src) { - LocatedBlocks sourceBlocks = getBlockLocations(source, 0, 1); - if (sourceBlocks == null) { -throw new IOException( -"Cannot located blocks for source file " + source); - } - String sourceBlockPoolId = - sourceBlocks.getLastLocatedBlock().getBlock().getBlockPoolId(); - if (!sourceBlockPoolId.equals(targetBlockPoolId)) { -throw new IOException("Cannot concatenate source file " + source -+ " because it is located in a different namespace" -+ " with block pool id " + sourceBlockPoolId -+ " from the target file with block pool id " -+ targetBlockPoolId); - } -} +// Concat only effects when all files in same namespace. +// And in router view, a file only exists in one RemoteLocation. Review Comment: For the empty file, maybe we can use `getFileInfo` to get the namespace that this trg belongs to. 
## hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRpc.java: ## @@ -1224,6 +1224,17 @@ public void testProxyConcatFile() throws Exception { String badPath = "/unknownlocation/unknowndir"; compareResponses(routerProtocol, nnProtocol, m, new Object[] {badPath, new String[] {routerFile}}); + +// Test when concat trg is a empty file Review Comment: we also need to check the empty source file. > RBF: Fix ClientProtocol.concat will throw NPE if tgr is a empty file. > -- > > Key: HDFS-17509 > URL: https://issues.apache.org/jira/browse/HDFS-17509 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Priority: Minor > Labels: pull-request-available > > hdfs dfs -concat /tmp/merge /tmp/t1 /tmp/t2 > When /tmp/merge is a empty file, this command will throw NPE via DFSRouter. > > -- This message was sent by Atla
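A hedged sketch of the idea discussed above (simplified stand-in types, not the actual RouterClientProtocol fix): resolve the target's namespace through a metadata lookup rather than getBlockLocations, so an empty target file no longer yields a null LocatedBlocks and an NPE.
{code:java}
// Hypothetical sketch: all files in a concat must resolve to the same nameservice,
// and the lookup works for empty files because it does not rely on block locations.
import java.io.IOException;
import java.util.Map;

public class ConcatTargetResolutionSketch {
  interface NamespaceResolver {
    /** Returns the single nameservice that actually stores the file, or null if absent. */
    String resolveNameservice(String path) throws IOException;
  }

  static String namespaceForConcat(String trg, String[] srcs, NamespaceResolver resolver)
      throws IOException {
    String targetNs = resolver.resolveNameservice(trg); // metadata lookup, valid for empty files
    if (targetNs == null) {
      throw new IOException("Cannot find target file " + trg + " in any namespace");
    }
    for (String src : srcs) {
      String srcNs = resolver.resolveNameservice(src);
      if (!targetNs.equals(srcNs)) {
        throw new IOException("Cannot concatenate " + src + " (namespace " + srcNs
            + ") with target in namespace " + targetNs);
      }
    }
    return targetNs; // proxy the concat RPC to this single nameservice
  }

  public static void main(String[] args) throws IOException {
    Map<String, String> locations = Map.of("/tmp/merge", "NS1", "/tmp/t1", "NS1", "/tmp/t2", "NS1");
    NamespaceResolver resolver = locations::get;
    System.out.println(namespaceForConcat("/tmp/merge", new String[]{"/tmp/t1", "/tmp/t2"}, resolver));
  }
}
{code}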
[jira] [Commented] (HDFS-17515) Erasure Coding: ErasureCodingWork is not effectively limited during a block reconstruction cycle.
[ https://issues.apache.org/jira/browse/HDFS-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844927#comment-17844927 ] ASF GitHub Bot commented on HDFS-17515: --- zhengchenyu commented on PR #6805: URL: https://github.com/apache/hadoop/pull/6805#issuecomment-2102237952 Not ready for review! Wait for HDFS-17516! > Erasure Coding: ErasureCodingWork is not effectively limited during a block > reconstruction cycle. > - > > Key: HDFS-17515 > URL: https://issues.apache.org/jira/browse/HDFS-17515 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Chenyu Zheng >Assignee: Chenyu Zheng >Priority: Major > Labels: pull-request-available > > In a block reconstruction cycle, ErasureCodingWork is not effectively > limited. I added some debug logging that logs whenever ecBlocksToBeReplicated is an integer > multiple of 100. > {code:java} > 2024-05-09 10:46:06,986 DEBUG > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManagerZCY: > ecBlocksToBeReplicated for IP:PORT already have 100 blocks > 2024-05-09 10:46:06,987 DEBUG > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManagerZCY: > ecBlocksToBeReplicated for IP:PORT already have 200 blocks > ... > 2024-05-09 10:46:06,992 DEBUG > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManagerZCY: > ecBlocksToBeReplicated for IP:PORT already have 2000 blocks > 2024-05-09 10:46:06,992 DEBUG > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManagerZCY: > ecBlocksToBeReplicated for IP:PORT already have 2100 blocks {code} > During a block reconstruction cycle, ecBlocksToBeReplicated increases from 0 > to 2100, which is much larger than replicationStreamsHardLimit. This brings > unfairness and leads to a greater tendency to copy EC blocks. > In fact, for non-EC blocks this is not a problem: > pendingReplicationWithoutTargets increases when work is scheduled, and when > pendingReplicationWithoutTargets is too large, no more work will be scheduled for this > node. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17515) Erasure Coding: ErasureCodingWork is not effectively limited during a block reconstruction cycle.
[ https://issues.apache.org/jira/browse/HDFS-17515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844928#comment-17844928 ] ASF GitHub Bot commented on HDFS-17515: --- zhengchenyu closed pull request #6805: HDFS-17515. Erasure Coding: ErasureCodingWork is not effectively limi… URL: https://github.com/apache/hadoop/pull/6805 > Erasure Coding: ErasureCodingWork is not effectively limited during a block > reconstruction cycle. > - > > Key: HDFS-17515 > URL: https://issues.apache.org/jira/browse/HDFS-17515 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Chenyu Zheng >Assignee: Chenyu Zheng >Priority: Major > Labels: pull-request-available > > In a block reconstruction cycle, ErasureCodingWork is not effectively > limited. I added some debug logging that logs whenever ecBlocksToBeReplicated is an integer > multiple of 100. > {code:java} > 2024-05-09 10:46:06,986 DEBUG > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManagerZCY: > ecBlocksToBeReplicated for IP:PORT already have 100 blocks > 2024-05-09 10:46:06,987 DEBUG > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManagerZCY: > ecBlocksToBeReplicated for IP:PORT already have 200 blocks > ... > 2024-05-09 10:46:06,992 DEBUG > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManagerZCY: > ecBlocksToBeReplicated for IP:PORT already have 2000 blocks > 2024-05-09 10:46:06,992 DEBUG > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManagerZCY: > ecBlocksToBeReplicated for IP:PORT already have 2100 blocks {code} > During a block reconstruction cycle, ecBlocksToBeReplicated increases from 0 > to 2100, which is much larger than replicationStreamsHardLimit. This brings > unfairness and leads to a greater tendency to copy EC blocks. > In fact, for non-EC blocks this is not a problem: > pendingReplicationWithoutTargets increases when work is scheduled, and when > pendingReplicationWithoutTargets is too large, no more work will be scheduled for this > node. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
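As an illustration of the limiting the issue asks for (hypothetical fields, not the BlockManager patch), a per-node cap analogous to replicationStreamsHardLimit could stop EC tasks from piling up within one reconstruction cycle:
{code:java}
// Hypothetical sketch: cap the EC reconstruction tasks queued on one source node.
public class EcWorkLimitSketch {
  private final int replicationStreamsHardLimit;
  private int ecBlocksToBeReplicated; // EC tasks already queued for this datanode

  EcWorkLimitSketch(int hardLimit) {
    this.replicationStreamsHardLimit = hardLimit;
  }

  boolean tryAddEcReplicationTask() {
    if (ecBlocksToBeReplicated >= replicationStreamsHardLimit) {
      return false; // skip this node instead of queueing work far past the limit
    }
    ecBlocksToBeReplicated++;
    return true;
  }

  public static void main(String[] args) {
    EcWorkLimitSketch node = new EcWorkLimitSketch(4);
    int accepted = 0;
    for (int i = 0; i < 2100; i++) {
      if (node.tryAddEcReplicationTask()) {
        accepted++;
      }
    }
    System.out.println(accepted); // 4, not 2100
  }
}
{code}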
[jira] [Updated] (HDFS-17506) [FGL] Performance for phase 1
[ https://issues.apache.org/jira/browse/HDFS-17506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-17506: -- Labels: pull-request-available (was: ) > [FGL] Performance for phase 1 > - > > Key: HDFS-17506 > URL: https://issues.apache.org/jira/browse/HDFS-17506 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Priority: Major > Labels: pull-request-available > > Do some benchmark testing for phase 1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17516) Erasure Coding: Some reconstruction blocks and metrics are inaccuracy when decommission DN which contains many EC blocks.
[ https://issues.apache.org/jira/browse/HDFS-17516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chenyu Zheng updated HDFS-17516: Description: When decommission DN which contains many EC blocks, this DN will mark as busy by scheduleReconstruction, then ErasureCodingWork::addTaskToDatanode will not generate any block to ecBlocksToBeReplicated. Although no DNA_TRANSFER BlockCommand will be generated for this block, pendingReconstruction and neededReconstruction are still updated, and blockmanager mistakenly believes that the block is being copied. The periodic increases of Metrics `fs_namesystem_num_timed_out_pending_reconstructions` and `fs_namesystem_under_replicated_blocks` also prove this. In fact, many blocks are not actually copied. These blocks are re-added to neededReconstruction until they time out. !截屏2024-05-09 下午3.59.44.png|width=470,height=160!!截屏2024-05-09 下午3.59.22.png|width=465,height=160! was: When decommission DN which contains many EC blocks, this DN will mark as busy by scheduleReconstruction, then ErasureCodingWork::addTaskToDatanode will not generate any block to ecBlocksToBeReplicated. Although no DNA_TRANSFER BlockCommand will be generated for this block, pendingReconstruction and neededReconstruction are still updated, and blockmanager mistakenly believes that the block is being copied. The periodic increases of Metrics `fs_namesystem_num_timed_out_pending_reconstructions` and `fs_namesystem_under_replicated_blocks` also prove this. In fact, many blocks are not actually copied. These blocks are re-added to neededReconstruction until they time out. !截屏2024-05-09 下午3.59.44.png|width=470,height=160!!截屏2024-05-09 下午3.59.22.png|width=465,height=160! > Erasure Coding: Some reconstruction blocks and metrics are inaccuracy when > decommission DN which contains many EC blocks. > -- > > Key: HDFS-17516 > URL: https://issues.apache.org/jira/browse/HDFS-17516 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Chenyu Zheng >Assignee: Chenyu Zheng >Priority: Major > Attachments: 截屏2024-05-09 下午3.59.22.png, 截屏2024-05-09 下午3.59.44.png > > > When decommission DN which contains many EC blocks, this DN will mark as > busy by scheduleReconstruction, then ErasureCodingWork::addTaskToDatanode > will not generate any block to ecBlocksToBeReplicated. > Although no DNA_TRANSFER BlockCommand will be generated for this block, > pendingReconstruction and neededReconstruction are still updated, and > blockmanager mistakenly believes that the block is being copied. > The periodic increases of Metrics > `fs_namesystem_num_timed_out_pending_reconstructions` and > `fs_namesystem_under_replicated_blocks` also prove this. In fact, many blocks > are not actually copied. These blocks are re-added to neededReconstruction > until they time out. > !截屏2024-05-09 下午3.59.44.png|width=470,height=160!!截屏2024-05-09 > 下午3.59.22.png|width=465,height=160! > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17516) Erasure Coding: Some reconstruction blocks and metrics are inaccuracy when decommission DN which contains many EC blocks.
[ https://issues.apache.org/jira/browse/HDFS-17516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chenyu Zheng updated HDFS-17516: Description: When decommission DN which contains many EC blocks, this DN will mark as busy by scheduleReconstruction, then ErasureCodingWork::addTaskToDatanode will not generate any block to ecBlocksToBeReplicated. Although no DNA_TRANSFER BlockCommand will be generated for this block, pendingReconstruction and neededReconstruction are still updated, and blockmanager mistakenly believes that the block is being copied. The periodic increases of Metrics `fs_namesystem_num_timed_out_pending_reconstructions` and `fs_namesystem_under_replicated_blocks` also prove this. In fact, many blocks are not actually copied. These blocks are re-added to neededReconstruction until they time out. !截屏2024-05-09 下午3.59.44.png|width=470,height=160!!截屏2024-05-09 下午3.59.22.png|width=465,height=160! was: When decommission DN which contains many EC blocks, this DN will mark as busy by scheduleReconstruction, then ErasureCodingWork::addTaskToDatanode will not generate any block to ecBlocksToBeReplicated. Although no DNA_TRANSFER BlockCommand will be generated for this block, pendingReconstruction and neededReconstruction are still updated, and blockmanager mistakenly believes that the block is being copied. The periodic increases of Metrics `fs_namesystem_num_timed_out_pending_reconstructions` and `fs_namesystem_under_replicated_blocks` also prove this. In fact, many blocks are not actually copied. These blocks are re-added to neededReconstruction until they time out. !截屏2024-05-09 下午3.59.44.png!!截屏2024-05-09 下午3.59.22.png! > Erasure Coding: Some reconstruction blocks and metrics are inaccuracy when > decommission DN which contains many EC blocks. > -- > > Key: HDFS-17516 > URL: https://issues.apache.org/jira/browse/HDFS-17516 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Chenyu Zheng >Assignee: Chenyu Zheng >Priority: Major > Attachments: 截屏2024-05-09 下午3.59.22.png, 截屏2024-05-09 下午3.59.44.png > > > When decommission DN which contains many EC blocks, this DN will mark as > busy by > scheduleReconstruction, then ErasureCodingWork::addTaskToDatanode will not > generate any block to ecBlocksToBeReplicated. > Although no DNA_TRANSFER BlockCommand will be generated for this block, > pendingReconstruction and neededReconstruction are still updated, and > blockmanager mistakenly believes that the block is being copied. > The periodic increases of Metrics > `fs_namesystem_num_timed_out_pending_reconstructions` and > `fs_namesystem_under_replicated_blocks` also prove this. In fact, many blocks > are not actually copied. These blocks are re-added to neededReconstruction > until they time out. > !截屏2024-05-09 下午3.59.44.png|width=470,height=160!!截屏2024-05-09 > 下午3.59.22.png|width=465,height=160! > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17516) Erasure Coding: Some reconstruction blocks and metrics are inaccuracy when decommission DN which contains many EC blocks.
Chenyu Zheng created HDFS-17516: --- Summary: Erasure Coding: Some reconstruction blocks and metrics are inaccuracy when decommission DN which contains many EC blocks. Key: HDFS-17516 URL: https://issues.apache.org/jira/browse/HDFS-17516 Project: Hadoop HDFS Issue Type: Improvement Reporter: Chenyu Zheng Assignee: Chenyu Zheng Attachments: 截屏2024-05-09 下午3.59.22.png, 截屏2024-05-09 下午3.59.44.png When decommissioning a DN which contains many EC blocks, this DN will be marked as busy by scheduleReconstruction, and then ErasureCodingWork::addTaskToDatanode will not generate any block for ecBlocksToBeReplicated. Although no DNA_TRANSFER BlockCommand will be generated for this block, pendingReconstruction and neededReconstruction are still updated, and the BlockManager mistakenly believes that the block is being copied. The periodic increases of the metrics `fs_namesystem_num_timed_out_pending_reconstructions` and `fs_namesystem_under_replicated_blocks` also show this. In fact, many blocks are not actually copied. These blocks are only re-added to neededReconstruction after they time out. !截屏2024-05-09 下午3.59.44.png!!截屏2024-05-09 下午3.59.22.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
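A hedged sketch of the accounting problem described above (hypothetical names, not the actual ErasureCodingWork/BlockManager code): a block should only be counted as pending when a reconstruction task was really handed to a datanode.
{code:java}
// Hypothetical sketch: a busy source node should leave the block schedulable in
// neededReconstruction instead of marking it pending and letting it time out.
import java.util.ArrayDeque;
import java.util.Queue;

public class EcReconstructionAccountingSketch {
  private final Queue<String> neededReconstruction = new ArrayDeque<>();
  private final Queue<String> pendingReconstruction = new ArrayDeque<>();

  void schedule(String block, boolean sourceNodeBusy) {
    boolean taskGenerated = !sourceNodeBusy; // addTaskToDatanode() yields nothing for busy nodes
    if (taskGenerated) {
      pendingReconstruction.add(block);      // the block really is being reconstructed
    } else {
      neededReconstruction.add(block);       // keep it schedulable instead of "pending"
    }
  }

  public static void main(String[] args) {
    EcReconstructionAccountingSketch bm = new EcReconstructionAccountingSketch();
    bm.schedule("blk_-9223372036854775792_1001", true);   // busy source: stays schedulable
    bm.schedule("blk_-9223372036854775784_1002", false);  // task generated: now pending
    System.out.println(bm.neededReconstruction.size() + " needed, "
        + bm.pendingReconstruction.size() + " pending");
  }
}
{code}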
[jira] [Assigned] (HDFS-17402) StartupSafeMode should not exit when resources are from low to available
[ https://issues.apache.org/jira/browse/HDFS-17402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zilong Zhu reassigned HDFS-17402: - Assignee: Zilong Zhu > StartupSafeMode should not exit when resources are from low to available > > > Key: HDFS-17402 > URL: https://issues.apache.org/jira/browse/HDFS-17402 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zilong Zhu >Assignee: Zilong Zhu >Priority: Major > Labels: pull-request-available > > After HDFS-17231, the NameNode can exit safe mode automatically when resources recover > from low to available. It uses > org.apache.hadoop.hdfs.server.namenode.FSNamesystem#leaveSafeMode, and this > function will change the BMSafeModeStatus. However, when the NameNode enters resource-low > safe mode, it does not change the BMSafeModeStatus in > org.apache.hadoop.hdfs.server.namenode.FSNamesystem#enterSafeMode. The two paths are > not symmetric. > Now: > a. NN enters StartupSafeMode > b. NN enters ResourceLowSafeMode > c. NN resources recover from low to available > d. NN safe mode off > > Expectations: > a. NN enters StartupSafeMode > b. NN enters ResourceLowSafeMode > c. NN resources recover from low to available > d. NN exits ResourceLowSafeMode but stays in StartupSafeMode -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
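A minimal sketch of the expected behaviour (hypothetical fields, not FSNamesystem itself): recovering resources should clear only the resource-low condition, leaving startup safe mode in place until block thresholds are met.
{code:java}
// Hypothetical sketch: the two safe-mode conditions are tracked independently.
public class SafeModeSketch {
  private boolean inStartupSafeMode = true;
  private boolean inResourceLowSafeMode = true;

  void onResourcesAvailable() {
    // Exit only the resource-low condition; startup safe mode ends separately,
    // once block reports reach the configured threshold.
    inResourceLowSafeMode = false;
  }

  boolean isInSafeMode() {
    return inStartupSafeMode || inResourceLowSafeMode;
  }

  public static void main(String[] args) {
    SafeModeSketch nn = new SafeModeSketch();
    nn.onResourcesAvailable();
    System.out.println(nn.isInSafeMode()); // true: still in startup safe mode
  }
}
{code}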
[jira] [Commented] (HDFS-17503) Unreleased volume references because of OOM
[ https://issues.apache.org/jira/browse/HDFS-17503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844895#comment-17844895 ] ASF GitHub Bot commented on HDFS-17503: --- zhuzilong2013 commented on PR #6782: URL: https://github.com/apache/hadoop/pull/6782#issuecomment-2102119137 @Hexiaoqiao Hi~ sir. Could you please help me review this PR when you are free? Thanks. > Unreleased volume references because of OOM > --- > > Key: HDFS-17503 > URL: https://issues.apache.org/jira/browse/HDFS-17503 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zilong Zhu >Assignee: Zilong Zhu >Priority: Major > Labels: pull-request-available > > When BlockSender throws an error because of OOM, the volume reference obtained > by the thread is not released, which causes the thread trying to remove the > volume to wait and fall into an infinite loop. > I found that HDFS-15963 catches the exception and releases the volume reference, but it did > not handle the case where an Error is thrown. I think "catch (Throwable t)" should > be used instead of "catch (IOException ioe)". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
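A hedged sketch of the pattern the issue argues for (simplified, not the actual BlockSender code): widening the catch from IOException to Throwable so the volume reference is released even when an Error such as OutOfMemoryError is thrown.
{code:java}
// Hypothetical sketch: the volume reference is released on any failure path,
// including Errors; on the normal path it is released after the send completes
// (not shown here).
import java.io.Closeable;
import java.io.IOException;

public class VolumeRefSketch {
  static void sendBlock(Closeable volumeRef) throws IOException {
    try {
      // ... read and send block data; may throw an IOException or an Error (e.g. OOM) ...
    } catch (Throwable t) {              // widened from IOException
      volumeRef.close();                 // release the reference before propagating
      if (t instanceof Error) {
        throw (Error) t;                 // let fatal errors keep propagating as-is
      }
      if (t instanceof IOException) {
        throw (IOException) t;
      }
      throw new IOException("Block sending failed", t);
    }
  }
}
{code}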
[jira] [Commented] (HDFS-17504) DN process should exit when BPServiceActor exit
[ https://issues.apache.org/jira/browse/HDFS-17504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844894#comment-17844894 ] ASF GitHub Bot commented on HDFS-17504: --- zhuzilong2013 commented on PR #6792: URL: https://github.com/apache/hadoop/pull/6792#issuecomment-2102118561 @Hexiaoqiao Hi~ sir. Could you please help me review this PR when you are free? Thanks. > DN process should exit when BPServiceActor exit > --- > > Key: HDFS-17504 > URL: https://issues.apache.org/jira/browse/HDFS-17504 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zilong Zhu >Assignee: Zilong Zhu >Priority: Major > Labels: pull-request-available > > BPServiceActor is a very important thread. In a non-HA cluster, the exit of > the BPServiceActor thread will cause the DN process to exit. However, in an HA > cluster, this is not the case. > I found that HDFS-15651 causes the BPServiceActor thread to exit and sets the > "runningState" from "RunningState.FAILED" to "RunningState.EXITED", which can > be confusing during troubleshooting. > I believe that the DN process should exit when the flag of the BPServiceActor > is set to RunningState.FAILED, because at this point the DN is unable to > recover and establish a heartbeat connection with the ANN on its own. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
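For illustration only (hypothetical names, not the DataNode code): the behaviour requested above is that an actor ending in FAILED, unlike a deliberate EXITED, should terminate the whole DataNode process, since it cannot re-register with the active NameNode on its own.
{code:java}
// Hypothetical sketch of the requested exit behaviour for a failed actor.
public class ActorExitSketch {
  enum RunningState { CONNECTING, RUNNING, FAILED, EXITED }

  static void onActorFinished(RunningState finalState) {
    if (finalState == RunningState.FAILED) {
      System.err.println("BPServiceActor failed irrecoverably; terminating DataNode");
      // System.exit(1);  // kept commented so the sketch stays side-effect free
    }
  }

  public static void main(String[] args) {
    onActorFinished(RunningState.EXITED);  // normal shutdown: nothing happens
    onActorFinished(RunningState.FAILED);  // would terminate the process
  }
}
{code}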
[jira] (HDFS-16954) RBF: The operation of renaming a multi-subcluster directory to a single-cluster directory should throw ioexception
[ https://issues.apache.org/jira/browse/HDFS-16954 ] chuanjie.duan deleted comment on HDFS-16954: -- was (Author: chuanjie.duan): if srcdir and distdir are same (single mount point or mult mount point), is that allowd to rename? > RBF: The operation of renaming a multi-subcluster directory to a > single-cluster directory should throw ioexception > -- > > Key: HDFS-16954 > URL: https://issues.apache.org/jira/browse/HDFS-16954 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.4.0 >Reporter: Max Xie >Assignee: Max Xie >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > The operation of renaming a multi-subcluster directory to a single-cluster > directory may cause inconsistent behavior of the file system. This operation > should throw exception to be reasonable. > Examples are as follows: > 1. add hash_all mount point `hdfs dfsrouteradmin -add /tmp/foo > subcluster1,subcluster2 /tmp/foo -order HASH_ALL` > 2. add mount point `hdfs dfsrouteradmin -add /user/foo subcluster1 > /user/foo` > 3. mkdir dir for all subcluster. ` hdfs dfs -mkdir /tmp/foo/123 ` > 4. check dir and all subclusters will have dir `/tmp/foo/123` > `hdfs dfs -ls /tmp/foo/` : will show dir `/tmp/foo/123`; > `hdfs dfs -ls hdfs://subcluster1/tmp/foo/` : will show dir > `hdfs://subcluster1/tmp/foo/123`; > `hdfs dfs -ls hdfs://subcluster2/tmp/foo/` : will show dir > `hdfs://subcluster2/tmp/foo/123`; > 5. rename `/tmp/foo/123` to `/user/foo/123`. The op will succeed. `hdfs dfs > -mv /tmp/foo/123 /user/foo/123 ` > 6. check dir again, rbf cluster still show dir `/tmp/foo/123` > `hdfs dfs -ls /tmp/foo/` : will show dir `/tmp/foo/123`; > `hdfs dfs -ls hdfs://subcluster1/tmp/foo/` : will no dirs; > `hdfs dfs -ls hdfs://subcluster2/tmp/foo/` : will show dir > `hdfs://subcluster2/tmp/foo/123`; > The step 5 should throw exception. > > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-16954) RBF: The operation of renaming a multi-subcluster directory to a single-cluster directory should throw ioexception
[ https://issues.apache.org/jira/browse/HDFS-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844893#comment-17844893 ] chuanjie.duan edited comment on HDFS-16954 at 5/9/24 7:32 AM: -- if srcdir and distdir are same (single mount point or mult mount point), is that allowd to rename? was (Author: chuanjie.duan): if srcdir and distdir is same (single mount point or mult mount point), is that allowd to rename? > RBF: The operation of renaming a multi-subcluster directory to a > single-cluster directory should throw ioexception > -- > > Key: HDFS-16954 > URL: https://issues.apache.org/jira/browse/HDFS-16954 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.4.0 >Reporter: Max Xie >Assignee: Max Xie >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > The operation of renaming a multi-subcluster directory to a single-cluster > directory may cause inconsistent behavior of the file system. This operation > should throw exception to be reasonable. > Examples are as follows: > 1. add hash_all mount point `hdfs dfsrouteradmin -add /tmp/foo > subcluster1,subcluster2 /tmp/foo -order HASH_ALL` > 2. add mount point `hdfs dfsrouteradmin -add /user/foo subcluster1 > /user/foo` > 3. mkdir dir for all subcluster. ` hdfs dfs -mkdir /tmp/foo/123 ` > 4. check dir and all subclusters will have dir `/tmp/foo/123` > `hdfs dfs -ls /tmp/foo/` : will show dir `/tmp/foo/123`; > `hdfs dfs -ls hdfs://subcluster1/tmp/foo/` : will show dir > `hdfs://subcluster1/tmp/foo/123`; > `hdfs dfs -ls hdfs://subcluster2/tmp/foo/` : will show dir > `hdfs://subcluster2/tmp/foo/123`; > 5. rename `/tmp/foo/123` to `/user/foo/123`. The op will succeed. `hdfs dfs > -mv /tmp/foo/123 /user/foo/123 ` > 6. check dir again, rbf cluster still show dir `/tmp/foo/123` > `hdfs dfs -ls /tmp/foo/` : will show dir `/tmp/foo/123`; > `hdfs dfs -ls hdfs://subcluster1/tmp/foo/` : will no dirs; > `hdfs dfs -ls hdfs://subcluster2/tmp/foo/` : will show dir > `hdfs://subcluster2/tmp/foo/123`; > The step 5 should throw exception. > > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16954) RBF: The operation of renaming a multi-subcluster directory to a single-cluster directory should throw ioexception
[ https://issues.apache.org/jira/browse/HDFS-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844893#comment-17844893 ] chuanjie.duan commented on HDFS-16954: -- if srcdir and distdir is same (single mount point or mult mount point), is that allowd to rename? > RBF: The operation of renaming a multi-subcluster directory to a > single-cluster directory should throw ioexception > -- > > Key: HDFS-16954 > URL: https://issues.apache.org/jira/browse/HDFS-16954 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.4.0 >Reporter: Max Xie >Assignee: Max Xie >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > The operation of renaming a multi-subcluster directory to a single-cluster > directory may cause inconsistent behavior of the file system. This operation > should throw exception to be reasonable. > Examples are as follows: > 1. add hash_all mount point `hdfs dfsrouteradmin -add /tmp/foo > subcluster1,subcluster2 /tmp/foo -order HASH_ALL` > 2. add mount point `hdfs dfsrouteradmin -add /user/foo subcluster1 > /user/foo` > 3. mkdir dir for all subcluster. ` hdfs dfs -mkdir /tmp/foo/123 ` > 4. check dir and all subclusters will have dir `/tmp/foo/123` > `hdfs dfs -ls /tmp/foo/` : will show dir `/tmp/foo/123`; > `hdfs dfs -ls hdfs://subcluster1/tmp/foo/` : will show dir > `hdfs://subcluster1/tmp/foo/123`; > `hdfs dfs -ls hdfs://subcluster2/tmp/foo/` : will show dir > `hdfs://subcluster2/tmp/foo/123`; > 5. rename `/tmp/foo/123` to `/user/foo/123`. The op will succeed. `hdfs dfs > -mv /tmp/foo/123 /user/foo/123 ` > 6. check dir again, rbf cluster still show dir `/tmp/foo/123` > `hdfs dfs -ls /tmp/foo/` : will show dir `/tmp/foo/123`; > `hdfs dfs -ls hdfs://subcluster1/tmp/foo/` : will no dirs; > `hdfs dfs -ls hdfs://subcluster2/tmp/foo/` : will show dir > `hdfs://subcluster2/tmp/foo/123`; > The step 5 should throw exception. > > > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org