[jira] [Updated] (HDFS-17325) Doc: Fix the documentation of fs expunge command in FileSystemShell.md

2024-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17325:
--
Labels: pull-request-available  (was: )

> Doc: Fix the documentation of fs expunge command in FileSystemShell.md
> --
>
> Key: HDFS-17325
> URL: https://issues.apache.org/jira/browse/HDFS-17325
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Commented] (HDFS-17325) Doc: Fix the documentation of fs expunge command in FileSystemShell.md

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803445#comment-17803445
 ] 

ASF GitHub Bot commented on HDFS-17325:
---

LiuGuH opened a new pull request, #6413:
URL: https://github.com/apache/hadoop/pull/6413

   
   
   ### Description of PR
   The documented command `hadoop fs -expunge --immediate` should be
   `hadoop fs -expunge -immediate`; the -immediate option takes a single dash. A
   corrected usage sketch follows.
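
   For reference, a hedged sketch of the corrected usage (the [-immediate] and
   [-fs <path>] options are as documented in FileSystemShell.md; only the
   single-dash form of -immediate is the fix here):

       # checkpoint the trash and delete checkpoints older than fs.trash.interval
       hadoop fs -expunge

       # delete files in the trash immediately, skipping the checkpoint wait
       hadoop fs -expunge -immediate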




> Doc: Fix the documentation of fs expunge command in FileSystemShell.md
> --
>
> Key: HDFS-17325
> URL: https://issues.apache.org/jira/browse/HDFS-17325
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Priority: Major
>







[jira] [Commented] (HDFS-17317) DebugAdmin metaOut does not need multiple close

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803442#comment-17803442
 ] 

ASF GitHub Bot commented on HDFS-17317:
---

xuzifu666 commented on PR #6402:
URL: https://github.com/apache/hadoop/pull/6402#issuecomment-1878265631

   > there is some release branching going on; let me circle back once that is 
   > done and I will merge this
   
   OK, the other release-branch testing does not seem to be part of the GitHub 
CI process




> DebugAdmin metaOut does not need multiple close
> ---
>
> Key: HDFS-17317
> URL: https://issues.apache.org/jira/browse/HDFS-17317
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: xy
>Priority: Major
>  Labels: pull-request-available
>
> DebugAdmin metaOut does not need multiple close
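
For context, a minimal sketch of the single-close idiom the issue title refers
to (the stream name metaOut follows the DebugAdmin context, but the surrounding
method is illustrative, not the actual patch):

    import java.io.FileOutputStream;
    import java.io.IOException;

    public class SingleCloseSketch {
      public static void main(String[] args) throws IOException {
        // try-with-resources closes metaOut exactly once, even on error,
        // so no extra metaOut.close() calls are needed afterwards.
        try (FileOutputStream metaOut = new FileOutputStream("block.meta")) {
          metaOut.write(new byte[]{0, 1, 2, 3});
        }
      }
    }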






[jira] [Created] (HDFS-17325) Doc: Fix the documentation of fs expunge command in FileSystemShell.md

2024-01-04 Thread liuguanghua (Jira)
liuguanghua created HDFS-17325:
--

 Summary: Doc: Fix the documentation of fs expunge command in 
FileSystemShell.md
 Key: HDFS-17325
 URL: https://issues.apache.org/jira/browse/HDFS-17325
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: liuguanghua









[jira] [Assigned] (HDFS-17324) RBF: Router should not return nameservices that do not enable observer read in RpcResponseHeaderProto

2024-01-04 Thread liuguanghua (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liuguanghua reassigned HDFS-17324:
--

Assignee: liuguanghua

> RBF: Router should not return nameservices that do not enable observer read in 
> RpcResponseHeaderProto
> --
>
> Key: HDFS-17324
> URL: https://issues.apache.org/jira/browse/HDFS-17324
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>
> Router Observer Read is controlled by 
> RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY and 
> RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_OVERRIDES.
> If a nameservice is not enabled for observer read in the Router, the Router 
> should not return it in RpcResponseHeaderProto.






[jira] [Commented] (HDFS-17306) RBF: Router should not return nameservices that do not enable observer nodes in RpcResponseHeaderProto

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803438#comment-17803438
 ] 

ASF GitHub Bot commented on HDFS-17306:
---

LiuGuH commented on PR #6385:
URL: https://github.com/apache/hadoop/pull/6385#issuecomment-1878253176

   
   > > Thanks for the review. I will add a separate pull request for the 
   > > DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY and DFS_ROUTER_OBSERVER_READ_OVERRIDES 
   > > check.
   
   This is implemented in https://github.com/apache/hadoop/pull/6412. Thanks for 
the review, @simbadzina




> RBF: Router should not return nameservices that do not enable observer nodes 
> in RpcResponseHeaderProto
> ---
>
> Key: HDFS-17306
> URL: https://issues.apache.org/jira/browse/HDFS-17306
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> If a cluster has 3 nameservices (ns1, ns2, ns3), only ns1 has observer 
> nodes, and clients communicate with the NameNodes via DFSRouter.
> If DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY is enabled, the client will receive 
> all nameservices in RpcResponseHeaderProto.
> We should reduce the RPC response size by omitting nameservices that do not 
> enable observer nodes.






[jira] [Commented] (HDFS-17324) RBF: Router should not return nameservices that do not enable observer read in RpcResponseHeaderProto

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803437#comment-17803437
 ] 

ASF GitHub Bot commented on HDFS-17324:
---

LiuGuH opened a new pull request, #6412:
URL: https://github.com/apache/hadoop/pull/6412

   …ead in RpcResponseHeaderProto
   
   
   
   ### Description of PR
   Router Observer Read is controlled by 
RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY and 
RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_OVERRIDES.
   
   If a nameservice is not enabled for observer read in the Router, the Router 
should not return it in RpcResponseHeaderProto.
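
   For reference, a hedged sketch of how these two keys are typically set in
   hdfs-rbf-site.xml (the property names below are the usual string values of
   the RBFConfigKeys constants above; verify them against your Hadoop version):

       <property>
         <!-- default observer-read behavior for all nameservices -->
         <name>dfs.federation.router.observer.read.default</name>
         <value>true</value>
       </property>
       <property>
         <!-- comma-separated nameservices that override the default above -->
         <name>dfs.federation.router.observer.read.overrides</name>
         <value>ns1,ns2</value>
       </property>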
   
   




> RBF: Router should not return nameservices that do not enable observer read in 
> RpcResponseHeaderProto
> --
>
> Key: HDFS-17324
> URL: https://issues.apache.org/jira/browse/HDFS-17324
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Priority: Major
>
> Router Observer Read is controlled by 
> RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY and 
> RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_OVERRIDES.
> If a nameservice is not enabled for observer read in the Router, the Router 
> should not return it in RpcResponseHeaderProto.






[jira] [Updated] (HDFS-17324) RBF: Router should not return nameservices that do not enable observer read in RpcResponseHeaderProto

2024-01-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17324:
--
Labels: pull-request-available  (was: )

> RBF: Router should not return nameservices that do not enable observer read in 
> RpcResponseHeaderProto
> --
>
> Key: HDFS-17324
> URL: https://issues.apache.org/jira/browse/HDFS-17324
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>
> Router Observer Read is controlled by 
> RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY and 
> RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_OVERRIDES.
> If a nameservice is not enabled for observer read in the Router, the Router 
> should not return it in RpcResponseHeaderProto.






[jira] [Updated] (HDFS-17324) RBF: Router should not return nameservices that do not enable observer read in RpcResponseHeaderProto

2024-01-04 Thread liuguanghua (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liuguanghua updated HDFS-17324:
---
Description: 
{color:#172b4d}Router Observer Read is controled by 
RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY and 
RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_OVERRIDES.{color}

{color:#172b4d}If nameservice is not enable for observer read in Router, 
RpcResponseHeaderProto  in Router should not return it.{color}

  was:
{color:#172b4d}Router Observer Read is controled by 
RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY and 
RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_OVERRIDES.
{color}

{color:#172b4d}If nameservice is not enable for observer read in router, {color}


> RBF: Router should not return nameservices that do not enable observer read in 
> RpcResponseHeaderProto
> --
>
> Key: HDFS-17324
> URL: https://issues.apache.org/jira/browse/HDFS-17324
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Priority: Major
>
> Router Observer Read is controlled by 
> RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY and 
> RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_OVERRIDES.
> If a nameservice is not enabled for observer read in the Router, the Router 
> should not return it in RpcResponseHeaderProto.






[jira] [Updated] (HDFS-17324) RBF: Router should not return nameservices that do not enable observer read in RpcResponseHeaderProto

2024-01-04 Thread liuguanghua (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liuguanghua updated HDFS-17324:
---
Description: 
{color:#172b4d}Router Observer Read is controled by 
RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY and 
RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_OVERRIDES.
{color}

{color:#172b4d}If nameservice is not enable for observer read in router, {color}

> RBF: Router should not return nameservices that do not enable observer read in 
> RpcResponseHeaderProto
> --
>
> Key: HDFS-17324
> URL: https://issues.apache.org/jira/browse/HDFS-17324
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Priority: Major
>
> {color:#172b4d}Router Observer Read is controled by 
> RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY and 
> RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_OVERRIDES.
> {color}
> {color:#172b4d}If nameservice is not enable for observer read in router, 
> {color}






[jira] [Created] (HDFS-17324) RBF: Router should not return nameservices that do not enable observer read in RpcResponseHeaderProto

2024-01-04 Thread liuguanghua (Jira)
liuguanghua created HDFS-17324:
--

 Summary: RBF: Router should not return nameservices that do not 
enable observer read in RpcResponseHeaderProto
 Key: HDFS-17324
 URL: https://issues.apache.org/jira/browse/HDFS-17324
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: liuguanghua









[jira] [Commented] (HDFS-17317) DebugAdmin metaOut does not need multiple close

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803423#comment-17803423
 ] 

ASF GitHub Bot commented on HDFS-17317:
---

ayushtkn commented on PR #6402:
URL: https://github.com/apache/hadoop/pull/6402#issuecomment-1878194565

   there is some release branching going on; let me circle back once that is 
done and I will merge this




> DebugAdmin metaOut does not need multiple close
> ---
>
> Key: HDFS-17317
> URL: https://issues.apache.org/jira/browse/HDFS-17317
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: xy
>Priority: Major
>  Labels: pull-request-available
>
> DebugAdmin metaOut does not need multiple close






[jira] [Commented] (HDFS-17321) RBF: Add RouterAutoMsyncService for auto msync in Router

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803418#comment-17803418
 ] 

ASF GitHub Bot commented on HDFS-17321:
---

LiuGuH commented on code in PR #6404:
URL: https://github.com/apache/hadoop/pull/6404#discussion_r1442525470


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterAutoMsyncService.java:
##
@@ -0,0 +1,86 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.federation.router;
+
+import static org.apache.hadoop.hdfs.server.federation.FederationTestUtils.NAMENODES;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdfs.server.federation.MiniRouterDFSCluster;
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TestName;
+
+import java.io.IOException;
+
+/**
+ * Test the service that msyncs to all nameservices.
+ */
+public class TestRouterAutoMsyncService {
+
+  private static MiniRouterDFSCluster cluster;
+  private static Router router;
+  private static RouterAutoMsyncService service;
+  private static long msyncInterval = 1000;
+
+  @Rule
+  public TestName name = new TestName();
+
+  @BeforeClass
+  public static void globalSetUp() throws Exception {
+    Configuration conf = new Configuration();
+    conf.setBoolean(RBFConfigKeys.DFS_ROUTER_AUTO_MSYNC_ENABLE, true);
+    conf.setLong(RBFConfigKeys.DFS_ROUTER_AUTO_MSYNC_INTERVAL_MS, msyncInterval);
+    conf.setBoolean(RBFConfigKeys.DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY, true);
+
+    cluster = new MiniRouterDFSCluster(true, 1, conf);
+
+    // Start NNs and DNs and wait until ready
+    cluster.startCluster(conf);
+    cluster.startRouters();
+    cluster.waitClusterUp();
+
+    // Make one NameNode active per nameservice
+    if (cluster.isHighAvailability()) {
+      for (String ns : cluster.getNameservices()) {
+        cluster.switchToActive(ns, NAMENODES[0]);
+        cluster.switchToStandby(ns, NAMENODES[1]);
+      }
+    }
+    cluster.waitActiveNamespaces();
+
+    router = cluster.getRandomRouter().getRouter();
+    service = router.getRouterAutoMsyncService();
+  }
+
+  @AfterClass
+  public static void tearDown() throws IOException {
+    cluster.shutdown();
+    service.stop();
+    service.close();
+  }
+
+  @Test
+  public void testMsync() throws InterruptedException, IOException {
+    Thread.sleep(msyncInterval);

Review Comment:
   Changed to use GenericTestUtils. Thanks.
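
   For readers unfamiliar with it, a minimal sketch of the
   GenericTestUtils.waitFor pattern that replaces a raw Thread.sleep in Hadoop
   tests (the msyncCount probe is hypothetical, standing in for "the service
   has run at least once"):

       import java.util.concurrent.TimeoutException;
       import org.apache.hadoop.test.GenericTestUtils;

       public class WaitForSketch {
         static volatile int msyncCount = 1;  // hypothetical progress probe

         public static void main(String[] args)
             throws TimeoutException, InterruptedException {
           // Poll every 100 ms, up to 10 s; throws TimeoutException on failure
           // instead of hoping a fixed sleep was long enough.
           GenericTestUtils.waitFor(() -> msyncCount > 0, 100, 10_000);
         }
       }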





> RBF: Add RouterAutoMsyncService for auto msync in Router
> 
>
> Key: HDFS-17321
> URL: https://issues.apache.org/jira/browse/HDFS-17321
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>
> The Router should have the ability to auto-msync to a nameservice, ensuring 
> that the Router periodically refreshes its record of a namespace's state.
> Different from HDFS-17027, this is controlled by the Router itself, without 
> configuring AbstractNNFailoverProxyProvider.
> And HDFS-16890 may lead to many read requests hitting the active NN at the 
> same time.
> This PR provides a new way to implement auto msync in the Router (see the 
> sketch below).
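
A rough sketch of the periodic-service shape such a feature takes in the RBF
codebase (PeriodicService and periodicInvoke() exist in hadoop-hdfs-rbf, e.g.
in RouterHeartbeatService; the class name and msyncAll() helper here are
hypothetical, not the actual PR):

    import org.apache.hadoop.hdfs.server.federation.router.PeriodicService;

    public class AutoMsyncServiceSketch extends PeriodicService {

      public AutoMsyncServiceSketch(long intervalMs) {
        super(AutoMsyncServiceSketch.class.getSimpleName());
        setIntervalMs(intervalMs);  // how often to refresh namespace state
      }

      @Override
      protected void periodicInvoke() {
        msyncAll();  // hypothetical: msync every observer-read nameservice
      }

      private void msyncAll() {
        // would call into the Router's RPC client for each nameservice
      }
    }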






[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803403#comment-17803403
 ] 

ASF GitHub Bot commented on HDFS-17290:
---

simbadzina merged PR #6359:
URL: https://github.com/apache/hadoop/pull/6359




> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients back off when RPCs cannot be enqueued, but backoff can happen in 
> different scenarios. Currently there is no way to differentiate whether a 
> backoff happened due to lowest-priority-queue disconnection or due to 
> overflow from higher-priority queues while the connection between client and 
> namenode remains open; the IPC server just emits a single metric for all 
> backoffs.
> Example:
>  # Clients are enqueued directly into the lowest priority queue and back off 
> when that queue is full. Clients are expected to disconnect from the 
> namenode.
>  # Clients are enqueued into a non-lowest priority queue, overflow all the 
> way down to the lowest priority queue, and back off. In this case, the 
> connection between client and namenode remains open.
> We would like to add a metric for #1; a sketch of the usual counter pattern 
> follows.
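
A hedged sketch of the metrics2 counter pattern such a change typically uses
(the field and record names are hypothetical, not the merged patch):

    import org.apache.hadoop.metrics2.annotation.Metric;
    import org.apache.hadoop.metrics2.annotation.Metrics;
    import org.apache.hadoop.metrics2.lib.MutableCounterLong;

    @Metrics(about = "RPC backoff sketch", context = "rpc")
    public class BackoffMetricsSketch {

      // the counter field is instantiated by the metrics system
      // when this source is registered
      @Metric("Backoffs from the lowest priority queue with disconnection")
      private MutableCounterLong rpcLowestPriorityBackoffDisconnected;

      // called from the server path for scenario #1 above
      public void incrBackoffWithDisconnect() {
        rpcLowestPriorityBackoffDisconnected.incr();
      }
    }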






[jira] [Commented] (HDFS-17317) DebugAdmin metaOut does not need multiple close

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803402#comment-17803402
 ] 

ASF GitHub Bot commented on HDFS-17317:
---

xuzifu666 commented on PR #6402:
URL: https://github.com/apache/hadoop/pull/6402#issuecomment-1878100095

   @ayushtkn could you help merge this?




> DebugAdmin metaOut does not need multiple close
> ---
>
> Key: HDFS-17317
> URL: https://issues.apache.org/jira/browse/HDFS-17317
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: xy
>Priority: Major
>  Labels: pull-request-available
>
> DebugAdmin metaOut does not need multiple close






[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803380#comment-17803380
 ] 

ASF GitHub Bot commented on HDFS-17290:
---

hadoop-yetus commented on PR #6359:
URL: https://github.com/apache/hadoop/pull/6359#issuecomment-1878039127

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 46s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  54m  9s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  18m 13s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |  17m 11s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 23s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 39s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 13s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m 32s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 37s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 53s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m 52s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |  17m 52s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 31s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |  16m 31s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m 13s | 
[/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/14/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common-project/hadoop-common: The patch generated 2 new + 215 
unchanged - 0 fixed = 217 total (was 215)  |
   | +1 :green_heart: |  mvnsite  |   1m 35s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 49s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m 41s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  39m 56s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 11s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 58s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 243m  8s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/14/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6359 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
|
   | uname | Linux a5e66f216291 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / a72993b090ddbfd680bda309943cd98a01ef51c3 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/14/testReport/ |
   | Max. process+thread count | 1311 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: 

[jira] [Resolved] (HDFS-17322) RetryCache#MAX_CAPACITY seems to be MIN_CAPACITY

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved HDFS-17322.
---
   Fix Version/s: 3.4.0
Hadoop Flags: Reviewed
Target Version/s: 3.4.0  (was: 3.5.0)
  Resolution: Fixed

> RetryCache#MAX_CAPACITY seems to be MIN_CAPACITY
> 
>
> Key: HDFS-17322
> URL: https://issues.apache.org/jira/browse/HDFS-17322
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc
>Affects Versions: 3.3.6
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> From the code logic, we can infer that RetryCache#MAX_CAPACITY would be 
> better named MIN_CAPACITY.
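
A hedged sketch, simplified from the capacity logic in o.a.h.ipc.RetryCache,
of why the constant acts as a floor rather than a ceiling (treat the helper
and the example values as illustrative):

    public class RetryCacheCapacitySketch {
      private static final int MIN_CAPACITY = 16;

      // the constant bounds the computed capacity from below,
      // hence MIN_CAPACITY is the better name
      static int capacityFor(int computed) {
        return Math.max(computed, MIN_CAPACITY);
      }

      public static void main(String[] args) {
        System.out.println(capacityFor(4));     // 16: the floor applies
        System.out.println(capacityFor(1024));  // 1024: unchanged
      }
    }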






[jira] [Resolved] (HDFS-17306) RBF: Router should not return nameservices that do not enable observer nodes in RpcResponseHeaderProto

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved HDFS-17306.
---
   Fix Version/s: 3.4.0
Hadoop Flags: Reviewed
Target Version/s: 3.3.6, 3.4.0
  Resolution: Fixed

> RBF: Router should not return nameservices that do not enable observer nodes 
> in RpcResponseHeaderProto
> ---
>
> Key: HDFS-17306
> URL: https://issues.apache.org/jira/browse/HDFS-17306
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> If a cluster has 3 nameservices (ns1, ns2, ns3), only ns1 has observer 
> nodes, and clients communicate with the NameNodes via DFSRouter.
> If DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY is enabled, the client will receive 
> all nameservices in RpcResponseHeaderProto.
> We should reduce the RPC response size by omitting nameservices that do not 
> enable observer nodes.






[jira] [Commented] (HDFS-17322) RetryCache#MAX_CAPACITY seems to be MIN_CAPACITY

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803360#comment-17803360
 ] 

ASF GitHub Bot commented on HDFS-17322:
---

hfutatzhanghb commented on PR #6405:
URL: https://github.com/apache/hadoop/pull/6405#issuecomment-1877939818

   @simbadzina Sir, thanks for reviewing and merging~




> RetryCache#MAX_CAPACITY seems to be MIN_CAPACITY
> 
>
> Key: HDFS-17322
> URL: https://issues.apache.org/jira/browse/HDFS-17322
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc
>Affects Versions: 3.3.6
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
>
> From the code logic, we can infer that RetryCache#MAX_CAPACITY would be 
> better named MIN_CAPACITY.






[jira] [Commented] (HDFS-17321) RBF: Add RouterAutoMsyncService for auto msync in Router

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803346#comment-17803346
 ] 

ASF GitHub Bot commented on HDFS-17321:
---

simbadzina commented on PR #6404:
URL: https://github.com/apache/hadoop/pull/6404#issuecomment-1877871135

   > > I'm wondering if we can instead fix the existing mechanism such that 
   > > only a single read is sent to the active, vs. adding a new mechanism.
   > 
   > Yes, it can. But to ensure only a single read is sent to the active, we 
   > would need to add synchronization, and that may have a performance impact. 
   > Adding a separate RouterAutoMsyncService may be a way to solve it.
   > 
   > > Additionally, the periodic redirection of calls to the active only 
   > > happens in the case when there are no calls going to the active already, 
   > > so having some reads be sent to the active should not overload it.
   > 
   > In most cases, that's true. But I think adding RouterAutoMsyncService will 
   > be more robust. Thanks
   
   That is fair. I'll take a closer look at the implementation now that we've 
discussed the higher-level details. Please also address the unit test comment 
by @slfan1989 




> RBF: Add RouterAutoMsyncService for auto msync in Router
> 
>
> Key: HDFS-17321
> URL: https://issues.apache.org/jira/browse/HDFS-17321
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>
> The Router should have the ability to auto-msync to a nameservice, ensuring 
> that the Router periodically refreshes its record of a namespace's state.
> Different from HDFS-17027, this is controlled by the Router itself, without 
> configuring AbstractNNFailoverProxyProvider.
> And HDFS-16890 may lead to many read requests hitting the active NN at the 
> same time.
> This PR provides a new way to implement auto msync in the Router.






[jira] [Commented] (HDFS-17306) RBF: Router should not return nameservices that do not enable observer nodes in RpcResponseHeaderProto

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803345#comment-17803345
 ] 

ASF GitHub Bot commented on HDFS-17306:
---

simbadzina merged PR #6385:
URL: https://github.com/apache/hadoop/pull/6385




> RBF: Router should not return nameservices that do not enable observer nodes 
> in RpcResponseHeaderProto
> ---
>
> Key: HDFS-17306
> URL: https://issues.apache.org/jira/browse/HDFS-17306
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>
> If a cluster has 3 nameservices (ns1, ns2, ns3), only ns1 has observer 
> nodes, and clients communicate with the NameNodes via DFSRouter.
> If DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY is enabled, the client will receive 
> all nameservices in RpcResponseHeaderProto.
> We should reduce the RPC response size by omitting nameservices that do not 
> enable observer nodes.






[jira] [Commented] (HDFS-17306) RBF: Router should not return nameservices that do not enable observer nodes in RpcResponseHeaderProto

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803344#comment-17803344
 ] 

ASF GitHub Bot commented on HDFS-17306:
---

simbadzina commented on PR #6385:
URL: https://github.com/apache/hadoop/pull/6385#issuecomment-1877857200

   > > I'm okay with saving this optimization for a separate pull request 
though.
   > 
   > Thanks for the review. I will add a separate pull request for the 
   > DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY and DFS_ROUTER_OBSERVER_READ_OVERRIDES 
   > check.
   
   Great. Thanks for the contribution.




> RBF: Router should not return nameservices that do not enable observer nodes 
> in RpcResponseHeaderProto
> ---
>
> Key: HDFS-17306
> URL: https://issues.apache.org/jira/browse/HDFS-17306
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>
> If a cluster has 3 nameservices (ns1, ns2, ns3), only ns1 has observer 
> nodes, and clients communicate with the NameNodes via DFSRouter.
> If DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY is enabled, the client will receive 
> all nameservices in RpcResponseHeaderProto.
> We should reduce the RPC response size by omitting nameservices that do not 
> enable observer nodes.






[jira] [Commented] (HDFS-17322) RetryCache#MAX_CAPACITY seems to be MIN_CAPACITY

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803342#comment-17803342
 ] 

ASF GitHub Bot commented on HDFS-17322:
---

simbadzina merged PR #6405:
URL: https://github.com/apache/hadoop/pull/6405




> RetryCache#MAX_CAPACITY seems to be MIN_CAPACITY
> 
>
> Key: HDFS-17322
> URL: https://issues.apache.org/jira/browse/HDFS-17322
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc
>Affects Versions: 3.3.6
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
>
> From the code logic, we can infer that RetryCache#MAX_CAPACITY would be 
> better named MIN_CAPACITY.






[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803341#comment-17803341
 ] 

ASF GitHub Bot commented on HDFS-17290:
---

simbadzina commented on code in PR #6359:
URL: https://github.com/apache/hadoop/pull/6359#discussion_r1442306564


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/metrics/RpcMetrics.java:
##
@@ -205,7 +207,6 @@ public static TimeUnit getMetricsTimeUnit(Configuration 
conf) {
   // abstract class if we decide to do custom instrumentation classes a la
   // JobTrackerInstrumentation. The methods with //@Override comment are
   // candidates for abstract methods in a abstract instrumentation class.
-

Review Comment:
   Could you please remove this whitespace change. I think it is causing the 
second checkstyle warning.





> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients back off when RPCs cannot be enqueued, but backoff can happen in 
> different scenarios. Currently there is no way to differentiate whether a 
> backoff happened due to lowest-priority-queue disconnection or due to 
> overflow from higher-priority queues while the connection between client and 
> namenode remains open; the IPC server just emits a single metric for all 
> backoffs.
> Example:
>  # Clients are enqueued directly into the lowest priority queue and back off 
> when that queue is full. Clients are expected to disconnect from the 
> namenode.
>  # Clients are enqueued into a non-lowest priority queue, overflow all the 
> way down to the lowest priority queue, and back off. In this case, the 
> connection between client and namenode remains open.
> We would like to add a metric for #1.






[jira] [Commented] (HDFS-17283) Change the name of variable SECOND in HdfsClientConfigKeys

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803157#comment-17803157
 ] 

ASF GitHub Bot commented on HDFS-17283:
---

hfutatzhanghb commented on PR #6339:
URL: https://github.com/apache/hadoop/pull/6339#issuecomment-1877053892

   @zhangshuyan0 Sir, thanks for reviewing and merging. @xinglin Sir, thanks 
for reviewing.
   




> Change the name of variable SECOND in HdfsClientConfigKeys
> --
>
> Key: HDFS-17283
> URL: https://issues.apache.org/jira/browse/HDFS-17283
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.3.6
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Resolved] (HDFS-17283) Change the name of variable SECOND in HdfsClientConfigKeys

2024-01-04 Thread Shuyan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuyan Zhang resolved HDFS-17283.
-
   Fix Version/s: 3.4.0
Hadoop Flags: Reviewed
Target Version/s: 3.4.0  (was: 3.5.0)
  Resolution: Fixed

> Change the name of variable SECOND in HdfsClientConfigKeys
> --
>
> Key: HDFS-17283
> URL: https://issues.apache.org/jira/browse/HDFS-17283
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.3.6
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Commented] (HDFS-17283) Change the name of variable SECOND in HdfsClientConfigKeys

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803006#comment-17803006
 ] 

ASF GitHub Bot commented on HDFS-17283:
---

zhangshuyan0 merged PR #6339:
URL: https://github.com/apache/hadoop/pull/6339




> Change the name of variable SECOND in HdfsClientConfigKeys
> --
>
> Key: HDFS-17283
> URL: https://issues.apache.org/jira/browse/HDFS-17283
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.3.6
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
>







[jira] [Commented] (HDFS-17300) [SBN READ] Observer should throw ObserverRetryOnActiveException if the stateid always lags behind the Active Namenode for a configured time

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802837#comment-17802837
 ] 

ASF GitHub Bot commented on HDFS-17300:
---

hadoop-yetus commented on PR #6383:
URL: https://github.com/apache/hadoop/pull/6383#issuecomment-1876795690

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 22s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 13s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  20m 13s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   9m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   8m 32s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   2m 14s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 45s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 17s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | -1 :x: |  spotbugs  |   1m 56s | 
[/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6383/6/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html)
 |  hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant spotbugs warnings.  |
   | +1 :green_heart: |  shadedclient  |  21m 49s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 23s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 54s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   8m 54s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   8m 29s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   2m 16s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6383/6/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 1 new + 304 unchanged - 0 fixed = 305 total (was 
304)  |
   | +1 :green_heart: |  mvnsite  |   1m 33s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 13s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 37s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 30s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 15s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  16m 31s | 
[/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6383/6/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in the patch passed.  |
   | -1 :x: |  unit  | 263m 37s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6383/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 42s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 418m 19s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.conf.TestCommonConfigurationFields |
   |   | hadoop.hdfs.TestFileChecksum |
   |   | hadoop.hdfs.TestRenameWhileOpen |
   |   | hadoop.hdfs.TestDecommissionWithStriped |
   |   | hadoop.hdfs.TestDFSStripedInputStream |
   |   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
   |   | hadoop.hdfs.TestDFSFinalize |
   
   
   | Subsystem | Report/Notes |
   

[jira] [Commented] (HDFS-17300) [SBN READ] Observer should throw ObserverRetryOnActiveException if the stateid always lags behind the Active Namenode for a configured time

2024-01-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802705#comment-17802705
 ] 

ASF GitHub Bot commented on HDFS-17300:
---

hadoop-yetus commented on PR #6383:
URL: https://github.com/apache/hadoop/pull/6383#issuecomment-1876717720

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 24s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 15s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  20m 20s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   9m 20s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   8m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   2m 14s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 43s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 18s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 31s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | -1 :x: |  spotbugs  |   1m 57s | 
[/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6383/5/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html)
 |  hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant spotbugs warnings.  |
   | +1 :green_heart: |  shadedclient  |  21m 31s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m  6s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   9m  3s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   9m  3s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 17s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   8m 17s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   2m 10s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6383/5/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 1 new + 304 unchanged - 0 fixed = 305 total (was 
304)  |
   | +1 :green_heart: |  mvnsite  |   1m 40s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 11s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 27s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m  4s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  16m 43s | 
[/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6383/5/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in the patch passed.  |
   | -1 :x: |  unit  | 207m 44s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6383/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 361m 43s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.conf.TestCommonConfigurationFields |
   |   | hadoop.hdfs.TestReadStripedFileWithDecoding |
   |   | hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor |
   |   | hadoop.hdfs.TestDFSOutputStream |
   |   | hadoop.hdfs.TestDFSStripedOutputStream |
   |   | hadoop.hdfs.tools.TestDebugAdmin |
   |   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
   |   | 

[jira] [Updated] (HDFS-14261) Kerberize JournalNodeSyncer unit test

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14261:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Kerberize JournalNodeSyncer unit test
> -
>
> Key: HDFS-14261
> URL: https://issues.apache.org/jira/browse/HDFS-14261
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: journal-node, security, test
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Attachments: HDFS-14261.001.patch
>
>
> This jira is an addition to HDFS-14140. Making the unit tests in 
> TestJournalNodeSync run on a Kerberized cluster.






[jira] [Commented] (HDFS-16093) DataNodes under decommission will still be returned to the client via getLocatedBlocks, so the client may read from decommissioning datanodes, which will cause bad competition on disk IO

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802633#comment-17802633
 ] 

Shilun Fan commented on HDFS-16093:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> DataNodes under decommission will still be returned to the client via 
> getLocatedBlocks, so the client may read from decommissioning datanodes, 
> which will cause bad competition on disk IO.
> --
>
> Key: HDFS-16093
> URL: https://issues.apache.org/jira/browse/HDFS-16093
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.1
>Reporter: Daniel Ma
>Assignee: Daniel Ma
>Priority: Critical
>
> DataNodes under decommission will still be returned to the client via 
> getLocatedBlocks, so the client may read from decommissioning datanodes, 
> which will cause bad competition on disk IO.
> Therefore, datanodes under decommission should be removed from the list 
> returned by the getLocatedBlocks API.
> !image-2021-06-29-10-50-44-739.png!






[jira] [Comment Edited] (HDFS-14038) Expose HdfsDataOutputStreamBuilder to include Spark in LimitedPrivate

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802440#comment-17802440
 ] 

Shilun Fan edited comment on HDFS-14038 at 1/4/24 8:18 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release.

> Expose HdfsDataOutputStreamBuilder to include Spark in LimitedPrivate
> -
>
> Key: HDFS-14038
> URL: https://issues.apache.org/jira/browse/HDFS-14038
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Xiao Chen
>Priority: Major
>
> In SPARK-25855 / 
> https://github.com/apache/spark/pull/22881#issuecomment-434359237, Spark 
> prefers to create Spark event log files with replication (instead of EC). 
> Currently this has to be done by casting/reflection to obtain a 
> DistributedFileSystem object (or to use its {{HdfsDataOutputStreamBuilder}} 
> subclass).
> We should officially expose this for Spark's usage.
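
A hedged sketch of the casting workaround the description refers to: downcast
to DistributedFileSystem to reach the HDFS-specific builder and its
replicate() option (paths and the written payload are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class ReplicatedEventLogSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path log = new Path("hdfs:///spark-logs/app-1.inprogress");
        FileSystem fs = log.getFileSystem(conf);
        if (fs instanceof DistributedFileSystem) {
          // replicate(): create a replicated file even under an EC policy
          try (FSDataOutputStream out =
              ((DistributedFileSystem) fs).createFile(log).replicate().build()) {
            out.writeBytes("event log header\n");
          }
        }
      }
    }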






[jira] [Updated] (HDFS-16093) DataNodes under decommission will still be returned to the client via getLocatedBlocks, so the client may read from decommissioning datanodes, which will cause bad competition on disk IO

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16093:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> DataNodes under decommission will still be returned to the client via 
> getLocatedBlocks, so the client may read from decommissioning datanodes, 
> which will cause bad competition on disk IO.
> --
>
> Key: HDFS-16093
> URL: https://issues.apache.org/jira/browse/HDFS-16093
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.1
>Reporter: Daniel Ma
>Assignee: Daniel Ma
>Priority: Critical
>
> DataNodes under decommission will still be returned to the client via 
> getLocatedBlocks, so the client may read from decommissioning datanodes, 
> which will cause bad competition on disk IO.
> Therefore, datanodes under decommission should be removed from the list 
> returned by the getLocatedBlocks API.
> !image-2021-06-29-10-50-44-739.png!






[jira] [Updated] (HDFS-14038) Expose HdfsDataOutputStreamBuilder to include Spark in LimitedPrivate

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14038:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Expose HdfsDataOutputStreamBuilder to include Spark in LimitedPrivate
> -
>
> Key: HDFS-14038
> URL: https://issues.apache.org/jira/browse/HDFS-14038
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Xiao Chen
>Priority: Major
>
> In SPARK-25855 / 
> https://github.com/apache/spark/pull/22881#issuecomment-434359237, Spark 
> prefers to create Spark event log files with replication (instead of EC). 
> Currently this has to be done by casting / reflection, to get a 
> DistributedFileSystem object (or use the {{HdfsDataOutputStreamBuilder}} 
> subclass of it).
> We should officially expose this for Spark's usage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14194) Mention HDFS ACL incompatible changes more explicitly

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14194:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Mention HDFS ACL incompatible changes more explicitly
> -
>
> Key: HDFS-14194
> URL: https://issues.apache.org/jira/browse/HDFS-14194
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation, namenode
>Affects Versions: 3.0.0-beta1
>Reporter: Wei-Chiu Chuang
>Priority: Major
>
> HDFS-11957 enabled POSIX ACL inheritance by default, setting 
> dfs.namenode.posix.acl.inheritance.enabled.
> Even though it was documented in the ACL doc, it is not explicit. Users 
> upgrading to Hadoop 3.0 and beyond will be caught by surprise. The doc should 
> be updated to make it clear, preferably with examples to show what to expect, 
> so that search engines can hopefully find the doc.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14261) Kerberize JournalNodeSyncer unit test

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802437#comment-17802437
 ] 

Shilun Fan edited comment on HDFS-14261 at 1/4/24 8:17 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release.

> Kerberize JournalNodeSyncer unit test
> -
>
> Key: HDFS-14261
> URL: https://issues.apache.org/jira/browse/HDFS-14261
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: journal-node, security, test
>Affects Versions: 3.2.0, 3.1.2
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Attachments: HDFS-14261.001.patch
>
>
> This jira is an addition to HDFS-14140, making the unit tests in 
> TestJournalNodeSync run on a Kerberized cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14312) KMS-o-meter: Scale test KMS using kms audit log

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14312:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> KMS-o-meter: Scale test KMS using kms audit log
> ---
>
> Key: HDFS-14312
> URL: https://issues.apache.org/jira/browse/HDFS-14312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: kms
>Affects Versions: 3.3.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> It appears to me that Dynamometer's architecture allows KMS scale tests too.
> I imagine there are two ways to scale test a KMS.
> # Take KMS audit logs, and replay the logs against a KMS.
> # Configure Dynamometer to start KMS in addition to NameNode. Assuming the 
> fsimage comes from an encrypted cluster, replaying the HDFS audit log also 
> tests KMS.
> It would be even more interesting to have a tool that converts an 
> unencrypted cluster fsimage to an encrypted one.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14312) KMS-o-meter: Scale test KMS using kms audit log

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802434#comment-17802434
 ] 

Shilun Fan edited comment on HDFS-14312 at 1/4/24 8:17 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release.

> KMS-o-meter: Scale test KMS using kms audit log
> ---
>
> Key: HDFS-14312
> URL: https://issues.apache.org/jira/browse/HDFS-14312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: kms
>Affects Versions: 3.3.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> It appears to me that Dynamometer's architecture allows KMS scale tests too.
> I imagine there are two ways to scale test a KMS.
> # Take KMS audit logs, and replay the logs against a KMS.
> # Configure Dynamometer to start KMS in addition to NameNode. Assuming the 
> fsimage comes from an encrypted cluster, replaying the HDFS audit log also 
> tests KMS.
> It would be even more interesting to have a tool that converts an 
> unencrypted cluster fsimage to an encrypted one.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13928) Add the corner case testings for log roll

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-13928:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Add the corner case testings for log roll
> -
>
> Key: HDFS-13928
> URL: https://issues.apache.org/jira/browse/HDFS-13928
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.1
>Reporter: Yiqun Lin
>Priority: Major
>
> We found some corner cases for log roll when doing journalnode migration in 
> our cluster. We use an online way for the migration, and it can cause some 
> corner cases:
>  * Multiple in-progress edits_inprogress* files exist in the edits dir if 
> log roll does not update correctly. This can be divided into two cases:
>  1. A redundant in-progress file with a higher txid than the current 
> in-progress file
>  2. A redundant in-progress file with a lower txid than the current 
> in-progress file
>  * In HA mode, if the SBN is down, what does the behavior of log roll become?
> We can complete the log roll UTs for the above cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13928) Add the corner case testings for log roll

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802441#comment-17802441
 ] 

Shilun Fan edited comment on HDFS-13928 at 1/4/24 8:17 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release.

> Add the corner case testings for log roll
> -
>
> Key: HDFS-13928
> URL: https://issues.apache.org/jira/browse/HDFS-13928
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.1.1
>Reporter: Yiqun Lin
>Priority: Major
>
> We found some corner cases for log roll when doing journalnode migration in 
> our cluster. We use an online way for the migration, and it can cause some 
> corner cases:
>  * Multiple in-progress edits_inprogress* files exist in the edits dir if 
> log roll does not update correctly. This can be divided into two cases:
>  1. A redundant in-progress file with a higher txid than the current 
> in-progress file
>  2. A redundant in-progress file with a lower txid than the current 
> in-progress file
>  * In HA mode, if the SBN is down, what does the behavior of log roll become?
> We can complete the log roll UTs for the above cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14528) Failover from Active to Standby Failed

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802433#comment-17802433
 ] 

Shilun Fan edited comment on HDFS-14528 at 1/4/24 8:17 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release.

> Failover from Active to Standby Failed  
> 
>
> Key: HDFS-14528
> URL: https://issues.apache.org/jira/browse/HDFS-14528
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Ravuri Sushma sree
>Assignee: Ravuri Sushma sree
>Priority: Major
>  Labels: multi-sbnn
> Attachments: HDFS-14528.003.patch, HDFS-14528.004.patch, 
> HDFS-14528.005.patch, HDFS-14528.006.patch, HDFS-14528.007.patch, 
> HDFS-14528.2.Patch, ZKFC_issue.patch
>
>
>  *In a cluster with more than one Standby namenode, manual failover throws 
> an exception in some cases*
> *When trying to execute the failover command from active to standby* 
> *._/hdfs haadmin  -failover nn1 nn2, the below Exception is thrown_*
>   Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on 
> connection exception: java.net.ConnectException: Connection refused
> This is encountered in the following cases:
>  Scenario 1: 
> Namenodes - NN1(Active), NN2(Standby), NN3(Standby)
> When trying to manually failover from NN1 to NN2, an Exception is thrown if 
> NN3 is down
> Scenario 2:
>  Namenodes - NN1(Active), NN2(Standby), NN3(Standby)
> ZKFC's -              ZKFC1,            ZKFC2,            ZKFC3
> When trying to manually failover from NN1 to NN3, an Exception is thrown if 
> NN3's ZKFC (ZKFC3) is down



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14528) Failover from Active to Standby Failed

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14528:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Failover from Active to Standby Failed  
> 
>
> Key: HDFS-14528
> URL: https://issues.apache.org/jira/browse/HDFS-14528
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Reporter: Ravuri Sushma sree
>Assignee: Ravuri Sushma sree
>Priority: Major
>  Labels: multi-sbnn
> Attachments: HDFS-14528.003.patch, HDFS-14528.004.patch, 
> HDFS-14528.005.patch, HDFS-14528.006.patch, HDFS-14528.007.patch, 
> HDFS-14528.2.Patch, ZKFC_issue.patch
>
>
>  *In a cluster with more than one Standby namenode, manual failover throws 
> an exception in some cases*
> *When trying to execute the failover command from active to standby* 
> *._/hdfs haadmin  -failover nn1 nn2, the below Exception is thrown_*
>   Operation failed: Call From X-X-X-X/X-X-X-X to Y-Y-Y-Y: failed on 
> connection exception: java.net.ConnectException: Connection refused
> This is encountered in the following cases:
>  Scenario 1: 
> Namenodes - NN1(Active), NN2(Standby), NN3(Standby)
> When trying to manually failover from NN1 to NN2, an Exception is thrown if 
> NN3 is down
> Scenario 2:
>  Namenodes - NN1(Active), NN2(Standby), NN3(Standby)
> ZKFC's -              ZKFC1,            ZKFC2,            ZKFC3
> When trying to manually failover from NN1 to NN3, an Exception is thrown if 
> NN3's ZKFC (ZKFC3) is down



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14607) Support OpenTracing in WebHDFS

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14607:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Support OpenTracing in WebHDFS
> --
>
> Key: HDFS-14607
> URL: https://issues.apache.org/jira/browse/HDFS-14607
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Nikhil Navadiya
>Assignee: Nikhil Navadiya
>Priority: Major
> Attachments: HDFS-14607.WIP.patch
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14606) Support OpenTracing in HDFS

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802432#comment-17802432
 ] 

Shilun Fan edited comment on HDFS-14606 at 1/4/24 8:16 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release. 

> Support OpenTracing in HDFS
> ---
>
> Key: HDFS-14606
> URL: https://issues.apache.org/jira/browse/HDFS-14606
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Nikhil Navadiya
>Assignee: Siyao Meng
>Priority: Major
>
> Add OpenTracing support in HDFS



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14606) Support OpenTracing in HDFS

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14606:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Support OpenTracing in HDFS
> ---
>
> Key: HDFS-14606
> URL: https://issues.apache.org/jira/browse/HDFS-14606
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Nikhil Navadiya
>Assignee: Siyao Meng
>Priority: Major
>
> Add OpenTracing support in HDFS



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14305:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Konstantin Shvachko
>Priority: Major
>  Labels: multi-sbnn
> Attachments: HDFS-14305-007.patch, HDFS-14305-008.patch, 
> HDFS-14305.001.patch, HDFS-14305.002.patch, HDFS-14305.003.patch, 
> HDFS-14305.004.patch, HDFS-14305.005.patch, HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNodes could have overlapping 
> serial number ranges. For simplicity, let's assume {{Integer.MAX_VALUE}} is 
> 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be any negative integer.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When the collision happens, DataNodes could overwrite an existing key which 
> will cause clients to fail because of {{InvalidToken}} error.
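A standalone sketch of the range arithmetic above, using the description's simplified numbers (treating 100 as {{Integer.MAX_VALUE}}, two NameNodes) to make the overlap concrete:

{code:java}
public class SerialRangeDemo {
  public static void main(String[] args) {
    final int max = 100;                 // stand-in for Integer.MAX_VALUE
    final int numNNs = 2;
    final int intRange = max / numNNs;   // 50

    for (int nnIndex = 0; nnIndex < numNNs; nnIndex++) {
      int nnRangeStart = intRange * nnIndex;
      // serialNo starts as an arbitrary int, so serialNo % intRange
      // lies in (-intRange, intRange); negatives shift the range left.
      int lo = nnRangeStart - (intRange - 1);
      int hi = nnRangeStart + (intRange - 1);
      System.out.printf("nn%d -> [%d, %d]%n", nnIndex + 1, lo, hi);
    }
    // Prints nn1 -> [-49, 49] and nn2 -> [1, 99]; the two ranges
    // overlap on [1, 49], which is exactly the collision described.
  }
}
{code}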



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14607) Support OpenTracing in WebHDFS

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802431#comment-17802431
 ] 

Shilun Fan edited comment on HDFS-14607 at 1/4/24 8:16 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release. 

> Support OpenTracing in WebHDFS
> --
>
> Key: HDFS-14607
> URL: https://issues.apache.org/jira/browse/HDFS-14607
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Nikhil Navadiya
>Assignee: Nikhil Navadiya
>Priority: Major
> Attachments: HDFS-14607.WIP.patch
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14305) Serial number in BlockTokenSecretManager could overlap between different namenodes

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802435#comment-17802435
 ] 

Shilun Fan edited comment on HDFS-14305 at 1/4/24 8:16 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.1 release.

> Serial number in BlockTokenSecretManager could overlap between different 
> namenodes
> --
>
> Key: HDFS-14305
> URL: https://issues.apache.org/jira/browse/HDFS-14305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, security
>Reporter: Chao Sun
>Assignee: Konstantin Shvachko
>Priority: Major
>  Labels: multi-sbnn
> Attachments: HDFS-14305-007.patch, HDFS-14305-008.patch, 
> HDFS-14305.001.patch, HDFS-14305.002.patch, HDFS-14305.003.patch, 
> HDFS-14305.004.patch, HDFS-14305.005.patch, HDFS-14305.006.patch
>
>
> Currently, a {{BlockTokenSecretManager}} starts with a random integer as the 
> initial serial number, and then uses this formula to rotate it:
> {code:java}
> this.intRange = Integer.MAX_VALUE / numNNs;
> this.nnRangeStart = intRange * nnIndex;
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
>  {code}
> where {{numNNs}} is the total number of NameNodes in the cluster, and 
> {{nnIndex}} is the index of the current NameNode specified in the 
> configuration {{dfs.ha.namenodes.}}.
> However, with this approach, different NameNodes could have overlapping 
> serial number ranges. For simplicity, let's assume {{Integer.MAX_VALUE}} is 
> 100, 
> and we have 2 NameNodes {{nn1}} and {{nn2}} in configuration. Then the ranges 
> for these two are:
> {code}
> nn1 -> [-49, 49]
> nn2 -> [1, 99]
> {code}
> This is because the initial serial number could be any negative integer.
> Moreover, when the keys are updated, the serial number will again be updated 
> with the formula:
> {code}
> this.serialNo = (this.serialNo % intRange) + (nnRangeStart);
> {code}
> which means the new serial number could be updated to a range that belongs to 
> a different NameNode, thus increasing the chance of collision again.
> When the collision happens, DataNodes could overwrite an existing key which 
> will cause clients to fail because of {{InvalidToken}} error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14626) Decommission all nodes hosting last block of open file succeeds unexpectedly

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14626:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Decommission all nodes hosting last block of open file succeeds unexpectedly 
> -
>
> Key: HDFS-14626
> URL: https://issues.apache.org/jira/browse/HDFS-14626
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Priority: Major
> Attachments: test-to-reproduce.patch
>
>
> I have been investigating scenarios that cause decommission to hang, 
> especially around one long-standing issue. That is, an open block on the 
> host which is being decommissioned can cause the process to never complete.
> Checking the history, there seems to have been at least one change in 
> HDFS-5579 which greatly improved the situation, but from reading comments and 
> support cases, there still seem to be some scenarios where open blocks on a 
> DN host cause the decommission to get stuck.
> No matter what I try, I have not been able to reproduce this, but I think I 
> have uncovered another issue that may partly explain why.
> If I do the following, the nodes will decommission without any issues:
> 1. Create a file and write to it so it crosses a block boundary. Then there 
> is one complete block and one under construction block. Keep the file open, 
> and write a few bytes periodically.
> 2. Now note the nodes which the UC block is currently being written on, and 
> decommission them all.
> 3. The decommission should succeed.
> 4. Now attempt to close the open file, and it will fail to close with an 
> error like below, probably as decommissioned nodes are not allowed to send 
> IBRs:
> {code:java}
> java.io.IOException: Unable to close file because the last block 
> BP-646926902-192.168.0.20-1562099323291:blk_1073741827_1003 does not have 
> enough number of replicas.
>     at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:968)
>     at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:911)
>     at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:894)
>     at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:849)
>     at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>     at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101){code}
> Interestingly, if you recommission the nodes without restarting them before 
> closing the file, it will close OK, and writes to it can continue even once 
> decommission has completed.
> I don't think this is expected - i.e. decommission should not complete on all 
> nodes hosting the last UC block of a file?
> From what I have figured out, I don't think UC blocks are considered in the 
> DatanodeAdminManager at all. This is because the original list of blocks it 
> cares about, are taken from the Datanode block Iterator, which takes them 
> from the DatanodeStorageInfo objects attached to the datanode instance. I 
> believe UC blocks don't make it into the DatanodeStorageInfo until after 
> they have been completed and an IBR sent, so the decommission logic never 
> considers them.
> What troubles me about this explanation, is how did open files previously 
> cause decommission to get stuck if it never checks for them, so I suspect I 
> am missing something.
> I will attach a patch with a test case that demonstrates this issue. This 
> reproduces on trunk and I also tested on CDH 5.8.1, which is based on the 2.6 
> branch, but with a lot of backports.
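The client-side half of the reproduction steps above can be sketched as follows (hypothetical path; steps 2-3, the decommission itself, happen out of band via the exclude file and {{hdfs dfsadmin -refreshNodes}}):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OpenFileDuringDecommission {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/open-file-repro");
    byte[] chunk = new byte[1024 * 1024];

    try (FSDataOutputStream out = fs.create(p)) {
      // Step 1: cross a block boundary so there is one complete block
      // plus one under-construction (UC) block, and keep writing.
      long blockSize = fs.getDefaultBlockSize(p);
      for (long n = 0; n <= blockSize; n += chunk.length) {
        out.write(chunk);
        out.hflush();
      }
      // Steps 2-3: decommission every DataNode holding the UC block;
      // per this report, that unexpectedly succeeds.
      // Step 4: the implicit close() below then fails with "Unable to
      // close file because the last block ... does not have enough
      // number of replicas".
    }
  }
}
{code}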



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14626) Decommission all nodes hosting last block of open file succeeds unexpectedly

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802430#comment-17802430
 ] 

Shilun Fan edited comment on HDFS-14626 at 1/4/24 8:15 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release. 

> Decommission all nodes hosting last block of open file succeeds unexpectedly 
> -
>
> Key: HDFS-14626
> URL: https://issues.apache.org/jira/browse/HDFS-14626
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Priority: Major
> Attachments: test-to-reproduce.patch
>
>
> I have been investigating scenarios that cause decommission to hang, 
> especially around one long-standing issue. That is, an open block on the 
> host which is being decommissioned can cause the process to never complete.
> Checking the history, there seems to have been at least one change in 
> HDFS-5579 which greatly improved the situation, but from reading comments and 
> support cases, there still seem to be some scenarios where open blocks on a 
> DN host cause the decommission to get stuck.
> No matter what I try, I have not been able to reproduce this, but I think I 
> have uncovered another issue that may partly explain why.
> If I do the following, the nodes will decommission without any issues:
> 1. Create a file and write to it so it crosses a block boundary. Then there 
> is one complete block and one under construction block. Keep the file open, 
> and write a few bytes periodically.
> 2. Now note the nodes which the UC block is currently being written on, and 
> decommission them all.
> 3. The decommission should succeed.
> 4. Now attempt to close the open file, and it will fail to close with an 
> error like below, probably as decommissioned nodes are not allowed to send 
> IBRs:
> {code:java}
> java.io.IOException: Unable to close file because the last block 
> BP-646926902-192.168.0.20-1562099323291:blk_1073741827_1003 does not have 
> enough number of replicas.
>     at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:968)
>     at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:911)
>     at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:894)
>     at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:849)
>     at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>     at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101){code}
> Interestingly, if you recommission the nodes without restarting them before 
> closing the file, it will close OK, and writes to it can continue even once 
> decommission has completed.
> I don't think this is expected - i.e. decommission should not complete on all 
> nodes hosting the last UC block of a file?
> From what I have figured out, I don't think UC blocks are considered in the 
> DatanodeAdminManager at all. This is because the original list of blocks it 
> cares about, are taken from the Datanode block Iterator, which takes them 
> from the DatanodeStorageInfo objects attached to the datanode instance. I 
> believe UC blocks don't make it into the DatanodeStorageInfo until after 
> they have been completed and an IBR sent, so the decommission logic never 
> considers them.
> What troubles me about this explanation, is how did open files previously 
> cause decommission to get stuck if it never checks for them, so I suspect I 
> am missing something.
> I will attach a patch with a test case that demonstrates this issue. This 
> reproduces on trunk and I also tested on CDH 5.8.1, which is based on the 2.6 
> branch, but with a lot of backports.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15357) Do not trust bad block reports from clients

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15357:
--
Target Version/s: 2.10.3, 3.5.0  (was: 2.10.3, 3.4.1)

> Do not trust bad block reports from clients
> ---
>
> Key: HDFS-15357
> URL: https://issues.apache.org/jira/browse/HDFS-15357
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Major
>
> {{reportBadBlocks()}} is implemented by both ClientNamenodeProtocol and 
> DatanodeProtocol. When DFSClient is calling it, a faulty client can cause 
> data availability issues in a cluster. 
> In the past we had such an incident where a node with a faulty NIC was 
> randomly corrupting data. All clients running on the machine reported all 
> accessed blocks and all associated replicas to be corrupt. More recently, a 
> single faulty client process caused a small number of missing blocks. In 
> all cases, the actual data was fine.
> The bad block reports from clients shouldn't be trusted blindly. Instead, the 
> namenode should send a datanode command to verify the claim. A bonus would be 
> to keep the record for a while and ignore repeated reports from the same 
> nodes.
> At minimum, there should be an option to ignore bad block reports from 
> clients, perhaps after logging it. A very crude way would be to short-circuit 
> it in {{ClientNamenodeProtocolServerSideTranslatorPB#reportBadBlocks()}}. 
> A more sophisticated way would be to check for the datanode user name in 
> {{FSNamesystem#reportBadBlocks()}} so that it can be easily logged, or 
> optionally do further processing.
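A minimal sketch of the log-and-ignore option, assuming a hypothetical config key and the idea of trusting only the datanode user; none of these names are existing Hadoop APIs:

{code:java}
public class BadBlockReportPolicy {
  /** Hypothetical key; not an existing Hadoop configuration property. */
  public static final String IGNORE_CLIENT_REPORTS_KEY =
      "dfs.namenode.ignore.client.bad.block.reports";

  private final boolean ignoreClientReports;

  public BadBlockReportPolicy(boolean ignoreClientReports) {
    this.ignoreClientReports = ignoreClientReports;
  }

  /** Decides whether a reportBadBlocks() call should be acted on. */
  public boolean shouldProcess(String callerUser, String datanodeUser) {
    if (datanodeUser.equals(callerUser)) {
      return true;   // came from a DataNode: trusted
    }
    if (ignoreClientReports) {
      System.out.printf(
          "Ignoring bad block report from client user %s%n", callerUser);
      return false;  // the crude log-and-drop behavior proposed above
    }
    return true;     // current behavior: trust the client blindly
  }
}
{code}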



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15357) Do not trust bad block reports from clients

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802416#comment-17802416
 ] 

Shilun Fan edited comment on HDFS-15357 at 1/4/24 8:15 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release.

> Do not trust bad block reports from clients
> ---
>
> Key: HDFS-15357
> URL: https://issues.apache.org/jira/browse/HDFS-15357
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Priority: Major
>
> {{reportBadBlocks()}} is implemented by both ClientNamenodeProtocol and 
> DatanodeProtocol. When DFSClient is calling it, a faulty client can cause 
> data availability issues in a cluster. 
> In the past we had such an incident where a node with a faulty NIC was 
> randomly corrupting data. All clients running on the machine reported all 
> accessed blocks and all associated replicas to be corrupt. More recently, a 
> single faulty client process caused a small number of missing blocks. In 
> all cases, the actual data was fine.
> The bad block reports from clients shouldn't be trusted blindly. Instead, the 
> namenode should send a datanode command to verify the claim. A bonus would be 
> to keep the record for a while and ignore repeated reports from the same 
> nodes.
> At minimum, there should be an option to ignore bad block reports from 
> clients, perhaps after logging it. A very crude way would be to short-circuit 
> it in {{ClientNamenodeProtocolServerSideTranslatorPB#reportBadBlocks()}}. 
> A more sophisticated way would be to check for the datanode user name in 
> {{FSNamesystem#reportBadBlocks()}} so that it can be easily logged, or 
> optionally do further processing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15686) Provide documentation for ViewHDFS

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15686:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Provide documentation for ViewHDFS
> --
>
> Key: HDFS-15686
> URL: https://issues.apache.org/jira/browse/HDFS-15686
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: viewfs, viewfsOverloadScheme, ViewHDFS
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15388) DFS cacheadmin, ECAdmin, StoragePolicyAdmin commands should handle ViewFSOverloadScheme

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802626#comment-17802626
 ] 

Shilun Fan commented on HDFS-15388:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> DFS cacheadmin, ECAdmin, StoragePolicyAdmin commands should handle 
> ViewFSOverloadScheme 
> 
>
> Key: HDFS-15388
> URL: https://issues.apache.org/jira/browse/HDFS-15388
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: viewfs
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
>
> There are some more DFS-specific admin tools that should handle 
> ViewFSOverloadScheme when the scheme is hdfs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14978) In-place Erasure Coding Conversion

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802423#comment-17802423
 ] 

Shilun Fan edited comment on HDFS-14978 at 1/4/24 8:14 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release.

> In-place Erasure Coding Conversion
> --
>
> Key: HDFS-14978
> URL: https://issues.apache.org/jira/browse/HDFS-14978
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Aravindan Vijayan
>Priority: Major
> Attachments: In-place Erasure Coding Conversion.pdf
>
>
> HDFS Erasure Coding is a new feature added in Apache Hadoop 3.0. It uses 
> encoding algorithms to reduce disk space usage while retaining redundancy 
> necessary for data recovery. It was a huge amount of work but it is just 
> getting adopted after almost 2 years.
> One usability problem that’s blocking users from adopting HDFS Erasure Coding 
> is that existing replicated files have to be copied to an EC-enabled 
> directory explicitly. Renaming a file/directory to an EC-enabled directory 
> does not automatically convert the blocks. Therefore users typically perform 
> the following steps to erasure-code existing files:
> {noformat}
> Create $tmp directory, set EC policy at it
> Distcp $src to $tmp
> Delete $src (rm -rf $src)
> mv $tmp $src
> {noformat}
> There are several reasons why this is not popular:
> * Complex. The process involves several steps: distcp data to a temporary 
> destination; delete source file; move destination to the source path.
> * Availability: there is a short period where nothing exists at the source 
> path, and jobs may fail unexpectedly.
> * Overhead. During the copy phase, there is a point in time where both the 
> source and destination files exist at the same time, exhausting disk space.
> * Not snapshot-friendly. If a snapshot is taken prior to performing the 
> conversion, the source (replicated) files will be preserved in the cluster 
> too. Therefore, the conversion actually increases storage space usage.
> * Not management-friendly. This approach changes file inode number, 
> modification time and access time. Erasure coded files are supposed to store 
> cold data, but this conversion makes data “hot” again.
> * Bulky. It’s either all or nothing. The directory may be partially erasure 
> coded, but this approach simply erasure code everything again.
> To ease data management, we should offer a utility tool to convert replicated 
> files to erasure coded files in-place.
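For reference, the four-step workaround above expressed programmatically (hypothetical paths; a plain copy stands in for the distcp step). The comments mark exactly where the availability and overhead problems bite:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class CopyBasedEcConversion {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path src = new Path("/warehouse/cold-data");
    Path tmp = new Path("/warehouse/.cold-data-ec-tmp");
    DistributedFileSystem dfs =
        (DistributedFileSystem) src.getFileSystem(conf);

    dfs.mkdirs(tmp);
    dfs.setErasureCodingPolicy(tmp, "RS-6-3-1024k"); // a built-in policy
    // Copy phase: src and tmp exist simultaneously, doubling space use.
    FileUtil.copy(dfs, src, dfs, tmp, false, conf);
    dfs.delete(src, true);   // availability gap opens here...
    dfs.rename(tmp, src);    // ...and closes here
  }
}
{code}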



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15609) Implement hdfs getconf -serverDefaults

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15609:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Implement hdfs getconf -serverDefaults
> --
>
> Key: HDFS-15609
> URL: https://issues.apache.org/jira/browse/HDFS-15609
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>
> There doesn't seem to be an existing way to easily verify whether a 
> Hadoop/HDFS client correctly receives the FsServerDefaults from the NameNode.
> Here I propose extending the {{hadoop getconf}} command with the parameter 
> {{-serverDefaults}}. When executed, the client shall get {{FsServerDefaults}} 
> from the specified NameNode and print the configuration values out.
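Until such a subcommand exists, roughly the same check can be done programmatically; a minimal sketch (the selection of getters is illustrative, not exhaustive):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsServerDefaults;
import org.apache.hadoop.fs.Path;

public class PrintServerDefaults {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Fetches FsServerDefaults from the NameNode backing "/".
    FsServerDefaults d = fs.getServerDefaults(new Path("/"));
    System.out.println("blockSize        = " + d.getBlockSize());
    System.out.println("replication      = " + d.getReplication());
    System.out.println("bytesPerChecksum = " + d.getBytesPerChecksum());
    System.out.println("writePacketSize  = " + d.getWritePacketSize());
    System.out.println("fileBufferSize   = " + d.getFileBufferSize());
    System.out.println("checksumType     = " + d.getChecksumType());
  }
}
{code}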



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15609) Implement hdfs getconf -serverDefaults

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802627#comment-17802627
 ] 

Shilun Fan commented on HDFS-15609:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Implement hdfs getconf -serverDefaults
> --
>
> Key: HDFS-15609
> URL: https://issues.apache.org/jira/browse/HDFS-15609
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>
> There doesn't seem to be an existing way to easily verify whether a 
> Hadoop/HDFS client correctly receives the FsServerDefaults from the NameNode.
> Here I propose extending the {{hadoop getconf}} command with the parameter 
> {{-serverDefaults}}. When executed, the client shall get {{FsServerDefaults}} 
> from the specified NameNode and print the configuration values out.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14978) In-place Erasure Coding Conversion

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14978:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> In-place Erasure Coding Conversion
> --
>
> Key: HDFS-14978
> URL: https://issues.apache.org/jira/browse/HDFS-14978
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Wei-Chiu Chuang
>Assignee: Aravindan Vijayan
>Priority: Major
> Attachments: In-place Erasure Coding Conversion.pdf
>
>
> HDFS Erasure Coding is a new feature added in Apache Hadoop 3.0. It uses 
> encoding algorithms to reduce disk space usage while retaining redundancy 
> necessary for data recovery. It was a huge amount of work but it is just 
> getting adopted after almost 2 years.
> One usability problem that’s blocking users from adopting HDFS Erasure Coding 
> is that existing replicated files have to be copied to an EC-enabled 
> directory explicitly. Renaming a file/directory to an EC-enabled directory 
> does not automatically convert the blocks. Therefore users typically perform 
> the following steps to erasure-code existing files:
> {noformat}
> Create $tmp directory, set EC policy at it
> Distcp $src to $tmp
> Delete $src (rm -rf $src)
> mv $tmp $src
> {noformat}
> There are several reasons why this is not popular:
> * Complex. The process involves several steps: distcp data to a temporary 
> destination; delete source file; move destination to the source path.
> * Availability: there is a short period where nothing exists at the source 
> path, and jobs may fail unexpectedly.
> * Overhead. During the copy phase, there is a point in time where both the 
> source and destination files exist at the same time, exhausting disk space.
> * Not snapshot-friendly. If a snapshot is taken prior to performing the 
> conversion, the source (replicated) files will be preserved in the cluster 
> too. Therefore, the conversion actually increases storage space usage.
> * Not management-friendly. This approach changes file inode number, 
> modification time and access time. Erasure coded files are supposed to store 
> cold data, but this conversion makes data “hot” again.
> * Bulky. It’s either all or nothing. The directory may be partially erasure 
> coded, but this approach simply erasure code everything again.
> To ease data management, we should offer a utility tool to convert replicated 
> files to erasure coded files in-place.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15388) DFS cacheadmin, ECAdmin, StoragePolicyAdmin commands should handle ViewFSOverloadScheme

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15388:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> DFS cacheadmin, ECAdmin, StoragePolicyAdmin commands should handle 
> ViewFSOverloadScheme 
> 
>
> Key: HDFS-15388
> URL: https://issues.apache.org/jira/browse/HDFS-15388
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: viewfs
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
>
> There are some more DFS-specific admin tools that should handle 
> ViewFSOverloadScheme when the scheme is hdfs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15042) Add more tests for ByteBufferPositionedReadable

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802422#comment-17802422
 ] 

Shilun Fan edited comment on HDFS-15042 at 1/4/24 8:14 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release.

> Add more tests for ByteBufferPositionedReadable 
> 
>
> Key: HDFS-15042
> URL: https://issues.apache.org/jira/browse/HDFS-15042
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There are a few corner cases of ByteBufferPositionedReadable which need to 
> be tested, mainly illegal read positions. Add them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15042) Add more tests for ByteBufferPositionedReadable

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15042:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Add more tests for ByteBufferPositionedReadable 
> 
>
> Key: HDFS-15042
> URL: https://issues.apache.org/jira/browse/HDFS-15042
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There are a few corner cases of ByteBufferPositionedReadable which need to 
> be tested, mainly illegal read positions. Add them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15337) Support available space choosing policy in HDFS Persistent Memory Cache

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15337:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Support available space choosing policy in HDFS Persistent Memory Cache
> ---
>
> Key: HDFS-15337
> URL: https://issues.apache.org/jira/browse/HDFS-15337
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching, datanode
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
>
> In HDFS-13762, we introduced the HDFS Persistent Memory Cache feature. In 
> that implementation, if more than one persistent memory volume is specified 
> by the user, a simple round-robin policy is used to pick a volume to cache 
> data. Evidently, a large difference in volume capacities can lead to an 
> imbalance issue. 
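For comparison, DataNode disk selection already has an {{AvailableSpaceVolumeChoosingPolicy}}; a much-simplified picker along those lines (the {{Volume}} interface here is a stand-in, not the real pmem volume class):

{code:java}
import java.util.List;

public class AvailableSpacePmemPicker {
  /** Stand-in for the DataNode's pmem volume abstraction. */
  public interface Volume {
    long getAvailableBytes();
  }

  /** Picks the volume with the most free space instead of round-robin. */
  public static Volume pick(List<? extends Volume> volumes) {
    Volume best = null;
    for (Volume v : volumes) {
      if (best == null
          || v.getAvailableBytes() > best.getAvailableBytes()) {
        best = v;
      }
    }
    return best;
  }
}
{code}

The real disk policy also applies a balanced-space threshold with randomized choice so a freshly added volume is not hammered; a pmem variant would likely want the same refinement.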



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15329) Provide FileContext based ViewFSOverloadScheme implementation

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802419#comment-17802419
 ] 

Shilun Fan edited comment on HDFS-15329 at 1/4/24 8:13 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release.

> Provide FileContext based ViewFSOverloadScheme implementation
> -
>
> Key: HDFS-15329
> URL: https://issues.apache.org/jira/browse/HDFS-15329
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hdfs, viewfs, viewfsOverloadScheme
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Abhishek Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> This Jira to track for FileContext based ViewFSOverloadScheme implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15354) clearCorruptLazyPersistFiles incrementalBlock removal should be out side write lock

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15354:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> clearCorruptLazyPersistFiles incrementalBlock removal should be out side 
> write lock
> ---
>
> Key: HDFS-15354
> URL: https://issues.apache.org/jira/browse/HDFS-15354
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
>
>  In LazyPersistFileScrubber#clearCorruptLazyPersistFiles, blocks are 
> collected for removal and also removed while holding the write lock.
> removeBlocks should be moved out of the write lock, as removeBlocks has 
> incremental deletion logic in which it acquires and releases the write lock 
> for every block removal.
> If there are many corrupt blocks to remove in the cluster, it may hold the 
> write lock for a long time. 
> {code:java}
>   for (BlockCollection bc : filesToDelete) {
>           LOG.warn("Removing lazyPersist file " + bc.getName() + " with no 
> replicas.");
>           BlocksMapUpdateInfo toRemoveBlocks =
>               FSDirDeleteOp.deleteInternal(
>                   FSNamesystem.this,
>                   INodesInPath.fromINode((INodeFile) bc), false);
>           changed |= toRemoveBlocks != null;
>           if (toRemoveBlocks != null) {
>             removeBlocks(toRemoveBlocks); // Incremental deletion of blocks
>           }
>         }
> {code}
>  
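A sketch of the proposed restructuring, reusing the names from the snippet above (not an actual patch, and not self-contained outside FSNamesystem): collect the per-file block lists under the lock, release it, then let {{removeBlocks}} run its own incremental lock/unlock cycles.

{code:java}
List<BlocksMapUpdateInfo> pendingRemovals = new ArrayList<>();
writeLock();
try {
  for (BlockCollection bc : filesToDelete) {
    LOG.warn("Removing lazyPersist file " + bc.getName()
        + " with no replicas.");
    BlocksMapUpdateInfo toRemoveBlocks = FSDirDeleteOp.deleteInternal(
        FSNamesystem.this, INodesInPath.fromINode((INodeFile) bc), false);
    changed |= toRemoveBlocks != null;
    if (toRemoveBlocks != null) {
      pendingRemovals.add(toRemoveBlocks);
    }
  }
} finally {
  writeUnlock();
}
// Now outside the write lock: the incremental deletion inside
// removeBlocks() can take and drop the lock per batch without the
// scrubber pinning it for the whole loop.
for (BlocksMapUpdateInfo toRemove : pendingRemovals) {
  removeBlocks(toRemove);
}
{code}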



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802420#comment-17802420
 ] 

Shilun Fan edited comment on HDFS-15274 at 1/4/24 8:13 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release.

> NN doesn't remove the blocks from the failed DatanodeStorageInfo
> 
>
> Key: HDFS-15274
> URL: https://issues.apache.org/jira/browse/HDFS-15274
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
> Attachments: HDFS-15274.001.patch, HDFS-15274.002.patch
>
>
> In our federation cluster, we found some inconsistent failed-volume state 
> between two namespaces. The following logs are from the two NSs separately.
> NS1 received the failed storage info and removed the blocks associated with 
> the failed storage.
> {code:java}
> [INFO] [IPC Server handler 76 on 8021] : Number of failed storages changes 
> from 0 to 1
> [INFO] [IPC Server handler 76 on 8021] : 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:NORMAL:X.X.X.X:50010:/data0/dfs 
> failed.
> [INFO] 
> [org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@4fb57fb3]
>  : Removed blocks associated with storage 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs 
> from DataNode X.X.X.X:50010
> [INFO] [IPC Server handler 73 on 8021] : Removed storage 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs 
> from DataNode X.X.X.X:50010{code}
> NS2 just received the failed storage.
> {code:java}
> [INFO] [IPC Server handler 87 on 8021] : Number of failed storages changes 
> from 0 to 1  {code}
>  
> After digging into the code and trying to simulate a disk failure with
> {code:java}
> echo offline > /sys/block/sda/device/state
> echo 1 > /sys/block/sda/device/delete
> # re-mount the failed disk
> rescan-scsi-bus.sh -a
> systemctl daemon-reload
> mount /data0
> {code}
> I found the root reason is the inconsistency between StorageReport and 
> VolumeFailureSummary in BPServiceActor#sendHeartBeat.
> {code}
> StorageReport[] reports =
> dn.getFSDataset().getStorageReports(bpos.getBlockPoolId());
>   ..
>   // the DISK may FAILED before executing the next line
> VolumeFailureSummary volumeFailureSummary = dn.getFSDataset()
> .getVolumeFailureSummary();
> int numFailedVolumes = volumeFailureSummary != null ?
> volumeFailureSummary.getFailedStorageLocations().length : 0;
> {code} 
> I improved the tolerance in NN DatanodeDescriptor#updateStorageStats to solve 
> this issue.
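One datanode-side way to avoid the race, sketched with the names from the snippet above (the reporter's actual fix instead relaxed the check in the NameNode's DatanodeDescriptor#updateStorageStats): re-read the failure summary after collecting the reports and retry until the pair is mutually consistent.

{code:java}
StorageReport[] reports;
VolumeFailureSummary volumeFailureSummary;
while (true) {
  volumeFailureSummary = dn.getFSDataset().getVolumeFailureSummary();
  reports = dn.getFSDataset().getStorageReports(bpos.getBlockPoolId());
  VolumeFailureSummary after = dn.getFSDataset().getVolumeFailureSummary();
  int beforeFailed = volumeFailureSummary == null
      ? 0 : volumeFailureSummary.getFailedStorageLocations().length;
  int afterFailed = after == null
      ? 0 : after.getFailedStorageLocations().length;
  if (beforeFailed == afterFailed) {
    break;  // no volume failed while the reports were being built
  }
  // A disk failed mid-collection; loop so the heartbeat carries a
  // consistent (reports, volumeFailureSummary) pair.
}
int numFailedVolumes = volumeFailureSummary != null
    ? volumeFailureSummary.getFailedStorageLocations().length : 0;
{code}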



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15354) clearCorruptLazyPersistFiles incrementalBlock removal should be out side write lock

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802417#comment-17802417
 ] 

Shilun Fan edited comment on HDFS-15354 at 1/4/24 8:13 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release.

> clearCorruptLazyPersistFiles incrementalBlock removal should be out side 
> write lock
> ---
>
> Key: HDFS-15354
> URL: https://issues.apache.org/jira/browse/HDFS-15354
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
>
>  In LazyPersistFileScrubber#clearCorruptLazyPersistFiles, blocks are 
> collected for removal and also removed while holding the write lock.
> removeBlocks should be moved out of the write lock, as removeBlocks has 
> incremental deletion logic in which it acquires and releases the write lock 
> for every block removal.
> If there are many corrupt blocks to remove in the cluster, it may hold the 
> write lock for a long time. 
> {code:java}
>   for (BlockCollection bc : filesToDelete) {
>           LOG.warn("Removing lazyPersist file " + bc.getName() + " with no 
> replicas.");
>           BlocksMapUpdateInfo toRemoveBlocks =
>               FSDirDeleteOp.deleteInternal(
>                   FSNamesystem.this,
>                   INodesInPath.fromINode((INodeFile) bc), false);
>           changed |= toRemoveBlocks != null;
>           if (toRemoveBlocks != null) {
>             removeBlocks(toRemoveBlocks); // Incremental deletion of blocks
>           }
>         }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15329) Provide FileContext based ViewFSOverloadScheme implementation

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15329:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Provide FileContext based ViewFSOverloadScheme implementation
> -
>
> Key: HDFS-15329
> URL: https://issues.apache.org/jira/browse/HDFS-15329
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hdfs, viewfs, viewfsOverloadScheme
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Abhishek Das
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> This Jira tracks the FileContext-based ViewFSOverloadScheme implementation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15274) NN doesn't remove the blocks from the failed DatanodeStorageInfo

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15274:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> NN doesn't remove the blocks from the failed DatanodeStorageInfo
> 
>
> Key: HDFS-15274
> URL: https://issues.apache.org/jira/browse/HDFS-15274
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Major
> Attachments: HDFS-15274.001.patch, HDFS-15274.002.patch
>
>
> In our federation cluster, we found there were some inconsistency failure 
> volumes between two namespaces. The following logs are two NS separately.
> NS1 received the failed storage info and removed the blocks associated with 
> the failed storage.
> {code:java}
> [INFO] [IPC Server handler 76 on 8021] : Number of failed storages changes 
> from 0 to 1
> [INFO] [IPC Server handler 76 on 8021] : 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:NORMAL:X.X.X.X:50010:/data0/dfs 
> failed.
> [INFO] 
> [org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager$Monitor@4fb57fb3]
>  : Removed blocks associated with storage 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs 
> from DataNode X.X.X.X:50010
> [INFO] [IPC Server handler 73 on 8021] : Removed storage 
> [DISK]DS-298de29e-9104-48dd-a674-5443a6126969:FAILED:X.X.X.X:50010:/data0/dfs 
> from DataNode X.X.X.X:50010{code}
> NS2 just received the failed storage.
> {code:java}
> [INFO] [IPC Server handler 87 on 8021] : Number of failed storages changes 
> from 0 to 1  {code}
>  
> After digging into the code and trying to simulate a disk failure with
> {code:java}
> echo offline > /sys/block/sda/device/state
> echo 1 > /sys/block/sda/device/delete
> # re-mount the failed disk
> rescan-scsi-bus.sh -a
> systemctl daemon-reload
> mount /data0
> {code}
> I found the root cause is the inconsistency between the StorageReport and 
> the VolumeFailureSummary in BPServiceActor#sendHeartBeat.
> {code}
> StorageReport[] reports =
> dn.getFSDataset().getStorageReports(bpos.getBlockPoolId());
>   ..
>   // the DISK may FAILED before executing the next line
> VolumeFailureSummary volumeFailureSummary = dn.getFSDataset()
> .getVolumeFailureSummary();
> int numFailedVolumes = volumeFailureSummary != null ?
> volumeFailureSummary.getFailedStorageLocations().length : 0;
> {code} 
> I improved the tolerance in NN DatanodeDescriptor#updateStorageStats to solve 
> this issue.
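> A possible illustration of how the DataNode side could avoid the race (a 
> sketch only, not the committed fix; {{failedCount}} is a hypothetical helper 
> returning {{getFailedStorageLocations().length}}, or 0 for null): re-read 
> the failure summary after collecting the reports and retry until both views 
> agree, so a heartbeat never carries a summary newer than its storage reports.
> {code:java}
> StorageReport[] reports;
> VolumeFailureSummary before;
> VolumeFailureSummary after;
> do {
>   before = dn.getFSDataset().getVolumeFailureSummary();
>   reports = dn.getFSDataset().getStorageReports(bpos.getBlockPoolId());
>   after = dn.getFSDataset().getVolumeFailureSummary();
>   // Retry when a volume failed between the two reads above.
> } while (failedCount(before) != failedCount(after));
> {code}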



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14897) [hadoop-hdfs] Fix order of actual and expected expression in assert statements

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802425#comment-17802425
 ] 

Shilun Fan edited comment on HDFS-14897 at 1/4/24 8:13 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release.

> [hadoop-hdfs] Fix order of actual and expected expression in assert statements
> --
>
> Key: HDFS-14897
> URL: https://issues.apache.org/jira/browse/HDFS-14897
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Priority: Major
>
> Fix the order of actual and expected expressions in assert statements, which 
> gives a misleading message when a test case fails. The attached file has 
> some of the places where the order is wrong.
> {code:java}
> [ERROR] 
> testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
>   Time elapsed: 3.385 s  <<< FAILURE!
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but 
> was:<0>
> {code}
> In the long term, [AssertJ|http://joel-costigliola.github.io/assertj/] can 
> be used for new test cases, which avoids such mistakes (see the sketch 
> below).
> This is a follow-up jira for the hadoop-hdfs project.
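> For illustration, JUnit's {{assertEquals}} takes the expected value first; 
> swapping the arguments produces exactly the misleading message above 
> ({{metrics.getNumShutdownNMs()}} stands in for whatever actual value a test 
> checks):
> {code:java}
> // Buggy: actual passed first -> fails with "expected:<1> but was:<0>"
> assertEquals("Shutdown nodes should be 0 now", metrics.getNumShutdownNMs(), 0);
> // Fixed: expected first, actual second
> assertEquals("Shutdown nodes should be 0 now", 0, metrics.getNumShutdownNMs());
> // AssertJ sidesteps the ordering pitfall entirely:
> assertThat(metrics.getNumShutdownNMs()).isZero();
> {code}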



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14897) [hadoop-hdfs] Fix order of actual and expected expression in assert statements

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14897:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> [hadoop-hdfs] Fix order of actual and expected expression in assert statements
> --
>
> Key: HDFS-14897
> URL: https://issues.apache.org/jira/browse/HDFS-14897
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Priority: Major
>
> Fix the order of actual and expected expressions in assert statements, which 
> gives a misleading message when a test case fails. The attached file has 
> some of the places where the order is wrong.
> {code:java}
> [ERROR] 
> testNodeRemovalGracefully(org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService)
>   Time elapsed: 3.385 s  <<< FAILURE!
> java.lang.AssertionError: Shutdown nodes should be 0 now expected:<1> but 
> was:<0>
> {code}
> In the long term, [AssertJ|http://joel-costigliola.github.io/assertj/] can 
> be used for new test cases, which avoids such mistakes.
> This is a follow-up jira for the hadoop-hdfs project.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14663) HttpFS: LISTSTATUS_BATCH does not return batches

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14663:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> HttpFS: LISTSTATUS_BATCH does not return batches
> 
>
> Key: HDFS-14663
> URL: https://issues.apache.org/jira/browse/HDFS-14663
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: httpfs
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Siyao Meng
>Priority: Major
>
> The webhdfs protocol supports a LISTSTATUS_BATCH operation where it can 
> retrieve the file listing for a large directory in chunks.
> When using the webhdfs service embedded in the namenode, this works as 
> expected, but when using HTTPFS, any call to LISTSTATUS_BATCH simply returns 
> the entire listing rather than batches, working effectively like LISTSTATUS 
> instead.
> This seems to be because HTTPFS falls back to using the method 
> org.apache.hadoop.fs.FileSystem#listStatusBatch, which is intended to be 
> overridden, but the implementation used in HTTPFS has not done that, leading 
> to this limitation.
> This feature (LISTSTATUS_BATCH) was added to HTTPFS by HDFS-10823, but based 
> on my testing it does not work as intended. I suspect it is because the 
> listStatusBatch operation was added to the WebHdfsFileSystem and 
> HttpFSFileSystem as part of the above Jira, but behind the scenes HTTPFS 
> seems to use DistributedFileSystem and hence it falls back to the default 
> implementation "org.apache.hadoop.fs.FileSystem#listStatusBatch" which 
> returns all entries in a single batch.
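> A quick way to observe the behaviour (a sketch; host and path are 
> placeholders): {{listStatusIterator}} drives {{listStatusBatch}} under the 
> hood, so against the NameNode's WebHDFS it pages through the directory, 
> while against HttpFS the first "batch" already contains the entire listing.
> {code:java}
> FileSystem fs = FileSystem.get(new URI("webhdfs://httpfs-host:14000"), conf);
> RemoteIterator<FileStatus> it = fs.listStatusIterator(new Path("/big/dir"));
> while (it.hasNext()) {
>   // With the NameNode's WebHDFS, entries arrive dfs.ls.limit at a time;
>   // with HttpFS, everything arrives in the first call.
>   System.out.println(it.next().getPath());
> }
> {code}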



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14786) A new block placement policy tolerating availability zone failure

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802427#comment-17802427
 ] 

Shilun Fan edited comment on HDFS-14786 at 1/4/24 8:12 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release. 

> A new block placement policy tolerating availability zone failure
> -
>
> Key: HDFS-14786
> URL: https://issues.apache.org/jira/browse/HDFS-14786
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement
>Reporter: Mingliang Liu
>Priority: Major
>
> {{NetworkTopology}} assumes a "/datacenter/rack/host" 3-layer topology. 
> Default block placement policies use rack awareness for better fault 
> tolerance. Newer block placement policies like 
> {{BlockPlacementPolicyRackFaultTolerant}} try their best to spread the 
> replicas across the most racks, which further tolerates more racks failing. 
> HADOOP-8470 brought {{NetworkTopologyWithNodeGroup}} to add another layer 
> under rack, i.e. a "/datacenter/rack/host/nodegroup" 4-layer topology. With 
> that, replicas within a rack can be placed in different node groups for 
> better isolation.
> Existing block placement policies tolerate one rack failure, since at least 
> two racks are chosen in those cases. Chances are all replicas could be 
> placed in the same datacenter, even though there are multiple data centers 
> in the same cluster topology. In other words, faults of layers above the 
> rack are not well tolerated.
> However, more deployments in public cloud are leveraging multiple 
> availability zones (AZs) for high availability, since the inter-AZ latency 
> seems affordable in many cases. In a single AZ, some cloud providers like 
> AWS support 
> [partitioned placement 
> groups|https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-partition]
>  which basically are different racks. A simple network topology mapped to 
> HDFS is "/availabilityzone/rack/host" with 3 layers.
> To achieve high availability tolerating zone failure, this JIRA proposes a 
> new data placement policy which tries its best to place replicas in most AZs, 
> most racks, and most evenly distributed.
> For example, with 3 replicas, we choose racks as follows (see the sketch 
> after this description):
>  - 1AZ: fall back to {{BlockPlacementPolicyRackFaultTolerant}} to place among 
> most racks
>  - 2AZ: randomly choose one rack in one AZ and randomly choose two racks in 
> the other AZ
>  - 3AZ: randomly choose one rack in every AZ
>  - 4AZ: randomly choose three AZs and randomly choose one rack in every AZ
> After racks are picked, hosts are chosen randomly within racks, honoring 
> local storage, favored nodes, excluded nodes, storage types, etc. Data may 
> become imbalanced if the topology is very uneven across AZs. This seems not 
> to be a problem, as infrastructure provisioning in public cloud is more 
> flexible than 1P.
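> A self-contained sketch of the rack-selection rules above (illustrative 
> only; a real policy would plug into BlockPlacementPolicy and also honor 
> storage types, favored nodes, excluded nodes, etc.):
> {code:java}
> import java.util.*;
> 
> public class AzRackChooser {
>   /** Choose racks for 3 replicas according to the AZ rules above. */
>   static List<String> chooseRacks(Map<String, List<String>> racksByAz, Random rnd) {
>     List<String> azs = new ArrayList<>(racksByAz.keySet());
>     Collections.shuffle(azs, rnd);
>     List<String> chosen = new ArrayList<>(3);
>     if (azs.size() >= 3) {        // 3AZ/4AZ: one rack in each of three AZs
>       for (String az : azs.subList(0, 3)) {
>         chosen.addAll(pick(racksByAz.get(az), 1, rnd));
>       }
>     } else if (azs.size() == 2) { // 2AZ: one rack in one AZ, two in the other
>       chosen.addAll(pick(racksByAz.get(azs.get(0)), 1, rnd));
>       chosen.addAll(pick(racksByAz.get(azs.get(1)), 2, rnd));
>     } else {                      // 1AZ: spread across as many racks as possible
>       chosen.addAll(pick(racksByAz.get(azs.get(0)), 3, rnd));
>     }
>     return chosen;
>   }
> 
>   /** Pick n racks at random, reusing racks only when fewer than n exist. */
>   static List<String> pick(List<String> racks, int n, Random rnd) {
>     List<String> copy = new ArrayList<>(racks);
>     Collections.shuffle(copy, rnd);
>     List<String> out = new ArrayList<>(n);
>     for (int i = 0; i < n; i++) {
>       out.add(copy.get(i % copy.size()));
>     }
>     return out;
>   }
> }
> {code}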



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14903) Update access time in toCompleteFile

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802424#comment-17802424
 ] 

Shilun Fan edited comment on HDFS-14903 at 1/4/24 8:12 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release.

> Update access time in toCompleteFile
> 
>
> Key: HDFS-14903
> URL: https://issues.apache.org/jira/browse/HDFS-14903
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.0
>Reporter: lihanran
>Assignee: lihanran
>Priority: Major
> Attachments: HDFS-14903.001.patch
>
>
> when creating a file, the access time and modification time are different



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14903) Update access time in toCompleteFile

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14903:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Update access time in toCompleteFile
> 
>
> Key: HDFS-14903
> URL: https://issues.apache.org/jira/browse/HDFS-14903
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.0
>Reporter: lihanran
>Assignee: lihanran
>Priority: Major
> Attachments: HDFS-14903.001.patch
>
>
> when creating a file, the access time and modification time are different



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14786) A new block placement policy tolerating availability zone failure

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14786:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> A new block placement policy tolerating availability zone failure
> -
>
> Key: HDFS-14786
> URL: https://issues.apache.org/jira/browse/HDFS-14786
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement
>Reporter: Mingliang Liu
>Priority: Major
>
> {{NetworkTopology}} assumes a "/datacenter/rack/host" 3-layer topology. 
> Default block placement policies use rack awareness for better fault 
> tolerance. Newer block placement policies like 
> {{BlockPlacementPolicyRackFaultTolerant}} try their best to spread the 
> replicas across the most racks, which further tolerates more racks failing. 
> HADOOP-8470 brought {{NetworkTopologyWithNodeGroup}} to add another layer 
> under rack, i.e. a "/datacenter/rack/host/nodegroup" 4-layer topology. With 
> that, replicas within a rack can be placed in different node groups for 
> better isolation.
> Existing block placement policies tolerate one rack failure, since at least 
> two racks are chosen in those cases. Chances are all replicas could be 
> placed in the same datacenter, even though there are multiple data centers 
> in the same cluster topology. In other words, faults of layers above the 
> rack are not well tolerated.
> However, more deployments in public cloud are leveraging multiple 
> availability zones (AZs) for high availability, since the inter-AZ latency 
> seems affordable in many cases. In a single AZ, some cloud providers like 
> AWS support 
> [partitioned placement 
> groups|https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-partition]
>  which basically are different racks. A simple network topology mapped to 
> HDFS is "/availabilityzone/rack/host" with 3 layers.
> To achieve high availability tolerating zone failure, this JIRA proposes a 
> new data placement policy which tries its best to place replicas in most AZs, 
> most racks, and most evenly distributed.
> For example, with 3 replicas, we choose racks as follows:
>  - 1AZ: fall back to {{BlockPlacementPolicyRackFaultTolerant}} to place among 
> most racks
>  - 2AZ: randomly choose one rack in one AZ and randomly choose two racks in 
> the other AZ
>  - 3AZ: randomly choose one rack in every AZ
>  - 4AZ: randomly choose three AZs and randomly choose one rack in every AZ
> After racks are picked, hosts are chosen randomly within racks, honoring 
> local storage, favored nodes, excluded nodes, storage types, etc. Data may 
> become imbalanced if the topology is very uneven across AZs. This seems not 
> to be a problem, as infrastructure provisioning in public cloud is more 
> flexible than 1P.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15513) Allow client to query snapshot status on one directory

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15513:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Allow client to query snapshot status on one directory
> --
>
> Key: HDFS-15513
> URL: https://issues.apache.org/jira/browse/HDFS-15513
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, hdfs-client
>Affects Versions: 3.3.0
>Reporter: Siyao Meng
>Priority: Major
>
> Alternatively, we can allow the client to query snapshot status on *a list 
> of* given directories. Thoughts?
> Rationale:
> At the moment, we could only retrieve the full list of snapshottable 
> directories with 
> [{{getSnapshottableDirListing()}}|https://github.com/apache/hadoop/blob/233619a0a462ae2eb7e7253b6bb8ae48eaa5eb19/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L6986-L6994].
>  This leads to the inefficiency in HDFS-15492 that we have to get the 
> *entire* list of snapshottable directories to check whether a file being 
> deleted is inside a snapshottable directory.
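> A hypothetical shape for such an API (the method name is illustrative; 
> {{SnapshottableDirectoryStatus}} is the existing status class):
> {code:java}
> /**
>  * Sketch only: return the snapshottable status of the given directory, or
>  * of its nearest snapshottable ancestor, without listing every
>  * snapshottable directory. Returns null when nothing encloses the path.
>  */
> SnapshottableDirectoryStatus getSnapshottableDirStatus(String path)
>     throws IOException;
> {code}
> A single targeted lookup like this would let the HDFS-15492 delete path skip 
> the full snapshottable-directory listing.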



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15714) HDFS Provided Storage Read/Write Mount Support On-the-fly

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802622#comment-17802622
 ] 

Shilun Fan commented on HDFS-15714:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> HDFS Provided Storage Read/Write Mount Support On-the-fly
> -
>
> Key: HDFS-15714
> URL: https://issues.apache.org/jira/browse/HDFS-15714
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 3.4.0
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15714-01.patch, HDFS-15714-02.patch, 
> HDFS_Provided_Storage_Design-V1.pdf, HDFS_Provided_Storage_Performance-V1.pdf
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> HDFS Provided Storage (PS) is a feature to tier HDFS over other file systems. 
> In HDFS-9806, the PROVIDED storage type was introduced to HDFS. By 
> configuring external storage with the PROVIDED tag for a DataNode, users can 
> enable applications to access externally stored data from the HDFS side. 
> However, there are two issues that need to be addressed. Firstly, mounting 
> external storage on-the-fly, namely dynamic mount, is lacking; it is 
> necessary to support it in order to flexibly combine HDFS with external 
> storage at runtime. Secondly, PS write is not supported by current HDFS, but 
> in real applications it is common to transfer data bi-directionally for 
> read/write between HDFS and external storage.
> Through this JIRA, we are presenting our work on PS write support and 
> dynamic mount support for both read & write. Please note that several JIRAs 
> have been filed in the community for these topics. Our work builds on this 
> previous community work, with a new design & implementation to support 
> so-called writeBack mounts and to let admins add any mount on-the-fly. We 
> appreciate those folks in the community for their great contribution! See 
> their pending JIRAs: HDFS-14805 & HDFS-12090.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15714) HDFS Provided Storage Read/Write Mount Support On-the-fly

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15714:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> HDFS Provided Storage Read/Write Mount Support On-the-fly
> -
>
> Key: HDFS-15714
> URL: https://issues.apache.org/jira/browse/HDFS-15714
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 3.4.0
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15714-01.patch, HDFS-15714-02.patch, 
> HDFS_Provided_Storage_Design-V1.pdf, HDFS_Provided_Storage_Performance-V1.pdf
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> HDFS Provided Storage (PS) is a feature to tier HDFS over other file systems. 
> In HDFS-9806, the PROVIDED storage type was introduced to HDFS. By 
> configuring external storage with the PROVIDED tag for a DataNode, users can 
> enable applications to access externally stored data from the HDFS side. 
> However, there are two issues that need to be addressed. Firstly, mounting 
> external storage on-the-fly, namely dynamic mount, is lacking; it is 
> necessary to support it in order to flexibly combine HDFS with external 
> storage at runtime. Secondly, PS write is not supported by current HDFS, but 
> in real applications it is common to transfer data bi-directionally for 
> read/write between HDFS and external storage.
> Through this JIRA, we are presenting our work on PS write support and 
> dynamic mount support for both read & write. Please note that several JIRAs 
> have been filed in the community for these topics. Our work builds on this 
> previous community work, with a new design & implementation to support 
> so-called writeBack mounts and to let admins add any mount on-the-fly. We 
> appreciate those folks in the community for their great contribution! See 
> their pending JIRAs: HDFS-14805 & HDFS-12090.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14718) HttpFS: Sort response by key names as WebHDFS does

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802428#comment-17802428
 ] 

Shilun Fan edited comment on HDFS-14718 at 1/4/24 8:11 AM:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
updated the target version for preparing 3.4.0 release. 

> HttpFS: Sort response by key names as WebHDFS does
> --
>
> Key: HDFS-14718
> URL: https://issues.apache.org/jira/browse/HDFS-14718
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Example*
> See description of HDFS-14665 for an example of LISTSTATUS.
> *Analysis*
> WebHDFS is [using a 
> TreeMap|https://github.com/apache/hadoop/blob/99bf1dc9eb18f9b4d0338986d1b8fd2232f1232f/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java#L120]
>  to serialize HdfsFileStatus, while HttpFS [uses a 
> LinkedHashMap|https://github.com/apache/hadoop/blob/6fcc5639ae32efa5a5d55a6b6cf23af06fc610c3/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/FSOperations.java#L107]
>  to serialize FileStatus.
> *Questions*
> Why the difference? Is this intentional?
> - I looked into the Git history. It seems it's simply because WebHDFS has 
> used TreeMap from the beginning, and HttpFS has used LinkedHashMap from the 
> beginning. It is not limited to LISTSTATUS; it applies to ALL other 
> requests' JSON serialization.
> Now the real question: could/should we replace ALL LinkedHashMaps with 
> TreeMaps in HttpFS serialization in the FSOperations class?
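> A tiny illustration of the difference (keys arbitrary):
> {code:java}
> Map<String, Object> tree = new TreeMap<>();
> Map<String, Object> linked = new LinkedHashMap<>();
> for (String k : new String[] {"pathSuffix", "accessTime", "length"}) {
>   tree.put(k, 1);
>   linked.put(k, 1);
> }
> System.out.println(tree.keySet());   // [accessTime, length, pathSuffix] - sorted
> System.out.println(linked.keySet()); // [pathSuffix, accessTime, length] - insertion order
> {code}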



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14718) HttpFS: Sort response by key names as WebHDFS does

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14718:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> HttpFS: Sort response by key names as WebHDFS does
> --
>
> Key: HDFS-14718
> URL: https://issues.apache.org/jira/browse/HDFS-14718
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Example*
> See description of HDFS-14665 for an example of LISTSTATUS.
> *Analysis*
> WebHDFS is [using a 
> TreeMap|https://github.com/apache/hadoop/blob/99bf1dc9eb18f9b4d0338986d1b8fd2232f1232f/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/JsonUtil.java#L120]
>  to serialize HdfsFileStatus, while HttpFS [uses a 
> LinkedHashMap|https://github.com/apache/hadoop/blob/6fcc5639ae32efa5a5d55a6b6cf23af06fc610c3/hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http/server/FSOperations.java#L107]
>  to serialize FileStatus.
> *Questions*
> Why the difference? Is this intentional?
> - I looked into the Git history. It seems it's simply because WebHDFS has 
> used TreeMap from the beginning, and HttpFS has used LinkedHashMap from the 
> beginning. It is not limited to LISTSTATUS; it applies to ALL other 
> requests' JSON serialization.
> Now the real question: could/should we replace ALL LinkedHashMaps with 
> TreeMaps in HttpFS serialization in the FSOperations class?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15701) Add resolveMountPath API in FileSystem

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15701:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Add resolveMountPath API in FileSystem
> --
>
> Key: HDFS-15701
> URL: https://issues.apache.org/jira/browse/HDFS-15701
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, ViewHDFS
>Affects Versions: 3.4.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
>
> Currently FileSystem has an API, resolvePath. To know where a path is 
> mounted, applications can use that API, as the returned path comes from the 
> actual target path in the case of mounted file systems like ViewFS, 
> ViewFSOverloadScheme or ViewDistributedFileSystem.
> However, resolvePath does more than apps need when they only want to know 
> where a path is mounted, because resolvePath internally calls 
> "getFileStatus".
> This additional call is unnecessary when apps just want to know where the 
> path is mounted.
> Since we have mounted file systems available in FS, I think it's good to add 
> a resolveMountPath API, which will just do the following: if the fs is a 
> mounted fs, it will resolve its mount tables and return the actual target 
> path; if the fs is not mounted, it will simply return the same path.
> Currently, applications like Hive and Ranger use the resolvePath API (this 
> forces an additional internal RPC).
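> A hypothetical sketch of the proposed behavior (the mount-table helper is a 
> stand-in, not an existing API):
> {code:java}
> // Default in FileSystem: a non-mounted file system returns the path
> // unchanged, issuing no RPC at all.
> public Path resolveMountPath(Path p) throws IOException {
>   return p;
> }
> 
> // Override in a ViewFS-style mounted file system, resolving mount tables only:
> @Override
> public Path resolveMountPath(Path p) throws IOException {
>   // resolveMountTable() stands in for the in-memory mount-point
>   // resolution; unlike resolvePath, no getFileStatus call is made.
>   return resolveMountTable(p).getTargetPath();
> }
> {code}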



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15513) Allow client to query snapshot status on one directory

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802625#comment-17802625
 ] 

Shilun Fan commented on HDFS-15513:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Allow client to query snapshot status on one directory
> --
>
> Key: HDFS-15513
> URL: https://issues.apache.org/jira/browse/HDFS-15513
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, hdfs-client
>Affects Versions: 3.3.0
>Reporter: Siyao Meng
>Priority: Major
>
> Alternatively, we can allow the client to query snapshot status on *a list 
> of* given directories. Thoughts?
> Rationale:
> At the moment, we could only retrieve the full list of snapshottable 
> directories with 
> [{{getSnapshottableDirListing()}}|https://github.com/apache/hadoop/blob/233619a0a462ae2eb7e7253b6bb8ae48eaa5eb19/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L6986-L6994].
>  This leads to the inefficiency in HDFS-15492 that we have to get the 
> *entire* list of snapshottable directories to check whether a file being 
> deleted is inside a snapshottable directory.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15552) Let DeadNode Detector also work for EC cases

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802623#comment-17802623
 ] 

Shilun Fan commented on HDFS-15552:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Let DeadNode Detector also work for EC cases
> 
>
> Key: HDFS-15552
> URL: https://issues.apache.org/jira/browse/HDFS-15552
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: dfsclient, ec
>Affects Versions: 3.3.0
>Reporter: dark_num
>Assignee: imbajin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, the EC stream (`DFSStripedInputStream`) is not handled properly 
> when an exception occurs.
> For example, while reading EC blocks, if the client times out when 
> connecting to the DataNode, it will throw a `SocketTimeoutException` and 
> then add the current DN to the local dead nodes.
> However, the local dead nodes will not be removed until the stream is 
> closed, which will cause a *missing block IOException* to be thrown in HBase 
> usage scenarios.
> So we need to use the detector to deal with dead nodes under EC to avoid 
> read failures.
>  
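> An illustrative sketch of the intended handling (API and method names are 
> hypothetical here; the non-striped read path already hands suspects to the 
> shared detector when it is enabled):
> {code:java}
> try {
>   buf = reader.readChunk(datanode, block);
> } catch (SocketTimeoutException e) {
>   addToLocalDeadNodes(datanode);
>   if (dfsClient.isDeadNodeDetectionEnabled()) {
>     // Probe the suspect in the background and drop it from the local dead
>     // list once it recovers, instead of keeping it dead until close().
>     dfsClient.addNodeToDeadNodeDetector(this, datanode);
>   }
> }
> {code}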



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15552) Let DeadNode Detector also work for EC cases

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15552:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Let DeadNode Detector also work for EC cases
> 
>
> Key: HDFS-15552
> URL: https://issues.apache.org/jira/browse/HDFS-15552
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: dfsclient, ec
>Affects Versions: 3.3.0
>Reporter: dark_num
>Assignee: imbajin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, the EC stream (`DFSStripedInputStream`) is not handled properly 
> when an exception occurs.
> For example, while reading EC blocks, if the client times out when 
> connecting to the DataNode, it will throw a `SocketTimeoutException` and 
> then add the current DN to the local dead nodes.
> However, the local dead nodes will not be removed until the stream is 
> closed, which will cause a *missing block IOException* to be thrown in HBase 
> usage scenarios.
> So we need to use the detector to deal with dead nodes under EC to avoid 
> read failures.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16380) Result in unnecessary requeue on Observer NameNode when access by RBF in some scenes

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802621#comment-17802621
 ] 

Shilun Fan commented on HDFS-16380:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Result in unnecessary requeue on Observer NameNode when access by RBF in some 
> scenes
> 
>
> Key: HDFS-16380
> URL: https://issues.apache.org/jira/browse/HDFS-16380
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.3.1
>Reporter: zy.jordan
>Priority: Major
> Fix For: 3.3.1
>
>   Original Estimate: 40h
>  Remaining Estimate: 40h
>
> As HDFS-13522 introduced, users can access the Observer NameNode via RBF. 
> But this results in extra requeueing on the Observer side, because RBF 
> maintains a global alignContext for each namespace. When a call goes to the 
> ANN, it updates the stateId in the alignContext instead of waiting for the 
> msync time to arrive, and the call is requeued on the ONN side. In some 
> scenarios (e.g., ad-hoc queries), this is unnecessary.
> Separating alignContexts for different scenarios, for example 
> AlignContext-A (msync: 5s) for ad-hoc and AlignContext-B (msync: 0s) for 
> ETL, could reduce requeue time on the ONN.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16000) HDFS : Rename performance optimization

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16000:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> HDFS : Rename performance optimization
> --
>
> Key: HDFS-16000
> URL: https://issues.apache.org/jira/browse/HDFS-16000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Affects Versions: 3.1.4, 3.3.1
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg, 
> HDFS-16000.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> It takes a long time to move a large directory with rename. For example, it 
> takes about 40 seconds to move a directory with about 10 million (1000W) 
> entries. When a large amount of data is deleted to the trash, such a 
> large-directory move occurs when the trash makes a checkpoint. In addition, 
> the user may also actively trigger a large-directory move, which can cause 
> the NameNode to hold its lock too long and be killed by ZKFC. The flame 
> graph shows that most of the time is spent creating EnumCounters objects.
>  
> h3. Rename logic optimization:
>  * In a rename operation, regardless of the source and target directories 
> involved, the quota count is currently calculated three times: first to 
> check whether the moved directory exceeds the target directory quota, second 
> to compute the moved directory's quota in order to update the source 
> directory quota, and third to compute the moved directory's quota in order 
> to update the target directory.
>  * I think some of these three quota calculations are unnecessary. For 
> example, if no parent directory of the source or target directory has a 
> quota configured, there is no need to calculate the quotaCount at all. Even 
> if both the source directory and the target directory use quotas, there is 
> no need to calculate the quota three times: the calculation logic of the 
> first and third passes is the same, so it only needs to be computed once.
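> A sketch of the proposed short-circuit (helper names hypothetical):
> {code:java}
> boolean srcHasQuota = ancestorsHaveQuotaSet(srcIIP);
> boolean dstHasQuota = ancestorsHaveQuotaSet(dstIIP);
> QuotaCounts moved = null;
> if (srcHasQuota || dstHasQuota) {
>   // Compute the moved subtree's usage once instead of three times.
>   moved = computeQuotaUsage(srcSubtree);
> }
> if (dstHasQuota) {
>   verifyQuota(dstIIP, moved);            // check against the target's quota
> }
> if (srcHasQuota) {
>   updateCount(srcIIP, moved.negation()); // subtract from the source
> }
> if (dstHasQuota) {
>   updateCount(dstIIP, moved);            // add to the target
> }
> {code}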



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802619#comment-17802619
 ] 

Shilun Fan commented on HDFS-16000:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> HDFS : Rename performance optimization
> --
>
> Key: HDFS-16000
> URL: https://issues.apache.org/jira/browse/HDFS-16000
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namenode
>Affects Versions: 3.1.4, 3.3.1
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg, 
> HDFS-16000.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> It takes a long time to move a large directory with rename. For example, it 
> takes about 40 seconds to move a directory with about 10 million (1000W) 
> entries. When a large amount of data is deleted to the trash, such a 
> large-directory move occurs when the trash makes a checkpoint. In addition, 
> the user may also actively trigger a large-directory move, which can cause 
> the NameNode to hold its lock too long and be killed by ZKFC. The flame 
> graph shows that most of the time is spent creating EnumCounters objects.
>  
> h3. Rename logic optimization:
>  * In a rename operation, regardless of the source and target directories 
> involved, the quota count is currently calculated three times: first to 
> check whether the moved directory exceeds the target directory quota, second 
> to compute the moved directory's quota in order to update the source 
> directory quota, and third to compute the moved directory's quota in order 
> to update the target directory.
>  * I think some of these three quota calculations are unnecessary. For 
> example, if no parent directory of the source or target directory has a 
> quota configured, there is no need to calculate the quotaCount at all. Even 
> if both the source directory and the target directory use quotas, there is 
> no need to calculate the quota three times: the calculation logic of the 
> first and third passes is the same, so it only needs to be computed once.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16133) Support refresh of IP addresses behind DNS for clients

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16133:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Support refresh of IP addresses behind DNS for clients
> --
>
> Key: HDFS-16133
> URL: https://issues.apache.org/jira/browse/HDFS-16133
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient
>Reporter: Srinidhi V K
>Assignee: Srinidhi V K
>Priority: Major
>
> Support for using a single DNS name for clients was added as part of 
> HDFS-14118. The Java client does the resolution once and caches it. This 
> causes a problem whenever a node is added or removed behind the DNS name. 
> The idea with this task is to handle this scenario and refresh the IP 
> addresses automatically in the Java client.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16380) Result in unnecessary requeue on Observer NameNode when access by RBF in some scenes

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16380:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Result in unnecessary requeue on Observer NameNode when access by RBF in some 
> scenes
> 
>
> Key: HDFS-16380
> URL: https://issues.apache.org/jira/browse/HDFS-16380
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.3.1
>Reporter: zy.jordan
>Priority: Major
> Fix For: 3.3.1
>
>   Original Estimate: 40h
>  Remaining Estimate: 40h
>
> As HDFS-13522 introduced, users can access the Observer NameNode via RBF. 
> But this results in extra requeueing on the Observer side, because RBF 
> maintains a global alignContext for each namespace. When a call goes to the 
> ANN, it updates the stateId in the alignContext instead of waiting for the 
> msync time to arrive, and the call is requeued on the ONN side. In some 
> scenarios (e.g., ad-hoc queries), this is unnecessary.
> Separating alignContexts for different scenarios, for example 
> AlignContext-A (msync: 5s) for ad-hoc and AlignContext-B (msync: 0s) for 
> ETL, could reduce requeue time on the ONN.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16431) Truncate CallerContext in client side

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802618#comment-17802618
 ] 

Shilun Fan commented on HDFS-16431:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Truncate CallerContext in client side
> -
>
> Key: HDFS-16431
> URL: https://issues.apache.org/jira/browse/HDFS-16431
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nn
>Reporter: Chengwei Wang
>Assignee: Chengwei Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The context of a CallerContext is truncated when it exceeds the maximum 
> allowed length on the server side. I think it's better to check and truncate 
> on the client side, to reduce the unnecessary network and memory overhead 
> for the NN.
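> A minimal sketch of client-side truncation (reusing the server-side limit 
> key hadoop.caller.context.max.size and its default of 128):
> {code:java}
> int maxLen = conf.getInt("hadoop.caller.context.max.size", 128);
> String ctx = rawContext.length() > maxLen
>     ? rawContext.substring(0, maxLen)
>     : rawContext;
> CallerContext.setCurrent(new CallerContext.Builder(ctx).build());
> {code}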



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16431) Truncate CallerContext in client side

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16431:
--
Target Version/s: 3.3.1, 3.5.0  (was: 3.3.1, 3.4.1)

> Truncate CallerContext in client side
> -
>
> Key: HDFS-16431
> URL: https://issues.apache.org/jira/browse/HDFS-16431
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: nn
>Reporter: Chengwei Wang
>Assignee: Chengwei Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The context of a CallerContext is truncated when it exceeds the maximum 
> allowed length on the server side. I think it's better to check and truncate 
> on the client side, to reduce the unnecessary network and memory overhead 
> for the NN.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16762) Make the default value of dfs.federation.router.client.allow-partial-listing as false.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16762:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Make the default value of dfs.federation.router.client.allow-partial-listing 
> as false.
> --
>
> Key: HDFS-16762
> URL: https://issues.apache.org/jira/browse/HDFS-16762
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Chengwei Wang
>Assignee: Chengwei Wang
>Priority: Major
>  Labels: pull-request-available
>
>  As the default value of 
> _*dfs.federation.router.client.allow-partial-listing*_ is _*true*_, the HDFS 
> client will get a _*partial result*_ when one or more of the subclusters are 
> unavailable due to missing permissions or other exceptions, but the _*user 
> may not know*_. This can lead to faults.
> So I think it's better to make the default value false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16762) Make the default value of dfs.federation.router.client.allow-partial-listing as false.

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802617#comment-17802617
 ] 

Shilun Fan commented on HDFS-16762:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Make the default value of dfs.federation.router.client.allow-partial-listing 
> as false.
> --
>
> Key: HDFS-16762
> URL: https://issues.apache.org/jira/browse/HDFS-16762
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Chengwei Wang
>Assignee: Chengwei Wang
>Priority: Major
>  Labels: pull-request-available
>
>  As the default value of 
> _*dfs.federation.router.client.allow-partial-listing*_ is _*true*_, the HDFS 
> client will get a _*partial result*_ when one or more of the subclusters are 
> unavailable due to missing permissions or other exceptions, but the _*user 
> may not know*_. This can lead to faults.
> So I think it's better to make the default value false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17252) RBF: getListing results is incorrect.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17252:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> RBF: getListing results is incorrect.
> -
>
> Key: HDFS-17252
> URL: https://issues.apache.org/jira/browse/HDFS-17252
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Priority: Major
>
> Suppose we have two subclusters, ns1 and ns2,
> and the following mount table entry:
> /user/zhb/test_ec -> ns2 -> /user/zhb/test_ec
> We also create the directory /user/zhb/test_ec in ns1.
> Then, using the router configuration, we execute an ls command:
> hdfs dfs -ls /user/zhb/test_ec
> We will get the following error message:
> File hdfs://dap-hdfs-dev/user/zhb/test_ec/test_ec does not exist.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17245) Add getAverageLoadOneMinute RPC in ClientDatanodeProtocol.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17245:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Add getAverageLoadOneMinute RPC in ClientDatanodeProtocol.
> --
>
> Key: HDFS-17245
> URL: https://issues.apache.org/jira/browse/HDFS-17245
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>
> I think we should add a getAverageLoadOneMinute RPC to ClientDatanodeProtocol 
> to get the average load of the datanode over the last minute.
> We can use this RPC on the HDFS client side and in other places, for example 
> in the method addDatanode2ExistingPipeline: we can choose the datanode with 
> the lowest average load as the source datanode and then transfer the block 
> to the new datanode added to the existing pipeline.
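> A hypothetical sketch of the addition (signature illustrative only):
> {code:java}
> public interface ClientDatanodeProtocol {
>   // ... existing operations ...
> 
>   /**
>    * @return the average load (e.g. active xceiver count) of this DataNode
>    *         over the last minute.
>    */
>   double getAverageLoadOneMinute() throws IOException;
> }
> {code}
> On the client side, addDatanode2ExistingPipeline could then query each 
> candidate and pick the least-loaded replica as the transfer source.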



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-17245) Add getAverageLoadOneMinute RPC in ClientDatanodeProtocol.

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802402#comment-17802402
 ] 

Shilun Fan edited comment on HDFS-17245 at 1/4/24 8:09 AM:
---

We do not need to fill in the fix version. This field needs to be filled in 
only after jira is completed.

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


was (Author: slfan1989):
We do not need to fill in the fix version. This field needs to be filled in 
only after jira is completed.

> Add getAverageLoadOneMinute RPC in ClientDatanodeProtocol.
> --
>
> Key: HDFS-17245
> URL: https://issues.apache.org/jira/browse/HDFS-17245
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>
> I think we should add a getAverageLoadOneMinute RPC to ClientDatanodeProtocol 
> to get the average load of the datanode over the last minute.
> We can use this RPC on the HDFS client side and in other places, for example 
> in the method addDatanode2ExistingPipeline: we can choose the datanode with 
> the lowest average load as the source datanode and then transfer the block 
> to the new datanode added to the existing pipeline.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17239) Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17239:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write 
> lock. 
> -
>
> Key: HDFS-17239
> URL: https://issues.apache.org/jira/browse/HDFS-17239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>
> In the method FsDatasetImpl#removeReplicaFromMem, there are some log 
> statements within the scope of the BLOCK_POOl write lock. We should move 
> them out of the write lock.
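> An illustrative pattern for the change (lock helper names hypothetical): 
> capture what must be logged while holding the lock, and emit the log after 
> releasing it.
> {code:java}
> String logMsg;
> try (AutoCloseableLock l = lockManager.writeLock(LockLevel.BLOCK_POOl, bpid)) {
>   ReplicaInfo removed = volumeMap.remove(bpid, block);
>   logMsg = "Removed replica " + removed + " from memory";
> }
> // Logging no longer extends the BLOCK_POOl write-lock hold time.
> LOG.info(logMsg);
> {code}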



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17280) Pipeline recovery should better end block in advance when bytes acked greater than half of blocksize.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17280:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Pipeline recovery should better end block in advance when bytes acked greater 
> than half of blocksize.
> -
>
> Key: HDFS-17280
> URL: https://issues.apache.org/jira/browse/HDFS-17280
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17239) Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write lock.

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802612#comment-17802612
 ] 

Shilun Fan commented on HDFS-17239:
---

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Remove logging of method removeReplicaFromMem which is in BLOCK_POOl write 
> lock. 
> -
>
> Key: HDFS-17239
> URL: https://issues.apache.org/jira/browse/HDFS-17239
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>
> In the method FsDatasetImpl#removeReplicaFromMem, there are some log 
> statements within the scope of the BLOCK_POOl write lock. We should move 
> them out of the write lock.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17289) Considering the size of non-lastBlocks equals to complete block size can cause append failure.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17289:
--
Target Version/s: 3.5.0  (was: 3.4.1)

> Considering the size of non-lastBlocks equals to complete block size can 
> cause append failure.
> --
>
> Key: HDFS-17289
> URL: https://issues.apache.org/jira/browse/HDFS-17289
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.6
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


