[jira] [Work logged] (HDFS-15974) RBF: Unable to display the datanode UI of the router

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15974?focusedWorklogId=585571&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585571
 ]

ASF GitHub Bot logged work on HDFS-15974:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 05:53
Start Date: 20/Apr/21 05:53
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2915:
URL: https://github.com/apache/hadoop/pull/2915#issuecomment-822993787


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 39s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  36m  8s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 28s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 42s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 40s |  |  trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 15s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  14m 20s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 28s |  |  the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 28s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 17s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 48s |  |  the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  14m 14s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  |  17m 46s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2915/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 34s |  |  The patch does not generate ASF License warnings.  |
   |  |   |  95m 34s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2915/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2915 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 8a84274adc84 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 1ac78a83e20f267b8c5da2831823c594f01e8951 |
   | Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2915/2/testReport/ |
   | Max. process+thread count | 2352 (vs. ulimit of 5500) |
   | modules 

[jira] [Work logged] (HDFS-15989) Split TestBalancer into two classes

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15989?focusedWorklogId=585554&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585554
 ]

ASF GitHub Bot logged work on HDFS-15989:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 05:00
Start Date: 20/Apr/21 05:00
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on a change in pull request #2923:
URL: https://github.com/apache/hadoop/pull/2923#discussion_r616344648



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer2.java
##
@@ -0,0 +1,746 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs.server.balancer;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.StorageType;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.hdfs.DFSClient;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
+import org.apache.hadoop.hdfs.DFSTestUtil;
+import org.apache.hadoop.hdfs.DFSUtil;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.HdfsConfiguration;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hdfs.NameNodeProxies;
+import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys;
+import org.apache.hadoop.hdfs.protocol.ClientProtocol;
+import org.apache.hadoop.hdfs.protocol.DatanodeID;
+import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
+import org.apache.hadoop.hdfs.protocol.HdfsConstants;
+import org.apache.hadoop.hdfs.protocol.LocatedBlock;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy;
+import 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithUpgradeDomain;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementStatus;
+import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager;
+import org.apache.hadoop.hdfs.server.datanode.DataNode;
+import org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils;
+import org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset;
+import 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.hadoop.test.GenericTestUtils;
+import org.junit.After;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.slf4j.event.Level;
+
+import java.io.OutputStream;
+import java.net.InetSocketAddress;
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import static org.apache.hadoop.fs.StorageType.DEFAULT;
+import static org.apache.hadoop.fs.StorageType.RAM_DISK;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BALANCER_MAX_SIZE_TO_MOVE_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BLOCK_SIZE_KEY;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_BLOCK_PINNING_ENABLED;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_LAZY_WRITER_INTERVAL_SEC;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_LAZY_PERSIST_FILE_SCRUB_INTERVAL_SEC;
+import static org.apache.hadoop.test.PlatformAssumptions.assumeNotWindows;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Some long running Balancer tasks.
+ */
+public class TestBalancer2 {

Review comment:
   Thanks @aajisaka. Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking

[jira] [Commented] (HDFS-15963) Unreleased volume references cause an infinite loop

2021-04-19 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325472#comment-17325472
 ] 

Xiaoqiao He commented on HDFS-15963:


[~zhangshuyan] would you like to offer another patch for branch-3.3? I just 
checked, and it cannot be cherry-picked back to branch-3.3 smoothly.

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, 
> HDFS-15963.003.patch
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> When BlockSender throws an exception because the meta-data cannot be found, 
> the volume reference obtained by the thread is not released, which causes the 
> thread trying to remove the volume to wait and fall into an infinite loop.
> {code:java}
> boolean checkVolumesRemoved() {
>   Iterator<FsVolumeImpl> it = volumesBeingRemoved.iterator();
>   while (it.hasNext()) {
>     FsVolumeImpl volume = it.next();
>     if (!volume.checkClosed()) {
>       return false;
>     }
>     it.remove();
>   }
>   return true;
> }
> boolean checkClosed() {
>   // always be true.
>   if (this.reference.getReferenceCount() > 0) {
>     FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
>         this, reference.getReferenceCount());
>     return false;
>   }
>   return true;
> }
> {code}
> At the same time, because the thread has been holding checkDirsLock when 
> removing the volume, other threads trying to acquire the same lock will be 
> permanently blocked.
> Similar problems also occur in RamDiskAsyncLazyPersistService and 
> FsDatasetAsyncDiskService.
> This patch releases the three previously unreleased volume references.
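
The leak described above can be illustrated with a minimal, self-contained sketch. It is not the HDFS-15963 patch: `Volume`, `readLeaky`, and `readSafely` are hypothetical stand-ins, and the sketch only assumes that a volume reference is a closeable handle whose release decrements a counter. It shows why a reference obtained before an exception must be released on every exit path.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class VolumeReferenceSketch {
  /** Hypothetical stand-in for a volume that tracks outstanding references. */
  static class Volume {
    private final AtomicInteger refCount = new AtomicInteger();

    /** Obtains a reference; closing the returned handle releases it. */
    Closeable obtainReference() {
      refCount.incrementAndGet();
      return refCount::decrementAndGet;
    }

    /** A volume can only be removed once no references remain. */
    boolean checkClosed() {
      return refCount.get() == 0;
    }
  }

  /** Leaky variant: the reference is never released when the read fails. */
  static void readLeaky(Volume v, boolean metaMissing) throws IOException {
    Closeable ref = v.obtainReference();
    if (metaMissing) {
      throw new IOException("meta file not found"); // ref is leaked here
    }
    ref.close();
  }

  /** Safe variant: try-with-resources releases the reference on every path. */
  static void readSafely(Volume v, boolean metaMissing) throws IOException {
    try (Closeable ref = v.obtainReference()) {
      if (metaMissing) {
        throw new IOException("meta file not found");
      }
    }
  }

  public static void main(String[] args) {
    Volume leaky = new Volume();
    try {
      readLeaky(leaky, true);
    } catch (IOException expected) {
      // the failure itself is expected; the point is the leaked reference
    }
    System.out.println("leaky closed: " + leaky.checkClosed()); // false

    Volume safe = new Volume();
    try {
      readSafely(safe, true);
    } catch (IOException expected) {
      // expected as well, but the reference was released
    }
    System.out.println("safe closed: " + safe.checkClosed()); // true
  }
}
```

In the leaky variant, a loop that polls `checkClosed()` until it returns true would never finish, which matches the infinite wait described in the issue.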



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15989) Split TestBalancer into two classes

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15989?focusedWorklogId=585551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585551
 ]

ASF GitHub Bot logged work on HDFS-15989:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 04:29
Start Date: 20/Apr/21 04:29
Worklog Time Spent: 10m 
  Work Description: aajisaka commented on a change in pull request #2923:
URL: https://github.com/apache/hadoop/pull/2923#discussion_r616335313



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer2.java
##
@@ -0,0 +1,746 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs.server.balancer;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.StorageType;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.hdfs.DFSClient;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
+import org.apache.hadoop.hdfs.DFSTestUtil;
+import org.apache.hadoop.hdfs.DFSUtil;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.HdfsConfiguration;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hdfs.NameNodeProxies;
+import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys;
+import org.apache.hadoop.hdfs.protocol.ClientProtocol;
+import org.apache.hadoop.hdfs.protocol.DatanodeID;
+import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
+import org.apache.hadoop.hdfs.protocol.HdfsConstants;
+import org.apache.hadoop.hdfs.protocol.LocatedBlock;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy;
+import 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithUpgradeDomain;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementStatus;
+import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager;
+import org.apache.hadoop.hdfs.server.datanode.DataNode;
+import org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils;
+import org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset;
+import 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.hadoop.test.GenericTestUtils;
+import org.junit.After;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.slf4j.event.Level;
+
+import java.io.OutputStream;
+import java.net.InetSocketAddress;
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import static org.apache.hadoop.fs.StorageType.DEFAULT;
+import static org.apache.hadoop.fs.StorageType.RAM_DISK;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BALANCER_MAX_SIZE_TO_MOVE_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BLOCK_SIZE_KEY;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_BLOCK_PINNING_ENABLED;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_LAZY_WRITER_INTERVAL_SEC;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_LAZY_PERSIST_FILE_SCRUB_INTERVAL_SEC;
+import static org.apache.hadoop.test.PlatformAssumptions.assumeNotWindows;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Some long running Balancer tasks.
+ */
+public class TestBalancer2 {

Review comment:
   Would you rename the class to describe the test cases, such as 
TestBalancerLongRunningTasks?





[jira] [Work logged] (HDFS-15974) RBF: Unable to display the datanode UI of the router

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15974?focusedWorklogId=585548&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585548
 ]

ASF GitHub Bot logged work on HDFS-15974:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 04:19
Start Date: 20/Apr/21 04:19
Worklog Time Spent: 10m 
  Work Description: zhuxiangyi commented on a change in pull request #2915:
URL: https://github.com/apache/hadoop/pull/2915#discussion_r616332454



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/NamenodeBeanMetrics.java
##
@@ -855,7 +855,7 @@ public long getNumberOfSnapshottableDirs() {
 
   @Override
   public String getEnteringMaintenanceNodes() {
-return "N/A";
+return null;

Review comment:
   Thanks for your review @goiri , I think returning "{}" is more friendly 
than "null". I submitted new code and added tests.






Issue Time Tracking
---

Worklog Id: (was: 585548)
Time Spent: 0.5h  (was: 20m)

> RBF: Unable to display the datanode UI of the router
> 
>
> Key: HDFS-15974
> URL: https://issues.apache.org/jira/browse/HDFS-15974
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, ui
>Affects Versions: 3.4.0
>Reporter: zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15358-1.patch, image-2021-04-15-11-36-47-644.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Clicking the Datanodes tag on the Router UI does not respond.






[jira] [Work logged] (HDFS-15975) Use LongAdder instead of AtomicLong

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15975?focusedWorklogId=585547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585547
 ]

ASF GitHub Bot logged work on HDFS-15975:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 04:18
Start Date: 20/Apr/21 04:18
Worklog Time Spent: 10m 
  Work Description: tomscut opened a new pull request #2936:
URL: https://github.com/apache/hadoop/pull/2936


   JIRA: [HDFS-15975](https://issues.apache.org/jira/browse/HDFS-15975)
   




Issue Time Tracking
---

Worklog Id: (was: 585547)
Time Spent: 4h 10m  (was: 4h)

> Use LongAdder instead of AtomicLong
> ---
>
> Key: HDFS-15975
> URL: https://issues.apache.org/jira/browse/HDFS-15975
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> When counting some indicators, we can use LongAdder instead of AtomicLong to 
> improve performance. The long value is not an atomic snapshot in LongAdder, 
> but I think we can tolerate that.
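
As a rough illustration of the trade-off mentioned above, the following sketch counts events from several threads with `java.util.concurrent.atomic.LongAdder`. The class and method names are made up for this example and are not taken from the pull request. While writers are active, `sum()` is not an atomic snapshot; once all writers have finished, as here, it returns the exact total.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class LongAdderSketch {
  /** Increments one shared LongAdder from several threads and returns the total. */
  static long countWithAdder(int threads, int incrementsPerThread)
      throws InterruptedException {
    LongAdder counter = new LongAdder();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        for (int i = 0; i < incrementsPerThread; i++) {
          counter.increment(); // contended updates are spread over internal cells
        }
      });
    }
    pool.shutdown();
    if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
      throw new IllegalStateException("workers did not finish in time");
    }
    // All writers are done, so sum() is exact here. During concurrent
    // updates it would only be an approximate, non-atomic snapshot.
    return counter.sum();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(countWithAdder(4, 100_000)); // 400000
  }
}
```

This pattern suits metrics that are written often and read rarely. A counter whose value drives a decision (for example, compare-and-set logic) would still need `AtomicLong`.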






[jira] [Commented] (HDFS-8023) Erasure Coding: retrieve eraure coding schema for a file from NameNode

2021-04-19 Thread Stone (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-8023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325441#comment-17325441
 ] 

Stone commented on HDFS-8023:
-

To use the EC feature, I will upgrade HDFS from 2.8 to 3.0+. If I upgrade the 
active NameNode first, the standby NameNode will read the new edit log, which 
includes EC operations, and will then fail because it cannot resolve the EC 
edit log entries. How to upgrade NN 

> Erasure Coding: retrieve eraure coding schema for a file from NameNode
> --
>
> Key: HDFS-8023
> URL: https://issues.apache.org/jira/browse/HDFS-8023
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Vinayakumar B
>Priority: Major
> Fix For: HDFS-7285
>
> Attachments: HDFS-8023-01.patch, HDFS-8023-02.patch
>
>
> NameNode needs to provide an RPC call for clients and tools to retrieve the 
> erasure coding schema for a file from NameNode.






[jira] [Comment Edited] (HDFS-15990) No need to write to sub_section when serialize SnapshotDiff Section

2021-04-19 Thread Xuze Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325409#comment-17325409
 ] 

Xuze Yang edited comment on HDFS-15990 at 4/20/21, 1:30 AM:


[~sodonnell] Thank you for your timely reply. I got it. I just think it may 
cause a little confusion. Would code comments be a better choice? Anyway, this 
is not a big problem.

I want to ask another question. HDFS-7784 and HDFS-14617 both aim to 
load INodeSection and INodeDirectorySection in parallel, but their 
implementations are a bit different. Taking INodeSection as an example, 
HDFS-7784 divides INodeSection into several INodeSections, while HDFS-14617 
introduces INodeSection_Sub. In practice, HDFS-14617 may encounter downgrade 
problems (see HDFS-14771), but HDFS-7784 does not have this problem. So I want 
to ask why you chose the HDFS-14617 implementation. In other words, compared to 
HDFS-7784, what are the advantages of HDFS-14617?

Looking forward to your answer, thanks!


was (Author: xuze yang):
[~sodonnell] Thank you for your timely reply. I got it. Now I just think it may 
cause a lit confusion. Would code comments be a better choice?Anyway, this is 
not a big problem.

I want to ask another question. In HDFS-7784 and HDFS-14617, they both aim to 
load INodeSection and INodeDirectorySection in parallel. But their 
implementation is a bit different. Take INodeSection for example, HDFS-7784 
divide INodeSection into several INodeSection, while HDFS-14617 introduce 
INodeSection_Sub. In practice, HDFS-14617 may encounter downgrade problems(see 
HDFS-14771), but HDFS-7784 does not have this problem. So I want to ask why you 
choose HDFS-14617 implementation. In another word, compare to HDFS-7784, what 
are the advantages of HDFS-14617.

Looking forward to your answer, thanks!

> No need to write to sub_section when serialize SnapshotDiff Section
> ---
>
> Key: HDFS-15990
> URL: https://issues.apache.org/jira/browse/HDFS-15990
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Xuze Yang
>Priority: Minor
>
> In FSImageFormatPBSnapshot.serializeSnapshotDiffSection(), the following code 
> exists:
> {code:java}
> if (i % parent.getInodesPerSubSection() == 0) {
>   parent.commitSubSection(headers,
>   FSImageFormatProtobuf.SectionName.SNAPSHOT_DIFF_SUB);
> }{code}
> It aims to serialize SnapshotDiff information into several sub_sections (i.e., 
> additional sub_section information will be written to the FileSummary section). 
> But FSImageFormatPBSnapshot.loadSnapshotDiffSection() treats 
> SnapshotDiffSection as a whole, rather than as several sub_sections. So there 
> is no need to introduce sub_sections here.
>  
>  
>  
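
The modulo check quoted in the description can be sketched in isolation to show how often a sub-section boundary would be recorded. This is a toy model, not the FSImage code: `commitPoints` is a hypothetical helper, and it assumes `i` is a 1-based running count of serialized entries.

```java
import java.util.ArrayList;
import java.util.List;

public class SubSectionSketch {
  /**
   * Returns the entry counts at which a sub-section boundary would be
   * recorded, assuming a 1-based counter and a fixed sub-section size.
   */
  static List<Integer> commitPoints(int totalEntries, int entriesPerSubSection) {
    List<Integer> points = new ArrayList<>();
    for (int i = 1; i <= totalEntries; i++) {
      if (i % entriesPerSubSection == 0) {
        points.add(i); // stands in for a call such as commitSubSection(...)
      }
    }
    return points;
  }

  public static void main(String[] args) {
    System.out.println(commitPoints(10, 4)); // [4, 8]
  }
}
```

Each recorded boundary adds an entry to the summary on the write side. If the loader reads the section as a whole, those entries are written but never consulted, which is the mismatch the reporter points out.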






[jira] [Commented] (HDFS-15990) No need to write to sub_section when serialize SnapshotDiff Section

2021-04-19 Thread Xuze Yang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325409#comment-17325409
 ] 

Xuze Yang commented on HDFS-15990:
--

[~sodonnell] Thank you for your timely reply. I got it. I just think it may 
cause a little confusion. Would code comments be a better choice? Anyway, this 
is not a big problem.

I want to ask another question. HDFS-7784 and HDFS-14617 both aim to 
load INodeSection and INodeDirectorySection in parallel, but their 
implementations are a bit different. Taking INodeSection as an example, 
HDFS-7784 divides INodeSection into several INodeSections, while HDFS-14617 
introduces INodeSection_Sub. In practice, HDFS-14617 may encounter downgrade 
problems (see HDFS-14771), but HDFS-7784 does not have this problem. So I want 
to ask why you chose the HDFS-14617 implementation. In other words, compared to 
HDFS-7784, what are the advantages of HDFS-14617?

Looking forward to your answer, thanks!

> No need to write to sub_section when serialize SnapshotDiff Section
> ---
>
> Key: HDFS-15990
> URL: https://issues.apache.org/jira/browse/HDFS-15990
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Xuze Yang
>Priority: Minor
>
> In FSImageFormatPBSnapshot.serializeSnapshotDiffSection(), the following code 
> exists:
> {code:java}
> if (i % parent.getInodesPerSubSection() == 0) {
>   parent.commitSubSection(headers,
>   FSImageFormatProtobuf.SectionName.SNAPSHOT_DIFF_SUB);
> }{code}
> It aims to serialize SnapshotDiff information into several sub_sections (i.e., 
> additional sub_section information will be written to the FileSummary section). 
> But FSImageFormatPBSnapshot.loadSnapshotDiffSection() treats 
> SnapshotDiffSection as a whole, rather than as several sub_sections. So there 
> is no need to introduce sub_sections here.
>  
>  
>  






[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=585507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585507
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 01:22
Start Date: 20/Apr/21 01:22
Worklog Time Spent: 10m 
  Work Description: functioner edited a comment on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-822806901


   > No, it's a non-blocking write so by definition it will never hang – unless 
induced by fault injection.
   
   @daryn-sharp I have considered this counterargument in 
https://github.com/apache/hadoop/pull/2737#issuecomment-822591028, where I 
proposed another argument that it may hang when it's a huge payload, because 
there's a while-loop. Please take a look.
   
   Yes, `channel.write(buffer)` is non-blocking. But I suspect that 
`channelIO(null, channel, buffer)` is blocking, otherwise we wouldn't have the 
performance issue in 
[HDFS-15486](https://issues.apache.org/jira/browse/HDFS-15486). Within 
`channelIO(null, channel, buffer)`, the large payload is split into multiple 
parts, and it won't jump out of the loop until the remaining part of the 
payload does not exceed the buffer limit, meaning that it's waiting for the 
network to finish sending some content.
   
   I'm not sure whether it can defend my argument. Can you provide more 
explanation? Maybe I'm not correct. Thanks!
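
   To make the question concrete, here is a minimal sketch of a chunked write loop over a non-blocking channel. It is not Hadoop's `channelIO`: `writeInChunks`, `CHUNK_LIMIT`, and `BoundedChannel` are hypothetical names for this example. Whether such a loop can stall depends on what it does after a short write. This version returns as soon as the channel accepts less than a full chunk; a version that kept retrying would spin until the network drained.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

public class ChunkedWriteSketch {
  /** Hypothetical per-call chunk size; real servers use far larger limits. */
  static final int CHUNK_LIMIT = 8;

  /** Writes buf in chunks and stops at the first short (partial) write. */
  static int writeInChunks(WritableByteChannel ch, ByteBuffer buf)
      throws IOException {
    int originalLimit = buf.limit();
    int initialRemaining = buf.remaining();
    while (buf.remaining() > 0) {
      int ioSize = Math.min(buf.remaining(), CHUNK_LIMIT);
      int written;
      try {
        buf.limit(buf.position() + ioSize); // expose only one chunk
        written = ch.write(buf);
      } finally {
        buf.limit(originalLimit); // always restore the caller's limit
      }
      if (written < ioSize) {
        break; // channel is full for now: return instead of spinning
      }
    }
    return initialRemaining - buf.remaining();
  }

  /** Fake non-blocking channel that accepts at most `capacity` bytes in total. */
  static class BoundedChannel implements WritableByteChannel {
    private int capacity;

    BoundedChannel(int capacity) {
      this.capacity = capacity;
    }

    @Override
    public int write(ByteBuffer src) {
      int n = Math.min(src.remaining(), capacity);
      src.position(src.position() + n);
      capacity -= n;
      return n;
    }

    @Override
    public boolean isOpen() {
      return true;
    }

    @Override
    public void close() {
    }
  }

  public static void main(String[] args) throws IOException {
    ByteBuffer payload = ByteBuffer.allocate(100);
    int sent = writeInChunks(new BoundedChannel(20), payload);
    System.out.println("sent=" + sent + " remaining=" + payload.remaining());
    // prints "sent=20 remaining=80"
  }
}
```

   With the early exit, the caller gets control back after 20 bytes and can wait for write readiness before sending the remaining 80.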




Issue Time Tracking
---

Worklog Id: (was: 585507)
Time Spent: 6.5h  (was: 6h 20m)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> We were doing some testing of the latest Hadoop stable release 3.2.2 and 
> found that a network issue can cause the namenode to hang even with async 
> edit logging (FSEditLogAsync).
> The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
>     try {
>       while (true) {
>         boolean doSync;
>         Edit edit = dequeueEdit();              // line 229
>         if (edit != null) {
>           // sync if requested by edit log.
>           doSync = edit.logEdit();              // line 232
>           syncWaitQ.add(edit);
>         } else {
>           // sync when editq runs dry, but have edits pending a sync.
>           doSync = !syncWaitQ.isEmpty();
>         }
>         if (doSync) {
>           // normally edit log exceptions cause the NN to terminate, but tests
>           // relying on ExitUtil.terminate need to see the exception.
>           RuntimeException syncEx = null;
>           try {
>             logSync(getLastWrittenTxId());      // line 243
>           } catch (RuntimeException ex) {
>             syncEx = ex;
>           }
>           while ((edit = syncWaitQ.poll()) != null) {
>             edit.logSyncNotify(syncEx);         // line 248
>           }
>         }
>       }
>     } catch (InterruptedException ie) {
>       LOG.info(Thread.currentThread().getName() + " was interrupted, exiting");
>     } catch (Throwable t) {
>       terminate(t);
>     }
>   }
> {code}
> For step 4, FSEditLogAsync$RpcEdit.logSyncNotify is essentially doing a 
> network write (line 365).
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   private static class RpcEdit extends Edit {
>     // ...
>     @Override
>     public void logSyncNotify(RuntimeException syncEx) {
>       try {
>         if (syncEx == null) {
>           call.sendResponse();                  // line 365
>         } else {
>   

[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=585505&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585505
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 20/Apr/21 01:18
Start Date: 20/Apr/21 01:18
Worklog Time Spent: 10m 
  Work Description: functioner edited a comment on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-822806901


   > No, it's a non-blocking write so by definition it will never hang – unless 
induced by fault injection.
   
   @daryn-sharp I addressed this counterargument in 
https://github.com/apache/hadoop/pull/2737#issuecomment-822591028, where I 
argued that the write may still hang on a huge payload, because it happens 
inside a while-loop. Please take a look.
   
   Yes, `channel.write(buffer)` is non-blocking. But I suspect that 
`channelIO(null, channel, buffer)` is blocking; otherwise we would not have the 
performance issue in 
[HDFS-15486](https://issues.apache.org/jira/browse/HDFS-15486). Within 
`channelIO(null, channel, buffer)`, the large payload is split into multiple 
parts, and only the last call is non-blocking due to the buffer limit.
   
   I'm not sure whether this is enough to support my argument. Could you 
explain further? I may be wrong. Thanks!
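
   To make the while-loop question concrete, here is a minimal sketch of a chunked write loop over a `WritableByteChannel`. It is not Hadoop's `channelIO`; the class name, `CHUNK_LIMIT`, and the `BoundedChannel` test double are assumptions made up for illustration. It only shows the general pattern: the payload is offered in capped slices, and a short write (the channel accepted less than was offered) ends the loop.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

final class ChunkedWriteSketch {
  /** Hypothetical per-call cap, standing in for the "buffer limit" discussed above. */
  static final int CHUNK_LIMIT = 8 * 1024;

  /**
   * Offers buf to the channel in slices of at most CHUNK_LIMIT bytes.
   * Returns the number of bytes the channel accepted. The loop stops as soon
   * as one write accepts fewer bytes than were offered.
   */
  static int writeChunked(WritableByteChannel ch, ByteBuffer buf) throws IOException {
    int originalLimit = buf.limit();
    int initialRemaining = buf.remaining();
    while (buf.remaining() > 0) {
      int ioSize = Math.min(buf.remaining(), CHUNK_LIMIT);
      int written;
      try {
        buf.limit(buf.position() + ioSize); // expose only one slice
        written = ch.write(buf);
      } finally {
        buf.limit(originalLimit);           // always restore the caller's limit
      }
      if (written < ioSize) {
        break; // short write: the channel cannot take more right now
      }
    }
    return initialRemaining - buf.remaining();
  }

  /** Test double: accepts at most `capacity` bytes in total and never blocks. */
  static final class BoundedChannel implements WritableByteChannel {
    private int capacity;
    int calls = 0;

    BoundedChannel(int capacity) {
      this.capacity = capacity;
    }

    @Override
    public int write(ByteBuffer src) {
      calls++;
      int n = Math.min(src.remaining(), capacity);
      src.position(src.position() + n);
      capacity -= n;
      return n;
    }

    @Override
    public boolean isOpen() {
      return true;
    }

    @Override
    public void close() {
    }
  }
}
```

   With a `BoundedChannel` of 20000 bytes and a 50000-byte buffer, `writeChunked` returns 20000 after three `write` calls. Whether the real channel underneath ever blocks is a property of that channel, not of the loop, which is the point under discussion here.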


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585505)
Time Spent: 6h 20m  (was: 6h 10m)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
>     We were testing the latest stable Hadoop release, 3.2.2, and found that a 
> network issue can cause the namenode to hang even with async edit logging 
> (FSEditLogAsync).
>     The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
>     try {
>       while (true) {
>         boolean doSync;
>         Edit edit = dequeueEdit();                 // line 229
>         if (edit != null) {
>           // sync if requested by edit log.
>           doSync = edit.logEdit();                 // line 232
>           syncWaitQ.add(edit);
>         } else {
>           // sync when editq runs dry, but have edits pending a sync.
>           doSync = !syncWaitQ.isEmpty();
>         }
>         if (doSync) {
>           // normally edit log exceptions cause the NN to terminate, but tests
>           // relying on ExitUtil.terminate need to see the exception.
>           RuntimeException syncEx = null;
>           try {
>             logSync(getLastWrittenTxId());         // line 243
>           } catch (RuntimeException ex) {
>             syncEx = ex;
>           }
>           while ((edit = syncWaitQ.poll()) != null) {
>             edit.logSyncNotify(syncEx);            // line 248
>           }
>         }
>       }
>     } catch (InterruptedException ie) {
>       LOG.info(Thread.currentThread().getName() + " was interrupted, exiting");
>     } catch (Throwable t) {
>       terminate(t);
>     }
>   }
> {code}
>     In step 4, FSEditLogAsync$RpcEdit.logSyncNotify essentially performs a 
> network write (line 365).
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   private static class RpcEdit extends Edit {
>     // ...
>     @Override
>     public void logSyncNotify(RuntimeException syncEx) {
>       try {
>         if (syncEx == null) {
>           call.sendResponse();                     // line 365
>         } else {
>           call.abortResponse(syncEx);
>         }
>       } catch (Exception e) {} // don't care if not sent.
>     }
>     // ...
>   

[jira] [Commented] (HDFS-15614) Initialize snapshot trash root during NameNode startup if enabled

2021-04-19 Thread Arpit Agarwal (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325352#comment-17325352
 ] 

Arpit Agarwal commented on HDFS-15614:
--

bq. , if providing an external command to create the Trash directory by admins 
is feasible and makes sense

The external command will add more friction to enabling the feature. We want it 
to be as transparent as possible. I prefer the option to auto-create the 
.Trash dir.

> Initialize snapshot trash root during NameNode startup if enabled
> -
>
> Key: HDFS-15614
> URL: https://issues.apache.org/jira/browse/HDFS-15614
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This is a follow-up to HDFS-15607.
> Goal:
> Initialize (create) the snapshot trash root for all existing snapshottable 
> directories if {{dfs.namenode.snapshot.trashroot.enabled}} is set to 
> {{true}}, so that admins won't have to run {{dfsadmin -provisionTrash}} 
> manually on each of those directories.
> The change is expected to land in {{FSNamesystem}}.
> Discussion:
> 1. Currently in HDFS-15607, the snapshot trash root creation logic is on the 
> client side. For the NN to create it at startup, the logic must also be 
> implemented on the server side, which WebHDFS (HDFS-15612) requires as 
> well.
> 2. Alternatively, we can provide an extra parameter to the 
> {{-provisionTrash}} command, like {{dfsadmin -provisionTrash -all}}, to 
> initialize/provision the trash root on all existing snapshottable dirs.
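
The startup behaviour described in the goal can be sketched as follows. This is an illustration only and not the FSNamesystem change: the class, the method `provisionAll`, and the in-memory `namespace` set are hypothetical stand-ins, and the real implementation would also have to deal with permissions, locking, and edit logging. The only names taken from this thread are the `.Trash` directory and the `dfs.namenode.snapshot.trashroot.enabled` switch, modelled here as a plain boolean.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class TrashRootStartupSketch {
  static final String TRASH_DIR_NAME = ".Trash";

  /**
   * For every snapshottable directory, creates "<dir>/.Trash" in the
   * (hypothetical, in-memory) namespace if it does not exist yet.
   * Does nothing when the feature flag is off. Returns the paths created.
   */
  static List<String> provisionAll(boolean trashRootEnabled,
                                   List<String> snapshottableDirs,
                                   Set<String> namespace) {
    List<String> created = new ArrayList<>();
    if (!trashRootEnabled) {
      return created; // feature disabled: leave the namespace untouched
    }
    for (String dir : snapshottableDirs) {
      String trashRoot = dir.endsWith("/")
          ? dir + TRASH_DIR_NAME
          : dir + "/" + TRASH_DIR_NAME;
      // Set.add returns false if the trash root already exists, so a
      // second startup is a no-op for that directory.
      if (namespace.add(trashRoot)) {
        created.add(trashRoot);
      }
    }
    return created;
  }
}
```

Making the step idempotent matters because it would run on every NameNode startup, not just once.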



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15989) Split TestBalancer into two classes

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15989?focusedWorklogId=585413=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585413
 ]

ASF GitHub Bot logged work on HDFS-15989:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 22:26
Start Date: 19/Apr/21 22:26
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2923:
URL: https://github.com/apache/hadoop/pull/2923#issuecomment-822827055


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 54s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  39m 10s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 34s |  |  trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m  7s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 31s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 31s |  |  trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 24s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  17m  5s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 23s |  |  hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 0 new + 469 unchanged - 1 fixed = 469 total (was 470)  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m 11s |  |  hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 generated 0 new + 453 unchanged - 1 fixed = 453 total (was 454)  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   0m 56s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2923/11/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 7 new + 158 unchanged - 49 fixed = 165 total (was 207)  |
   | +1 :green_heart: |  mvnsite  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 50s |  |  the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 20s |  |  the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 24s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  16m 50s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 389m 18s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2923/11/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 46s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 484m 41s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
   |   | hadoop.hdfs.TestPersistBlocks |
   |   | hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor |
   |   | hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand |
   |   | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
   |   | hadoop.hdfs.TestDFSShell |
   |   | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
   |   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
   |   | hadoop.hdfs.server.namenode.TestFileTruncate |
   |   | 

[jira] [Work logged] (HDFS-15957) The ignored IOException in the RPC response sent by FSEditLogAsync can cause the HDFS client to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15957?focusedWorklogId=585412=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585412
 ]

ASF GitHub Bot logged work on HDFS-15957:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 22:24
Start Date: 19/Apr/21 22:24
Worklog Time Spent: 10m 
  Work Description: functioner commented on a change in pull request #2878:
URL: https://github.com/apache/hadoop/pull/2878#discussion_r616213541



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
##
@@ -378,13 +381,18 @@ public void logSyncWait() {
 
 @Override
 public void logSyncNotify(RuntimeException syncEx) {
-  try {
-if (syncEx == null) {
-  call.sendResponse();
-} else {
-  call.abortResponse(syncEx);
+  for (int retries = 0; retries <= RESPONSE_SEND_RETRIES; retries++) {

Review comment:
   @daryn-sharp Thanks for your explanation!
   Now I understand that:
   1. If the response cannot be sent, it is either because the connection is 
already closed or because a bug prevents the encoding of the response.
   2. If a speculated future bug left the connection open, it would be in an 
unknown/inconsistent state with possibly partial data written, so writing 
anything more would give the client a corrupted or duplicate response.
   
   I have some more questions:
   1. What kind of messages are sent in this specific `call.sendResponse()`?
   2. Among these messages, are there critical ones? I mean, maybe some 
messages are important to the protocol and we don't want to lose them. If we 
delegate them to a dedicated RPC thread/service/framework, the 
disconnection/exception can be handled automatically so that those important 
messages reliably reach the receiver's side.
   
   `call.sendResponse()` is invoked after the edit log sync finishes, so the 
transaction is already done. However, the possible exception is swallowed, so 
the code does not care whether the reply reaches the datanode/client. Even if 
a failure here implies a disconnection, could this silently dropped message be 
hiding a bug?
   
   For example, if the receiver (datanode/client) does an end-to-end retry 
(e.g., it gets no reply and requests the same transaction again), this 
identical transaction may be rejected because it is already committed in the 
edit log. We should probably at least log a warning when 
`call.sendResponse()` fails to send the message.
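
   As a rough sketch of the "at least log a warning" suggestion, assuming a simplified `Call` interface that is only a stand-in for the real RPC call object: the names `LoggedNotifySketch` and `Call`, and the boolean return value, are hypothetical and not part of the patch under review. The control flow mirrors the existing `logSyncNotify`, except that a failure is logged rather than silently dropped.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

final class LoggedNotifySketch {
  private static final Logger LOG = Logger.getLogger("LoggedNotifySketch");

  /** Hypothetical stand-in for the RPC call object; not Hadoop's Server.Call. */
  interface Call {
    void sendResponse() throws IOException;

    void abortResponse(Throwable t) throws IOException;
  }

  /**
   * Sends (or aborts) the response for a synced edit. Returns true if the
   * call completed, false if it threw. The failure is logged, not retried:
   * the connection may already be closed or in an inconsistent state.
   */
  static boolean logSyncNotify(Call call, RuntimeException syncEx) {
    try {
      if (syncEx == null) {
        call.sendResponse();
      } else {
        call.abortResponse(syncEx);
      }
      return true;
    } catch (Exception e) {
      LOG.log(Level.WARNING, "Failed to send RPC response after edit log sync", e);
      return false;
    }
  }
}
```

   Logging does not change what the client observes; it only makes a dropped response visible on the namenode side.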






Issue Time Tracking
---

Worklog Id: (was: 585412)
Time Spent: 1.5h  (was: 1h 20m)


[jira] [Work logged] (HDFS-15957) The ignored IOException in the RPC response sent by FSEditLogAsync can cause the HDFS client to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15957?focusedWorklogId=585405=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585405
 ]

ASF GitHub Bot logged work on HDFS-15957:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 22:17
Start Date: 19/Apr/21 22:17
Worklog Time Spent: 10m 
  Work Description: functioner commented on a change in pull request #2878:
URL: https://github.com/apache/hadoop/pull/2878#discussion_r616213541



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
##
@@ -378,13 +381,18 @@ public void logSyncWait() {
 
 @Override
 public void logSyncNotify(RuntimeException syncEx) {
-  try {
-if (syncEx == null) {
-  call.sendResponse();
-} else {
-  call.abortResponse(syncEx);
+  for (int retries = 0; retries <= RESPONSE_SEND_RETRIES; retries++) {

Review comment:
   @daryn-sharp Thanks for your explanation!
   Now I understand that:
   1. If the response cannot be sent, it is either because the connection is 
already closed or because a bug prevents the encoding of the response.
   2. If a speculated future bug left the connection open, it would be in an 
unknown/inconsistent state with possibly partial data written, so writing 
anything more would give the client a corrupted or duplicate response.
   
   I have some more questions:
   1. What kind of messages are sent in this specific `call.sendResponse()`?
   2. Among these messages, are there critical ones? I mean, maybe some 
messages are important to the protocol and we don't want to lose them. If we 
delegate them to a dedicated RPC thread/service/framework, the 
disconnection/exception can be handled automatically so that those important 
messages reliably reach the receiver's side.
   
   `call.sendResponse()` is invoked after the edit log sync finishes, so the 
transaction is already done. However, the possible exception is swallowed, so 
the code does not care whether the reply reaches the datanode/client. Even if 
a failure here implies a disconnection, could this silently dropped message be 
hiding a bug? Thanks!






Issue Time Tracking
---

Worklog Id: (was: 585405)
Time Spent: 1h 20m  (was: 1h 10m)

> The ignored IOException in the RPC response sent by FSEditLogAsync can cause 
> the HDFS client to hang
> 
>
> Key: HDFS-15957
> URL: https://issues.apache.org/jira/browse/HDFS-15957
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Priority: Critical
>  Labels: pull-request-available
> Attachments: fsshell.txt, namenode.txt, reproduce.patch, 
> secondnamenode.txt
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
>     In FSEditLogAsync, the RpcEdit notification in line 248 could be skipped, 
> because the possible exception (e.g., IOException) thrown in line 365 is 
> always ignored.
>  
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
> class FSEditLogAsync extends FSEditLog implements Runnable {
>   // ...
>   @Override
>   public void run() {
> try {
>   while (true) {
> boolean doSync;
> Edit edit = dequeueEdit();
> if (edit != null) {
>   // sync if requested by edit log.
>   doSync = edit.logEdit();
>   syncWaitQ.add(edit);
> } else {
>   // sync when editq runs dry, but have edits pending a sync.
>   doSync = !syncWaitQ.isEmpty();
> }
> if (doSync) {
>   // normally edit log exceptions cause the NN to terminate, but tests
>   // relying on ExitUtil.terminate need to see the exception.
>   RuntimeException syncEx = null;
>   try {
> logSync(getLastWrittenTxId());
>   } catch (RuntimeException ex) {
> syncEx = ex;
>   }
>   while ((edit = syncWaitQ.poll()) != null) {
> edit.logSyncNotify(syncEx);   // line 
> 248
>   }
> }
>   }
> } catch (InterruptedException ie) {
>   LOG.info(Thread.currentThread().getName() + " was interrupted, 
> exiting");
> } catch (Throwable t) {
>   terminate(t);
> }
>   }
>   // the calling rpc thread will 

[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=585385=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585385
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 21:45
Start Date: 19/Apr/21 21:45
Worklog Time Spent: 10m 
  Work Description: functioner commented on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-822806901


   > No, it's a non-blocking write so by definition it will never hang – unless 
induced by fault injection.
   
   @daryn-sharp I addressed this counterargument in 
https://github.com/apache/hadoop/pull/2737#issuecomment-822591028, where I 
argued that the write may still hang on a huge payload, because it happens 
inside a while-loop. Please take a look.
   
   I'm not sure whether this is enough to support my argument. Could you 
explain further? Thanks!




Issue Time Tracking
---

Worklog Id: (was: 585385)
Time Spent: 6h 10m  (was: 6h)


[jira] [Commented] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-04-19 Thread Daryn Sharp (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325331#comment-17325331
 ] 

Daryn Sharp commented on HDFS-15869:


This is again a fault-injection-only issue.  A non-blocking write will, by 
definition, not block.  The cited ZK issues appear to concern serialization 
plus a blocking write inside a synchronized section.
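
For reference, the general behaviour of a non-blocking NIO write can be observed with the JDK's `java.nio.channels.Pipe`. The sketch below is not Hadoop's RPC path; the class and method names are made up for illustration. It fills a non-blocking pipe that nobody reads and shows that `write` eventually returns 0 rather than waiting for space.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

final class NonBlockingWriteSketch {
  /**
   * Writes into a non-blocking pipe with no reader until the pipe is full.
   * Returns {total bytes accepted, result of the last write call}.
   */
  static long[] fillUntilFull() throws IOException {
    Pipe pipe = Pipe.open();
    try (Pipe.SinkChannel sink = pipe.sink();
         Pipe.SourceChannel source = pipe.source()) {
      sink.configureBlocking(false);
      ByteBuffer chunk = ByteBuffer.allocate(64 * 1024);
      long total = 0;
      int last = -1;
      // Bounded loop so the sketch terminates even on an unusual platform.
      for (int i = 0; i < 10_000; i++) {
        chunk.clear();
        last = sink.write(chunk); // returns immediately, possibly with 0
        total += last;
        if (last == 0) {
          break; // pipe is full; a blocking channel would wait here instead
        }
      }
      return new long[] {total, last};
    }
  }
}
```

How many bytes are accepted before the first zero-length write depends on the platform's buffer sizes, so the sketch reports the total instead of assuming a value.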

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
>     We were testing the latest stable Hadoop release, 3.2.2, and found that a 
> network issue can cause the namenode to hang even with async edit logging 
> (FSEditLogAsync).
>     The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
>     try {
>       while (true) {
>         boolean doSync;
>         Edit edit = dequeueEdit();                 // line 229
>         if (edit != null) {
>           // sync if requested by edit log.
>           doSync = edit.logEdit();                 // line 232
>           syncWaitQ.add(edit);
>         } else {
>           // sync when editq runs dry, but have edits pending a sync.
>           doSync = !syncWaitQ.isEmpty();
>         }
>         if (doSync) {
>           // normally edit log exceptions cause the NN to terminate, but tests
>           // relying on ExitUtil.terminate need to see the exception.
>           RuntimeException syncEx = null;
>           try {
>             logSync(getLastWrittenTxId());         // line 243
>           } catch (RuntimeException ex) {
>             syncEx = ex;
>           }
>           while ((edit = syncWaitQ.poll()) != null) {
>             edit.logSyncNotify(syncEx);            // line 248
>           }
>         }
>       }
>     } catch (InterruptedException ie) {
>       LOG.info(Thread.currentThread().getName() + " was interrupted, exiting");
>     } catch (Throwable t) {
>       terminate(t);
>     }
>   }
> {code}
>     In step 4, FSEditLogAsync$RpcEdit.logSyncNotify essentially performs a 
> network write (line 365).
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   private static class RpcEdit extends Edit {
>     // ...
>     @Override
>     public void logSyncNotify(RuntimeException syncEx) {
>       try {
>         if (syncEx == null) {
>           call.sendResponse();                     // line 365
>         } else {
>           call.abortResponse(syncEx);
>         }
>       } catch (Exception e) {} // don't care if not sent.
>     }
>     // ...
>   }{code}
>     If the sendResponse operation in line 365 gets stuck, the whole 
> FSEditLogAsync thread cannot proceed. In that case the critical logSync 
> (line 243) cannot be executed for incoming transactions, and the namenode 
> hangs. This is undesirable because FSEditLogAsync's key feature is 
> asynchronous edit logging, which is supposed to tolerate slow I/O.
>     To see why the sendResponse operation in line 365 may get stuck, here is 
> the stack trace:
> {code:java}
>  '(org.apache.hadoop.ipc.Server,channelWrite,3593)',
>  '(org.apache.hadoop.ipc.Server,access$1700,139)',
>  '(org.apache.hadoop.ipc.Server$Responder,processResponse,1657)',
>  '(org.apache.hadoop.ipc.Server$Responder,doRespond,1727)',
>  '(org.apache.hadoop.ipc.Server$Connection,sendResponse,2828)',
>  '(org.apache.hadoop.ipc.Server$Connection,access$300,1799)',
>  '(org.apache.hadoop.ipc.Server$RpcCall,doResponse,)',
>  '(org.apache.hadoop.ipc.Server$Call,doResponse,903)',
>  '(org.apache.hadoop.ipc.Server$Call,sendResponse,889)',
>  '(org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$RpcEdit,logSyncNotify,365)',
>  '(org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync,run,248)',
>  '(java.lang.Thread,run,748)'
> {code}
>  The `channelWrite` function is defined as follows:
> {code:java}
> 

[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=585376=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585376
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 21:27
Start Date: 19/Apr/21 21:27
Worklog Time Spent: 10m 
  Work Description: daryn-sharp commented on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-822797022


   bq. the FSEditLogAsync thread (without this patch) directly invokes a 
network I/O call call.sendResponse(), so if this network I/O invocation hangs, 
the FSEditLogAsync thread also hangs
   bq. My intention is to defend that we should not remove the references to 
"hanging" problems.
   
   No, it's a non-blocking write so by definition it will never hang – unless 
induced by fault injection.




Issue Time Tracking
---

Worklog Id: (was: 585376)
Time Spent: 5h 50m  (was: 5h 40m)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
>     We were doing some testing of the latest Hadoop stable release 3.2.2 and 
> found some network issue can cause the namenode to hang even with the async 
> edit logging (FSEditLogAsync).
>     The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
>     try {
>       while (true) {
>         boolean doSync;
>         Edit edit = dequeueEdit();                 // line 229
>         if (edit != null) {
>           // sync if requested by edit log.
>           doSync = edit.logEdit();                 // line 232
>           syncWaitQ.add(edit);
>         } else {
>           // sync when editq runs dry, but have edits pending a sync.
>           doSync = !syncWaitQ.isEmpty();
>         }
>         if (doSync) {
>           // normally edit log exceptions cause the NN to terminate, but tests
>           // relying on ExitUtil.terminate need to see the exception.
>           RuntimeException syncEx = null;
>           try {
>             logSync(getLastWrittenTxId());         // line 243
>           } catch (RuntimeException ex) {
>             syncEx = ex;
>           }
>           while ((edit = syncWaitQ.poll()) != null) {
>             edit.logSyncNotify(syncEx);            // line 248
>           }
>         }
>       }
>     } catch (InterruptedException ie) {
>       LOG.info(Thread.currentThread().getName() + " was interrupted, exiting");
>     } catch (Throwable t) {
>       terminate(t);
>     }
>   }
> {code}
>     As for step 4, FSEditLogAsync$RpcEdit.logSyncNotify essentially performs 
> a network write (line 365).
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   private static class RpcEdit extends Edit {
>     // ...
>     @Override
>     public void logSyncNotify(RuntimeException syncEx) {
>       try {
>         if (syncEx == null) {
>           call.sendResponse();                     // line 365
>         } else {
>           call.abortResponse(syncEx);
>         }
>       } catch (Exception e) {} // don't care if not sent.
>     }
>     // ...
>   }{code}
>     If the sendResponse operation on line 365 gets stuck, the whole 
> FSEditLogAsync thread cannot proceed. In that case the critical logSync 
> (line 243) cannot be executed for incoming transactions, and the namenode 
> hangs. This is undesirable because FSEditLogAsync’s key feature is 
> asynchronous edit logging, which is supposed to tolerate slow I/O.
>     To see why the sendResponse operation on line 365 may get stuck, here is 
> the stack trace:
> 

[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=585377=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585377
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 21:27
Start Date: 19/Apr/21 21:27
Worklog Time Spent: 10m 
  Work Description: daryn-sharp edited a comment on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-822797022


   > the FSEditLogAsync thread (without this patch) directly invokes a network 
I/O call call.sendResponse(), so if this network I/O invocation hangs, the 
FSEditLogAsync thread also hangs
   > My intention is to defend that we should not remove the references to 
"hanging" problems.
   
   No, it's a non-blocking write so by definition it will never hang – unless 
induced by fault injection.
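
For readers unfamiliar with the distinction drawn above, here is a minimal Java NIO sketch of what "non-blocking write" means in general. It does not model Hadoop's IPC server or `call.sendResponse()`, and whether that path can block is exactly what this thread disputes. The class name `NonBlockingWriteSketch`, the method `fillUntilWouldBlock`, and the safety cap are hypothetical names chosen for illustration. The sketch writes into a `java.nio.channels.Pipe` that nobody reads from: once the underlying buffer is full, `write()` returns 0 instead of parking the calling thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

final class NonBlockingWriteSketch {
    // Hypothetical upper bound so the loop terminates even on an unusual platform.
    private static final long SAFETY_CAP_BYTES = 64L * 1024 * 1024;

    /** Returns how many bytes the channel accepted before write() reported "would block". */
    static long fillUntilWouldBlock() {
        try {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                sink.configureBlocking(false);          // the key step: non-blocking mode
                ByteBuffer chunk = ByteBuffer.allocate(4096);
                long accepted = 0;
                while (accepted < SAFETY_CAP_BYTES) {
                    chunk.clear();                      // reuse the same 4 KiB of zeros
                    int n = sink.write(chunk);          // never parks the thread
                    if (n == 0) {
                        break;                          // buffer full: caller decides what to do
                    }
                    accepted += n;
                }
                return accepted;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("write() returned 0 after " + fillUntilWouldBlock() + " bytes");
    }
}
```

In blocking mode the same loop would stall on the first write that does not fit, because no reader ever drains the pipe.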




Issue Time Tracking
---

Worklog Id: (was: 585377)
Time Spent: 6h  (was: 5h 50m)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HDFS-15991) Add location into datanode info for NameNodeMXBean

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15991?focusedWorklogId=585348=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585348
 ]

ASF GitHub Bot logged work on HDFS-15991:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 20:23
Start Date: 19/Apr/21 20:23
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2933:
URL: https://github.com/apache/hadoop/pull/2933#issuecomment-822760304


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  34m 23s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 22s |  |  trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 17s |  |  trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m  3s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 23s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m  9s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m 16s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 12s |  |  the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  6s |  |  the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m  6s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 53s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 44s |  |  the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 20s |  |  the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m  7s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 59s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 230m 18s |  [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2933/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)  |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 317m 36s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots |
   |   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
   |   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   |   | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2933/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2933 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 90e25c73c94f 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 31726f45042950c75bef119d72c4f13283453f15 |
   | Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
   | Multi-JDK versions | 

[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=585345=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585345
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 19:51
Start Date: 19/Apr/21 19:51
Worklog Time Spent: 10m 
  Work Description: functioner edited a comment on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-822713500


   > Thanks @functioner
   > The detailed discussions (except the lambda argument) should have been on 
the Jira.
   > 
   > > IMO, this is a classic Producer-Consumer problem, and it is natural idea 
to improve performance using parallel way.
   > 
   > > So, call.sendResponse() (network service) affects FSEditLogAsync (edit 
log sync service). So, I would say it's a bug.
   > 
   > Now, I am really even more confused about the (Bug Vs. Improvement). So, I 
am going to pass on reviewing.
   
   @amahussein Thanks for your feedback and your time!
   Sorry for any confusion I caused.
   
   It's not a big deal whether it's marked as a bug or an improvement. One of my 
bug reports ([HADOOP-17552](https://issues.apache.org/jira/browse/HADOOP-17552)) 
was also eventually marked as an improvement rather than a bug. The point is 
that the developers there eventually recognized the potential hanging issue I 
pointed out, and the patch (as well as the relevant discussion) is very helpful 
for developers and users.
   
   > Ok, is the purpose of the change is to improve performance of the 
FSEditLogAsync.java by executing sendResponse() in parallel?
   > In that case, please change the title of the Jira and the description to 
remove references to "hanging" problems.
   
   My position is that we should not remove the references to "hanging" 
problems.
   
   In short, the discussion above can be summarized in 3 arguments:
   1. https://github.com/apache/hadoop/pull/2737#issuecomment-822151838: this 
is a classic Producer-Consumer problem, and it is natural to improve 
performance through parallelism.
   2. https://github.com/apache/hadoop/pull/2737#issuecomment-822591028: 
`call.sendResponse()` may hang due to a network issue, without throwing any 
exception.
   3. https://github.com/apache/hadoop/pull/2737#issuecomment-822617097: 
   a) the `FSEditLogAsync` thread (without this patch) directly invokes the 
network I/O call `call.sendResponse()`, so if this invocation hangs, the 
`FSEditLogAsync` thread also hangs.
   b) in a "correct" system design, such a hang in network I/O should be 
acceptable, because HDFS (as a fault-tolerant system) should tolerate it.
   c) while the system tolerates this network issue, the `FSEditLogAsync` 
thread should not hang; otherwise nobody can commit to the log.
   d) the expected behavior is that the `FSEditLogAsync` thread continues 
despite the network issue, so that everything else still works.
   
   Both Argument 3 and Argument 1 can be resolved with this patch.
   
   In conclusion, this patch not only improves the performance, but also 
enhances the availability & fault-tolerance.
   
   So, I think the references to "hanging" problems should not be removed.
   
   If it keeps "Improvement" tag instead of "Bug" tag, that's fine.
   
   P.S. I will summarize our discussion with a comment in Jira after we reach a 
consensus.
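
Argument 3 can be made concrete with a toy model. This is a sketch under stated assumptions, not Hadoop code: `InlineNotifySketch` and `syncsCompletedWhileStuck` are hypothetical names, an `AtomicInteger` increment stands in for `logSync`, and a latch that is never released stands in for a `sendResponse()` that does not return (whether the real call can do that is the open question in this PR). The loop syncs and then notifies on the same thread, as in the `run()` method quoted in the issue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class InlineNotifySketch {
    /**
     * Runs a sync-then-notify loop over `edits` items on one thread. The
     * notification for item `stuckAt` blocks. Returns how many syncs had
     * completed after waiting `waitMillis` for the loop to make progress.
     */
    static int syncsCompletedWhileStuck(int edits, int stuckAt, long waitMillis) {
        AtomicInteger synced = new AtomicInteger();
        CountDownLatch release = new CountDownLatch(1);
        Thread loop = new Thread(() -> {
            for (int i = 0; i < edits; i++) {
                synced.incrementAndGet();          // stand-in for logSync
                if (i == stuckAt) {                // stand-in for a notify that never returns
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        }, "toy-edit-log-thread");
        loop.setDaemon(true);
        loop.start();
        try {
            loop.join(waitMillis);                 // give the loop time to reach the stuck item
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        int observed = synced.get();               // snapshot taken while still stuck
        release.countDown();                       // let the toy thread finish
        return observed;
    }

    public static void main(String[] args) {
        System.out.println(syncsCompletedWhileStuck(10, 3, 300) + " of 10 syncs completed");
    }
}
```

With 10 edits and the fourth notification stuck, the model reports only 4 syncs: every later edit waits behind the blocked notification.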




Issue Time Tracking
---

Worklog Id: (was: 585345)
Time Spent: 5h 40m  (was: 5.5h)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=585330=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585330
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 19:14
Start Date: 19/Apr/21 19:14
Worklog Time Spent: 10m 
  Work Description: functioner edited a comment on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-822713500


   > Thanks @functioner
   > The detailed discussions (except the lambda argument) should have been on 
the Jira.
   > 
   > > IMO, this is a classic Producer-Consumer problem, and it is natural idea 
to improve performance using parallel way.
   > 
   > > So, call.sendResponse() (network service) affects FSEditLogAsync (edit 
log sync service). So, I would say it's a bug.
   > 
   > Now, I am really even more confused about the (Bug Vs. Improvement). So, I 
am going to pass on reviewing.
   
   @amahussein Thanks for your feedback and your time!
   Sorry for any confusion I caused.
   
   It's not a big deal whether it's marked as a bug or an improvement. One of my 
bug reports ([HADOOP-17552](https://issues.apache.org/jira/browse/HADOOP-17552)) 
was also eventually marked as an improvement rather than a bug. The point is 
that the developers there eventually recognized the potential hanging issue I 
pointed out, and the patch (as well as the relevant discussion) is very helpful 
for developers and users.
   
   > Ok, is the purpose of the change is to improve performance of the 
FSEditLogAsync.java by executing sendResponse() in parallel?
   > In that case, please change the title of the Jira and the description to 
remove references to "hanging" problems.
   
   My position is that we should not remove the references to "hanging" 
problems.
   
   In short, the discussion above can be summarized in 3 arguments:
   1. https://github.com/apache/hadoop/pull/2737#issuecomment-822151838: this 
is a classic Producer-Consumer problem, and it is natural to improve 
performance through parallelism.
   2. https://github.com/apache/hadoop/pull/2737#issuecomment-822591028: 
`call.sendResponse()` may hang due to a network issue, without throwing any 
exception.
   3. https://github.com/apache/hadoop/pull/2737#issuecomment-822617097: 
   a) the `FSEditLogAsync` thread (without this patch) directly invokes the 
network I/O call `call.sendResponse()`, so if this invocation hangs, the 
`FSEditLogAsync` thread also hangs.
   b) in a "correct" system design, such a hang in network I/O should be 
acceptable, because HDFS (as a fault-tolerant system) should tolerate it.
   c) while the system tolerates this network issue, the `FSEditLogAsync` 
thread should not hang; otherwise nobody can commit to the log.
   d) the expected behavior is that the `FSEditLogAsync` thread continues 
despite the network issue, so that everything else still works.
   
   Both Argument 3 and Argument 1 can be resolved with this patch.
   
   In conclusion, this patch not only improves the performance, but also 
enhances the availability & fault-tolerance.
   
   So, I think the references to "hanging" problems should not be removed.
   
   If it keeps "Improvement" tag instead of "Bug" tag, I won't disagree with it.
   
   P.S. I will summarize our discussion with a comment in Jira after we reach a 
consensus.




Issue Time Tracking
---

Worklog Id: (was: 585330)
Time Spent: 5.5h  (was: 5h 20m)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=585328=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585328
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 19:07
Start Date: 19/Apr/21 19:07
Worklog Time Spent: 10m 
  Work Description: functioner commented on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-822713500


   > Thanks @functioner
   > The detailed discussions (except the lambda argument) should have been on 
the Jira.
   > 
   > > IMO, this is a classic Producer-Consumer problem, and it is natural idea 
to improve performance using parallel way.
   > 
   > > So, call.sendResponse() (network service) affects FSEditLogAsync (edit 
log sync service). So, I would say it's a bug.
   > 
   > Now, I am really even more confused about the (Bug Vs. Improvement). So, I 
am going to pass on reviewing.
   
   @amahussein Thanks for your feedback and your time!
   Sorry for any confusion I caused.
   
   It's not a big deal whether it's marked as a bug or an improvement. One of my 
bug reports ([HADOOP-17552](https://issues.apache.org/jira/browse/HADOOP-17552)) 
was also eventually marked as an improvement rather than a bug. The point is 
that the developers there eventually recognized the potential hanging issue I 
pointed out, and the patch (as well as the relevant discussion) is very helpful 
for developers and users.
   
   > Ok, is the purpose of the change is to improve performance of the 
FSEditLogAsync.java by executing sendResponse() in parallel?
   > In that case, please change the title of the Jira and the description to 
remove references to "hanging" problems.
   
   My position is that we should not remove the references to "hanging" 
problems.
   
   In short, the discussion above can be summarized in 3 arguments:
   1. https://github.com/apache/hadoop/pull/2737#issuecomment-822151838: this 
is a classic Producer-Consumer problem, and it is natural to improve 
performance through parallelism.
   2. https://github.com/apache/hadoop/pull/2737#issuecomment-822591028: 
`call.sendResponse()` may hang due to a network issue, without throwing any 
exception.
   3. https://github.com/apache/hadoop/pull/2737#issuecomment-822617097: 
   a) the `FSEditLogAsync` thread (without this patch) directly invokes the 
network I/O call `call.sendResponse()`, so if this invocation hangs, the 
`FSEditLogAsync` thread also hangs.
   b) in a "correct" system design, such a hang in network I/O should be 
acceptable, because HDFS (as a fault-tolerant system) should tolerate it.
   c) while the system tolerates this network issue, the `FSEditLogAsync` 
thread should not hang; otherwise nobody can commit to the log.
   d) the expected behavior is that the `FSEditLogAsync` thread continues 
despite the network issue, so that everything else still works.
   
   Both Argument 3 and Argument 1 can be resolved with this patch.
   
   In conclusion, this patch not only improves the performance, but also 
enhances the availability & fault-tolerance.
   
   So, I think the references to "hanging" problems should not be removed.
   
   P.S. I will summarize our discussion with a comment in Jira after we reach a 
consensus.




Issue Time Tracking
---

Worklog Id: (was: 585328)
Time Spent: 5h 20m  (was: 5h 10m)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=585277=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585277
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 17:36
Start Date: 19/Apr/21 17:36
Worklog Time Spent: 10m 
  Work Description: functioner commented on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-822649368


   > Thanks @functioner
   > The detailed discussions (except the lambda argument) should have been on 
the Jira.
   
   I see. I will post a summary comment in Jira once the discussion in this PR 
is finalized.
   
   > > > If any concerns about lambda expression, we could improve it rather 
than reject it directly.
   > > 
   > > 
   > > @amahussein A common way to eliminate such overhead is preparing 
multiple consumer threads, and feed them with requests.
   > > If the lambda expressions cause significant overhead, we can improve in 
that way.
   > > This design pattern is widely used in Cassandra. Example: SEPWorker - 
SEPExecutor
   > > 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPWorker.java
   > > 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java
   > 
   > This is not what I meant. It is recommended to avoid use of lambda 
expressions in hot execution paths.
   
   Actually, we are on the same page. My earlier comment may have caused some 
confusion.
   
   > There are so many ways to avoid lambda expressions simply by having 
runnables waiting for tasks to be added to a queue.
   
   That's exactly what I meant. I will push a commit soon.
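
The "runnables waiting for tasks to be added to a queue" idea can be sketched as follows. This is an illustration only, not the commit pushed to this PR: `QueuedNotifySketch`, `Notification`, `NotifyWorker`, and `runProducer` are hypothetical names, and a latch that is never released stands in for a send that does not return. One long-lived `Runnable` drains a `BlockingQueue`, so no lambda or task object beyond the queued item is created per edit, and the producer loop (the stand-in for the edit log thread) never waits for delivery.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class QueuedNotifySketch {
    /** Hypothetical stand-in for an edit awaiting its post-sync notification. */
    static final class Notification {
        final CountDownLatch blockOn;              // null means "delivers immediately"
        Notification(CountDownLatch blockOn) { this.blockOn = blockOn; }
    }

    /** Long-lived consumer: a single Runnable instance takes items off the queue. */
    static final class NotifyWorker implements Runnable {
        private final BlockingQueue<Notification> queue;
        final AtomicInteger delivered = new AtomicInteger();
        NotifyWorker(BlockingQueue<Notification> queue) { this.queue = queue; }

        @Override
        public void run() {
            try {
                while (true) {
                    Notification n = queue.take(); // waits for work
                    if (n.blockOn != null) {
                        n.blockOn.await();         // simulated stuck send
                    }
                    delivered.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutdown signal
            }
        }
    }

    /** Syncs each edit and enqueues its notification; returns the number of syncs completed. */
    static int runProducer(int edits, int stuckAt) {
        BlockingQueue<Notification> queue = new LinkedBlockingQueue<>();
        CountDownLatch never = new CountDownLatch(1); // intentionally never counted down
        Thread worker = new Thread(new NotifyWorker(queue), "toy-notify-worker");
        worker.setDaemon(true);
        worker.start();
        int synced = 0;
        for (int i = 0; i < edits; i++) {
            synced++;                              // stand-in for logSync
            queue.add(new Notification(i == stuckAt ? never : null));
        }
        worker.interrupt();                        // stop the toy worker
        return synced;
    }

    public static void main(String[] args) {
        System.out.println(runProducer(10, 3) + " of 10 syncs completed");
    }
}
```

Two trade-offs are visible even in this toy: with a single worker, one stuck send still delays every notification queued behind it, and an unbounded queue grows while the worker is stuck. Multiple workers and a bounded queue would address these at the cost of more moving parts.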




Issue Time Tracking
---

Worklog Id: (was: 585277)
Time Spent: 5h 10m  (was: 5h)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=585274=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585274
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 17:23
Start Date: 19/Apr/21 17:23
Worklog Time Spent: 10m 
  Work Description: amahussein edited a comment on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-822639062


   Thanks @functioner 
   The detailed discussions (except the lambda argument) should have been on 
the Jira.
   
   > IMO, this is a classic Producer-Consumer problem, and it is natural idea 
to improve performance using parallel way. 
   
   > So, call.sendResponse() (network service) affects FSEditLogAsync (edit log 
sync service). So, I would say it's a bug.
   
   Now, I am really even more confused about the (Bug Vs. Improvement). So, I 
am going to pass on reviewing.
   
   > > If any concerns about lambda expression, we could improve it rather than 
reject it directly.
   > 
   > @amahussein A common way to eliminate such overhead is preparing multiple 
consumer threads, and feed them with requests.
   > If the lambda expressions cause significant overhead, we can improve in 
that way.
   > This design pattern is widely used in Cassandra. Example: SEPWorker - 
SEPExecutor
   > 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPWorker.java
   > 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java
   
   This is not what I meant. It is recommended to avoid use of lambda 
expressions in hot execution paths.
   There are so many ways to avoid lambda expressions simply by having 
runnables waiting for tasks to be added to a queue. 
   
   
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 585274)
Time Spent: 5h  (was: 4h 50m)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
>     We were doing some testing of the latest Hadoop stable release 3.2.2 and 
> found that some network issues can cause the namenode to hang even with async 
> edit logging (FSEditLogAsync).
>     The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
>     try {
>       while (true) {
>         boolean doSync;
>         Edit edit = dequeueEdit(); // line 229
>         if (edit != null) {
>           // sync if requested by edit log.
>           doSync = edit.logEdit(); // line 232
>           syncWaitQ.add(edit);
>         } else {
>           // sync when editq runs dry, but have edits pending a sync.
>           doSync = !syncWaitQ.isEmpty();
>         }
>         if (doSync) {
>           // normally edit log exceptions cause the NN to terminate, but tests
>           // relying on ExitUtil.terminate need to see the exception.
>           RuntimeException syncEx = null;
>           try {
>             logSync(getLastWrittenTxId()); // line 243
>           } catch (RuntimeException ex) {
>             syncEx = ex;
>           }
>           while ((edit = syncWaitQ.poll()) != null) {
>             edit.logSyncNotify(syncEx); // line 248
>           }
>         }
>       }
>     } catch (InterruptedException ie) {
>       LOG.info(Thread.currentThread().getName() + " was interrupted, exiting");
>     } catch (Throwable t) {
>       terminate(t);
>     }
>   }
> {code}
>     In step 4, FSEditLogAsync$RpcEdit.logSyncNotify essentially performs a 
> network write (line 365).
> {code:java}
> 

[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=585271=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585271
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 17:20
Start Date: 19/Apr/21 17:20
Worklog Time Spent: 10m 
  Work Description: amahussein commented on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-822639062


   Thanks @functioner 
   The detailed discussions (except the lambda argument) should have been on 
the Jira.
   Now I am even more confused about whether this is a bug or an improvement, 
so I am going to pass on reviewing.
   
   > > If any concerns about lambda expression, we could improve it rather than 
reject it directly.
   > 
   > @amahussein A common way to eliminate such overhead is preparing multiple 
consumer threads, and feed them with requests.
   > If the lambda expressions cause significant overhead, we can improve in 
that way.
   > This design pattern is widely used in Cassandra. Example: SEPWorker - 
SEPExecutor
   > 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPWorker.java
   > 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java
   
   This is not what I meant. It is recommended to avoid lambda expressions in 
hot execution paths.
   There are many ways to avoid lambda expressions, for example by having 
runnables that wait for tasks to be added to a queue.
   
   
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 585271)
Time Spent: 4h 50m  (was: 4h 40m)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
>     We were doing some testing of the latest Hadoop stable release 3.2.2 and 
> found that some network issues can cause the namenode to hang even with async 
> edit logging (FSEditLogAsync).
>     The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
>     try {
>       while (true) {
>         boolean doSync;
>         Edit edit = dequeueEdit(); // line 229
>         if (edit != null) {
>           // sync if requested by edit log.
>           doSync = edit.logEdit(); // line 232
>           syncWaitQ.add(edit);
>         } else {
>           // sync when editq runs dry, but have edits pending a sync.
>           doSync = !syncWaitQ.isEmpty();
>         }
>         if (doSync) {
>           // normally edit log exceptions cause the NN to terminate, but tests
>           // relying on ExitUtil.terminate need to see the exception.
>           RuntimeException syncEx = null;
>           try {
>             logSync(getLastWrittenTxId()); // line 243
>           } catch (RuntimeException ex) {
>             syncEx = ex;
>           }
>           while ((edit = syncWaitQ.poll()) != null) {
>             edit.logSyncNotify(syncEx); // line 248
>           }
>         }
>       }
>     } catch (InterruptedException ie) {
>       LOG.info(Thread.currentThread().getName() + " was interrupted, exiting");
>     } catch (Throwable t) {
>       terminate(t);
>     }
>   }
> {code}
>     In step 4, FSEditLogAsync$RpcEdit.logSyncNotify essentially performs a 
> network write (line 365).
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   private static class RpcEdit extends Edit {
>     // ...
>     @Override
>     public void logSyncNotify(RuntimeException syncEx) {
>       try {
>         if (syncEx == null) {
>   

[jira] [Updated] (HDFS-15561) RBF: Fix NullPointException when start dfsrouter

2021-04-19 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li updated HDFS-15561:
--
Summary: RBF: Fix NullPointException when start dfsrouter  (was: Fix 
NullPointException when start dfsrouter)

> RBF: Fix NullPointException when start dfsrouter
> 
>
> Key: HDFS-15561
> URL: https://issues.apache.org/jira/browse/HDFS-15561
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.3.0
>Reporter: Xie Lei
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When starting dfsrouter, it throws an NPE:
> {code:java}
> 2020-09-08 19:41:14,989 ERROR org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService: Unexpected exception while communicating with null:null: java.net.UnknownHostException: null
> java.lang.IllegalArgumentException: java.net.UnknownHostException: null
>   at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:447)
>   at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:171)
>   at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:123)
>   at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:95)
>   at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.getNamenodeStatusReport(NamenodeHeartbeatService.java:248)
>   at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:205)
>   at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
>   at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>   at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
>   at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>   at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:300)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
>   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
>   at java.base/java.lang.Thread.run(Thread.java:844)
> Caused by: java.net.UnknownHostException: null
>   ... 14 more
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=585241=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585241
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 16:46
Start Date: 19/Apr/21 16:46
Worklog Time Spent: 10m 
  Work Description: functioner commented on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-822617097


   > > In that case, please change the title of the Jira and the description to 
remove references to "hanging" problems.
   > 
   > @amahussein I still would like to argue about this "hanging" issue.
   
   Another aspect of the argument is the design of availability and fault 
tolerance. Distributed systems can tolerate such hanging issues in many 
scenarios, but sometimes a hang is treated as a bug, as in 
[ZOOKEEPER-2201](https://issues.apache.org/jira/browse/ZOOKEEPER-2201).
   
   So an important question is: when is it a bug, and when is it not (i.e., 
it's a feature)?
   
   I've been doing research on fault injection for some time and I have 
submitted multiple bug reports accepted by the open source community (e.g., 
[HADOOP-17552](https://issues.apache.org/jira/browse/HADOOP-17552)). My 
criteria for determining whether it is a bug are:
   1. If we inject a fault in **module X** and it only affects **module X**, 
then it's not a bug.
   2. If we inject a fault in **module X** and it affects not only **module X** 
but also **module Y**, which should not be related to **module X**, then it is 
probably a bug, because in the system design each module should be responsible 
for itself and report the problem (e.g., by logging) rather than affect 
another, unrelated module.
   
   In our scenario 
([HDFS-15869](https://issues.apache.org/jira/browse/HDFS-15869)), this possible 
hanging (_if you agree with my argument of network hanging_) can block the 
`FSEditLogAsync` thread, because `call.sendResponse()` is currently invoked by 
the `FSEditLogAsync` thread.
   
   So `call.sendResponse()` (network service) affects `FSEditLogAsync` (edit 
log sync service), and I would say it's a bug.
   
   The network service should be responsible for all its behaviors, and handle 
all the possible network issues (e.g., IOException, disconnection, hanging). It 
should determine how to handle them, e.g., by logging the error, rather than 
affecting other services like `FSEditLogAsync`.
   
   I'm not saying that we have to use a complete and slow RPC framework for 
this network service. But IMO, decoupling it from `FSEditLogAsync` by 
delegating to a thread pool is at least a better design.
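
The decoupling argued for above can be sketched roughly as follows. This is not the patch in the pull request: `DecoupledNotifySketch`, `Call`, `NotifyTask`, and `StuckCall` are hypothetical stand-ins, and the counter merely stands in for `logSync`. The point it illustrates is that the thread doing the sync hands the response off to a pool and returns, so a response that hangs does not stop later syncs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class DecoupledNotifySketch {
  /** Hypothetical stand-in for an RPC call whose response may block. */
  interface Call {
    void sendResponse() throws Exception;
  }

  /** Simulates a response that hangs until the latch is released. */
  static final class StuckCall implements Call {
    private final CountDownLatch release;

    StuckCall(CountDownLatch release) {
      this.release = release;
    }

    @Override
    public void sendResponse() throws Exception {
      release.await();
    }
  }

  /** Named Runnable (no lambda) that performs the possibly slow send. */
  static final class NotifyTask implements Runnable {
    private final Call call;

    NotifyTask(Call call) {
      this.call = call;
    }

    @Override
    public void run() {
      try {
        call.sendResponse();
      } catch (Exception e) {
        // As in the quoted code: don't care if not sent.
      }
    }
  }

  private final ExecutorService responders;
  private final AtomicInteger syncs = new AtomicInteger();

  DecoupledNotifySketch(int threads) {
    this.responders = Executors.newFixedThreadPool(threads);
  }

  /** One simplified loop iteration: sync, then hand off the notification. */
  void syncAndNotify(Call call) {
    syncs.incrementAndGet();                  // stands in for logSync(...)
    responders.execute(new NotifyTask(call)); // returns without waiting
  }

  int syncCount() {
    return syncs.get();
  }

  /** Stops the pool; returns true if all pending sends finished in time. */
  boolean shutdown() {
    responders.shutdown();
    try {
      return responders.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Two caveats apply to this sketch: with more than one pool thread the responses may be sent in a different order than the edits were synced, and the pool's unbounded queue can grow if sends keep hanging, so a real design would need to decide on ordering and back-pressure.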




Issue Time Tracking
---

Worklog Id: (was: 585241)
Time Spent: 4h 40m  (was: 4.5h)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
>     We were doing some testing of the latest Hadoop stable release 3.2.2 and 
> found that some network issues can cause the namenode to hang even with async 
> edit logging (FSEditLogAsync).
>     The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
>     try {
>       while (true) {
>         boolean doSync;
>         Edit edit = dequeueEdit(); // line 229
>         if (edit != null) {
>           // sync if requested by edit log.
>           doSync = edit.logEdit(); // line 232
>           syncWaitQ.add(edit);
>         } else {
>           // sync when editq runs dry, but have edits pending a sync.
>           doSync = !syncWaitQ.isEmpty();
>         }
>         if (doSync) {
>           // normally edit log exceptions cause the NN to terminate, but tests
>           // relying on ExitUtil.terminate need 

[jira] [Work logged] (HDFS-15810) RBF: RBFMetrics's TotalCapacity out of bounds

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15810?focusedWorklogId=585234=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585234
 ]

ASF GitHub Bot logged work on HDFS-15810:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 16:35
Start Date: 19/Apr/21 16:35
Worklog Time Spent: 10m 
  Work Description: goiri commented on a change in pull request #2910:
URL: https://github.com/apache/hadoop/pull/2910#discussion_r616004081



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/metrics/TestRBFMetrics.java
##
@@ -18,10 +18,7 @@
 package org.apache.hadoop.hdfs.server.federation.metrics;
 
 import static 
org.apache.hadoop.hdfs.server.federation.FederationTestUtils.getBean;
-import static org.junit.Assert.assertEquals;

Review comment:
   Keep extended.

##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/metrics/TestRBFMetrics.java
##
@@ -351,21 +349,32 @@ private void validateClusterStatsRouterBean(RouterMBean 
bean) {
 assertFalse(bean.isSecurityEnabled());
   }
 
-  private void testCapacity(FederationMBean bean) {
+  private void testCapacity(FederationMBean bean) throws  IOException {

Review comment:
   Too many spaces.






Issue Time Tracking
---

Worklog Id: (was: 585234)
Time Spent: 1h 40m  (was: 1.5h)

> RBF: RBFMetrics's TotalCapacity out of bounds
> -
>
> Key: HDFS-15810
> URL: https://issues.apache.org/jira/browse/HDFS-15810
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiaoxing Wei
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-02-02-10-59-17-113.png
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The Long type fields TotalCapacity, UsedCapacity and RemainingCapacity in 
> RBFMetrics may be out of bounds.
> !image-2021-02-02-10-59-17-113.png!
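
The kind of overflow described in this issue can be reproduced in isolation. The sketch below is illustrative only and is not the RBFMetrics code: `CapacitySumSketch` and its methods are hypothetical names, and it simply contrasts summing capacities in a `long`, which wraps silently, with summing them in a `BigInteger`, which does not.

```java
import java.math.BigInteger;

class CapacitySumSketch {
  /** Sums capacities in a long; the result wraps around on overflow. */
  static long sumAsLong(long[] capacities) {
    long total = 0;
    for (long c : capacities) {
      total += c;
    }
    return total;
  }

  /** Sums capacities with arbitrary precision, so no overflow occurs. */
  static BigInteger sumAsBigInteger(long[] capacities) {
    BigInteger total = BigInteger.ZERO;
    for (long c : capacities) {
      total = total.add(BigInteger.valueOf(c));
    }
    return total;
  }

  public static void main(String[] args) {
    long[] capacities = {Long.MAX_VALUE, 1L};
    System.out.println("long:       " + sumAsLong(capacities));
    System.out.println("BigInteger: " + sumAsBigInteger(capacities));
  }
}
```

If failing fast is preferable to a wider type, `Math.addExact` throws an `ArithmeticException` on overflow instead of wrapping.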






[jira] [Updated] (HDFS-15878) RBF: Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-04-19 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-15878:
---
Summary: RBF: Flaky test 
TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
Trunk  (was: Flaky test 
TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
Trunk)

> RBF: Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> 
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major
>
> [ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 24.627 s <<< FAILURE! - in org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate
> [ERROR] testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate)  Time elapsed: 0.222 s  <<< ERROR!
> java.io.FileNotFoundException: File /test/testSyncable not found.
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>   at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.<init>(WebHdfsFileSystem.java:2296)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.<init>(WebHdfsFileSystem.java:2176)
>   at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975)
>   at org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556)
>   at org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: 
> 

[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=585222=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585222
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 16:09
Start Date: 19/Apr/21 16:09
Worklog Time Spent: 10m 
  Work Description: functioner commented on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-822591028


   > In that case, please change the title of the Jira and the description to 
remove references to "hanging" problems.
   
   @amahussein I still would like to argue about this "hanging" issue.
   
   There have been reports of TCP network I/O hanging for more than 15 minutes 
without throwing any exception. 
[ZOOKEEPER-2201](https://issues.apache.org/jira/browse/ZOOKEEPER-2201) is a 
perfect example, and you can find the TCP-level explanation for this hanging 
issue in https://www.usenix.org/conference/srecon16/program/presentation/nadolny
   Similar hanging bugs have also been accepted by the ZooKeeper community, 
such as:
   - [ZOOKEEPER-3531](https://issues.apache.org/jira/browse/ZOOKEEPER-3531): 
very similar to ZK-2201; the patch is merged
   - [ZOOKEEPER-4074](https://issues.apache.org/jira/browse/ZOOKEEPER-4074): a 
similar network hanging bug I reported; already confirmed by the community; 
more discussion can be found in https://github.com/apache/zookeeper/pull/1582
   
   However, in our scenario 
([HDFS-15869](https://issues.apache.org/jira/browse/HDFS-15869)), a possible 
counterargument is that the `call.sendResponse()` invocation eventually invokes 
`channel.write(buffer)` (line 3611) in non-blocking mode, so it might 
not be affected by this potential issue.
   
https://github.com/apache/hadoop/blob/3c57512d104e3a92391c9a03ce4005a00267c07f/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java#L3607-L3616
   However, as we point out in 
[HDFS-15869](https://issues.apache.org/jira/browse/HDFS-15869), when the 
payload is huge, line 3611 does not invoke `channel.write(buffer)`; 
instead, it invokes `channelIO(null, channel, buffer)`, which brings us to:
   
https://github.com/apache/hadoop/blob/3c57512d104e3a92391c9a03ce4005a00267c07f/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java#L3646-L3672
   If the payload is split into two batches, the second batch has to wait 
for the first batch to be sent out, which may encounter a high packet loss 
rate and thus slow I/O.
   
   Hence, I would say the hanging problem still exists.
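
The batching behavior described above can be illustrated with a simplified, self-contained sketch. It is not the `Server.java` source linked in the comment: `ChunkedWriteSketch`, `writeInChunks`, `demo`, and the 8 KiB `CHUNK_LIMIT` are assumptions chosen for illustration. Small payloads go out in a single `write` call, while larger ones are written chunk by chunk, so each chunk is attempted only after the previous one has been accepted by the channel.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

class ChunkedWriteSketch {
  /** Hypothetical chunk size, assumed here for illustration only. */
  static final int CHUNK_LIMIT = 8 * 1024;

  /** Small buffers use one write call; larger ones are chunked. */
  static int write(WritableByteChannel channel, ByteBuffer buffer)
      throws IOException {
    return buffer.remaining() <= CHUNK_LIMIT
        ? channel.write(buffer)
        : writeInChunks(channel, buffer);
  }

  /** Writes at most CHUNK_LIMIT bytes per call; returns bytes written. */
  static int writeInChunks(WritableByteChannel channel, ByteBuffer buffer)
      throws IOException {
    int originalLimit = buffer.limit();
    int initialRemaining = buffer.remaining();
    while (buffer.remaining() > 0) {
      int chunkSize = Math.min(buffer.remaining(), CHUNK_LIMIT);
      int written;
      try {
        // Temporarily shrink the limit so only one chunk is visible.
        buffer.limit(buffer.position() + chunkSize);
        written = channel.write(buffer);
      } finally {
        buffer.limit(originalLimit);
      }
      if (written < chunkSize) {
        break; // the channel took less than a full chunk; stop for now
      }
    }
    return initialRemaining - buffer.remaining();
  }

  /** Writes a zero-filled payload to an in-memory sink; returns bytes written. */
  static int demo(int payloadSize) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (WritableByteChannel channel = Channels.newChannel(sink)) {
      return write(channel, ByteBuffer.allocate(payloadSize));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("bytes written: " + demo(20_000));
  }
}
```

With an in-memory sink every chunk completes immediately; over a real socket, how long each chunk takes depends on the channel's blocking mode and on network conditions, which is the crux of the disagreement in this thread.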




Issue Time Tracking
---

Worklog Id: (was: 585222)
Time Spent: 4.5h  (was: 4h 20m)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
>     We were doing some testing of the latest Hadoop stable release 3.2.2 and 
> found that some network issues can cause the namenode to hang even with async 
> edit logging (FSEditLogAsync).
>     The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
>     try {
>       while (true) {
>         boolean doSync;
>         Edit edit = dequeueEdit(); // line 229
>         if (edit != null) {
>           // sync if requested by edit log.
>           doSync = edit.logEdit(); // line 232
>           syncWaitQ.add(edit);
>         } else {
>           // sync when editq runs dry, but have edits pending a sync.
>           doSync = !syncWaitQ.isEmpty();
>         }
>         if (doSync) {
>           // normally edit log exceptions cause the NN to terminate, but tests
>           // relying on ExitUtil.terminate need to see the exception.
>           RuntimeException syncEx = null;
>           try {
>  

[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=585193=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585193
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 15:34
Start Date: 19/Apr/21 15:34
Worklog Time Spent: 10m 
  Work Description: functioner commented on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-822564254


   > If any concerns about lambda expression, we could improve it rather than 
reject it directly.
   
   @amahussein A common way to eliminate such overhead is preparing multiple 
consumer threads, and feed them with requests.
   If the lambda expressions cause significant overhead, we can improve in that 
way.
   This design pattern is widely used in Cassandra. Example: SEPWorker - 
SEPExecutor
   
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPWorker.java
   
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java




Issue Time Tracking
---

Worklog Id: (was: 585193)
Time Spent: 4h 20m  (was: 4h 10m)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
>     We were doing some testing of the latest Hadoop stable release 3.2.2 and 
> found that some network issues can cause the namenode to hang even with async 
> edit logging (FSEditLogAsync).
>     The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
>     try {
>       while (true) {
>         boolean doSync;
>         Edit edit = dequeueEdit(); // line 229
>         if (edit != null) {
>           // sync if requested by edit log.
>           doSync = edit.logEdit(); // line 232
>           syncWaitQ.add(edit);
>         } else {
>           // sync when editq runs dry, but have edits pending a sync.
>           doSync = !syncWaitQ.isEmpty();
>         }
>         if (doSync) {
>           // normally edit log exceptions cause the NN to terminate, but tests
>           // relying on ExitUtil.terminate need to see the exception.
>           RuntimeException syncEx = null;
>           try {
>             logSync(getLastWrittenTxId()); // line 243
>           } catch (RuntimeException ex) {
>             syncEx = ex;
>           }
>           while ((edit = syncWaitQ.poll()) != null) {
>             edit.logSyncNotify(syncEx); // line 248
>           }
>         }
>       }
>     } catch (InterruptedException ie) {
>       LOG.info(Thread.currentThread().getName() + " was interrupted, exiting");
>     } catch (Throwable t) {
>       terminate(t);
>     }
>   }
> {code}
>     In step 4, FSEditLogAsync$RpcEdit.logSyncNotify essentially performs a 
> network write (line 365).
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   private static class RpcEdit extends Edit {
>     // ...
>     @Override
>     public void logSyncNotify(RuntimeException syncEx) {
>       try {
>         if (syncEx == null) {
>           call.sendResponse(); // line 365
>         } else {
>           call.abortResponse(syncEx);
>         }
>       } catch (Exception e) {} // don't care if not sent.
>     }
>     // ...
>   }{code}
>     If the sendResponse operation on line 365 gets stuck, the whole 
> FSEditLogAsync thread cannot proceed. In that case, the critical logSync 
> (line 243) cannot be executed for incoming transactions, and the namenode 
> hangs. This is undesirable 

[jira] [Work logged] (HDFS-15991) Add location into datanode info for NameNodeMXBean

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15991?focusedWorklogId=585186=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585186
 ]

ASF GitHub Bot logged work on HDFS-15991:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 15:04
Start Date: 19/Apr/21 15:04
Worklog Time Spent: 10m 
  Work Description: tomscut opened a new pull request #2933:
URL: https://github.com/apache/hadoop/pull/2933


   JIRA: [HDFS-15991](https://issues.apache.org/jira/browse/HDFS-15991)




Issue Time Tracking
---

Worklog Id: (was: 585186)
Remaining Estimate: 0h
Time Spent: 10m

> Add location into datanode info for NameNodeMXBean
> --
>
> Key: HDFS-15991
> URL: https://issues.apache.org/jira/browse/HDFS-15991
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add location into datanode info for NameNodeMXBean.






[jira] [Updated] (HDFS-15991) Add location into datanode info for NameNodeMXBean

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-15991:
--
Labels: pull-request-available  (was: )

> Add location into datanode info for NameNodeMXBean
> --
>
> Key: HDFS-15991
> URL: https://issues.apache.org/jira/browse/HDFS-15991
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add location into datanode info for NameNodeMXBean.






[jira] [Created] (HDFS-15991) Add location into datanode info for NameNodeMXBean

2021-04-19 Thread tomscut (Jira)
tomscut created HDFS-15991:
--

 Summary: Add location into datanode info for NameNodeMXBean
 Key: HDFS-15991
 URL: https://issues.apache.org/jira/browse/HDFS-15991
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut


Add location into datanode info for NameNodeMXBean.






[jira] [Work logged] (HDFS-15979) Move within EZ fails and cannot remove nested EZs

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15979?focusedWorklogId=585177=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585177
 ]

ASF GitHub Bot logged work on HDFS-15979:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 14:46
Start Date: 19/Apr/21 14:46
Worklog Time Spent: 10m 
  Work Description: amahussein commented on pull request #2919:
URL: https://github.com/apache/hadoop/pull/2919#issuecomment-822525000


   The failures are not related. Those were the intermittent failures reported 
in the daily reports and existing jiras are addressing them.
   
   > The changes were contributed by Daryn Sharp and we have our internal 
clusters running on those changes with hadoop-2.8 and hadoop-2.10.
   
   @jojochuang Do you have any feedback on those changes?
   
   




Issue Time Tracking
---

Worklog Id: (was: 585177)
Time Spent: 0.5h  (was: 20m)

> Move within EZ fails and cannot remove nested EZs
> -
>
> Key: HDFS-15979
> URL: https://issues.apache.org/jira/browse/HDFS-15979
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15979.001.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Moving between EZ directories should work fine if the EZ key for the 
> directories is identical. If the key name is identical then no 
> decrypt/re-encrypt is necessary.
> However, the rename operation checks more than the key name. It compares the 
> inode number (unique identifier) of the source and dest dirs which will never 
> be the same for 2 dirs resulting in the cited failure. Note it also 
> incorrectly compares the key version.
> A related issue arises when an EZ and one of its ancestors share the same key 
> (i.e. /projects/foo and /projects/foo/bar/blah both use the same key): files 
> cannot be moved from the child to a parent dir, and the child EZ cannot be 
> removed even though it is now covered by the ancestor.
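The difference between the two comparisons described above can be sketched in a few lines. This is only an illustration under stated assumptions: `Zone`, `EzRenameCheck`, `strictMatch`, and `sameKeyName` are hypothetical names, not the HDFS encryption zone classes or the code in the attached patch.

```java
import java.util.Objects;

final class EzRenameCheck {
  /** Over-strict comparison as described in the issue: inode id and key version included. */
  static boolean strictMatch(Zone src, Zone dst) {
    return src.inodeId == dst.inodeId
        && Objects.equals(src.keyName, dst.keyName)
        && Objects.equals(src.keyVersion, dst.keyVersion);
  }

  /** Relaxed comparison: only the key name decides whether the move is allowed. */
  static boolean sameKeyName(Zone src, Zone dst) {
    return Objects.equals(src.keyName, dst.keyName);
  }

  public static void main(String[] args) {
    // Two different directories (different inode ids) that use the same key name.
    Zone parent = new Zone(1001L, "projectKey", "v0");
    Zone child = new Zone(2002L, "projectKey", "v1");
    System.out.println("strict:  " + strictMatch(parent, child));
    System.out.println("by name: " + sameKeyName(parent, child));
  }
}

/** Hypothetical stand-in for an encryption zone; not the HDFS EncryptionZone class. */
final class Zone {
  final long inodeId;      // unique per directory, so two different dirs never match
  final String keyName;    // name of the EZ key
  final String keyVersion; // version field the description says should not be compared

  Zone(long inodeId, String keyName, String keyVersion) {
    this.inodeId = inodeId;
    this.keyName = keyName;
    this.keyVersion = keyVersion;
  }
}
```

Because the strict form includes the inode id, it can never succeed for two distinct directories, which matches the failure the description reports.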






[jira] [Work logged] (HDFS-15989) Split TestBalancer into two classes

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15989?focusedWorklogId=585174=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585174
 ]

ASF GitHub Bot logged work on HDFS-15989:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 14:36
Start Date: 19/Apr/21 14:36
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on pull request #2923:
URL: https://github.com/apache/hadoop/pull/2923#issuecomment-822517515


   Thanks for updating it. +1, pending Jenkins.




Issue Time Tracking
---

Worklog Id: (was: 585174)
Time Spent: 3h 20m  (was: 3h 10m)

> Split TestBalancer into two classes
> ---
>
> Key: HDFS-15989
> URL: https://issues.apache.org/jira/browse/HDFS-15989
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> TestBalancer has accumulated many tests; it would be good to split it into 
> two classes. Moreover, TestBalancer#testMaxIterationTime is flaky. We should 
> also resolve that in this Jira.






[jira] [Work logged] (HDFS-15970) Print network topology on the web

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15970?focusedWorklogId=585172=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585172
 ]

ASF GitHub Bot logged work on HDFS-15970:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 14:35
Start Date: 19/Apr/21 14:35
Worklog Time Spent: 10m 
  Work Description: tomscut commented on pull request #2896:
URL: https://github.com/apache/hadoop/pull/2896#issuecomment-822516400


   Thanks a lot. @tasanuma @goiri @ayushtkn 




Issue Time Tracking
---

Worklog Id: (was: 585172)
Time Spent: 3h 20m  (was: 3h 10m)

> Print network topology on the web
> -
>
> Key: HDFS-15970
> URL: https://issues.apache.org/jira/browse/HDFS-15970
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
> Attachments: hdfs-topology-json.jpg, hdfs-topology.jpg, hdfs-web.jpg
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> In order to query the network topology information conveniently, we can print 
> it on the web.






[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=585171=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585171
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 14:27
Start Date: 19/Apr/21 14:27
Worklog Time Spent: 10m 
  Work Description: amahussein commented on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-822510445


   > @amahussein Thanks for your quick response. I think 
[HDFS-15957](https://issues.apache.org/jira/browse/HDFS-15957) and 
[HDFS-15869](https://issues.apache.org/jira/browse/HDFS-15869) are not the 
same concept/issue. I have left a comment at 
[HDFS-15869](https://issues.apache.org/jira/browse/HDFS-15869) suggesting 
that it be changed to `improvement` rather than `bug fix`. (cc @functioner)
   > IMO, this is a classic Producer-Consumer problem, and it is a natural idea 
to improve performance through parallelism. Yiqun has reported the same 
issue at [HDFS-15486](https://issues.apache.org/jira/browse/HDFS-15486). In my 
own production environment it saved about 5% of end-to-end time for write 
operations.
   > My suggestions:
   > A. Update the description to describe an improvement rather than a bug fix.
   > B. If there are any concerns about the lambda expression, we could improve 
it rather than reject it outright.
   > Any further discussion is welcome. Thanks, everyone.
   
   Thanks @Hexiaoqiao for the comment.
   OK, is the purpose of the change to improve the performance of 
`FSEditLogAsync.java` by executing `sendResponse()` in parallel?
   In that case, please change the title of the Jira and the description to 
remove references to "hanging" problems.
   Then I will take another look. I am sorry for the inconvenience; I want to 
make sure I understand the purpose of the change before reviewing.
   
   
   > @amahussein Thanks for the comment.
   > Can I send an email to you to explain more about the issue? @Hexiaoqiao 
and I have some more discussion on it, and some discussion is inconvenient to 
put in here. You can contact me via 
[oier...@gmail.com](mailto:oier...@gmail.com) or 
[ha...@jhu.edu](mailto:ha...@jhu.edu) and then I will reply.
   
   Thanks @functioner ! I really appreciate that.
   I think @Hexiaoqiao's reply already clarified some of the confusion 
about the scope of the work.
   Please feel free to reach me through email at any time. I am on the 
common-dev mailing list.
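
To make the producer-consumer framing in this thread concrete, here is a minimal sketch, not the code from the pull request. A single consumer drains a queue of edits and hands the notification step to a small thread pool, so a slow notification does not stall the consumer loop. `AsyncNotifySketch`, `drain`, and the use of integer ids in place of edits are all assumptions made for illustration; none of them are part of `FSEditLogAsync`.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class AsyncNotifySketch {
  /** Drains all queued edit ids and returns the ids whose notification ran. */
  static List<Integer> drain(BlockingQueue<Integer> edits, int responders) {
    List<Integer> notified = new CopyOnWriteArrayList<>();
    ExecutorService pool = Executors.newFixedThreadPool(responders);
    try {
      Integer edit;
      // Consumer loop: logging and syncing would stay on this thread.
      while ((edit = edits.poll()) != null) {
        final int id = edit;
        // Hand the (possibly slow) notification to the pool instead of running it inline.
        pool.execute(() -> notified.add(id));
      }
      pool.shutdown();
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("responders did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for responders", e);
    } finally {
      pool.shutdownNow(); // no-op if the pool already terminated
    }
    return notified;
  }

  public static void main(String[] args) {
    BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(List.of(1, 2, 3));
    System.out.println("notified " + drain(queue, 2).size() + " edits");
  }
}
```

The order in which notifications complete is not guaranteed in this sketch; whether that is acceptable for real RPC responses is a question for the review, not something the sketch settles.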
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 585171)
Time Spent: 4h 10m  (was: 4h)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
>     We were doing some testing of the latest Hadoop stable release 3.2.2 and 
> found that some network issues can cause the namenode to hang even with async 
> edit logging (FSEditLogAsync).
>     The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
>     try {
>       while (true) {
>         boolean doSync;
>         Edit edit = dequeueEdit(); // line 229
>         if (edit != null) {
>           // sync if requested by edit log.
>           doSync = edit.logEdit(); // line 232
>           syncWaitQ.add(edit);
>         } else {
>           // sync when editq runs dry, but have edits pending a sync.
>           doSync = !syncWaitQ.isEmpty();
>         }
>         if (doSync) {
>           // normally edit log exceptions cause the NN to terminate, but tests
>           // relying on ExitUtil.terminate need to see the exception.
>           RuntimeException syncEx = null;
>           try {
>             logSync(getLastWrittenTxId()); // line 243
>  

[jira] [Commented] (HDFS-15973) RBF: Add permission check before doting router federation rename.

2021-04-19 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325077#comment-17325077
 ] 

Hadoop QA commented on HDFS-15973:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
43s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
47s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  3s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 16m 
49s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  1m 
10s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
33s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
33s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
29s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 40s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green}{color} | {color:green} the patch passed with 

[jira] [Work logged] (HDFS-15989) Split TestBalancer into two classes

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15989?focusedWorklogId=585164=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585164
 ]

ASF GitHub Bot logged work on HDFS-15989:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 14:20
Start Date: 19/Apr/21 14:20
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on a change in pull request #2923:
URL: https://github.com/apache/hadoop/pull/2923#discussion_r615884691



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer2.java
##
@@ -0,0 +1,744 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs.server.balancer;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.StorageType;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.hdfs.DFSClient;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
+import org.apache.hadoop.hdfs.DFSTestUtil;
+import org.apache.hadoop.hdfs.DFSUtil;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.HdfsConfiguration;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hdfs.NameNodeProxies;
+import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys;
+import org.apache.hadoop.hdfs.protocol.ClientProtocol;
+import org.apache.hadoop.hdfs.protocol.DatanodeID;
+import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
+import org.apache.hadoop.hdfs.protocol.HdfsConstants;
+import org.apache.hadoop.hdfs.protocol.LocatedBlock;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy;
+import 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithUpgradeDomain;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementStatus;
+import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager;
+import org.apache.hadoop.hdfs.server.datanode.DataNode;
+import org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils;
+import org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset;
+import 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.hadoop.test.GenericTestUtils;
+import org.junit.After;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.slf4j.event.Level;
+
+import java.io.OutputStream;
+import java.net.InetSocketAddress;
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import static org.apache.hadoop.fs.StorageType.DEFAULT;
+import static org.apache.hadoop.fs.StorageType.RAM_DISK;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BALANCER_MAX_SIZE_TO_MOVE_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BLOCK_SIZE_KEY;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_BLOCK_PINNING_ENABLED;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_LAZY_WRITER_INTERVAL_SEC;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_LAZY_PERSIST_FILE_SCRUB_INTERVAL_SEC;
+import static org.apache.hadoop.test.PlatformAssumptions.assumeNotWindows;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Some long running Balancer tasks.
+ */
+public class TestBalancer2 {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(TestBalancer2.class);
+
+  static {
+GenericTestUtils.setLogLevel(Balancer.LOG, Level.TRACE);
+GenericTestUtils.setLogLevel(Dispatcher.LOG, Level.DEBUG);
+  }
+
+  private final static long CAPACITY = 5000L;
+  private final static String RACK0 = "/rack0";
+  

[jira] [Work logged] (HDFS-15989) Split TestBalancer into two classes

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15989?focusedWorklogId=585143=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585143
 ]

ASF GitHub Bot logged work on HDFS-15989:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 14:04
Start Date: 19/Apr/21 14:04
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on pull request #2923:
URL: https://github.com/apache/hadoop/pull/2923#issuecomment-822493305


   We can remove `DEFAULT_RAM_DISK_BLOCK_SIZE` in TestBalancer.




Issue Time Tracking
---

Worklog Id: (was: 585143)
Time Spent: 3h  (was: 2h 50m)

> Split TestBalancer into two classes
> ---
>
> Key: HDFS-15989
> URL: https://issues.apache.org/jira/browse/HDFS-15989
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> TestBalancer has accumulated many tests; it would be good to split it into 
> two classes. Moreover, TestBalancer#testMaxIterationTime is flaky. We should 
> also resolve that in this Jira.






[jira] [Work logged] (HDFS-15989) Split TestBalancer into two classes

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15989?focusedWorklogId=585142=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585142
 ]

ASF GitHub Bot logged work on HDFS-15989:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 14:03
Start Date: 19/Apr/21 14:03
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on a change in pull request #2923:
URL: https://github.com/apache/hadoop/pull/2923#discussion_r615876497



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer2.java
##
@@ -0,0 +1,744 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs.server.balancer;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.StorageType;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.hdfs.DFSClient;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
+import org.apache.hadoop.hdfs.DFSTestUtil;
+import org.apache.hadoop.hdfs.DFSUtil;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.HdfsConfiguration;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hdfs.NameNodeProxies;
+import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys;
+import org.apache.hadoop.hdfs.protocol.ClientProtocol;
+import org.apache.hadoop.hdfs.protocol.DatanodeID;
+import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
+import org.apache.hadoop.hdfs.protocol.HdfsConstants;
+import org.apache.hadoop.hdfs.protocol.LocatedBlock;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy;
+import 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithUpgradeDomain;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementStatus;
+import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager;
+import org.apache.hadoop.hdfs.server.datanode.DataNode;
+import org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils;
+import org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset;
+import 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.hadoop.test.GenericTestUtils;
+import org.junit.After;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.slf4j.event.Level;
+
+import java.io.OutputStream;
+import java.net.InetSocketAddress;
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import static org.apache.hadoop.fs.StorageType.DEFAULT;
+import static org.apache.hadoop.fs.StorageType.RAM_DISK;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BALANCER_MAX_SIZE_TO_MOVE_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BLOCK_SIZE_KEY;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_BLOCK_PINNING_ENABLED;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_LAZY_WRITER_INTERVAL_SEC;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY;
+import static 
org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_LAZY_PERSIST_FILE_SCRUB_INTERVAL_SEC;
+import static org.apache.hadoop.test.PlatformAssumptions.assumeNotWindows;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Some long running Balancer tasks.
+ */
+public class TestBalancer2 {
+
+  private static final Logger LOG =
+  LoggerFactory.getLogger(TestBalancer2.class);
+
+  static {
+GenericTestUtils.setLogLevel(Balancer.LOG, Level.TRACE);
+GenericTestUtils.setLogLevel(Dispatcher.LOG, Level.DEBUG);
+  }
+
+  private final static long CAPACITY = 5000L;
+  private final static String RACK0 = "/rack0";
+  private 

[jira] [Commented] (HDFS-15990) No need to write to sub_section when serialize SnapshotDiff Section

2021-04-19 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325056#comment-17325056
 ] 

Stephen O'Donnell commented on HDFS-15990:
--

The reason the sub-sections are written here is so that we can add the ability to 
load them in parallel later (as with the inodes and directory sections). 
However, so far nobody has started that work.
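
As a rough illustration of why recording sub-section boundaries helps a later parallel loader, the sketch below computes the start index of each sub-section using the same kind of modulo check quoted in the issue description. `SubSectionSketch` and `boundaries` are hypothetical names; this is not the `FSImageFormatPBSnapshot` code and it writes nothing to an image file.

```java
import java.util.ArrayList;
import java.util.List;

final class SubSectionSketch {
  /** Returns the start index of each sub-section for {@code total} entries. */
  static List<Integer> boundaries(int total, int perSubSection) {
    if (perSubSection <= 0) {
      throw new IllegalArgumentException("perSubSection must be positive");
    }
    List<Integer> starts = new ArrayList<>();
    for (int i = 0; i < total; i++) {
      // A new sub-section begins every perSubSection entries.
      if (i % perSubSection == 0) {
        starts.add(i);
      }
    }
    return starts;
  }

  public static void main(String[] args) {
    // Ten entries in groups of four give sub-sections starting at 0, 4, and 8.
    System.out.println(boundaries(10, 4));
  }
}
```

With the start positions recorded, a loader could in principle hand each range to a separate worker, which is the future work the comment refers to.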

> No need to write to sub_section when serialize SnapshotDiff Section
> ---
>
> Key: HDFS-15990
> URL: https://issues.apache.org/jira/browse/HDFS-15990
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Xuze Yang
>Priority: Minor
>
> In FSImageFormatPBSnapshot.serializeSnapshotDiffSection(), the following code 
> exists:
> {code:java}
> if (i % parent.getInodesPerSubSection() == 0) {
>   parent.commitSubSection(headers,
>   FSImageFormatProtobuf.SectionName.SNAPSHOT_DIFF_SUB);
> }{code}
> It aims to serialize SnapshotDiff information into several sub_sections (i.e. 
> additional sub_sections information will be written to the FileSummary Section). 
> But FSImageFormatPBSnapshot.loadSnapshotDiffSection() treats the 
> SnapshotDiffSection as a whole, rather than as several sub_sections. So there 
> is no need to introduce sub_sections here.
>  
>  
>  






[jira] [Commented] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-04-19 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325054#comment-17325054
 ] 

Wei-Chiu Chuang commented on HDFS-15796:


[~Daniel Ma] ping. Please let us know more details. Meanwhile I updated target 
version to 3.4.0.

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  
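
The stack trace above ends in `ArrayList$Itr.next`, which is the fail-fast check that `ArrayList` iterators perform. The sketch below shows that mechanism in isolation, assuming nothing about the NameNode itself: the report does not identify what modified the list, and `CmeSketch` and its methods are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

final class CmeSketch {
  /** Returns true if modifying the list during iteration raised the exception. */
  static boolean triggersCme() {
    List<Integer> blocks = new ArrayList<>(List.of(1, 2, 3));
    try {
      for (Integer b : blocks) {
        if (b == 1) {
          blocks.add(4); // structural modification while an iterator is live
        }
      }
    } catch (ConcurrentModificationException e) {
      return true;
    }
    return false;
  }

  /** Iterates over a snapshot copy, so changes to the original list are safe. */
  static int sizeAfterSnapshotIteration() {
    List<Integer> blocks = new ArrayList<>(List.of(1, 2, 3));
    for (Integer b : new ArrayList<>(blocks)) {
      if (b == 1) {
        blocks.add(4);
      }
    }
    return blocks.size();
  }

  public static void main(String[] args) {
    System.out.println("live iteration throws: " + triggersCme());
    System.out.println("size after snapshot iteration: " + sizeAfterSnapshotIteration());
  }
}
```

Iterating over a copy is only one way to avoid the exception; whether it, or locking, is appropriate for the reported code path is not something this sketch can decide.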






[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-04-19 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15796:
---
Target Version/s: 3.4.0  (was: 3.3.1)

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  






[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2021-04-19 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15796:
---
Fix Version/s: (was: 3.1.1)

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Priority: Critical
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  






[jira] [Commented] (HDFS-15973) RBF: Add permission check before doting router federation rename.

2021-04-19 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325026#comment-17325026
 ] 

Jinglun commented on HDFS-15973:


Hi [~zhengzhuobinzzb], thanks for your comments! Security mode was not
considered in v03; thanks for the explanation. I have submitted v04.

> RBF: Add permission check before doting router federation rename.
> -
>
> Key: HDFS-15973
> URL: https://issues.apache.org/jira/browse/HDFS-15973
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15973.001.patch, HDFS-15973.002.patch, 
> HDFS-15973.003.patch, HDFS-15973.004.patch
>
>
> The router federation rename lacks a permission check. This is a security
> issue.






[jira] [Updated] (HDFS-15973) RBF: Add permission check before doting router federation rename.

2021-04-19 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-15973:
---
Attachment: HDFS-15973.004.patch

> RBF: Add permission check before doting router federation rename.
> -
>
> Key: HDFS-15973
> URL: https://issues.apache.org/jira/browse/HDFS-15973
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15973.001.patch, HDFS-15973.002.patch, 
> HDFS-15973.003.patch, HDFS-15973.004.patch
>
>
> The router federation rename lacks a permission check. This is a security
> issue.






[jira] [Updated] (HDFS-15990) No need to write to sub_section when serialize SnapshotDiff Section

2021-04-19 Thread Xuze Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuze Yang updated HDFS-15990:
-
Priority: Minor  (was: Major)

> No need to write to sub_section when serialize SnapshotDiff Section
> ---
>
> Key: HDFS-15990
> URL: https://issues.apache.org/jira/browse/HDFS-15990
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Xuze Yang
>Priority: Minor
>
> In FSImageFormatPBSnapshot.serializeSnapshotDiffSection(), the following code 
> exists:
> {code:java}
> if (i % parent.getInodesPerSubSection() == 0) {
>   parent.commitSubSection(headers,
>   FSImageFormatProtobuf.SectionName.SNAPSHOT_DIFF_SUB);
> }{code}
> It aims to serialize SnapshotDiff information into several sub_sections (i.e.
> additional sub_sections information will be written to the FileSummary Section).
> But FSImageFormatPBSnapshot.loadSnapshotDiffSection() treats
> SnapshotDiffSection as a whole, rather than as several sub_sections. So there
> is no need to introduce sub_sections here.
>  
>  
>  






[jira] [Updated] (HDFS-15990) No need to write to sub_section when serialize SnapshotDiff Section

2021-04-19 Thread Xuze Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuze Yang updated HDFS-15990:
-
Description: 
In FSImageFormatPBSnapshot.serializeSnapshotDiffSection(), the following code 
exists:
{code:java}
if (i % parent.getInodesPerSubSection() == 0) {
  parent.commitSubSection(headers,
  FSImageFormatProtobuf.SectionName.SNAPSHOT_DIFF_SUB);
}{code}
It aims to serialize SnapshotDiff information into several sub_sections (i.e.
additional sub_sections information will be written to the FileSummary Section).
But FSImageFormatPBSnapshot.loadSnapshotDiffSection() treats
SnapshotDiffSection as a whole, rather than as several sub_sections. So there
is no need to introduce sub_sections here.

 

 

 

  was:
In FSImageFormatPBSnapshot.serializeSnapshotDiffSection(), the following code 
exists:
{quote}if (i % parent.getInodesPerSubSection() == 0) {
    parent.commitSubSection(headers,
        FSImageFormatProtobuf.SectionName.SNAPSHOT_DIFF_SUB);
}{quote}
It aims to serialize SnapshotDiff information into several sub_sections(i.e. 
additional sub_sections information will be written to FileSummary Section). 
But in FSImageFormatPBSnapshot.loadSnapshotDiffSection(), it treats 
SnapshotDiffSection as a whole, rather than several sub_sections. So it's no 
need to introduce sub_sections here.

 

 

 


> No need to write to sub_section when serialize SnapshotDiff Section
> ---
>
> Key: HDFS-15990
> URL: https://issues.apache.org/jira/browse/HDFS-15990
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Xuze Yang
>Priority: Major
>
> In FSImageFormatPBSnapshot.serializeSnapshotDiffSection(), the following code 
> exists:
> {code:java}
> if (i % parent.getInodesPerSubSection() == 0) {
>   parent.commitSubSection(headers,
>   FSImageFormatProtobuf.SectionName.SNAPSHOT_DIFF_SUB);
> }{code}
> It aims to serialize SnapshotDiff information into several sub_sections (i.e.
> additional sub_sections information will be written to the FileSummary Section).
> But FSImageFormatPBSnapshot.loadSnapshotDiffSection() treats
> SnapshotDiffSection as a whole, rather than as several sub_sections. So there
> is no need to introduce sub_sections here.
>  
>  
>  






[jira] [Updated] (HDFS-15990) No need to write to sub_section when serialize SnapshotDiff Section

2021-04-19 Thread Xuze Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuze Yang updated HDFS-15990:
-
Description: 
In FSImageFormatPBSnapshot.serializeSnapshotDiffSection(), the following code 
exists:
{quote}if (i % parent.getInodesPerSubSection() == 0) {
    parent.commitSubSection(headers,
        FSImageFormatProtobuf.SectionName.SNAPSHOT_DIFF_SUB);
}{quote}
It aims to serialize SnapshotDiff information into several sub_sections (i.e.
additional sub_sections information will be written to the FileSummary Section).
But FSImageFormatPBSnapshot.loadSnapshotDiffSection() treats
SnapshotDiffSection as a whole, rather than as several sub_sections. So there
is no need to introduce sub_sections here.

 

 

 

  was:
In FSImageFormatPBSnapshot.serializeSnapshotDiffSection(), the following code 
exists:
{quote}if (i % parent.getInodesPerSubSection() == 0){ 

        parent.commitSubSection(headers, 
FSImageFormatProtobuf.SectionName.SNAPSHOT_DIFF_SUB);

 }
{quote}
It aims to serialize SnapshotDiff information into several sub_sections(i.e. 
additional sub_sections information will be written to FileSummary Section). 
But in FSImageFormatPBSnapshot.loadSnapshotDiffSection(), it treats 
SnapshotDiffSection as a whole, rather than several sub_sections. So it's no 
need to introduce sub_sections here.

 

 

 


> No need to write to sub_section when serialize SnapshotDiff Section
> ---
>
> Key: HDFS-15990
> URL: https://issues.apache.org/jira/browse/HDFS-15990
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Xuze Yang
>Priority: Major
>
> In FSImageFormatPBSnapshot.serializeSnapshotDiffSection(), the following code 
> exists:
> {quote}if (i % parent.getInodesPerSubSection() == 0) {
>     parent.commitSubSection(headers,
>         FSImageFormatProtobuf.SectionName.SNAPSHOT_DIFF_SUB);
> }{quote}
> It aims to serialize SnapshotDiff information into several sub_sections (i.e.
> additional sub_sections information will be written to the FileSummary Section).
> But FSImageFormatPBSnapshot.loadSnapshotDiffSection() treats
> SnapshotDiffSection as a whole, rather than as several sub_sections. So there
> is no need to introduce sub_sections here.
>  
>  
>  






[jira] [Updated] (HDFS-15990) No need to write to sub_section when serialize SnapshotDiff Section

2021-04-19 Thread Xuze Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuze Yang updated HDFS-15990:
-
Description: 
In FSImageFormatPBSnapshot.serializeSnapshotDiffSection(), the following code 
exists:
{quote}if (i % parent.getInodesPerSubSection() == 0){ 

        parent.commitSubSection(headers, 
FSImageFormatProtobuf.SectionName.SNAPSHOT_DIFF_SUB);

 }
{quote}
It aims to serialize SnapshotDiff information into several sub_sections (i.e.
additional sub_sections information will be written to the FileSummary Section).
But FSImageFormatPBSnapshot.loadSnapshotDiffSection() treats
SnapshotDiffSection as a whole, rather than as several sub_sections. So there
is no need to introduce sub_sections here.

 

 

 

  was:
In FSImageFormatPBSnapshot.serializeSnapshotDiffSection(), the following code 
exists:
{quote}if (i % parent.getInodesPerSubSection() == 0) {
 parent.commitSubSection(headers,
 FSImageFormatProtobuf.SectionName.SNAPSHOT_DIFF_SUB);
}
{quote}
It aims to serialize SnapshotDiff information into several sub_sections(i.e. 
additional sub_sections information will be written to FileSummary Section). 
But in FSImageFormatPBSnapshot.loadSnapshotDiffSection(), it treats 
SnapshotDiffSection as a whole, rather than several sub_sections. So it's no 
need to introduce sub_sections here.

 

 

 


> No need to write to sub_section when serialize SnapshotDiff Section
> ---
>
> Key: HDFS-15990
> URL: https://issues.apache.org/jira/browse/HDFS-15990
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Xuze Yang
>Priority: Major
>
> In FSImageFormatPBSnapshot.serializeSnapshotDiffSection(), the following code 
> exists:
> {quote}if (i % parent.getInodesPerSubSection() == 0){ 
>         parent.commitSubSection(headers, 
> FSImageFormatProtobuf.SectionName.SNAPSHOT_DIFF_SUB);
>  }
> {quote}
> It aims to serialize SnapshotDiff information into several sub_sections (i.e.
> additional sub_sections information will be written to the FileSummary Section).
> But FSImageFormatPBSnapshot.loadSnapshotDiffSection() treats
> SnapshotDiffSection as a whole, rather than as several sub_sections. So there
> is no need to introduce sub_sections here.
>  
>  
>  






[jira] [Created] (HDFS-15990) No need to write to sub_section when serialize SnapshotDiff Section

2021-04-19 Thread Xuze Yang (Jira)
Xuze Yang created HDFS-15990:


 Summary: No need to write to sub_section when serialize 
SnapshotDiff Section
 Key: HDFS-15990
 URL: https://issues.apache.org/jira/browse/HDFS-15990
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.10.0
Reporter: Xuze Yang


In FSImageFormatPBSnapshot.serializeSnapshotDiffSection(), the following code 
exists:
{quote}if (i % parent.getInodesPerSubSection() == 0) {
 parent.commitSubSection(headers,
 FSImageFormatProtobuf.SectionName.SNAPSHOT_DIFF_SUB);
}
{quote}
It aims to serialize SnapshotDiff information into several sub_sections (i.e.
additional sub_sections information will be written to the FileSummary Section).
But FSImageFormatPBSnapshot.loadSnapshotDiffSection() treats
SnapshotDiffSection as a whole, rather than as several sub_sections. So there
is no need to introduce sub_sections here.
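
To make the modulo check quoted above concrete, here is a small stand-alone sketch of how such a condition places sub-section boundaries while entries are written. It is an illustration only: the class and method names, the counter handling, and the list of commit points are assumptions and do not reproduce the Hadoop implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the FSImageFormatPBSnapshot code.
class SubSectionSketch {

  /**
   * Simulates writing `entries` items and returns the running counts at
   * which a sub-section boundary would be committed.
   */
  static List<Integer> commitPoints(int entries, int perSubSection) {
    List<Integer> commits = new ArrayList<>();
    int i = 0;
    for (int e = 0; e < entries; e++) {
      i++; // one more entry written
      if (i % perSubSection == 0) {
        // Stands in for parent.commitSubSection(...) in the quoted snippet.
        commits.add(i);
      }
    }
    return commits;
  }

  public static void main(String[] args) {
    System.out.println(commitPoints(10, 4)); // prints "[4, 8]"
  }
}
```

With 10 entries and 4 entries per sub-section, boundaries are recorded after the 4th and 8th entries. A loader that reads the section from start to end never consults these boundaries, which is the point the description makes.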

 

 

 






[jira] [Updated] (HDFS-15973) RBF: Add permission check before doting router federation rename.

2021-04-19 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-15973:
---
Attachment: (was: HDFS-15973.004.patch)

> RBF: Add permission check before doting router federation rename.
> -
>
> Key: HDFS-15973
> URL: https://issues.apache.org/jira/browse/HDFS-15973
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15973.001.patch, HDFS-15973.002.patch, 
> HDFS-15973.003.patch
>
>
> The router federation rename lacks a permission check. This is a security
> issue.






[jira] [Updated] (HDFS-15973) RBF: Add permission check before doting router federation rename.

2021-04-19 Thread Jinglun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-15973:
---
Attachment: HDFS-15973.004.patch

> RBF: Add permission check before doting router federation rename.
> -
>
> Key: HDFS-15973
> URL: https://issues.apache.org/jira/browse/HDFS-15973
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15973.001.patch, HDFS-15973.002.patch, 
> HDFS-15973.003.patch, HDFS-15973.004.patch
>
>
> The router federation rename lacks a permission check. This is a security
> issue.






[jira] [Updated] (HDFS-15970) Print network topology on the web

2021-04-19 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-15970:

Fix Version/s: 3.3.1

> Print network topology on the web
> -
>
> Key: HDFS-15970
> URL: https://issues.apache.org/jira/browse/HDFS-15970
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
> Attachments: hdfs-topology-json.jpg, hdfs-topology.jpg, hdfs-web.jpg
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> In order to query the network topology information conveniently, we can print 
> it on the web.






[jira] [Work logged] (HDFS-15811) completeFile should log final file size

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15811?focusedWorklogId=585084=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585084
 ]

ASF GitHub Bot logged work on HDFS-15811:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 11:47
Start Date: 19/Apr/21 11:47
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2670:
URL: https://github.com/apache/hadoop/pull/2670#issuecomment-822404635


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 11s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  2s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  39m  8s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 46s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 17s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 47s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  5s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 45s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   4m 16s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 37s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 24s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 26s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 44s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 58s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m  7s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 257m 16s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2670/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 37s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 366m 14s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks |
   |   | hadoop.hdfs.server.namenode.TestFileTruncate |
   |   | hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock |
   |   | hadoop.hdfs.TestGetBlocks |
   |   | hadoop.hdfs.server.diskbalancer.TestDiskBalancerRPC |
   |   | hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap |
   |   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
   |   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
   |   | 
hadoop.hdfs.server.namenode.snapshot.TestRenameWithOrderedSnapshotDeletion |
   |   | hadoop.hdfs.TestClientReportBadBlock |
   |   | hadoop.hdfs.server.namenode.snapshot.TestAclWithSnapshot |
   |   | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport |
   |   | hadoop.hdfs.server.blockmanagement.TestErasureCodingCorruption |
   |   | hadoop.hdfs.server.namenode.TestMetadataVersionOutput |
   |   | hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM |
   |   | hadoop.hdfs.server.blockmanagement.TestSlowDiskTracker |
   |   | 

[jira] [Resolved] (HDFS-15970) Print network topology on the web

2021-04-19 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-15970.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Print network topology on the web
> -
>
> Key: HDFS-15970
> URL: https://issues.apache.org/jira/browse/HDFS-15970
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: hdfs-topology-json.jpg, hdfs-topology.jpg, hdfs-web.jpg
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> In order to query the network topology information conveniently, we can print 
> it on the web.






[jira] [Work logged] (HDFS-15970) Print network topology on the web

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15970?focusedWorklogId=585083=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585083
 ]

ASF GitHub Bot logged work on HDFS-15970:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 11:43
Start Date: 19/Apr/21 11:43
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on pull request #2896:
URL: https://github.com/apache/hadoop/pull/2896#issuecomment-822402613


   Merged to trunk. Thanks for your contribution, @tomscut.
   Thanks for your review and your comments, @goiri and @ayushtkn.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 585083)
Time Spent: 3h 10m  (was: 3h)

> Print network topology on the web
> -
>
> Key: HDFS-15970
> URL: https://issues.apache.org/jira/browse/HDFS-15970
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Attachments: hdfs-topology-json.jpg, hdfs-topology.jpg, hdfs-web.jpg
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> In order to query the network topology information conveniently, we can print 
> it on the web.






[jira] [Work logged] (HDFS-15970) Print network topology on the web

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15970?focusedWorklogId=585082=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585082
 ]

ASF GitHub Bot logged work on HDFS-15970:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 11:42
Start Date: 19/Apr/21 11:42
Worklog Time Spent: 10m 
  Work Description: tasanuma merged pull request #2896:
URL: https://github.com/apache/hadoop/pull/2896


   




Issue Time Tracking
---

Worklog Id: (was: 585082)
Time Spent: 3h  (was: 2h 50m)

> Print network topology on the web
> -
>
> Key: HDFS-15970
> URL: https://issues.apache.org/jira/browse/HDFS-15970
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Attachments: hdfs-topology-json.jpg, hdfs-topology.jpg, hdfs-web.jpg
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> In order to query the network topology information conveniently, we can print 
> it on the web.






[jira] [Work logged] (HDFS-15879) Exclude slow nodes when choose targets for blocks

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15879?focusedWorklogId=585075=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585075
 ]

ASF GitHub Bot logged work on HDFS-15879:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 10:53
Start Date: 19/Apr/21 10:53
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2928:
URL: https://github.com/apache/hadoop/pull/2928#issuecomment-822375565


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  5s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ branch-3.3 Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m 28s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  24m  3s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  compile  |  20m 49s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  checkstyle  |   3m 25s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  mvnsite  |   4m 42s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  javadoc  |   4m 26s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  spotbugs  |  10m 32s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  shadedclient  |  22m 47s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 31s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 44s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 48s |  |  the patch passed  |
   | -1 :x: |  javac  |  20m 48s | 
[/results-compile-javac-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2928/2/artifact/out/results-compile-javac-root.txt)
 |  root generated 1 new + 1846 unchanged - 1 fixed = 1847 total (was 1847)  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   3m 17s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2928/2/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 5 new + 775 unchanged - 5 fixed = 780 total (was 
780)  |
   | +1 :green_heart: |  mvnsite  |   4m 46s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   4m 30s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |  10m 10s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 14s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m  9s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   2m 45s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 249m 34s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2928/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  3s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 446m 57s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
   |   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
   |   | hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks |
   |   | hadoop.hdfs.TestReconstructStripedFile |
   |   | hadoop.hdfs.TestReconstructStripedFileWithValidator |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2928/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2928 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml |
   | uname | Linux 694ae6170bac 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-3.3 / ffaf672b3f95b83013c1d941544f6886c625791f |
   | Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~18.04-b08 |
   |  Test Results | 

[jira] [Updated] (HDFS-15872) Add the failed reason to Metrics during choosing Datanode.

2021-04-19 Thread Yang Yun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Yun updated HDFS-15872:

Summary: Add the failed reason to Metrics during choosing Datanode.  (was: 
Add the failed reason to Metrics duiring choosing Datanode.)

> Add the failed reason to Metrics during choosing Datanode.
> --
>
> Key: HDFS-15872
> URL: https://issues.apache.org/jira/browse/HDFS-15872
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement, namenode
> Environment: Add the failed reason to Metrics during choosing 
> Datanode.
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
> Attachments: HDFS-15872.001.patch
>
>
> Add the failed reason to metrics during choosing Datanode, so that we can 
> troubleshoot or add storage-related monitoring.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15810) RBF: RBFMetrics's TotalCapacity out of bounds

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15810?focusedWorklogId=585019=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-585019
 ]

ASF GitHub Bot logged work on HDFS-15810:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 10:13
Start Date: 19/Apr/21 10:13
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2910:
URL: https://github.com/apache/hadoop/pull/2910#issuecomment-822352050


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 57s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  16m  5s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  25m 56s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  28m 23s |  |  trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |  21m  0s |  |  trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   4m  2s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 30s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 52s |  |  trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 51s |  |  trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   4m 30s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  18m 14s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 24s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  1s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  25m 13s |  |  the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |  25m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 10s |  |  the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  javac  |  20m 10s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   4m  5s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 17s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 42s |  |  the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 39s |  |  the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   4m 26s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  17m 13s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  18m 14s |  |  hadoop-common in the patch passed.  |
   | -1 :x: |  unit  |  27m 24s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2910/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  3s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 255m 32s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.hdfs.server.federation.router.TestRouterRpc |
   |   | hadoop.hdfs.server.federation.router.TestRouterFederationRename |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2910/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2910 |
   | Optional Tests | dupname asflicense mvnsite codespell markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle |
   | uname | Linux c35012684956 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 64dbbf2d0c6bcbcb8a8714e5e53f3b136c758af5 |
   | Default Java | Private 

[jira] [Commented] (HDFS-15982) Deleted data on the Web UI must be saved to the trash

2021-04-19 Thread Bhavik Patel (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17324868#comment-17324868
 ] 

Bhavik Patel commented on HDFS-15982:
-

Thank you [~vjasani] for working on it. The provided PR #2927 looks good to me.

[~ste...@apache.org] [~weichiu] any thoughts?

> Deleted data on the Web UI must be saved to the trash 
> --
>
> Key: HDFS-15982
> URL: https://issues.apache.org/jira/browse/HDFS-15982
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs
>Reporter: Bhavik Patel
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> If we delete data from the Web UI, it should first be moved to the 
> configured/default Trash directory and removed only after the trash interval 
> has elapsed. Currently, the data is removed from the system directly. [This 
> behavior should be the same as the CLI command.]
>  
> This can be helpful when the user accidentally deletes data from the Web UI.
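
The requested behaviour (move to a trash directory first, purge later) can be illustrated with a minimal sketch. This is not the code from PR #2927 and does not use the Hadoop APIs; it works on the local filesystem through `java.nio.file`, and the class name `TrashSketch`, the method `moveToTrash`, and the `.Trash/Current` layout used here are assumptions chosen only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only: local-filesystem stand-in for a "delete to trash" flow.
class TrashSketch {

  // Instead of deleting 'target', move it under trashRoot/Current and return its new location.
  static Path moveToTrash(Path target, Path trashRoot) throws IOException {
    Path current = trashRoot.resolve("Current");
    Files.createDirectories(current);
    Path destination = current.resolve(target.getFileName());
    return Files.move(target, destination, StandardCopyOption.REPLACE_EXISTING);
  }

  // Creates a temporary file, "deletes" it via the trash, and reports whether
  // the file now lives in the trash and is gone from its original location.
  static boolean demo() {
    try {
      Path root = Files.createTempDirectory("trash-sketch");
      Path file = Files.createFile(root.resolve("data.txt"));
      Path moved = moveToTrash(file, root.resolve(".Trash"));
      return Files.exists(moved) && Files.notExists(file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("moved to trash: " + demo());
  }
}
```

A separate periodic task would then purge entries older than the trash interval; that part is omitted from the sketch.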






[jira] [Work logged] (HDFS-15879) Exclude slow nodes when choose targets for blocks

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15879?focusedWorklogId=584963=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584963
 ]

ASF GitHub Bot logged work on HDFS-15879:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 08:32
Start Date: 19/Apr/21 08:32
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2928:
URL: https://github.com/apache/hadoop/pull/2928#issuecomment-822281671


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |  32m 41s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ branch-3.3 Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  35m 50s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  checkstyle  |   0m 56s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  mvnsite  |   1m 27s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  spotbugs  |   3m 22s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  shadedclient  |  20m 32s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 52s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  javadoc  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   3m 27s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  19m 47s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 250m 36s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2928/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 47s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 375m 37s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
   |   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
   |   | hadoop.hdfs.TestStripedFileAppend |
   |   | hadoop.hdfs.TestReconstructStripedFile |
   |   | hadoop.hdfs.TestReconstructStripedFileWithValidator |
   |   | hadoop.hdfs.server.namenode.ha.TestHAAppend |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2928/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2928 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml |
   | uname | Linux fd8cb4bfaac8 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-3.3 / ba44159ab2f3ce5ae78ec94d3e6a754b010620e7 |
   | Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~18.04-b08 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2928/1/testReport/ |
   | Max. process+thread count | 2824 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2928/1/console |
   | versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact 

[jira] [Work logged] (HDFS-15989) Split TestBalancer into two classes

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15989?focusedWorklogId=584921=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584921
 ]

ASF GitHub Bot logged work on HDFS-15989:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 06:58
Start Date: 19/Apr/21 06:58
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on a change in pull request #2923:
URL: https://github.com/apache/hadoop/pull/2923#discussion_r615585428



##
File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer2.java
##
@@ -0,0 +1,744 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs.server.balancer;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.StorageType;
+import org.apache.hadoop.fs.permission.FsPermission;
+import org.apache.hadoop.hdfs.DFSClient;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
+import org.apache.hadoop.hdfs.DFSTestUtil;
+import org.apache.hadoop.hdfs.DFSUtil;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.hdfs.HdfsConfiguration;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.hadoop.hdfs.NameNodeProxies;
+import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys;
+import org.apache.hadoop.hdfs.protocol.ClientProtocol;
+import org.apache.hadoop.hdfs.protocol.DatanodeID;
+import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
+import org.apache.hadoop.hdfs.protocol.HdfsConstants;
+import org.apache.hadoop.hdfs.protocol.LocatedBlock;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithUpgradeDomain;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementStatus;
+import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager;
+import org.apache.hadoop.hdfs.server.datanode.DataNode;
+import org.apache.hadoop.hdfs.server.datanode.DataNodeTestUtils;
+import org.apache.hadoop.hdfs.server.datanode.SimulatedFSDataset;
+import org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.LazyPersistTestCase;
+import org.apache.hadoop.io.IOUtils;
+import org.apache.hadoop.test.GenericTestUtils;
+import org.junit.After;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.slf4j.event.Level;
+
+import java.io.OutputStream;
+import java.net.InetSocketAddress;
+import java.net.URI;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import static org.apache.hadoop.fs.StorageType.DEFAULT;
+import static org.apache.hadoop.fs.StorageType.RAM_DISK;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BALANCER_MAX_SIZE_TO_MOVE_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BLOCK_SIZE_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_BLOCK_PINNING_ENABLED;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_LAZY_WRITER_INTERVAL_SEC;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_KEY;
+import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_LAZY_PERSIST_FILE_SCRUB_INTERVAL_SEC;
+import static org.apache.hadoop.test.PlatformAssumptions.assumeNotWindows;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Some long running Balancer tasks.
+ */
+public class TestBalancer2 {
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(TestBalancer2.class);
+
+  static {
+    GenericTestUtils.setLogLevel(Balancer.LOG, Level.TRACE);
+    GenericTestUtils.setLogLevel(Dispatcher.LOG, Level.DEBUG);
+  }
+
+  private final static long CAPACITY = 5000L;
+  private final static String RACK0 = "/rack0";
+  

[jira] [Work logged] (HDFS-15810) RBF: RBFMetrics's TotalCapacity out of bounds

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15810?focusedWorklogId=584902=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584902
 ]

ASF GitHub Bot logged work on HDFS-15810:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 06:10
Start Date: 19/Apr/21 06:10
Worklog Time Spent: 10m 
  Work Description: aajisaka commented on pull request #2910:
URL: https://github.com/apache/hadoop/pull/2910#issuecomment-822196865


   Would you update the web UI to use the new metrics?
   
https://github.com/aajisaka/hadoop/blob/486ddb73f693177787e4abff7c932be9b925/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/webapps/router/federationhealth.html#L116-L118


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 584902)
Time Spent: 1h 20m  (was: 1h 10m)

> RBF: RBFMetrics's TotalCapacity out of bounds
> -
>
> Key: HDFS-15810
> URL: https://issues.apache.org/jira/browse/HDFS-15810
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Xiaoxing Wei
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-02-02-10-59-17-113.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The Long type fields TotalCapacity, UsedCapacity and RemainingCapacity in 
> RBFMetrics may be out of bounds.
> !image-2021-02-02-10-59-17-113.png!
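
The failure mode described above can be reproduced with plain Java arithmetic. The sketch below is not the RBFMetrics code; the class `CapacityOverflowSketch` and its two methods are hypothetical, and the capacity values are made up. It only shows that summing large `long` values wraps around silently, and that `java.math.BigInteger` is one possible way to keep the aggregate exact.

```java
import java.math.BigInteger;

// Hypothetical illustration only: not the actual RBFMetrics implementation.
class CapacityOverflowSketch {

  // Naive aggregation: wraps around silently once the sum exceeds Long.MAX_VALUE.
  static long sumAsLong(long[] capacities) {
    long total = 0L;
    for (long capacity : capacities) {
      total += capacity;
    }
    return total;
  }

  // Overflow-safe aggregation using arbitrary-precision arithmetic.
  static BigInteger sumAsBigInteger(long[] capacities) {
    BigInteger total = BigInteger.ZERO;
    for (long capacity : capacities) {
      total = total.add(BigInteger.valueOf(capacity));
    }
    return total;
  }

  public static void main(String[] args) {
    // Two made-up per-namespace capacities, each just over half of Long.MAX_VALUE.
    long[] capacities = {Long.MAX_VALUE / 2 + 1, Long.MAX_VALUE / 2 + 1};
    System.out.println("long sum:       " + sumAsLong(capacities));
    System.out.println("BigInteger sum: " + sumAsBigInteger(capacities));
  }
}
```

With these inputs the `long` sum comes out negative, while the `BigInteger` sum stays positive and exact.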






[jira] [Work logged] (HDFS-15989) Split TestBalancer into two classes

2021-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15989?focusedWorklogId=584898=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-584898
 ]

ASF GitHub Bot logged work on HDFS-15989:
-

Author: ASF GitHub Bot
Created on: 19/Apr/21 05:59
Start Date: 19/Apr/21 05:59
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on pull request #2923:
URL: https://github.com/apache/hadoop/pull/2923#issuecomment-822191601


   @aajisaka would you like to take a look?
   Thanks




Issue Time Tracking
---

Worklog Id: (was: 584898)
Time Spent: 2.5h  (was: 2h 20m)

> Split TestBalancer into two classes
> ---
>
> Key: HDFS-15989
> URL: https://issues.apache.org/jira/browse/HDFS-15989
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> TestBalancer has accumulated many tests, so it would be good to split it up into 
> two classes. Moreover, TestBalancer#testMaxIterationTime is flaky. We should 
> also resolve it with this Jira.


