[jira] [Commented] (HDFS-16393) RBF: Fix TestRouterRPCMultipleDestinationMountTableResolver

2021-12-20 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463026#comment-17463026
 ] 

Ayush Saxena commented on HDFS-16393:
-

Guess just changing
 dfsCluster.restartNameNode(0, false); -> dfsCluster.restartNameNode(0);
should fix the test.
Seems due to a change in the MiniDFSCluster restart logic.
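
To make the suggestion concrete, a minimal sketch of the one-line change (using the standard MiniDFSCluster overloads; the surrounding test teardown is elided):

```java
import org.apache.hadoop.hdfs.MiniDFSCluster;

class RestartFixSketch {
  void resetCluster(MiniDFSCluster dfsCluster) throws Exception {
    // Before: the two-argument overload returns without waiting for the
    // restarted NameNode to become active, which the changed restart
    // logic no longer papers over:
    // dfsCluster.restartNameNode(0, false);

    // Suggested fix: the single-argument overload waits for the NameNode
    // to come back up before the test proceeds.
    dfsCluster.restartNameNode(0);
  }
}
```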

> RBF: Fix TestRouterRPCMultipleDestinationMountTableResolver
> ---
>
> Key: HDFS-16393
> URL: https://issues.apache.org/jira/browse/HDFS-16393
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Priority: Major
>
> Fails in the after block
> https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/724/testReport/junit/org.apache.hadoop.hdfs.server.federation.router/TestRouterRPCMultipleDestinationMountTableResolver/testInvokeAtAvailableNs/






[jira] [Created] (HDFS-16393) RBF: Fix TestRouterRPCMultipleDestinationMountTableResolver

2021-12-20 Thread Ayush Saxena (Jira)
Ayush Saxena created HDFS-16393:
---

 Summary: RBF: Fix 
TestRouterRPCMultipleDestinationMountTableResolver
 Key: HDFS-16393
 URL: https://issues.apache.org/jira/browse/HDFS-16393
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ayush Saxena


Fails in the after block
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/724/testReport/junit/org.apache.hadoop.hdfs.server.federation.router/TestRouterRPCMultipleDestinationMountTableResolver/testInvokeAtAvailableNs/






[jira] [Work logged] (HDFS-16348) Mark slownode as badnode to recover pipeline

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16348?focusedWorklogId=699208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699208
 ]

ASF GitHub Bot logged work on HDFS-16348:
-

Author: ASF GitHub Bot
Created on: 21/Dec/21 05:33
Start Date: 21/Dec/21 05:33
Worklog Time Spent: 10m 
  Work Description: symious commented on pull request #3704:
URL: https://github.com/apache/hadoop/pull/3704#issuecomment-998486754


   @tasanuma Thanks for the detailed review. Updated as suggested; please take 
a look.




Issue Time Tracking
---

Worklog Id: (was: 699208)
Time Spent: 3h 20m  (was: 3h 10m)

> Mark slownode as badnode to recover pipeline
> 
>
> Key: HDFS-16348
> URL: https://issues.apache.org/jira/browse/HDFS-16348
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Janus Chow
>Assignee: Janus Chow
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> In HDFS-16320, the DataNode can retrieve the SLOW status from each NameNode. 
> This ticket is to send this information back to clients who are writing 
> blocks. If a client notices the pipeline is built on a slownode, it can 
> choose to mark the slownode as a badnode to exclude the node or rebuild the 
> pipeline.
> To avoid false positives, we added a "threshold" config: only when a client 
> continuously receives slownode replies from the same node will that node be 
> marked as SLOW.






[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=699205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699205
 ]

ASF GitHub Bot logged work on HDFS-16303:
-

Author: ASF GitHub Bot
Created on: 21/Dec/21 05:17
Start Date: 21/Dec/21 05:17
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#issuecomment-998480240


   > If there are fewer nodes being decommissioned than max tracked nodes, then 
there are no nodes in the pendingNodes queue & all nodes are being tracked for 
decommissioning. Therefore, there is no possibility that any healthy nodes are 
blocked in the pendingNodes queue
   
   Yes makes sense. Thanks




Issue Time Tracking
---

Worklog Id: (was: 699205)
Time Spent: 10.5h  (was: 10h 20m)

> Losing over 100 datanodes in state decommissioning results in full blockage 
> of all datanode decommissioning
> ---
>
> Key: HDFS-16303
> URL: https://issues.apache.org/jira/browse/HDFS-16303
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1, 3.3.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> h2. Impact
> HDFS datanode decommissioning does not make any forward progress. For 
> example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X 
> of those datanodes remain in state decommissioning forever without making any 
> forward progress towards being decommissioned.
> h2. Root Cause
> The HDFS Namenode class "DatanodeAdminManager" is responsible for 
> decommissioning datanodes.
> As per this "hdfs-site" configuration:
> {quote}Config = dfs.namenode.decommission.max.concurrent.tracked.nodes 
>  Default Value = 100
> The maximum number of decommission-in-progress datanodes that will be 
> tracked at one time by the namenode. Tracking a decommission-in-progress 
> datanode consumes additional NN memory proportional to the number of blocks 
> on the datanode. Having a conservative limit reduces the potential impact of 
> decommissioning a large number of nodes at once. A value of 0 means no limit 
> will be enforced.
> {quote}
> The Namenode will only actively track up to 100 datanodes for decommissioning 
> at any given time, so as to avoid Namenode memory pressure.
> Looking into the "DatanodeAdminManager" code:
>  * a datanode is only removed from the "tracked.nodes" set when it 
> finishes decommissioning
>  * a new datanode is only added to the "tracked.nodes" set if there are fewer 
> than 100 datanodes being tracked
> So in the event that there are more than 100 datanodes being decommissioned 
> at a given time, some of those datanodes will not be in the "tracked.nodes" 
> set until 1 or more datanodes in the "tracked.nodes" finishes 
> decommissioning. This is generally not a problem because the datanodes in 
> "tracked.nodes" will eventually finish decommissioning, but there is an edge 
> case where this logic prevents the namenode from making any forward progress 
> towards decommissioning.
> If all 100 datanodes in the "tracked.nodes" are unable to finish 
> decommissioning, then other datanodes (which may be able to be 
> decommissioned) will never get added to "tracked.nodes" and therefore will 
> never get the opportunity to be decommissioned.
> This can occur due to the following issue:
> {quote}2021-10-21 12:39:24,048 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager 
> (DatanodeAdminMonitor-0): Node W.X.Y.Z:50010 is dead while in Decommission In 
> Progress. Cannot be safely decommissioned or be in maintenance since there is 
> risk of reduced data durability or data loss. Either restart the failed node 
> or force decommissioning or maintenance by removing, calling refreshNodes, 
> then re-adding to the excludes or host config files.
> {quote}
> If a Datanode is lost while decommissioning (for example if the underlying 
> hardware fails or is lost), then it will remain in state decommissioning 
> forever.
> If 100 or more Datanodes are lost while decommissioning over the Hadoop 
> cluster lifetime, then this is enough to completely fill up the 
> "tracked.nodes" set. With the entire "tracked.nodes" set filled with 
> datanodes that can never finish decommissioning, no other datanode can ever 
> be decommissioned.
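
A self-contained model of the tracked/pending behavior described above (names and types are illustrative, not the actual DatanodeAdminManager code):

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

class TrackedNodesModel {
  // dfs.namenode.decommission.max.concurrent.tracked.nodes
  static final int MAX_TRACKED = 100;
  final Set<String> tracked = new HashSet<>();      // actively checked nodes
  final Queue<String> pending = new ArrayDeque<>(); // waiting for a slot

  void startDecommission(String datanode) {
    if (tracked.size() < MAX_TRACKED) {
      tracked.add(datanode);  // only added while under the limit
    } else {
      pending.add(datanode);  // otherwise it waits in the queue
    }
  }

  void onDecommissionFinished(String datanode) {
    // A slot is freed only when a tracked node *finishes* decommissioning,
    // so 100 dead tracked nodes block the pending queue forever.
    tracked.remove(datanode);
    String next = pending.poll();
    if (next != null) {
      tracked.add(next);
    }
  }
}
```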

[jira] [Work logged] (HDFS-16348) Mark slownode as badnode to recover pipeline

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16348?focusedWorklogId=699169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699169
 ]

ASF GitHub Bot logged work on HDFS-16348:
-

Author: ASF GitHub Bot
Created on: 21/Dec/21 04:09
Start Date: 21/Dec/21 04:09
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on a change in pull request #3704:
URL: https://github.com/apache/hadoop/pull/3704#discussion_r772796622



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
##
@@ -1254,10 +1273,52 @@ public void run() {
       }
     }
 
+    void markSlowNode(List<DatanodeInfo> slownodesFromAck) throws IOException {
+      Set<DatanodeInfo> discontinuousNodes = new HashSet<>(slowNodeMap.keySet());
+      for (DatanodeInfo slowNode : slownodesFromAck) {
+        if (!slowNodeMap.containsKey(slowNode)) {
+          slowNodeMap.put(slowNode, 1);
+        } else {
+          int oldCount = slowNodeMap.get(slowNode);
+          slowNodeMap.put(slowNode, ++oldCount);
+        }
+        discontinuousNodes.remove(slowNode);
+      }
+      for (DatanodeInfo discontinuousNode : discontinuousNodes) {
+        slowNodeMap.remove(discontinuousNode);
+      }
+
+      if (!slowNodeMap.isEmpty()) {
+        for (Map.Entry<DatanodeInfo, Integer> entry : slowNodeMap.entrySet()) {
+          if (entry.getValue() >= markSlowNodeAsBadNodeThreshold) {
+            DatanodeInfo slowNode = entry.getKey();
+            int index = getDatanodeIndex(slowNode);
+            if (index >= 0) {
+              errorState.setBadNodeIndex(
+                  getDatanodeIndex(entry.getKey()));

Review comment:
   We can reuse the `index` variable.
   ```suggestion
 errorState.setBadNodeIndex(index);
   ```

##
File path: 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java
##
@@ -230,14 +260,27 @@ public static ECN getECNFromHeader(int header) {
     return StatusFormat.getECN(header);
   }
 
+  public static SLOW getSLOWFromHeader(int header) {
+    return StatusFormat.getSLOW(header);
+  }
+
   public static int setStatusForHeader(int old, Status status) {
     return StatusFormat.setStatus(old, status);
   }
 
+  public static int setSLOWForHeader(int old, SLOW slow) {

Review comment:
   Only the unit test uses this method. Would you please add @VisibleForTesting?
   ```suggestion
 @VisibleForTesting
 public static int setSLOWForHeader(int old, SLOW slow) {
   ```

##
File path: 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java
##
@@ -230,14 +260,27 @@ public static ECN getECNFromHeader(int header) {
     return StatusFormat.getECN(header);
   }
 
+  public static SLOW getSLOWFromHeader(int header) {
+    return StatusFormat.getSLOW(header);
+  }
+
   public static int setStatusForHeader(int old, Status status) {
     return StatusFormat.setStatus(old, status);
   }
 
+  public static int setSLOWForHeader(int old, SLOW slow) {
+    return StatusFormat.setSLOW(old, slow);
+  }
+
   public static int combineHeader(ECN ecn, Status status) {
+    return combineHeader(ecn, status, SLOW.DISABLED);
+  }
+
+  public static int combineHeader(ECN ecn, Status status, SLOW slow) {

Review comment:
   I want `PipelineAck#getHeaderFlag()` to use this method.
   ```java
  public int getHeaderFlag(int i) {
    if (proto.getFlagCount() > 0) {
      return proto.getFlag(i);
    } else {
      return combineHeader(ECN.DISABLED, proto.getReply(i), SLOW.DISABLED);
    }
  }
   ```

##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
##
@@ -1620,8 +1623,10 @@ private void sendAckUpstreamUnprotected(PipelineAck ack, long seqno,
       // downstream nodes, reply should contain one reply.
       replies = new int[] { myHeader };
     } else if (mirrorError) { // ack read error
-      int h = PipelineAck.combineHeader(datanode.getECN(), Status.SUCCESS);
-      int h1 = PipelineAck.combineHeader(datanode.getECN(), Status.ERROR);
+      int h = PipelineAck.combineHeader(datanode.getECN(), Status.SUCCESS,
+          datanode.getSLOW());
+      int h1 = PipelineAck.combineHeader(datanode.getECN(), Status.ERROR,
+          datanode.getSLOW());

Review comment:
   Why doesn't it use `datanode.getSLOWByBlockPoolId(block.getBlockPoolId())`?






Issue Time Tracking
-

[jira] [Commented] (HDFS-16368) DFSAdmin supports refresh topology info without restarting namenode

2021-12-20 Thread zhanghaobo (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462940#comment-17462940
 ] 

zhanghaobo commented on HDFS-16368:
---

[~ferhui], yeah, looks like the same.

>  DFSAdmin supports refresh topology info without restarting namenode
> 
>
> Key: HDFS-16368
> URL: https://issues.apache.org/jira/browse/HDFS-16368
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: dfsadmin, namenode
>Affects Versions: 2.7.7, 3.3.1
>Reporter: zhanghaobo
>Priority: Major
>  Labels: features, pull-request-available
> Attachments: 0001.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently in HDFS, if we update the rack info for rack-awareness, we may need 
> to rolling-restart the namenodes for it to take effect. If the cluster is 
> large, rolling-restarting the namenodes takes a very long time. So we 
> developed a method to refresh topology info without rolling-restarting the 
> namenodes.
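
As a rough sketch of what such a refresh could look like (all names here are hypothetical; the attached patch's actual approach may differ):

```java
import java.util.List;

class TopologyRefreshSketch {
  // Stand-in for Hadoop's rack-resolution mapping; reloadCachedMappings()
  // and resolve() mirror the DNSToSwitchMapping idea but are assumptions.
  interface RackMapping {
    void reloadCachedMappings();              // drop the stale rack table
    List<String> resolve(List<String> hosts); // re-resolve hosts to racks
  }

  void refreshTopology(RackMapping mapping, List<String> datanodeHosts) {
    mapping.reloadCachedMappings();
    List<String> racks = mapping.resolve(datanodeHosts);
    // ...then update each registered DataNode's network location with its
    // newly resolved rack, instead of rolling-restarting the NameNodes.
  }
}
```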






[jira] [Work logged] (HDFS-16389) Improve NNThroughputBenchmark test mkdirs

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16389?focusedWorklogId=699112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699112
 ]

ASF GitHub Bot logged work on HDFS-16389:
-

Author: ASF GitHub Bot
Created on: 21/Dec/21 02:08
Start Date: 21/Dec/21 02:08
Worklog Time Spent: 10m 
  Work Description: jianghuazhu commented on pull request #3819:
URL: https://github.com/apache/hadoop/pull/3819#issuecomment-998409764


   Could you help review this PR, @aajisaka @virajjasani?
   Thank you very much.




Issue Time Tracking
---

Worklog Id: (was: 699112)
Time Spent: 0.5h  (was: 20m)

> Improve NNThroughputBenchmark test mkdirs
> -
>
> Key: HDFS-16389
> URL: https://issues.apache.org/jira/browse/HDFS-16389
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: benchmarks, namenode
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When using the NNThroughputBenchmark test to create a large number of 
> directories, some exception messages are printed.
> Here is the command:
> ./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs 
> hdfs:// -op mkdirs -threads 30 -dirs 500
> There are some exceptions here, such as:
> 21/12/20 10:25:00 INFO namenode.NNThroughputBenchmark: Starting benchmark: 
> mkdirs
> 21/12/20 10:25:01 INFO namenode.NNThroughputBenchmark: Generate 500 
> inputs for mkdirs
> 21/12/20 10:25:08 ERROR namenode.NNThroughputBenchmark: 
> java.lang.ArrayIndexOutOfBoundsException: 20
>   at 
> org.apache.hadoop.hdfs.server.namenode.FileNameGenerator.getNextDirName(FileNameGenerator.java:65)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FileNameGenerator.getNextFileName(FileNameGenerator.java:73)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$MkdirsStats.generateInputs(NNThroughputBenchmark.java:668)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$OperationStatsBase.benchmark(NNThroughputBenchmark.java:257)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.run(NNThroughputBenchmark.java:1528)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.runBenchmark(NNThroughputBenchmark.java:1430)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.main(NNThroughputBenchmark.java:1550)
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 20
>   at 
> org.apache.hadoop.hdfs.server.namenode.FileNameGenerator.getNextDirName(FileNameGenerator.java:65)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FileNameGenerator.getNextFileName(FileNameGenerator.java:73)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$MkdirsStats.generateInputs(NNThroughputBenchmark.java:668)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$OperationStatsBase.benchmark(NNThroughputBenchmark.java:257)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.run(NNThroughputBenchmark.java:1528)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.runBenchmark(NNThroughputBenchmark.java:1430)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.main(NNThroughputBenchmark.java:1550)
> These messages appear because some parameters, such as dirsPerDir or 
> filesPerDir, are set incorrectly.
> Seeing this log raises some questions.
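
For illustration, a self-contained sketch of the failure mode (numbers and names are assumptions, not the actual FileNameGenerator internals): a generator with a fixed tree shape can only hand out dirsPerDir^levels names, and requesting more runs off the end of its per-level counters, matching the ArrayIndexOutOfBoundsException above.

```java
class NameCapacitySketch {
  // Maximum number of names a fixed-shape generator can produce.
  static long capacity(int dirsPerDir, int levels) {
    long c = 1;
    for (int i = 0; i < levels; i++) {
      c *= dirsPerDir;
    }
    return c;
  }

  public static void main(String[] args) {
    int threads = 30, dirsPerThread = 500;           // from the command above
    long requested = (long) threads * dirsPerThread; // 15000 names
    // With a small dirsPerDir and a depth sized for fewer names, e.g.
    // capacity(2, 10) = 1024 < 15000, generation overruns its counters.
    System.out.println("requested=" + requested
        + ", capacity=" + capacity(2, 10));
  }
}
```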






[jira] [Resolved] (HDFS-16385) Fix Datanode retrieve slownode information bug.

2021-12-20 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16385.
-
Resolution: Fixed

Merged the PR. Thanks for your contribution, [~JacksonWang]. I added you to 
the contributor role.

> Fix Datanode retrieve slownode information bug.
> ---
>
> Key: HDFS-16385
> URL: https://issues.apache.org/jira/browse/HDFS-16385
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jackson Wang
>Assignee: Jackson Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In HDFS-16320, the DataNode will retrieve the SLOW status from each NameNode. 
> But the NameNode did not set isSlowNode on the HeartbeatResponseProto in 
> DatanodeProtocolServerSideTranslatorPB#sendHeartbeat.
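
A self-contained model of the omission (the types are stand-ins for the generated protobuf classes, not the real HDFS protos):

```java
class HeartbeatTranslatorSketch {
  static class HeartbeatResponse { boolean isSlowNode; }
  static class HeartbeatResponseProto { boolean isSlowNode; }

  HeartbeatResponseProto translate(HeartbeatResponse response) {
    HeartbeatResponseProto proto = new HeartbeatResponseProto();
    // ... other fields (commands, HA status, etc.) copied here ...
    proto.isSlowNode = response.isSlowNode; // the copy the bug left out,
                                            // so DataNodes always saw false
    return proto;
  }
}
```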






[jira] [Assigned] (HDFS-16385) Fix Datanode retrieve slownode information bug.

2021-12-20 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma reassigned HDFS-16385:
---

Assignee: Jackson Wang

> Fix Datanode retrieve slownode information bug.
> ---
>
> Key: HDFS-16385
> URL: https://issues.apache.org/jira/browse/HDFS-16385
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jackson Wang
>Assignee: Jackson Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In HDFS-16320, the DataNode will retrieve the SLOW status from each NameNode. 
> But the NameNode did not set isSlowNode on the HeartbeatResponseProto in 
> DatanodeProtocolServerSideTranslatorPB#sendHeartbeat.






[jira] [Work logged] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16386?focusedWorklogId=699110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699110
 ]

ASF GitHub Bot logged work on HDFS-16386:
-

Author: ASF GitHub Bot
Created on: 21/Dec/21 02:04
Start Date: 21/Dec/21 02:04
Worklog Time Spent: 10m 
  Work Description: jianghuazhu commented on pull request #3806:
URL: https://github.com/apache/hadoop/pull/3806#issuecomment-998408050


   Thank you for your attention and comments, @brahmareddybattula.
   I will continue working on this. If necessary, I will create a new JIRA.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 699110)
Time Spent: 3h  (was: 2h 50m)

> Reduce DataNode load when FsDatasetAsyncDiskService is working
> --
>
> Key: HDFS-16386
> URL: https://issues.apache.org/jira/browse/HDFS-16386
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: monitor.png
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Our DataNode has 36 disks. When FsDatasetAsyncDiskService is working, it 
> causes a high load on the DataNode.
> Here is some monitoring related to memory:
>  !monitor.png! 
> Since each disk deletes blocks asynchronously, and each disk allows 4 
> threads to work, this causes some trouble for the DataNode, such as 
> increased CPU and memory usage.
> We should appropriately reduce the total number of threads so that 
> the DataNode can work better.
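
As a sketch of the idea (constants and names are assumptions, not the FsDatasetAsyncDiskService implementation): cap the total thread count and divide it across volumes instead of a fixed 4 threads per disk.

```java
class AsyncDeleteThreadsSketch {
  static final int THREADS_PER_VOLUME = 4; // current per-disk worker count
  static final int MAX_TOTAL_THREADS = 64; // hypothetical global cap

  static int threadsForVolume(int numVolumes) {
    // Today: 36 volumes * 4 = 144 deletion threads.
    // With a cap: each volume gets at most its share of the total.
    int share = Math.max(1, MAX_TOTAL_THREADS / numVolumes);
    return Math.min(THREADS_PER_VOLUME, share);
  }

  public static void main(String[] args) {
    // 36 volumes -> 36 * 1 = 36 threads instead of 144.
    System.out.println(36 * threadsForVolume(36));
  }
}
```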






[jira] [Work logged] (HDFS-16385) Fix Datanode retrieve slownode information bug.

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16385?focusedWorklogId=699109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699109
 ]

ASF GitHub Bot logged work on HDFS-16385:
-

Author: ASF GitHub Bot
Created on: 21/Dec/21 02:04
Start Date: 21/Dec/21 02:04
Worklog Time Spent: 10m 
  Work Description: tasanuma merged pull request #3803:
URL: https://github.com/apache/hadoop/pull/3803


   




Issue Time Tracking
---

Worklog Id: (was: 699109)
Time Spent: 50m  (was: 40m)

> Fix Datanode retrieve slownode information bug.
> ---
>
> Key: HDFS-16385
> URL: https://issues.apache.org/jira/browse/HDFS-16385
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jackson Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In HDFS-16320, the DataNode will retrieve the SLOW status from each NameNode. 
> But the NameNode did not set isSlowNode on the HeartbeatResponseProto in 
> DatanodeProtocolServerSideTranslatorPB#sendHeartbeat.






[jira] [Work logged] (HDFS-16385) Fix Datanode retrieve slownode information bug.

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16385?focusedWorklogId=699111&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699111
 ]

ASF GitHub Bot logged work on HDFS-16385:
-

Author: ASF GitHub Bot
Created on: 21/Dec/21 02:04
Start Date: 21/Dec/21 02:04
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on pull request #3803:
URL: https://github.com/apache/hadoop/pull/3803#issuecomment-998408212


   Thanks for fixing the issue, @Jackson-Wang-7. Thanks for your reviews, 
@symious, @ferhui, @tomscut.




Issue Time Tracking
---

Worklog Id: (was: 699111)
Time Spent: 1h  (was: 50m)

> Fix Datanode retrieve slownode information bug.
> ---
>
> Key: HDFS-16385
> URL: https://issues.apache.org/jira/browse/HDFS-16385
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jackson Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In HDFS-16320, the DataNode will retrieve the SLOW status from each NameNode. 
> But the NameNode did not set isSlowNode on the HeartbeatResponseProto in 
> DatanodeProtocolServerSideTranslatorPB#sendHeartbeat.






[jira] [Work logged] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16386?focusedWorklogId=699107&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699107
 ]

ASF GitHub Bot logged work on HDFS-16386:
-

Author: ASF GitHub Bot
Created on: 21/Dec/21 01:59
Start Date: 21/Dec/21 01:59
Worklog Time Spent: 10m 
  Work Description: jianghuazhu commented on pull request #3806:
URL: https://github.com/apache/hadoop/pull/3806#issuecomment-998406033


   Thank you for your reminder and help, @jojochuang.




Issue Time Tracking
---

Worklog Id: (was: 699107)
Time Spent: 2h 50m  (was: 2h 40m)

> Reduce DataNode load when FsDatasetAsyncDiskService is working
> --
>
> Key: HDFS-16386
> URL: https://issues.apache.org/jira/browse/HDFS-16386
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: monitor.png
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Our DataNode has 36 disks. When FsDatasetAsyncDiskService is working, it 
> causes a high load on the DataNode.
> Here is some monitoring related to memory:
>  !monitor.png! 
> Since each disk deletes blocks asynchronously, and each disk allows 4 
> threads to work, this causes some trouble for the DataNode, such as 
> increased CPU and memory usage.
> We should appropriately reduce the total number of threads so that 
> the DataNode can work better.






[jira] [Work logged] (HDFS-16371) Exclude slow disks when choosing volume

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16371?focusedWorklogId=699101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699101
 ]

ASF GitHub Bot logged work on HDFS-16371:
-

Author: ASF GitHub Bot
Created on: 21/Dec/21 01:49
Start Date: 21/Dec/21 01:49
Worklog Time Spent: 10m 
  Work Description: tomscut commented on pull request #3753:
URL: https://github.com/apache/hadoop/pull/3753#issuecomment-998402471


   Hi @tasanuma @jojochuang @ayushtkn. Please help review this PR. Thank 
you very much.




Issue Time Tracking
---

Worklog Id: (was: 699101)
Time Spent: 1h 20m  (was: 1h 10m)

> Exclude slow disks when choosing volume
> ---
>
> Key: HDFS-16371
> URL: https://issues.apache.org/jira/browse/HDFS-16371
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, the datanode can detect slow disks. See HDFS-11461.
> And after HDFS-16311, the slow disk information we collected is more accurate.
> So we can exclude these slow disks according to some rules when choosing a 
> volume. This will prevent slow disks from affecting the throughput of 
> the whole datanode.
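
A minimal sketch of such a rule (names are hypothetical, not the patch itself): filter the slow disks out before running the normal volume-choosing policy, with a fallback so a fully slow DataNode can still write.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class SlowDiskAwareChooserSketch {
  List<String> filterSlowVolumes(List<String> volumes, Set<String> slowDisks) {
    List<String> healthy = volumes.stream()
        .filter(v -> !slowDisks.contains(v))
        .collect(Collectors.toList());
    // If every volume is currently reported slow, fall back to all of them
    // rather than refusing the write.
    return healthy.isEmpty() ? volumes : healthy;
  }
}
```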






[jira] [Work logged] (HDFS-16376) Expose metrics of NodeNotChosenReason to JMX

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16376?focusedWorklogId=699100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699100
 ]

ASF GitHub Bot logged work on HDFS-16376:
-

Author: ASF GitHub Bot
Created on: 21/Dec/21 01:47
Start Date: 21/Dec/21 01:47
Worklog Time Spent: 10m 
  Work Description: tomscut commented on pull request #3778:
URL: https://github.com/apache/hadoop/pull/3778#issuecomment-998401775


   Hi @ayushtkn @jojochuang @ferhui @tasanuma, could you please take a look at 
this? Thanks.




Issue Time Tracking
---

Worklog Id: (was: 699100)
Time Spent: 1h  (was: 50m)

> Expose metrics of NodeNotChosenReason to JMX
> 
>
> Key: HDFS-16376
> URL: https://issues.apache.org/jira/browse/HDFS-16376
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2021-12-09-23-48-42-865.png, 
> image-2021-12-09-23-55-29-017.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In our cluster, we can see logs for nodes that are not chosen. But it's hard 
> to see the percentage of each reason from the logs. It is best to add 
> relevant metrics to monitor the entire cluster.
> !image-2021-12-09-23-48-42-865.png|width=517,height=187!
> *JMX metrics:*
> !image-2021-12-09-23-55-29-017.png|width=620,height=152!
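
A self-contained sketch of per-reason counters like the ones proposed (the enum values approximate typical NodeNotChosenReason entries and are assumptions; the real change would register these with Hadoop's metrics2 system so they appear in JMX):

```java
import java.util.EnumMap;
import java.util.concurrent.atomic.LongAdder;

class NodeNotChosenMetricsSketch {
  enum Reason { NODE_STALE, NODE_TOO_BUSY, NOT_ENOUGH_STORAGE_SPACE, NODE_SLOW }

  private final EnumMap<Reason, LongAdder> counters = new EnumMap<>(Reason.class);

  NodeNotChosenMetricsSketch() {
    for (Reason r : Reason.values()) {
      counters.put(r, new LongAdder());
    }
  }

  // Called wherever the block placement policy logs a not-chosen reason.
  void record(Reason reason) {
    counters.get(reason).increment();
  }

  // Exposed via JMX in the real change; a plain getter here.
  long get(Reason reason) {
    return counters.get(reason).sum();
  }
}
```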






[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=699038&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699038
 ]

ASF GitHub Bot logged work on HDFS-16303:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 23:46
Start Date: 20/Dec/21 23:46
Worklog Time Spent: 10m 
  Work Description: KevinWikant edited a comment on pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#issuecomment-998300466


   @virajjasani, please see my response to your comments below
   
   > hence if few nodes are really in bad state (hardware/network issues), the 
plan is to keep re-queueing them until more nodes are getting decommissioned 
than max tracked nodes right?
   
   It's the opposite: the unhealthy nodes will only be re-queued when there are 
more nodes being decommissioned than max tracked nodes. Otherwise, if there are 
fewer nodes being decommissioned than max tracked nodes, then the unhealthy 
nodes will not be re-queued because they do not risk blocking the 
decommissioning of queued healthy nodes (i.e. because the queue is empty).
   
   One potential performance impact that comes to mind is that if there are say 
200 unhealthy decommissioning nodes & max tracked nodes = 100, then this may 
cause some churn in the queueing/de-queueing process because on each 
DatanodeAdminMonitor tick all 100 tracked nodes will be re-queued & then 100 
queued nodes will be de-queued/tracked. Note that this churn (and any 
associated performance impact) will only take effect when:
   - there are more nodes being decommissioned than max tracked nodes
   - AND either:
   - number of healthy decommissioning nodes < max tracked nodes
   - number of unhealthy decommissioning nodes > max tracked nodes
   
   The amount of re-queued/de-queued nodes per tick can be quantified as:
   
   `numRequeue = numDecommissioning <= numTracked ? 0 : numDeadDecommissioning 
- (numDecommissioning - numTracked)`
   
   This churn of queueing/de-queueing will not occur at all under typical 
decommissioning scenarios (i.e. where there isn't a large number of dead 
decommissioning nodes).
   
   One idea to mitigate this is to have DatanodeAdminMonitor maintain counters 
used to track the number of healthy nodes in the pendingNodes queue; then this 
count can be used to make an improved re-queue decision. In particular, 
unhealthy nodes are only re-queued if there are healthy nodes in the 
pendingNodes queue. But this approach has some flaws, for example an unhealthy 
node in the queue could come alive again, but then an unhealthy node in the 
tracked set wouldn't be re-queued because the healthy queued node count hasn't 
been updated. To solve this, we would need to scan the pendingNodes queue to 
update the healthy/unhealthy node counts periodically; this scan could prove 
expensive.
   
   > Since unhealthy node getting decommissioned might anyways require some 
sort of retry, shall we requeue them even if the condition is not met (i.e. 
total no of decomm in progress < max tracked nodes) as a limited retries?
   
   If there are fewer nodes being decommissioned than max tracked nodes, then 
there are no nodes in the pendingNodes queue & all nodes are being tracked for 
decommissioning. Therefore, there is no possibility that any healthy nodes are 
blocked in the pendingNodes queue (preventing them from being decommissioned) & 
so in my opinion there is no benefit to re-queueing the unhealthy nodes in this 
case. Furthermore, this will negatively impact performance through frequent 
re-queueing & de-queueing.
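
Plugging the comment's own scenario into that formula (a quick check, using the formula exactly as quoted):

```java
class RequeueChurnExample {
  static int numRequeue(int numDecommissioning, int numTracked,
                        int numDeadDecommissioning) {
    return numDecommissioning <= numTracked
        ? 0
        : numDeadDecommissioning - (numDecommissioning - numTracked);
  }

  public static void main(String[] args) {
    // 200 decommissioning nodes, all dead/unhealthy, max tracked = 100:
    // 200 - (200 - 100) = 100 re-queues per tick, i.e. the full tracked set
    // churns every DatanodeAdminMonitor cycle.
    System.out.println(numRequeue(200, 100, 200));
  }
}
```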




Issue Time Tracking
---

Worklog Id: (was: 699038)
Time Spent: 10h 20m  (was: 10h 10m)

> Losing over 100 datanodes in state decommissioning results in full blockage 
> of all datanode decommissioning
> ---
>
> Key: HDFS-16303
> URL: https://issues.apache.org/jira/browse/HDFS-16303
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1, 3.3.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> h2. Impact
> HDFS datanode decommissioning does not make any forward progress. For 
> example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X 
> of those datanodes remain in state decommissioning forever without making any 
> forward progress towards being decommissioned.

[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=699037&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699037
 ]

ASF GitHub Bot logged work on HDFS-16303:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 23:45
Start Date: 20/Dec/21 23:45
Worklog Time Spent: 10m 
  Work Description: KevinWikant edited a comment on pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#issuecomment-998300466


   > hence if few nodes are really in bad state (hardware/network issues), the 
plan is to keep re-queueing them until more nodes are getting decommissioned 
than max tracked nodes right?
   
   It's the opposite: the unhealthy nodes will only be re-queued when there are 
more nodes being decommissioned than max tracked nodes. Otherwise, if there are 
fewer nodes being decommissioned than max tracked nodes, then the unhealthy 
nodes will not be re-queued because they do not risk blocking the 
decommissioning of queued healthy nodes (i.e. because the queue is empty).
   
   One potential performance impact that comes to mind is that if there are say 
200 unhealthy decommissioning nodes & max tracked nodes = 100, then this may 
cause some churn in the queueing/de-queueing process because on each 
DatanodeAdminMonitor tick all 100 tracked nodes will be re-queued & then 100 
queued nodes will be de-queued/tracked. Note that this churn (and any 
associated performance impact) will only take effect when:
   - there are more nodes being decommissioned than max tracked nodes
   - AND either:
   - number of healthy decommissioning nodes < max tracked nodes
   - number of unhealthy decommissioning nodes > max tracked nodes
   
   The amount of re-queued/de-queued nodes per tick can be quantified as:
   
   `numRequeue = numDecommissioning <= numTracked ? 0 : numDeadDecommissioning 
- (numDecommissioning - numTracked)`
   
   This churn of queueing/de-queueing will not occur at all under typical 
decommissioning scenarios (i.e. where there isn't a large number of dead 
decommissioning nodes).
   
   One idea to mitigate this is to have DatanodeAdminMonitor maintain counters 
used to track the number of healthy nodes in the pendingNodes queue; then this 
count can be used to make an improved re-queue decision. In particular, 
unhealthy nodes are only re-queued if there are healthy nodes in the 
pendingNodes queue. But this approach has some flaws, for example an unhealthy 
node in the queue could come alive again, but then an unhealthy node in the 
tracked set wouldn't be re-queued because the healthy queued node count hasn't 
been updated. To solve this, we would need to scan the pendingNodes queue to 
update the healthy/unhealthy node counts periodically; this scan could prove 
expensive.
   
   > Since unhealthy node getting decommissioned might anyways require some 
sort of retry, shall we requeue them even if the condition is not met (i.e. 
total no of decomm in progress < max tracked nodes) as a limited retries?
   
   If there are fewer nodes being decommissioned than max tracked nodes, then 
there are no nodes in the pendingNodes queue & all nodes are being tracked for 
decommissioning. Therefore, there is no possibility that any healthy nodes are 
blocked in the pendingNodes queue (preventing them from being decommissioned) & 
so in my opinion there is no benefit to re-queueing the unhealthy nodes in this 
case. Furthermore, this will negatively impact performance through frequent 
re-queueing & de-queueing.




Issue Time Tracking
---

Worklog Id: (was: 699037)
Time Spent: 10h 10m  (was: 10h)

> Losing over 100 datanodes in state decommissioning results in full blockage 
> of all datanode decommissioning
> ---
>
> Key: HDFS-16303
> URL: https://issues.apache.org/jira/browse/HDFS-16303
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1, 3.3.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> h2. Impact
> HDFS datanode decommissioning does not make any forward progress. For 
> example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X 
> of those datanodes remain in state decommissioning forever without making any 
> forward progress towards being decommissioned.
> h2. Root Cause

[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=699036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699036
 ]

ASF GitHub Bot logged work on HDFS-16303:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 23:44
Start Date: 20/Dec/21 23:44
Worklog Time Spent: 10m 
  Work Description: KevinWikant edited a comment on pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#issuecomment-998300466


   > hence if few nodes are really in bad state (hardware/network issues), the 
plan is to keep re-queueing them until more nodes are getting decommissioned 
than max tracked nodes right?
   
   It's the opposite: the unhealthy nodes will only be re-queued when there are 
more nodes being decommissioned than max tracked nodes. Otherwise, if there are 
fewer nodes being decommissioned than max tracked nodes, then the unhealthy 
nodes will not be re-queued because they do not risk blocking the 
decommissioning of queued healthy nodes (i.e. because the queue is empty).
   
   One potential performance impact that does come to mind is that if there are 
say 200 unhealthy decommissioning nodes & max tracked nodes = 100, then this 
may cause some churn in the queueing/de-queueing process because on each 
DatanodeAdminMonitor tick all 100 tracked nodes will be re-queued & then 100 
queued nodes will be de-queued/tracked. Note that this churn (and any 
associated performance impact) will only take effect when:
   - there are more nodes being decommissioned than max tracked nodes
   - AND either:
   - number of healthy decommissioning nodes < max tracked nodes
   - number of unhealthy decommissioning nodes > max tracked nodes
   
   The amount of re-queued/de-queued nodes per tick can be quantified as:
   
   `numRequeue = numDecommissioning <= numTracked ? 0 : numDeadDecommissioning 
-(numDecommissioning - numTracked)`
   
   This churn of queueing/de-queueing will not occur at all under typical 
decommissioning scenarios (i.e. where there isn't a large number of dead 
decommissioning nodes).
   
   One idea to mitigate this is to have DatanodeAdminMonitor maintain counters 
used to track the number of healthy nodes in the pendingNodes queue; then this 
count can be used to make an improved re-queue decision. In particular, 
unhealthy nodes are only re-queued if there are healthy nodes in the 
pendingNodes queue. But this approach has some flaws, for example an unhealthy 
node in the queue could come alive again, but then an unhealthy node in the 
tracked set wouldn't be re-queued because the healthy queued node count hasn't 
been updated. To solve this, we would need to scan the pendingNodes queue to 
update the healthy/unhealthy node counts periodically; this scan could prove 
expensive.
   
   > Since unhealthy node getting decommissioned might anyways require some 
sort of retry, shall we requeue them even if the condition is not met (i.e. 
total no of decomm in progress < max tracked nodes) as a limited retries?
   
   If there are fewer nodes being decommissioned than max tracked nodes, then 
there are no nodes in the pendingNodes queue & all nodes are being tracked for 
decommissioning. Therefore, there is no possibility that any healthy nodes are 
blocked in the pendingNodes queue (preventing them from being decommissioned) & 
so in my opinion there is no benefit to re-queueing the unhealthy nodes in this 
case. Furthermore, this will negatively impact performance through frequent 
re-queueing & de-queueing.




Issue Time Tracking
---

Worklog Id: (was: 699036)
Time Spent: 10h  (was: 9h 50m)

> Losing over 100 datanodes in state decommissioning results in full blockage 
> of all datanode decommissioning
> ---
>
> Key: HDFS-16303
> URL: https://issues.apache.org/jira/browse/HDFS-16303
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1, 3.3.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> h2. Impact
> HDFS datanode decommissioning does not make any forward progress. For 
> example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X 
> of those datanodes remain in state decommissioning forever without making any 
> forward progress towards being decommissioned.
> h2. Root Cause

[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=699002&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699002
 ]

ASF GitHub Bot logged work on HDFS-16303:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 21:59
Start Date: 20/Dec/21 21:59
Worklog Time Spent: 10m 
  Work Description: KevinWikant edited a comment on pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#issuecomment-998300466


   > hence if few nodes are really in bad state (hardware/network issues), the 
plan is to keep re-queueing them until more nodes are getting decommissioned 
than max tracked nodes right?
   
   It's the opposite: the unhealthy nodes will only be re-queued when there are 
more nodes being decommissioned than max tracked nodes. Otherwise, if there are 
fewer nodes being decommissioned than max tracked nodes, then the unhealthy 
nodes will not be re-queued because they do not risk blocking the 
decommissioning of queued healthy nodes (i.e. because the queue is empty).
   
   One potential performance impact that does come to mind is that if there are 
say 200 unhealthy decommissioning nodes & max tracked nodes = 100, then this 
may cause some churn in the queueing/de-queueing process because on each 
DatanodeAdminMonitor tick all 100 tracked nodes will be re-queued & then 100 
queued nodes will be de-queued/tracked. Note that this churn (and any 
associated performance impact) will only take effect when:
   - there are more nodes being decommissioned than max tracked nodes
   - AND either:
   - number of healthy decommissioning nodes < max tracked nodes
   - number of unhealthy decommissioning nodes > max tracked nodes
   
   The amount of re-queued/de-queued nodes per tick can be quantified as:
   
   `numRequeue = numDecommissioning <= numTracked ? 0 : numDeadDecommissioning 
-(numDecommissioning - numTracked)`
   
   This churn of queueing/de-queueing will not occur at all under typical 
decommissioning scenarios (i.e. where there isn't a large number of dead 
decommissioning nodes).
   
   One idea to mitigate this is to have DatanodeAdminMonitor maintain counters 
used to track the number of healthy nodes in the pendingNodes queue; then these 
counts could be used to make an improved re-queue decision. In particular, 
unhealthy nodes are only re-queued if there are healthy nodes in the 
pendingNodes queue. But this approach has some flaws, for example an unhealthy 
node in the queue could come alive again, but an unhealthy node in the tracked 
set wouldn't be re-queued to make space for it because it's still counted as an 
unhealthy node. To solve this, we would need to scan the pendingNodes queue to 
update the healthy/unhealthy node counts periodically; this scan could prove 
expensive.
   
   > Since unhealthy node getting decommissioned might anyways require some 
sort of retry, shall we requeue them even if the condition is not met (i.e. 
total no of decomm in progress < max tracked nodes) as a limited retries?
   
   If there are fewer nodes being decommissioned than max tracked nodes, then 
there are no nodes in the pendingNodes queue & all nodes are being tracked for 
decommissioning. Therefore, there is no possibility that any healthy nodes are 
blocked in the pendingNodes queue (preventing them from being decommissioned) & 
so in my opinion there is no benefit to re-queueing the unhealthy nodes in this 
case. Furthermore, this will negatively impact performance through frequent 
re-queueing & de-queueing.




Issue Time Tracking
---

Worklog Id: (was: 699002)
Time Spent: 9h 50m  (was: 9h 40m)

> Losing over 100 datanodes in state decommissioning results in full blockage 
> of all datanode decommissioning
> ---
>
> Key: HDFS-16303
> URL: https://issues.apache.org/jira/browse/HDFS-16303
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1, 3.3.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> h2. Impact
> HDFS datanode decommissioning does not make any forward progress. For 
> example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X 
> of those datanodes remain in state decommissioning forever without making any 
> forward progress towards being decommissioned.

[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=699001&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699001
 ]

ASF GitHub Bot logged work on HDFS-16303:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 21:58
Start Date: 20/Dec/21 21:58
Worklog Time Spent: 10m 
  Work Description: KevinWikant commented on pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#issuecomment-998300466


   > hence if few nodes are really in bad state (hardware/network issues), the 
plan is to keep re-queueing them until more nodes are getting decommissioned 
than max tracked nodes right?
   
   It's the opposite: the unhealthy nodes will only be re-queued when there are 
more nodes being decommissioned than max tracked nodes. Otherwise, if there are 
fewer nodes being decommissioned than max tracked nodes, then the unhealthy 
nodes will not be re-queued because they do not risk blocking the 
decommissioning of queued healthy nodes (i.e. because the queue is empty).
   
   One potential performance impact that does come to mind is that if there are 
say 200 unhealthy decommissioning nodes & max tracked nodes = 100, then this 
may cause some churn in the queueing/de-queueing process because on each 
DatanodeAdminMonitor tick all 100 tracked nodes will be re-queued & then 100 
queued nodes will be de-queued/tracked. Note that this churn (and any 
associated performance impact) will only take effect when:
   - there are more nodes being decommissioned than max tracked nodes
   - AND either:
   - number of healthy decommissioning nodes < max tracked nodes
   - number of unhealthy decommissioning nodes > max tracked nodes
   
   The amount of re-queued/de-queued nodes per tick can be quantified as:
   
   > numRequeue = numDecommissioning <= numTracked ? 0 : numDeadDecommissioning 
-(numDecommissioning - numTracked)
   
   This churn of queueing/de-queueing will not occur at all under typical 
decommissioning scenarios (i.e. where there isn't a large number of dead 
decommissioning nodes).
   
   One idea to mitigate this is to have DatanodeAdminMonitor maintain counters 
used to track the number of healthy nodes in the pendingNodes queue; then these 
counts could be used to make an improved re-queue decision. In particular, 
unhealthy nodes are only re-queued if there are healthy nodes in the 
pendingNodes queue. But this approach has some flaws, for example an unhealthy 
node in the queue could come alive again, but an unhealthy node in the tracked 
set wouldn't be re-queued to make space for it because it's still counted as an 
unhealthy node. To solve this, we would need to scan the pendingNodes queue to 
update the healthy/unhealthy node counts periodically; this scan could prove 
expensive.
   
   > Since unhealthy node getting decommissioned might anyways require some 
sort of retry, shall we requeue them even if the condition is not met (i.e. 
total no of decomm in progress < max tracked nodes) as a limited retries?
   
   If there are fewer nodes being decommissioned than max tracked nodes, then 
there are no nodes in the pendingNodes queue & all nodes are being tracked for 
decommissioning. Therefore, there is no possibility that any healthy nodes are 
blocked in the pendingNodes queue (preventing them from being decommissioned) & 
so in my opinion there is no benefit to re-queueing the unhealthy nodes in this 
case. Furthermore, this will negatively impact performance through frequent 
re-queueing & de-queueing.




Issue Time Tracking
---

Worklog Id: (was: 699001)
Time Spent: 9h 40m  (was: 9.5h)

> Losing over 100 datanodes in state decommissioning results in full blockage 
> of all datanode decommissioning
> ---
>
> Key: HDFS-16303
> URL: https://issues.apache.org/jira/browse/HDFS-16303
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1, 3.3.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> h2. Impact
> HDFS datanode decommissioning does not make any forward progress. For 
> example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X 
> of those datanodes remain in state decommissioning forever without making any 
> forward progress towards being decommissioned.
> h2. Root Cause

[jira] [Work logged] (HDFS-16392) TestWebHdfsFileSystemContract#testResponseCode fails

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16392?focusedWorklogId=698930&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698930
 ]

ASF GitHub Bot logged work on HDFS-16392:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 19:32
Start Date: 20/Dec/21 19:32
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3821:
URL: https://github.com/apache/hadoop/pull/3821#issuecomment-998211979


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 51s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  34m  5s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  1s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 29s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 29s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 17s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 16s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 20s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 53s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 22s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 21s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 19s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 234m 33s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 48s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 336m 38s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3821/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3821 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 32af67601f79 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 
23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ebee32ccaff22e47805aeee1afb4ef9826af6f93 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3821/1/testReport/ |
   | Max. process+thread count | 3462 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3821/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This 

[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=698801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698801
 ]

ASF GitHub Bot logged work on HDFS-16303:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 16:16
Start Date: 20/Dec/21 16:16
Worklog Time Spent: 10m 
  Work Description: virajjasani edited a comment on pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#issuecomment-998059075


   Sorry, I could not get to this PR last week. I will review later this week, 
but I don't mean to block this work. If I find something odd or something as an 
improvement over this, we can anyway get it clarified later on the PR/Jira or 
create an addendum PR later.
   Thanks for your work @KevinWikant, this might be really helpful going 
forward.
   
   With a quick glance, just one question for now: Overall it seems the goal is 
to improve and continue the decommissioning of healthy nodes over unhealthy 
ones (by removing and then re-queueing the entries), hence if a few nodes are 
really in a bad state (hardware/network issues), the plan is to keep 
re-queueing them until more nodes are getting decommissioned than max tracked 
nodes, right? Since an unhealthy node getting decommissioned might anyway 
require some sort of retry, shall we requeue them even if the condition is not 
met (i.e. total no. of decomm in progress < max tracked nodes), as limited 
retries? I am just thinking at a high level, yet to catch up with the PR.
   
   Also, good to know HDFS-7374 is not broken.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 698801)
Time Spent: 9.5h  (was: 9h 20m)

> Losing over 100 datanodes in state decommissioning results in full blockage 
> of all datanode decommissioning
> ---
>
> Key: HDFS-16303
> URL: https://issues.apache.org/jira/browse/HDFS-16303
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1, 3.3.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> h2. Impact
> HDFS datanode decommissioning does not make any forward progress. For 
> example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X 
> of those datanodes remain in state decommissioning forever without making any 
> forward progress towards being decommissioned.
> h2. Root Cause
> The HDFS Namenode class "DatanodeAdminManager" is responsible for 
> decommissioning datanodes.
> As per this "hdfs-site" configuration:
> {quote}Config = dfs.namenode.decommission.max.concurrent.tracked.nodes 
>  Default Value = 100
> The maximum number of decommission-in-progress datanodes that will be 
> tracked at one time by the namenode. Tracking a decommission-in-progress 
> datanode consumes additional NN memory proportional to the number of blocks 
> on the datanode. Having a conservative limit reduces the potential impact of 
> decommissioning a large number of nodes at once. A value of 0 means no limit 
> will be enforced.
> {quote}
> The Namenode will only actively track up to 100 datanodes for decommissioning 
> at any given time, so as to avoid Namenode memory pressure.
> Looking into the "DatanodeAdminManager" code:
>  * a datanode is only removed from the "tracked.nodes" set when it 
> finishes decommissioning
>  * a new datanode is only added to the "tracked.nodes" set if there are fewer 
> than 100 datanodes being tracked
> So in the event that there are more than 100 datanodes being decommissioned 
> at a given time, some of those datanodes will not be in the "tracked.nodes" 
> set until 1 or more datanodes in the "tracked.nodes" finishes 
> decommissioning. This is generally not a problem because the datanodes in 
> "tracked.nodes" will eventually finish decommissioning, but there is an edge 
> case where this logic prevents the namenode from making any forward progress 
> towards decommissioning.
> If all 100 datanodes in the "tracked.nodes" are unable to finish 
> decommissioning, then other datanodes (which may be able to be 
> decommissioned) will never get added to "tracked.nodes" and therefore will 
> never get the opportunity to be decommissioned.
> This can occur due to the following issue:
> {quote}2021-10-21 12:39:24,048 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager 
> (DatanodeAdminMo

[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=698800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698800
 ]

ASF GitHub Bot logged work on HDFS-16303:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 16:06
Start Date: 20/Dec/21 16:06
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#issuecomment-998061174


   > Unit testing failed due to unrelated flaky tests
   > 
   > > [ERROR] Errors:
   > > [ERROR] 
org.apache.hadoop.hdfs.TestRollingUpgrade.testRollback(org.apache.hadoop.hdfs.TestRollingUpgrade)
   > > [ERROR]   Run 1: 
TestRollingUpgrade.testRollback:329->waitForNullMxBean:361 » Timeout Timed 
out...
   > > [ERROR]   Run 2: 
TestRollingUpgrade.testRollback:329->waitForNullMxBean:361 » Timeout Timed 
out...
   > > [ERROR]   Run 3: 
TestRollingUpgrade.testRollback:329->waitForNullMxBean:361 » Timeout Timed 
out...
   > > [INFO]
   > > [WARNING] Flakes:
   > > [WARNING] 
org.apache.hadoop.hdfs.TestRollingUpgrade.testCheckpoint(org.apache.hadoop.hdfs.TestRollingUpgrade)
   > > [ERROR]   Run 1: 
TestRollingUpgrade.testCheckpoint:599->testCheckpoint:686 Test resulted in an 
unexpected exit
   > > [INFO]   Run 2: PASS
   
   Yeah, this test failure is not relevant. Even after the recent attempt, it is 
still flaky; we might need better insights into this test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 698800)
Time Spent: 9h 20m  (was: 9h 10m)

> Losing over 100 datanodes in state decommissioning results in full blockage 
> of all datanode decommissioning
> ---
>
> Key: HDFS-16303
> URL: https://issues.apache.org/jira/browse/HDFS-16303
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1, 3.3.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> h2. Impact
> HDFS datanode decommissioning does not make any forward progress. For 
> example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X 
> of those datanodes remain in state decommissioning forever without making any 
> forward progress towards being decommissioned.
> h2. Root Cause
> The HDFS Namenode class "DatanodeAdminManager" is responsible for 
> decommissioning datanodes.
> As per this "hdfs-site" configuration:
> {quote}Config = dfs.namenode.decommission.max.concurrent.tracked.nodes 
>  Default Value = 100
> The maximum number of decommission-in-progress datanodes that will be 
> tracked at one time by the namenode. Tracking a decommission-in-progress 
> datanode consumes additional NN memory proportional to the number of blocks 
> on the datanode. Having a conservative limit reduces the potential impact of 
> decommissioning a large number of nodes at once. A value of 0 means no limit 
> will be enforced.
> {quote}
> The Namenode will only actively track up to 100 datanodes for decommissioning 
> at any given time, so as to avoid Namenode memory pressure.
> Looking into the "DatanodeAdminManager" code:
>  * a datanode is only removed from the "tracked.nodes" set when it 
> finishes decommissioning
>  * a new datanode is only added to the "tracked.nodes" set if there are fewer 
> than 100 datanodes being tracked
> So in the event that there are more than 100 datanodes being decommissioned 
> at a given time, some of those datanodes will not be in the "tracked.nodes" 
> set until 1 or more datanodes in the "tracked.nodes" finishes 
> decommissioning. This is generally not a problem because the datanodes in 
> "tracked.nodes" will eventually finish decommissioning, but there is an edge 
> case where this logic prevents the namenode from making any forward progress 
> towards decommissioning.
> If all 100 datanodes in the "tracked.nodes" are unable to finish 
> decommissioning, then other datanodes (which may be able to be 
> decommissioned) will never get added to "tracked.nodes" and therefore will 
> never get the opportunity to be decommissioned.
> This can occur due to the following issue:
> {quote}2021-10-21 12:39:24,048 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager 
> (DatanodeAdminMonitor-0): Node W.X.Y.Z:50010 is dead while in Decommission In 
> Progress. Cannot be safely decommissioned or be i

[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=698798&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698798
 ]

ASF GitHub Bot logged work on HDFS-16303:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 16:04
Start Date: 20/Dec/21 16:04
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#issuecomment-998059075


   Sorry, I could not get to this PR last week. I will review later this week, 
but I don't mean to block this work. If I find something odd or something as an 
improvement over this, we can anyway get it clarified later on the PR/Jira or 
create an addendum PR later.
   Thanks for your work @KevinWikant, this might be really helpful going 
forward.
   
   With a quick glance, just one question for now: Overall it seems the goal is 
to improve and continue the decommissioning of healthy nodes over unhealthy 
ones (by removing and then re-queueing the entries), hence if a few nodes are 
really in a bad state (hardware/network issues), the plan is to keep 
re-queueing them until more nodes are getting decommissioned than max tracked 
nodes, right? Since an unhealthy node getting decommissioned might anyway 
require some sort of retry, shall we requeue them even if the condition is not 
met (i.e. total no. of decomm in progress < max tracked nodes)? I am just 
thinking at a high level, yet to catch up with the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 698798)
Time Spent: 9h 10m  (was: 9h)

> Losing over 100 datanodes in state decommissioning results in full blockage 
> of all datanode decommissioning
> ---
>
> Key: HDFS-16303
> URL: https://issues.apache.org/jira/browse/HDFS-16303
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1, 3.3.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> h2. Impact
> HDFS datanode decommissioning does not make any forward progress. For 
> example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X 
> of those datanodes remain in state decommissioning forever without making any 
> forward progress towards being decommissioned.
> h2. Root Cause
> The HDFS Namenode class "DatanodeAdminManager" is responsible for 
> decommissioning datanodes.
> As per this "hdfs-site" configuration:
> {quote}Config = dfs.namenode.decommission.max.concurrent.tracked.nodes 
>  Default Value = 100
> The maximum number of decommission-in-progress datanodes that will be 
> tracked at one time by the namenode. Tracking a decommission-in-progress 
> datanode consumes additional NN memory proportional to the number of blocks 
> on the datanode. Having a conservative limit reduces the potential impact of 
> decommissioning a large number of nodes at once. A value of 0 means no limit 
> will be enforced.
> {quote}
> The Namenode will only actively track up to 100 datanodes for decommissioning 
> at any given time, so as to avoid Namenode memory pressure.
> Looking into the "DatanodeAdminManager" code:
>  * a datanode is only removed from the "tracked.nodes" set when it 
> finishes decommissioning
>  * a new datanode is only added to the "tracked.nodes" set if there are fewer 
> than 100 datanodes being tracked
> So in the event that there are more than 100 datanodes being decommissioned 
> at a given time, some of those datanodes will not be in the "tracked.nodes" 
> set until 1 or more datanodes in the "tracked.nodes" finishes 
> decommissioning. This is generally not a problem because the datanodes in 
> "tracked.nodes" will eventually finish decommissioning, but there is an edge 
> case where this logic prevents the namenode from making any forward progress 
> towards decommissioning.
> If all 100 datanodes in the "tracked.nodes" are unable to finish 
> decommissioning, then other datanodes (which may be able to be 
> decommissioned) will never get added to "tracked.nodes" and therefore will 
> never get the opportunity to be decommissioned.
> This can occur due to the following issue:
> {quote}2021-10-21 12:39:24,048 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager 
> (DatanodeAdminMonitor-0): Node W.X.Y.Z:50010 is dead while in Decommission In 
> Progress. Cann

[jira] [Work logged] (HDFS-16368) DFSAdmin supports refresh topology info without restarting namenode

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16368?focusedWorklogId=698766&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698766
 ]

ASF GitHub Bot logged work on HDFS-16368:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 15:19
Start Date: 20/Dec/21 15:19
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3743:
URL: https://github.com/apache/hadoop/pull/3743#issuecomment-998017917


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 49s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  buf  |   0m  0s |  |  buf was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 40s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  22m 34s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 32s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   5m 14s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m 17s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 19s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 35s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   3m 16s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   7m 36s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 23s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 26s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  cc  |   5m 32s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   5m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 16s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  cc  |   5m 16s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   5m 16s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m 12s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3743/3/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 8 new + 456 unchanged - 0 fixed = 
464 total (was 456)  |
   | +1 :green_heart: |  mvnsite  |   2m 44s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   0m 37s | 
[/results-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs-client-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3743/3/artifact/out/results-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs-client-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  
hadoop-hdfs-project_hadoop-hdfs-client-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04
 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 1 new + 98 unchanged 
- 1 fixed = 99 total (was 99)  |
   | -1 :x: |  javadoc  |   0m 54s | 
[/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3743/3/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt)
 |  hadoop-hdfs in the patch failed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.  |
   | +1 :green_heart: |  javadoc  |   2m 51s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   7m 28s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 43s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 17s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  unit  | 228m 52s |  |  hadoop-hdfs in the patch 
passed.  |
   | -1 :x: |  unit  |  21m 10s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs

[jira] [Work logged] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16386?focusedWorklogId=698757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698757
 ]

ASF GitHub Bot logged work on HDFS-16386:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 15:11
Start Date: 20/Dec/21 15:11
Worklog Time Spent: 10m 
  Work Description: brahmareddybattula commented on pull request #3806:
URL: https://github.com/apache/hadoop/pull/3806#issuecomment-998010239


   thanks for working on this.. can you guys commit to branch-3.2.3 also..?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 698757)
Time Spent: 2h 40m  (was: 2.5h)

> Reduce DataNode load when FsDatasetAsyncDiskService is working
> --
>
> Key: HDFS-16386
> URL: https://issues.apache.org/jira/browse/HDFS-16386
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: monitor.png
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Our DataNode has 36 disks. When FsDatasetAsyncDiskService is working, it 
> causes a high load on the DataNode.
> Here is some monitoring related to memory:
>  !monitor.png! 
> Since each disk deletes blocks asynchronously, and each disk allows 4 
> threads to work,
> this causes some trouble for the DataNode, such as increased CPU and 
> memory usage.
> We should appropriately reduce the total number of threads so that 
> the DataNode can work better.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16391?focusedWorklogId=698725&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698725
 ]

ASF GitHub Bot logged work on HDFS-16391:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 14:32
Start Date: 20/Dec/21 14:32
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3820:
URL: https://github.com/apache/hadoop/pull/3820#issuecomment-997976901


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 43s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 10s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 26s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 44s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 54s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 16s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 17s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 47s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 35s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  21m  2s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3820/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 109m 31s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.federation.router.TestRouterRPCMultipleDestinationMountTableResolver
 |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3820/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3820 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 7ae99fb7e675 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 
23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 680ec9a8a2678ccf0947e9b39946592dee47f502 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 

[jira] [Work logged] (HDFS-16168) TestHDFSFileSystemContract#testAppend fails

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16168?focusedWorklogId=698685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698685
 ]

ASF GitHub Bot logged work on HDFS-16168:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 13:56
Start Date: 20/Dec/21 13:56
Worklog Time Spent: 10m 
  Work Description: secfree commented on pull request #3815:
URL: https://github.com/apache/hadoop/pull/3815#issuecomment-997945343


   > @secfree Thanks for your contribution, it looks good; will merge this if 
there are no other comments. BTW, as you mentioned in the jira, 
FileSystemContractBaseTest affects all its subclasses. Maybe you can check 
whether other test cases besides the one here are affected and resolve them if 
they are affected.
   
   @ferhui thanks for your suggestion. I checked all subclasses of 
FileSystemContractBaseTest and found one more case. Here are the details: 
https://issues.apache.org/jira/browse/HDFS-16392


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 698685)
Time Spent: 1h  (was: 50m)

> TestHDFSFileSystemContract#testAppend fails
> ---
>
> Key: HDFS-16168
> URL: https://issues.apache.org/jira/browse/HDFS-16168
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Hui Fei
>Assignee: secfree
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [ERROR] Tests run: 46, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 225.478 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestHDFSFileSystemContract 
> [ERROR] testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time 
> elapsed: 30.14 s <<< ERROR! org.junit.runners.model.TestTimedOutException: 
> test timed out after 30000 milliseconds at java.lang.Thread.sleep(Native 
> Method) at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1002)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:938) 
> at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:902) 
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850) at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at 
> org.apache.hadoop.hdfs.AppendTestUtil.testAppend(AppendTestUtil.java:257) at 
> org.apache.hadoop.hdfs.TestHDFSFileSystemContract.testAppend(TestHDFSFileSystemContract.java:68)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748) [ERROR] 
> testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time elapsed: 
> 30.003 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed 
> out after 3 milliseconds at java.lang.Object.wait(Native Method) at 
> java.lang.Object.wait(Object.java:502) at 
> org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59) at 
> org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567) at 
> org.apache.hadoop.ipc.Client.call(Client.java:1525) at 
> org.apache.hadoop.ipc.Client.call(Client.java:1422) at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.j

[jira] [Work logged] (HDFS-16392) TestWebHdfsFileSystemContract#testResponseCode fails

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16392?focusedWorklogId=698683&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698683
 ]

ASF GitHub Bot logged work on HDFS-16392:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 13:53
Start Date: 20/Dec/21 13:53
Worklog Time Spent: 10m 
  Work Description: secfree opened a new pull request #3821:
URL: https://github.com/apache/hadoop/pull/3821


   ### Description of PR
   
   1. Fix random timeout failures of 
TestWebHdfsFileSystemContract#testResponseCode
   
   ### How was this patch tested?
   
   1. UT
   
   ### For code changes:
   
   - [x] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 698683)
Remaining Estimate: 0h
Time Spent: 10m

> TestWebHdfsFileSystemContract#testResponseCode fails
> 
>
> Key: HDFS-16392
> URL: https://issues.apache.org/jira/browse/HDFS-16392
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0
>Reporter: secfree
>Assignee: secfree
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We can find a lot of failed cases with searching 
> "TestWebHdfsFileSystemContract" in "pull requests" 
> (https://github.com/apache/hadoop/pulls?q=is%3Apr+is%3Aopen+TestWebHdfsFileSystemContract)
> And they all have the following exception log
> {code}
> [ERROR] 
> testResponseCode(org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract)  
> Time elapsed: 30.019 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 30000 
> milliseconds
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> ...
> [ERROR] 
> org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract.testResponseCode(org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract)
> [ERROR]   Run 1: TestWebHdfsFileSystemContract.testResponseCode:473 » 
> TestTimedOut test timed o...
> [ERROR]   Run 2: TestWebHdfsFileSystemContract.testResponseCode:473 » 
> TestTimedOut test timed o...
> [ERROR]   Run 3: TestWebHdfsFileSystemContract.testResponseCode:473 » 
> TestTimedOut test timed o...
> {code}
> This issue has the same root cause as HDFS-16168



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16392) TestWebHdfsFileSystemContract#testResponseCode fails

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16392:
--
Labels: pull-request-available  (was: )

> TestWebHdfsFileSystemContract#testResponseCode fails
> 
>
> Key: HDFS-16392
> URL: https://issues.apache.org/jira/browse/HDFS-16392
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0
>Reporter: secfree
>Assignee: secfree
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We can find a lot of failed cases with searching 
> "TestWebHdfsFileSystemContract" in "pull requests" 
> (https://github.com/apache/hadoop/pulls?q=is%3Apr+is%3Aopen+TestWebHdfsFileSystemContract)
> And they all have the following exception log
> {code}
> [ERROR] 
> testResponseCode(org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract)  
> Time elapsed: 30.019 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 30000 
> milliseconds
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> ...
> [ERROR] 
> org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract.testResponseCode(org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract)
> [ERROR]   Run 1: TestWebHdfsFileSystemContract.testResponseCode:473 » 
> TestTimedOut test timed o...
> [ERROR]   Run 2: TestWebHdfsFileSystemContract.testResponseCode:473 » 
> TestTimedOut test timed o...
> [ERROR]   Run 3: TestWebHdfsFileSystemContract.testResponseCode:473 » 
> TestTimedOut test timed o...
> {code}
> This issue has the same root cause as HDFS-16168



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16392) TestWebHdfsFileSystemContract#testResponseCode fails

2021-12-20 Thread secfree (Jira)
secfree created HDFS-16392:
--

 Summary: TestWebHdfsFileSystemContract#testResponseCode fails
 Key: HDFS-16392
 URL: https://issues.apache.org/jira/browse/HDFS-16392
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.4.0
Reporter: secfree
Assignee: secfree


We can find a lot of failed cases with searching 
"TestWebHdfsFileSystemContract" in "pull requests" 
(https://github.com/apache/hadoop/pulls?q=is%3Apr+is%3Aopen+TestWebHdfsFileSystemContract)

And they all have the following exception log

{code}
[ERROR] 
testResponseCode(org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract)  
Time elapsed: 30.019 s  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 30000 
milliseconds
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
...

[ERROR] 
org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract.testResponseCode(org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract)
[ERROR]   Run 1: TestWebHdfsFileSystemContract.testResponseCode:473 » 
TestTimedOut test timed o...
[ERROR]   Run 2: TestWebHdfsFileSystemContract.testResponseCode:473 » 
TestTimedOut test timed o...
[ERROR]   Run 3: TestWebHdfsFileSystemContract.testResponseCode:473 » 
TestTimedOut test timed o...
{code}

This issue has the same root cause as HDFS-16168



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16391?focusedWorklogId=698637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698637
 ]

ASF GitHub Bot logged work on HDFS-16391:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 12:42
Start Date: 20/Dec/21 12:42
Worklog Time Spent: 10m 
  Work Description: wzhallright commented on a change in pull request #3820:
URL: https://github.com/apache/hadoop/pull/3820#discussion_r772333495



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/NamenodeHeartbeatService.java
##
@@ -272,7 +272,7 @@ private void updateState() {
 } else if (localTarget == null) {
   // block info available, HA status not expected
   LOG.debug(
-  "Reporting non-HA namenode as operational: " + getNamenodeDesc());
+  "Reporting non-HA namenode as operational: {}", getNamenodeDesc());

Review comment:
   Modified, please help review again. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 698637)
Time Spent: 50m  (was: 40m)

> Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService
> ---
>
> Key: HDFS-16391
> URL: https://issues.apache.org/jira/browse/HDFS-16391
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: wangzhaohui
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16391?focusedWorklogId=698639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698639
 ]

ASF GitHub Bot logged work on HDFS-16391:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 12:45
Start Date: 20/Dec/21 12:45
Worklog Time Spent: 10m 
  Work Description: sodonnel commented on a change in pull request #3820:
URL: https://github.com/apache/hadoop/pull/3820#discussion_r772334971



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/NamenodeHeartbeatService.java
##
@@ -272,7 +272,7 @@ private void updateState() {
 } else if (localTarget == null) {
   // block info available, HA status not expected
   LOG.debug(
-  "Reporting non-HA namenode as operational: " + getNamenodeDesc());
+  "Reporting non-HA namenode as operational: {}", getNamenodeDesc());

Review comment:
   The latest change looks good. Thanks for changing both the log lines.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 698639)
Time Spent: 1h  (was: 50m)

> Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService
> ---
>
> Key: HDFS-16391
> URL: https://issues.apache.org/jira/browse/HDFS-16391
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: wangzhaohui
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16391?focusedWorklogId=698630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698630
 ]

ASF GitHub Bot logged work on HDFS-16391:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 12:25
Start Date: 20/Dec/21 12:25
Worklog Time Spent: 10m 
  Work Description: sodonnel commented on a change in pull request #3820:
URL: https://github.com/apache/hadoop/pull/3820#discussion_r772322657



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/NamenodeHeartbeatService.java
##
@@ -272,7 +272,7 @@ private void updateState() {
 } else if (localTarget == null) {
   // block info available, HA status not expected
   LOG.debug(
-  "Reporting non-HA namenode as operational: " + getNamenodeDesc());
+  "Reporting non-HA namenode as operational: {}", getNamenodeDesc());

Review comment:
   Note there is another debug message just above this one which looks 
similar and could be wrapped in an `if (LOG.isDebugEnabled())` check at line 
270 for the same reason.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 698630)
Time Spent: 40m  (was: 0.5h)

> Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService
> ---
>
> Key: HDFS-16391
> URL: https://issues.apache.org/jira/browse/HDFS-16391
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: wangzhaohui
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16391?focusedWorklogId=698627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698627
 ]

ASF GitHub Bot logged work on HDFS-16391:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 12:19
Start Date: 20/Dec/21 12:19
Worklog Time Spent: 10m 
  Work Description: sodonnel commented on a change in pull request #3820:
URL: https://github.com/apache/hadoop/pull/3820#discussion_r772319216



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/NamenodeHeartbeatService.java
##
@@ -272,7 +272,7 @@ private void updateState() {
 } else if (localTarget == null) {
   // block info available, HA status not expected
   LOG.debug(
-  "Reporting non-HA namenode as operational: " + getNamenodeDesc());
+  "Reporting non-HA namenode as operational: {}", getNamenodeDesc());

Review comment:
   I think this statement really needs to be wrapped in an 
`if (LOG.isDebugEnabled())` check. The reason is that the method call 
`getNamenodeDesc()` needs to be evaluated whether we log or not, so its result 
can be passed into the `LOG.debug()` method. Inside `LOG.debug`, it will skip 
the logging, but by then we have already formed the string inside 
`getNamenodeDesc()` and never use it.
   
   If we are passing an object into `LOG.debug`, this change would be the 
correct thing to do, as toString() will only get called on the object if the 
log message is created. However, when we pass a method call into the method, it 
needs to be evaluated first, AFAIK.
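   
   A minimal sketch of the suggested guard, reusing the log line from this 
patch:
   
   {code}
   // The guard skips the getNamenodeDesc() call entirely when debug logging
   // is disabled, instead of evaluating it and discarding the result.
   if (LOG.isDebugEnabled()) {
     LOG.debug("Reporting non-HA namenode as operational: {}",
         getNamenodeDesc());
   }
   {code}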




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 698627)
Time Spent: 0.5h  (was: 20m)

> Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService
> ---
>
> Key: HDFS-16391
> URL: https://issues.apache.org/jira/browse/HDFS-16391
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: wangzhaohui
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16382) RBF: getContentSummary RPC compute sub-directory repeatedly

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16382?focusedWorklogId=698626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698626
 ]

ASF GitHub Bot logged work on HDFS-16382:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 12:16
Start Date: 20/Dec/21 12:16
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3797:
URL: https://github.com/apache/hadoop/pull/3797#issuecomment-997872501


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 50s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 44s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 27s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 44s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 43s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 20s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 10s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3797/3/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   0m 35s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3797/3/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 33 new + 2 
unchanged - 0 fixed = 35 total (was 2)  |
   | +1 :green_heart: |  mvnsite  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 48s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  25m  5s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3797/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 119m 31s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.federation.router.TestRouterRPCMultipleDestinationMountTableResolver
 |
   |   | hadoop.hdfs.rbfbalance.TestRouterDistCpProcedure |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3797/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3797 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 36d591ffdb55 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | m

[jira] [Resolved] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working

2021-12-20 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-16386.
--
Resolution: Fixed

> Reduce DataNode load when FsDatasetAsyncDiskService is working
> --
>
> Key: HDFS-16386
> URL: https://issues.apache.org/jira/browse/HDFS-16386
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: monitor.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Our DataNode node has 36 disks. When FsDatasetAsyncDiskService is working, it 
> causes a high load on the DataNode.
> Here is some monitoring related to memory:
>  !monitor.png! 
> Since each disk deletes blocks asynchronously and each disk allows up to 4 
> worker threads, this causes some trouble for the DataNode, such as 
> increased CPU and memory usage.
> We should appropriately reduce the total number of worker threads so that 
> the DataNode can work better.
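
For context, a minimal sketch of the thread-capping idea, assuming a per-volume 
ThreadPoolExecutor of the kind FsDatasetAsyncDiskService keeps per disk; the class 
and method names below are illustrative, not the actual patch:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one deletion executor per disk with a tunable
// thread cap. On a 36-disk node, 4 threads per disk allows up to 144
// deletion threads; lowering the cap bounds total CPU and memory use.
class AsyncDiskServiceSketch {
  private final Map<String, ThreadPoolExecutor> executors = new HashMap<>();
  private final int threadsPerVolume;  // the value the issue proposes to reduce

  AsyncDiskServiceSketch(int threadsPerVolume) {
    this.threadsPerVolume = threadsPerVolume;
  }

  synchronized void addVolume(String volume) {
    ThreadPoolExecutor executor = new ThreadPoolExecutor(
        threadsPerVolume, threadsPerVolume, 60L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>());
    // Let idle workers exit so a quiet volume holds no threads.
    executor.allowCoreThreadTimeOut(true);
    executors.put(volume, executor);
  }

  void deleteAsync(String volume, Runnable deletionTask) {
    executors.get(volume).execute(deletionTask);
  }
}
{code}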



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working

2021-12-20 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16386:
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.3

> Reduce DataNode load when FsDatasetAsyncDiskService is working
> --
>
> Key: HDFS-16386
> URL: https://issues.apache.org/jira/browse/HDFS-16386
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: monitor.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Our DataNode node has 36 disks. When FsDatasetAsyncDiskService is working, it 
> causes a high load on the DataNode.
> Here is some monitoring related to memory:
>  !monitor.png! 
> Since each disk deletes blocks asynchronously and each disk allows up to 4 
> worker threads, this causes some trouble for the DataNode, such as 
> increased CPU and memory usage.
> We should appropriately reduce the total number of worker threads so that 
> the DataNode can work better.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16389) Improve NNThroughputBenchmark test mkdirs

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16389?focusedWorklogId=698597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698597
 ]

ASF GitHub Bot logged work on HDFS-16389:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 11:34
Start Date: 20/Dec/21 11:34
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3819:
URL: https://github.com/apache/hadoop/pull/3819#issuecomment-997844668


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 40s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 35s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  2s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 11s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 27s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 12s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 52s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 50s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 10s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m  3s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 229m 46s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 328m 38s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3819/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3819 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 1e61f3198ab1 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 3229263b630a4801cd4f78e2ad5c4cb0fe4654be |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3819/1/testReport/ |
   | Max. process+thread count | 3323 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3819/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This mes

[jira] [Work logged] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16386?focusedWorklogId=698594&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698594
 ]

ASF GitHub Bot logged work on HDFS-16386:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 11:29
Start Date: 20/Dec/21 11:29
Worklog Time Spent: 10m 
  Work Description: sodonnel merged pull request #3806:
URL: https://github.com/apache/hadoop/pull/3806


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 698594)
Time Spent: 2.5h  (was: 2h 20m)

> Reduce DataNode load when FsDatasetAsyncDiskService is working
> --
>
> Key: HDFS-16386
> URL: https://issues.apache.org/jira/browse/HDFS-16386
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: monitor.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Our DataNode node has 36 disks. When FsDatasetAsyncDiskService is working, it 
> causes a high load on the DataNode.
> Here is some monitoring related to memory:
>  !monitor.png! 
> Since each disk deletes blocks asynchronously and each disk allows up to 4 
> worker threads, this causes some trouble for the DataNode, such as 
> increased CPU and memory usage.
> We should appropriately reduce the total number of worker threads so that 
> the DataNode can work better.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16390) Enhance ErasureCodeBenchmarkThroughput for support random read and make buffer size customizable

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16390?focusedWorklogId=698586&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698586
 ]

ASF GitHub Bot logged work on HDFS-16390:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 11:14
Start Date: 20/Dec/21 11:14
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3818:
URL: https://github.com/apache/hadoop/pull/3818#issuecomment-997829494


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 43s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  1s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 22s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 26s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  2s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 30s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 10s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 18s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 51s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3818/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 20 unchanged - 
0 fixed = 22 total (was 20)  |
   | +1 :green_heart: |  mvnsite  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 49s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   3m 14s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m  3s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 226m 21s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 44s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 324m 57s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3818/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3818 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux d2952bfabf37 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 10732107c8220d5e2befdf9c7cf9b290d7fc9201 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3818/1/testReport/ |
   | Max. process+thread count | 3206 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hado

[jira] [Commented] (HDFS-16168) TestHDFSFileSystemContract#testAppend fails

2021-12-20 Thread Hui Fei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462495#comment-17462495
 ] 

Hui Fei commented on HDFS-16168:


Merged. [~secfree.teng] Thanks for your contribution.

> TestHDFSFileSystemContract#testAppend fails
> ---
>
> Key: HDFS-16168
> URL: https://issues.apache.org/jira/browse/HDFS-16168
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Hui Fei
>Assignee: secfree.teng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> [ERROR] Tests run: 46, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 225.478 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestHDFSFileSystemContract 
> [ERROR] testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time 
> elapsed: 30.14 s <<< ERROR! org.junit.runners.model.TestTimedOutException: 
> test timed out after 30000 milliseconds at java.lang.Thread.sleep(Native 
> Method) at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1002)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:938) 
> at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:902) 
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850) at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at 
> org.apache.hadoop.hdfs.AppendTestUtil.testAppend(AppendTestUtil.java:257) at 
> org.apache.hadoop.hdfs.TestHDFSFileSystemContract.testAppend(TestHDFSFileSystemContract.java:68)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748) [ERROR] 
> testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time elapsed: 
> 30.003 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed 
> out after 30000 milliseconds at java.lang.Object.wait(Native Method) at 
> java.lang.Object.wait(Object.java:502) at 
> org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59) at 
> org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567) at 
> org.apache.hadoop.ipc.Client.call(Client.java:1525) at 
> org.apache.hadoop.ipc.Client.call(Client.java:1422) at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
>  at com.sun.proxy.$Proxy25.append(Unknown Source) at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:415)
>  at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
>  at com.sun.proxy.$Proxy26.append(Unknown Source) at 
> org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1385) at 
> org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1407) at 
> org.apache.hadoop.hdfs.DFSClient.append(DFSClient.jav

[jira] [Commented] (HDFS-16368) DFSAdmin supports refresh topology info without restarting namenode

2021-12-20 Thread Hui Fei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462496#comment-17462496
 ] 

Hui Fei commented on HDFS-16368:


It is the same as HDFS-11242, yes?

>  DFSAdmin supports refresh topology info without restarting namenode
> 
>
> Key: HDFS-16368
> URL: https://issues.apache.org/jira/browse/HDFS-16368
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: dfsadmin, namanode
>Affects Versions: 2.7.7, 3.3.1
>Reporter: zhanghaobo
>Priority: Major
>  Labels: features, pull-request-available
> Attachments: 0001.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently in HDFS, if we update the rack info for rack-awareness, we may need 
> to rolling-restart the namenodes for it to take effect. If the cluster is 
> large, a rolling restart of the namenodes takes a very long time. So, we 
> developed a method to refresh the topology info without rolling-restarting 
> the namenodes.
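
One way such a refresh could be wired up is Hadoop's generic refresh protocol, 
triggered with `hdfs dfsadmin -refresh <namenode>:<ipc_port> <identifier>`; a 
minimal sketch, with a hypothetical handler name and identifier (this is not the 
attached patch):

{code:java}
import org.apache.hadoop.ipc.RefreshHandler;
import org.apache.hadoop.ipc.RefreshRegistry;
import org.apache.hadoop.ipc.RefreshResponse;

// Hypothetical handler: re-resolves rack mappings on demand instead of
// requiring a NameNode restart.
public class TopologyRefreshHandler implements RefreshHandler {
  @Override
  public RefreshResponse handleRefresh(String identifier, String[] args) {
    // Re-read the topology here, e.g. re-run the configured
    // DNSToSwitchMapping for all registered DataNodes.
    return RefreshResponse.successResponse();
  }
}
{code}

Registered once during NameNode startup, e.g. 
RefreshRegistry.defaultRegistry().register("refreshTopology", new TopologyRefreshHandler()), 
it can then be invoked with `hdfs dfsadmin -refresh <namenode>:<ipc_port> refreshTopology`.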



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16168) TestHDFSFileSystemContract#testAppend fails

2021-12-20 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei resolved HDFS-16168.

Fix Version/s: 3.4.0
   Resolution: Fixed

> TestHDFSFileSystemContract#testAppend fails
> ---
>
> Key: HDFS-16168
> URL: https://issues.apache.org/jira/browse/HDFS-16168
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Hui Fei
>Assignee: secfree.teng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> [ERROR] Tests run: 46, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 225.478 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestHDFSFileSystemContract 
> [ERROR] testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time 
> elapsed: 30.14 s <<< ERROR! org.junit.runners.model.TestTimedOutException: 
> test timed out after 30000 milliseconds at java.lang.Thread.sleep(Native 
> Method) at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1002)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:938) 
> at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:902) 
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850) at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at 
> org.apache.hadoop.hdfs.AppendTestUtil.testAppend(AppendTestUtil.java:257) at 
> org.apache.hadoop.hdfs.TestHDFSFileSystemContract.testAppend(TestHDFSFileSystemContract.java:68)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748) [ERROR] 
> testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time elapsed: 
> 30.003 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed 
> out after 30000 milliseconds at java.lang.Object.wait(Native Method) at 
> java.lang.Object.wait(Object.java:502) at 
> org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59) at 
> org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567) at 
> org.apache.hadoop.ipc.Client.call(Client.java:1525) at 
> org.apache.hadoop.ipc.Client.call(Client.java:1422) at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
>  at com.sun.proxy.$Proxy25.append(Unknown Source) at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:415)
>  at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
>  at com.sun.proxy.$Proxy26.append(Unknown Source) at 
> org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1385) at 
> org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1407) at 
> org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1476) at 
> org.apache.hadoop.hdfs.DFSClient.append(DFSCl

[jira] [Assigned] (HDFS-16168) TestHDFSFileSystemContract#testAppend fails

2021-12-20 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei reassigned HDFS-16168:
--

Assignee: secfree.teng

> TestHDFSFileSystemContract#testAppend fails
> ---
>
> Key: HDFS-16168
> URL: https://issues.apache.org/jira/browse/HDFS-16168
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Hui Fei
>Assignee: secfree.teng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> [ERROR] Tests run: 46, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 225.478 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestHDFSFileSystemContract 
> [ERROR] testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time 
> elapsed: 30.14 s <<< ERROR! org.junit.runners.model.TestTimedOutException: 
> test timed out after 30000 milliseconds at java.lang.Thread.sleep(Native 
> Method) at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1002)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:938) 
> at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:902) 
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850) at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at 
> org.apache.hadoop.hdfs.AppendTestUtil.testAppend(AppendTestUtil.java:257) at 
> org.apache.hadoop.hdfs.TestHDFSFileSystemContract.testAppend(TestHDFSFileSystemContract.java:68)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748) [ERROR] 
> testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time elapsed: 
> 30.003 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed 
> out after 30000 milliseconds at java.lang.Object.wait(Native Method) at 
> java.lang.Object.wait(Object.java:502) at 
> org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59) at 
> org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567) at 
> org.apache.hadoop.ipc.Client.call(Client.java:1525) at 
> org.apache.hadoop.ipc.Client.call(Client.java:1422) at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
>  at com.sun.proxy.$Proxy25.append(Unknown Source) at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:415)
>  at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
>  at com.sun.proxy.$Proxy26.append(Unknown Source) at 
> org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1385) at 
> org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1407) at 
> org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1476) at 
> org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1446) at 
> org.apache.hadoop.hdfs.Dist

[jira] [Work logged] (HDFS-16168) TestHDFSFileSystemContract#testAppend fails

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16168?focusedWorklogId=698542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698542
 ]

ASF GitHub Bot logged work on HDFS-16168:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 10:16
Start Date: 20/Dec/21 10:16
Worklog Time Spent: 10m 
  Work Description: ferhui commented on pull request #3815:
URL: https://github.com/apache/hadoop/pull/3815#issuecomment-997786853


   @secfree Thanks for your contribution. @ayushtkn Thanks for your review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 698542)
Time Spent: 50m  (was: 40m)

> TestHDFSFileSystemContract#testAppend fails
> ---
>
> Key: HDFS-16168
> URL: https://issues.apache.org/jira/browse/HDFS-16168
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Hui Fei
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> [ERROR] Tests run: 46, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 225.478 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestHDFSFileSystemContract 
> [ERROR] testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time 
> elapsed: 30.14 s <<< ERROR! org.junit.runners.model.TestTimedOutException: 
> test timed out after 30000 milliseconds at java.lang.Thread.sleep(Native 
> Method) at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1002)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:938) 
> at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:902) 
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850) at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at 
> org.apache.hadoop.hdfs.AppendTestUtil.testAppend(AppendTestUtil.java:257) at 
> org.apache.hadoop.hdfs.TestHDFSFileSystemContract.testAppend(TestHDFSFileSystemContract.java:68)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748) [ERROR] 
> testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time elapsed: 
> 30.003 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed 
> out after 30000 milliseconds at java.lang.Object.wait(Native Method) at 
> java.lang.Object.wait(Object.java:502) at 
> org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59) at 
> org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567) at 
> org.apache.hadoop.ipc.Client.call(Client.java:1525) at 
> org.apache.hadoop.ipc.Client.call(Client.java:1422) at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
>  at com.sun.proxy.$Proxy25.append(Unknown Source) at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:415)
>  at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocation

[jira] [Work logged] (HDFS-16168) TestHDFSFileSystemContract#testAppend fails

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16168?focusedWorklogId=698541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698541
 ]

ASF GitHub Bot logged work on HDFS-16168:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 10:16
Start Date: 20/Dec/21 10:16
Worklog Time Spent: 10m 
  Work Description: ferhui merged pull request #3815:
URL: https://github.com/apache/hadoop/pull/3815


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 698541)
Time Spent: 40m  (was: 0.5h)

> TestHDFSFileSystemContract#testAppend fails
> ---
>
> Key: HDFS-16168
> URL: https://issues.apache.org/jira/browse/HDFS-16168
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Hui Fei
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [ERROR] Tests run: 46, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 
> 225.478 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestHDFSFileSystemContract 
> [ERROR] testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time 
> elapsed: 30.14 s <<< ERROR! org.junit.runners.model.TestTimedOutException: 
> test timed out after 30000 milliseconds at java.lang.Thread.sleep(Native 
> Method) at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1002)
>  at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:938) 
> at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:902) 
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850) at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at 
> org.apache.hadoop.hdfs.AppendTestUtil.testAppend(AppendTestUtil.java:257) at 
> org.apache.hadoop.hdfs.TestHDFSFileSystemContract.testAppend(TestHDFSFileSystemContract.java:68)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748) [ERROR] 
> testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time elapsed: 
> 30.003 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed 
> out after 30000 milliseconds at java.lang.Object.wait(Native Method) at 
> java.lang.Object.wait(Object.java:502) at 
> org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59) at 
> org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567) at 
> org.apache.hadoop.ipc.Client.call(Client.java:1525) at 
> org.apache.hadoop.ipc.Client.call(Client.java:1422) at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
>  at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
>  at com.sun.proxy.$Proxy25.append(Unknown Source) at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:415)
>  at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(Retry

[jira] [Work logged] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16386?focusedWorklogId=698528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698528
 ]

ASF GitHub Bot logged work on HDFS-16386:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 10:01
Start Date: 20/Dec/21 10:01
Worklog Time Spent: 10m 
  Work Description: jojochuang commented on pull request #3806:
URL: https://github.com/apache/hadoop/pull/3806#issuecomment-997773306


   LGTM. For what it's worth, we don't need two committers to approve a PR :) 
Stephen alone is the gold standard.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 698528)
Time Spent: 2h 20m  (was: 2h 10m)

> Reduce DataNode load when FsDatasetAsyncDiskService is working
> --
>
> Key: HDFS-16386
> URL: https://issues.apache.org/jira/browse/HDFS-16386
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: monitor.png
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Our DataNode node has 36 disks. When FsDatasetAsyncDiskService is working, it 
> causes a high load on the DataNode.
> Here is some monitoring related to memory:
>  !monitor.png! 
> Since each disk deletes blocks asynchronously and each disk allows up to 4 
> worker threads, this causes some trouble for the DataNode, such as 
> increased CPU and memory usage.
> We should appropriately reduce the total number of worker threads so that 
> the DataNode can work better.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16382) RBF: getContentSummary RPC compute sub-directory repeatedly

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16382?focusedWorklogId=698527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698527
 ]

ASF GitHub Bot logged work on HDFS-16382:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 09:58
Start Date: 20/Dec/21 09:58
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3797:
URL: https://github.com/apache/hadoop/pull/3797#issuecomment-997770842


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 37s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 19s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 46s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 24s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m  6s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 32s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3797/2/artifact/out/blanks-eol.txt)
 |  The patch has 11 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   0m 18s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3797/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 36 new + 2 
unchanged - 0 fixed = 38 total (was 2)  |
   | +1 :green_heart: |  mvnsite  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 58s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  21m 32s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3797/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 32s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 109m 37s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.federation.router.TestRouterRPCMultipleDestinationMountTableResolver
 |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3797/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3797 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 266c55444ff7 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | 

[jira] [Commented] (HDFS-16382) RBF: getContentSummary RPC compute sub-directory repeatedly

2021-12-20 Thread zhanghaobo (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462447#comment-17462447
 ] 

zhanghaobo commented on HDFS-16382:
---

Hi, [~ayushtkn] [~elgoiri], thanks for replying. I think in the common case, we 
won't do things like this: /A/B to ns1 /A. But we may do things like this: 
/A/B to ns1 /A/B. In other words, users prefer to make the src path and dest 
path look similar.

> RBF: getContentSummary RPC compute sub-directory repeatedly
> ---
>
> Key: HDFS-16382
> URL: https://issues.apache.org/jira/browse/HDFS-16382
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.3.1
>Reporter: zhanghaobo
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HDFS-16382.-RBF-getContentSummary-RPC-compute-sub-di.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The Router getContentSummary RPC computes sub-directories repeatedly when a 
> directory and its ancestor directory are both mounted in the form of the 
> original src path.
> For example, suppose we have the mount table entries below:
> /A---ns1---/A
> /A/B—ns1,ns2—/A/B
> If we put a file test.txt into directory /A/B in namespace ns1 and then 
> execute `hdfs dfs -count hdfs://router:/A`, the result is wrong, because 
> /A/B/test.txt is counted repeatedly.
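
A minimal sketch of the dedup idea, not the actual patch: before aggregating 
per-namespace results, drop any remote destination already covered by an 
ancestor destination in the same namespace (all names below are illustrative):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only. For the example above, the destinations of
// /A are (ns1, /A), (ns1, /A/B), (ns2, /A/B); (ns1, /A/B) is covered by
// (ns1, /A), so summarizing it again double-counts /A/B/test.txt.
class DestinationDedupSketch {
  static class RemoteDest {
    final String ns;    // e.g. "ns1"
    final String path;  // e.g. "/A" or "/A/B"
    RemoteDest(String ns, String path) { this.ns = ns; this.path = path; }
  }

  static List<RemoteDest> dropCoveredDestinations(List<RemoteDest> dests) {
    List<RemoteDest> kept = new ArrayList<>();
    for (RemoteDest d : dests) {
      boolean covered = false;
      for (RemoteDest other : dests) {
        if (other != d && other.ns.equals(d.ns)
            && isStrictAncestor(other.path, d.path)) {
          covered = true;  // d's subtree is already inside other's subtree
          break;
        }
      }
      if (!covered) {
        kept.add(d);
      }
    }
    return kept;
  }

  static boolean isStrictAncestor(String ancestor, String path) {
    String prefix = ancestor.endsWith("/") ? ancestor : ancestor + "/";
    return path.startsWith(prefix);
  }
}
{code}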



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16391?focusedWorklogId=698502&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698502
 ]

ASF GitHub Bot logged work on HDFS-16391:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 08:20
Start Date: 20/Dec/21 08:20
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3820:
URL: https://github.com/apache/hadoop/pull/3820#issuecomment-997697220


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 52s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  36m 16s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   0m 24s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 40s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 19s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 22s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 16s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 49s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 33s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  37m 29s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3820/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 32s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 133m 57s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.federation.router.TestRouterRPCMultipleDestinationMountTableResolver
 |
   |   | hadoop.hdfs.rbfbalance.TestRouterDistCpProcedure |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3820/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3820 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 05506f034149 4.15.0-162-generic #170-Ubuntu SMP Mon Oct 18 
11:38:05 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / f5b7a1801176c695a249801d079ac558e2dea453 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.
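
For reference, the pattern the issue title describes is the standard SLF4J one; 
a minimal illustrative sketch (the report object is hypothetical, not the 
service's real field):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative only; 'report' stands in for whatever object the real
// NameNodeHeartbeatService logs.
class DebugLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(DebugLoggingSketch.class);

  void log(Object report) {
    // String concatenation builds the message (and calls toString())
    // even when debug logging is disabled:
    //   LOG.debug("Heartbeat report: " + report);

    // The parameterized form defers formatting until the logger has
    // checked that debug is enabled:
    LOG.debug("Heartbeat report: {}", report);
  }
}
{code}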

[jira] [Updated] (HDFS-16382) RBF: getContentSummary RPC compute sub-directory repeatedly

2021-12-20 Thread zhanghaobo (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhanghaobo updated HDFS-16382:
--
Attachment: 0001-HDFS-16382.-RBF-getContentSummary-RPC-compute-sub-di.patch

> RBF: getContentSummary RPC compute sub-directory repeatedly
> ---
>
> Key: HDFS-16382
> URL: https://issues.apache.org/jira/browse/HDFS-16382
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.3.1
>Reporter: zhanghaobo
>Priority: Major
>  Labels: pull-request-available
> Attachments: 
> 0001-HDFS-16382.-RBF-getContentSummary-RPC-compute-sub-di.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The Router getContentSummary RPC computes sub-directories repeatedly when a 
> directory and its ancestor directory are both mounted in the form of the 
> original src path.
> For example, suppose we have the mount table entries below:
> /A---ns1---/A
> /A/B—ns1,ns2—/A/B
> If we put a file test.txt into directory /A/B in namespace ns1 and then 
> execute `hdfs dfs -count hdfs://router:/A`, the result is wrong, because 
> /A/B/test.txt is counted repeatedly.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16382) RBF: getContentSummary RPC compute sub-directory repeatedly

2021-12-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16382?focusedWorklogId=698494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698494
 ]

ASF GitHub Bot logged work on HDFS-16382:
-

Author: ASF GitHub Bot
Created on: 20/Dec/21 08:10
Start Date: 20/Dec/21 08:10
Worklog Time Spent: 10m 
  Work Description: hfutatzhanghb commented on pull request #3797:
URL: https://github.com/apache/hadoop/pull/3797#issuecomment-997689445


   Added some unit tests for this patch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 698494)
Time Spent: 50m  (was: 40m)

> RBF: getContentSummary RPC compute sub-directory repeatedly
> ---
>
> Key: HDFS-16382
> URL: https://issues.apache.org/jira/browse/HDFS-16382
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.3.1
>Reporter: zhanghaobo
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The Router getContentSummary RPC computes sub-directories repeatedly when a 
> directory and its ancestor directory are both mounted in the form of the 
> original src path.
> For example, suppose we have the mount table entries below:
> /A---ns1---/A
> /A/B—ns1,ns2—/A/B
> If we put a file test.txt into directory /A/B in namespace ns1 and then 
> execute `hdfs dfs -count hdfs://router:/A`, the result is wrong, because 
> /A/B/test.txt is counted repeatedly.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org