[jira] [Commented] (HDFS-16393) RBF: Fix TestRouterRPCMultipleDestinationMountTableResolver
[ https://issues.apache.org/jira/browse/HDFS-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463026#comment-17463026 ] Ayush Saxena commented on HDFS-16393: - Guess just changing dfsCluster.restartNameNode(0, false); -> dfsCluster.restartNameNode(0); should fix the test. Seems due to a change in the MiniDFSCluster restart logic. > RBF: Fix TestRouterRPCMultipleDestinationMountTableResolver > --- > > Key: HDFS-16393 > URL: https://issues.apache.org/jira/browse/HDFS-16393 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Ayush Saxena > Priority: Major > > Fails in the after block > https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/724/testReport/junit/org.apache.hadoop.hdfs.server.federation.router/TestRouterRPCMultipleDestinationMountTableResolver/testInvokeAtAvailableNs/ -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16393) RBF: Fix TestRouterRPCMultipleDestinationMountTableResolver
Ayush Saxena created HDFS-16393: --- Summary: RBF: Fix TestRouterRPCMultipleDestinationMountTableResolver Key: HDFS-16393 URL: https://issues.apache.org/jira/browse/HDFS-16393 Project: Hadoop HDFS Issue Type: Bug Reporter: Ayush Saxena Fails in the after block https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/724/testReport/junit/org.apache.hadoop.hdfs.server.federation.router/TestRouterRPCMultipleDestinationMountTableResolver/testInvokeAtAvailableNs/
[jira] [Work logged] (HDFS-16348) Mark slownode as badnode to recover pipeline
[ https://issues.apache.org/jira/browse/HDFS-16348?focusedWorklogId=699208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699208 ] ASF GitHub Bot logged work on HDFS-16348: - Author: ASF GitHub Bot Created on: 21/Dec/21 05:33 Start Date: 21/Dec/21 05:33 Worklog Time Spent: 10m Work Description: symious commented on pull request #3704: URL: https://github.com/apache/hadoop/pull/3704#issuecomment-998486754 @tasanuma Thanks for the detailed review. Updated as suggested, please have a check. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 699208) Time Spent: 3h 20m (was: 3h 10m) > Mark slownode as badnode to recover pipeline > > > Key: HDFS-16348 > URL: https://issues.apache.org/jira/browse/HDFS-16348 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: Janus Chow > Assignee: Janus Chow > Priority: Major > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > > In HDFS-16320, the DataNode can retrieve the SLOW status from each NameNode. > This ticket is to send this information back to clients who are writing > blocks. If a client notices that the pipeline is built on a slownode, it can > choose to mark the slownode as a badnode to exclude the node or rebuild the > pipeline. > To avoid false positives, we added a "threshold" config: only when a client > continuously receives slownode replies from the same node will the node be > marked as SLOW.
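The threshold mechanism described in the ticket (a node is treated as bad only after consecutive slow acks, and a gap in the streak resets the count) can be sketched as follows. This is a simplified illustration, not the actual DataStreamer code; the class name, the use of String node IDs, and the `onAck` method are assumptions:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: count consecutive slow acks per node; mark bad once a threshold is hit. */
public class SlowNodeSketch {
    private final Map<String, Integer> slowCount = new HashMap<>();
    private final int threshold;

    public SlowNodeSketch(int threshold) {
        this.threshold = threshold;
    }

    /**
     * Feed one pipeline ack's slow-node list; returns the nodes whose slow
     * streak reached the threshold and should be excluded as bad nodes.
     */
    public Set<String> onAck(List<String> slowNodesFromAck) {
        // Nodes that were slow before but are absent from this ack lose their streak.
        Set<String> discontinuous = new HashSet<>(slowCount.keySet());
        for (String node : slowNodesFromAck) {
            slowCount.merge(node, 1, Integer::sum);
            discontinuous.remove(node);
        }
        slowCount.keySet().removeAll(discontinuous);

        Set<String> badNodes = new HashSet<>();
        for (Map.Entry<String, Integer> e : slowCount.entrySet()) {
            if (e.getValue() >= threshold) {
                badNodes.add(e.getKey());
            }
        }
        return badNodes;
    }
}
```

One ack without the node in its slow list is enough to reset the streak, which is what keeps one-off slowness from triggering a pipeline rebuild.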
[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=699205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699205 ] ASF GitHub Bot logged work on HDFS-16303: - Author: ASF GitHub Bot Created on: 21/Dec/21 05:17 Start Date: 21/Dec/21 05:17 Worklog Time Spent: 10m Work Description: virajjasani commented on pull request #3675: URL: https://github.com/apache/hadoop/pull/3675#issuecomment-998480240 > If there are fewer nodes being decommissioned than max tracked nodes, then there are no nodes in the pendingNodes queue & all nodes are being tracked for decommissioning. Therefore, there is no possibility that any healthy nodes are blocked in the pendingNodes queue Yes makes sense. Thanks Issue Time Tracking --- Worklog Id: (was: 699205) Time Spent: 10.5h (was: 10h 20m) > Losing over 100 datanodes in state decommissioning results in full blockage > of all datanode decommissioning > --- > > Key: HDFS-16303 > URL: https://issues.apache.org/jira/browse/HDFS-16303 > Project: Hadoop HDFS > Issue Type: Bug > Affects Versions: 2.10.1, 3.3.1 > Reporter: Kevin Wikant > Assignee: Kevin Wikant > Priority: Major > Labels: pull-request-available > Time Spent: 10.5h > Remaining Estimate: 0h > > h2. Impact > HDFS datanode decommissioning does not make any forward progress. For > example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X > of those datanodes remain in state decommissioning forever without making any > forward progress towards being decommissioned. > h2. Root Cause > The HDFS Namenode class "DatanodeAdminManager" is responsible for > decommissioning datanodes.
> As per this "hdfs-site" configuration: > {quote}Config = dfs.namenode.decommission.max.concurrent.tracked.nodes > Default Value = 100 > The maximum number of decommission-in-progress datanodes that will be > tracked at one time by the namenode. Tracking a decommission-in-progress > datanode consumes additional NN memory proportional to the number of blocks > on the datanode. Having a conservative limit reduces the potential impact of > decommissioning a large number of nodes at once. A value of 0 means no limit > will be enforced. > {quote} > The Namenode will only actively track up to 100 datanodes for decommissioning > at any given time, so as to avoid Namenode memory pressure. > Looking into the "DatanodeAdminManager" code: > * a datanode is only removed from the "tracked.nodes" set when it > finishes decommissioning > * a new datanode is only added to the "tracked.nodes" set if there are fewer > than 100 datanodes being tracked > So in the event that there are more than 100 datanodes being decommissioned > at a given time, some of those datanodes will not be in the "tracked.nodes" > set until 1 or more datanodes in the "tracked.nodes" finishes > decommissioning. This is generally not a problem because the datanodes in > "tracked.nodes" will eventually finish decommissioning, but there is an edge > case where this logic prevents the namenode from making any forward progress > towards decommissioning. > If all 100 datanodes in the "tracked.nodes" are unable to finish > decommissioning, then other datanodes (which may be able to be > decommissioned) will never get added to "tracked.nodes" and therefore will > never get the opportunity to be decommissioned. > This can occur due to the following issue: > {quote}2021-10-21 12:39:24,048 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager > (DatanodeAdminMonitor-0): Node W.X.Y.Z:50010 is dead while in Decommission In > Progress.
Cannot be safely decommissioned or be in maintenance since there is > risk of reduced data durability or data loss. Either restart the failed node > or force decommissioning or maintenance by removing, calling refreshNodes, > then re-adding to the excludes or host config files. > {quote} > If a Datanode is lost while decommissioning (for example if the underlying > hardware fails or is lost), then it will remain in state decommissioning > forever. > If 100 or more Datanodes are lost while decommissioning over the Hadoop > cluster lifetime, then this is enough to completely fill up the > "tracked.nodes" set. With the entire "tracked.nodes" set filled with > datanodes that can never finish deco
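The starvation edge case described above can be illustrated with a small simulation of the tracked-set/pending-queue interaction. This is a sketch under simplifying assumptions (String node IDs, one `tick` method standing in for the monitor loop), not the DatanodeAdminManager implementation:

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

/** Sketch: dead decommissioning nodes never leave the tracked set, starving the queue. */
public class TrackedNodesSketch {
    private final int maxTracked;
    private final Set<String> tracked = new HashSet<>();
    private final Queue<String> pending = new ArrayDeque<>();

    public TrackedNodesSketch(int maxTracked) {
        this.maxTracked = maxTracked;
    }

    public void startDecommission(String node) {
        pending.add(node);
    }

    /** One monitor tick: fill tracked from pending, then finish healthy nodes only. */
    public Set<String> tick(Set<String> deadNodes) {
        while (tracked.size() < maxTracked && !pending.isEmpty()) {
            tracked.add(pending.poll());
        }
        Set<String> finished = new HashSet<>();
        for (String node : tracked) {
            if (!deadNodes.contains(node)) {
                finished.add(node);  // a healthy node completes decommissioning
            }
        }
        tracked.removeAll(finished);  // dead nodes stay tracked forever
        return finished;
    }

    public int pendingSize() {
        return pending.size();
    }
}
```

With two dead nodes filling a tracked set of size 2, a healthy node queued behind them is never tracked and never decommissions, no matter how many ticks run.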
[jira] [Work logged] (HDFS-16348) Mark slownode as badnode to recover pipeline
[ https://issues.apache.org/jira/browse/HDFS-16348?focusedWorklogId=699169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699169 ] ASF GitHub Bot logged work on HDFS-16348: - Author: ASF GitHub Bot Created on: 21/Dec/21 04:09 Start Date: 21/Dec/21 04:09 Worklog Time Spent: 10m Work Description: tasanuma commented on a change in pull request #3704: URL: https://github.com/apache/hadoop/pull/3704#discussion_r772796622 ## File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java ## @@ -1254,10 +1273,52 @@ public void run() { } } +void markSlowNode(List<DatanodeInfo> slownodesFromAck) throws IOException { + Set<DatanodeInfo> discontinuousNodes = new HashSet<>(slowNodeMap.keySet()); + for (DatanodeInfo slowNode : slownodesFromAck) { +if (!slowNodeMap.containsKey(slowNode)) { + slowNodeMap.put(slowNode, 1); +} else { + int oldCount = slowNodeMap.get(slowNode); + slowNodeMap.put(slowNode, ++oldCount); +} +discontinuousNodes.remove(slowNode); + } + for (DatanodeInfo discontinuousNode : discontinuousNodes) { +slowNodeMap.remove(discontinuousNode); + } + + if (!slowNodeMap.isEmpty()) { +for (Map.Entry<DatanodeInfo, Integer> entry : slowNodeMap.entrySet()) { + if (entry.getValue() >= markSlowNodeAsBadNodeThreshold) { +DatanodeInfo slowNode = entry.getKey(); +int index = getDatanodeIndex(slowNode); +if (index >= 0) { + errorState.setBadNodeIndex( + getDatanodeIndex(entry.getKey())); Review comment: We can reuse `index` variable.
```suggestion errorState.setBadNodeIndex(index); ``` ## File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java ## @@ -230,14 +260,27 @@ public static ECN getECNFromHeader(int header) { return StatusFormat.getECN(header); } + public static SLOW getSLOWFromHeader(int header) { +return StatusFormat.getSLOW(header); + } + public static int setStatusForHeader(int old, Status status) { return StatusFormat.setStatus(old, status); } + public static int setSLOWForHeader(int old, SLOW slow) { Review comment: Only the unit test uses this method. Would you please add VisibleForTesting? ```suggestion @VisibleForTesting public static int setSLOWForHeader(int old, SLOW slow) { ``` ## File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java ## @@ -230,14 +260,27 @@ public static ECN getECNFromHeader(int header) { return StatusFormat.getECN(header); } + public static SLOW getSLOWFromHeader(int header) { +return StatusFormat.getSLOW(header); + } + public static int setStatusForHeader(int old, Status status) { return StatusFormat.setStatus(old, status); } + public static int setSLOWForHeader(int old, SLOW slow) { +return StatusFormat.setSLOW(old, slow); + } + public static int combineHeader(ECN ecn, Status status) { +return combineHeader(ecn, status, SLOW.DISABLED); + } + + public static int combineHeader(ECN ecn, Status status, SLOW slow) { Review comment: I want `PipelineAck#getHeaderFlag()` to use this method. 
```java public int getHeaderFlag(int i) { if (proto.getFlagCount() > 0) { return proto.getFlag(i); } else { return combineHeader(ECN.DISABLED, proto.getReply(i), SLOW.DISABLED); } } ``` ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java ## @@ -1620,8 +1623,10 @@ private void sendAckUpstreamUnprotected(PipelineAck ack, long seqno, // downstream nodes, reply should contain one reply. replies = new int[] { myHeader }; } else if (mirrorError) { // ack read error -int h = PipelineAck.combineHeader(datanode.getECN(), Status.SUCCESS); -int h1 = PipelineAck.combineHeader(datanode.getECN(), Status.ERROR); +int h = PipelineAck.combineHeader(datanode.getECN(), Status.SUCCESS, +datanode.getSLOW()); +int h1 = PipelineAck.combineHeader(datanode.getECN(), Status.ERROR, +datanode.getSLOW()); Review comment: Why it doesn't use `datanode.getSLOWByBlockPoolId(block.getBlockPoolId())`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking -
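For context on the `combineHeader`/`getSLOWFromHeader` methods under review: an ack header is a single int that packs the reply Status together with the ECN and (with this patch) SLOW flags as bit fields. A minimal sketch of that kind of packing, using an assumed layout (4 status bits, 2 ECN bits, 2 SLOW bits) rather than the real StatusFormat offsets:

```java
/** Sketch of a bit-packed ack header: [slow:2][ecn:2][status:4]. Layout is assumed. */
public class AckHeaderSketch {
    static final int STATUS_BITS = 4, ECN_BITS = 2, SLOW_BITS = 2;
    static final int ECN_SHIFT = STATUS_BITS;
    static final int SLOW_SHIFT = STATUS_BITS + ECN_BITS;

    /** Pack the three fields into one int, low bits first. */
    public static int combineHeader(int status, int ecn, int slow) {
        return (slow << SLOW_SHIFT) | (ecn << ECN_SHIFT) | status;
    }

    public static int getStatus(int header) {
        return header & ((1 << STATUS_BITS) - 1);
    }

    public static int getEcn(int header) {
        return (header >> ECN_SHIFT) & ((1 << ECN_BITS) - 1);
    }

    public static int getSlow(int header) {
        return (header >> SLOW_SHIFT) & ((1 << SLOW_BITS) - 1);
    }
}
```

This is also why the reviewer wants `getHeaderFlag` to fall back to `combineHeader(ECN.DISABLED, proto.getReply(i), SLOW.DISABLED)`: older peers that only send a plain reply still need a header with the new fields zeroed.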
[jira] [Commented] (HDFS-16368) DFSAdmin supports refresh topology info without restarting namenode
[ https://issues.apache.org/jira/browse/HDFS-16368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462940#comment-17462940 ] zhanghaobo commented on HDFS-16368: --- [~ferhui] , yeah, looks like the same. > DFSAdmin supports refresh topology info without restarting namenode > > > Key: HDFS-16368 > URL: https://issues.apache.org/jira/browse/HDFS-16368 > Project: Hadoop HDFS > Issue Type: New Feature > Components: dfsadmin, namenode > Affects Versions: 2.7.7, 3.3.1 > Reporter: zhanghaobo > Priority: Major > Labels: features, pull-request-available > Attachments: 0001.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Currently in HDFS, if we update the rack info for rack-awareness, we may need > to rolling-restart namenodes for it to take effect. If the cluster is large, > rolling-restarting the namenodes takes a very long time. So we developed a > method to refresh topology info without rolling-restarting namenodes.
[jira] [Work logged] (HDFS-16389) Improve NNThroughputBenchmark test mkdirs
[ https://issues.apache.org/jira/browse/HDFS-16389?focusedWorklogId=699112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699112 ] ASF GitHub Bot logged work on HDFS-16389: - Author: ASF GitHub Bot Created on: 21/Dec/21 02:08 Start Date: 21/Dec/21 02:08 Worklog Time Spent: 10m Work Description: jianghuazhu commented on pull request #3819: URL: https://github.com/apache/hadoop/pull/3819#issuecomment-998409764 Could you help review this pr, @aajisaka @virajjasani . Thank you very much. Issue Time Tracking --- Worklog Id: (was: 699112) Time Spent: 0.5h (was: 20m) > Improve NNThroughputBenchmark test mkdirs > - > > Key: HDFS-16389 > URL: https://issues.apache.org/jira/browse/HDFS-16389 > Project: Hadoop HDFS > Issue Type: Improvement > Components: benchmarks, namenode > Affects Versions: 2.9.2 > Reporter: JiangHua Zhu > Assignee: JiangHua Zhu > Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When using the NNThroughputBenchmark test to create a large number of > directories, some exceptions are thrown.
> Here is the command: > ./bin/hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs > hdfs:// -op mkdirs -threads 30 -dirs 500 > There are some exceptions here, such as: > 21/12/20 10:25:00 INFO namenode.NNThroughputBenchmark: Starting benchmark: > mkdirs > 21/12/20 10:25:01 INFO namenode.NNThroughputBenchmark: Generate 500 > inputs for mkdirs > 21/12/20 10:25:08 ERROR namenode.NNThroughputBenchmark: > java.lang.ArrayIndexOutOfBoundsException: 20 > at > org.apache.hadoop.hdfs.server.namenode.FileNameGenerator.getNextDirName(FileNameGenerator.java:65) > at > org.apache.hadoop.hdfs.server.namenode.FileNameGenerator.getNextFileName(FileNameGenerator.java:73) > at > org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$MkdirsStats.generateInputs(NNThroughputBenchmark.java:668) > at > org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$OperationStatsBase.benchmark(NNThroughputBenchmark.java:257) > at > org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.run(NNThroughputBenchmark.java:1528) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at > org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.runBenchmark(NNThroughputBenchmark.java:1430) > at > org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.main(NNThroughputBenchmark.java:1550) > Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 20 > at > org.apache.hadoop.hdfs.server.namenode.FileNameGenerator.getNextDirName(FileNameGenerator.java:65) > at > org.apache.hadoop.hdfs.server.namenode.FileNameGenerator.getNextFileName(FileNameGenerator.java:73) > at > org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$MkdirsStats.generateInputs(NNThroughputBenchmark.java:668) > at > org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark$OperationStatsBase.benchmark(NNThroughputBenchmark.java:257) > at > 
org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.run(NNThroughputBenchmark.java:1528) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at > org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.runBenchmark(NNThroughputBenchmark.java:1430) > at > org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark.main(NNThroughputBenchmark.java:1550) > These messages appear because some parameters, such as dirsPerDir or > filesPerDir, are set incorrectly. > Seeing this log leaves the user with questions.
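The ArrayIndexOutOfBoundsException above comes from asking a name generator with a fixed directory tree for more names than the tree can hold. The following is a simplified, hypothetical generator (not the actual FileNameGenerator) that fails the same way once its per-level counters are exhausted:

```java
/**
 * Sketch: a name generator over a tree with a fixed depth and a fixed number
 * of entries per directory. It can produce filesPerDir^depth names; asking for
 * one more walks off the counter array, analogous to the AIOOBE above.
 */
public class NameGenSketch {
    private final int filesPerDir;
    private final int[] counts;  // one counter per tree level
    private boolean exhausted = false;

    public NameGenSketch(int filesPerDir, int depth) {
        this.filesPerDir = filesPerDir;
        this.counts = new int[depth];
    }

    public String next() {
        if (exhausted) {
            // Mimics the benchmark's failure mode when the tree is full.
            throw new ArrayIndexOutOfBoundsException(-1);
        }
        StringBuilder name = new StringBuilder();
        for (int c : counts) {
            name.append('/').append(c);
        }
        // Carry-increment the per-level counters, rightmost level first.
        int level = counts.length - 1;
        counts[level]++;
        while (counts[level] == filesPerDir) {
            counts[level] = 0;
            if (level == 0) {
                exhausted = true;
                break;
            }
            counts[--level]++;
        }
        return name.toString();
    }
}
```

With 30 threads and 500 dirs each, the benchmark needs room for far more names than the defaults allow, so raising the per-directory limit (or lowering the request) avoids the overflow.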
[jira] [Resolved] (HDFS-16385) Fix Datanode retrieve slownode information bug.
[ https://issues.apache.org/jira/browse/HDFS-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma resolved HDFS-16385. - Resolution: Fixed Merged the PR. Thanks for your contribution, [~JacksonWang]. I added you to a contributor role. > Fix Datanode retrieve slownode information bug. > --- > > Key: HDFS-16385 > URL: https://issues.apache.org/jira/browse/HDFS-16385 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Jackson Wang > Assignee: Jackson Wang > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > In HDFS-16320, the DataNode will retrieve the SLOW status from each NameNode. > But the namenode did not set isSlowNode in HeartbeatResponseProto in > DatanodeProtocolServerSideTranslatorPB#sendHeartbeat.
[jira] [Assigned] (HDFS-16385) Fix Datanode retrieve slownode information bug.
[ https://issues.apache.org/jira/browse/HDFS-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma reassigned HDFS-16385: --- Assignee: Jackson Wang > Fix Datanode retrieve slownode information bug. > --- > > Key: HDFS-16385 > URL: https://issues.apache.org/jira/browse/HDFS-16385 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Jackson Wang > Assignee: Jackson Wang > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > In HDFS-16320, the DataNode will retrieve the SLOW status from each NameNode. > But the namenode did not set isSlowNode in HeartbeatResponseProto in > DatanodeProtocolServerSideTranslatorPB#sendHeartbeat.
[jira] [Work logged] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working
[ https://issues.apache.org/jira/browse/HDFS-16386?focusedWorklogId=699110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699110 ] ASF GitHub Bot logged work on HDFS-16386: - Author: ASF GitHub Bot Created on: 21/Dec/21 02:04 Start Date: 21/Dec/21 02:04 Worklog Time Spent: 10m Work Description: jianghuazhu commented on pull request #3806: URL: https://github.com/apache/hadoop/pull/3806#issuecomment-998408050 Thank you for your attention and comments, @brahmareddybattula . I will continue to work. If necessary, I will create a new jira. Issue Time Tracking --- Worklog Id: (was: 699110) Time Spent: 3h (was: 2h 50m) > Reduce DataNode load when FsDatasetAsyncDiskService is working > -- > > Key: HDFS-16386 > URL: https://issues.apache.org/jira/browse/HDFS-16386 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode > Affects Versions: 2.9.2 > Reporter: JiangHua Zhu > Assignee: JiangHua Zhu > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.3 > > Attachments: monitor.png > > Time Spent: 3h > Remaining Estimate: 0h > > Our DataNode node has 36 disks. When FsDatasetAsyncDiskService is working, it > causes a high load on the DataNode. > Here is some memory-related monitoring: > !monitor.png! > Since each disk deletes blocks asynchronously, and each disk allows 4 threads > to work, this causes some trouble for the DataNode, such as increased CPU and > memory usage. > We should appropriately reduce the total number of worker threads so that > the DataNode can work better.
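The load described above grows as volumes × threads-per-volume (36 × 4 = 144 in this report). A sketch of that per-volume pool layout, as a simplifying assumption about how FsDatasetAsyncDiskService sizes its deletion pools (names and structure are ours):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: one fixed-size deletion pool per disk volume; totals multiply quickly. */
public class AsyncDiskSketch {
    // Matches the "4 threads per disk" figure from the issue description.
    static final int MAX_THREADS_PER_VOLUME = 4;

    /** One pool per volume, each allowing up to MAX_THREADS_PER_VOLUME threads. */
    public static List<ExecutorService> poolsFor(int numVolumes) {
        List<ExecutorService> pools = new ArrayList<>();
        for (int i = 0; i < numVolumes; i++) {
            pools.add(Executors.newFixedThreadPool(MAX_THREADS_PER_VOLUME));
        }
        return pools;
    }

    /** Worst-case thread count across all volumes. */
    public static int maxTotalThreads(int numVolumes) {
        return numVolumes * MAX_THREADS_PER_VOLUME;
    }
}
```

Capping the total (for example with one shared pool sized independently of the volume count) is the kind of reduction the issue proposes.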
[jira] [Work logged] (HDFS-16385) Fix Datanode retrieve slownode information bug.
[ https://issues.apache.org/jira/browse/HDFS-16385?focusedWorklogId=699109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699109 ] ASF GitHub Bot logged work on HDFS-16385: - Author: ASF GitHub Bot Created on: 21/Dec/21 02:04 Start Date: 21/Dec/21 02:04 Worklog Time Spent: 10m Work Description: tasanuma merged pull request #3803: URL: https://github.com/apache/hadoop/pull/3803 Issue Time Tracking --- Worklog Id: (was: 699109) Time Spent: 50m (was: 40m) > Fix Datanode retrieve slownode information bug. > --- > > Key: HDFS-16385 > URL: https://issues.apache.org/jira/browse/HDFS-16385 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Jackson Wang > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > In HDFS-16320, the DataNode will retrieve the SLOW status from each NameNode. > But the namenode did not set isSlowNode in HeartbeatResponseProto in > DatanodeProtocolServerSideTranslatorPB#sendHeartbeat.
[jira] [Work logged] (HDFS-16385) Fix Datanode retrieve slownode information bug.
[ https://issues.apache.org/jira/browse/HDFS-16385?focusedWorklogId=699111&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699111 ] ASF GitHub Bot logged work on HDFS-16385: - Author: ASF GitHub Bot Created on: 21/Dec/21 02:04 Start Date: 21/Dec/21 02:04 Worklog Time Spent: 10m Work Description: tasanuma commented on pull request #3803: URL: https://github.com/apache/hadoop/pull/3803#issuecomment-998408212 Thanks for fixing the issue, @Jackson-Wang-7. Thanks for your reviews, @symious, @ferhui, @tomscut. Issue Time Tracking --- Worklog Id: (was: 699111) Time Spent: 1h (was: 50m) > Fix Datanode retrieve slownode information bug. > --- > > Key: HDFS-16385 > URL: https://issues.apache.org/jira/browse/HDFS-16385 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Jackson Wang > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > In HDFS-16320, the DataNode will retrieve the SLOW status from each NameNode. > But the namenode did not set isSlowNode in HeartbeatResponseProto in > DatanodeProtocolServerSideTranslatorPB#sendHeartbeat.
[jira] [Work logged] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working
[ https://issues.apache.org/jira/browse/HDFS-16386?focusedWorklogId=699107&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699107 ] ASF GitHub Bot logged work on HDFS-16386: - Author: ASF GitHub Bot Created on: 21/Dec/21 01:59 Start Date: 21/Dec/21 01:59 Worklog Time Spent: 10m Work Description: jianghuazhu commented on pull request #3806: URL: https://github.com/apache/hadoop/pull/3806#issuecomment-998406033 Thank you for your reminder and help, @jojochuang . Issue Time Tracking --- Worklog Id: (was: 699107) Time Spent: 2h 50m (was: 2h 40m) > Reduce DataNode load when FsDatasetAsyncDiskService is working > -- > > Key: HDFS-16386 > URL: https://issues.apache.org/jira/browse/HDFS-16386 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode > Affects Versions: 2.9.2 > Reporter: JiangHua Zhu > Assignee: JiangHua Zhu > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.3 > > Attachments: monitor.png > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Our DataNode node has 36 disks. When FsDatasetAsyncDiskService is working, it > causes a high load on the DataNode. > Here is some memory-related monitoring: > !monitor.png! > Since each disk deletes blocks asynchronously, and each disk allows 4 threads > to work, this causes some trouble for the DataNode, such as increased CPU and > memory usage. > We should appropriately reduce the total number of worker threads so that > the DataNode can work better.
[jira] [Work logged] (HDFS-16371) Exclude slow disks when choosing volume
[ https://issues.apache.org/jira/browse/HDFS-16371?focusedWorklogId=699101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699101 ] ASF GitHub Bot logged work on HDFS-16371: - Author: ASF GitHub Bot Created on: 21/Dec/21 01:49 Start Date: 21/Dec/21 01:49 Worklog Time Spent: 10m Work Description: tomscut commented on pull request #3753: URL: https://github.com/apache/hadoop/pull/3753#issuecomment-998402471 Hi @tasanuma @jojochuang @ayushtkn . Please help to review this PR. Thank you very much. Issue Time Tracking --- Worklog Id: (was: 699101) Time Spent: 1h 20m (was: 1h 10m) > Exclude slow disks when choosing volume > --- > > Key: HDFS-16371 > URL: https://issues.apache.org/jira/browse/HDFS-16371 > Project: Hadoop HDFS > Issue Type: New Feature > Reporter: tomscut > Assignee: tomscut > Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, the datanode can detect slow disks. See HDFS-11461. > And after HDFS-16311, the slow disk information we collect is more accurate. > So we can exclude these slow disks according to some rules when choosing a > volume. This will prevent some slow disks from affecting the throughput of > the whole datanode.
[jira] [Work logged] (HDFS-16376) Expose metrics of NodeNotChosenReason to JMX
[ https://issues.apache.org/jira/browse/HDFS-16376?focusedWorklogId=699100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699100 ] ASF GitHub Bot logged work on HDFS-16376: - Author: ASF GitHub Bot Created on: 21/Dec/21 01:47 Start Date: 21/Dec/21 01:47 Worklog Time Spent: 10m Work Description: tomscut commented on pull request #3778: URL: https://github.com/apache/hadoop/pull/3778#issuecomment-998401775 Hi @ayushtkn @jojochuang @ferhui @tasanuma , could you please take a look at this? Thanks. Issue Time Tracking --- Worklog Id: (was: 699100) Time Spent: 1h (was: 50m) > Expose metrics of NodeNotChosenReason to JMX > > > Key: HDFS-16376 > URL: https://issues.apache.org/jira/browse/HDFS-16376 > Project: Hadoop HDFS > Issue Type: New Feature > Reporter: tomscut > Assignee: tomscut > Priority: Major > Labels: pull-request-available > Attachments: image-2021-12-09-23-48-42-865.png, > image-2021-12-09-23-55-29-017.png > > Time Spent: 1h > Remaining Estimate: 0h > > In our cluster, we can see logs for nodes that are not chosen, but it is hard > to see the percentage for each reason from the logs. It is best to add > relevant metrics to monitor the entire cluster. > !image-2021-12-09-23-48-42-865.png|width=517,height=187! > *JMX metrics:* > !image-2021-12-09-23-55-29-017.png|width=620,height=152!
[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=699038&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-699038 ] ASF GitHub Bot logged work on HDFS-16303: - Author: ASF GitHub Bot Created on: 20/Dec/21 23:46 Start Date: 20/Dec/21 23:46 Worklog Time Spent: 10m Work Description: KevinWikant edited a comment on pull request #3675: URL: https://github.com/apache/hadoop/pull/3675#issuecomment-998300466 @virajjasani, please see my response to your comments below

> hence if few nodes are really in bad state (hardware/network issues), the plan is to keep re-queueing them until more nodes are getting decommissioned than max tracked nodes right?

It's the opposite: the unhealthy nodes will only be re-queued when there are more nodes being decommissioned than max tracked nodes. Otherwise, if there are fewer nodes being decommissioned than max tracked nodes, the unhealthy nodes will not be re-queued because they do not risk blocking the decommissioning of queued healthy nodes (i.e. because the queue is empty). One potential performance impact that comes to mind is that if there are, say, 200 unhealthy decommissioning nodes & max tracked nodes = 100, then this may cause some churn in the queueing/de-queueing process, because on each DatanodeAdminMonitor tick all 100 tracked nodes will be re-queued & then 100 queued nodes will be de-queued/tracked. Note that this churn (and any associated performance impact) will only take effect when:
- there are more nodes being decommissioned than max tracked nodes
- AND either:
  - number of healthy decommissioning nodes < max tracked nodes
  - number of unhealthy decommissioning nodes > max tracked nodes

The number of re-queued/de-queued nodes per tick can be quantified as: `numRequeue = numDecommissioning <= numTracked ? 0 : numDeadDecommissioning - (numDecommissioning - numTracked)` This churn of queueing/de-queueing will not occur at all under typical decommissioning scenarios (i.e. where there isn't a large number of dead decommissioning nodes). One idea to mitigate this is to have DatanodeAdminMonitor maintain counters used to track the number of healthy nodes in the pendingNodes queue; then this count can be used to make an improved re-queue decision. In particular, unhealthy nodes are only re-queued if there are healthy nodes in the pendingNodes queue. But this approach has some flaws; for example, an unhealthy node in the queue could come alive again, but then an unhealthy node in the tracked set wouldn't be re-queued because the healthy queued node count hasn't been updated. To solve this, we would need to scan the pendingNodes queue periodically to update the healthy/unhealthy node counts, and this scan could prove expensive.

> Since unhealthy node getting decommissioned might anyways require some sort of retry, shall we requeue them even if the condition is not met (i.e. total no of decomm in progress < max tracked nodes) as a limited retries?

If there are fewer nodes being decommissioned than max tracked nodes, then there are no nodes in the pendingNodes queue & all nodes are being tracked for decommissioning. Therefore, there is no possibility that any healthy nodes are blocked in the pendingNodes queue (preventing them from being decommissioned) & so in my opinion there is no benefit to re-queueing the unhealthy nodes in this case. Furthermore, this will negatively impact performance through frequent re-queueing & de-queueing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 699038) Time Spent: 10h 20m (was: 10h 10m) > Losing over 100 datanodes in state decommissioning results in full blockage > of all datanode decommissioning > --- > > Key: HDFS-16303 > URL: https://issues.apache.org/jira/browse/HDFS-16303 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1, 3.3.1 >Reporter: Kevin Wikant >Assignee: Kevin Wikant >Priority: Major > Labels: pull-request-available > Time Spent: 10h 20m > Remaining Estimate: 0h > > h2. Impact > HDFS datanode decommissioning does not make any forward progress. For > example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X > of those datanodes remain in state decommissioning forever with
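The churn formula quoted in the comment above can be checked with a small sketch. It is a direct transcription of the expression from the comment (the class and parameter names are just for illustration):

```java
// Transcription of the formula from the comment above:
// numRequeue = numDecommissioning <= numTracked
//                  ? 0
//                  : numDeadDecommissioning - (numDecommissioning - numTracked)
class RequeueChurn {
    static int numRequeue(int numDecommissioning, int numTracked, int numDeadDecommissioning) {
        return numDecommissioning <= numTracked
                ? 0
                : numDeadDecommissioning - (numDecommissioning - numTracked);
    }
}
```

Using the comment's example of 200 unhealthy (dead) decommissioning nodes with max tracked nodes = 100 and no healthy nodes, `numRequeue(200, 100, 200)` evaluates to 100: every tracked slot is re-queued each tick. Any workload where the total decommissioning count stays at or below the tracked limit churns nothing.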
[jira] [Work logged] (HDFS-16392) TestWebHdfsFileSystemContract#testResponseCode fails
[ https://issues.apache.org/jira/browse/HDFS-16392?focusedWorklogId=698930&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698930 ] ASF GitHub Bot logged work on HDFS-16392: - Author: ASF GitHub Bot Created on: 20/Dec/21 19:32 Start Date: 20/Dec/21 19:32 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3821: URL: https://github.com/apache/hadoop/pull/3821#issuecomment-998211979 :confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 51s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 34m 5s | | trunk passed |
| +1 :green_heart: | compile | 1m 27s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 23s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 1s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 29s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 3s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 29s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 17s | | trunk passed |
| +1 :green_heart: | shadedclient | 23m 16s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 20s | | the patch passed |
| +1 :green_heart: | compile | 1m 20s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 20s | | the patch passed |
| +1 :green_heart: | compile | 1m 14s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 14s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 53s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 26s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 53s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 22s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 21s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 19s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 234m 33s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 48s | | The patch does not generate ASF License warnings. |
| | | 336m 38s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3821/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3821 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 32af67601f79 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / ebee32ccaff22e47805aeee1afb4ef9826af6f93 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3821/1/testReport/ |
| Max. process+thread count | 3462 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3821/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |

This
[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=698801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698801 ] ASF GitHub Bot logged work on HDFS-16303: - Author: ASF GitHub Bot Created on: 20/Dec/21 16:16 Start Date: 20/Dec/21 16:16 Worklog Time Spent: 10m Work Description: virajjasani edited a comment on pull request #3675: URL: https://github.com/apache/hadoop/pull/3675#issuecomment-998059075 Sorry, I could not get to this PR last week. I will review later this week but I don't mean to block this work. If I find something odd or something as an improvement over this, we can anyways get it clarified later on the PR/Jira or create addendum PR later. Thanks for your work @KevinWikant, this might be really helpful going forward. With a quick glance, just one question for now: Overall it seems the goal is to improve and continue the decommissioning of healthy nodes over unhealthy ones (by removing and then re-queueing the entries), hence if few nodes are really in bad state (hardware/network issues), the plan is to keep re-queueing them until more nodes are getting decommissioned than max tracked nodes right? Since unhealthy node getting decommissioned might anyways require some sort of retry, shall we requeue them even if the condition is not met (i.e. total no of decomm in progress < max tracked nodes) as a limited retries? I am just thinking at high level, yet to catch up with the PR. Also, good to know HDFS-7374 is not broken. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 698801) Time Spent: 9.5h (was: 9h 20m) > Losing over 100 datanodes in state decommissioning results in full blockage > of all datanode decommissioning > --- > > Key: HDFS-16303 > URL: https://issues.apache.org/jira/browse/HDFS-16303 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1, 3.3.1 >Reporter: Kevin Wikant >Assignee: Kevin Wikant >Priority: Major > Labels: pull-request-available > Time Spent: 9.5h > Remaining Estimate: 0h > > h2. Impact > HDFS datanode decommissioning does not make any forward progress. For > example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X > of those datanodes remain in state decommissioning forever without making any > forward progress towards being decommissioned. > h2. Root Cause > The HDFS Namenode class "DatanodeAdminManager" is responsible for > decommissioning datanodes. > As per this "hdfs-site" configuration: > {quote}Config = dfs.namenode.decommission.max.concurrent.tracked.nodes > Default Value = 100 > The maximum number of decommission-in-progress datanodes nodes that will be > tracked at one time by the namenode. Tracking a decommission-in-progress > datanode consumes additional NN memory proportional to the number of blocks > on the datnode. Having a conservative limit reduces the potential impact of > decomissioning a large number of nodes at once. A value of 0 means no limit > will be enforced. > {quote} > The Namenode will only actively track up to 100 datanodes for decommissioning > at any given time, as to avoid Namenode memory pressure. 
> Looking into the "DatanodeAdminManager" code: > * a datanode is only removed from the "tracked.nodes" set when it > finishes decommissioning > * a new datanode is only added to the "tracked.nodes" set if there are fewer > than 100 datanodes being tracked > So in the event that there are more than 100 datanodes being decommissioned > at a given time, some of those datanodes will not be in the "tracked.nodes" > set until 1 or more datanodes in the "tracked.nodes" finish > decommissioning. This is generally not a problem because the datanodes in > "tracked.nodes" will eventually finish decommissioning, but there is an edge > case where this logic prevents the namenode from making any forward progress > towards decommissioning. > If all 100 datanodes in the "tracked.nodes" are unable to finish > decommissioning, then other datanodes (which may be able to be > decommissioned) will never get added to "tracked.nodes" and therefore will > never get the opportunity to be decommissioned. > This can occur due to the following issue: > {quote}2021-10-21 12:39:24,048 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager > (DatanodeAdminMo
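The "tracked.nodes" behavior described above can be modelled in a few lines to show why the queue wedges. This is a simplified sketch, not the real DatanodeAdminManager code; the class, method names, and the `Predicate`-based completion check are all hypothetical:

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import java.util.function.Predicate;

// Simplified model of the tracked set: nodes move from pendingNodes into
// the tracked set only while there is room, and leave the tracked set only
// when they finish decommissioning. If every tracked node is stuck, pending
// nodes are never admitted, so decommissioning makes no forward progress.
class TrackedSetModel {
    private final int maxTracked;
    private final Queue<String> pendingNodes = new ArrayDeque<>();
    private final Set<String> tracked = new HashSet<>();

    TrackedSetModel(int maxTracked) { this.maxTracked = maxTracked; }

    void startDecommission(String node) { pendingNodes.add(node); }

    // One monitor tick: admit queued nodes while there is room, then drop
    // any tracked node that has finished. A predicate that never returns
    // true models the "all tracked nodes are stuck" edge case.
    void tick(Predicate<String> isDone) {
        while (tracked.size() < maxTracked && !pendingNodes.isEmpty()) {
            tracked.add(pendingNodes.poll());
        }
        tracked.removeIf(isDone);
    }

    int pendingCount() { return pendingNodes.size(); }
    int trackedCount() { return tracked.size(); }
}
```

Running this with a full tracked set and a completion check that never succeeds shows the pending count never shrinking, no matter how many ticks elapse, which is the blockage the ticket describes.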
[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=698800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698800 ] ASF GitHub Bot logged work on HDFS-16303: - Author: ASF GitHub Bot Created on: 20/Dec/21 16:06 Start Date: 20/Dec/21 16:06 Worklog Time Spent: 10m Work Description: virajjasani commented on pull request #3675: URL: https://github.com/apache/hadoop/pull/3675#issuecomment-998061174 > Unit testing failed due to unrelated flaky tests > > > [ERROR] Errors: > > [ERROR] org.apache.hadoop.hdfs.TestRollingUpgrade.testRollback(org.apache.hadoop.hdfs.TestRollingUpgrade) > > [ERROR] Run 1: TestRollingUpgrade.testRollback:329->waitForNullMxBean:361 » Timeout Timed out... > > [ERROR] Run 2: TestRollingUpgrade.testRollback:329->waitForNullMxBean:361 » Timeout Timed out... > > [ERROR] Run 3: TestRollingUpgrade.testRollback:329->waitForNullMxBean:361 » Timeout Timed out... > > [INFO] > > [WARNING] Flakes: > > [WARNING] org.apache.hadoop.hdfs.TestRollingUpgrade.testCheckpoint(org.apache.hadoop.hdfs.TestRollingUpgrade) > > [ERROR] Run 1: TestRollingUpgrade.testCheckpoint:599->testCheckpoint:686 Test resulted in an unexpected exit > > [INFO] Run 2: PASS Yeah this test failure is not relevant. Even after the recent attempt, it is still flaky, we might require better insights for this test. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 698800) Time Spent: 9h 20m (was: 9h 10m)
[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=698798&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698798 ] ASF GitHub Bot logged work on HDFS-16303: - Author: ASF GitHub Bot Created on: 20/Dec/21 16:04 Start Date: 20/Dec/21 16:04 Worklog Time Spent: 10m Work Description: virajjasani commented on pull request #3675: URL: https://github.com/apache/hadoop/pull/3675#issuecomment-998059075 Sorry, I could not get to this PR last week. I will review later this week but I don't mean to block this work. If I find something odd or something as an improvement over this, we can anyways get it clarified later on the PR/Jira or create an addendum PR later. Thanks for your work @KevinWikant, this might be really helpful going forward. With a quick glance, just one question for now: Overall it seems the goal is to improve and continue the decommissioning of healthy nodes over unhealthy ones (by removing and then re-queueing the entries), hence if a few nodes are really in a bad state (hardware/network issues), the plan is to keep re-queueing them until more nodes are getting decommissioned than max tracked nodes, right? Since an unhealthy node getting decommissioned might anyways require some sort of retry, shall we requeue them even if the condition is not met (i.e. total no of decomm in progress < max tracked nodes)? I am just thinking at high level, yet to catch up with the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
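The re-queueing idea under discussion can be sketched roughly as follows (an illustrative model of the strategy, not the actual patch): when a tracked node is detected as dead, it is moved to the back of the pending queue so a healthy node can take its tracking slot.

```python
from collections import deque

MAX_TRACKED = 100  # stand-in for the max tracked nodes limit

def monitor_tick(pending, tracked):
    """One pass of a sketched admin monitor that requeues dead nodes."""
    finished = []
    for node in list(tracked):
        tracked.remove(node)
        if node["alive"]:
            finished.append(node["name"])   # healthy node completes
        else:
            pending.append(node)            # dead node: requeue at the back
    # Refill the tracking slots from the front of the queue.
    while pending and len(tracked) < MAX_TRACKED:
        tracked.append(pending.popleft())
    return finished

# Same starting state as the bug report: 100 dead nodes ahead of 10 healthy.
pending = deque({"name": f"dead-{i}", "alive": False} for i in range(100))
pending.extend({"name": f"ok-{i}", "alive": True} for i in range(10))
tracked, done = [], []
for _ in range(5):
    done += monitor_tick(pending, tracked)
print(sorted(done))  # all 10 healthy nodes eventually decommission
```

With the requeue in place, the dead nodes cycle to the back of the queue instead of pinning the tracking slots, so the healthy nodes reach tracking and finish within a few passes.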
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 698798) Time Spent: 9h 10m (was: 9h)
[jira] [Work logged] (HDFS-16368) DFSAdmin supports refresh topology info without restarting namenode
[ https://issues.apache.org/jira/browse/HDFS-16368?focusedWorklogId=698766&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698766 ] ASF GitHub Bot logged work on HDFS-16368: - Author: ASF GitHub Bot Created on: 20/Dec/21 15:19 Start Date: 20/Dec/21 15:19 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3743: URL: https://github.com/apache/hadoop/pull/3743#issuecomment-998017917

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 49s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | buf | 0m 0s | | buf was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 12m 40s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 22m 34s | | trunk passed |
| +1 :green_heart: | compile | 5m 32s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 5m 14s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 17s | | trunk passed |
| +1 :green_heart: | mvnsite | 3m 19s | | trunk passed |
| +1 :green_heart: | javadoc | 2m 35s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 3m 16s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 7m 36s | | trunk passed |
| +1 :green_heart: | shadedclient | 21m 23s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 26s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 39s | | the patch passed |
| +1 :green_heart: | compile | 5m 32s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | cc | 5m 32s | | the patch passed |
| +1 :green_heart: | javac | 5m 32s | | the patch passed |
| +1 :green_heart: | compile | 5m 16s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | cc | 5m 16s | | the patch passed |
| +1 :green_heart: | javac | 5m 16s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 12s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3743/3/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 8 new + 456 unchanged - 0 fixed = 464 total (was 456) |
| +1 :green_heart: | mvnsite | 2m 44s | | the patch passed |
| -1 :x: | javadoc | 0m 37s | [/results-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs-client-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3743/3/artifact/out/results-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs-client-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt) | hadoop-hdfs-project_hadoop-hdfs-client-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 1 new + 98 unchanged - 1 fixed = 99 total (was 99) |
| -1 :x: | javadoc | 0m 54s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3743/3/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt) | hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04. |
| +1 :green_heart: | javadoc | 2m 51s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 7m 28s | | the patch passed |
| +1 :green_heart: | shadedclient | 20m 43s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 2m 17s | | hadoop-hdfs-client in the patch passed. |
| +1 :green_heart: | unit | 228m 52s | | hadoop-hdfs in the patch passed. |
| -1 :x: | unit | 21m 10s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs
[jira] [Work logged] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working
[ https://issues.apache.org/jira/browse/HDFS-16386?focusedWorklogId=698757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698757 ] ASF GitHub Bot logged work on HDFS-16386: - Author: ASF GitHub Bot Created on: 20/Dec/21 15:11 Start Date: 20/Dec/21 15:11 Worklog Time Spent: 10m Work Description: brahmareddybattula commented on pull request #3806: URL: https://github.com/apache/hadoop/pull/3806#issuecomment-998010239 thanks for working on this.. can you guys commit to branch-3.2.3 also..? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 698757) Time Spent: 2h 40m (was: 2.5h) > Reduce DataNode load when FsDatasetAsyncDiskService is working > -- > > Key: HDFS-16386 > URL: https://issues.apache.org/jira/browse/HDFS-16386 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.9.2 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.3 > > Attachments: monitor.png > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Our DataNode node has 36 disks. When FsDatasetAsyncDiskService is working, it > will cause a high load on the DataNode. > Here are some monitoring graphs related to memory: > !monitor.png! > Since each disk deletes blocks asynchronously, and each disk allows up to 4 > threads to work, this causes trouble for the DataNode, such as increased CPU > and memory usage. > We should appropriately reduce the total number of threads so that > the DataNode can work better. 
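The scaling problem the Jira describes can be made concrete with a small sketch (illustrative only: it assumes, as the report does, a per-volume pool of up to 4 deletion threads, so 36 disks can mean 144 threads; the `cap` parameter below is a hypothetical global bound, not a real Hadoop setting):

```python
from concurrent.futures import ThreadPoolExecutor

THREADS_PER_VOLUME = 4   # per-volume cap mirroring the report's description

def total_threads(num_volumes, per_volume=THREADS_PER_VOLUME, cap=None):
    """Worst-case async-deletion threads; `cap` sketches a global bound."""
    total = num_volumes * per_volume
    return total if cap is None else min(total, cap)

# Without a global cap, 36 disks can spawn 144 deletion threads.
print(total_threads(36))            # 144
# A shared bound keeps the load flat no matter how many disks there are.
print(total_threads(36, cap=16))    # 16

# Sketch: one shared bounded executor instead of one pool per volume.
shared_pool = ThreadPoolExecutor(max_workers=total_threads(36, cap=16))
futures = [shared_pool.submit(lambda b=b: f"deleted block {b}") for b in range(8)]
results = [f.result() for f in futures]
shared_pool.shutdown()
```

The design question is the trade-off: a shared bounded pool limits CPU/memory pressure, while per-volume pools isolate a slow disk from the others; the Jira argues the total should at least be reduced.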
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-16391?focusedWorklogId=698725&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698725 ] ASF GitHub Bot logged work on HDFS-16391: - Author: ASF GitHub Bot Created on: 20/Dec/21 14:32 Start Date: 20/Dec/21 14:32 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3820: URL: https://github.com/apache/hadoop/pull/3820#issuecomment-997976901

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 43s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 33m 10s | | trunk passed |
| +1 :green_heart: | compile | 0m 44s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 0m 38s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 0m 26s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 44s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 42s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 0m 54s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 1m 21s | | trunk passed |
| +1 :green_heart: | shadedclient | 21m 16s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 38s | | the patch passed |
| +1 :green_heart: | compile | 0m 38s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 0m 38s | | the patch passed |
| +1 :green_heart: | compile | 0m 32s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 0m 32s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 17s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 34s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 36s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 0m 47s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 1m 23s | | the patch passed |
| +1 :green_heart: | shadedclient | 20m 35s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 21m 2s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3820/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-rbf in the patch passed. |
| +1 :green_heart: | asflicense | 0m 35s | | The patch does not generate ASF License warnings. |
| | | 109m 31s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.federation.router.TestRouterRPCMultipleDestinationMountTableResolver |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3820/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3820 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 7ae99fb7e675 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 680ec9a8a2678ccf0947e9b39946592dee47f502 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results |
[jira] [Work logged] (HDFS-16168) TestHDFSFileSystemContract#testAppend fails
[ https://issues.apache.org/jira/browse/HDFS-16168?focusedWorklogId=698685&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698685 ] ASF GitHub Bot logged work on HDFS-16168: - Author: ASF GitHub Bot Created on: 20/Dec/21 13:56 Start Date: 20/Dec/21 13:56 Worklog Time Spent: 10m Work Description: secfree commented on pull request #3815: URL: https://github.com/apache/hadoop/pull/3815#issuecomment-997945343 > @secfree Thanks for your contribution, it looks good, will merge this if no other comments. BTW, as you mentioned in jira, FileSystemContractBaseTest affects all its sub classes. Maybe you can check whether other test cases except the one here are affected and resolve them if they are affected. @ferhui thanks for your suggestion. I checked all sub classes of FileSystemContractBaseTest and found one more case. Here are the details: https://issues.apache.org/jira/browse/HDFS-16392 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 698685) Time Spent: 1h (was: 50m) > TestHDFSFileSystemContract#testAppend fails > --- > > Key: HDFS-16168 > URL: https://issues.apache.org/jira/browse/HDFS-16168 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: Hui Fei >Assignee: secfree >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > [ERROR] Tests run: 46, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: > 225.478 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestHDFSFileSystemContract > [ERROR] testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time > elapsed: 30.14 s <<< ERROR! 
> org.junit.runners.model.TestTimedOutException: test timed out after 30000 milliseconds
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1002)
> at org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:938)
> at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:902)
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
> at org.apache.hadoop.hdfs.AppendTestUtil.testAppend(AppendTestUtil.java:257)
> at org.apache.hadoop.hdfs.TestHDFSFileSystemContract.testAppend(TestHDFSFileSystemContract.java:68)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
> at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.lang.Thread.run(Thread.java:748)
> [ERROR]
> testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time elapsed: 30.003 s <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 30000 milliseconds
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:502)
> at org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
> at org.apache.hadoop.ipc.Client.call(Client.java:1525)
> at org.apache.hadoop.ipc.Client.call(Client.java:1422)
> at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
> at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.j
[jira] [Work logged] (HDFS-16392) TestWebHdfsFileSystemContract#testResponseCode fails
[ https://issues.apache.org/jira/browse/HDFS-16392?focusedWorklogId=698683&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698683 ] ASF GitHub Bot logged work on HDFS-16392: - Author: ASF GitHub Bot Created on: 20/Dec/21 13:53 Start Date: 20/Dec/21 13:53 Worklog Time Spent: 10m Work Description: secfree opened a new pull request #3821: URL: https://github.com/apache/hadoop/pull/3821 ### Description of PR 1. Fix random timeout failures of TestWebHdfsFileSystemContract#testResponseCode ### How was this patch tested? 1. UT ### For code changes: - [x] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 698683) Remaining Estimate: 0h Time Spent: 10m > TestWebHdfsFileSystemContract#testResponseCode fails > > > Key: HDFS-16392 > URL: https://issues.apache.org/jira/browse/HDFS-16392 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.4.0 >Reporter: secfree >Assignee: secfree >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > We can find a lot of failed cases by searching > "TestWebHdfsFileSystemContract" in "pull requests" > (https://github.com/apache/hadoop/pulls?q=is%3Apr+is%3Aopen+TestWebHdfsFileSystemContract) > and they all have the following exception log > {code} > [ERROR] > testResponseCode(org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract) > Time elapsed: 30.019 s <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 30000 > milliseconds > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > ... > [ERROR] > org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract.testResponseCode(org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract) > [ERROR] Run 1: TestWebHdfsFileSystemContract.testResponseCode:473 » > TestTimedOut test timed o... > [ERROR] Run 2: TestWebHdfsFileSystemContract.testResponseCode:473 » > TestTimedOut test timed o... > [ERROR] Run 3: TestWebHdfsFileSystemContract.testResponseCode:473 » > TestTimedOut test timed o... > {code} > This issue has the same root cause as HDFS-16168 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16392) TestWebHdfsFileSystemContract#testResponseCode fails
[ https://issues.apache.org/jira/browse/HDFS-16392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16392: -- Labels: pull-request-available (was: ) -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16392) TestWebHdfsFileSystemContract#testResponseCode fails
secfree created HDFS-16392: -- Summary: TestWebHdfsFileSystemContract#testResponseCode fails Key: HDFS-16392 URL: https://issues.apache.org/jira/browse/HDFS-16392 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.4.0 Reporter: secfree Assignee: secfree We can find a lot of failed cases by searching "TestWebHdfsFileSystemContract" in "pull requests" (https://github.com/apache/hadoop/pulls?q=is%3Apr+is%3Aopen+TestWebHdfsFileSystemContract) and they all have the following exception log {code} [ERROR] testResponseCode(org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract) Time elapsed: 30.019 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed out after 30000 milliseconds at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ... [ERROR] org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract.testResponseCode(org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract) [ERROR] Run 1: TestWebHdfsFileSystemContract.testResponseCode:473 » TestTimedOut test timed o... [ERROR] Run 2: TestWebHdfsFileSystemContract.testResponseCode:473 » TestTimedOut test timed o... [ERROR] Run 3: TestWebHdfsFileSystemContract.testResponseCode:473 » TestTimedOut test timed o... {code} This issue has the same root cause as HDFS-16168 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-16391?focusedWorklogId=698637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698637 ] ASF GitHub Bot logged work on HDFS-16391: - Author: ASF GitHub Bot Created on: 20/Dec/21 12:42 Start Date: 20/Dec/21 12:42 Worklog Time Spent: 10m Work Description: wzhallright commented on a change in pull request #3820: URL: https://github.com/apache/hadoop/pull/3820#discussion_r772333495 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/NamenodeHeartbeatService.java ## @@ -272,7 +272,7 @@ private void updateState() { } else if (localTarget == null) { // block info available, HA status not expected LOG.debug( - "Reporting non-HA namenode as operational: " + getNamenodeDesc()); + "Reporting non-HA namenode as operational: {}", getNamenodeDesc()); Review comment: Modified, please help review again. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 698637) Time Spent: 50m (was: 40m) > Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService > --- > > Key: HDFS-16391 > URL: https://issues.apache.org/jira/browse/HDFS-16391 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: wangzhaohui >Priority: Trivial > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-16391?focusedWorklogId=698639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698639 ] ASF GitHub Bot logged work on HDFS-16391: - Author: ASF GitHub Bot Created on: 20/Dec/21 12:45 Start Date: 20/Dec/21 12:45 Worklog Time Spent: 10m Work Description: sodonnel commented on a change in pull request #3820: URL: https://github.com/apache/hadoop/pull/3820#discussion_r772334971 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/NamenodeHeartbeatService.java ## @@ -272,7 +272,7 @@ private void updateState() { } else if (localTarget == null) { // block info available, HA status not expected LOG.debug( - "Reporting non-HA namenode as operational: " + getNamenodeDesc()); + "Reporting non-HA namenode as operational: {}", getNamenodeDesc()); Review comment: The latest change looks good. Thanks for changing both the log lines. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 698639) Time Spent: 1h (was: 50m) > Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService > --- > > Key: HDFS-16391 > URL: https://issues.apache.org/jira/browse/HDFS-16391 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: wangzhaohui >Priority: Trivial > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-16391?focusedWorklogId=698630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698630 ] ASF GitHub Bot logged work on HDFS-16391: - Author: ASF GitHub Bot Created on: 20/Dec/21 12:25 Start Date: 20/Dec/21 12:25 Worklog Time Spent: 10m Work Description: sodonnel commented on a change in pull request #3820: URL: https://github.com/apache/hadoop/pull/3820#discussion_r772322657 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/NamenodeHeartbeatService.java ## @@ -272,7 +272,7 @@ private void updateState() { } else if (localTarget == null) { // block info available, HA status not expected LOG.debug( - "Reporting non-HA namenode as operational: " + getNamenodeDesc()); + "Reporting non-HA namenode as operational: {}", getNamenodeDesc()); Review comment: Note there is another debug message just above this one which looks similar and could be wrapped in a if LOG.isDebugEnabled at 270 for the same reason. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 698630) Time Spent: 40m (was: 0.5h) > Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService > --- > > Key: HDFS-16391 > URL: https://issues.apache.org/jira/browse/HDFS-16391 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: wangzhaohui >Priority: Trivial > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-16391?focusedWorklogId=698627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698627 ] ASF GitHub Bot logged work on HDFS-16391: - Author: ASF GitHub Bot Created on: 20/Dec/21 12:19 Start Date: 20/Dec/21 12:19 Worklog Time Spent: 10m Work Description: sodonnel commented on a change in pull request #3820: URL: https://github.com/apache/hadoop/pull/3820#discussion_r772319216 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/NamenodeHeartbeatService.java ## @@ -272,7 +272,7 @@ private void updateState() { } else if (localTarget == null) { // block info available, HA status not expected LOG.debug( - "Reporting non-HA namenode as operational: " + getNamenodeDesc()); + "Reporting non-HA namenode as operational: {}", getNamenodeDesc()); Review comment: I think this statement really needs wrapped in an if `(LOG.isDebugEnabled())`. The reason is that the method call `getNamenodeDesc()` needs to be evaluated whether we log or not, so its result can be passed into the `LOG.debug()` method. Inside `LOG.debug`, it will skip the logging, but by then, we have already formed the string inside `getNamenodeDesc()` and never use it. If we are passing an object into `LOG.debug`, this change would be the correct thing to do, as toString() will only get called on the object if the log message is created. However when we pass a method call into the method, it needs to be evaluated first AFAIK. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 698627) Time Spent: 0.5h (was: 20m) > Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService > --- > > Key: HDFS-16391 > URL: https://issues.apache.org/jira/browse/HDFS-16391 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: wangzhaohui >Priority: Trivial > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
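The review comment above hinges on how Java evaluates arguments: a method call passed to `LOG.debug` runs before the logger can decide to skip the message, while an object argument only has `toString()` invoked if the message is actually formatted. A minimal standalone sketch of that difference (the stub `debug()` logger and `expensiveDescription()` are hypothetical stand-ins, not Hadoop or SLF4J code):

```java
// Demonstrates eager evaluation of method-call arguments vs deferred
// toString() of object arguments in a parameterized-logging pattern.
public class DebugLogEvaluation {
    static int descCalls = 0;      // how often the expensive method ran
    static int toStringCalls = 0;  // how often the lazy object was rendered

    // Stand-in for an expensive call such as getNamenodeDesc().
    static String expensiveDescription() {
        descCalls++;
        return "nn-desc";
    }

    // Stand-in for passing an object whose toString() is only needed on log.
    static final Object LAZY_DESC = new Object() {
        @Override public String toString() {
            toStringCalls++;
            return "nn-desc";
        }
    };

    static final boolean DEBUG_ENABLED = false; // debug logging switched off

    // Simplified parameterized logger: drops the message when disabled.
    static void debug(String msg, Object arg) {
        if (DEBUG_ENABLED) {
            System.out.println(msg.replace("{}", String.valueOf(arg)));
        }
    }

    public static void main(String[] args) {
        // The method-call argument is evaluated before debug() runs,
        // even though the logger then discards the message.
        debug("Reporting non-HA namenode as operational: {}", expensiveDescription());

        // The object argument is cheap to pass; toString() is deferred
        // until the logger actually formats the message (never, here).
        debug("Reporting non-HA namenode as operational: {}", LAZY_DESC);

        System.out.println(descCalls + " " + toStringCalls); // prints "1 0"
    }
}
```

This is why wrapping the call site in `if (LOG.isDebugEnabled())` still matters when the argument is itself a method call: the guard skips the call entirely, whereas the `{}` placeholder only defers formatting.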
[jira] [Work logged] (HDFS-16382) RBF: getContentSummary RPC compute sub-directory repeatedly
[ https://issues.apache.org/jira/browse/HDFS-16382?focusedWorklogId=698626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698626 ] ASF GitHub Bot logged work on HDFS-16382: - Author: ASF GitHub Bot Created on: 20/Dec/21 12:16 Start Date: 20/Dec/21 12:16 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3797: URL: https://github.com/apache/hadoop/pull/3797#issuecomment-997872501 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 50s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 33m 44s | | trunk passed | | +1 :green_heart: | compile | 0m 43s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 0m 39s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 0m 27s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 44s | | trunk passed | | +1 :green_heart: | javadoc | 0m 43s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 0m 58s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 1m 20s | | trunk passed | | +1 :green_heart: | shadedclient | 22m 10s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 35s | | the patch passed | | +1 :green_heart: | compile | 0m 39s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 0m 39s | | the patch passed | | +1 :green_heart: | compile | 0m 36s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 0m 36s | | the patch passed | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3797/3/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | -0 :warning: | checkstyle | 0m 35s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3797/3/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 33 new + 2 unchanged - 0 fixed = 35 total (was 2) | | +1 :green_heart: | mvnsite | 0m 39s | | the patch passed | | +1 :green_heart: | javadoc | 0m 35s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 0m 53s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 1m 27s | | the patch passed | | +1 :green_heart: | shadedclient | 23m 48s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 25m 5s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3797/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 0m 35s | | The patch does not generate ASF License warnings. 
| | | | 119m 31s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.federation.router.TestRouterRPCMultipleDestinationMountTableResolver | | | hadoop.hdfs.rbfbalance.TestRouterDistCpProcedure | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3797/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3797 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 36d591ffdb55 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | m
[jira] [Resolved] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working
[ https://issues.apache.org/jira/browse/HDFS-16386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen O'Donnell resolved HDFS-16386. -- Resolution: Fixed > Reduce DataNode load when FsDatasetAsyncDiskService is working > -- > > Key: HDFS-16386 > URL: https://issues.apache.org/jira/browse/HDFS-16386 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode > Affects Versions: 2.9.2 > Reporter: JiangHua Zhu > Assignee: JiangHua Zhu > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.3 > > Attachments: monitor.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > Our DataNode nodes each have 36 disks. When FsDatasetAsyncDiskService is working, it > causes a high load on the DataNode. > Here is some memory-related monitoring: > !monitor.png! > Since blocks on each disk are deleted asynchronously, and each disk allows up to 4 > worker threads, > this causes some trouble for the DataNode, such as increased CPU and > memory usage. > We should appropriately reduce the total number of worker threads so that > the DataNode can work better.
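The issue above is about bounding the number of asynchronous per-disk deletion threads. A minimal sketch of why a fixed-size pool caps concurrent load (the cap of 2 and the sleep-based fake deletion task are illustrative only; this is not the actual FsDatasetAsyncDiskService code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedDeletePool {
    // Runs `tasks` fake deletions on a pool capped at `maxThreads`
    // and returns the peak number of tasks running at once.
    static int peakConcurrency(int maxThreads, int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);   // record high-water mark
                try { Thread.sleep(20); } catch (InterruptedException ignored) { }
                running.decrementAndGet();
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // With 36 disks and 4 threads each, up to 144 deletion threads could
        // run at once; capping the per-disk pool caps that aggregate load.
        System.out.println("peak = " + peakConcurrency(2, 8)); // never above 2
    }
}
```

The trade-off the ticket describes is exactly this: a smaller per-disk cap lowers CPU and memory pressure at the cost of slower block deletion throughput.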
[jira] [Updated] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working
[ https://issues.apache.org/jira/browse/HDFS-16386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen O'Donnell updated HDFS-16386: - Fix Version/s: 3.4.0 3.2.4 3.3.3 > Reduce DataNode load when FsDatasetAsyncDiskService is working > -- > > Key: HDFS-16386 > URL: https://issues.apache.org/jira/browse/HDFS-16386 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode > Affects Versions: 2.9.2 > Reporter: JiangHua Zhu > Assignee: JiangHua Zhu > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.3 > > Attachments: monitor.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > Our DataNode nodes each have 36 disks. When FsDatasetAsyncDiskService is working, it > causes a high load on the DataNode. > Here is some memory-related monitoring: > !monitor.png! > Since blocks on each disk are deleted asynchronously, and each disk allows up to 4 > worker threads, > this causes some trouble for the DataNode, such as increased CPU and > memory usage. > We should appropriately reduce the total number of worker threads so that > the DataNode can work better.
[jira] [Work logged] (HDFS-16389) Improve NNThroughputBenchmark test mkdirs
[ https://issues.apache.org/jira/browse/HDFS-16389?focusedWorklogId=698597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698597 ] ASF GitHub Bot logged work on HDFS-16389: - Author: ASF GitHub Bot Created on: 20/Dec/21 11:34 Start Date: 20/Dec/21 11:34 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3819: URL: https://github.com/apache/hadoop/pull/3819#issuecomment-997844668 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 40s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 32m 35s | | trunk passed | | +1 :green_heart: | compile | 1m 28s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 1m 21s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 1m 2s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 27s | | trunk passed | | +1 :green_heart: | javadoc | 1m 2s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 35s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 3m 11s | | trunk passed | | +1 :green_heart: | shadedclient | 22m 27s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 15s | | the patch passed | | +1 :green_heart: | compile | 1m 17s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 1m 17s | | the patch passed | | +1 :green_heart: | compile | 1m 12s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 1m 12s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 52s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 17s | | the patch passed | | +1 :green_heart: | javadoc | 0m 50s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 24s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 3m 10s | | the patch passed | | +1 :green_heart: | shadedclient | 22m 3s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 229m 46s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 43s | | The patch does not generate ASF License warnings. 
| | | | 328m 38s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3819/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3819 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 1e61f3198ab1 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 3229263b630a4801cd4f78e2ad5c4cb0fe4654be | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3819/1/testReport/ | | Max. process+thread count | 3323 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3819/1/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org | This mes
[jira] [Work logged] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working
[ https://issues.apache.org/jira/browse/HDFS-16386?focusedWorklogId=698594&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698594 ] ASF GitHub Bot logged work on HDFS-16386: - Author: ASF GitHub Bot Created on: 20/Dec/21 11:29 Start Date: 20/Dec/21 11:29 Worklog Time Spent: 10m Work Description: sodonnel merged pull request #3806: URL: https://github.com/apache/hadoop/pull/3806 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 698594) Time Spent: 2.5h (was: 2h 20m) > Reduce DataNode load when FsDatasetAsyncDiskService is working > -- > > Key: HDFS-16386 > URL: https://issues.apache.org/jira/browse/HDFS-16386 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode > Affects Versions: 2.9.2 > Reporter: JiangHua Zhu > Assignee: JiangHua Zhu > Priority: Major > Labels: pull-request-available > Attachments: monitor.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > Our DataNode nodes each have 36 disks. When FsDatasetAsyncDiskService is working, it > causes a high load on the DataNode. > Here is some memory-related monitoring: > !monitor.png! > Since blocks on each disk are deleted asynchronously, and each disk allows up to 4 > worker threads, > this causes some trouble for the DataNode, such as increased CPU and > memory usage. > We should appropriately reduce the total number of worker threads so that > the DataNode can work better.
[jira] [Work logged] (HDFS-16390) Enhance ErasureCodeBenchmarkThroughput for support random read and make buffer size customizable
[ https://issues.apache.org/jira/browse/HDFS-16390?focusedWorklogId=698586&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698586 ] ASF GitHub Bot logged work on HDFS-16390: - Author: ASF GitHub Bot Created on: 20/Dec/21 11:14 Start Date: 20/Dec/21 11:14 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3818: URL: https://github.com/apache/hadoop/pull/3818#issuecomment-997829494 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 43s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 1s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 32m 22s | | trunk passed | | +1 :green_heart: | compile | 1m 26s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 1m 21s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 1m 2s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 30s | | trunk passed | | +1 :green_heart: | javadoc | 1m 2s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 34s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 3m 10s | | trunk passed | | +1 :green_heart: | shadedclient | 22m 18s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 17s | | the patch passed | | +1 :green_heart: | compile | 1m 16s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 1m 16s | | the patch passed | | +1 :green_heart: | compile | 1m 11s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 1m 11s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 51s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3818/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 20 unchanged - 0 fixed = 22 total (was 20) | | +1 :green_heart: | mvnsite | 1m 18s | | the patch passed | | +1 :green_heart: | javadoc | 0m 49s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 23s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 3m 14s | | the patch passed | | +1 :green_heart: | shadedclient | 22m 3s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 226m 21s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 44s | | The patch does not generate ASF License warnings. 
| | | | 324m 57s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3818/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3818 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux d2952bfabf37 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 10732107c8220d5e2befdf9c7cf9b290d7fc9201 | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3818/1/testReport/ | | Max. process+thread count | 3206 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hado
[jira] [Commented] (HDFS-16168) TestHDFSFileSystemContract#testAppend fails
[ https://issues.apache.org/jira/browse/HDFS-16168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462495#comment-17462495 ] Hui Fei commented on HDFS-16168: Merged. [~secfree.teng]Thanks for your contribution. > TestHDFSFileSystemContract#testAppend fails > --- > > Key: HDFS-16168 > URL: https://issues.apache.org/jira/browse/HDFS-16168 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: Hui Fei >Assignee: secfree.teng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > [ERROR] Tests run: 46, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: > 225.478 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestHDFSFileSystemContract > [ERROR] testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time > elapsed: 30.14 s <<< ERROR! org.junit.runners.model.TestTimedOutException: > test timed out after 3 milliseconds at java.lang.Thread.sleep(Native > Method) at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1002) > at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:938) > at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:902) > at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850) at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77) > at > org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at > org.apache.hadoop.hdfs.AppendTestUtil.testAppend(AppendTestUtil.java:257) at > org.apache.hadoop.hdfs.TestHDFSFileSystemContract.testAppend(TestHDFSFileSystemContract.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) [ERROR] > testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time elapsed: > 30.003 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed > out after 3 milliseconds at java.lang.Object.wait(Native Method) at > java.lang.Object.wait(Object.java:502) at > org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59) at > org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567) at > org.apache.hadoop.ipc.Client.call(Client.java:1525) at > org.apache.hadoop.ipc.Client.call(Client.java:1422) at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129) > at com.sun.proxy.$Proxy25.append(Unknown Source) at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:415) > at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) > at com.sun.proxy.$Proxy26.append(Unknown Source) at > org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1385) at > org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1407) at > org.apache.hadoop.hdfs.DFSClient.append(DFSClient.jav
[jira] [Commented] (HDFS-16368) DFSAdmin supports refresh topology info without restarting namenode
[ https://issues.apache.org/jira/browse/HDFS-16368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462496#comment-17462496 ] Hui Fei commented on HDFS-16368: Is it the same as HDFS-11242? > DFSAdmin supports refresh topology info without restarting namenode > > > Key: HDFS-16368 > URL: https://issues.apache.org/jira/browse/HDFS-16368 > Project: Hadoop HDFS > Issue Type: New Feature > Components: dfsadmin, namenode > Affects Versions: 2.7.7, 3.3.1 > Reporter: zhanghaobo > Priority: Major > Labels: features, pull-request-available > Attachments: 0001.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently in HDFS, if we update the rack info for rack-awareness, we may need > to rolling-restart the namenodes for it to take effect. If the cluster is large, the > rolling restart of the namenodes takes a very long time. So, we developed a method > to refresh topology info without rolling-restarting the namenodes.
[jira] [Resolved] (HDFS-16168) TestHDFSFileSystemContract#testAppend fails
[ https://issues.apache.org/jira/browse/HDFS-16168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui Fei resolved HDFS-16168. Fix Version/s: 3.4.0 Resolution: Fixed > TestHDFSFileSystemContract#testAppend fails > --- > > Key: HDFS-16168 > URL: https://issues.apache.org/jira/browse/HDFS-16168 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: Hui Fei >Assignee: secfree.teng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > [ERROR] Tests run: 46, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: > 225.478 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestHDFSFileSystemContract > [ERROR] testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time > elapsed: 30.14 s <<< ERROR! org.junit.runners.model.TestTimedOutException: > test timed out after 3 milliseconds at java.lang.Thread.sleep(Native > Method) at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1002) > at > org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:938) > at org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:902) > at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850) at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77) > at > org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at > org.apache.hadoop.hdfs.AppendTestUtil.testAppend(AppendTestUtil.java:257) at > org.apache.hadoop.hdfs.TestHDFSFileSystemContract.testAppend(TestHDFSFileSystemContract.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) [ERROR] > testAppend(org.apache.hadoop.hdfs.TestHDFSFileSystemContract) Time elapsed: > 30.003 s <<< ERROR! org.junit.runners.model.TestTimedOutException: test timed > out after 3 milliseconds at java.lang.Object.wait(Native Method) at > java.lang.Object.wait(Object.java:502) at > org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59) at > org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567) at > org.apache.hadoop.ipc.Client.call(Client.java:1525) at > org.apache.hadoop.ipc.Client.call(Client.java:1422) at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129) > at com.sun.proxy.$Proxy25.append(Unknown Source) at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:415) > at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) > at com.sun.proxy.$Proxy26.append(Unknown Source) at > org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1385) at > org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1407) at > org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1476) at > org.apache.hadoop.hdfs.DFSClient.append(DFSCl
[jira] [Assigned] (HDFS-16168) TestHDFSFileSystemContract#testAppend fails
[ https://issues.apache.org/jira/browse/HDFS-16168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui Fei reassigned HDFS-16168: -- Assignee: secfree.teng > TestHDFSFileSystemContract#testAppend fails > --- > > Key: HDFS-16168 > URL: https://issues.apache.org/jira/browse/HDFS-16168 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: Hui Fei >Assignee: secfree.teng >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h
[jira] [Work logged] (HDFS-16168) TestHDFSFileSystemContract#testAppend fails
[ https://issues.apache.org/jira/browse/HDFS-16168?focusedWorklogId=698542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698542 ] ASF GitHub Bot logged work on HDFS-16168: - Author: ASF GitHub Bot Created on: 20/Dec/21 10:16 Start Date: 20/Dec/21 10:16 Worklog Time Spent: 10m Work Description: ferhui commented on pull request #3815: URL: https://github.com/apache/hadoop/pull/3815#issuecomment-997786853 @secfree Thanks for your contribution. @ayushtkn Thanks for your review! Issue Time Tracking --- Worklog Id: (was: 698542) Time Spent: 50m (was: 40m) > TestHDFSFileSystemContract#testAppend fails > --- > > Key: HDFS-16168 > URL: https://issues.apache.org/jira/browse/HDFS-16168 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: Hui Fei >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h
[jira] [Work logged] (HDFS-16168) TestHDFSFileSystemContract#testAppend fails
[ https://issues.apache.org/jira/browse/HDFS-16168?focusedWorklogId=698541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698541 ] ASF GitHub Bot logged work on HDFS-16168: - Author: ASF GitHub Bot Created on: 20/Dec/21 10:16 Start Date: 20/Dec/21 10:16 Worklog Time Spent: 10m Work Description: ferhui merged pull request #3815: URL: https://github.com/apache/hadoop/pull/3815 Issue Time Tracking --- Worklog Id: (was: 698541) Time Spent: 40m (was: 0.5h) > TestHDFSFileSystemContract#testAppend fails > --- > > Key: HDFS-16168 > URL: https://issues.apache.org/jira/browse/HDFS-16168 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: Hui Fei >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h
[jira] [Work logged] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working
[ https://issues.apache.org/jira/browse/HDFS-16386?focusedWorklogId=698528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698528 ] ASF GitHub Bot logged work on HDFS-16386: - Author: ASF GitHub Bot Created on: 20/Dec/21 10:01 Start Date: 20/Dec/21 10:01 Worklog Time Spent: 10m Work Description: jojochuang commented on pull request #3806: URL: https://github.com/apache/hadoop/pull/3806#issuecomment-997773306 LGTM. For what it's worth, we don't need two committers to approve a PR :) Stephen alone is a gold standard. Issue Time Tracking --- Worklog Id: (was: 698528) Time Spent: 2h 20m (was: 2h 10m) > Reduce DataNode load when FsDatasetAsyncDiskService is working > -- > > Key: HDFS-16386 > URL: https://issues.apache.org/jira/browse/HDFS-16386 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.9.2 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Major > Labels: pull-request-available > Attachments: monitor.png > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Our DataNode has 36 disks. When FsDatasetAsyncDiskService is working, it > causes a high load on the DataNode. > Here is some memory-related monitoring: > !monitor.png! > Since each disk deletes blocks asynchronously, and each disk is allowed up to 4 worker > threads, this causes trouble for the DataNode, such as increased CPU and > increased memory usage. > We should appropriately cap the total number of threads so that > the DataNode can work better.
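The improvement discussed in HDFS-16386 amounts to capping deletion threads across many volumes. A minimal sketch of that idea (illustrative Python with hypothetical names; FsDatasetAsyncDiskService itself is Java and sizes a pool per volume): funnel all volumes through one bounded executor so the node-wide thread count stays fixed no matter how many disks exist.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class BoundedAsyncDiskService:
    """Sketch: one shared, bounded pool for all volumes, instead of
    up to 4 deletion threads per volume (4 x 36 disks = 144 threads)."""

    def __init__(self, max_total_threads=8):
        self._pool = ThreadPoolExecutor(max_workers=max_total_threads)
        self._lock = threading.Lock()
        self._active = 0
        self.peak_active = 0  # observed concurrency, for demonstration

    def delete_async(self, volume, block_id):
        # All volumes submit into the same executor, so total
        # concurrency is capped at max_total_threads.
        return self._pool.submit(self._delete, volume, block_id)

    def _delete(self, volume, block_id):
        with self._lock:
            self._active += 1
            self.peak_active = max(self.peak_active, self._active)
        try:
            pass  # stand-in for removing the block file from disk
        finally:
            with self._lock:
                self._active -= 1

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

Submitting 200 deletions spread over 36 volumes never exceeds 8 concurrent workers, which is the trade-off the Jira proposes: slower background deletion in exchange for lower CPU and memory pressure.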
[jira] [Work logged] (HDFS-16382) RBF: getContentSummary RPC compute sub-directory repeatedly
[ https://issues.apache.org/jira/browse/HDFS-16382?focusedWorklogId=698527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698527 ] ASF GitHub Bot logged work on HDFS-16382: - Author: ASF GitHub Bot Created on: 20/Dec/21 09:58 Start Date: 20/Dec/21 09:58 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3797: URL: https://github.com/apache/hadoop/pull/3797#issuecomment-997770842 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 37s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 32m 19s | | trunk passed | | +1 :green_heart: | compile | 0m 42s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 0m 39s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 0m 29s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 46s | | trunk passed | | +1 :green_heart: | javadoc | 0m 38s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 2s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 1m 24s | | trunk passed | | +1 :green_heart: | shadedclient | 21m 6s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 37s | | the patch passed | | +1 :green_heart: | compile | 0m 38s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 0m 38s | | the patch passed | | +1 :green_heart: | compile | 0m 32s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 0m 32s | | the patch passed | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3797/2/artifact/out/blanks-eol.txt) | The patch has 11 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | -0 :warning: | checkstyle | 0m 18s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3797/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 36 new + 2 unchanged - 0 fixed = 38 total (was 2) | | +1 :green_heart: | mvnsite | 0m 35s | | the patch passed | | +1 :green_heart: | javadoc | 0m 33s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 0m 52s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 1m 20s | | the patch passed | | +1 :green_heart: | shadedclient | 20m 58s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 21m 32s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3797/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 0m 32s | | The patch does not generate ASF License warnings. 
| | | | 109m 37s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.federation.router.TestRouterRPCMultipleDestinationMountTableResolver | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3797/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3797 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 266c55444ff7 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | |
[jira] [Commented] (HDFS-16382) RBF: getContentSummary RPC compute sub-directory repeatedly
[ https://issues.apache.org/jira/browse/HDFS-16382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17462447#comment-17462447 ] zhanghaobo commented on HDFS-16382: --- Hi, [~ayushtkn] [~elgoiri], thanks for replying. I think in the common case we won't set up mappings like /A/B to ns1 /A, but we may set up mappings like /A/B to ns1 /A/B. In other words, users prefer the src path and dest path to look similar. > RBF: getContentSummary RPC compute sub-directory repeatedly > --- > > Key: HDFS-16382 > URL: https://issues.apache.org/jira/browse/HDFS-16382 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.3.1 >Reporter: zhanghaobo >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HDFS-16382.-RBF-getContentSummary-RPC-compute-sub-di.patch > > Time Spent: 50m > Remaining Estimate: 0h > > The Router getContentSummary RPC counts sub-directories repeatedly when a > directory and its ancestor directory are both mounted in the form of the > original src path. > For example, suppose we have the mount table entries below: > /A---ns1---/A > /A/B—ns1,ns2—/A/B > If we put a file test.txt into directory /A/B in namespace ns1 and then execute `hdfs > dfs -count hdfs://router:/A`, the result is wrong, because /A/B/test.txt is > counted repeatedly
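The double counting described in HDFS-16382 can be avoided by skipping destinations that are nested under another destination in the same nameservice, since the ancestor's summary already includes them. A sketch of that pruning rule (illustrative Python only, not the Router's actual code):

```python
def prune_nested_destinations(destinations):
    """Drop any (nameservice, path) destination that lies under another
    destination of the same nameservice: querying the ancestor already
    counts everything beneath it, so keeping both double-counts."""
    pruned = []
    for ns, path in destinations:
        covered = any(
            ns == other_ns
            and path != other_path
            and path.startswith(other_path.rstrip("/") + "/")
            for other_ns, other_path in destinations
        )
        if not covered:
            pruned.append((ns, path))
    return pruned

# Mount table from the issue description: /A -> ns1:/A and
# /A/B -> ns1:/A/B, ns2:/A/B.  ns1:/A/B is pruned because ns1:/A
# already covers it; ns2:/A/B survives, since ns2:/A is not queried.
dests = [("ns1", "/A"), ("ns1", "/A/B"), ("ns2", "/A/B")]
```

Note the trailing-"/" handling in the prefix check: it keeps /AB from being treated as a child of /A.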
[jira] [Work logged] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-16391?focusedWorklogId=698502&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698502 ] ASF GitHub Bot logged work on HDFS-16391: - Author: ASF GitHub Bot Created on: 20/Dec/21 08:20 Start Date: 20/Dec/21 08:20 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3820: URL: https://github.com/apache/hadoop/pull/3820#issuecomment-997697220 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 52s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 36m 16s | | trunk passed | | +1 :green_heart: | compile | 0m 42s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 0m 35s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 0m 24s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 40s | | trunk passed | | +1 :green_heart: | javadoc | 0m 42s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 0m 53s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 1m 19s | | trunk passed | | +1 :green_heart: | shadedclient | 23m 22s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 34s | | the patch passed | | +1 :green_heart: | compile | 0m 35s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 0m 35s | | the patch passed | | +1 :green_heart: | compile | 0m 30s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 0m 30s | | the patch passed | | +1 :green_heart: | blanks | 0m 1s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 16s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 34s | | the patch passed | | +1 :green_heart: | javadoc | 0m 31s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 0m 49s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 1m 21s | | the patch passed | | +1 :green_heart: | shadedclient | 23m 33s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 37m 29s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3820/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 0m 32s | | The patch does not generate ASF License warnings. 
| | | | 133m 57s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.federation.router.TestRouterRPCMultipleDestinationMountTableResolver | | | hadoop.hdfs.rbfbalance.TestRouterDistCpProcedure | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3820/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3820 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 05506f034149 4.15.0-162-generic #170-Ubuntu SMP Mon Oct 18 11:38:05 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / f5b7a1801176c695a249801d079ac558e2dea453 | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.
[jira] [Updated] (HDFS-16382) RBF: getContentSummary RPC compute sub-directory repeatedly
[ https://issues.apache.org/jira/browse/HDFS-16382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhanghaobo updated HDFS-16382: -- Attachment: 0001-HDFS-16382.-RBF-getContentSummary-RPC-compute-sub-di.patch > RBF: getContentSummary RPC compute sub-directory repeatedly > --- > > Key: HDFS-16382 > URL: https://issues.apache.org/jira/browse/HDFS-16382 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.3.1 >Reporter: zhanghaobo >Priority: Major > Labels: pull-request-available > Attachments: > 0001-HDFS-16382.-RBF-getContentSummary-RPC-compute-sub-di.patch > > Time Spent: 50m > Remaining Estimate: 0h
[jira] [Work logged] (HDFS-16382) RBF: getContentSummary RPC compute sub-directory repeatedly
[ https://issues.apache.org/jira/browse/HDFS-16382?focusedWorklogId=698494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698494 ] ASF GitHub Bot logged work on HDFS-16382: - Author: ASF GitHub Bot Created on: 20/Dec/21 08:10 Start Date: 20/Dec/21 08:10 Worklog Time Spent: 10m Work Description: hfutatzhanghb commented on pull request #3797: URL: https://github.com/apache/hadoop/pull/3797#issuecomment-997689445 Added some unit tests for this patch. Issue Time Tracking --- Worklog Id: (was: 698494) Time Spent: 50m (was: 40m) > RBF: getContentSummary RPC compute sub-directory repeatedly > --- > > Key: HDFS-16382 > URL: https://issues.apache.org/jira/browse/HDFS-16382 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.3.1 >Reporter: zhanghaobo >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h