[jira] [Work logged] (HDFS-16446) Consider ioutils of disk when choosing volume
[ https://issues.apache.org/jira/browse/HDFS-16446?focusedWorklogId=746947=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746947 ] ASF GitHub Bot logged work on HDFS-16446: - Author: ASF GitHub Bot Created on: 24/Mar/22 04:14 Start Date: 24/Mar/22 04:14 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3960: URL: https://github.com/apache/hadoop/pull/3960#issuecomment-1077050201 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 0s | | Docker mode activated. | | -1 :x: | docker | 14m 54s | | Docker failed to build yetus/hadoop:13467f45240. | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/3960 | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3960/4/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 746947) Time Spent: 1h 50m (was: 1h 40m) > Consider ioutils of disk when choosing volume > - > > Key: HDFS-16446 > URL: https://issues.apache.org/jira/browse/HDFS-16446 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Attachments: image-2022-02-05-09-50-12-241.png > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Consider ioutils of disk when choosing volume. > Principle is as follows: > !image-2022-02-05-09-50-12-241.png|width=309,height=159! 
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
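The idea in HDFS-16446 above — factoring disk I/O utilization into volume choice alongside available space — can be sketched with a toy chooser. Everything below (the Volume class, the ioUtil field, and the freeBytes * (1 - ioUtil) score) is a hypothetical illustration, not the actual FsVolumeSpi/VolumeChoosingPolicy API:

```java
import java.util.List;

/**
 * Illustrative sketch only: choose the volume with the highest score, where
 * the score combines free space with how idle the disk currently is.
 * Volume and its fields are hypothetical stand-ins for DataNode volumes.
 */
class IoUtilAwareChooser {
  static class Volume {
    final String name;
    final long freeBytes;
    final double ioUtil; // fraction of time the disk was busy, in [0, 1]

    Volume(String name, long freeBytes, double ioUtil) {
      this.name = name;
      this.freeBytes = freeBytes;
      this.ioUtil = ioUtil;
    }
  }

  /** Returns the volume maximizing freeBytes * (1 - ioUtil), or null if empty. */
  static Volume choose(List<Volume> volumes) {
    Volume best = null;
    double bestScore = -1.0;
    for (Volume v : volumes) {
      double score = v.freeBytes * (1.0 - v.ioUtil);
      if (score > bestScore) {
        bestScore = score;
        best = v;
      }
    }
    return best;
  }
}
```

Under this scoring, a busy disk with more free space can lose to an idle disk with somewhat less space, which is roughly the trade-off the issue proposes.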
[jira] [Work logged] (HDFS-16446) Consider ioutils of disk when choosing volume
[ https://issues.apache.org/jira/browse/HDFS-16446?focusedWorklogId=746946=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746946 ] ASF GitHub Bot logged work on HDFS-16446: - Author: ASF GitHub Bot Created on: 24/Mar/22 04:08 Start Date: 24/Mar/22 04:08 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3960: URL: https://github.com/apache/hadoop/pull/3960#issuecomment-1077047322 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 0s | | Docker mode activated. | | -1 :x: | docker | 19m 36s | | Docker failed to build yetus/hadoop:13467f45240. | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/3960 | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3960/3/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 746946) Time Spent: 1h 40m (was: 1.5h) > Consider ioutils of disk when choosing volume > - > > Key: HDFS-16446 > URL: https://issues.apache.org/jira/browse/HDFS-16446 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Attachments: image-2022-02-05-09-50-12-241.png > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Consider ioutils of disk when choosing volume. > Principle is as follows: > !image-2022-02-05-09-50-12-241.png|width=309,height=159! 
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16519) Add throttler to EC reconstruction
[ https://issues.apache.org/jira/browse/HDFS-16519?focusedWorklogId=746941=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746941 ] ASF GitHub Bot logged work on HDFS-16519: - Author: ASF GitHub Bot Created on: 24/Mar/22 03:51 Start Date: 24/Mar/22 03:51 Worklog Time Spent: 10m Work Description: cndaimin opened a new pull request #4101: URL: https://github.com/apache/hadoop/pull/4101 HDFS already has throttlers for data transfer (replication) and the balancer; these throttlers reduce the impact of background procedures on user reads and writes. We should add a throttler to EC background reconstruction too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 746941) Remaining Estimate: 0h Time Spent: 10m > Add throttler to EC reconstruction > -- > > Key: HDFS-16519 > URL: https://issues.apache.org/jira/browse/HDFS-16519 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, ec >Affects Versions: 3.3.1, 3.3.2 >Reporter: daimin >Assignee: daimin >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > HDFS already has throttlers for data transfer (replication) and the balancer; > these throttlers reduce the impact of background procedures on user > reads and writes. > We should add a throttler to EC background reconstruction too. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16519) Add throttler to EC reconstruction
[ https://issues.apache.org/jira/browse/HDFS-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16519: -- Labels: pull-request-available (was: ) > Add throttler to EC reconstruction > -- > > Key: HDFS-16519 > URL: https://issues.apache.org/jira/browse/HDFS-16519 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, ec >Affects Versions: 3.3.1, 3.3.2 >Reporter: daimin >Assignee: daimin >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HDFS already has throttlers for data transfer (replication) and the balancer; > these throttlers reduce the impact of background procedures on user > reads and writes. > We should add a throttler to EC background reconstruction too. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16519) Add throttler to EC reconstruction
daimin created HDFS-16519: - Summary: Add throttler to EC reconstruction Key: HDFS-16519 URL: https://issues.apache.org/jira/browse/HDFS-16519 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, ec Affects Versions: 3.3.2, 3.3.1 Reporter: daimin Assignee: daimin HDFS already has throttlers for data transfer (replication) and the balancer; these throttlers reduce the impact of background procedures on user reads and writes. We should add a throttler to EC background reconstruction too. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
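The throttler HDFS-16519 asks for can be sketched as a period-based byte budget, in the same spirit as the throttling HDFS already applies to replication and balancing. This is a simplified stand-alone sketch, not Hadoop's actual DataTransferThrottler class:

```java
/**
 * Simplified throttler sketch: callers report how many bytes they just
 * transferred, and throttle() blocks once the per-period byte budget is
 * exhausted, spreading the transfer out to the target bandwidth.
 */
class SimpleThrottler {
  private final long periodMillis = 500;      // accounting window
  private final long bytesPerPeriod;          // byte budget per window
  private long periodStart = System.currentTimeMillis();
  private long bytesThisPeriod = 0;

  SimpleThrottler(long bytesPerSecond) {
    this.bytesPerPeriod = bytesPerSecond * periodMillis / 1000;
  }

  /** Account for numBytes; block if the current window's budget is spent. */
  synchronized void throttle(long numBytes) {
    bytesThisPeriod += numBytes;
    while (bytesThisPeriod > bytesPerPeriod) {
      long elapsed = System.currentTimeMillis() - periodStart;
      if (elapsed < periodMillis) {
        try {
          wait(periodMillis - elapsed); // sleep out the rest of the window
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          return;
        }
      } else {
        // Window over: open a new one, carrying the surplus bytes forward.
        bytesThisPeriod -= bytesPerPeriod;
        periodStart = System.currentTimeMillis();
      }
    }
  }
}
```

An EC reconstruction task would then call throttle(n) after each n-byte read or write of reconstructed data, so reconstruction cannot crowd out client I/O.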
[jira] [Commented] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511575#comment-17511575 ] tomscut commented on HDFS-13671: Hi [~max2049], we are still using CMS on a cluster without EC data; some parameter adjustment should be able to solve this problem. Also, how long is your FBR period? If it is 6 hours (the default) and the cluster is large, it may have an impact on GC. We set this to 3 days. We use G1GC on a cluster that has this feature and uses EC data. The main parameters (OpenJDK 1.8) are as follows: {code:java} -server -Xmx200g -Xms200g -XX:MaxDirectMemorySize=2g -XX:MaxMetaspaceSize=2g -XX:MetaspaceSize=1g -XX:+UseG1GC -XX:+UnlockExperimentalVMOptions -XX:InitiatingHeapOccupancyPercent=75 -XX:G1NewSizePercent=0 -XX:G1MaxNewSizePercent=3 -XX:SurvivorRatio=2 -XX:+DisableExplicitGC -XX:MaxTenuringThreshold=15 -XX:-UseBiasedLocking -XX:ParallelGCThreads=40 -XX:ConcGCThreads=20 -XX:MaxJavaStackTraceDepth=100 -XX:MaxGCPauseMillis=200 -verbose:gc -XX:+UnlockDiagnosticVMOptions -XX:+PrintGCDetails -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCCause -XX:+PrintGCDateStamps -XX:+PrintReferenceGC -XX:+PrintHeapAtGC -XX:+PrintAdaptiveSizePolicy -XX:+G1PrintHeapRegions -XX:+PrintTenuringDistribution -Xloggc:/data1/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'`" {code} > Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet > -- > > Key: HDFS-13671 > URL: https://issues.apache.org/jira/browse/HDFS-13671 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0, 3.0.3 >Reporter: Yiqun Lin >Assignee: Haibin Huang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, > image-2021-06-10-19-28-58-359.png, image-2021-06-18-15-46-46-052.png, > image-2021-06-18-15-47-04-037.png > > Time Spent: 7h 40m > Remaining Estimate: 0h > > NameNode hung when deleting large files/blocks. 
The stack info: > {code} > "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 > tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > {code} > In the current 
deletion logic in NameNode, there are mainly two steps: > * Collect INodes and all blocks to be deleted, then delete INodes. > * Remove blocks chunk by chunk in a loop. > Actually the first step should be the more expensive operation and should take > more time. However, we now always see the NN hang during the remove-block > operation. > Looking into this: we introduced a new structure, {{FoldedTreeSet}}, to get > better performance in handling FBRs/IBRs. But compared with the early > implementation of the remove-block logic, {{FoldedTreeSet}} seems slower > since it takes additional time to balance the tree
[jira] [Commented] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511573#comment-17511573 ] Max Xie commented on HDFS-13671: - [~tomscut] Hi, would you share the G1 GC parameters you use for the Namenode? After our cluster started using branch 3.3.0 with this patch (no EC data for now), GC performance became poor. Thank you. > Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet > -- > > Key: HDFS-13671 > URL: https://issues.apache.org/jira/browse/HDFS-13671 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0, 3.0.3 >Reporter: Yiqun Lin >Assignee: Haibin Huang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, > image-2021-06-10-19-28-58-359.png, image-2021-06-18-15-46-46-052.png, > image-2021-06-18-15-47-04-037.png > > Time Spent: 7h 40m > Remaining Estimate: 0h > > NameNode hung when deleting large files/blocks. The stack info: > {code} > "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 > tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617) > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > {code} > In the current deletion logic in NameNode, there are mainly two steps: > * Collect INodes and all blocks to be deleted, then delete INodes. > * Remove blocks chunk by chunk in a loop. > Actually the first step should be a more expensive operation and will takes > more time. However, now we always see NN hangs during the remove block > operation. > Looking into this, we introduced a new structure {{FoldedTreeSet}} to have a > better performance in dealing FBR/IBRs. But compared with early > implementation in remove-block logic, {{FoldedTreeSet}} seems more slower > since It will take additional time to balance tree node. When there are large > block to be removed/deleted, it looks bad. > For the get type operations in {{DatanodeStorageInfo}}, we only provide the > {{getBlockIterator}} to return blocks iterator and no other get operation > with specified block. Still we need to use {{FoldedTreeSet}} in > {{DatanodeStorageInfo}}? 
As we know, {{FoldedTreeSet}} benefits Get rather than > Update. Maybe we can revert this to the early implementation. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16498) Fix NPE for checkBlockReportLease
[ https://issues.apache.org/jira/browse/HDFS-16498?focusedWorklogId=746935=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746935 ] ASF GitHub Bot logged work on HDFS-16498: - Author: ASF GitHub Bot Created on: 24/Mar/22 03:24 Start Date: 24/Mar/22 03:24 Worklog Time Spent: 10m Work Description: tomscut commented on pull request #4057: URL: https://github.com/apache/hadoop/pull/4057#issuecomment-1077029230 Hi @Hexiaoqiao @tasanuma @ferhui, could you also please review this? Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 746935) Time Spent: 3h 20m (was: 3h 10m) > Fix NPE for checkBlockReportLease > - > > Key: HDFS-16498 > URL: https://issues.apache.org/jira/browse/HDFS-16498 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Attachments: image-2022-03-09-20-35-22-028.png, screenshot-1.png > > Time Spent: 3h 20m > Remaining Estimate: 0h > > While the Namenode is restarting, a Datanode that has not yet registered can > still trigger an FBR, which causes an NPE. > !image-2022-03-09-20-35-22-028.png|width=871,height=158! -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16518) Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager
[ https://issues.apache.org/jira/browse/HDFS-16518?focusedWorklogId=746929=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746929 ] ASF GitHub Bot logged work on HDFS-16518: - Author: ASF GitHub Bot Created on: 24/Mar/22 02:39 Start Date: 24/Mar/22 02:39 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #4100: URL: https://github.com/apache/hadoop/pull/4100#issuecomment-1077009230 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 50s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 12m 19s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 25m 42s | | trunk passed | | +1 :green_heart: | compile | 6m 34s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 6m 13s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 22s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 47s | | trunk passed | | +1 :green_heart: | javadoc | 1m 53s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 2m 22s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 6m 44s | | trunk passed | | +1 :green_heart: | shadedclient | 26m 57s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 23s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 8s | | the patch passed | | +1 :green_heart: | compile | 6m 27s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 6m 27s | | the patch passed | | +1 :green_heart: | compile | 6m 2s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 6m 2s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 7s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4100/1/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 2 new + 43 unchanged - 0 fixed = 45 total (was 43) | | +1 :green_heart: | mvnsite | 2m 14s | | the patch passed | | +1 :green_heart: | javadoc | 1m 29s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 2m 4s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 6m 22s | | the patch passed | | +1 :green_heart: | shadedclient | 26m 3s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 17s | | hadoop-hdfs-client in the patch passed. | | +1 :green_heart: | unit | 335m 49s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 38s | | The patch does not generate ASF License warnings. 
| | | | 483m 40s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4100/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4100 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 8113c28b4d11 4.15.0-163-generic #171-Ubuntu SMP Fri Nov 5 11:55:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 6b9414bc8efd322aaac25eea6cb5598c53db7b5d | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | |
[jira] [Updated] (HDFS-16484) [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread
[ https://issues.apache.org/jira/browse/HDFS-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] qinyuren updated HDFS-16484: - Description: We ran SPS in our cluster and found the following log. The SPSPathIdProcessor thread enters an infinite loop and prints the same log all the time. !image-2022-02-25-14-35-42-255.png|width=682,height=195! If the SPSPathIdProcessor thread gets an inodeId whose path does not exist, it enters an infinite loop and cannot work normally. The reason is that #ctxt.getNextSPSPath() returns an inodeId whose path does not exist. The inodeId is never reset to null, so the thread holds this inodeId forever.
{code:java}
public void run() {
  LOG.info("Starting SPSPathIdProcessor!.");
  Long startINode = null;
  while (ctxt.isRunning()) {
    try {
      if (!ctxt.isInSafeMode()) {
        if (startINode == null) {
          startINode = ctxt.getNextSPSPath();
        } // else same id will be retried
        if (startINode == null) {
          // Waiting for SPS path
          Thread.sleep(3000);
        } else {
          ctxt.scanAndCollectFiles(startINode);
          // check if directory was empty and no child added to queue
          DirPendingWorkInfo dirPendingWorkInfo =
              pendingWorkForDirectory.get(startINode);
          if (dirPendingWorkInfo != null
              && dirPendingWorkInfo.isDirWorkDone()) {
            ctxt.removeSPSHint(startINode);
            pendingWorkForDirectory.remove(startINode);
          }
        }
        startINode = null; // Current inode successfully scanned.
      }
    } catch (Throwable t) {
      String reClass = t.getClass().getName();
      if (InterruptedException.class.getName().equals(reClass)) {
        LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
        break;
      }
      LOG.warn("Exception while scanning file inodes to satisfy the policy", t);
      try {
        Thread.sleep(3000);
      } catch (InterruptedException e) {
        LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
        break;
      }
    }
  }
}
{code}
was: In SPSPathIdProcessor thread, if it get a inodeId which path does not exist, then the SPSPathIdProcessor thread entry infinite loop and can't work normally. 
!image-2022-02-25-14-35-42-255.png|width=682,height=195! > [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread > - > > Key: HDFS-16484 > URL: https://issues.apache.org/jira/browse/HDFS-16484 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: qinyuren >Assignee: qinyuren >Priority: Major > Labels: pull-request-available > Attachments: image-2022-02-25-14-35-42-255.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently, we ran SPS in our cluster and found this log. The > SPSPathIdProcessor thread enters an infinite loop and prints the same log all > the time. > !image-2022-02-25-14-35-42-255.png|width=682,height=195! > In SPSPathIdProcessor thread, if it get a inodeId which path does not exist, > then the SPSPathIdProcessor thread entry infinite loop and can't work > normally. > The reason is that #ctxt.getNextSPSPath() get a inodeId which path does not > exist. The inodeId will not be set to null, causing the thread hold this > inodeId forever. > {code:java} > public void run() { > LOG.info("Starting SPSPathIdProcessor!."); > Long startINode = null; > while (ctxt.isRunning()) { > try { > if (!ctxt.isInSafeMode()) { > if (startINode == null) { > startINode = ctxt.getNextSPSPath(); > } // else same id will be retried > if (startINode == null) { > // Waiting for SPS path > Thread.sleep(3000); > } else { > ctxt.scanAndCollectFiles(startINode); > // check if directory was empty and no child added to queue > DirPendingWorkInfo dirPendingWorkInfo = > pendingWorkForDirectory.get(startINode); > if (dirPendingWorkInfo != null > && dirPendingWorkInfo.isDirWorkDone()) { > ctxt.removeSPSHint(startINode); > pendingWorkForDirectory.remove(startINode); > } > } > startINode = null; // Current inode successfully scanned. > } > } catch (Throwable t) { > String reClass = t.getClass().getName(); > if (InterruptedException.class.getName().equals(reClass)) { > LOG.info("SPSPathIdProcessor thread is interrupted. 
Stopping.."); > break; > } > LOG.warn("Exception while scanning file inodes to satisfy the policy", > t); > try { > Thread.sleep(3000); > } catch (InterruptedException e) { > LOG.info("Interrupted while waiting
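One way to avoid the stuck-inodeId behavior described in HDFS-16484 above is to bound how many times the same path id is retried before it is dropped. The sketch below is purely illustrative — the Context interface, method names, and retry limit are hypothetical, not the actual SPS code or its eventual fix:

```java
/**
 * Illustrative sketch: drain a queue of path ids, but give up on any id that
 * fails to resolve MAX_RETRIES times, instead of retrying it forever.
 */
class BoundedRetryScanner {
  /** Hypothetical stand-in for the SPS context. */
  interface Context {
    Long getNextPathId();     // null when the queue is empty
    boolean scan(long id);    // false if the path cannot be resolved
  }

  static final int MAX_RETRIES = 3;

  /** Returns the number of ids scanned successfully. */
  static int drain(Context ctxt) {
    int scanned = 0;
    Long id = null;
    int retries = 0;
    while (true) {
      if (id == null) {
        id = ctxt.getNextPathId();
        retries = 0;
      }
      if (id == null) {
        return scanned;        // queue empty
      }
      if (ctxt.scan(id)) {
        scanned++;
        id = null;             // success: move on to the next id
      } else if (++retries >= MAX_RETRIES) {
        id = null;             // give up on this id rather than loop forever
      }
    }
  }
}
```

The key difference from the loop quoted above is that the held id is cleared on repeated failure, not only on success, so one unresolvable path cannot stall the scanner.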
[jira] [Work logged] (HDFS-16518) Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager
[ https://issues.apache.org/jira/browse/HDFS-16518?focusedWorklogId=746923=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746923 ] ASF GitHub Bot logged work on HDFS-16518: - Author: ASF GitHub Bot Created on: 24/Mar/22 02:10 Start Date: 24/Mar/22 02:10 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #4100: URL: https://github.com/apache/hadoop/pull/4100#issuecomment-1076995582 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 44s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 12m 32s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 25m 30s | | trunk passed | | +1 :green_heart: | compile | 6m 23s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 5m 54s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 11s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 21s | | trunk passed | | +1 :green_heart: | javadoc | 1m 44s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 2m 14s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 5m 45s | | trunk passed | | +1 :green_heart: | shadedclient | 23m 55s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 27s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 12s | | the patch passed | | +1 :green_heart: | compile | 6m 17s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 6m 17s | | the patch passed | | +1 :green_heart: | compile | 5m 48s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 5m 48s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 4s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4100/3/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0) | | +1 :green_heart: | mvnsite | 2m 10s | | the patch passed | | +1 :green_heart: | javadoc | 1m 28s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 57s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 5m 59s | | the patch passed | | +1 :green_heart: | shadedclient | 24m 50s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 26s | | hadoop-hdfs-client in the patch passed. | | +1 :green_heart: | unit | 239m 48s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 49s | | The patch does not generate ASF License warnings. 
| | | | 381m 23s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4100/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4100 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux e899737cf03e 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 81cbc541ee5101332e9038e7f620c255e9cc01f9 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | |
[jira] [Work logged] (HDFS-16518) Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager
[ https://issues.apache.org/jira/browse/HDFS-16518?focusedWorklogId=746920=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746920 ] ASF GitHub Bot logged work on HDFS-16518: - Author: ASF GitHub Bot Created on: 24/Mar/22 01:55 Start Date: 24/Mar/22 01:55 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #4100: URL: https://github.com/apache/hadoop/pull/4100#issuecomment-1076988908 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 42s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 1s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 12m 33s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 24m 7s | | trunk passed | | +1 :green_heart: | compile | 6m 29s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 5m 43s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 11s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 27s | | trunk passed | | +1 :green_heart: | javadoc | 1m 45s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 2m 13s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 6m 23s | | trunk passed | | +1 :green_heart: | shadedclient | 24m 15s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 4s | | the patch passed | | +1 :green_heart: | compile | 5m 50s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 5m 50s | | the patch passed | | +1 :green_heart: | compile | 5m 48s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 5m 48s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 0s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4100/2/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) | | +1 :green_heart: | mvnsite | 2m 15s | | the patch passed | | +1 :green_heart: | javadoc | 1m 32s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 2m 4s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 6m 42s | | the patch passed | | +1 :green_heart: | shadedclient | 23m 46s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 2m 20s | | hadoop-hdfs-client in the patch passed. | | +1 :green_heart: | unit | 242m 26s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 44s | | The patch does not generate ASF License warnings. 
| | | | 382m 2s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4100/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4100 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 34bde18cbf15 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / e6e0f0165f2be565f6da8e720a5a3ef094b73036 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | |
[jira] [Resolved] (HDFS-16517) In 2.10 the distance metric is wrong for non-DN machines
[ https://issues.apache.org/jira/browse/HDFS-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved HDFS-16517. -- Fix Version/s: 2.10.2 Resolution: Fixed > In 2.10 the distance metric is wrong for non-DN machines > > > Key: HDFS-16517 > URL: https://issues.apache.org/jira/browse/HDFS-16517 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1 >Reporter: Owen O'Malley >Assignee: Owen O'Malley >Priority: Major > Labels: pull-request-available > Fix For: 2.10.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > In 2.10, the metric for distance between the client and the data node is > wrong for machines that aren't running data nodes (ie. > getWeightUsingNetworkLocation). The code works correctly in 3.3+. > Currently > > ||Client||DataNode||getWeight||getWeightUsingNetworkLocation|| > |/rack1/node1|/rack1/node1|0|0| > |/rack1/node1|/rack1/node2|2|2| > |/rack1/node1|/rack2/node2|4|2| > |/pod1/rack1/node1|/pod1/rack1/node2|2|2| > |/pod1/rack1/node1|/pod1/rack2/node2|4|2| > |/pod1/rack1/node1|/pod2/rack2/node2|6|4| > > This bug will destroy data locality on clusters where the clients share racks > with DataNodes, but are running on machines that aren't running DataNodes, > such as striping federated HDFS clusters across racks. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
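The correct "getWeight" column in the table above is just the hop distance between two leaves of the topology tree. As a rough standalone sketch (an assumed model for illustration, not Hadoop's actual NetworkTopology code), the weight can be computed from the two full path strings by counting the steps from each node up to their nearest common ancestor:

```java
// Standalone sketch (assumed model, not Hadoop's NetworkTopology source):
// weight = hops from each leaf up to the nearest common ancestor of the
// two full paths, e.g. /pod1/rack1/node1.
public class TopologyWeight {

    public static int weight(String a, String b) {
        String[] pa = a.substring(1).split("/");
        String[] pb = b.substring(1).split("/");
        int common = 0;
        while (common < Math.min(pa.length, pb.length)
                && pa[common].equals(pb[common])) {
            common++;
        }
        // hops up from a to the common ancestor, plus hops down to b
        return (pa.length - common) + (pb.length - common);
    }

    public static void main(String[] args) {
        System.out.println(weight("/rack1/node1", "/rack1/node2"));           // 2
        System.out.println(weight("/rack1/node1", "/rack2/node2"));           // 4
        System.out.println(weight("/pod1/rack1/node1", "/pod2/rack2/node2")); // 6
    }
}
```

This reproduces the getWeight column exactly; the 2.10 bug is visible in the other column, where the network-location variant under-counts the cross-rack cases (returning 2, the same as on-rack), which is why off-rack replicas can look as close as on-rack ones and locality is destroyed.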
[jira] [Work logged] (HDFS-16517) In 2.10 the distance metric is wrong for non-DN machines
[ https://issues.apache.org/jira/browse/HDFS-16517?focusedWorklogId=746881&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746881 ] ASF GitHub Bot logged work on HDFS-16517: - Author: ASF GitHub Bot Created on: 23/Mar/22 21:54 Start Date: 23/Mar/22 21:54 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #4091: URL: https://github.com/apache/hadoop/pull/4091#issuecomment-1076857473 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 0s | | Docker mode activated. | | -1 :x: | patch | 0m 19s | | https://github.com/apache/hadoop/pull/4091 does not apply to branch-2.10. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help. | | Subsystem | Report/Notes | |--:|:-| | GITHUB PR | https://github.com/apache/hadoop/pull/4091 | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4091/5/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 746881) Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (HDFS-16517) In 2.10 the distance metric is wrong for non-DN machines
[ https://issues.apache.org/jira/browse/HDFS-16517?focusedWorklogId=746880&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746880 ] ASF GitHub Bot logged work on HDFS-16517: - Author: ASF GitHub Bot Created on: 23/Mar/22 21:52 Start Date: 23/Mar/22 21:52 Worklog Time Spent: 10m Work Description: omalley merged pull request #4091: URL: https://github.com/apache/hadoop/pull/4091 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 746880) Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (HDFS-16517) In 2.10 the distance metric is wrong for non-DN machines
[ https://issues.apache.org/jira/browse/HDFS-16517?focusedWorklogId=746879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746879 ] ASF GitHub Bot logged work on HDFS-16517: - Author: ASF GitHub Bot Created on: 23/Mar/22 21:47 Start Date: 23/Mar/22 21:47 Worklog Time Spent: 10m Work Description: omalley commented on pull request #4091: URL: https://github.com/apache/hadoop/pull/4091#issuecomment-1076852734 When I created the PR, I hadn't found the upstream jira, which is https://issues.apache.org/jira/browse/HADOOP-16161 . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 746879) Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (HDFS-16518) Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager
[ https://issues.apache.org/jira/browse/HDFS-16518?focusedWorklogId=746838=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746838 ] ASF GitHub Bot logged work on HDFS-16518: - Author: ASF GitHub Bot Created on: 23/Mar/22 19:49 Start Date: 23/Mar/22 19:49 Worklog Time Spent: 10m Work Description: li-leyang commented on a change in pull request #4100: URL: https://github.com/apache/hadoop/pull/4100#discussion_r833665038 ## File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/KeyProviderCache.java ## @@ -85,6 +90,26 @@ public KeyProvider call() throws Exception { } } + public static final int SHUTDOWN_HOOK_PRIORITY = FileSystem.SHUTDOWN_HOOK_PRIORITY - 1; + + private class KeyProviderCacheFinalizer implements Runnable { +@Override +public synchronized void run() { + invalidateCache(); +} + } + + /** + * Invalidate cache and auto close KeyProviders in the cache + */ + @VisibleForTesting + synchronized void invalidateCache() { +LOG.debug("Invalidating all cached KeyProviders in ShutdownHookManager."); Review comment: fixed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 746838) Time Spent: 50m (was: 40m) > Cached KeyProvider in KeyProviderCache should be closed with > ShutdownHookManager > > > Key: HDFS-16518 > URL: https://issues.apache.org/jira/browse/HDFS-16518 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.10.0 >Reporter: Lei Yang >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > We need to make sure the underlying KeyProvider used by multiple DFSClient > instances is closed at one shot during jvm shutdown. 
Within the shutdownhook, > we invalidate the cache and make sure they are all closed. The cache has a > removalListener hook which is called when a cache entry is invalidated. > {code:java} > Class KeyProviderCache > ... > public KeyProviderCache(long expiryMs) { > cache = CacheBuilder.newBuilder() > .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS) > .removalListener(new RemovalListener<URI, KeyProvider>() { > @Override > public void onRemoval( > @Nonnull RemovalNotification<URI, KeyProvider> notification) { > try { > assert notification.getValue() != null; > notification.getValue().close(); > } catch (Throwable e) { > LOG.error( > "Error closing KeyProvider with uri [" > + notification.getKey() + "]", e); > } > } > }) > .build(); > }{code}
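The shutdown-time sweep described above can be sketched in isolation. The following is an illustrative stand-in using only the JDK; the class and method names (ProviderCache, putIfAbsent, invalidateCache) are hypothetical, and the real patch registers with Hadoop's ShutdownHookManager and closes providers through the Guava removal listener shown in the description:

```java
import java.io.Closeable;
import java.io.IOException;
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in (hypothetical names, plain JDK): a process-wide
// cache of Closeable providers drained in one shot by a JVM shutdown hook,
// mirroring the invalidate-then-close-via-removal-listener idea above.
public class ProviderCache<T extends Closeable> {
    private final Map<URI, T> cache = new ConcurrentHashMap<>();

    public ProviderCache() {
        // Stands in for ShutdownHookManager registration at a priority
        // just below FileSystem's own shutdown hook.
        Runtime.getRuntime().addShutdownHook(new Thread(this::invalidateCache));
    }

    public T putIfAbsent(URI uri, T provider) {
        T prev = cache.putIfAbsent(uri, provider);
        return prev != null ? prev : provider;
    }

    // Close and drop every cached provider, as the shutdown hook (or a
    // test) would; a close() failure must not abort the sweep.
    public synchronized void invalidateCache() {
        for (Map.Entry<URI, T> e : cache.entrySet()) {
            try {
                e.getValue().close();
            } catch (IOException ignored) {
                // a removal hook should log, never throw, during shutdown
            }
        }
        cache.clear();
    }

    public int size() {
        return cache.size();
    }
}
```

Because the sweep runs once at JVM exit, each cached provider is closed exactly once even when many DFSClient instances shared it, which is the point of moving the close from per-client teardown to a shutdown hook.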
[jira] [Work logged] (HDFS-16518) Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager
[ https://issues.apache.org/jira/browse/HDFS-16518?focusedWorklogId=746836=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746836 ] ASF GitHub Bot logged work on HDFS-16518: - Author: ASF GitHub Bot Created on: 23/Mar/22 19:48 Start Date: 23/Mar/22 19:48 Worklog Time Spent: 10m Work Description: li-leyang commented on a change in pull request #4100: URL: https://github.com/apache/hadoop/pull/4100#discussion_r833664651 ## File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/KeyProviderCache.java ## @@ -85,6 +90,26 @@ public KeyProvider call() throws Exception { } } + public static final int SHUTDOWN_HOOK_PRIORITY = FileSystem.SHUTDOWN_HOOK_PRIORITY - 1; + + private class KeyProviderCacheFinalizer implements Runnable { +@Override +public synchronized void run() { + invalidateCache(); +} + } + + /** + * Invalidate cache and auto close KeyProviders in the cache + */ + @VisibleForTesting + synchronized void invalidateCache() { +LOG.debug("Invalidating all cached KeyProviders in ShutdownHookManager."); +if (cache != null) { + cache.invalidateAll(); Review comment: Added -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 746836) Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (HDFS-16511) Change some frequent method lock type in ReplicaMap.
[ https://issues.apache.org/jira/browse/HDFS-16511?focusedWorklogId=746835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746835 ] ASF GitHub Bot logged work on HDFS-16511: - Author: ASF GitHub Bot Created on: 23/Mar/22 19:46 Start Date: 23/Mar/22 19:46 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #4085: URL: https://github.com/apache/hadoop/pull/4085#issuecomment-1076753327 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 38s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 33m 49s | | trunk passed | | +1 :green_heart: | compile | 1m 29s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 1m 24s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 0s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 28s | | trunk passed | | +1 :green_heart: | javadoc | 1m 3s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 34s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 14s | | trunk passed | | +1 :green_heart: | shadedclient | 22m 30s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 16s | | the patch passed | | +1 :green_heart: | compile | 1m 20s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 1m 20s | | the patch passed | | +1 :green_heart: | compile | 1m 13s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 13s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 51s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 18s | | the patch passed | | +1 :green_heart: | javadoc | 0m 53s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 30s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 19s | | the patch passed | | +1 :green_heart: | shadedclient | 22m 33s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 233m 4s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4085/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 45s | | The patch does not generate ASF License warnings. 
| | | | 333m 56s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.TestRollingUpgrade | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4085/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4085 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 35866c5cf66e 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / d0b8d1ab852fa228a20520a132868f1aeaa75b79 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4085/3/testReport/ | | Max. process+thread count | 3286 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U:
[jira] [Work logged] (HDFS-16518) Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager
[ https://issues.apache.org/jira/browse/HDFS-16518?focusedWorklogId=746830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746830 ] ASF GitHub Bot logged work on HDFS-16518: - Author: ASF GitHub Bot Created on: 23/Mar/22 19:41 Start Date: 23/Mar/22 19:41 Worklog Time Spent: 10m Work Description: ibuenros commented on a change in pull request #4100: URL: https://github.com/apache/hadoop/pull/4100#discussion_r833658315 ## File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/KeyProviderCache.java ## @@ -85,6 +90,26 @@ public KeyProvider call() throws Exception { } } + public static final int SHUTDOWN_HOOK_PRIORITY = FileSystem.SHUTDOWN_HOOK_PRIORITY - 1; + + private class KeyProviderCacheFinalizer implements Runnable { +@Override +public synchronized void run() { + invalidateCache(); +} + } + + /** + * Invalidate cache and auto close KeyProviders in the cache + */ + @VisibleForTesting + synchronized void invalidateCache() { +LOG.debug("Invalidating all cached KeyProviders in ShutdownHookManager."); Review comment: This log is technically not correct in that we don't know the call is coming from ShutdownHookManager. Maybe just remove the last two words from the log. 
## File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/KeyProviderCache.java ## @@ -85,6 +90,26 @@ public KeyProvider call() throws Exception { } } + public static final int SHUTDOWN_HOOK_PRIORITY = FileSystem.SHUTDOWN_HOOK_PRIORITY - 1; + + private class KeyProviderCacheFinalizer implements Runnable { +@Override +public synchronized void run() { + invalidateCache(); +} + } + + /** + * Invalidate cache and auto close KeyProviders in the cache + */ + @VisibleForTesting + synchronized void invalidateCache() { +LOG.debug("Invalidating all cached KeyProviders in ShutdownHookManager."); +if (cache != null) { + cache.invalidateAll(); Review comment: Maybe add a comment that this will close the providers due to the cache hook? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 746830) Time Spent: 0.5h (was: 20m)
[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager
[ https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Yang updated HDFS-16518: Description: We need to make sure the underlying KeyProvider used by multiple DFSClient instances is closed at one shot during jvm shutdown. Within the shutdownhook, we invalidate the cache and make sure they are all closed. The cache has a removeListener hook which is called when cache entry is invalidated. {code:java} Class KeyProviderCache ... public KeyProviderCache(long expiryMs) { cache = CacheBuilder.newBuilder() .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS) .removalListener(new RemovalListener() { @Override public void onRemoval( @Nonnull RemovalNotification notification) { try { assert notification.getValue() != null; notification.getValue().close(); } catch (Throwable e) { LOG.error( "Error closing KeyProvider with uri [" + notification.getKey() + "]", e); } } }) .build(); }{code} was: The cache has ttl and can close KeyProvider when cache entry is expired but when DFSClient is closed, we also need to make sure the underlying KeyProvider used by DFSClient is closed as well. The cache has a removeListener hook which is called when cache entry is removed. An alternative approach would be add shutdownhook at jvm shutdown and close all KeyProviders in the cache. {code:java} Class KeyProviderCache ... 
public KeyProviderCache(long expiryMs) { cache = CacheBuilder.newBuilder() .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS) .removalListener(new RemovalListener() { @Override public void onRemoval( @Nonnull RemovalNotification notification) { try { assert notification.getValue() != null; notification.getValue().close(); } catch (Throwable e) { LOG.error( "Error closing KeyProvider with uri [" + notification.getKey() + "]", e); } } }) .build(); }{code}
[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager
[ https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Yang updated HDFS-16518: Summary: Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager (was: Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed )
[jira] [Work logged] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed
[ https://issues.apache.org/jira/browse/HDFS-16518?focusedWorklogId=746807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746807 ]

ASF GitHub Bot logged work on HDFS-16518:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 23/Mar/22 18:50
            Start Date: 23/Mar/22 18:50
    Worklog Time Spent: 10m

Work Description: ibuenros commented on pull request #4100:
URL: https://github.com/apache/hadoop/pull/4100#issuecomment-1076700669

   @li-leyang this change is invalidating the singleton cache every time a DFSClient is closed. I thought the intention was to use a shutdown hook to close key provider clients instead?

Issue Time Tracking
-------------------
    Worklog Id: (was: 746807)
    Time Spent: 20m  (was: 10m)
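The review comment above hinges on when the cache is invalidated, not on how providers get closed: with Guava's `CacheBuilder.removalListener`, any invalidation (per-`DFSClient` or once from a shutdown hook) fires the listener, and the listener is what closes the provider. The plain-Java sketch below mirrors that wiring with a hand-rolled cache; the class and method names are illustrative, not Hadoop's or Guava's.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Minimal cache whose invalidation path fires a removal listener,
// mirroring in spirit the CacheBuilder.removalListener wiring quoted
// in the issue description.
class ListenerCache<K, V> {
  private final Map<K, V> entries = new HashMap<>();
  private final BiConsumer<K, V> removalListener;

  ListenerCache(BiConsumer<K, V> removalListener) {
    this.removalListener = removalListener;
  }

  void put(K key, V value) {
    entries.put(key, value);
  }

  // Invalidation fires the listener once per entry. This is why either
  // strategy -- invalidate on DFSClient.close(), or invalidate once from
  // a JVM shutdown hook -- ends up closing the cached providers; the
  // strategies differ only in how often and when the listener runs.
  void invalidateAll() {
    entries.forEach(removalListener);
    entries.clear();
  }
}

class ListenerCacheDemo {
  static int demo() {
    final int[] closed = {0};
    ListenerCache<String, Closeable> cache =
        new ListenerCache<>((uri, provider) -> {
          try {
            provider.close();
            closed[0]++;
          } catch (IOException e) {
            System.err.println("Error closing provider [" + uri + "]: " + e);
          }
        });
    // Closeable lambdas stand in for cached KeyProvider instances.
    cache.put("kms://x", () -> { });
    cache.put("kms://y", () -> { });
    cache.invalidateAll();
    return closed[0];
  }

  public static void main(String[] args) {
    System.out.println("closed " + demo() + " providers");
  }
}
```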
[jira] [Work logged] (HDFS-16517) In 2.10 the distance metric is wrong for non-DN machines
[ https://issues.apache.org/jira/browse/HDFS-16517?focusedWorklogId=746805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746805 ]

ASF GitHub Bot logged work on HDFS-16517:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 23/Mar/22 18:47
            Start Date: 23/Mar/22 18:47
    Worklog Time Spent: 10m

Work Description: hadoop-yetus commented on pull request #4091:
URL: https://github.com/apache/hadoop/pull/4091#issuecomment-1076698479

   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: | reexec | 0m 44s | | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s | | codespell was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
   |||| _ branch-2.10 Compile Tests _ |
   | +1 :green_heart: | mvninstall | 15m 27s | | branch-2.10 passed |
   | +1 :green_heart: | compile | 15m 36s | | branch-2.10 passed with JDK Azul Systems, Inc.-1.7.0_262-b10 |
   | +1 :green_heart: | compile | 11m 49s | | branch-2.10 passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 |
   | +1 :green_heart: | checkstyle | 0m 38s | | branch-2.10 passed |
   | +1 :green_heart: | mvnsite | 1m 22s | | branch-2.10 passed |
   | +1 :green_heart: | javadoc | 1m 21s | | branch-2.10 passed with JDK Azul Systems, Inc.-1.7.0_262-b10 |
   | +1 :green_heart: | javadoc | 1m 5s | | branch-2.10 passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 |
   | -1 :x: | spotbugs | 2m 11s | [/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4091/4/artifact/out/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html) | hadoop-common-project/hadoop-common in branch-2.10 has 2 extant spotbugs warnings. |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 0m 47s | | the patch passed |
   | +1 :green_heart: | compile | 14m 24s | | the patch passed with JDK Azul Systems, Inc.-1.7.0_262-b10 |
   | +1 :green_heart: | javac | 14m 24s | | the patch passed |
   | +1 :green_heart: | compile | 12m 15s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 |
   | +1 :green_heart: | javac | 12m 15s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
   | +1 :green_heart: | checkstyle | 0m 41s | | the patch passed |
   | +1 :green_heart: | mvnsite | 1m 20s | | the patch passed |
   | +1 :green_heart: | javadoc | 1m 22s | | the patch passed with JDK Azul Systems, Inc.-1.7.0_262-b10 |
   | +1 :green_heart: | javadoc | 1m 7s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 |
   | +1 :green_heart: | spotbugs | 2m 18s | | the patch passed |
   |||| _ Other Tests _ |
   | -1 :x: | unit | 9m 39s | [/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4091/4/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt) | hadoop-common in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 51s | | The patch does not generate ASF License warnings. |
   | | | 100m 9s | | |

   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.fs.sftp.TestSFTPFileSystem |
   | | hadoop.io.compress.snappy.TestSnappyCompressorDecompressor |
   | | hadoop.util.TestBasicDiskValidator |
   | | hadoop.io.compress.TestCompressorDecompressor |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4091/4/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4091 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 4ecc0fa7fadb 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 13:41:54 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-2.10 / c4db7feb4fe29a16eaaea907fafcf8f965bffb32 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/zulu-7-amd64:Azul Systems, Inc.-1.7.0_262-b10 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 |
[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed
[ https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Yang updated HDFS-16518:
----------------------------
    Description: 
The cache has ttl and can close KeyProvider when cache entry is expired but when DFSClient is closed, we also need to make sure the underlying KeyProvider used by DFSClient is closed as well. The cache has a removeListener hook which is called when cache entry is removed. An alternative approach would be add shutdownhook at jvm shutdown and close all KeyProviders in the cache.
{code:java}
Class KeyProviderCache
...
public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
      .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
      .removalListener(new RemovalListener<URI, KeyProvider>() {
        @Override
        public void onRemoval(@Nonnull RemovalNotification<URI, KeyProvider> notification) {
          try {
            assert notification.getValue() != null;
            notification.getValue().close();
          } catch (Throwable e) {
            LOG.error("Error closing KeyProvider with uri [" + notification.getKey() + "]", e);
          }
        }
      })
      .build();
}
{code}

  was:
The cache has ttl and can close KeyProvider when cache entry is expired but when DFSClient is closed, we also need to make sure the underlying KeyProvider used by DFSClient is closed as well. The cache has a removeListener hook which is called when cache entry is removed.
{code:java}
Class KeyProviderCache
...
public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
      .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
      .removalListener(new RemovalListener<URI, KeyProvider>() {
        @Override
        public void onRemoval(@Nonnull RemovalNotification<URI, KeyProvider> notification) {
          try {
            assert notification.getValue() != null;
            notification.getValue().close();
          } catch (Throwable e) {
            LOG.error("Error closing KeyProvider with uri [" + notification.getKey() + "]", e);
          }
        }
      })
      .build();
}
{code}
[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed
[ https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Yang updated HDFS-16518:
----------------------------
    Description: 
The cache has ttl and can close KeyProvider when cache entry is expired but when DFSClient is closed, we also need to make sure the underlying KeyProvider used by DFSClient is closed as well. The cache has a removeListener hook which is called when cache entry is removed.
{code:java}
Class KeyProviderCache
...
public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
      .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
      .removalListener(new RemovalListener<URI, KeyProvider>() {
        @Override
        public void onRemoval(@Nonnull RemovalNotification<URI, KeyProvider> notification) {
          try {
            assert notification.getValue() != null;
            notification.getValue().close();
          } catch (Throwable e) {
            LOG.error("Error closing KeyProvider with uri [" + notification.getKey() + "]", e);
          }
        }
      })
      .build();
}
{code}

  was:
The cache has ttl and can close KeyProvider when cache entry is expired but when DFSClient is closed, we also need to make sure the underlying KeyProvider used by DFSClient is closed as well. The cache has a removeListener hook which is called when cache entry is removed.
{code:java}
org.apache.hadoop.hdfs.KeyProviderCache
Class KeyProviderCache
...
public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
      .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
      .removalListener(new RemovalListener<URI, KeyProvider>() {
        @Override
        public void onRemoval(@Nonnull RemovalNotification<URI, KeyProvider> notification) {
          try {
            assert notification.getValue() != null;
            notification.getValue().close();
          } catch (Throwable e) {
            LOG.error("Error closing KeyProvider with uri [" + notification.getKey() + "]", e);
          }
        }
      })
      .build();
}
{code}
[jira] [Work logged] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed
[ https://issues.apache.org/jira/browse/HDFS-16518?focusedWorklogId=746794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746794 ]

ASF GitHub Bot logged work on HDFS-16518:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 23/Mar/22 18:34
            Start Date: 23/Mar/22 18:34
    Worklog Time Spent: 10m

Work Description: li-leyang opened a new pull request #4100:
URL: https://github.com/apache/hadoop/pull/4100

   ### Description of PR
   https://issues.apache.org/jira/browse/HDFS-16518

   ### How was this patch tested?

   ### For code changes:

   - [ ] Does the title or this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?

Issue Time Tracking
-------------------
            Worklog Id: (was: 746794)
    Remaining Estimate: 0h
            Time Spent: 10m
[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed
[ https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-16518:
----------------------------------
    Labels: pull-request-available  (was: )
[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed
[ https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Yang updated HDFS-16518:
----------------------------
    Description: 
The cache has ttl and can close KeyProvider when cache entry is expired but when DFSClient is closed, we also need to make sure the underlying KeyProvider used by DFSClient is closed as well. The cache has a removeListener hook which is called when cache entry is removed.
{code:java}
org.apache.hadoop.hdfs.KeyProviderCache
Class KeyProviderCache
...
public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
      .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
      .removalListener(new RemovalListener<URI, KeyProvider>() {
        @Override
        public void onRemoval(@Nonnull RemovalNotification<URI, KeyProvider> notification) {
          try {
            assert notification.getValue() != null;
            notification.getValue().close();
          } catch (Throwable e) {
            LOG.error("Error closing KeyProvider with uri [" + notification.getKey() + "]", e);
          }
        }
      })
      .build();
}
{code}

  was:
The cache has ttl and can close KeyProvider when cache entry is expired but when DFSClient is closed, we also need to make sure the underlying KeyProvider used by DFSClient is closed as well. The cache has a removeListener hook which is called when cache entry is removed.
{code:java}
org.apache.hadoop.hdfs.KeyProviderCache

public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
      .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
      .removalListener(new RemovalListener<URI, KeyProvider>() {
        @Override
        public void onRemoval(@Nonnull RemovalNotification<URI, KeyProvider> notification) {
          try {
            assert notification.getValue() != null;
            notification.getValue().close();
          } catch (Throwable e) {
            LOG.error("Error closing KeyProvider with uri [" + notification.getKey() + "]", e);
          }
        }
      })
      .build();
}
{code}
[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed
[ https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Yang updated HDFS-16518:
----------------------------
    Description: 
The cache has ttl and can close KeyProvider when cache entry is expired but when DFSClient is closed, we also need to make sure the underlying KeyProvider used by DFSClient is closed as well. The cache has a removeListener hook which is called when cache entry is removed.
{code:java}
org.apache.hadoop.hdfs.KeyProviderCache

public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
      .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
      .removalListener(new RemovalListener<URI, KeyProvider>() {
        @Override
        public void onRemoval(@Nonnull RemovalNotification<URI, KeyProvider> notification) {
          try {
            assert notification.getValue() != null;
            notification.getValue().close();
          } catch (Throwable e) {
            LOG.error("Error closing KeyProvider with uri [" + notification.getKey() + "]", e);
          }
        }
      })
      .build();
}
{code}

  was:
The cache has ttl and can close KeyProvider when cache entry is expired but when DFSClient is closed, we also need to make sure the underlying KeyProvider used by DFSClient is closed as well. The cache has a removeListener hook which is called when cache entry is removed.
{code:java}
cache = CacheBuilder.newBuilder()
    .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
    .removalListener(new RemovalListener<URI, KeyProvider>() {
      @Override
      public void onRemoval(@Nonnull RemovalNotification<URI, KeyProvider> notification) {
        try {
          assert notification.getValue() != null;
          notification.getValue().close();
        } catch (Throwable e) {
          LOG.error("Error closing KeyProvider with uri [" + notification.getKey() + "]", e);
        }
      }
    })
    .build();
{code}
[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed
[ https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Yang updated HDFS-16518:
----------------------------
    Description: 
The cache has ttl and can close KeyProvider when cache entry is expired but when DFSClient is closed, we also need to make sure the underlying KeyProvider used by DFSClient is closed as well. The cache has a removeListener hook which is called when cache entry is removed.
{code:java}
cache = CacheBuilder.newBuilder()
    .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
    .removalListener(new RemovalListener<URI, KeyProvider>() {
      @Override
      public void onRemoval(@Nonnull RemovalNotification<URI, KeyProvider> notification) {
        try {
          assert notification.getValue() != null;
          notification.getValue().close();
        } catch (Throwable e) {
          LOG.error("Error closing KeyProvider with uri [" + notification.getKey() + "]", e);
        }
      }
    })
    .build();
{code}

  was:
The cache has ttl and can close KeyProvider when cache entry is expired but when DFSClient is closed, we also need to make sure the underlying KeyProvider used by DFSClient is closed as well. The cache has a removeListener hook which is called when cache entry is removed.
{code:java}
cache = CacheBuilder.newBuilder()
    .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
    .removalListener(new RemovalListener<URI, KeyProvider>() {
      @Override
      public void onRemoval(@Nonnull RemovalNotification<URI, KeyProvider> notification) {
        try {
          assert notification.getValue() != null;
          notification.getValue().close();
        } catch (Throwable e) {
          LOG.error("Error closing KeyProvider with uri [" + notification.getKey() + "]", e);
        }
      }
    })
    .build();
{code}
[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed
[ https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Yang updated HDFS-16518:
----------------------------
    Description: 
The cache has ttl and can close KeyProvider when cache entry is expired but when DFSClient is closed, we also need to make sure the underlying KeyProvider used by DFSClient is closed as well. The cache has a removeListener hook which is called when cache entry is removed.
{code:java}
cache = CacheBuilder.newBuilder()
    .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
    .removalListener(new RemovalListener<URI, KeyProvider>() {
      @Override
      public void onRemoval(@Nonnull RemovalNotification<URI, KeyProvider> notification) {
        try {
          assert notification.getValue() != null;
          notification.getValue().close();
        } catch (Throwable e) {
          LOG.error("Error closing KeyProvider with uri [" + notification.getKey() + "]", e);
        }
      }
    })
    .build();
{code}

  was:
The cache has ttl and can close KeyProvider when cache entry is expired but when DFSClient is closed, we also need to make sure the KeyProvider is closed as well. The cache has a removeListener hook which is called when cache entry is removed.
{code:java}
cache = CacheBuilder.newBuilder()
    .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
    .removalListener(new RemovalListener<URI, KeyProvider>() {
      @Override
      public void onRemoval(@Nonnull RemovalNotification<URI, KeyProvider> notification) {
        try {
          assert notification.getValue() != null;
          notification.getValue().close();
        } catch (Throwable e) {
          LOG.error("Error closing KeyProvider with uri [" + notification.getKey() + "]", e);
        }
      }
    })
    .build();
{code}
[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed
[ https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Yang updated HDFS-16518:
Description:
The cache has a TTL and can close a KeyProvider when its cache entry expires, but when the DFSClient is closed we also need to make sure the KeyProvider is closed as well. The cache has a removal-listener hook which is called when a cache entry is removed.

{code:java}
cache = CacheBuilder.newBuilder()
    .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
    .removalListener(new RemovalListener<URI, KeyProvider>() {
      @Override
      public void onRemoval(
          @Nonnull RemovalNotification<URI, KeyProvider> notification) {
        try {
          assert notification.getValue() != null;
          notification.getValue().close();
        } catch (Throwable e) {
          LOG.error("Error closing KeyProvider with uri ["
              + notification.getKey() + "]", e);
        }
      }
    })
    .build();
{code}

was:
The cache has a TTL and can close a KeyProvider when its cache entry expires, but when the DFSClient is closed we also need to make sure the KeyProvider is closed properly. The cache has a removal-listener hook which is called when a cache entry is removed.

> Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is
> closed
>
>                 Key: HDFS-16518
>                 URL: https://issues.apache.org/jira/browse/HDFS-16518
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.10.0
>            Reporter: Lei Yang
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
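The behavior the issue asks for, closing cached KeyProviders when the DFSClient itself is closed rather than only when the TTL expires, can be sketched with a stdlib-only stand-in. All class and method names below are hypothetical; the real KeyProviderCache is Guava-based, as the quoted code shows, and `ProviderLike` stands in for Hadoop's Closeable KeyProvider.

```java
import java.io.Closeable;
import java.io.IOException;
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Stdlib-only stand-in: in addition to the TTL-driven removal listener, the
// client's own close() path calls closeAll() so cached providers are released
// as soon as the DFSClient goes away.
public class ProviderCacheSketch {
  // Stand-in for org.apache.hadoop.crypto.key.KeyProvider (which is Closeable).
  public interface ProviderLike extends Closeable {}

  private final ConcurrentMap<URI, ProviderLike> cache = new ConcurrentHashMap<>();

  public void put(URI uri, ProviderLike provider) {
    cache.put(uri, provider);
  }

  public int size() {
    return cache.size();
  }

  // Invoked from the (hypothetical) DFSClient.close(): close every cached
  // provider, logging rather than propagating failures, then drop the entries.
  public void closeAll() {
    for (Map.Entry<URI, ProviderLike> e : cache.entrySet()) {
      try {
        e.getValue().close();
      } catch (IOException ex) {
        System.err.println("Error closing KeyProvider with uri [" + e.getKey() + "]");
      }
    }
    cache.clear();
  }
}
```

The per-entry try/catch mirrors the removal listener above: one misbehaving provider must not prevent the rest of the cache from being cleaned up.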
[jira] [Updated] (HDFS-16518) KeyProviderCache does not get closed when DFSClient is closed
[ https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Yang updated HDFS-16518:
Summary: KeyProviderCache does not get closed when DFSClient is closed
(was: KeyProviderCache does not get closed when DFSCLient shutdown)

> KeyProviderCache does not get closed when DFSClient is closed
> -------------------------------------------------------------
>
>                 Key: HDFS-16518
>                 URL: https://issues.apache.org/jira/browse/HDFS-16518
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.10.0
>            Reporter: Lei Yang
>            Priority: Major
>
> The cache has a TTL and can close a KeyProvider when its cache entry expires.
> In KeyProviderCache, we should add a ShutdownHookManager hook to clean up all
> KeyProvider instances in the cache when the JVM shuts down.
[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed
[ https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Yang updated HDFS-16518:
Summary: Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed
(was: KeyProviderCache does not get closed when DFSClient is closed)

> Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is
> closed
>
>                 Key: HDFS-16518
>                 URL: https://issues.apache.org/jira/browse/HDFS-16518
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.10.0
>            Reporter: Lei Yang
>            Priority: Major
>
> The cache has a TTL and can close a KeyProvider when its cache entry expires.
> In KeyProviderCache, we should add a ShutdownHookManager hook to clean up all
> KeyProvider instances in the cache when the JVM shuts down.
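The ShutdownHookManager idea mentioned in the issue description can be sketched with the JDK's own shutdown-hook API. Hadoop's ShutdownHookManager wraps the same mechanism and adds priority ordering; the class and method names below are otherwise hypothetical.

```java
import java.io.Closeable;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch: register a JVM shutdown hook that closes every cached provider, so
// providers are released even if the client never explicitly closes the cache.
public class ShutdownCleanupSketch {
  private final List<Closeable> cached = new CopyOnWriteArrayList<>();

  public ShutdownCleanupSketch() {
    // The hook simply delegates to closeAll(), which is also callable directly.
    Runtime.getRuntime().addShutdownHook(new Thread(this::closeAll));
  }

  public void register(Closeable c) {
    cached.add(c);
  }

  public void closeAll() {
    for (Closeable c : cached) {
      try {
        c.close();
      } catch (Exception e) {
        // Log and keep going: one bad provider must not block the rest.
        System.err.println("Error closing provider: " + e);
      }
    }
    cached.clear();
  }
}
```

Extracting the cleanup into a directly callable `closeAll()` also keeps the TTL-expiry path and the client-close path (discussed above) sharing one code path.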
[jira] [Work logged] (HDFS-16434) Add opname to read/write lock for remaining operations
[ https://issues.apache.org/jira/browse/HDFS-16434?focusedWorklogId=746633=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746633 ] ASF GitHub Bot logged work on HDFS-16434: - Author: ASF GitHub Bot Created on: 23/Mar/22 14:49 Start Date: 23/Mar/22 14:49 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3915: URL: https://github.com/apache/hadoop/pull/3915#issuecomment-1076462287 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 43s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 36m 53s | | trunk passed | | +1 :green_heart: | compile | 1m 35s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 1m 31s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 10s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 37s | | trunk passed | | +1 :green_heart: | javadoc | 1m 11s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 35s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 36s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 22s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 20s | | the patch passed | | +1 :green_heart: | compile | 1m 29s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 1m 29s | | the patch passed | | +1 :green_heart: | compile | 1m 22s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 22s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 2s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3915/4/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 400 unchanged - 0 fixed = 402 total (was 400) | | +1 :green_heart: | mvnsite | 1m 29s | | the patch passed | | +1 :green_heart: | javadoc | 0m 55s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 34s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 42s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 46s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 270m 43s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3915/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +0 :ok: | asflicense | 0m 35s | | ASF License check generated no output? 
| | | | 382m 20s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.TestDFSInputStream | | | hadoop.hdfs.server.balancer.TestBalancer | | | hadoop.hdfs.TestHDFSFileSystemContract | | | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy | | | hadoop.hdfs.TestLargeBlock | | | hadoop.hdfs.TestStoragePolicyPermissionSettings | | | hadoop.hdfs.TestRollingUpgrade | | | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy | | | hadoop.hdfs.TestGetBlocks | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3915/4/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3915 | | Optional Tests |
[jira] [Work logged] (HDFS-16511) Change some frequent method lock type in ReplicaMap.
[ https://issues.apache.org/jira/browse/HDFS-16511?focusedWorklogId=746599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746599 ]

ASF GitHub Bot logged work on HDFS-16511:
-
Author: ASF GitHub Bot
Created on: 23/Mar/22 14:08
Start Date: 23/Mar/22 14:08
Worklog Time Spent: 10m

Work Description: MingXiangLi commented on a change in pull request #4085:
URL: https://github.com/apache/hadoop/pull/4085#discussion_r833312880

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java

@@ -602,6 +605,54 @@ public void run() {}
        + "volumeMap.", 0, totalNumReplicas);
  }

  @Test(timeout = 3)
  public void testCurrentWriteAndDeleteBlock() throws Exception {
    // Feed FsDataset with block metadata.
    final int numBlocks = 1000;
    final int threadCount = 10;
    // Generate data blocks.
    ExecutorService pool = Executors.newFixedThreadPool(threadCount);
    List<Future<?>> futureList = new ArrayList<>();
    for (int i = 0; i < threadCount; i++) {
      Thread thread = new Thread() {
        @Override
        public void run() {
          try {
            for (int i = 0; i < numBlocks; i++) {
              String bpid = BLOCK_POOL_IDS[numBlocks % BLOCK_POOL_IDS.length];
              ExtendedBlock eb = new ExtendedBlock(bpid, i);
              ReplicaHandler replica = null;
              try {
                replica = dataset.createRbw(StorageType.DEFAULT, null, eb, false);
                if (i % 2 > 0) {
                  dataset.invalidate(bpid, new Block[]{eb.getLocalBlock()});
                }
              } finally {
                if (replica != null) {
                  replica.close();
                }
              }
            }
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      };
      thread.setName("AddBlock" + i);
      futureList.add(pool.submit(thread));
    }
    // Wait for data generation
    for (Future f : futureList) {
      f.get();
    }
    int totalNumReplicas = 0;

Review comment: Like testRemoveTwoVolumes(), we write to a random block pool, so at the end we count the total number of blocks across all block pools.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 746599)
Time Spent: 40m (was: 0.5h)

> Change some frequent method lock type in ReplicaMap.
> ----------------------------------------------------
>
>                 Key: HDFS-16511
>                 URL: https://issues.apache.org/jira/browse/HDFS-16511
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs
>            Reporter: Mingxiang Li
>            Assignee: Mingxiang Li
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>   Remaining Estimate: 0h
>
> In HDFS-16429 we made LightWeightResizableGSet thread safe, and in HDFS-15382
> we split the lock into block-pool-grained locks. After these improvements, we
> can change some methods to acquire the read lock instead of the write lock.
[jira] [Work logged] (HDFS-16511) Change some frequent method lock type in ReplicaMap.
[ https://issues.apache.org/jira/browse/HDFS-16511?focusedWorklogId=746598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746598 ]

ASF GitHub Bot logged work on HDFS-16511:
-
Author: ASF GitHub Bot
Created on: 23/Mar/22 14:07
Start Date: 23/Mar/22 14:07
Worklog Time Spent: 10m

Work Description: MingXiangLi commented on a change in pull request #4085:
URL: https://github.com/apache/hadoop/pull/4085#discussion_r833311855

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java

  @Test(timeout = 3)
  public void testCurrentWriteAndDeleteBlock() throws Exception {
    // Feed FsDataset with block metadata.
    final int numBlocks = 1000;
    final int threadCount = 10;
    // Generate data blocks.
    ExecutorService pool = Executors.newFixedThreadPool(threadCount);
    List<Future<?>> futureList = new ArrayList<>();
    for (int i = 0; i < threadCount; i++) {
      Thread thread = new Thread() {
        @Override
        public void run() {
          try {
            for (int i = 0; i < numBlocks; i++) {
              String bpid = BLOCK_POOL_IDS[numBlocks % BLOCK_POOL_IDS.length];

Review comment: Yes, this means a random write to a block pool, like testRemoveTwoVolumes().

Issue Time Tracking
---
Worklog Id: (was: 746598)
Time Spent: 0.5h (was: 20m)

> Change some frequent method lock type in ReplicaMap.
> ----------------------------------------------------
>
>                 Key: HDFS-16511
>                 URL: https://issues.apache.org/jira/browse/HDFS-16511
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs
>            Reporter: Mingxiang Li
>            Assignee: Mingxiang Li
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>   Remaining Estimate: 0h
>
> In HDFS-16429 we made LightWeightResizableGSet thread safe, and in HDFS-15382
> we split the lock into block-pool-grained locks. After these improvements, we
> can change some methods to acquire the read lock instead of the write lock.
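The locking change discussed in HDFS-16511 — letting frequent lookups take the read lock now that the underlying GSet is thread safe and the lock is per block pool, while mutations keep the write lock — can be sketched with a plain ReentrantReadWriteLock. The class and field names below are hypothetical simplifications, not the real ReplicaMap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified sketch of the ReplicaMap change: because the underlying container
// is itself thread safe, read-mostly methods like get() can share the read
// lock, while add/remove still serialize on the write lock.
public class ReplicaMapSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<Long, String> replicas = new ConcurrentHashMap<>();

  public String get(long blockId) {
    lock.readLock().lock();          // was the write lock before the change
    try {
      return replicas.get(blockId);
    } finally {
      lock.readLock().unlock();
    }
  }

  public void add(long blockId, String replica) {
    lock.writeLock().lock();         // mutations still take the write lock
    try {
      replicas.put(blockId, replica);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

The benefit is that many concurrent readers no longer queue behind each other; they only block while a writer holds the lock.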
[jira] [Comment Edited] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads
[ https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511224#comment-17511224 ]

daimin edited comment on HDFS-16422 at 3/23/22, 12:30 PM:
--
[~jingzhao] I tested this again. My test steps were:
# Set up a cluster with 11 datanodes, and write 4 EC RS-8-2 files: 1g, 2g, 4g, 8g
# Stop one datanode
# Check the md5sum of these files through HDFS FUSE; this is a simple way to create concurrent preads (indirect IO on FUSE)

Here are the test results:
* md5sum check before the datanode went down:
{quote}
md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b /mnt/fuse/1g
782173623681c129558c09e89251f46d /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c /mnt/fuse/4g
adb81da2c34161f249439597c515db1d /mnt/fuse/8g
{quote}
* md5sum after the datanode went down, with the native (ISA-L) decoder:
{quote}
md5sum /mnt/fuse/*g
206288b264b92af42563a14a242aa629 /mnt/fuse/1g
bc86f9f549912d78c8b3d02ada5621a2 /mnt/fuse/2g
c201356b7437e6aac1b574ade08b6ccb /mnt/fuse/4g
ef2e6f6b4b6ab96a24e5f734e93bacc3 /mnt/fuse/8g
{quote}
* md5sum after the datanode went down, with the pure Java decoder:
{quote}
md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b /mnt/fuse/1g
782173623681c129558c09e89251f46d /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c /mnt/fuse/4g
adb81da2c34161f249439597c515db1d /mnt/fuse/8g
{quote}

In conclusion: RSRawDecoder seems to be thread safe; NativeRSRawDecoder is not thread safe, and the read/write lock seems unable to protect the native decodeImpl method. I also ran the md5sum check on the same file with the native (ISA-L) decoder, and the result is different every time.
{quote}
for i in {1..5}; do md5sum /mnt/fuse/1g; done
2e68ea6738dccb4f248df81b5c55d464 /mnt/fuse/1g
54944120797266fc4e26bd465ae5e67a /mnt/fuse/1g
ef4d099269fb117e357015cf424723a9 /mnt/fuse/1g
6a40dbca2636ae796b6380385ddfbc83 /mnt/fuse/1g
126fc40073dcebb67d413de95571c08b /mnt/fuse/1g
{quote}

IMO, HADOOP-15499 did improve the performance of the decoder; however, it broke the correctness of the decode method when invoked concurrently. We should bring synchronized back, and it is fine to keep the read/write lock too, as it protects the init/release methods. Thanks [~jingzhao] again.

> Fix thread safety of EC decoding during concurrent preads
> ---------------------------------------------------------
>
>                 Key: HDFS-16422
>                 URL: https://issues.apache.org/jira/browse/HDFS-16422
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: dfsclient, ec, erasure-coding
>    Affects Versions: 3.3.0, 3.3.1
>            Reporter: daimin
>            Assignee: daimin
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.2.3, 3.3.3
>
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Reading data on an erasure-coded file with missing replicas (internal block of
> a block group) will cause
[jira] [Commented] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads
[ https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511224#comment-17511224 ] daimin commented on HDFS-16422: --- [~jingzhao] I tested this again. My test steps were: # Set up a cluster with 11 datanodes and write 4 EC RS-8-2 files: 1g, 2g, 4g, 8g # Stop one datanode # Check the md5sum of these files through HDFS FUSE; this is a simple way to create concurrent preads (indirect I/O on FUSE) Here are the test results: * md5sum check before the datanode went down: {quote}md5sum /mnt/fuse/*g 5e6c32c0b572e2ff24fb14f93c4cc45b /mnt/fuse/1g 782173623681c129558c09e89251f46d /mnt/fuse/2g e107f9a83a383b98aa23fdd3171b589c /mnt/fuse/4g adb81da2c34161f249439597c515db1d /mnt/fuse/8g {quote} * md5sum after the datanode went down, with the native (ISA-L) decoder: {quote}md5sum /mnt/fuse/*g 206288b264b92af42563a14a242aa629 /mnt/fuse/1g bc86f9f549912d78c8b3d02ada5621a2 /mnt/fuse/2g c201356b7437e6aac1b574ade08b6ccb /mnt/fuse/4g ef2e6f6b4b6ab96a24e5f734e93bacc3 /mnt/fuse/8g {quote} * md5sum after the datanode went down, with the pure Java decoder: {quote}md5sum /mnt/fuse/*g 5e6c32c0b572e2ff24fb14f93c4cc45b /mnt/fuse/1g 782173623681c129558c09e89251f46d /mnt/fuse/2g e107f9a83a383b98aa23fdd3171b589c /mnt/fuse/4g adb81da2c34161f249439597c515db1d /mnt/fuse/8g {quote} In conclusion: RSRawDecoder appears to be thread-safe, but NativeRSRawDecoder is not; the read/write lock appears unable to protect the native decodeImpl method. I also ran the md5sum check repeatedly on the same file with the native (ISA-L) decoder, and the result differed every time. {quote}for i in {1..5};do md5sum /mnt/fuse/1g;done 2e68ea6738dccb4f248df81b5c55d464 /mnt/fuse/1g 54944120797266fc4e26bd465ae5e67a /mnt/fuse/1g ef4d099269fb117e357015cf424723a9 /mnt/fuse/1g 6a40dbca2636ae796b6380385ddfbc83 /mnt/fuse/1g 126fc40073dcebb67d413de95571c08b /mnt/fuse/1g {quote} IMO, HADOOP-15499 did improve decoder performance, but it broke the correctness of the decode method when invoked concurrently.
We should bring synchronized back, and I will submit a new PR to do this work. Thanks [~jingzhao] again. > Fix thread safety of EC decoding during concurrent preads > - > > Key: HDFS-16422 > URL: https://issues.apache.org/jira/browse/HDFS-16422 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfsclient, ec, erasure-coding >Affects Versions: 3.3.0, 3.3.1 >Reporter: daimin >Assignee: daimin >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.3 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Reading data on an erasure-coded file with missing replicas (internal block of > block group) will cause online reconstruction: read dataUnits part of data > and decode them into the target missing data. Each DFSStripedInputStream > object has a RawErasureDecoder object, and when we do preads concurrently, > RawErasureDecoder.decode will be invoked concurrently too. > RawErasureDecoder.decode is not thread-safe; as a result, we occasionally get > wrong data from pread. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
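The failure mode described in this comment can be sketched outside Hadoop: a decoder that keeps shared scratch state between the stages of decode() is only safe under concurrent preads if calls are serialized. Below is a minimal, self-contained Java sketch (a toy stand-in, not Hadoop's actual NativeRSRawDecoder) illustrating the `synchronized` fix the comment proposes; without the `synchronized` keyword, the two stages of different threads could interleave on the shared scratch buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy stand-in for a stateful decoder: it keeps a shared scratch buffer,
// so unsynchronized concurrent calls could corrupt each other's output.
class ToyStatefulDecoder {
    private final int[] scratch = new int[4];

    // Serializing calls with synchronized is the shape of the fix
    // proposed for the native decoder path in HDFS-16422.
    synchronized int[] decode(int[] input) {
        for (int i = 0; i < input.length; i++) {
            scratch[i] = input[i] * 2;      // stage 1 writes shared state
        }
        int[] out = new int[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = scratch[i] + 1;        // stage 2 reads shared state back
        }
        return out;
    }
}

public class DecoderRaceSketch {
    public static void main(String[] args) throws Exception {
        ToyStatefulDecoder decoder = new ToyStatefulDecoder();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<int[]>> results = new ArrayList<>();
        for (int t = 0; t < 8; t++) {
            final int v = t;
            // Concurrent "preads": each thread decodes its own input.
            results.add(pool.submit(() -> decoder.decode(new int[]{v, v, v, v})));
        }
        for (int t = 0; t < 8; t++) {
            int[] out = results.get(t).get();
            for (int x : out) {
                if (x != 2 * t + 1) throw new AssertionError("corrupted decode");
            }
        }
        pool.shutdown();
        System.out.println("all decodes consistent");
    }
}
```

Removing `synchronized` makes the interleaving hazard possible again, which mirrors why a read lock alone (allowing multiple concurrent readers) cannot protect a mutating native decodeImpl.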
[jira] [Commented] (HDFS-15273) CacheReplicationMonitor hold lock for long time and lead to NN out of service
[ https://issues.apache.org/jira/browse/HDFS-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511194#comment-17511194 ] Hadoop QA commented on HDFS-15273: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 17s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 56s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 30s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 22s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 0s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 23m 21s{color} | 
{color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 29m 18s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 3m 27s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 23s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 19s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 17s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 17s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 58s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite 
{color} | {color:green} 1m 21s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green}{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 21m 33s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 {color} | | {color:green}+1{color} |
[jira] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely
[ https://issues.apache.org/jira/browse/HDFS-16064 ] yanbin.zhang deleted comment on HDFS-16064: - was (Author: it_singer): I think your root cause may not be here; we have never seen this problem during our decommissioning process. > HDFS-721 causes DataNode decommissioning to get stuck indefinitely > -- > > Key: HDFS-16064 > URL: https://issues.apache.org/jira/browse/HDFS-16064 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 3.2.1 >Reporter: Kevin Wikant >Priority: Major > > Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a > non-issue under the assumption that if the namenode & a datanode get into an > inconsistent state for a given block pipeline, there should be another > datanode available to replicate the block to > While testing datanode decommissioning using "dfs.exclude.hosts", I have > encountered a scenario where the decommissioning gets stuck indefinitely > Below is the progression of events: > * there are initially 4 datanodes DN1, DN2, DN3, DN4 > * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts" > * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in > order to satisfy their minimum replication factor of 2 > * during this replication process > https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes > the following inconsistent state: > ** DN3 thinks it has the block pipeline in FINALIZED state > ** the namenode does not think DN3 has the block pipeline > {code:java} > 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode > (DataXceiver for client at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): > DN3:9866:DataXceiver error processing WRITE_BLOCK operation src: /DN2:45654 > dst: /DN3:9866; > org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block > BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created. 
> {code} > * the replication is attempted again, but: > ** DN4 has the block > ** DN1 and/or DN2 have the block, but don't count towards the minimum > replication factor because they are being decommissioned > ** DN3 does not have the block & cannot have the block replicated to it > because of HDFS-721 > * the namenode repeatedly tries to replicate the block to DN3 & repeatedly > fails, this continues indefinitely > * therefore DN4 is the only live datanode with the block & the minimum > replication factor of 2 cannot be satisfied > * because the minimum replication factor cannot be satisfied for the > block(s) being moved off DN1 & DN2, the datanode decommissioning can never be > completed > {code:java} > 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): > Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, > decommissioned replicas: 0, decommissioning replicas: 2, maintenance > replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is > Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , > Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is > current datanode entering maintenance: false > ... 
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): > Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, > decommissioned replicas: 0, decommissioning replicas: 2, maintenance > replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is > Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , > Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is > current datanode entering maintenance: false > {code} > Being stuck in decommissioning state forever is not an intended behavior of > DataNode decommissioning > A few potential solutions: > * Address the root cause of the problem which is an inconsistent state > between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721 > * Detect when datanode decommissioning is stuck due to lack of available > datanodes for satisfying the minimum replication factor, then recover by > re-enabling the datanodes being decommissioned > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
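The second proposed solution above (detecting when decommissioning is stuck) can be sketched as simple arithmetic over the replica counts that already appear in the BlockStateChange log lines: if the live replicas plus the datanodes still eligible to receive the block cannot reach the minimum replication factor, the decommission can never complete. The helper below is hypothetical, not an actual Hadoop API.

```java
// Hedged sketch of a stuck-decommission check. All names are illustrative.
public class DecommissionStallCheck {
    /**
     * @param liveReplicas    replicas on healthy, non-decommissioning nodes (DN4 above)
     * @param eligibleTargets datanodes the block could still be replicated to
     *                        (0 when the only candidate, DN3, rejects it per HDFS-721)
     * @param minReplication  required replication factor (2 in the report)
     * @return whether replication can ever satisfy the minimum factor
     */
    public static boolean canEverComplete(int liveReplicas, int eligibleTargets,
                                          int minReplication) {
        return liveReplicas + eligibleTargets >= minReplication;
    }

    public static void main(String[] args) {
        // The stuck state from the report: 1 live replica (DN4), DN3 unusable, min = 2.
        System.out.println(canEverComplete(1, 0, 2)); // false: decommission is stuck
        System.out.println(canEverComplete(1, 1, 2)); // true once DN3 becomes usable
    }
}
```

A monitor that evaluates this predicate per under-replicated block could then alert or, as the report suggests, recover by re-enabling the decommissioning datanodes.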
[jira] [Commented] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely
[ https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511159#comment-17511159 ] yanbin.zhang commented on HDFS-16064: - I think your root cause may not be here; we have never seen this problem during our decommissioning process. > HDFS-721 causes DataNode decommissioning to get stuck indefinitely > -- > > Key: HDFS-16064 > URL: https://issues.apache.org/jira/browse/HDFS-16064 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 3.2.1 >Reporter: Kevin Wikant >Priority: Major > > Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a > non-issue under the assumption that if the namenode & a datanode get into an > inconsistent state for a given block pipeline, there should be another > datanode available to replicate the block to > While testing datanode decommissioning using "dfs.exclude.hosts", I have > encountered a scenario where the decommissioning gets stuck indefinitely > Below is the progression of events: > * there are initially 4 datanodes DN1, DN2, DN3, DN4 > * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts" > * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in > order to satisfy their minimum replication factor of 2 > * during this replication process > https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes > the following inconsistent state: > ** DN3 thinks it has the block pipeline in FINALIZED state > ** the namenode does not think DN3 has the block pipeline > {code:java} > 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode > (DataXceiver for client at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): > DN3:9866:DataXceiver error processing WRITE_BLOCK operation src: /DN2:45654 > dst: /DN3:9866; > org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block > BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created. 
> {code} > * the replication is attempted again, but: > ** DN4 has the block > ** DN1 and/or DN2 have the block, but don't count towards the minimum > replication factor because they are being decommissioned > ** DN3 does not have the block & cannot have the block replicated to it > because of HDFS-721 > * the namenode repeatedly tries to replicate the block to DN3 & repeatedly > fails, this continues indefinitely > * therefore DN4 is the only live datanode with the block & the minimum > replication factor of 2 cannot be satisfied > * because the minimum replication factor cannot be satisfied for the > block(s) being moved off DN1 & DN2, the datanode decommissioning can never be > completed > {code:java} > 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): > Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, > decommissioned replicas: 0, decommissioning replicas: 2, maintenance > replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is > Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , > Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is > current datanode entering maintenance: false > ... 
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): > Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, > decommissioned replicas: 0, decommissioning replicas: 2, maintenance > replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is > Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , > Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is > current datanode entering maintenance: false > {code} > Being stuck in decommissioning state forever is not an intended behavior of > DataNode decommissioning > A few potential solutions: > * Address the root cause of the problem which is an inconsistent state > between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721 > * Detect when datanode decommissioning is stuck due to lack of available > datanodes for satisfying the minimum replication factor, then recover by > re-enabling the datanodes being decommissioned > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16500) Make asynchronous blocks deletion lock and unlock duration threshold configurable
[ https://issues.apache.org/jira/browse/HDFS-16500?focusedWorklogId=746413=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746413 ] ASF GitHub Bot logged work on HDFS-16500: - Author: ASF GitHub Bot Created on: 23/Mar/22 08:49 Start Date: 23/Mar/22 08:49 Worklog Time Spent: 10m Work Description: smarthanwang commented on pull request #4061: URL: https://github.com/apache/hadoop/pull/4061#issuecomment-1076101468 Hi @Hexiaoqiao, do you have any suggestions? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 746413) Time Spent: 1h 40m (was: 1.5h) > Make asynchronous blocks deletion lock and unlock duration threshold > configurable > - > > Key: HDFS-16500 > URL: https://issues.apache.org/jira/browse/HDFS-16500 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chengwei Wang >Assignee: Chengwei Wang >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > I have backported the nice feature HDFS-16043 to our internal branch, and it works > well in our testing cluster. > I think it's better to make the fields *_deleteBlockLockTimeMs_* and > *_deleteBlockUnlockIntervalTimeMs_* configurable, so that we can control the > lock and unlock duration. > {code:java} > private final long deleteBlockLockTimeMs = 500; > private final long deleteBlockUnlockIntervalTimeMs = 100;{code} > We should also set the default values smaller, to avoid blocking other requests > for a long time when deleting large directories. 
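The chunked-deletion pattern behind deleteBlockLockTimeMs and deleteBlockUnlockIntervalTimeMs can be sketched in plain Java: hold the lock for at most the configured time, release it, sleep for the configured interval, and repeat until done. This is a simplified, self-contained sketch under stated assumptions; the configuration key names below are hypothetical, not the ones the patch finally uses.

```java
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hedged sketch: read the two thresholds from a config map with defaults
// (instead of the hard-coded 500ms/100ms fields), then chunk a long-running
// deletion so the lock is periodically released for other requests.
public class ChunkedDeleteSketch {
    static long getLong(Map<String, String> conf, String key, long dflt) {
        String v = conf.get(key);
        return v == null ? dflt : Long.parseLong(v);
    }

    public static int deleteAll(Map<String, String> conf, Deque<String> blocks,
                                ReentrantLock lock) {
        // Hypothetical key names, for illustration only.
        long lockTimeMs = getLong(conf, "dfs.namenode.block.deletion.lock.threshold.ms", 500);
        long unlockIntervalMs = getLong(conf, "dfs.namenode.block.deletion.unlock.interval.ms", 100);
        int deleted = 0;
        while (!blocks.isEmpty()) {
            long start = System.currentTimeMillis();
            lock.lock();
            try {
                // Hold the lock at most lockTimeMs before yielding.
                while (!blocks.isEmpty() && System.currentTimeMillis() - start < lockTimeMs) {
                    blocks.pop();
                    deleted++;
                }
            } finally {
                lock.unlock();
            }
            if (!blocks.isEmpty()) {
                try {
                    Thread.sleep(unlockIntervalMs); // window for other lock requests
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return deleted;
    }
}
```

Making both thresholds configurable lets operators trade deletion throughput against lock starvation of other namenode requests, which is exactly the tuning knob the issue asks for.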
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16514) Reduce the failover sleep time if multiple namenode are configured
[ https://issues.apache.org/jira/browse/HDFS-16514?focusedWorklogId=746391=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746391 ] ASF GitHub Bot logged work on HDFS-16514: - Author: ASF GitHub Bot Created on: 23/Mar/22 08:13 Start Date: 23/Mar/22 08:13 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #4088: URL: https://github.com/apache/hadoop/pull/4088#issuecomment-1076063628 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 39s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 12m 43s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 22m 56s | | trunk passed | | +1 :green_heart: | compile | 22m 50s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 20m 0s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 3m 38s | | trunk passed | | +1 :green_heart: | mvnsite | 2m 56s | | trunk passed | | +1 :green_heart: | javadoc | 2m 12s | | trunk passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 2m 43s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 5m 11s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 1s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 53s | | the patch passed | | +1 :green_heart: | compile | 23m 32s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 23m 32s | | the patch passed | | +1 :green_heart: | compile | 20m 42s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 20m 42s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 3m 37s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4088/3/artifact/out/results-checkstyle-root.txt) | root: The patch generated 1 new + 45 unchanged - 0 fixed = 46 total (was 45) | | +1 :green_heart: | mvnsite | 2m 49s | | the patch passed | | +1 :green_heart: | javadoc | 1m 54s | | the patch passed with JDK Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 2m 29s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 5m 43s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 54s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 19m 10s | | hadoop-common in the patch passed. | | +1 :green_heart: | unit | 2m 48s | | hadoop-hdfs-client in the patch passed. | | +1 :green_heart: | asflicense | 0m 58s | | The patch does not generate ASF License warnings. 
| | | | 233m 16s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4088/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4088 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux e7152761be30 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 140dad88d65a6de47eda8e784d35f545922e7cce | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04
[jira] [Work logged] (HDFS-16434) Add opname to read/write lock for remaining operations
[ https://issues.apache.org/jira/browse/HDFS-16434?focusedWorklogId=746386=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746386 ] ASF GitHub Bot logged work on HDFS-16434: - Author: ASF GitHub Bot Created on: 23/Mar/22 07:43 Start Date: 23/Mar/22 07:43 Worklog Time Spent: 10m Work Description: tomscut commented on a change in pull request #3915: URL: https://github.com/apache/hadoop/pull/3915#discussion_r832946641 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Namesystem.java ## @@ -63,4 +63,16 @@ * directories. Create them if not. */ void checkAndProvisionSnapshotTrashRoots(); + + /** + * Release read lock with operation name. + * @param opName + */ + void readUnlock(String opName); + + /** + * Release write lock with operation name. + * @param opName + */ + void writeUnlock(String opName); Review comment: Thanks @tasanuma for your review and comment. I agree with you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 746386) Time Spent: 1h 20m (was: 1h 10m) > Add opname to read/write lock for remaining operations > -- > > Key: HDFS-16434 > URL: https://issues.apache.org/jira/browse/HDFS-16434 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > In this issue at > [HDFS-10872|https://issues.apache.org/jira/browse/HDFS-10872], we add opname > to read and write locks. However, there are still many operations that have > not been completed. 
When analyzing some operations that hold locks for a long > time, we can currently only identify the specific methods through stack traces. I suggest that these > remaining operations be completed to facilitate later performance > optimization. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
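The value of the `readUnlock(String opName)` / `writeUnlock(String opName)` signatures under review can be shown with a small stand-alone sketch: a lock wrapper that attributes each hold duration to a named operation, so long holders show up by name instead of requiring a stack dump. This is a simplified stand-in, not FSNamesystem's actual lock implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hedged sketch of op-name lock instrumentation.
public class NamedLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, Long> maxWriteHoldNanos = new ConcurrentHashMap<>();
    private long writeLockedAt;

    public void writeLock() {
        lock.writeLock().lock();
        writeLockedAt = System.nanoTime();   // safe: only the holder writes this
    }

    // The opName parameter is what HDFS-16434 adds: the unlock site names the
    // operation, so hold times can be aggregated and logged per operation.
    public void writeUnlock(String opName) {
        long held = System.nanoTime() - writeLockedAt;
        lock.writeLock().unlock();
        maxWriteHoldNanos.merge(opName, held, Math::max);
    }

    public boolean recorded(String opName) {
        return maxWriteHoldNanos.containsKey(opName);
    }

    public static void main(String[] args) {
        NamedLockSketch l = new NamedLockSketch();
        l.writeLock();
        l.writeUnlock("delete");             // "delete" now has a recorded hold time
        System.out.println("recorded(delete) = " + l.recorded("delete"));
    }
}
```

With every unlock site passing its operation name, a threshold check inside writeUnlock can log "operation X held the write lock for Y ms" directly, which is the performance-analysis win the issue describes.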
[jira] [Work logged] (HDFS-16434) Add opname to read/write lock for remaining operations
[ https://issues.apache.org/jira/browse/HDFS-16434?focusedWorklogId=746384=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746384 ]

ASF GitHub Bot logged work on HDFS-16434:
-
Author: ASF GitHub Bot
Created on: 23/Mar/22 07:24
Start Date: 23/Mar/22 07:24
Worklog Time Spent: 10m

Work Description: tasanuma commented on a change in pull request #3915:
URL: https://github.com/apache/hadoop/pull/3915#discussion_r832926344

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Namesystem.java
## @@ -63,4 +63,16 @@
    * directories. Create them if not.
    */
   void checkAndProvisionSnapshotTrashRoots();
+
+  /**
+   * Release read lock with operation name.
+   * @param opName
+   */
+  void readUnlock(String opName);
+
+  /**
+   * Release write lock with operation name.
+   * @param opName
+   */
+  void writeUnlock(String opName);

Review comment: How about moving the new methods to RwLock, since it has all the lock-related methods? Although RwLock is not `@InterfaceAudience.Private`, I think we can add new methods there if the target version is 3.4.0.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 746384)
Time Spent: 1h 10m (was: 1h)

> Add opname to read/write lock for remaining operations
> --
>
> Key: HDFS-16434
> URL: https://issues.apache.org/jira/browse/HDFS-16434
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: tomscut
> Assignee: tomscut
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> In [HDFS-10872|https://issues.apache.org/jira/browse/HDFS-10872], we added an opName to the read and write locks. However, there are still many operations that have not been covered. When analyzing operations that hold a lock for a long time, we can currently only identify the specific methods through the stack trace. I suggest completing the remaining operations to facilitate later performance optimization.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
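The review suggestion above — carrying an operation name through the unlock call so that long-held locks can be attributed to a specific operation — can be sketched roughly as follows. This is a hedged illustration, not Hadoop's actual RwLock or FSNamesystem code; the class name `InstrumentedRwLock` and the threshold constant are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: an RwLock-style wrapper whose unlock takes an
// operation name, so a "lock held too long" warning names the operation
// instead of requiring a stack-trace dig. Names here are illustrative.
public class InstrumentedRwLock {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private static final long LOCK_WARN_NANOS = TimeUnit.MILLISECONDS.toNanos(100);
  // Records when the current thread acquired the read lock.
  private final ThreadLocal<Long> acquiredAt = new ThreadLocal<>();

  public void readLock() {
    lock.readLock().lock();
    acquiredAt.set(System.nanoTime());
  }

  // Release the read lock, warning with the op name if it was held too long.
  public void readUnlock(String opName) {
    long heldNanos = System.nanoTime() - acquiredAt.get();
    lock.readLock().unlock();
    if (heldNanos > LOCK_WARN_NANOS) {
      System.out.println("Read lock held too long by op: " + opName);
    }
  }

  public static void main(String[] args) {
    InstrumentedRwLock rw = new InstrumentedRwLock();
    rw.readLock();
    rw.readUnlock("getBlockLocations"); // op name would appear in any warning
    System.out.println("done");
  }
}
```

The point of the proposal is exactly this last argument: once every lock release passes an opName, slow-lock warnings become self-describing.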
[jira] [Work logged] (HDFS-16501) Print the exception when reporting a bad block
[ https://issues.apache.org/jira/browse/HDFS-16501?focusedWorklogId=746371=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746371 ]

ASF GitHub Bot logged work on HDFS-16501:
-
Author: ASF GitHub Bot
Created on: 23/Mar/22 06:32
Start Date: 23/Mar/22 06:32
Worklog Time Spent: 10m

Work Description: liubingxing commented on pull request #4062:
URL: https://github.com/apache/hadoop/pull/4062#issuecomment-1075971790

Thanks @tasanuma and @tomscut

Issue Time Tracking
---
Worklog Id: (was: 746371)
Time Spent: 1h 10m (was: 1h)

> Print the exception when reporting a bad block
> --
>
> Key: HDFS-16501
> URL: https://issues.apache.org/jira/browse/HDFS-16501
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: qinyuren
> Assignee: qinyuren
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: image-2022-03-10-19-27-31-622.png
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> !image-2022-03-10-19-27-31-622.png|width=847,height=27!
> Currently, the VolumeScanner finds a bad block and reports it to the NameNode without printing the reason why the block is bad. I think it would be better to print the exception in the log file.
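The improvement tracked above amounts to passing the causing exception to the logger when a bad block is reported, so the log records *why* the block was marked bad. A minimal sketch — using `java.util.logging` for self-containment rather than Hadoop's SLF4J loggers, with an illustrative class and block id, not actual Hadoop code:

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Minimal sketch: when a scanner decides a block is bad, log the cause
// (the exception) along with the report, not just the block id.
// BlockScanDemo and the block id below are illustrative.
public class BlockScanDemo {
  private static final Logger LOG = Logger.getLogger("BlockScanDemo");

  static void reportBadBlock(String blockId, IOException cause) {
    // Passing the Throwable as the last argument preserves its message and
    // stack trace in the log output.
    LOG.log(Level.WARNING, "Reporting bad block " + blockId, cause);
  }

  public static void main(String[] args) {
    reportBadBlock("blk_1073741825",
        new IOException("checksum mismatch at offset 4096"));
    System.out.println("reported");
  }
}
```

With SLF4J (which Hadoop uses), the equivalent pattern is supplying the exception as the final argument to `warn(...)`.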
[jira] [Resolved] (HDFS-16501) Print the exception when reporting a bad block
[ https://issues.apache.org/jira/browse/HDFS-16501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takanobu Asanuma resolved HDFS-16501.
-
Fix Version/s: 3.4.0
               3.2.4
               3.3.3
Assignee: qinyuren
Resolution: Fixed

> Print the exception when reporting a bad block
> --
>
> Key: HDFS-16501
> URL: https://issues.apache.org/jira/browse/HDFS-16501
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: qinyuren
> Assignee: qinyuren
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: image-2022-03-10-19-27-31-622.png
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> !image-2022-03-10-19-27-31-622.png|width=847,height=27!
> Currently, the VolumeScanner finds a bad block and reports it to the NameNode without printing the reason why the block is bad. I think it would be better to print the exception in the log file.
[jira] [Work logged] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress
[ https://issues.apache.org/jira/browse/HDFS-16507?focusedWorklogId=746367=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746367 ]

ASF GitHub Bot logged work on HDFS-16507:
-
Author: ASF GitHub Bot
Created on: 23/Mar/22 06:06
Start Date: 23/Mar/22 06:06
Worklog Time Spent: 10m

Work Description: tomscut commented on a change in pull request #4082:
URL: https://github.com/apache/hadoop/pull/4082#discussion_r832879620

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
## @@ -1509,13 +1509,18 @@ synchronized void abortCurrentLogSegment() {
    * effect.
    */
   @Override
-  public synchronized void purgeLogsOlderThan(final long minTxIdToKeep) {
+  public synchronized void purgeLogsOlderThan(long minTxIdToKeep) {
     // Should not purge logs unless they are open for write.
     // This prevents the SBN from purging logs on shared storage, for example.
     if (!isOpenForWrite()) {
       return;
     }
-
+
+    // Reset purgeLogsFrom to avoid purging edit log which is in progress.
+    if (isSegmentOpen()) {
+      minTxIdToKeep = minTxIdToKeep > curSegmentTxId ? curSegmentTxId : minTxIdToKeep;

Review comment: Hi @sunchao @tasanuma, could you please take a look at this discussion. Thanks.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
Issue Time Tracking
---
Worklog Id: (was: 746367)
Time Spent: 2h 20m (was: 2h 10m)

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: tomscut
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL exception: it looks like an edit log that is still in progress is being purged. Based on the analysis, I suspect that the in-progress edit log to be purged (after the SNN checkpoint) was not finalized (see HDFS-14317) before the ANN rolled its own edit log.
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
> java.security.AccessController.doPrivileged(Native Method)
>
javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515) > javax.servlet.http.HttpServlet.service(HttpServlet.java:710) > javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > >
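The patch under review above clamps `minTxIdToKeep` to the first transaction id of the still-open segment, so an in-progress edit log is never purged. The clamping logic in isolation can be sketched as follows; `PurgeClamp` and `clampToOpenSegment` are illustrative names, not the actual FSEditLog code:

```java
// Sketch of the clamp from the HDFS-16507 patch: if a segment is still
// open for write, never purge past its first transaction id.
// Class and method names here are illustrative, not Hadoop's.
public class PurgeClamp {
  static long clampToOpenSegment(long minTxIdToKeep, long curSegmentTxId,
                                 boolean segmentOpen) {
    if (segmentOpen && minTxIdToKeep > curSegmentTxId) {
      // Requested purge point would reach into the open segment:
      // pull it back so the in-progress log survives.
      return curSegmentTxId;
    }
    return minTxIdToKeep;
  }

  public static void main(String[] args) {
    // Purge point beyond the open segment's first txid gets clamped back.
    System.out.println(clampToOpenSegment(500, 300, true));
    // Purge point before the open segment is unchanged.
    System.out.println(clampToOpenSegment(200, 300, true));
    // No open segment: no clamping.
    System.out.println(clampToOpenSegment(500, 300, false));
  }
}
```

This mirrors the ternary in the diff (`minTxIdToKeep > curSegmentTxId ? curSegmentTxId : minTxIdToKeep`), guarded by the segment-open check.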
[jira] [Work logged] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress
[ https://issues.apache.org/jira/browse/HDFS-16507?focusedWorklogId=746366=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746366 ]

ASF GitHub Bot logged work on HDFS-16507:
-
Author: ASF GitHub Bot
Created on: 23/Mar/22 06:05
Start Date: 23/Mar/22 06:05
Worklog Time Spent: 10m

Work Description: tomscut commented on a change in pull request #4082:
URL: https://github.com/apache/hadoop/pull/4082#discussion_r832879620

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
## @@ -1509,13 +1509,18 @@ synchronized void abortCurrentLogSegment() {
    * effect.
    */
   @Override
-  public synchronized void purgeLogsOlderThan(final long minTxIdToKeep) {
+  public synchronized void purgeLogsOlderThan(long minTxIdToKeep) {
     // Should not purge logs unless they are open for write.
     // This prevents the SBN from purging logs on shared storage, for example.
     if (!isOpenForWrite()) {
       return;
     }
-
+
+    // Reset purgeLogsFrom to avoid purging edit log which is in progress.
+    if (isSegmentOpen()) {
+      minTxIdToKeep = minTxIdToKeep > curSegmentTxId ? curSegmentTxId : minTxIdToKeep;

Review comment: Hi @sunchao @tasanuma, could you please take a look at this discussion.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
Issue Time Tracking
---
Worklog Id: (was: 746366)
Time Spent: 2h 10m (was: 2h)

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: tomscut
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL exception: it looks like an edit log that is still in progress is being purged. Based on the analysis, I suspect that the in-progress edit log to be purged (after the SNN checkpoint) was not finalized (see HDFS-14317) before the ANN rolled its own edit log.
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
> java.security.AccessController.doPrivileged(Native Method)
>
javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515) > javax.servlet.http.HttpServlet.service(HttpServlet.java:710) > javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > >
[jira] [Work logged] (HDFS-16501) Print the exception when reporting a bad block
[ https://issues.apache.org/jira/browse/HDFS-16501?focusedWorklogId=746365=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746365 ]

ASF GitHub Bot logged work on HDFS-16501:
-
Author: ASF GitHub Bot
Created on: 23/Mar/22 06:04
Start Date: 23/Mar/22 06:04
Worklog Time Spent: 10m

Work Description: tasanuma commented on pull request #4062:
URL: https://github.com/apache/hadoop/pull/4062#issuecomment-1075951712

Sorry for being late. Thanks for your contribution, @liubingxing, and thanks for your review, @tomscut.

Issue Time Tracking
---
Worklog Id: (was: 746365)
Time Spent: 1h (was: 50m)

> Print the exception when reporting a bad block
> --
>
> Key: HDFS-16501
> URL: https://issues.apache.org/jira/browse/HDFS-16501
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: qinyuren
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2022-03-10-19-27-31-622.png
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> !image-2022-03-10-19-27-31-622.png|width=847,height=27!
> Currently, the VolumeScanner finds a bad block and reports it to the NameNode without printing the reason why the block is bad. I think it would be better to print the exception in the log file.
[jira] [Work logged] (HDFS-16501) Print the exception when reporting a bad block
[ https://issues.apache.org/jira/browse/HDFS-16501?focusedWorklogId=746364=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746364 ]

ASF GitHub Bot logged work on HDFS-16501:
-
Author: ASF GitHub Bot
Created on: 23/Mar/22 06:03
Start Date: 23/Mar/22 06:03
Worklog Time Spent: 10m

Work Description: tasanuma merged pull request #4062:
URL: https://github.com/apache/hadoop/pull/4062

Issue Time Tracking
---
Worklog Id: (was: 746364)
Time Spent: 50m (was: 40m)

> Print the exception when reporting a bad block
> --
>
> Key: HDFS-16501
> URL: https://issues.apache.org/jira/browse/HDFS-16501
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: qinyuren
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2022-03-10-19-27-31-622.png
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> !image-2022-03-10-19-27-31-622.png|width=847,height=27!
> Currently, the VolumeScanner finds a bad block and reports it to the NameNode without printing the reason why the block is bad. I think it would be better to print the exception in the log file.