[jira] [Work logged] (HDFS-16446) Consider ioutils of disk when choosing volume

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16446?focusedWorklogId=746947=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746947
 ]

ASF GitHub Bot logged work on HDFS-16446:
-

Author: ASF GitHub Bot
Created on: 24/Mar/22 04:14
Start Date: 24/Mar/22 04:14
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3960:
URL: https://github.com/apache/hadoop/pull/3960#issuecomment-1077050201


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m  0s |  |  Docker mode activated.  |
   | -1 :x: |  docker  |  14m 54s |  |  Docker failed to build 
yetus/hadoop:13467f45240.  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/3960 |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3960/4/console |
   | versions | git=2.17.1 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 746947)
Time Spent: 1h 50m  (was: 1h 40m)

> Consider ioutils of disk when choosing volume
> -
>
> Key: HDFS-16446
> URL: https://issues.apache.org/jira/browse/HDFS-16446
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-02-05-09-50-12-241.png
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Consider the I/O utilization (ioutil) of each disk when choosing a volume.
> The principle is as follows:
> !image-2022-02-05-09-50-12-241.png|width=309,height=159!
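A minimal, standalone sketch of the idea (illustrative only; the interface and names below are assumptions, not the actual patch or Hadoop's VolumeChoosingPolicy API): among the volumes that have enough free space for the replica, prefer the one whose recent disk I/O utilization is lowest.

{code:java}
import java.util.List;

/** Hypothetical sketch: pick a volume by free space and recent disk ioutil. */
class IoUtilAwareVolumeChooser {

  /** Minimal view of a volume; method names are illustrative. */
  interface VolumeInfo {
    long availableBytes();
    /** Recent I/O utilization, 0.0 (idle) .. 1.0 (saturated), e.g. from iostat. */
    double recentIoUtil();
  }

  /** Returns the least-busy volume that can hold the replica, or null if none fits. */
  VolumeInfo choose(List<VolumeInfo> volumes, long replicaSize) {
    VolumeInfo best = null;
    for (VolumeInfo v : volumes) {
      if (v.availableBytes() < replicaSize) {
        continue; // not enough space, skip this disk
      }
      if (best == null || v.recentIoUtil() < best.recentIoUtil()) {
        best = v; // prefer the disk with the lowest recent ioutil
      }
    }
    return best;
  }
}
{code}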



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16446) Consider ioutils of disk when choosing volume

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16446?focusedWorklogId=746946=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746946
 ]

ASF GitHub Bot logged work on HDFS-16446:
-

Author: ASF GitHub Bot
Created on: 24/Mar/22 04:08
Start Date: 24/Mar/22 04:08
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3960:
URL: https://github.com/apache/hadoop/pull/3960#issuecomment-1077047322


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m  0s |  |  Docker mode activated.  |
   | -1 :x: |  docker  |  19m 36s |  |  Docker failed to build 
yetus/hadoop:13467f45240.  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/3960 |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3960/3/console |
   | versions | git=2.17.1 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




Issue Time Tracking
---

Worklog Id: (was: 746946)
Time Spent: 1h 40m  (was: 1.5h)

> Consider ioutils of disk when choosing volume
> -
>
> Key: HDFS-16446
> URL: https://issues.apache.org/jira/browse/HDFS-16446
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-02-05-09-50-12-241.png
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Consider the I/O utilization (ioutil) of each disk when choosing a volume.
> The principle is as follows:
> !image-2022-02-05-09-50-12-241.png|width=309,height=159!






[jira] [Work logged] (HDFS-16519) Add throttler to EC reconstruction

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16519?focusedWorklogId=746941=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746941
 ]

ASF GitHub Bot logged work on HDFS-16519:
-

Author: ASF GitHub Bot
Created on: 24/Mar/22 03:51
Start Date: 24/Mar/22 03:51
Worklog Time Spent: 10m 
  Work Description: cndaimin opened a new pull request #4101:
URL: https://github.com/apache/hadoop/pull/4101


   HDFS already has throttlers for data transfer (replication) and for the balancer; these throttlers reduce the impact of background procedures on user reads and writes.
   We should add a throttler to EC background reconstruction too.
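   A minimal sketch of how such throttling could be wired in, reusing the existing {{org.apache.hadoop.hdfs.util.DataTransferThrottler}}; the wrapper class, the config key, and the exact call sites in the reconstruction path below are assumptions for illustration, not the actual patch.

{code:java}
import org.apache.hadoop.hdfs.util.DataTransferThrottler;

/** Hypothetical sketch: cap the bandwidth used by EC reconstruction. */
class EcReconstructionThrottling {

  // Assumed config key, for illustration only.
  static final String EC_RECONSTRUCTION_BANDWIDTH_KEY =
      "dfs.datanode.ec.reconstruction.bandwidthPerSec";

  private final DataTransferThrottler throttler;

  EcReconstructionThrottling(long bandwidthPerSec) {
    // In this sketch, a non-positive value means "no throttling".
    this.throttler =
        bandwidthPerSec > 0 ? new DataTransferThrottler(bandwidthPerSec) : null;
  }

  /** Call after each chunk is read from the source DNs or written to the target. */
  void onBytesTransferred(long numBytes) {
    if (throttler != null) {
      throttler.throttle(numBytes); // blocks until the rate falls under the cap
    }
  }
}
{code}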




Issue Time Tracking
---

Worklog Id: (was: 746941)
Remaining Estimate: 0h
Time Spent: 10m

> Add throttler to EC reconstruction
> --
>
> Key: HDFS-16519
> URL: https://issues.apache.org/jira/browse/HDFS-16519
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, ec
>Affects Versions: 3.3.1, 3.3.2
>Reporter: daimin
>Assignee: daimin
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HDFS already has throttlers for data transfer (replication) and for the balancer; these throttlers reduce the impact of background procedures on user reads and writes.
> We should add a throttler to EC background reconstruction too.






[jira] [Updated] (HDFS-16519) Add throttler to EC reconstruction

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16519:
--
Labels: pull-request-available  (was: )

> Add throttler to EC reconstruction
> --
>
> Key: HDFS-16519
> URL: https://issues.apache.org/jira/browse/HDFS-16519
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, ec
>Affects Versions: 3.3.1, 3.3.2
>Reporter: daimin
>Assignee: daimin
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HDFS already has throttlers for data transfer (replication) and for the balancer; these throttlers reduce the impact of background procedures on user reads and writes.
> We should add a throttler to EC background reconstruction too.






[jira] [Created] (HDFS-16519) Add throttler to EC reconstruction

2022-03-23 Thread daimin (Jira)
daimin created HDFS-16519:
-

 Summary: Add throttler to EC reconstruction
 Key: HDFS-16519
 URL: https://issues.apache.org/jira/browse/HDFS-16519
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, ec
Affects Versions: 3.3.2, 3.3.1
Reporter: daimin
Assignee: daimin


HDFS already has throttlers for data transfer (replication) and for the balancer; these throttlers reduce the impact of background procedures on user reads and writes.
We should add a throttler to EC background reconstruction too.






[jira] [Commented] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2022-03-23 Thread tomscut (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511575#comment-17511575
 ] 

tomscut commented on HDFS-13671:


Hi [~max2049], we are still using CMS on a cluster without EC data; some parameter tuning should be able to solve this problem.

Also, how long is your FBR period? If it is 6 hours (the default) and the cluster is large, it may have an impact on GC. We set this to 3 days.

We use G1GC on a cluster with this feature that stores EC data. The main parameters (OpenJDK 1.8) are as follows:
{code:java}
-server -Xmx200g -Xms200g 
-XX:MaxDirectMemorySize=2g 
-XX:MaxMetaspaceSize=2g 
-XX:MetaspaceSize=1g 
-XX:+UseG1GC -XX:+UnlockExperimentalVMOptions 
-XX:InitiatingHeapOccupancyPercent=75 
-XX:G1NewSizePercent=0 -XX:G1MaxNewSizePercent=3 
-XX:SurvivorRatio=2 -XX:+DisableExplicitGC -XX:MaxTenuringThreshold=15 
-XX:-UseBiasedLocking -XX:ParallelGCThreads=40 -XX:ConcGCThreads=20 
-XX:MaxJavaStackTraceDepth=100 -XX:MaxGCPauseMillis=200 
-verbose:gc -XX:+UnlockDiagnosticVMOptions -XX:+PrintGCDetails 
-XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCCause -XX:+PrintGCDateStamps 
-XX:+PrintReferenceGC -XX:+PrintHeapAtGC -XX:+PrintAdaptiveSizePolicy 
-XX:+G1PrintHeapRegions -XX:+PrintTenuringDistribution 
-Xloggc:/data1/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` {code}
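For the FBR interval mentioned above, a hedged example of the corresponding setting in hdfs-site.xml, assuming the standard {{dfs.blockreport.intervalMsec}} key (default 21600000 ms = 6 hours); 3 days is 259200000 ms:

{code:xml}
<!-- Full block report (FBR) interval: 3 days instead of the 6-hour default. -->
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>259200000</value>
</property>
{code}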
 

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Assignee: Haibin Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, 
> image-2021-06-10-19-28-58-359.png, image-2021-06-18-15-46-46-052.png, 
> image-2021-06-18-15-47-04-037.png
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in the NameNode, there are mainly two steps:
> * Collect the INodes and all blocks to be deleted, then delete the INodes.
> * Remove the blocks chunk by chunk in a loop.
> The first step should actually be the more expensive operation and take more 
> time. However, we always see the NN hang during the remove-block 
> operation. 
> Looking into this: we introduced the new structure {{FoldedTreeSet}} to get 
> better performance when processing FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower 
> since it takes additional time to balance tree 

[jira] [Commented] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2022-03-23 Thread Max Xie (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511573#comment-17511573
 ] 

Max  Xie commented on HDFS-13671:
-

[~tomscut] Hi, would you share the G1 GC parameters for the Namenode? After 
our cluster moved to branch 3.3.0 with this patch (without EC data for now), GC 
performance has become poor. Thank you. 

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Assignee: Haibin Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, 
> image-2021-06-10-19-28-58-359.png, image-2021-06-18-15-46-46-052.png, 
> image-2021-06-18-15-47-04-037.png
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in the NameNode, there are mainly two steps:
> * Collect the INodes and all blocks to be deleted, then delete the INodes.
> * Remove the blocks chunk by chunk in a loop (a sketch of this pattern follows below).
> The first step should actually be the more expensive operation and take more 
> time. However, we always see the NN hang during the remove-block 
> operation. 
> Looking into this: we introduced the new structure {{FoldedTreeSet}} to get 
> better performance when processing FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, 
> since it takes additional time to rebalance tree nodes. When there are many 
> blocks to be removed/deleted, it looks bad.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator and no other get operation 
> for a specific block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not 
> Update. Maybe we can revert this to the earlier implementation.
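A minimal sketch of the chunk-by-chunk removal pattern mentioned in the description (hypothetical and simplified; the chunk size, interface, and class names are illustrative, not the real FSNamesystem/BlockManager code):

{code:java}
import java.util.List;

/** Hypothetical sketch of deleting a large directory's blocks in chunks. */
class ChunkedBlockRemoval {

  static final int BLOCK_DELETION_INCREMENT = 1000; // illustrative chunk size

  /** Stand-in for the structure holding block records (e.g. a per-storage set). */
  interface BlockStore {
    void removeBlock(long blockId); // the per-element call that can be slow
  }

  /**
   * Remove the collected blocks a chunk at a time so the namesystem lock can be
   * released between chunks; each removeBlock() still pays the per-element cost
   * of the underlying structure (e.g. rebalancing in a tree-based set).
   */
  static void removeBlocks(BlockStore store, List<Long> collectedBlockIds) {
    int from = 0;
    while (from < collectedBlockIds.size()) {
      int to = Math.min(from + BLOCK_DELETION_INCREMENT, collectedBlockIds.size());
      // ... acquire the write lock here in the real code ...
      for (long blockId : collectedBlockIds.subList(from, to)) {
        store.removeBlock(blockId);
      }
      // ... release the write lock here, letting other RPCs make progress ...
      from = to;
    }
  }
}
{code}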






[jira] [Work logged] (HDFS-16498) Fix NPE for checkBlockReportLease

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16498?focusedWorklogId=746935=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746935
 ]

ASF GitHub Bot logged work on HDFS-16498:
-

Author: ASF GitHub Bot
Created on: 24/Mar/22 03:24
Start Date: 24/Mar/22 03:24
Worklog Time Spent: 10m 
  Work Description: tomscut commented on pull request #4057:
URL: https://github.com/apache/hadoop/pull/4057#issuecomment-1077029230


   Hi @Hexiaoqiao @tasanuma @ferhui, could you also please review this? Thanks.




Issue Time Tracking
---

Worklog Id: (was: 746935)
Time Spent: 3h 20m  (was: 3h 10m)

> Fix NPE for checkBlockReportLease
> -
>
> Key: HDFS-16498
> URL: https://issues.apache.org/jira/browse/HDFS-16498
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-03-09-20-35-22-028.png, screenshot-1.png
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> During a NameNode restart, a DataNode that is not yet registered can still 
> trigger an FBR, which causes an NPE.
> !image-2022-03-09-20-35-22-028.png|width=871,height=158!
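For illustration only (the class, interface, and method names below are assumed, not the actual patch): the kind of defensive check involved is to resolve the reporting DataNode first and reject the lease check when it is not registered, instead of dereferencing a null descriptor.

{code:java}
/** Hypothetical sketch of guarding a block-report lease check against an
 *  unregistered DataNode; all names here are illustrative. */
class BlockReportLeaseGuard {

  interface DatanodeRegistry {
    /** Returns null when the node has not (re)registered yet. */
    NodeDescriptor lookup(String datanodeUuid);
  }

  interface NodeDescriptor {
    boolean leaseIdValid(long leaseId);
  }

  static boolean checkLease(DatanodeRegistry registry, String datanodeUuid,
      long leaseId) {
    NodeDescriptor node = registry.lookup(datanodeUuid);
    if (node == null) {
      // The DN sent an FBR before the restarted NN processed its registration:
      // refuse the report instead of throwing an NPE; the DN will retry later.
      return false;
    }
    return node.leaseIdValid(leaseId);
  }
}
{code}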






[jira] [Work logged] (HDFS-16518) Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?focusedWorklogId=746929=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746929
 ]

ASF GitHub Bot logged work on HDFS-16518:
-

Author: ASF GitHub Bot
Created on: 24/Mar/22 02:39
Start Date: 24/Mar/22 02:39
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #4100:
URL: https://github.com/apache/hadoop/pull/4100#issuecomment-1077009230


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 50s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 19s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  25m 42s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   6m 34s |  |  trunk passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   6m 13s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 47s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 53s |  |  trunk passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 22s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m 44s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m 57s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 23s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  8s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m 27s |  |  the patch passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   6m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m  2s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   6m  2s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  7s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4100/1/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 2 new + 43 unchanged - 0 fixed = 
45 total (was 43)  |
   | +1 :green_heart: |  mvnsite  |   2m 14s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  4s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m 22s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m  3s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 17s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  unit  | 335m 49s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 38s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 483m 40s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4100/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4100 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 8113c28b4d11 4.15.0-163-generic #171-Ubuntu SMP Fri Nov 5 
11:55:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 6b9414bc8efd322aaac25eea6cb5598c53db7b5d |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  

[jira] [Updated] (HDFS-16484) [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread

2022-03-23 Thread qinyuren (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qinyuren updated HDFS-16484:

Description: 
We recently ran SPS in our cluster and found this log: the SPSPathIdProcessor 
thread enters an infinite loop and prints the same message over and over.

!image-2022-02-25-14-35-42-255.png|width=682,height=195!

In the SPSPathIdProcessor thread, if it gets an inodeId whose path does not 
exist, the thread enters an infinite loop and cannot make progress.

The reason is that #ctxt.getNextSPSPath() returns an inodeId whose path does not 
exist. That inodeId is never reset to null, so the thread holds it forever.
{code:java}
public void run() {
  LOG.info("Starting SPSPathIdProcessor!.");
  Long startINode = null;
  while (ctxt.isRunning()) {
try {
  if (!ctxt.isInSafeMode()) {
if (startINode == null) {
  startINode = ctxt.getNextSPSPath();
} // else same id will be retried
if (startINode == null) {
  // Waiting for SPS path
  Thread.sleep(3000);
} else {
  ctxt.scanAndCollectFiles(startINode);
  // check if directory was empty and no child added to queue
  DirPendingWorkInfo dirPendingWorkInfo =
  pendingWorkForDirectory.get(startINode);
  if (dirPendingWorkInfo != null
  && dirPendingWorkInfo.isDirWorkDone()) {
ctxt.removeSPSHint(startINode);
pendingWorkForDirectory.remove(startINode);
  }
}
startINode = null; // Current inode successfully scanned.
  }
} catch (Throwable t) {
  String reClass = t.getClass().getName();
  if (InterruptedException.class.getName().equals(reClass)) {
LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
break;
  }
  LOG.warn("Exception while scanning file inodes to satisfy the policy",
  t);
  try {
Thread.sleep(3000);
  } catch (InterruptedException e) {
LOG.info("Interrupted while waiting in SPSPathIdProcessor", t);
break;
  }
}
  }
} {code}
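One possible shape of a fix, sketched against the loop above (not the actual patch; {{ctxt.isFileExist(...)}} is assumed here purely for illustration): drop an id whose path no longer exists instead of holding it across iterations.

{code:java}
// Sketch only: before scanning, verify the path behind startINode still exists.
if (startINode != null && !ctxt.isFileExist(startINode)) {
  // The path was deleted after it was queued; clean up and move on so the
  // thread does not retry the same id forever.
  ctxt.removeSPSHint(startINode);          // may need to tolerate FileNotFound
  pendingWorkForDirectory.remove(startINode);
  startINode = null;
  continue;
}
{code}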
 

 

  was:
In SPSPathIdProcessor thread, if it get a inodeId which path does not exist, 
then the SPSPathIdProcessor thread entry infinite loop and can't work normally. 

!image-2022-02-25-14-35-42-255.png|width=682,height=195!


> [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread 
> -
>
> Key: HDFS-16484
> URL: https://issues.apache.org/jira/browse/HDFS-16484
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-02-25-14-35-42-255.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, we ran SPS in our cluster and found this log: the 
> SPSPathIdProcessor thread enters an infinite loop and prints the same message 
> over and over.
> !image-2022-02-25-14-35-42-255.png|width=682,height=195!
> In the SPSPathIdProcessor thread, if it gets an inodeId whose path does not 
> exist, the thread enters an infinite loop and cannot make progress.
> The reason is that #ctxt.getNextSPSPath() returns an inodeId whose path does 
> not exist. That inodeId is never reset to null, so the thread holds it 
> forever.
> {code:java}
> public void run() {
>   LOG.info("Starting SPSPathIdProcessor!.");
>   Long startINode = null;
>   while (ctxt.isRunning()) {
> try {
>   if (!ctxt.isInSafeMode()) {
> if (startINode == null) {
>   startINode = ctxt.getNextSPSPath();
> } // else same id will be retried
> if (startINode == null) {
>   // Waiting for SPS path
>   Thread.sleep(3000);
> } else {
>   ctxt.scanAndCollectFiles(startINode);
>   // check if directory was empty and no child added to queue
>   DirPendingWorkInfo dirPendingWorkInfo =
>   pendingWorkForDirectory.get(startINode);
>   if (dirPendingWorkInfo != null
>   && dirPendingWorkInfo.isDirWorkDone()) {
> ctxt.removeSPSHint(startINode);
> pendingWorkForDirectory.remove(startINode);
>   }
> }
> startINode = null; // Current inode successfully scanned.
>   }
> } catch (Throwable t) {
>   String reClass = t.getClass().getName();
>   if (InterruptedException.class.getName().equals(reClass)) {
> LOG.info("SPSPathIdProcessor thread is interrupted. Stopping..");
> break;
>   }
>   LOG.warn("Exception while scanning file inodes to satisfy the policy",
>   t);
>   try {
> Thread.sleep(3000);
>   } catch (InterruptedException e) {
> LOG.info("Interrupted while waiting 

[jira] [Work logged] (HDFS-16518) Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?focusedWorklogId=746923=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746923
 ]

ASF GitHub Bot logged work on HDFS-16518:
-

Author: ASF GitHub Bot
Created on: 24/Mar/22 02:10
Start Date: 24/Mar/22 02:10
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #4100:
URL: https://github.com/apache/hadoop/pull/4100#issuecomment-1076995582


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 44s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 32s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  25m 30s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   6m 23s |  |  trunk passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   5m 54s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 21s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 14s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   5m 45s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 55s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   6m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   6m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 48s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   5m 48s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  4s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4100/3/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 2 new + 0 unchanged - 0 fixed = 2 
total (was 0)  |
   | +1 :green_heart: |  mvnsite  |   2m 10s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 28s |  |  the patch passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 57s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   5m 59s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  24m 50s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 26s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  unit  | 239m 48s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 49s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 381m 23s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4100/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4100 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux e899737cf03e 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 81cbc541ee5101332e9038e7f620c255e9cc01f9 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  

[jira] [Work logged] (HDFS-16518) Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?focusedWorklogId=746920=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746920
 ]

ASF GitHub Bot logged work on HDFS-16518:
-

Author: ASF GitHub Bot
Created on: 24/Mar/22 01:55
Start Date: 24/Mar/22 01:55
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #4100:
URL: https://github.com/apache/hadoop/pull/4100#issuecomment-1076988908


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 42s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  1s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 33s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  24m  7s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   6m 29s |  |  trunk passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   5m 43s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 45s |  |  trunk passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 13s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m 23s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  24m 15s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 28s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  4s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 50s |  |  the patch passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   5m 50s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 48s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   5m 48s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  0s | 
[/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4100/2/artifact/out/results-checkstyle-hadoop-hdfs-project.txt)
 |  hadoop-hdfs-project: The patch generated 3 new + 0 unchanged - 0 fixed = 3 
total (was 0)  |
   | +1 :green_heart: |  mvnsite  |   2m 15s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  4s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m 42s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 46s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 20s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  unit  | 242m 26s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 44s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 382m  2s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4100/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4100 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 34bde18cbf15 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / e6e0f0165f2be565f6da8e720a5a3ef094b73036 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  

[jira] [Resolved] (HDFS-16517) In 2.10 the distance metric is wrong for non-DN machines

2022-03-23 Thread Owen O'Malley (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HDFS-16517.
--
Fix Version/s: 2.10.2
   Resolution: Fixed

> In 2.10 the distance metric is wrong for non-DN machines
> 
>
> Key: HDFS-16517
> URL: https://issues.apache.org/jira/browse/HDFS-16517
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.10.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In 2.10, the metric for the distance between the client and the DataNode is 
> wrong for machines that aren't running DataNodes (i.e. the 
> getWeightUsingNetworkLocation path). The code works correctly in 3.3+. 
> Currently:
>  
> ||Client||DataNode||getWeight||getWeightUsingNetworkLocation||
> |/rack1/node1|/rack1/node1|0|0|
> |/rack1/node1|/rack1/node2|2|2|
> |/rack1/node1|/rack2/node2|4|2|
> |/pod1/rack1/node1|/pod1/rack1/node2|2|2|
> |/pod1/rack1/node1|/pod1/rack2/node2|4|2|
> |/pod1/rack1/node1|/pod2/rack2/node2|6|4|
>  
> This bug will destroy data locality on clusters where the clients share racks 
> with DataNodes, but are running on machines that aren't running DataNodes, 
> such as striping federated HDFS clusters across racks.
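A standalone illustration of the expected metric (a sketch, not the Hadoop implementation): the distance between two nodes is the number of hops from each node up to their closest common ancestor, which reproduces the getWeight column above.

{code:java}
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of rack-aware distance between two topology paths. */
class TopologyDistance {

  /** e.g. distance("/rack1/node1", "/rack2/node2") == 4 */
  static int distance(String path1, String path2) {
    List<String> a = Arrays.asList(path1.substring(1).split("/"));
    List<String> b = Arrays.asList(path2.substring(1).split("/"));
    int common = 0;
    while (common < a.size() && common < b.size()
        && a.get(common).equals(b.get(common))) {
      common++; // count shared path components (pod, rack, ...)
    }
    // hops from each node up to the closest common ancestor
    return (a.size() - common) + (b.size() - common);
  }

  public static void main(String[] args) {
    System.out.println(distance("/rack1/node1", "/rack1/node1"));           // 0
    System.out.println(distance("/rack1/node1", "/rack1/node2"));           // 2
    System.out.println(distance("/rack1/node1", "/rack2/node2"));           // 4
    System.out.println(distance("/pod1/rack1/node1", "/pod2/rack2/node2")); // 6
  }
}
{code}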






[jira] [Work logged] (HDFS-16517) In 2.10 the distance metric is wrong for non-DN machines

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16517?focusedWorklogId=746881=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746881
 ]

ASF GitHub Bot logged work on HDFS-16517:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 21:54
Start Date: 23/Mar/22 21:54
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #4091:
URL: https://github.com/apache/hadoop/pull/4091#issuecomment-1076857473


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m  0s |  |  Docker mode activated.  |
   | -1 :x: |  patch  |   0m 19s |  |  
https://github.com/apache/hadoop/pull/4091 does not apply to branch-2.10. 
Rebase required? Wrong Branch? See 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help.  
|
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | GITHUB PR | https://github.com/apache/hadoop/pull/4091 |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4091/5/console |
   | versions | git=2.17.1 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




Issue Time Tracking
---

Worklog Id: (was: 746881)
Time Spent: 1.5h  (was: 1h 20m)

> In 2.10 the distance metric is wrong for non-DN machines
> 
>
> Key: HDFS-16517
> URL: https://issues.apache.org/jira/browse/HDFS-16517
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> In 2.10, the metric for distance between the client and the data node is 
> wrong for machines that aren't running data nodes (ie. 
> getWeightUsingNetworkLocation). The code works correctly in 3.3+. 
> Currently
>  
> ||Client||DataNode||getWeight||getWeightUsingNetworkLocation||
> |/rack1/node1|/rack1/node1|0|0|
> |/rack1/node1|/rack1/node2|2|2|
> |/rack1/node1|/rack2/node2|4|2|
> |/pod1/rack1/node1|/pod1/rack1/node2|2|2|
> |/pod1/rack1/node1|/pod1/rack2/node2|4|2|
> |/pod1/rack1/node1|/pod2/rack2/node2|6|4|
>  
> This bug will destroy data locality on clusters where the clients share racks 
> with DataNodes, but are running on machines that aren't running DataNodes, 
> such as striping federated HDFS clusters across racks.






[jira] [Work logged] (HDFS-16517) In 2.10 the distance metric is wrong for non-DN machines

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16517?focusedWorklogId=746880=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746880
 ]

ASF GitHub Bot logged work on HDFS-16517:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 21:52
Start Date: 23/Mar/22 21:52
Worklog Time Spent: 10m 
  Work Description: omalley merged pull request #4091:
URL: https://github.com/apache/hadoop/pull/4091


   




Issue Time Tracking
---

Worklog Id: (was: 746880)
Time Spent: 1h 20m  (was: 1h 10m)

> In 2.10 the distance metric is wrong for non-DN machines
> 
>
> Key: HDFS-16517
> URL: https://issues.apache.org/jira/browse/HDFS-16517
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In 2.10, the metric for distance between the client and the data node is 
> wrong for machines that aren't running data nodes (ie. 
> getWeightUsingNetworkLocation). The code works correctly in 3.3+. 
> Currently
>  
> ||Client||DataNode||getWeight||getWeightUsingNetworkLocation||
> |/rack1/node1|/rack1/node1|0|0|
> |/rack1/node1|/rack1/node2|2|2|
> |/rack1/node1|/rack2/node2|4|2|
> |/pod1/rack1/node1|/pod1/rack1/node2|2|2|
> |/pod1/rack1/node1|/pod1/rack2/node2|4|2|
> |/pod1/rack1/node1|/pod2/rack2/node2|6|4|
>  
> This bug will destroy data locality on clusters where the clients share racks 
> with DataNodes, but are running on machines that aren't running DataNodes, 
> such as striping federated HDFS clusters across racks.






[jira] [Work logged] (HDFS-16517) In 2.10 the distance metric is wrong for non-DN machines

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16517?focusedWorklogId=746879=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746879
 ]

ASF GitHub Bot logged work on HDFS-16517:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 21:47
Start Date: 23/Mar/22 21:47
Worklog Time Spent: 10m 
  Work Description: omalley commented on pull request #4091:
URL: https://github.com/apache/hadoop/pull/4091#issuecomment-1076852734


   When I created the PR, I hadn't found the upstream Jira, which is 
https://issues.apache.org/jira/browse/HADOOP-16161.




Issue Time Tracking
---

Worklog Id: (was: 746879)
Time Spent: 1h 10m  (was: 1h)

> In 2.10 the distance metric is wrong for non-DN machines
> 
>
> Key: HDFS-16517
> URL: https://issues.apache.org/jira/browse/HDFS-16517
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In 2.10, the metric for distance between the client and the data node is 
> wrong for machines that aren't running data nodes (ie. 
> getWeightUsingNetworkLocation). The code works correctly in 3.3+. 
> Currently
>  
> ||Client||DataNode||getWeight||getWeightUsingNetworkLocation||
> |/rack1/node1|/rack1/node1|0|0|
> |/rack1/node1|/rack1/node2|2|2|
> |/rack1/node1|/rack2/node2|4|2|
> |/pod1/rack1/node1|/pod1/rack1/node2|2|2|
> |/pod1/rack1/node1|/pod1/rack2/node2|4|2|
> |/pod1/rack1/node1|/pod2/rack2/node2|6|4|
>  
> This bug will destroy data locality on clusters where the clients share racks 
> with DataNodes, but are running on machines that aren't running DataNodes, 
> such as striping federated HDFS clusters across racks.






[jira] [Work logged] (HDFS-16518) Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?focusedWorklogId=746838=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746838
 ]

ASF GitHub Bot logged work on HDFS-16518:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 19:49
Start Date: 23/Mar/22 19:49
Worklog Time Spent: 10m 
  Work Description: li-leyang commented on a change in pull request #4100:
URL: https://github.com/apache/hadoop/pull/4100#discussion_r833665038



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/KeyProviderCache.java
##
@@ -85,6 +90,26 @@ public KeyProvider call() throws Exception {
 }
   }
 
+  public static final int SHUTDOWN_HOOK_PRIORITY = 
FileSystem.SHUTDOWN_HOOK_PRIORITY - 1;
+
+  private class KeyProviderCacheFinalizer implements Runnable {
+@Override
+public synchronized void run() {
+  invalidateCache();
+}
+  }
+
+  /**
+   * Invalidate cache and auto close KeyProviders in the cache
+   */
+  @VisibleForTesting
+  synchronized void invalidateCache() {
+LOG.debug("Invalidating all cached KeyProviders in ShutdownHookManager.");

Review comment:
   fixed






Issue Time Tracking
---

Worklog Id: (was: 746838)
Time Spent: 50m  (was: 40m)

> Cached KeyProvider in KeyProviderCache should be closed with 
> ShutdownHookManager
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We need to make sure the underlying KeyProviders used by multiple DFSClient 
> instances are closed in one shot during JVM shutdown. Within the shutdown hook, 
> we invalidate the cache and make sure they are all closed. The cache has a 
> removal listener which is called when a cache entry is invalidated. 
> {code:java}
> Class KeyProviderCache
> ...
>  public KeyProviderCache(long expiryMs) {
>   cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener<URI, KeyProvider>() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification<URI, KeyProvider> notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); 
> }{code}
>  
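A minimal sketch of the registration side, assuming Hadoop's {{org.apache.hadoop.util.ShutdownHookManager}} (the surrounding names mirror the patch excerpt above; treat the exact wiring as an assumption):

{code:java}
import org.apache.hadoop.util.ShutdownHookManager;

// Sketch: register the finalizer once so cached KeyProviders are closed on JVM
// shutdown. Priority is FileSystem.SHUTDOWN_HOOK_PRIORITY - 1, so this hook
// runs after the FileSystem cache hook (higher priority runs first).
ShutdownHookManager.get().addShutdownHook(
    new KeyProviderCacheFinalizer(), SHUTDOWN_HOOK_PRIORITY);
{code}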






[jira] [Work logged] (HDFS-16518) Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?focusedWorklogId=746836=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746836
 ]

ASF GitHub Bot logged work on HDFS-16518:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 19:48
Start Date: 23/Mar/22 19:48
Worklog Time Spent: 10m 
  Work Description: li-leyang commented on a change in pull request #4100:
URL: https://github.com/apache/hadoop/pull/4100#discussion_r833664651



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/KeyProviderCache.java
##
@@ -85,6 +90,26 @@ public KeyProvider call() throws Exception {
 }
   }
 
+  public static final int SHUTDOWN_HOOK_PRIORITY = 
FileSystem.SHUTDOWN_HOOK_PRIORITY - 1;
+
+  private class KeyProviderCacheFinalizer implements Runnable {
+@Override
+public synchronized void run() {
+  invalidateCache();
+}
+  }
+
+  /**
+   * Invalidate cache and auto close KeyProviders in the cache
+   */
+  @VisibleForTesting
+  synchronized void invalidateCache() {
+LOG.debug("Invalidating all cached KeyProviders in ShutdownHookManager.");
+if (cache != null) {
+  cache.invalidateAll();

Review comment:
   Added






Issue Time Tracking
---

Worklog Id: (was: 746836)
Time Spent: 40m  (was: 0.5h)

> Cached KeyProvider in KeyProviderCache should be closed with 
> ShutdownHookManager
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We need to make sure the underlying KeyProviders used by multiple DFSClient 
> instances are closed in one shot during JVM shutdown. Within the shutdown hook, 
> we invalidate the cache and make sure they are all closed. The cache has a 
> removal listener which is called when a cache entry is invalidated. 
> {code:java}
> Class KeyProviderCache
> ...
>  public KeyProviderCache(long expiryMs) {
>   cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener<URI, KeyProvider>() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification<URI, KeyProvider> notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); 
> }{code}
>  






[jira] [Work logged] (HDFS-16511) Change some frequent method lock type in ReplicaMap.

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16511?focusedWorklogId=746835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746835
 ]

ASF GitHub Bot logged work on HDFS-16511:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 19:46
Start Date: 23/Mar/22 19:46
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #4085:
URL: https://github.com/apache/hadoop/pull/4085#issuecomment-1076753327


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 49s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 29s |  |  trunk passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   1m 24s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 28s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 14s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 30s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 20s |  |  the patch passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 13s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 51s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  the patch passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 19s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 33s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 233m  4s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4085/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 45s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 333m 56s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4085/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4085 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 35866c5cf66e 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d0b8d1ab852fa228a20520a132868f1aeaa75b79 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4085/3/testReport/ |
   | Max. process+thread count | 3286 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Work logged] (HDFS-16518) Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?focusedWorklogId=746830=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746830
 ]

ASF GitHub Bot logged work on HDFS-16518:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 19:41
Start Date: 23/Mar/22 19:41
Worklog Time Spent: 10m 
  Work Description: ibuenros commented on a change in pull request #4100:
URL: https://github.com/apache/hadoop/pull/4100#discussion_r833658315



##
File path: 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/KeyProviderCache.java
##
@@ -85,6 +90,26 @@ public KeyProvider call() throws Exception {
 }
   }
 
+  public static final int SHUTDOWN_HOOK_PRIORITY = 
FileSystem.SHUTDOWN_HOOK_PRIORITY - 1;
+
+  private class KeyProviderCacheFinalizer implements Runnable {
+@Override
+public synchronized void run() {
+  invalidateCache();
+}
+  }
+
+  /**
+   * Invalidate cache and auto close KeyProviders in the cache
+   */
+  @VisibleForTesting
+  synchronized void invalidateCache() {
+LOG.debug("Invalidating all cached KeyProviders in ShutdownHookManager.");

Review comment:
   This log is technically not correct in that we don't know the call is 
coming from ShutdownHookManager. Maybe just remove the last two words from the 
log.

##
File path: 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/KeyProviderCache.java
##
@@ -85,6 +90,26 @@ public KeyProvider call() throws Exception {
 }
   }
 
+  public static final int SHUTDOWN_HOOK_PRIORITY = 
FileSystem.SHUTDOWN_HOOK_PRIORITY - 1;
+
+  private class KeyProviderCacheFinalizer implements Runnable {
+@Override
+public synchronized void run() {
+  invalidateCache();
+}
+  }
+
+  /**
+   * Invalidate cache and auto close KeyProviders in the cache
+   */
+  @VisibleForTesting
+  synchronized void invalidateCache() {
+LOG.debug("Invalidating all cached KeyProviders in ShutdownHookManager.");
+if (cache != null) {
+  cache.invalidateAll();

Review comment:
   Maybe add a comment that this will close the providers due to the cache 
hook?
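
   Applied together, the two suggestions above would leave the method looking 
roughly like this (a sketch of the proposed revision, not the committed patch):

{code:java}
  /**
   * Invalidate cache and auto close KeyProviders in the cache.
   */
  @VisibleForTesting
  synchronized void invalidateCache() {
    // Suggestion 1: don't claim the caller is ShutdownHookManager.
    LOG.debug("Invalidating all cached KeyProviders.");
    if (cache != null) {
      // Suggestion 2: invalidateAll() fires the cache's removal listener,
      // which closes each cached KeyProvider.
      cache.invalidateAll();
    }
  }
{code}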




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 746830)
Time Spent: 0.5h  (was: 20m)

> Cached KeyProvider in KeyProviderCache should be closed with 
> ShutdownHookManager
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We need to make sure the underlying KeyProvider used by multiple DFSClient 
> instances is closed at one shot during jvm shutdown. Within the shutdownhook, 
> we invalidate the cache and make sure they are all closed. The  cache has a 
> removeListener hook which is called when cache entry is invalidated. 
> {code:java}
> Class KeyProviderCache
> ...
>  public KeyProviderCache(long expiryMs) {
>   cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); 
> }{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager

2022-03-23 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16518:

Description: 
We need to make sure the underlying KeyProvider used by multiple DFSClient 
instances is closed in one shot during JVM shutdown. Within the shutdown hook, 
we invalidate the cache and make sure the providers are all closed. The cache 
has a removalListener hook which is called when a cache entry is invalidated. 
{code:java}
Class KeyProviderCache

...
 public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener<URI, KeyProvider>() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification<URI, KeyProvider> notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); 
}{code}
 

  was:
The cache has ttl and can close KeyProvider when cache entry is expired but 
when DFSClient is closed, we also need to make sure the underlying KeyProvider 
used by DFSClient is closed as well. The  cache has a removeListener hook which 
is called when cache entry is removed. An alternative approach would be add 
shutdownhook at jvm shutdown and close all KeyProviders in the cache.
{code:java}
Class KeyProviderCache

...
 public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); 
}{code}
 


> Cached KeyProvider in KeyProviderCache should be closed with 
> ShutdownHookManager
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We need to make sure the underlying KeyProvider used by multiple DFSClient 
> instances is closed at one shot during jvm shutdown. Within the shutdownhook, 
> we invalidate the cache and make sure they are all closed. The  cache has a 
> removeListener hook which is called when cache entry is invalidated. 
> {code:java}
> Class KeyProviderCache
> ...
>  public KeyProviderCache(long expiryMs) {
>   cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); 
> }{code}
>  
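
As a self-contained illustration of the mechanism the description relies on: Guava's removal listener fires for explicit invalidation as well as expiry, so one invalidateAll() call (for example from a JVM shutdown hook) closes everything still cached. The class and keys below are made up for the demo; only the CacheBuilder/RemovalListener usage mirrors the real code.

{code:java}
import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalListener;
import com.google.common.cache.RemovalNotification;

public class RemovalListenerDemo {

  public static void main(String[] args) {
    Cache<String, AutoCloseable> cache = CacheBuilder.newBuilder()
        .expireAfterAccess(10, TimeUnit.MINUTES)
        // Called for expired entries and for explicit invalidation alike.
        .removalListener(new RemovalListener<String, AutoCloseable>() {
          @Override
          public void onRemoval(RemovalNotification<String, AutoCloseable> n) {
            try {
              if (n.getValue() != null) {
                n.getValue().close();
              }
            } catch (Exception e) {
              System.err.println("Error closing [" + n.getKey() + "]: " + e);
            }
          }
        })
        .build();

    cache.put("provider-1", () -> System.out.println("closed provider-1"));
    cache.put("provider-2", () -> System.out.println("closed provider-2"));

    // A single call, e.g. from a JVM shutdown hook, closes everything still cached.
    cache.invalidateAll();
  }
}
{code}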



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache should be closed with ShutdownHookManager

2022-03-23 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16518:

Summary: Cached KeyProvider in KeyProviderCache should be closed with 
ShutdownHookManager  (was: Cached KeyProvider in KeyProviderCache does not get 
closed when DFSClient is closed )

> Cached KeyProvider in KeyProviderCache should be closed with 
> ShutdownHookManager
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The cache has ttl and can close KeyProvider when cache entry is expired but 
> when DFSClient is closed, we also need to make sure the underlying 
> KeyProvider used by DFSClient is closed as well. The  cache has a 
> removeListener hook which is called when cache entry is removed. An 
> alternative approach would be add shutdownhook at jvm shutdown and close all 
> KeyProviders in the cache.
> {code:java}
> Class KeyProviderCache
> ...
>  public KeyProviderCache(long expiryMs) {
>   cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); 
> }{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?focusedWorklogId=746807=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746807
 ]

ASF GitHub Bot logged work on HDFS-16518:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 18:50
Start Date: 23/Mar/22 18:50
Worklog Time Spent: 10m 
  Work Description: ibuenros commented on pull request #4100:
URL: https://github.com/apache/hadoop/pull/4100#issuecomment-1076700669


   @li-leyang this change is invalidating the singleton cache every time a 
DFSClient is closed. I thought the intention was to use a shutdown hook to 
close key provider clients instead?
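
   To make the contrast concrete, the two behaviours under discussion look 
roughly like this; CacheHolder and the priority value are placeholders, not 
Hadoop APIs:

{code:java}
import org.apache.hadoop.util.ShutdownHookManager;

/** Illustrative only; CacheHolder stands in for the shared KeyProviderCache. */
public class CloseStrategies {

  interface CacheHolder {
    void invalidateCache();   // closes all cached KeyProviders
  }

  // (a) What the reviewer observes: every DFSClient#close() wipes the shared cache,
  //     even though other clients in the same JVM may still be using the providers.
  static void perClientClose(CacheHolder sharedCache) {
    sharedCache.invalidateCache();
  }

  // (b) What the reviewer expected: close the providers exactly once, at JVM shutdown.
  static void registerShutdownClose(CacheHolder sharedCache) {
    ShutdownHookManager.get().addShutdownHook(
        sharedCache::invalidateCache, /* priority, arbitrary here */ 10);
  }
}
{code}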


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 746807)
Time Spent: 20m  (was: 10m)

> Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is 
> closed 
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The cache has ttl and can close KeyProvider when cache entry is expired but 
> when DFSClient is closed, we also need to make sure the underlying 
> KeyProvider used by DFSClient is closed as well. The  cache has a 
> removeListener hook which is called when cache entry is removed. An 
> alternative approach would be add shutdownhook at jvm shutdown and close all 
> KeyProviders in the cache.
> {code:java}
> Class KeyProviderCache
> ...
>  public KeyProviderCache(long expiryMs) {
>   cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); 
> }{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16517) In 2.10 the distance metric is wrong for non-DN machines

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16517?focusedWorklogId=746805=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746805
 ]

ASF GitHub Bot logged work on HDFS-16517:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 18:47
Start Date: 23/Mar/22 18:47
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #4091:
URL: https://github.com/apache/hadoop/pull/4091#issuecomment-1076698479


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 44s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ branch-2.10 Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  15m 27s |  |  branch-2.10 passed  |
   | +1 :green_heart: |  compile  |  15m 36s |  |  branch-2.10 passed with JDK 
Azul Systems, Inc.-1.7.0_262-b10  |
   | +1 :green_heart: |  compile  |  11m 49s |  |  branch-2.10 passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 38s |  |  branch-2.10 passed  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  branch-2.10 passed  |
   | +1 :green_heart: |  javadoc  |   1m 21s |  |  branch-2.10 passed with JDK 
Azul Systems, Inc.-1.7.0_262-b10  |
   | +1 :green_heart: |  javadoc  |   1m  5s |  |  branch-2.10 passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07  |
   | -1 :x: |  spotbugs  |   2m 11s | 
[/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4091/4/artifact/out/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html)
 |  hadoop-common-project/hadoop-common in branch-2.10 has 2 extant spotbugs 
warnings.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 47s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  14m 24s |  |  the patch passed with JDK 
Azul Systems, Inc.-1.7.0_262-b10  |
   | +1 :green_heart: |  javac  |  14m 24s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  12m 15s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07  |
   | +1 :green_heart: |  javac  |  12m 15s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 22s |  |  the patch passed with JDK 
Azul Systems, Inc.-1.7.0_262-b10  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07  |
   | +1 :green_heart: |  spotbugs  |   2m 18s |  |  the patch passed  |
    _ Other Tests _ |
   | -1 :x: |  unit  |   9m 39s | 
[/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4091/4/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 51s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 100m  9s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.fs.sftp.TestSFTPFileSystem |
   |   | hadoop.io.compress.snappy.TestSnappyCompressorDecompressor |
   |   | hadoop.util.TestBasicDiskValidator |
   |   | hadoop.io.compress.TestCompressorDecompressor |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4091/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4091 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 4ecc0fa7fadb 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 
13:41:54 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-2.10 / c4db7feb4fe29a16eaaea907fafcf8f965bffb32 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/zulu-7-amd64:Azul Systems, 
Inc.-1.7.0_262-b10 /usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~18.04-b07 |
   

[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed

2022-03-23 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16518:

Description: 
The cache has a TTL and can close a KeyProvider when its cache entry expires, but 
when DFSClient is closed, we also need to make sure the underlying KeyProvider 
used by DFSClient is closed as well. The cache has a removalListener hook which 
is called when a cache entry is removed. An alternative approach would be to add 
a shutdown hook at JVM shutdown and close all KeyProviders in the cache.
{code:java}
Class KeyProviderCache

...
 public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); 
}{code}
 

  was:
The cache has ttl and can close KeyProvider when cache entry is expired but 
when DFSClient is closed, we also need to make sure the underlying KeyProvider 
used by DFSClient is closed as well. The  cache has a removeListener hook which 
is called when cache entry is removed. 
{code:java}
Class KeyProviderCache

...
 public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); 
}{code}
 


> Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is 
> closed 
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The cache has ttl and can close KeyProvider when cache entry is expired but 
> when DFSClient is closed, we also need to make sure the underlying 
> KeyProvider used by DFSClient is closed as well. The  cache has a 
> removeListener hook which is called when cache entry is removed. An 
> alternative approach would be add shutdownhook at jvm shutdown and close all 
> KeyProviders in the cache.
> {code:java}
> Class KeyProviderCache
> ...
>  public KeyProviderCache(long expiryMs) {
>   cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); 
> }{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed

2022-03-23 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16518:

Description: 
The cache has ttl and can close KeyProvider when cache entry is expired but 
when DFSClient is closed, we also need to make sure the underlying KeyProvider 
used by DFSClient is closed as well. The  cache has a removeListener hook which 
is called when cache entry is removed. 
{code:java}
Class KeyProviderCache

...
 public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); 
}{code}
 

  was:
The cache has ttl and can close KeyProvider when cache entry is expired but 
when DFSClient is closed, we also need to make sure the underlying KeyProvider 
used by DFSClient is closed as well. The  cache has a removeListener hook which 
is called when cache entry is removed. 
{code:java}
org.apache.hadoop.hdfs.KeyProviderCache  



Class KeyProviderCache

...
 public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); 
}{code}
 


> Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is 
> closed 
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The cache has ttl and can close KeyProvider when cache entry is expired but 
> when DFSClient is closed, we also need to make sure the underlying 
> KeyProvider used by DFSClient is closed as well. The  cache has a 
> removeListener hook which is called when cache entry is removed. 
> {code:java}
> Class KeyProviderCache
> ...
>  public KeyProviderCache(long expiryMs) {
>   cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); 
> }{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?focusedWorklogId=746794=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746794
 ]

ASF GitHub Bot logged work on HDFS-16518:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 18:34
Start Date: 23/Mar/22 18:34
Worklog Time Spent: 10m 
  Work Description: li-leyang opened a new pull request #4100:
URL: https://github.com/apache/hadoop/pull/4100


   
   
   ### Description of PR
   
   https://issues.apache.org/jira/browse/HDFS-16518
   
   ### How was this patch tested?
   
   
   ### For code changes:
   
   - [ ] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 746794)
Remaining Estimate: 0h
Time Spent: 10m

> Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is 
> closed 
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The cache has ttl and can close KeyProvider when cache entry is expired but 
> when DFSClient is closed, we also need to make sure the underlying 
> KeyProvider used by DFSClient is closed as well. The  cache has a 
> removeListener hook which is called when cache entry is removed. 
> {code:java}
> org.apache.hadoop.hdfs.KeyProviderCache  
> Class KeyProviderCache
> ...
>  public KeyProviderCache(long expiryMs) {
>   cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); 
> }{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16518:
--
Labels: pull-request-available  (was: )

> Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is 
> closed 
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The cache has ttl and can close KeyProvider when cache entry is expired but 
> when DFSClient is closed, we also need to make sure the underlying 
> KeyProvider used by DFSClient is closed as well. The  cache has a 
> removeListener hook which is called when cache entry is removed. 
> {code:java}
> org.apache.hadoop.hdfs.KeyProviderCache  
> Class KeyProviderCache
> ...
>  public KeyProviderCache(long expiryMs) {
>   cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); 
> }{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed

2022-03-23 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16518:

Description: 
The cache has ttl and can close KeyProvider when cache entry is expired but 
when DFSClient is closed, we also need to make sure the underlying KeyProvider 
used by DFSClient is closed as well. The  cache has a removeListener hook which 
is called when cache entry is removed. 
{code:java}
org.apache.hadoop.hdfs.KeyProviderCache  



Class KeyProviderCache

...
 public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); 
}{code}
 

  was:
The cache has ttl and can close KeyProvider when cache entry is expired but 
when DFSClient is closed, we also need to make sure the underlying KeyProvider 
used by DFSClient is closed as well. The  cache has a removeListener hook which 
is called when cache entry is removed. 
{code:java}
org.apache.hadoop.hdfs.KeyProviderCache


public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); 
}{code}
 


> Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is 
> closed 
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>
> The cache has ttl and can close KeyProvider when cache entry is expired but 
> when DFSClient is closed, we also need to make sure the underlying 
> KeyProvider used by DFSClient is closed as well. The  cache has a 
> removeListener hook which is called when cache entry is removed. 
> {code:java}
> org.apache.hadoop.hdfs.KeyProviderCache  
> Class KeyProviderCache
> ...
>  public KeyProviderCache(long expiryMs) {
>   cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); 
> }{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed

2022-03-23 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16518:

Description: 
The cache has ttl and can close KeyProvider when cache entry is expired but 
when DFSClient is closed, we also need to make sure the underlying KeyProvider 
used by DFSClient is closed as well. The  cache has a removeListener hook which 
is called when cache entry is removed. 
{code:java}
org.apache.hadoop.hdfs.KeyProviderCache


public KeyProviderCache(long expiryMs) {
  cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); 
}{code}
 

  was:
The cache has ttl and can close KeyProvider when cache entry is expired but 
when DFSClient is closed, we also need to make sure the underlying KeyProvider 
used by DFSClient is closed as well. The  cache has a removeListener hook which 
is called when cache entry is removed. 
{code:java}
cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); {code}
 


> Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is 
> closed 
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>
> The cache has ttl and can close KeyProvider when cache entry is expired but 
> when DFSClient is closed, we also need to make sure the underlying 
> KeyProvider used by DFSClient is closed as well. The  cache has a 
> removeListener hook which is called when cache entry is removed. 
> {code:java}
> org.apache.hadoop.hdfs.KeyProviderCache
> public KeyProviderCache(long expiryMs) {
>   cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); 
> }{code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed

2022-03-23 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16518:

Description: 
The cache has ttl and can close KeyProvider when cache entry is expired but 
when DFSClient is closed, we also need to make sure the underlying KeyProvider 
used by DFSClient is closed as well. The  cache has a removeListener hook which 
is called when cache entry is removed. 
{code:java}
cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); {code}
 

  was:
The cache has ttl and can close KeyProvider when cache entry is expired but 
when DFSClient is closed, we also need to make sure the underlying KeyProvider 
used by DFSClient is closed as well. The  cache has a removeListener hook which 
is called when cache entry is removed.
{code:java}
cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); {code}
 


> Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is 
> closed 
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>
> The cache has ttl and can close KeyProvider when cache entry is expired but 
> when DFSClient is closed, we also need to make sure the underlying 
> KeyProvider used by DFSClient is closed as well. The  cache has a 
> removeListener hook which is called when cache entry is removed. 
> {code:java}
> cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed

2022-03-23 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16518:

Description: 
The cache has ttl and can close KeyProvider when cache entry is expired but 
when DFSClient is closed, we also need to make sure the underlying KeyProvider 
used by DFSClient is closed as well. The  cache has a removeListener hook which 
is called when cache entry is removed.
{code:java}
cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); {code}
 

  was:
The cache has ttl and can close KeyProvider when cache entry is expired but 
when DFSClient is closed, we also need to make sure the KeyProvider is closed 
as well. The  cache has a removeListener hook which is called when cache entry 
is removed.
{code:java}
cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); {code}
 


> Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is 
> closed 
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>
> The cache has ttl and can close KeyProvider when cache entry is expired but 
> when DFSClient is closed, we also need to make sure the underlying 
> KeyProvider used by DFSClient is closed as well. The  cache has a 
> removeListener hook which is called when cache entry is removed.
> {code:java}
> cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed

2022-03-23 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16518:

Description: 
The cache has ttl and can close KeyProvider when cache entry is expired but 
when DFSClient is closed, we also need to make sure the KeyProvider is closed 
as well. The  cache has a removeListener hook which is called when cache entry 
is removed.
{code:java}
cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); {code}
 

  was:
The cache has ttl and can close KeyProvider when cache entry is expired but 
when DFSClient is closed, we also need to make sure the KeyProvider is closed 
properly. The  cache has a removeListener hook which is called when cache entry 
is removed.
{code:java}
cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); {code}
 


> Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is 
> closed 
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>
> The cache has ttl and can close KeyProvider when cache entry is expired but 
> when DFSClient is closed, we also need to make sure the KeyProvider is closed 
> as well. The  cache has a removeListener hook which is called when cache 
> entry is removed.
> {code:java}
> cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed

2022-03-23 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16518:

Description: 
The cache has ttl and can close KeyProvider when cache entry is expired but 
when DFSClient is closed, we also need to make sure the KeyProvider is closed 
properly. The  cache has a removeListener hook which is called when cache entry 
is removed.
{code:java}
cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); {code}
 

  was:
The cache has ttl and can close KeyProvider when cache entry is expired but we 
also want to trigger close cached KeyProvider when DFSClient is closed.
{code:java}
cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); {code}
 


> Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is 
> closed 
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>
> The cache has ttl and can close KeyProvider when cache entry is expired but 
> when DFSClient is closed, we also need to make sure the KeyProvider is closed 
> properly. The  cache has a removeListener hook which is called when cache 
> entry is removed.
> {code:java}
> cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed

2022-03-23 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16518:

Description: 
The cache has ttl and can close KeyProvider when cache entry is expired but we 
also want to trigger close cached KeyProvider when DFSClient is closed.
{code:java}
cache = CacheBuilder.newBuilder()
.expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
.removalListener(new RemovalListener() {
  @Override
  public void onRemoval(
  @Nonnull RemovalNotification notification) {
try {
  assert notification.getValue() != null;
  notification.getValue().close();
} catch (Throwable e) {
  LOG.error(
  "Error closing KeyProvider with uri ["
  + notification.getKey() + "]", e);
}
  }
})
.build(); {code}
 

  was:
The cache has ttl and can close KeyProvider when cache entry is expired but we 
also want to close underlying KeyProvider when DFSClient is closed. 

 

 


> Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is 
> closed 
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>
> The cache has ttl and can close KeyProvider when cache entry is expired but 
> we also want to trigger close cached KeyProvider when DFSClient is closed.
> {code:java}
> cache = CacheBuilder.newBuilder()
> .expireAfterAccess(expiryMs, TimeUnit.MILLISECONDS)
> .removalListener(new RemovalListener() {
>   @Override
>   public void onRemoval(
>   @Nonnull RemovalNotification notification) {
> try {
>   assert notification.getValue() != null;
>   notification.getValue().close();
> } catch (Throwable e) {
>   LOG.error(
>   "Error closing KeyProvider with uri ["
>   + notification.getKey() + "]", e);
> }
>   }
> })
> .build(); {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed

2022-03-23 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16518:

Description: 
The cache has ttl and can close KeyProvider when cache entry is expired but we 
also want to close underlying KeyProvider when DFSClient is closed. 

 

 

  was:
The cache has ttl and can close KeyProvider when cache entry is expired.

In KeyProviderCache, we should add ShutdownHookManager to clean up all 
keyprovider instances in the cache when jvm shuts down.


> Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is 
> closed 
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>
> The cache has ttl and can close KeyProvider when cache entry is expired but 
> we also want to close underlying KeyProvider when DFSClient is closed. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16518) KeyProviderCache does not get closed when DFSClient is closed

2022-03-23 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16518:

Summary: KeyProviderCache does not get closed when DFSClient is closed  
(was: KeyProviderCache does not get closed when DFSCLient shutdown)

> KeyProviderCache does not get closed when DFSClient is closed
> -
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>
> The cache has ttl and can close KeyProvider when cache entry is expired.
> In KeyProviderCache, we should add ShutdownHookManager to clean up all 
> keyprovider instances in the cache when jvm shuts down.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16518) Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is closed

2022-03-23 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16518:

Summary: Cached KeyProvider in KeyProviderCache does not get closed when 
DFSClient is closed   (was: KeyProviderCache does not get closed when DFSClient 
is closed)

> Cached KeyProvider in KeyProviderCache does not get closed when DFSClient is 
> closed 
> 
>
> Key: HDFS-16518
> URL: https://issues.apache.org/jira/browse/HDFS-16518
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Major
>
> The cache has ttl and can close KeyProvider when cache entry is expired.
> In KeyProviderCache, we should add ShutdownHookManager to clean up all 
> keyprovider instances in the cache when jvm shuts down.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16434) Add opname to read/write lock for remaining operations

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16434?focusedWorklogId=746633=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746633
 ]

ASF GitHub Bot logged work on HDFS-16434:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 14:49
Start Date: 23/Mar/22 14:49
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3915:
URL: https://github.com/apache/hadoop/pull/3915#issuecomment-1076462287


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 43s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  36m 53s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 35s |  |  trunk passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |   1m 31s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 10s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 37s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 11s |  |  trunk passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 36s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m 22s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |   1m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 22s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  2s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3915/4/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 400 unchanged 
- 0 fixed = 402 total (was 400)  |
   | +1 :green_heart: |  mvnsite  |   1m 29s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  the patch passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 42s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m 46s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 270m 43s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3915/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +0 :ok: |  asflicense  |   0m 35s |  |  ASF License check generated no 
output?  |
   |  |   | 382m 20s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestDFSInputStream |
   |   | hadoop.hdfs.server.balancer.TestBalancer |
   |   | hadoop.hdfs.TestHDFSFileSystemContract |
   |   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
   |   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
   |   | hadoop.hdfs.TestLargeBlock |
   |   | hadoop.hdfs.TestStoragePolicyPermissionSettings |
   |   | hadoop.hdfs.TestRollingUpgrade |
   |   | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy |
   |   | hadoop.hdfs.TestGetBlocks |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3915/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3915 |
   | Optional Tests | 

[jira] [Work logged] (HDFS-16511) Change some frequent method lock type in ReplicaMap.

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16511?focusedWorklogId=746599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746599
 ]

ASF GitHub Bot logged work on HDFS-16511:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 14:08
Start Date: 23/Mar/22 14:08
Worklog Time Spent: 10m 
  Work Description: MingXiangLi commented on a change in pull request #4085:
URL: https://github.com/apache/hadoop/pull/4085#discussion_r833312880



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
##
@@ -602,6 +605,54 @@ public void run() {}
 + "volumeMap.", 0, totalNumReplicas);
   }
 
+  @Test(timeout = 3)
+  public void testCurrentWriteAndDeleteBlock() throws Exception {
+// Feed FsDataset with block metadata.
+final int numBlocks = 1000;
+final int threadCount = 10;
+// Generate data blocks.
+ExecutorService pool = Executors.newFixedThreadPool(threadCount);
+List<Future> futureList = new ArrayList<>();
+for (int i = 0; i < threadCount; i++) {
+  Thread thread = new Thread() {
+@Override
+public void run() {
+  try {
+for (int i = 0; i < numBlocks; i++) {
+  String bpid = BLOCK_POOL_IDS[numBlocks % BLOCK_POOL_IDS.length];
+  ExtendedBlock eb = new ExtendedBlock(bpid, i);
+  ReplicaHandler replica = null;
+  try {
+replica = dataset.createRbw(StorageType.DEFAULT, null, eb,
+false);
+if (i % 2 > 0) {
+  dataset.invalidate(bpid, new Block[]{eb.getLocalBlock()});
+}
+  } finally {
+if (replica != null) {
+  replica.close();
+}
+  }
+}
+  } catch (Exception e) {
+e.printStackTrace();
+  }
+}
+  };
+  thread.setName("AddBlock" + i);
+  futureList.add(pool.submit(thread));
+}
+// Wait for data generation
+for (Future f : futureList) {
+  f.get();
+}
+int totalNumReplicas = 0;

Review comment:
   Like testRemoveTwoVolumes(), we write to the block pools at random, so at the end 
we count the total blocks across all block pools.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 746599)
Time Spent: 40m  (was: 0.5h)

> Change some frequent method lock type in ReplicaMap.
> 
>
> Key: HDFS-16511
> URL: https://issues.apache.org/jira/browse/HDFS-16511
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Mingxiang Li
>Assignee: Mingxiang Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In HDFS-16429 we made LightWeightResizableGSet thread safe, and in HDFS-15382 
> we split the lock into block-pool-grained locks. After these improvements, we 
> can change some methods to acquire the read lock instead of the write lock.
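
A minimal sketch of the idea, assuming the underlying map is already thread safe; the class and field names below are illustrative, not the actual ReplicaMap code.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: shows read-lock vs. write-lock usage, not ReplicaMap itself.
class ReplicaLookup {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<Long, String> replicas = new ConcurrentHashMap<>();

  // Frequent, read-only lookup: the shared read lock is sufficient
  // once the underlying map is thread safe.
  String get(long blockId) {
    lock.readLock().lock();
    try {
      return replicas.get(blockId);
    } finally {
      lock.readLock().unlock();
    }
  }

  // Mutations still take the exclusive write lock.
  void add(long blockId, String replica) {
    lock.writeLock().lock();
    try {
      replicas.put(blockId, replica);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}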



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16511) Change some frequent method lock type in ReplicaMap.

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16511?focusedWorklogId=746598=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746598
 ]

ASF GitHub Bot logged work on HDFS-16511:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 14:07
Start Date: 23/Mar/22 14:07
Worklog Time Spent: 10m 
  Work Description: MingXiangLi commented on a change in pull request #4085:
URL: https://github.com/apache/hadoop/pull/4085#discussion_r833311855



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
##
@@ -602,6 +605,54 @@ public void run() {}
 + "volumeMap.", 0, totalNumReplicas);
   }
 
+  @Test(timeout = 3)
+  public void testCurrentWriteAndDeleteBlock() throws Exception {
+// Feed FsDataset with block metadata.
+final int numBlocks = 1000;
+final int threadCount = 10;
+// Generate data blocks.
+ExecutorService pool = Executors.newFixedThreadPool(threadCount);
+List<Future> futureList = new ArrayList<>();
+for (int i = 0; i < threadCount; i++) {
+  Thread thread = new Thread() {
+@Override
+public void run() {
+  try {
+for (int i = 0; i < numBlocks; i++) {
+  String bpid = BLOCK_POOL_IDS[numBlocks % BLOCK_POOL_IDS.length];

Review comment:
   Yes, this means a random write to one of the block pools, like testRemoveTwoVolumes().




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 746598)
Time Spent: 0.5h  (was: 20m)

> Change some frequent method lock type in ReplicaMap.
> 
>
> Key: HDFS-16511
> URL: https://issues.apache.org/jira/browse/HDFS-16511
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Reporter: Mingxiang Li
>Assignee: Mingxiang Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In HDFS-16429 we made LightWeightResizableGSet thread safe, and in HDFS-15382 
> we split the lock into block-pool-grained locks. After these improvements, we 
> can change some methods to acquire the read lock instead of the write lock.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads

2022-03-23 Thread daimin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511224#comment-17511224
 ] 

daimin edited comment on HDFS-16422 at 3/23/22, 12:30 PM:
--

[~jingzhao] I tested this again, and my test steps are:
 # Setup a cluster with 11 datanodes, and write 4 EC RS-8-2 files: 1g, 2g, 4g, 
8g
 # Stop one datanode
 # Check md5sum of these files through HDFS FUSE, this is a simple way to 
create concurrent preads(indirect IO on FUSE)

Here is test result:
* md5sum check before datanode down:

 
{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote} 

* md5sum after datanode down, with native(ISA-L) decoder:

 

 
{quote}md5sum /mnt/fuse/*g
206288b264b92af42563a14a242aa629  /mnt/fuse/1g
bc86f9f549912d78c8b3d02ada5621a2  /mnt/fuse/2g
c201356b7437e6aac1b574ade08b6ccb  /mnt/fuse/4g
ef2e6f6b4b6ab96a24e5f734e93bacc3  /mnt/fuse/8g
{quote}

* md5sum after datanode down, with pure Java decoder:

 
{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote}

In conclusion: RSRawDecoder seems to be thread safe, NativeRSRawDecoder is not 
thread safe, the read/write lock seems unable to protect the native decodeImpl 
method.

And I also tested on md5sum check on same file with native(ISA-L) decoder, the 
result is different every time.
{quote}
for i in \{1..5};do md5sum /mnt/fuse/1g;done
2e68ea6738dccb4f248df81b5c55d464  /mnt/fuse/1g
54944120797266fc4e26bd465ae5e67a  /mnt/fuse/1g
ef4d099269fb117e357015cf424723a9  /mnt/fuse/1g
6a40dbca2636ae796b6380385ddfbc83  /mnt/fuse/1g
126fc40073dcebb67d413de95571c08b  /mnt/fuse/1g
{quote}

IMO, HADOOP-15499 did improve the performance of the decoder, but it broke 
the correctness of the decode method when invoked concurrently. We should bring 
synchronized back, and it's OK to keep the read/write lock too, as it protects 
the init/release methods. Thanks [~jingzhao] again.


was (Author: cndaimin):
[~jingzhao] I tested this again, and my test steps are:
 # Setup a cluster with 11 datanodes, and write 4 EC RS-8-2 files: 1g, 2g, 4g, 
8g
 # Stop one datanode
 # Check md5sum of these files through HDFS FUSE, this is a simple way to 
create concurrent preads(indirect IO on FUSE)

Here is test result:
* md5sum check before datanode down:

 
{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote} 

* md5sum after datanode down, with native(ISA-L) decoder:

 

 
{quote}md5sum /mnt/fuse/*g
206288b264b92af42563a14a242aa629  /mnt/fuse/1g
bc86f9f549912d78c8b3d02ada5621a2  /mnt/fuse/2g
c201356b7437e6aac1b574ade08b6ccb  /mnt/fuse/4g
ef2e6f6b4b6ab96a24e5f734e93bacc3  /mnt/fuse/8g
{quote}

* md5sum after datanode down, with pure Java decoder:

 
{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote}

In conclusion: RSRawDecoder seems to be thread safe, NativeRSRawDecoder is not 
thread safe, the read/write lock seems unable to protect the native decodeImpl 
method.

And I also tested on md5sum check on same file with native(ISA-L) decoder, the 
result is different every time.
{quote}
for i in \{1..5};do md5sum /mnt/fuse/1g;done
2e68ea6738dccb4f248df81b5c55d464  /mnt/fuse/1g
54944120797266fc4e26bd465ae5e67a  /mnt/fuse/1g
ef4d099269fb117e357015cf424723a9  /mnt/fuse/1g
6a40dbca2636ae796b6380385ddfbc83  /mnt/fuse/1g
126fc40073dcebb67d413de95571c08b  /mnt/fuse/1g
{quote}

IMO, HADOOP-15499 did improve the performance of decoder, however it breaked 
the correctness of decode method when invoked concurrently. We should take 
synchronized back, and I will submit a new PR later to do this work. Thanks 
[~jingzhao] again.

> Fix thread safety of EC decoding during concurrent preads
> -
>
> Key: HDFS-16422
> URL: https://issues.apache.org/jira/browse/HDFS-16422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient, ec, erasure-coding
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.3
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Reading data on an erasure-coded file with missing replicas(internal block of 
> block group) will cause 

[jira] [Comment Edited] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads

2022-03-23 Thread daimin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511224#comment-17511224
 ] 

daimin edited comment on HDFS-16422 at 3/23/22, 12:26 PM:
--

[~jingzhao] I tested this again, and my test steps are:
 # Setup a cluster with 11 datanodes, and write 4 EC RS-8-2 files: 1g, 2g, 4g, 
8g
 # Stop one datanode
 # Check md5sum of these files through HDFS FUSE, this is a simple way to 
create concurrent preads(indirect IO on FUSE)

Here is test result:
* md5sum check before datanode down:

 
{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote} 

* md5sum after datanode down, with native(ISA-L) decoder:

 

 
{quote}md5sum /mnt/fuse/*g
206288b264b92af42563a14a242aa629  /mnt/fuse/1g
bc86f9f549912d78c8b3d02ada5621a2  /mnt/fuse/2g
c201356b7437e6aac1b574ade08b6ccb  /mnt/fuse/4g
ef2e6f6b4b6ab96a24e5f734e93bacc3  /mnt/fuse/8g
{quote}

* md5sum after datanode down, with pure Java decoder:

 
{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote}

In conclusion: RSRawDecoder seems to be thread safe, NativeRSRawDecoder is not 
thread safe, the read/write lock seems unable to protect the native decodeImpl 
method.

And I also tested on md5sum check on same file with native(ISA-L) decoder, the 
result is different every time.
{quote}
for i in \{1..5};do md5sum /mnt/fuse/1g;done
2e68ea6738dccb4f248df81b5c55d464  /mnt/fuse/1g
54944120797266fc4e26bd465ae5e67a  /mnt/fuse/1g
ef4d099269fb117e357015cf424723a9  /mnt/fuse/1g
6a40dbca2636ae796b6380385ddfbc83  /mnt/fuse/1g
126fc40073dcebb67d413de95571c08b  /mnt/fuse/1g
{quote}

IMO, HADOOP-15499 did improve the performance of the decoder, but it broke 
the correctness of the decode method when invoked concurrently. We should bring 
synchronized back, and I will submit a new PR later to do this work. Thanks 
[~jingzhao] again.


was (Author: cndaimin):
[~jingzhao] I tested this again, and my test steps are:
 # Setup a cluster with 11 datanodes, and write 4 EC RS-8-2 files: 1g, 2g, 4g, 
8g
 # Stop one datanode
 # Check md5sum of these files through HDFS FUSE, this is a simple way to 
create concurrent preads(indirect IO on FUSE)

Here is test result:
 * md5sum check before datanode down:

 
{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote} * md5sum after datanode down, with native(ISA-L) decoder:

 

 
{quote}md5sum /mnt/fuse/*g
206288b264b92af42563a14a242aa629  /mnt/fuse/1g
bc86f9f549912d78c8b3d02ada5621a2  /mnt/fuse/2g
c201356b7437e6aac1b574ade08b6ccb  /mnt/fuse/4g
ef2e6f6b4b6ab96a24e5f734e93bacc3  /mnt/fuse/8g
{quote} * md5sum after datanode down, with pure Java decoder:

 
{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote}
In conclusion: RSRawDecoder seems to be thread safe, NativeRSRawDecoder is not 
thread safe, the read/write lock seems unable to protect the native decodeImpl 
method.

And I also tested on md5sum check on same file with native(ISA-L) decoder, the 
result is different every time.
{quote}for i in \{1..5};do md5sum /mnt/fuse/1g;done
2e68ea6738dccb4f248df81b5c55d464  /mnt/fuse/1g
54944120797266fc4e26bd465ae5e67a  /mnt/fuse/1g
ef4d099269fb117e357015cf424723a9  /mnt/fuse/1g
6a40dbca2636ae796b6380385ddfbc83  /mnt/fuse/1g
126fc40073dcebb67d413de95571c08b  /mnt/fuse/1g
{quote}
IMO, HADOOP-15499 did improve the performance of decoder, however it breaked 
the correctness of decode method when invoked concurrently. We should take 
synchronized back, and I will submit a new PR later to do this work. Thanks 
[~jingzhao] again.

> Fix thread safety of EC decoding during concurrent preads
> -
>
> Key: HDFS-16422
> URL: https://issues.apache.org/jira/browse/HDFS-16422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient, ec, erasure-coding
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.3
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Reading data on an erasure-coded file with missing replicas(internal block of 
> block group) will cause online reconstruction: read dataUnits part 

[jira] [Comment Edited] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads

2022-03-23 Thread daimin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511224#comment-17511224
 ] 

daimin edited comment on HDFS-16422 at 3/23/22, 12:25 PM:
--

[~jingzhao] I tested this again, and my test steps are:
 # Setup a cluster with 11 datanodes, and write 4 EC RS-8-2 files: 1g, 2g, 4g, 
8g
 # Stop one datanode
 # Check md5sum of these files through HDFS FUSE, this is a simple way to 
create concurrent preads(indirect IO on FUSE)

Here is test result:
 * md5sum check before datanode down:

 
{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote} * md5sum after datanode down, with native(ISA-L) decoder:

 

 
{quote}md5sum /mnt/fuse/*g
206288b264b92af42563a14a242aa629  /mnt/fuse/1g
bc86f9f549912d78c8b3d02ada5621a2  /mnt/fuse/2g
c201356b7437e6aac1b574ade08b6ccb  /mnt/fuse/4g
ef2e6f6b4b6ab96a24e5f734e93bacc3  /mnt/fuse/8g
{quote} * md5sum after datanode down, with pure Java decoder:

 
{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote}
In conclusion: RSRawDecoder seems to be thread safe, NativeRSRawDecoder is not 
thread safe, the read/write lock seems unable to protect the native decodeImpl 
method.

And I also tested on md5sum check on same file with native(ISA-L) decoder, the 
result is different every time.
{quote}for i in \{1..5};do md5sum /mnt/fuse/1g;done
2e68ea6738dccb4f248df81b5c55d464  /mnt/fuse/1g
54944120797266fc4e26bd465ae5e67a  /mnt/fuse/1g
ef4d099269fb117e357015cf424723a9  /mnt/fuse/1g
6a40dbca2636ae796b6380385ddfbc83  /mnt/fuse/1g
126fc40073dcebb67d413de95571c08b  /mnt/fuse/1g
{quote}
IMO, HADOOP-15499 did improve the performance of the decoder, but it broke 
the correctness of the decode method when invoked concurrently. We should bring 
synchronized back, and I will submit a new PR later to do this work. Thanks 
[~jingzhao] again.


was (Author: cndaimin):
[~jingzhao] I tested this again, and my test steps are:
 # Setup a cluster with 11 datanodes, and write 4 EC RS-8-2 files: 1g, 2g, 4g, 
8g
 # Stop one datanode
 # Check md5sum of these files through HDFS FUSE, this is a simple way to 
create concurrent preads(indirect IO on FUSE)

Here is test result:
 * md5sum check before datanode down:

{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote} * md5sum after datanode down, with native(ISA-L) decoder:

{quote}md5sum /mnt/fuse/*g
206288b264b92af42563a14a242aa629  /mnt/fuse/1g
bc86f9f549912d78c8b3d02ada5621a2  /mnt/fuse/2g
c201356b7437e6aac1b574ade08b6ccb  /mnt/fuse/4g
ef2e6f6b4b6ab96a24e5f734e93bacc3  /mnt/fuse/8g
{quote} * md5sum after datanode down, with pure Java decoder:

{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote}
In conclusion: RSRawDecoder seems to be thread safe, NativeRSRawDecoder is not 
thread safe, the read/write lock seems unable to protect the native decodeImpl 
method.

And I also tested on md5sum check on same file with native(ISA-L) decoder, the 
result is different every time.
{quote}for i in \{1..5};do md5sum /mnt/fuse/1g;done
2e68ea6738dccb4f248df81b5c55d464  /mnt/fuse/1g
54944120797266fc4e26bd465ae5e67a  /mnt/fuse/1g
ef4d099269fb117e357015cf424723a9  /mnt/fuse/1g
6a40dbca2636ae796b6380385ddfbc83  /mnt/fuse/1g
126fc40073dcebb67d413de95571c08b  /mnt/fuse/1g
{quote}
IMO, HADOOP-15499 did improve the performance of decoder, however it breaked 
the correctness of decode method when invoked concurrently. We should take 
synchronized back, and I will submit a new PR later to do this work. Thanks 
[~jingzhao] again.

> Fix thread safety of EC decoding during concurrent preads
> -
>
> Key: HDFS-16422
> URL: https://issues.apache.org/jira/browse/HDFS-16422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient, ec, erasure-coding
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.3
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Reading data on an erasure-coded file with missing replicas(internal block of 
> block group) will cause online reconstruction: read dataUnits part of data 
> 

[jira] [Commented] (HDFS-16422) Fix thread safety of EC decoding during concurrent preads

2022-03-23 Thread daimin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511224#comment-17511224
 ] 

daimin commented on HDFS-16422:
---

[~jingzhao] I tested this again, and my test steps are:
 # Setup a cluster with 11 datanodes, and write 4 EC RS-8-2 files: 1g, 2g, 4g, 
8g
 # Stop one datanode
 # Check md5sum of these files through HDFS FUSE, this is a simple way to 
create concurrent preads(indirect IO on FUSE)

Here is test result:
 * md5sum check before datanode down:

{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote} * md5sum after datanode down, with native(ISA-L) decoder:

{quote}md5sum /mnt/fuse/*g
206288b264b92af42563a14a242aa629  /mnt/fuse/1g
bc86f9f549912d78c8b3d02ada5621a2  /mnt/fuse/2g
c201356b7437e6aac1b574ade08b6ccb  /mnt/fuse/4g
ef2e6f6b4b6ab96a24e5f734e93bacc3  /mnt/fuse/8g
{quote} * md5sum after datanode down, with pure Java decoder:

{quote}md5sum /mnt/fuse/*g
5e6c32c0b572e2ff24fb14f93c4cc45b  /mnt/fuse/1g
782173623681c129558c09e89251f46d  /mnt/fuse/2g
e107f9a83a383b98aa23fdd3171b589c  /mnt/fuse/4g
adb81da2c34161f249439597c515db1d  /mnt/fuse/8g
{quote}
In conclusion: RSRawDecoder seems to be thread safe, NativeRSRawDecoder is not 
thread safe, the read/write lock seems unable to protect the native decodeImpl 
method.

And I also tested on md5sum check on same file with native(ISA-L) decoder, the 
result is different every time.
{quote}for i in \{1..5};do md5sum /mnt/fuse/1g;done
2e68ea6738dccb4f248df81b5c55d464  /mnt/fuse/1g
54944120797266fc4e26bd465ae5e67a  /mnt/fuse/1g
ef4d099269fb117e357015cf424723a9  /mnt/fuse/1g
6a40dbca2636ae796b6380385ddfbc83  /mnt/fuse/1g
126fc40073dcebb67d413de95571c08b  /mnt/fuse/1g
{quote}
IMO, HADOOP-15499 did improve the performance of the decoder, but it broke 
the correctness of the decode method when invoked concurrently. We should bring 
synchronized back, and I will submit a new PR later to do this work. Thanks 
[~jingzhao] again.

> Fix thread safety of EC decoding during concurrent preads
> -
>
> Key: HDFS-16422
> URL: https://issues.apache.org/jira/browse/HDFS-16422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient, ec, erasure-coding
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.3
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Reading data on an erasure-coded file with missing replicas (internal blocks of 
> a block group) will cause online reconstruction: the dataUnits part of the data 
> is read and decoded into the target missing data. Each DFSStripedInputStream 
> object has a RawErasureDecoder object, and when we do preads concurrently, 
> RawErasureDecoder.decode will be invoked concurrently too. 
> RawErasureDecoder.decode is not thread safe; as a result we occasionally get 
> wrong data from pread.
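
One possible way to serialize the decode path, sketched under the assumption that only decode() needs mutual exclusion; this is illustrative and not the actual RawErasureDecoder implementation.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;

// Illustrative only: per-instance mutual exclusion around decode(), one way to
// make a non-thread-safe decoder safe for concurrent preads.
class SynchronizedDecoder {
  private final Object decodeLock = new Object();

  void decode(ByteBuffer[] inputs, int[] erasedIndexes, ByteBuffer[] outputs)
      throws IOException {
    synchronized (decodeLock) {
      doDecodeImpl(inputs, erasedIndexes, outputs);
    }
  }

  // Placeholder for the underlying (e.g. native ISA-L) decode call.
  private void doDecodeImpl(ByteBuffer[] inputs, int[] erasedIndexes,
      ByteBuffer[] outputs) throws IOException {
    // actual decoding would happen here
  }
}
{code}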



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15273) CacheReplicationMonitor hold lock for long time and lead to NN out of service

2022-03-23 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511194#comment-17511194
 ] 

Hadoop QA commented on HDFS-15273:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 
17s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
56s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
30s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
22s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 0s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
28s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
23m 21s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
29s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 29m 
18s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  3m 
27s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
23s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
19s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
19s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
17s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
17s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green}{color} | {color:green} The patch has no ill-formed 
XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
21m 33s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | 

[jira] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely

2022-03-23 Thread yanbin.zhang (Jira)


[ https://issues.apache.org/jira/browse/HDFS-16064 ]


yanbin.zhang deleted comment on HDFS-16064:
-

was (Author: it_singer):
I think your root cause may not be here, we never seem to have this problem 
during our downline process.

> HDFS-721 causes DataNode decommissioning to get stuck indefinitely
> --
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Priority: Major
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to
> While testing datanode decommissioning using "dfs.exclude.hosts", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails, this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of 
> DataNode decommissioning
> A few potential solutions:
>  * Address the root cause of the problem which is an inconsistent state 
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
>  * Detect when datanode decommissioning is stuck due to lack of available 
> datanodes for satisfying the minimum replication factor, then recover by 
> re-enabling the datanodes being decommissioned
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely

2022-03-23 Thread yanbin.zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511159#comment-17511159
 ] 

yanbin.zhang commented on HDFS-16064:
-

I think your root cause may not be here; we never seem to hit this problem 
during our decommissioning process.

> HDFS-721 causes DataNode decommissioning to get stuck indefinitely
> --
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Priority: Major
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to
> While testing datanode decommissioning using "dfs.exclude.hosts", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails, this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of 
> DataNode decommissioning
> A few potential solutions:
>  * Address the root cause of the problem which is an inconsistent state 
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
>  * Detect when datanode decommissioning is stuck due to lack of available 
> datanodes for satisfying the minimum replication factor, then recover by 
> re-enabling the datanodes being decommissioned
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely

2022-03-23 Thread yanbin.zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17511158#comment-17511158
 ] 

yanbin.zhang commented on HDFS-16064:
-

I think your root cause may not be here; we never seem to hit this problem 
during our decommissioning process.

> HDFS-721 causes DataNode decommissioning to get stuck indefinitely
> --
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Priority: Major
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to
> While testing datanode decommissioning using "dfs.exclude.hosts", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails, this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of 
> DataNode decommissioning
> A few potential solutions:
>  * Address the root cause of the problem which is an inconsistent state 
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
>  * Detect when datanode decommissioning is stuck due to lack of available 
> datanodes for satisfying the minimum replication factor, then recover by 
> re-enabling the datanodes being decommissioned
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16500) Make asynchronous block deletion lock and unlock duration threshold configurable

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16500?focusedWorklogId=746413=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746413
 ]

ASF GitHub Bot logged work on HDFS-16500:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 08:49
Start Date: 23/Mar/22 08:49
Worklog Time Spent: 10m 
  Work Description: smarthanwang commented on pull request #4061:
URL: https://github.com/apache/hadoop/pull/4061#issuecomment-1076101468


   Hi @Hexiaoqiao, do you have any suggestion?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 746413)
Time Spent: 1h 40m  (was: 1.5h)

> Make asynchronous block deletion lock and unlock duration threshold 
> configurable 
> -
>
> Key: HDFS-16500
> URL: https://issues.apache.org/jira/browse/HDFS-16500
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Reporter: Chengwei Wang
>Assignee: Chengwei Wang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> I have backported the nice feature HDFS-16043 to our internal branch; it works 
> well in our testing cluster.
> I think it's better to make the fields *_deleteBlockLockTimeMs_* and 
> *_deleteBlockUnlockIntervalTimeMs_* configurable, so that we can control the 
> lock and unlock durations.
> {code:java}
> private final long deleteBlockLockTimeMs = 500;
> private final long deleteBlockUnlockIntervalTimeMs = 100;{code}
> We should also set the default values smaller to avoid blocking other requests 
> for a long time when deleting some large directories.
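
A minimal sketch of how the two fields could be read from Configuration instead of being hard-coded, keeping the current constants as defaults; the property names below are assumptions, not the final keys.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative only: the key names are made-up placeholders.
class AsyncBlockDeletionConfig {
  static final String LOCK_TIME_MS_KEY =
      "dfs.namenode.async.block.deletion.lock.time.ms";        // hypothetical
  static final String UNLOCK_INTERVAL_MS_KEY =
      "dfs.namenode.async.block.deletion.unlock.interval.ms";  // hypothetical

  final long deleteBlockLockTimeMs;
  final long deleteBlockUnlockIntervalTimeMs;

  AsyncBlockDeletionConfig(Configuration conf) {
    // Fall back to the previously hard-coded values.
    deleteBlockLockTimeMs = conf.getLong(LOCK_TIME_MS_KEY, 500);
    deleteBlockUnlockIntervalTimeMs = conf.getLong(UNLOCK_INTERVAL_MS_KEY, 100);
  }
}
{code}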



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16514) Reduce the failover sleep time if multiple namenode are configured

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16514?focusedWorklogId=746391=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746391
 ]

ASF GitHub Bot logged work on HDFS-16514:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 08:13
Start Date: 23/Mar/22 08:13
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #4088:
URL: https://github.com/apache/hadoop/pull/4088#issuecomment-1076063628


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 39s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 43s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  22m 56s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  22m 50s |  |  trunk passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  20m  0s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   3m 38s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 56s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 12s |  |  trunk passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 43s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   5m 11s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m  1s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 28s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 53s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  23m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  23m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 42s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |  20m 42s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   3m 37s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4088/3/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 1 new + 45 unchanged - 0 fixed = 46 total (was 
45)  |
   | +1 :green_heart: |  mvnsite  |   2m 49s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 54s |  |  the patch passed with JDK 
Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   5m 43s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m 54s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 10s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   2m 48s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 58s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 233m 16s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4088/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4088 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux e7152761be30 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 
23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 140dad88d65a6de47eda8e784d35f545922e7cce |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14+9-Ubuntu-0ubuntu2.20.04 

[jira] [Work logged] (HDFS-16434) Add opname to read/write lock for remaining operations

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16434?focusedWorklogId=746386=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746386
 ]

ASF GitHub Bot logged work on HDFS-16434:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 07:43
Start Date: 23/Mar/22 07:43
Worklog Time Spent: 10m 
  Work Description: tomscut commented on a change in pull request #3915:
URL: https://github.com/apache/hadoop/pull/3915#discussion_r832946641



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Namesystem.java
##
@@ -63,4 +63,16 @@
* directories. Create them if not.
*/
   void checkAndProvisionSnapshotTrashRoots();
+
+  /**
+   * Release read lock with operation name.
+   * @param opName
+   */
+  void readUnlock(String opName);
+
+  /**
+   * Release write lock with operation name.
+   * @param opName
+   */
+  void writeUnlock(String opName);

Review comment:
   Thanks @tasanuma for your review and comment. I agree with you.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 746386)
Time Spent: 1h 20m  (was: 1h 10m)

> Add opname to read/write lock for remaining operations
> --
>
> Key: HDFS-16434
> URL: https://issues.apache.org/jira/browse/HDFS-16434
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In [HDFS-10872|https://issues.apache.org/jira/browse/HDFS-10872] we added an 
> opname to the read and write locks. However, many operations are still not 
> covered. When analyzing operations that hold a lock for a long time, we can 
> only identify the specific method through a stack trace. I suggest covering 
> the remaining operations to facilitate later performance optimization.
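
To make the idea concrete, here is a minimal, self-contained sketch of an op-name-aware unlock. Only the readUnlock(String)/writeUnlock(String) signatures follow the patch discussed above; the class name, threshold, and stdout logging are illustrative and are not the actual FSNamesystem lock code.

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Minimal sketch, assuming readUnlock(String)/writeUnlock(String) as in the patch.
 * Everything else here is illustrative, not the actual FSNamesystem lock code.
 */
public class OpNamedLock {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final ThreadLocal<Long> acquiredAtNanos = new ThreadLocal<>();
  private static final long REPORT_THRESHOLD_MS = 5000; // illustrative threshold

  public void readLock() {
    lock.readLock().lock();
    acquiredAtNanos.set(System.nanoTime());
  }

  public void readUnlock(String opName) {
    long heldMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - acquiredAtNanos.get());
    lock.readLock().unlock();
    if (heldMs > REPORT_THRESHOLD_MS) {
      // The op name identifies the slow operation directly in the log,
      // without having to capture a stack trace.
      System.out.printf("read lock held for %d ms by %s%n", heldMs, opName);
    }
  }

  public void writeLock() {
    lock.writeLock().lock();
    acquiredAtNanos.set(System.nanoTime());
  }

  public void writeUnlock(String opName) {
    long heldMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - acquiredAtNanos.get());
    lock.writeLock().unlock();
    if (heldMs > REPORT_THRESHOLD_MS) {
      System.out.printf("write lock held for %d ms by %s%n", heldMs, opName);
    }
  }
}
{code}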



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16434) Add opname to read/write lock for remaining operations

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16434?focusedWorklogId=746384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746384
 ]

ASF GitHub Bot logged work on HDFS-16434:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 07:24
Start Date: 23/Mar/22 07:24
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on a change in pull request #3915:
URL: https://github.com/apache/hadoop/pull/3915#discussion_r832926344



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/Namesystem.java
##
@@ -63,4 +63,16 @@
* directories. Create them if not.
*/
   void checkAndProvisionSnapshotTrashRoots();
+
+  /**
+   * Release read lock with operation name.
+   * @param opName
+   */
+  void readUnlock(String opName);
+
+  /**
+   * Release write lock with operation name.
+   * @param opName
+   */
+  void writeUnlock(String opName);

Review comment:
   How about moving the new methods to RwLock since it has all lock-related 
methods? Although RwLock is not `@InterfaceAudience.Private`, I think we can 
add new methods there if the target version is 3.4.0.
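
For illustration, a rough sketch of the reviewer's suggestion, with the op-name overloads sitting next to the existing lock methods in one interface. The existing method list below is abbreviated and assumed, not a verbatim copy of org.apache.hadoop.hdfs.util.RwLock.

{code:java}
// Rough sketch only: the existing method list is abbreviated and assumed,
// not a verbatim copy of org.apache.hadoop.hdfs.util.RwLock.
public interface RwLockSketch {
  void readLock();
  void readUnlock();
  void readUnlock(String opName);   // new: release read lock with operation name
  void writeLock();
  void writeUnlock();
  void writeUnlock(String opName);  // new: release write lock with operation name
  boolean hasWriteLock();
}
{code}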




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 746384)
Time Spent: 1h 10m  (was: 1h)

> Add opname to read/write lock for remaining operations
> --
>
> Key: HDFS-16434
> URL: https://issues.apache.org/jira/browse/HDFS-16434
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In [HDFS-10872|https://issues.apache.org/jira/browse/HDFS-10872] we added an 
> opname to the read and write locks. However, many operations are still not 
> covered. When analyzing operations that hold a lock for a long time, we can 
> only identify the specific method through a stack trace. I suggest covering 
> the remaining operations to facilitate later performance optimization.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16501) Print the exception when reporting a bad block

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16501?focusedWorklogId=746371&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746371
 ]

ASF GitHub Bot logged work on HDFS-16501:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 06:32
Start Date: 23/Mar/22 06:32
Worklog Time Spent: 10m 
  Work Description: liubingxing commented on pull request #4062:
URL: https://github.com/apache/hadoop/pull/4062#issuecomment-1075971790


   Thanks @tasanuma and @tomscut


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 746371)
Time Spent: 1h 10m  (was: 1h)

> Print the exception when reporting a bad block
> --
>
> Key: HDFS-16501
> URL: https://issues.apache.org/jira/browse/HDFS-16501
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: image-2022-03-10-19-27-31-622.png
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> !image-2022-03-10-19-27-31-622.png|width=847,height=27!
> Currently, the VolumeScanner finds a bad block and reports it to the NameNode 
> without printing the reason why the block is considered bad. I think it would 
> be better to print the exception in the log file.
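
As a hedged illustration of the proposal, a minimal sketch of reporting a bad block together with its cause. The class and method names are hypothetical, not the actual VolumeScanner code; only the idea of passing the exception to the log call comes from the issue.

{code:java}
import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Hypothetical sketch; class and method names are not the actual VolumeScanner code. */
public class BadBlockReporter {
  private static final Logger LOG = LoggerFactory.getLogger(BadBlockReporter.class);

  /**
   * Passing the exception to the log call records *why* the block is considered
   * bad, instead of only recording that it was reported.
   */
  public void reportBadBlock(String blockId, IOException cause) {
    LOG.warn("Reporting bad block {} to the NameNode", blockId, cause);
    // ... send the actual bad-block report to the NameNode here ...
  }
}
{code}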



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16501) Print the exception when reporting a bad block

2022-03-23 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16501.
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.3
 Assignee: qinyuren
   Resolution: Fixed

> Print the exception when reporting a bad block
> --
>
> Key: HDFS-16501
> URL: https://issues.apache.org/jira/browse/HDFS-16501
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: image-2022-03-10-19-27-31-622.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> !image-2022-03-10-19-27-31-622.png|width=847,height=27!
> Currently, the VolumeScanner finds a bad block and reports it to the NameNode 
> without printing the reason why the block is considered bad. I think it would 
> be better to print the exception in the log file.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16507?focusedWorklogId=746367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746367
 ]

ASF GitHub Bot logged work on HDFS-16507:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 06:06
Start Date: 23/Mar/22 06:06
Worklog Time Spent: 10m 
  Work Description: tomscut commented on a change in pull request #4082:
URL: https://github.com/apache/hadoop/pull/4082#discussion_r832879620



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
##
@@ -1509,13 +1509,18 @@ synchronized void abortCurrentLogSegment() {
* effect.
*/
   @Override
-  public synchronized void purgeLogsOlderThan(final long minTxIdToKeep) {
+  public synchronized void purgeLogsOlderThan(long minTxIdToKeep) {
 // Should not purge logs unless they are open for write.
 // This prevents the SBN from purging logs on shared storage, for example.
 if (!isOpenForWrite()) {
   return;
 }
-
+
+// Reset purgeLogsFrom to avoid purging edit log which is in progress.
+if (isSegmentOpen()) {
+  minTxIdToKeep = minTxIdToKeep > curSegmentTxId ? curSegmentTxId : 
minTxIdToKeep;

Review comment:
   Hi @sunchao @tasanuma, could you please take a look at this discussion? 
Thanks.
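
For readers following the review, a simplified, self-contained model of the clamping idea in the diff above. The booleans stand in for isOpenForWrite()/isSegmentOpen() and curSegmentTxId comes from the patch; the class itself is a stand-in, not the real FSEditLog.

{code:java}
/**
 * Simplified model of the clamping idea in the quoted diff; not the real FSEditLog.
 */
public class EditLogPurgerSketch {
  private boolean openForWrite = true; // stand-in for isOpenForWrite()
  private boolean segmentOpen = true;  // stand-in for isSegmentOpen()
  private long curSegmentTxId = 1000L; // first txid of the in-progress segment

  public synchronized void purgeLogsOlderThan(long minTxIdToKeep) {
    // Should not purge logs unless they are open for write (as in the original code).
    if (!openForWrite) {
      return;
    }
    if (segmentOpen) {
      // Never purge past the start of the in-progress segment; same effect as
      // the ternary in the patch, written with Math.min.
      minTxIdToKeep = Math.min(minTxIdToKeep, curSegmentTxId);
    }
    System.out.println("Purging edit logs older than txid " + minTxIdToKeep);
  }
}
{code}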




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 746367)
Time Spent: 2h 20m  (was: 2h 10m)

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: tomscut
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL 
> exception. It looks like an edit log that is still in progress is being purged.
> According to the analysis, I suspect that the in-progress edit log to be 
> purged (after the SNN checkpoint) is not finalized (see HDFS-14317) before 
> the ANN rolls its own edits.
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
>     org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
>     
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
>     
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
>     
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAs(Subject.java:422)
>     
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>     
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     
> 

[jira] [Work logged] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16507?focusedWorklogId=746366&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746366
 ]

ASF GitHub Bot logged work on HDFS-16507:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 06:05
Start Date: 23/Mar/22 06:05
Worklog Time Spent: 10m 
  Work Description: tomscut commented on a change in pull request #4082:
URL: https://github.com/apache/hadoop/pull/4082#discussion_r832879620



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
##
@@ -1509,13 +1509,18 @@ synchronized void abortCurrentLogSegment() {
* effect.
*/
   @Override
-  public synchronized void purgeLogsOlderThan(final long minTxIdToKeep) {
+  public synchronized void purgeLogsOlderThan(long minTxIdToKeep) {
 // Should not purge logs unless they are open for write.
 // This prevents the SBN from purging logs on shared storage, for example.
 if (!isOpenForWrite()) {
   return;
 }
-
+
+// Reset purgeLogsFrom to avoid purging edit log which is in progress.
+if (isSegmentOpen()) {
+  minTxIdToKeep = minTxIdToKeep > curSegmentTxId ? curSegmentTxId : 
minTxIdToKeep;

Review comment:
   Hi @sunchao @tasanuma, could you please take a look at this discussion?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 746366)
Time Spent: 2h 10m  (was: 2h)

> [SBN read] Avoid purging edit log which is in progress
> --
>
> Key: HDFS-16507
> URL: https://issues.apache.org/jira/browse/HDFS-16507
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: tomscut
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL 
> exception. It looks like an edit log that is still in progress is being purged.
> According to the analysis, I suspect that the in-progress edit log to be 
> purged (after the SNN checkpoint) is not finalized (see HDFS-14317) before 
> the ANN rolls its own edits.
> The stack:
> {code:java}
> java.lang.Thread.getStackTrace(Thread.java:1552)
>     org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
>     
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388)
>     
> org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620)
>     
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512)
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177)
>     
> org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516)
>     java.security.AccessController.doPrivileged(Native Method)
>     javax.security.auth.Subject.doAs(Subject.java:422)
>     
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>     
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
>     javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>     
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>     
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>     org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>     
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     
> 

[jira] [Work logged] (HDFS-16501) Print the exception when reporting a bad block

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16501?focusedWorklogId=746365&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746365
 ]

ASF GitHub Bot logged work on HDFS-16501:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 06:04
Start Date: 23/Mar/22 06:04
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on pull request #4062:
URL: https://github.com/apache/hadoop/pull/4062#issuecomment-1075951712


   Sorry for being late. Thanks for your contribution, @liubingxing, and thanks 
for your review, @tomscut.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 746365)
Time Spent: 1h  (was: 50m)

> Print the exception when reporting a bad block
> --
>
> Key: HDFS-16501
> URL: https://issues.apache.org/jira/browse/HDFS-16501
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-03-10-19-27-31-622.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> !image-2022-03-10-19-27-31-622.png|width=847,height=27!
> Currently, the VolumeScanner finds a bad block and reports it to the NameNode 
> without printing the reason why the block is considered bad. I think it would 
> be better to print the exception in the log file.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16501) Print the exception when reporting a bad block

2022-03-23 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16501?focusedWorklogId=746364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-746364
 ]

ASF GitHub Bot logged work on HDFS-16501:
-

Author: ASF GitHub Bot
Created on: 23/Mar/22 06:03
Start Date: 23/Mar/22 06:03
Worklog Time Spent: 10m 
  Work Description: tasanuma merged pull request #4062:
URL: https://github.com/apache/hadoop/pull/4062


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 746364)
Time Spent: 50m  (was: 40m)

> Print the exception when reporting a bad block
> --
>
> Key: HDFS-16501
> URL: https://issues.apache.org/jira/browse/HDFS-16501
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-03-10-19-27-31-622.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> !image-2022-03-10-19-27-31-622.png|width=847,height=27!
> Currently, the VolumeScanner finds a bad block and reports it to the NameNode 
> without printing the reason why the block is considered bad. I think it would 
> be better to print the exception in the log file.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org