[ https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052019#comment-17052019 ]
Anes Mukhametov edited comment on HDFS-14498 at 3/5/20, 11:13 AM:
------------------------------------------------------------------

Got the same issue with CDH-5.16.2

{{It also seems to happen after a client died while writing, more than a week ago. As a result LeaseManager seems stuck on this lease with no other leases being recovered.}}

All logs have already rolled, so I can't give any additional info. I'm also unable to provide any jstack output: I can't pause the production NameNode for that long (it takes more than 5 minutes to create a stack dump).

{quote}
{{2020-03-05 13:29:33,846 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease. Holder: DFSClient_NONMAPREDUCE_-1052192603_27, pending creates: 1], src=/tmp/57-1582557184-0.tmp}}
{{2020-03-05 13:29:33,846 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: Failed to release lease for file /tmp/57-1582557184-0.tmp. Committed blocks are waiting to be minimally replicated. Try again later.}}
{{2020-03-05 13:29:33,846 WARN org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path /tmp/57-1582557184-0.tmp in the lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1052192603_27, pending creates: 1]. It will be retried.}}
{{org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* NameSystem.internalReleaseLease: Failed to release lease for file /tmp/57-1582557184-0.tmp. Committed blocks are waiting to be minimally replicated. Try again later.}}
{{ at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:4889)}}
{{ at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:605)}}
{{ at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:541)}}
{{ at java.lang.Thread.run(Thread.java:748)}}
{quote}

was (Author: amuhametov):

Got the same issue with CDH-5.16.2

{{It also seems to happen after a client died while writing, more than a week ago.
As a result LockManager seems stuck on this lease with no other leases being recovered.}}

All logs have already rolled, so I can't give any additional info. I'm also unable to provide any jstack output: I can't pause the production NameNode for that long (it takes more than 5 minutes to create a stack dump).

{quote}
{{2020-03-05 13:29:33,846 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease. Holder: DFSClient_NONMAPREDUCE_-1052192603_27, pending creates: 1], src=/tmp/57-1582557184-0.tmp}}
{{2020-03-05 13:29:33,846 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: Failed to release lease for file /tmp/57-1582557184-0.tmp. Committed blocks are waiting to be minimally replicated. Try again later.}}
{{2020-03-05 13:29:33,846 WARN org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path /tmp/57-1582557184-0.tmp in the lease [Lease. Holder: DFSClient_NONMAPREDUCE_-1052192603_27, pending creates: 1]. It will be retried.}}
{{org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* NameSystem.internalReleaseLease: Failed to release lease for file /tmp/57-1582557184-0.tmp. Committed blocks are waiting to be minimally replicated.
Try again later.}}
{{ at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:4889)}}
{{ at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:605)}}
{{ at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:541)}}
{{ at java.lang.Thread.run(Thread.java:748)}}
{quote}

> LeaseManager can loop forever on the file for which create has failed
> ----------------------------------------------------------------------
>
>                 Key: HDFS-14498
>                 URL: https://issues.apache.org/jira/browse/HDFS-14498
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.9.0
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> The logs from file creation are long gone due to infinite lease logging, however it presumably failed... the client who was trying to write this file is definitely long dead.
> The version includes HDFS-4882.
> We get this log pattern repeating infinitely:
> {noformat}
> 2019-05-16 14:00:16,893 INFO [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease. Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard limit
> 2019-05-16 14:00:16,893 INFO [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease. Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=<snip>
> 2019-05-16 14:00:16,893 WARN [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: Failed to release lease for file <snip>. Committed blocks are waiting to be minimally replicated. Try again later.
> 2019-05-16 14:00:16,893 WARN [org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path <snip> in the lease [Lease. Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1]. It will be retried.
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* NameSystem.internalReleaseLease: Failed to release lease for file <snip>. Committed blocks are waiting to be minimally replicated. Try again later.
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
>     at org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
>     at org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
>     at java.lang.Thread.run(Thread.java:745)
> $ grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1" hdfs_nn*
> hdfs_nn.log:1068035
> hdfs_nn.log.2019-05-16-14:1516179
> hdfs_nn.log.2019-05-16-15:1538350
> {noformat}
> Aside from an actual bug fix, it might make sense to make LeaseManager log less, in case there are more bugs like this...
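
To illustrate the "log less" suggestion in the issue description: the grep counts show on the order of 1.5 million "Recovering" lines per rotated log file for a single lease. Below is a minimal sketch of time-based log throttling for a retry loop like this one. It is not Hadoop code and not a proposed patch. The class name `ThrottledWarn`, the method `record`, and the key string are all hypothetical, and the caller is assumed to supply the clock value.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Hadoop): reports a repeated warning at most
 * once per interval and counts the occurrences suppressed in between.
 */
final class ThrottledWarn {
    private final long intervalMs;
    private final Map<String, Long> lastReported = new HashMap<>();
    private final Map<String, Long> suppressed = new HashMap<>();

    ThrottledWarn(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    /**
     * Records one occurrence of the warning identified by key.
     * Returns 0 if it should be suppressed; otherwise returns the number of
     * occurrences since the last report, including this one.
     */
    synchronized long record(String key, long nowMs) {
        Long last = lastReported.get(key);
        if (last != null && nowMs - last < intervalMs) {
            suppressed.merge(key, 1L, Long::sum);
            return 0;
        }
        lastReported.put(key, nowMs);
        Long skipped = suppressed.remove(key);
        return 1 + (skipped == null ? 0 : skipped);
    }

    public static void main(String[] args) {
        ThrottledWarn throttle = new ThrottledWarn(60_000); // report at most once a minute
        String key = "holder+path"; // placeholder for lease holder and file path
        long now = 0;
        // Simulate a failing release retried every 100 ms for 100 seconds.
        for (int i = 0; i < 1000; i++, now += 100) {
            long n = throttle.record(key, now);
            if (n > 0) {
                System.out.println("t=" + now + "ms: cannot release lease ("
                        + n + " attempt(s) since last report)");
            }
        }
    }
}
```

Keying the throttle on lease holder plus path means a second stuck lease would still get its own first report. This only reduces log volume; it does not address why the lease cannot be released.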