[ https://issues.apache.org/jira/browse/HDFS-11208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated HDFS-11208:
-------------------------------
    Attachment: HDFS-11208-test-deadlock.patch

I am attaching a patch containing a unit test which demonstrates this issue (currently times out with deadlock when applied). I am open to ideas on how best to solve this deadlock issue.

> Deadlock in WebHDFS on shutdown
> -------------------------------
>
>                 Key: HDFS-11208
>                 URL: https://issues.apache.org/jira/browse/HDFS-11208
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>    Affects Versions: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>         Attachments: HDFS-11208-test-deadlock.patch
>
>
> Currently on the client side if the {{DelegationTokenRenewer}} attempts to
> renew a WebHdfs delegation token while the client system is shutting down
> (i.e. {{FileSystem.Cache.ClientFinalizer}} is running) a deadlock may occur.
> This happens because {{ClientFinalizer}} calls
> {{FileSystem.Cache.closeAll()}} which first takes a lock on the
> {{FileSystem.Cache}} object and then locks each file system in the cache as
> it iterates over them. {{DelegationTokenRenewer}} takes a lock on a
> filesystem object while it is renewing that filesystem's token, but within
> {{TokenAspect.TokenManager.renew()}} (used for renewal of WebHdfs tokens)
> {{FileSystem.get}} is called, which in turn takes a lock on the FileSystem
> cache object, potentially causing deadlock if {{ClientFinalizer}} is
> currently running.
> See below for example deadlock output:
> {code}
> Found one Java-level deadlock:
> =============================
> "Thread-8572":
>   waiting to lock monitor 0x00007eff401f9878 (object 0x000000051ec3f930, a dali.hdfs.web.WebHdfsFileSystem),
>   which is held by "FileSystem-DelegationTokenRenewer"
> "FileSystem-DelegationTokenRenewer":
>   waiting to lock monitor 0x00007f005c08f5c8 (object 0x000000050389c8b8, a dali.fs.FileSystem$Cache),
>   which is held by "Thread-8572"
> Java stack information for the threads listed above:
> ===================================================
> "Thread-8572":
>         at dali.hdfs.web.WebHdfsFileSystem.close(WebHdfsFileSystem.java:864)
>         - waiting to lock <0x000000051ec3f930> (a dali.hdfs.web.WebHdfsFileSystem)
>         at dali.fs.FilterFileSystem.close(FilterFileSystem.java:449)
>         at dali.fs.FileSystem$Cache.closeAll(FileSystem.java:2407)
>         - locked <0x000000050389c8b8> (a dali.fs.FileSystem$Cache)
>         at dali.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2424)
>         - locked <0x000000050389c8d0> (a dali.fs.FileSystem$Cache$ClientFinalizer)
>         at dali.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> "FileSystem-DelegationTokenRenewer":
>         at dali.fs.FileSystem$Cache.getInternal(FileSystem.java:2343)
>         - waiting to lock <0x000000050389c8b8> (a dali.fs.FileSystem$Cache)
>         at dali.fs.FileSystem$Cache.get(FileSystem.java:2332)
>         at dali.fs.FileSystem.get(FileSystem.java:369)
>         at dali.hdfs.web.TokenAspect$TokenManager.getInstance(TokenAspect.java:92)
>         at dali.hdfs.web.TokenAspect$TokenManager.renew(TokenAspect.java:72)
>         at dali.security.token.Token.renew(Token.java:373)
>         at dali.fs.DelegationTokenRenewer$RenewAction.renew(DelegationTokenRenewer.java:127)
>         - locked <0x000000051ec3f930> (a dali.hdfs.web.WebHdfsFileSystem)
>         at dali.fs.DelegationTokenRenewer$RenewAction.access$300(DelegationTokenRenewer.java:57)
>         at dali.fs.DelegationTokenRenewer.run(DelegationTokenRenewer.java:258)
> Found 1 deadlock.
> {code}
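
The lock ordering described in the quoted report can be reproduced in isolation. The following is only a minimal sketch, not Hadoop code and not the attached unit test: the class name LockOrderDemo, the two plain Object monitors (standing in for the FileSystem.Cache object and one cached WebHdfsFileSystem), and the thread names are all hypothetical. One thread takes the "cache" monitor and then the "file system" monitor, as closeAll() is described as doing; the other takes them in the opposite order, as the renewer path is described as doing. A latch forces both threads to hold their first monitor before either requests its second, and the JVM's own deadlock detector is then polled.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class LockOrderDemo {

    /**
     * Starts two daemon threads that acquire two monitors in opposite order
     * and returns how many threads the JVM reports as deadlocked
     * (0 if no deadlock is detected within roughly five seconds).
     */
    static int countDeadlockedThreads() {
        // Hypothetical stand-ins for the cache monitor and one cached file system monitor.
        final Object cacheLock = new Object();
        final Object fsLock = new Object();
        // Released only once both threads hold their first monitor.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Mirrors the shutdown path: cache first, then the file system.
        Thread finalizerLike = new Thread(() -> {
            synchronized (cacheLock) {
                awaitOther(bothHoldFirstLock);
                synchronized (fsLock) {
                    // never reached once the cycle forms
                }
            }
        }, "finalizer-like");

        // Mirrors the renewal path: file system first, then the cache.
        Thread renewerLike = new Thread(() -> {
            synchronized (fsLock) {
                awaitOther(bothHoldFirstLock);
                synchronized (cacheLock) {
                    // never reached once the cycle forms
                }
            }
        }, "renewer-like");

        // Daemon threads so the stuck threads do not keep the JVM alive.
        finalizerLike.setDaemon(true);
        renewerLike.setDaemon(true);
        finalizerLike.start();
        renewerLike.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = threads.findDeadlockedThreads(); // null when no cycle exists
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    private static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

Because each thread is guaranteed to hold its first monitor before asking for the second, the cycle forms on every run rather than only under unlucky timing, which is the same shape as the "Thread-8572" / "FileSystem-DelegationTokenRenewer" pair in the dump above.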