[ https://issues.apache.org/jira/browse/HDFS-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660012#comment-14660012 ]
Bob Hansen commented on HDFS-8855:
----------------------------------

I tried running this script against an HDFS-2.6.0 patched with [~daryn]'s original HDFS-7597 patch (just to ensure that my refactoring wasn't a confounding factor), and the issue seems to persist. It is entirely possible that I messed up my configuration and was accidentally running unpatched code, so take that with a grain of salt.

Looking at the source, my inclination would be to have an LRU (much like the HDFS-7597 patch) attached to the WebHDFS session that would map ugi->HDFS client. This would keep the client's information and connection state around for re-use, but shut it down once the HTTP session ended.

What do you think, [~xiaobingo]?

> Webhdfs client leaks active NameNode connections
> ------------------------------------------------
>
>                 Key: HDFS-8855
>                 URL: https://issues.apache.org/jira/browse/HDFS-8855
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>         Environment: HDP 2.2
>            Reporter: Bob Hansen
>            Assignee: Xiaobing Zhou
>
> The attached script simulates a process opening ~50 files via webhdfs and
> performing random reads. Note that there are at most 50 concurrent reads,
> and all webhdfs sessions are kept open. Each read is ~64k at a random
> position.
> The script periodically (once per second) shells into the NameNode and
> produces a summary of the socket states. For my test cluster with 5 nodes,
> it took ~30 seconds for the NameNode to reach ~25000 active connections and
> fail.
> It appears that each request to the webhdfs client is opening a new
> connection to the NameNode and keeping it open after the request is complete.
> If the process continues to run, eventually (~30-60 seconds), all of the
> open connections are closed and the NameNode recovers.
> This smells like SoftReference reaping. Are we using SoftReferences in the
> webhdfs client to cache NameNode connections but never re-using them?
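
For illustration, here is a minimal sketch of the per-session LRU suggested in the comment above. It is not taken from the HDFS-7597 patch or from any Hadoop source. The class name SessionClientCache, the use of a plain String as a stand-in for the ugi, and the generic Closeable client type are all assumptions made only so that the example is self-contained. The idea shown is that a client is reused per user, the least recently used client is closed when the cache is full, and every remaining client is closed when the owning session ends.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical per-session LRU of clients keyed by user identity.
 * Illustrative only; not a Hadoop class.
 */
class SessionClientCache<C extends Closeable> implements Closeable {
    private final int capacity;
    private final LinkedHashMap<String, C> clients;

    SessionClientCache(int capacity) {
        this.capacity = capacity;
        // accessOrder = true: iteration runs from least to most recently used.
        this.clients = new LinkedHashMap<>(16, 0.75f, true);
    }

    /** Returns the cached client for this user, creating it on first use. */
    synchronized C get(String user, Function<String, C> factory) {
        C client = clients.get(user); // a hit also marks the entry as recently used
        if (client == null) {
            client = factory.apply(user);
            clients.put(user, client);
            evictIfNeeded();
        }
        return client;
    }

    /** Closes and drops least recently used clients until within capacity. */
    private void evictIfNeeded() {
        Iterator<Map.Entry<String, C>> it = clients.entrySet().iterator();
        while (clients.size() > capacity && it.hasNext()) {
            C eldest = it.next().getValue();
            it.remove();
            closeQuietly(eldest);
        }
    }

    /** Intended to be called when the owning HTTP session ends. */
    @Override
    public synchronized void close() {
        for (C client : clients.values()) {
            closeQuietly(client);
        }
        clients.clear();
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException e) {
            // Ignored in this sketch; real code would at least log it.
        }
    }
}
```

In this sketch a caller would create one SessionClientCache per HTTP session and call close() from whatever hook fires when that session ends. How that hook would be wired into WebHDFS is left open here.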