[ 
https://issues.apache.org/jira/browse/HDFS-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660012#comment-14660012
 ] 

Bob Hansen commented on HDFS-8855:
----------------------------------

I tried running this script against HDFS 2.6.0 patched with [~daryn]'s 
original HDFS-7597 patch (to rule out my refactoring as a confounding 
factor), and the issue seems to persist.  It is entirely possible that I 
misconfigured the cluster and was accidentally running unpatched code, so 
take that result with a grain of salt.

Looking at the source, my inclination would be to attach an LRU cache (much 
like the one in the HDFS-7597 patch) to the WebHDFS session, mapping 
ugi -> HDFS client.  This would keep the client's information and connection 
state around for re-use, but shut it down once the HTTP session ended.  What 
do you think, [~xiaobingo]?
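
To make the idea concrete, here is a rough sketch of what I have in mind.  
It is not taken from the HDFS-7597 patch or from the WebHDFS code: the class 
name SessionClientCache, the generic key K (standing in for the ugi), and 
the factory function are all hypothetical, and the client is modeled as a 
plain java.io.Closeable rather than a real HDFS client.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical per-session LRU cache: maps a user key (e.g. a ugi) to a
 * closeable client, closes clients that are evicted, and closes everything
 * that is left when the owning session ends.
 */
public class SessionClientCache<K, C extends Closeable> implements Closeable {

    private final LinkedHashMap<K, C> clients;
    private final Function<K, C> factory; // assumed way to build a client

    public SessionClientCache(final int maxEntries, Function<K, C> factory) {
        this.factory = factory;
        // accessOrder=true keeps entries ordered least-recently-used first.
        this.clients = new LinkedHashMap<K, C>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, C> eldest) {
                if (size() > maxEntries) {
                    // Release the evicted client's connection state.
                    closeQuietly(eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    /** Returns the cached client for this user, creating it on a miss. */
    public synchronized C get(K user) {
        C client = clients.get(user);
        if (client == null) {
            client = factory.apply(user);
            clients.put(user, client);
        }
        return client;
    }

    /** Intended to be called when the HTTP session ends. */
    @Override
    public synchronized void close() {
        for (C client : clients.values()) {
            closeQuietly(client);
        }
        clients.clear();
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // Best effort: nothing useful to do if close fails here.
        }
    }
}
```

One caveat with this sketch: eviction closes a client even if another 
request is still using it, so a real version would probably need reference 
counting or a cache large enough that in-flight clients are never evicted.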

> Webhdfs client leaks active NameNode connections
> ------------------------------------------------
>
>                 Key: HDFS-8855
>                 URL: https://issues.apache.org/jira/browse/HDFS-8855
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>         Environment: HDP 2.2
>            Reporter: Bob Hansen
>            Assignee: Xiaobing Zhou
>
> The attached script simulates a process opening ~50 files via webhdfs and 
> performing random reads.  Note that there are at most 50 concurrent reads, 
> and all webhdfs sessions are kept open.  Each read is ~64k at a random 
> position.  
> The script periodically (once per second) shells into the NameNode and 
> produces a summary of the socket states.  For my test cluster with 5 nodes, 
> it took ~30 seconds for the NameNode to reach ~25000 active connections and 
> fail.
> It appears that each request to the webhdfs client is opening a new 
> connection to the NameNode and keeping it open after the request is complete. 
>  If the process continues to run, eventually (~30-60 seconds), all of the 
> open connections are closed and the NameNode recovers.  
> This smells like SoftReference reaping.  Are we using SoftReferences in the 
> webhdfs client to cache NameNode connections but never re-using them?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
