[ https://issues.apache.org/jira/browse/HADOOP-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777870#comment-13777870 ]
Daryn Sharp commented on HADOOP-9955:
-------------------------------------

Good points, Suresh. The {{Server}} class overall isn't trivial to understand with all the nested classes nestled between the methods of the outer class, so it'll be an improvement.

BTW, I stumbled upon another bug: connection closing isn't +extremely+ inefficient... it's +completely+ inefficient. The rpc count used to determine idle (count=0) goes negative with security enabled. SASL responses decrement but don't increment the rpc count. *No connections are ever actually closed. The NN just slows itself down under load!* It's been like this since SASL was added.

BTW, I'm currently performance testing the patch.

> RPC idle connection closing is extremely inefficient
> ----------------------------------------------------
>
>                 Key: HADOOP-9955
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9955
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: ipc
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HADOOP-9955.patch
>
>
> The RPC server listener loops accepting connections, distributing the new
> connections to socket readers, and then conditionally & periodically performs
> a scan for idle connections. The idle scan chooses a _random index range_ to
> scan in a _synchronized linked list_.
> With 20k+ connections, walking the range of indices in the linked list is
> extremely expensive. During the sweep, other threads (socket responder and
> readers) that want to close connections are blocked, and no new connections
> are being accepted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
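
For illustration only, the rpc count problem described in the comment can be modeled with a small sketch. This is a simplified, hypothetical model and not the actual {{Server}} code: the class {{IdleCountSketch}}, its nested {{Connection}}, and all method names are assumptions made up for this example, and the "balanced" variant is just one possible way to keep the count consistent, not necessarily what the attached patch does.

```java
// Simplified model of per-connection RPC accounting.
// Hypothetical names throughout; this is NOT the actual Hadoop Server code.
class IdleCountSketch {
    static class Connection {
        private int rpcCount = 0;              // outstanding calls on this connection

        void receiveCall()  { rpcCount++; }    // a normal RPC arrives
        void sendResponse() { rpcCount--; }    // a response has been written

        // Models the bug: a SASL negotiation response goes through the
        // response path (decrement) without a matching increment.
        void sendSaslResponse() { sendResponse(); }

        // Assumed fix for illustration: balance the count before responding.
        void sendSaslResponseBalanced() { rpcCount++; sendResponse(); }

        int rpcCount()   { return rpcCount; }
        boolean isIdle() { return rpcCount == 0; }  // idle means no outstanding calls
    }

    public static void main(String[] args) {
        Connection buggy = new Connection();
        buggy.sendSaslResponse();   // count drops to -1 during negotiation
        buggy.receiveCall();        // count is 0 only while the call is in flight
        buggy.sendResponse();       // back to -1: never seen as idle afterwards
        System.out.println(buggy.rpcCount() + " idle=" + buggy.isIdle()); // -1 idle=false

        Connection fixed = new Connection();
        fixed.sendSaslResponseBalanced();
        fixed.receiveCall();
        fixed.sendResponse();
        System.out.println(fixed.rpcCount() + " idle=" + fixed.isIdle()); // 0 idle=true
    }
}
```

In this model, once the count has gone negative a quiescent connection never returns to count=0, so an idle scan that only closes connections with count=0 skips it every time.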