[ 
https://issues.apache.org/jira/browse/HADOOP-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822793#comment-13822793
 ] 

Daryn Sharp commented on HADOOP-9955:
-------------------------------------

In the original patch, I did use idleScanThreshold as the initial capacity.  I 
then realized that since the idle scan will now be very cheap, it may be 
desirable to drop the idle threshold much lower.  Let's say the idle threshold 
is set to 1.  You don't want the initial capacity to be 1!

I have now chosen the initial capacity to be the size of the callq because, in 
practice, non-multithreaded clients create a one-to-one correspondence between 
a call and a connection.  The two are of course only loosely coupled, because a 
multithreaded client will issue multiple calls per connection.  If clients are 
predominantly multithreaded, then the hash simply has excess capacity for a 
surge in load.
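
As a minimal sketch of the sizing choice above, assuming the connection set is 
backed by a ConcurrentHashMap (the class and method names below are 
hypothetical and not taken from the patch):

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names are illustrative, not from the patch.
class ConnectionSetSketch {
    // Size the connection set from the call queue capacity rather than
    // from the idle scan threshold, which may be configured as low as 1.
    static <C> Set<C> newConnectionSet(int callQueueSize) {
        int initialCapacity = Math.max(1, callQueueSize);
        return Collections.newSetFromMap(
            new ConcurrentHashMap<C, Boolean>(initialCapacity));
    }
}
```

With single-threaded clients the set would fill at roughly the same rate as 
the call queue; with multithreaded clients it would stay under-filled, which 
is the spare capacity mentioned above.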

A multiplier of the callq would likely require another conf parameter, which 
I'd like to avoid.  I suspect the rudimentary calculation will be fine under 
normal conditions and anything fancier may be overkill.  I checked, and the 
rehash implementation is pretty efficient.  It moves the chains and copies 
about 1/6 of the nodes.  Heavily loaded grids (of our size) will rehash maybe 
1-3 times (10k -> ~18k -> ~31k -> 94k).  We'll blow the fd limit long before 
hitting the ~70k threshold to grow again.
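
The ~70k figure is consistent with a table of about 94k entries and a load 
factor of 0.75, which is the Java default for hash maps.  The load factor is 
an assumption here, since it is not stated above, and the class name is 
hypothetical.  A quick check of the arithmetic:

```java
// Hypothetical helper for checking the growth threshold arithmetic.
class ResizeThresholdSketch {
    // Assumed load factor (Java's default for hash maps); not stated in the comment.
    static final float ASSUMED_LOAD_FACTOR = 0.75f;

    // Number of entries at which a table of the given capacity would grow.
    static int resizeThreshold(int capacity) {
        return (int) (capacity * ASSUMED_LOAD_FACTOR);
    }
}
```

Under that assumption, resizeThreshold(94000) evaluates to 70500, i.e. 
roughly 70k connections before the next grow.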

I'm pretty sure I filed a jira about the broken bookkeeping.  If not, I'll do 
so.

> RPC idle connection closing is extremely inefficient
> ----------------------------------------------------
>
>                 Key: HADOOP-9955
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9955
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: ipc
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HADOOP-9955.patch, HADOOP-9955.patch
>
>
> The RPC server listener loops accepting connections, distributing the new 
> connections to socket readers, and then conditionally & periodically performs 
> a scan for idle connections.  The idle scan chooses a _random index range_ to 
> scan in a _synchronized linked list_.
> With 20k+ connections, walking the range of indices in the linked list is 
> extremely expensive.  During the sweep, other threads (socket responder and 
> readers) that want to close connections are blocked, and no new connections 
> are being accepted.
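
To illustrate why the sweep described in the quoted issue is expensive, here 
is a rough sketch of an indexed walk over a synchronized linked list.  It is 
not the actual server code: the class, method, and parameter names are 
hypothetical, and connections are reduced to their last-contact timestamps.

```java
import java.util.List;

// Hypothetical sketch of an index-range idle sweep; not the real server code.
class IndexedSweepSketch {
    // Count entries in [start, end) whose last-contact time is older than
    // maxIdleMillis.  On a LinkedList, each get(i) walks the list from one
    // end, so every element visited costs O(n), and the list lock is held
    // for the whole sweep.
    static int countIdle(List<Long> lastContact, int start, int end,
                         long now, long maxIdleMillis) {
        int idle = 0;
        synchronized (lastContact) {
            int limit = Math.min(end, lastContact.size());
            for (int i = Math.max(0, start); i < limit; i++) {
                if (now - lastContact.get(i) > maxIdleMillis) {
                    idle++;
                }
            }
        }
        return idle;
    }
}
```

Any other thread that needs the same lock to add or remove a connection waits 
until the loop finishes, which matches the blocking described in the issue.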



