[jira] [Comment Edited] (HADOOP-9955) RPC idle connection closing is extremely inefficient

Kihwal Lee (JIRA) Thu, 14 Nov 2013 10:33:54 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-9955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822726#comment-13822726
 ]


Kihwal Lee edited comment on HADOOP-9955 at 11/14/13 6:32 PM:
--------------------------------------------------------------

When the concurrent hash map is created, its initial capacity is set to the max 
call queue size.  This may not always be ideal. In the production name nodes 
I've seen, 4X that value would make more sense. Both the number of concurrent 
connections and the max call queue length (determined by the number of handlers) 
are influenced by the size of the cluster (containers, jobs, etc.) and the load, 
but the two seem only loosely coupled. E.g. a small number of clients can 
generate a load that fills up the call queue. There may be a better parameter 
we can use to determine a reasonable initial size for {{connections}}. 

It could be a function of {{idleScanThreshold}}. This threshold would normally 
be set to # of persistent connections + # of connections from steady-state 
average load + slack, so the initial size for {{connections}} could be set to 
the max of the call queue size and {{some_factor * idleScanThreshold}}.

{code}
      this.connections = Collections.newSetFromMap(
          new ConcurrentHashMap<Connection,Boolean>(
              maxQueueSize, 0.75f, readThreads+2));
{code}
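
A minimal sketch of the sizing suggested above, not the patch itself. The class and method names are hypothetical, {{someFactor}} stands in for the unspecified {{some_factor}}, and the inputs are assumed to be non-negative. It takes the larger of the call queue size and the scaled idle scan threshold, then builds the set the same way as the snippet above.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper; none of these names come from the Hadoop source.
class ConnectionSetSizing {

    // Returns max(maxQueueSize, someFactor * idleScanThreshold), clamped to int range.
    // Assumes all arguments are non-negative.
    static int initialCapacity(int maxQueueSize, int someFactor, int idleScanThreshold) {
        // Multiply as long so a large factor cannot overflow int.
        long scaled = (long) someFactor * idleScanThreshold;
        long capacity = Math.max((long) maxQueueSize, scaled);
        return (int) Math.min(capacity, (long) Integer.MAX_VALUE);
    }

    // Builds a concurrent set with the computed initial capacity, keeping the
    // load factor and concurrency level used in the snippet above.
    static <T> Set<T> newConnectionSet(int maxQueueSize, int someFactor,
                                       int idleScanThreshold, int readThreads) {
        int capacity = initialCapacity(maxQueueSize, someFactor, idleScanThreshold);
        return Collections.newSetFromMap(
            new ConcurrentHashMap<T, Boolean>(capacity, 0.75f, readThreads + 2));
    }
}
```

With illustrative values, a queue size of 1000, a factor of 2 and a threshold of 4000 give an initial capacity of 8000; with a threshold of 100 the queue size wins and the capacity stays at 1000.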


was (Author: kihwal):
When the concurrent hash map is created, the initial size is set to the max 
call queue size.  This may not be always ideal. In the production name nodes 
I've seen, 4X of that will make more sense. Both the number of concurrent 
connections and the max call queue length (determined by number of handlers) 
are influenced by the size of cluster (containers, jobs, etc.) and the load, 
but the two seem only loosely coupled. E.g. a small number of clients can 
generate a load that fills up the call queue. There may be a better parameter 
we can use to determine the reasonable initial size of {{connections}}. 

It could be a function of {{idleScanThreshold}}. This threshold would normally 
be set to # of persistent connections + # connections from steady state average 
load + slack, so the initial size for {{connections}} could be set to the max 
of  call queue size or {{some_factor * idleScanThreshold}}.

{code}
      this.connections = Collections.newSetFromMap(
          new ConcurrentHashMap<Connection,Boolean>(
              maxQueueSize, 0.75f, readThreads+2));
{code}

> RPC idle connection closing is extremely inefficient
> ----------------------------------------------------
>
>                 Key: HADOOP-9955
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9955
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: ipc
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HADOOP-9955.patch, HADOOP-9955.patch
>
>
> The RPC server listener loops accepting connections, distributing the new 
> connections to socket readers, and then conditionally & periodically performs 
> a scan for idle connections.  The idle scan chooses a _random index range_ to 
> scan in a _synchronized linked list_.
> With 20k+ connections, walking the range of indices in the linked list is 
> extremely expensive.  During the sweep, other threads (socket responder and 
> readers) that want to close connections are blocked, and no new connections 
> are being accepted.
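
For contrast with the index walk over a synchronized linked list described above, here is a minimal sketch of an idle sweep over a {{ConcurrentHashMap}}-backed set. The {{Conn}} type, its fields, and {{closeIdle}} are hypothetical and are not taken from the attached patch. The sweep uses the set's weakly consistent iterator, so other threads can add or remove connections while it runs and no list-wide lock is held.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names are illustrative, not from the Hadoop source.
class IdleScanSketch {

    // Stand-in for the RPC server's connection object.
    static final class Conn {
        final long lastContactMillis;
        volatile boolean closed;

        Conn(long lastContactMillis) {
            this.lastContactMillis = lastContactMillis;
        }

        void close() {
            closed = true;
        }
    }

    // Creates a concurrent set backed by a ConcurrentHashMap.
    static Set<Conn> newConnectionSet(int initialCapacity) {
        return Collections.newSetFromMap(
            new ConcurrentHashMap<Conn, Boolean>(initialCapacity));
    }

    // Closes and removes every connection idle for longer than maxIdleMillis.
    // Returns the number of connections closed.
    static int closeIdle(Set<Conn> connections, long nowMillis, long maxIdleMillis) {
        int closedCount = 0;
        for (Iterator<Conn> it = connections.iterator(); it.hasNext();) {
            Conn c = it.next();
            if (nowMillis - c.lastContactMillis > maxIdleMillis) {
                it.remove(); // removal through the iterator is supported by this set
                c.close();
                closedCount++;
            }
        }
        return closedCount;
    }
}
```

Each step of the iterator advances directly to the next entry, whereas fetching successive indices from a linked list re-walks the list on every lookup.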



--
This message was sent by Atlassian JIRA
(v6.1#6144)
