+1 for dropping the client-side expiry down to something like 1-2 seconds.
I'd rather do that than raise the server-side keepalive, since the server-side
resource (DN threads) is likely to be more contended.
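
As a rough sketch of what that would look like on the client, assuming the
override goes in the client's hdfs-site.xml and taking 2000 ms purely as an
illustrative value within the 1-2 second range (not a tested recommendation):

```xml
<!-- Client-side hdfs-site.xml fragment (illustrative values only). -->
<configuration>
  <property>
    <!-- How long the client keeps an idle cached socket, in milliseconds.
         The current default is 2 * 60 * 1000 (2 minutes). -->
    <name>dfs.client.socketcache.expiryMsec</name>
    <value>2000</value>
  </property>
</configuration>
```

The DN side would stay at its current default, i.e.
dfs.datanode.socket.reuse.keepalive = 1000 (1 second), so no extra
DataXceiver threads are held open.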

-Todd

On Fri, Jun 7, 2013 at 4:29 PM, Colin McCabe <cmcc...@alumni.cmu.edu> wrote:

> Hi all,
>
> HDFS-941 added dfs.datanode.socket.reuse.keepalive.  This allows
> DataXceiver worker threads in the DataNode to linger for a second or
> two after finishing a request, in case the client wants to send
> another request.  On the client side, HDFS-941 added a SocketCache, so
> that subsequent client requests could reuse the same socket.  Sockets
> were closed purely by an LRU eviction policy.
>
> Later, HDFS-3373 added a minimum expiration time to the SocketCache,
> and added a thread which periodically closed old sockets.
>
> However, the default timeout for SocketCache (which is now called
> PeerCache) is much longer than the DN would possibly keep the socket
> open.  Specifically, dfs.client.socketcache.expiryMsec defaults to 2 *
> 60 * 1000 (2 minutes), whereas dfs.datanode.socket.reuse.keepalive
> defaults to 1000 (1 second).
>
> I'm not sure why we have such a big disparity here.  It seems like
> this will inevitably lead to clients trying to use sockets which have
> gone stale, because the server closes them way before the client
> expires them.  Unless I'm missing something, we should probably either
> lengthen the keepalive, or shorten the socket cache expiry, or both.
>
> Thoughts?
> Colin
>



-- 
Todd Lipcon
Software Engineer, Cloudera
