[ https://issues.apache.org/jira/browse/HADOOP-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866619#action_12866619 ]

sam rash commented on HADOOP-6762:
----------------------------------

The general problem is that 'client' threads hold the socket and write to it 
directly to send RPCs.  If a client thread is interrupted while blocked in that 
write, it leaves the socket in an unusable state (interrupting a thread blocked 
in NIO channel I/O closes the channel).
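
To make the failure mode concrete, here is a small standalone demo (plain 
java.nio, not Hadoop code): interrupting a thread that is blocked writing to a 
SocketChannel closes the channel for every thread sharing it.

{code:java}
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class InterruptClosesChannel {
  public static void main(String[] args) throws Exception {
    // local server that never reads, so the client's write() eventually blocks
    ServerSocketChannel server =
        ServerSocketChannel.open().bind(new InetSocketAddress("127.0.0.1", 0));
    SocketChannel client = SocketChannel.open(server.getLocalAddress());
    server.accept();

    Thread writer = new Thread(() -> {
      ByteBuffer buf = ByteBuffer.allocate(1 << 20);
      try {
        while (true) {          // fill the send buffer until write() blocks
          buf.clear();
          client.write(buf);
        }
      } catch (ClosedByInterruptException e) {
        System.out.println("writer interrupted: channel closed underneath us");
      } catch (Exception e) {
        e.printStackTrace();
      }
    });
    writer.start();
    Thread.sleep(1000);   // let the writer block in write()
    writer.interrupt();   // plays the role of the leasechecker interrupt
    writer.join();

    // any other caller sharing this connection is now broken
    System.out.println("channel still open for other callers? " + client.isOpen());
  }
}
{code}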

I have a test for this general case and a patch that moves the actual socket 
writes to a thread owned by the Client object.  This means a client thread can 
be interrupted without ruining the socket for other clients.
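
Roughly, the idea is the hand-off pattern below (a sketch with made-up names, 
not the actual patch): caller threads never touch the socket; they enqueue 
serialized calls, and a single writer thread owned by the connection does all 
the writes, so an interrupt delivered to a caller never hits the channel.

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ConnectionWriter {
  private final OutputStream socketOut;   // stream over the RPC socket
  private final BlockingQueue<byte[]> pending = new LinkedBlockingQueue<>();
  private final Thread writer;

  ConnectionWriter(OutputStream socketOut) {
    this.socketOut = socketOut;
    this.writer = new Thread(this::writeLoop, "RPC connection writer");
    this.writer.setDaemon(true);
    this.writer.start();
  }

  /** Called by client threads; may be interrupted without harming the socket. */
  void sendCall(byte[] serializedCall) throws InterruptedException {
    pending.put(serializedCall);
  }

  private void writeLoop() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        byte[] call = pending.take();   // only this thread blocks on I/O below
        socketOut.write(call);
        socketOut.flush();
      }
    } catch (InterruptedException e) {
      // the Client shut this connection down deliberately
    } catch (IOException e) {
      // socket-level failure: mark the connection dead, fail queued calls, etc.
    }
  }
}
{code}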

Note: other socket errors can still occur that make the socket unusable.  The 
patch doesn't handle those; it is only intended to help with the interrupt case, 
since that is common with FileSystem.close().

We might also want to consider a way to fail fast when the RPC connection goes 
bad.  As near as I can tell from watching this happen, the underlying RPC stays 
in a bad state until the filesystem is closed.  It seems like we could let one 
operation fail, detect the bad socket, and then recreate the socket or the whole 
RPC object.  I'm not sure where that retry logic belongs, though.
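
Something along these lines, for example (a hypothetical sketch, not existing 
code): let the failing call detect the dead connection, tear it down, and retry 
once on a freshly created one.

{code:java}
import java.io.IOException;
import java.util.function.Supplier;

interface RpcConnection {
  byte[] call(byte[] request) throws IOException;
  boolean isBad();        // e.g. the underlying channel has been closed
  void close();
}

class ReconnectingInvoker {
  private final Supplier<RpcConnection> connectionFactory;
  private RpcConnection conn;

  ReconnectingInvoker(Supplier<RpcConnection> factory) {
    this.connectionFactory = factory;
    this.conn = factory.get();
  }

  synchronized byte[] invoke(byte[] request) throws IOException {
    try {
      return conn.call(request);
    } catch (IOException e) {
      if (!conn.isBad()) {
        throw e;                      // a genuine RPC failure, not a dead socket
      }
      conn.close();                   // fail fast: drop the broken connection
      conn = connectionFactory.get(); // and rebuild it for the retry
      return conn.call(request);
    }
  }
}
{code}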

> exception while doing RPC I/O closes channel
> --------------------------------------------
>
>                 Key: HADOOP-6762
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6762
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: sam rash
>
> If a single process creates two unique FileSystem instances to the same NN 
> using FileSystem.newInstance(), and one of them issues a close(), the 
> leasechecker thread is interrupted.  This interrupt races with the RPC 
> namenode.renew() and can cause a ClosedByInterruptException.  This closes the 
> underlying channel, and the other filesystem, which shares the connection, 
> will get errors.
