[ 
https://issues.apache.org/jira/browse/HADOOP-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875411#action_12875411
 ] 

sam rash commented on HADOOP-6762:
----------------------------------

Hmm, how would pending calls complete?  They already hold a Connection object 
whose socket channel is in bad shape.  Basically, there would have to be a 
check inside a sync block that the channel is valid before sending.  If it's 
not, the caller would have to create a new socket (or a whole new Connection), 
again all inside the sync block.  Does this make sense?  A bunch of threads get 
the Connection object and pile up on synchronized(this.out), and if one of them 
is interrupted, the whole pile will get errors.  I think the test & fix code is 
actually more complicated than using another thread, but I may be biased 
(having already done it the other way).
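
For illustration only, here is a minimal sketch of what the test & fix 
alternative could look like.  It is not taken from Hadoop or from the attached 
patches: RepairingSender, ChannelFactory, outLock and demo() are hypothetical 
names, and a local java.nio Pipe stands in for the socket channel so the 
example runs without a server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.Pipe;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the RPC Connection; not Hadoop's actual class. */
class RepairingSender {
    /** Hypothetical factory that opens a fresh channel to the server. */
    interface ChannelFactory {
        WritableByteChannel open() throws IOException;
    }

    private final Object outLock = new Object(); // plays the role of this.out
    private final ChannelFactory factory;
    private WritableByteChannel channel;
    private int reconnects;

    RepairingSender(ChannelFactory factory) throws IOException {
        this.factory = factory;
        this.channel = factory.open();
    }

    void send(byte[] payload) throws IOException {
        synchronized (outLock) {
            // Test & fix: replace a channel that an earlier interrupt closed.
            if (!channel.isOpen()) {
                channel = factory.open();
                reconnects++;
            }
            ByteBuffer buf = ByteBuffer.wrap(payload);
            while (buf.hasRemaining()) {
                channel.write(buf);
            }
        }
    }

    int reconnects() {
        synchronized (outLock) {
            return reconnects;
        }
    }

    /** Interrupts one send, then checks that the next send repairs the channel. */
    static String demo() {
        List<Pipe> pipes = new ArrayList<>(); // local pipes stand in for sockets
        try {
            RepairingSender sender = new RepairingSender(() -> {
                Pipe p = Pipe.open();
                pipes.add(p);
                return p.sink();
            });
            sender.send(new byte[] {1});
            boolean sawInterruptClose = false;
            Thread.currentThread().interrupt(); // simulate the racing interrupt
            try {
                sender.send(new byte[] {2});
            } catch (ClosedByInterruptException e) {
                sawInterruptClose = true; // the interrupted write closed the channel
            } finally {
                Thread.interrupted(); // clear the flag for the rest of the demo
            }
            sender.send(new byte[] {3}); // finds the closed channel and reopens it
            return "interruptClosedChannel=" + sawInterruptClose
                    + " reconnects=" + sender.reconnects();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (Pipe p : pipes) {
                try {
                    p.sink().close();
                    p.source().close();
                } catch (IOException ignored) {
                    // best-effort cleanup in a demo
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that even in this sketch the interrupted caller still sees an error, and 
any real version would also have to redo the connection setup and decide what 
happens to calls already in flight on the old channel, all while holding the 
lock.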

FWIW, we're already using this on our 0.20 branch in production where we have 
up to 200+ threads using the same RPC instance.

Also, I don't actually think the code is complex: it uses an executor, so the 
thread management is as simple as it can get. 
We can even get rid of the latch.  It's not necessary, but I wanted the patched 
code to behave exactly as the current code does, so I put it in.
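
A rough sketch of the executor idea, under stated assumptions: this is not the 
attached patch, ExecutorSender, sendExecutor and demo() are hypothetical names, 
and a java.nio Pipe again stands in for the socket channel.  The write runs on 
a dedicated thread, so an interrupt delivered to the calling thread surfaces 
as InterruptedException from the latch and never reaches the channel.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.WritableByteChannel;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sender; a sketch of the idea, not the attached patch. */
class ExecutorSender {
    private final WritableByteChannel channel;
    // One dedicated thread performs every write; callers never touch the channel.
    private final ExecutorService sendExecutor = Executors.newSingleThreadExecutor();

    ExecutorSender(WritableByteChannel channel) {
        this.channel = channel;
    }

    void send(byte[] payload) throws IOException, InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<IOException> failure = new AtomicReference<>();
        sendExecutor.execute(() -> {
            try {
                ByteBuffer buf = ByteBuffer.wrap(payload);
                while (buf.hasRemaining()) {
                    channel.write(buf);
                }
            } catch (IOException e) {
                failure.set(e);
            } finally {
                done.countDown();
            }
        });
        // The latch only keeps send() blocking until the write has finished.
        // An interrupt here is reported to the caller, not to the channel.
        done.await();
        if (failure.get() != null) {
            throw failure.get();
        }
    }

    void shutdown() {
        sendExecutor.shutdown();
    }

    /** Interrupts the caller during one send, then sends again on the same channel. */
    static String demo() {
        Pipe pipe;
        try {
            pipe = Pipe.open();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        ExecutorSender sender = new ExecutorSender(pipe.sink());
        try {
            boolean callerInterrupted = false;
            Thread.currentThread().interrupt(); // simulate the racing interrupt
            try {
                sender.send(new byte[] {1});
            } catch (InterruptedException e) {
                callerInterrupted = true; // await() also cleared the interrupt flag
            }
            sender.send(new byte[] {2}); // a later call still succeeds
            return "callerInterrupted=" + callerInterrupted
                    + " channelOpen=" + pipe.sink().isOpen();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            sender.shutdown();
            try {
                pipe.sink().close();
                pipe.source().close();
            } catch (IOException ignored) {
                // best-effort cleanup in a demo
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Dropping the latch would turn send() into fire-and-forget from the caller's 
point of view; keeping it means the caller still blocks until the write 
completes, as it would when writing directly.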




> exception while doing RPC I/O closes channel
> --------------------------------------------
>
>                 Key: HADOOP-6762
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6762
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: sam rash
>            Assignee: sam rash
>         Attachments: hadoop-6762-1.txt, hadoop-6762-2.txt, hadoop-6762-3.txt, 
> hadoop-6762-4.txt, hadoop-6762-6.txt
>
>
> If a single process creates two unique FileSystem instances to the same NN 
> using FileSystem.newInstance(), and one of them issues a close(), the 
> leasechecker thread is interrupted.  This interrupt races with the rpc 
> namenode.renew() and can cause a ClosedByInterruptException.  This closes the 
> underlying channel, and the other filesystem, which shares the connection, 
> will get errors.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
