[ https://issues.apache.org/jira/browse/HADOOP-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876081#action_12876081 ]

Todd Lipcon commented on HADOOP-6762:
-------------------------------------

bq. btw, is it appropriate to delete the very old patches as they aren't really relevant? or just leave them?

Usually just leave them - it makes it easier to follow the conversation and see how the patch evolved.

bq. (there might be a timeout on the actual out.write call already--i need to double-check)

Looks like there is one - a timeout is set when NetUtils.getOutputStream is called on the socket in setupIOStreams(). So I don't think we need a separate timeout while awaiting the send param.

I think the patch is good. I've been running it with my test case since Friday night and haven't seen any RPC hangs. The test case eventually failed with "Too many open files", but I think that's some other bug/socket leak in the DFS code, not in IPC. Mark it Patch Available to swing it through Hudson?

> exception while doing RPC I/O closes channel
> --------------------------------------------
>
>                 Key: HADOOP-6762
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6762
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: sam rash
>            Assignee: sam rash
>         Attachments: hadoop-6762-1.txt, hadoop-6762-2.txt, hadoop-6762-3.txt, hadoop-6762-4.txt, hadoop-6762-6.txt, hadoop-6762-7.txt
>
>
> If a single process creates two unique FileSystems to the same NN using FileSystem.newInstance(), and one of them issues a close(), the lease-checker thread is interrupted. This interrupt races with the RPC namenode.renew() and can cause a ClosedByInterruptException. This closes the underlying channel, and the other filesystem sharing the connection will get errors.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
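For context, the NIO behavior behind this bug can be shown in miniature. This is a minimal standalone sketch (not Hadoop code; the class name and use of a FileChannel instead of a socket channel are illustrative assumptions): when a thread's interrupt status is set as it enters a blocking operation on an InterruptibleChannel, the JVM closes the entire channel and throws ClosedByInterruptException - which is why an interrupted lease-checker thread can kill a connection that another FileSystem instance is still sharing.

```java
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class InterruptClosesChannel {

    /** Returns whether the channel is still open after an interrupted write. */
    static boolean channelOpenAfterInterruptedWrite() throws Exception {
        Path tmp = Files.createTempFile("hadoop-6762-demo", ".bin");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            // Simulate the lease-checker interrupt arriving just before the
            // blocking write (the namenode.renew() RPC in the real bug).
            Thread.currentThread().interrupt();
            try {
                ch.write(ByteBuffer.wrap(new byte[] {1, 2, 3}));
            } catch (ClosedByInterruptException expected) {
                // NIO has already closed the whole channel, not just this write.
            } finally {
                Thread.interrupted(); // clear interrupt status for the caller
            }
            return ch.isOpen();
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("channel open after interrupted write: "
                + channelOpenAfterInterruptedWrite());
    }
}
```

Any other thread multiplexing RPCs over that same channel would then see errors on its next call, matching the symptom described in the issue.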