[ https://issues.apache.org/jira/browse/HADOOP-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902556#action_12902556 ]
Todd Lipcon commented on HADOOP-6762:
-------------------------------------

Hi Sam,

With this patch, I see occasional failures of TestGridmixSubmission when the JobMonitor gets interrupted:

{noformat}
10/08/25 11:05:20 WARN ipc.Client: interrupted waiting to send params to server
java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1215)
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
	at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:754)
	at org.apache.hadoop.ipc.Client.call(Client.java:1001)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
	at org.apache.hadoop.mapred.$Proxy10.getJobStatus(Unknown Source)
	at org.apache.hadoop.mapred.JobClient$NetworkedJob.updateStatus(JobClient.java:250)
	at org.apache.hadoop.mapred.JobClient$NetworkedJob.isSuccessful(JobClient.java:339)
	at org.apache.hadoop.mapreduce.Job.isSuccessful(Job.java:332)
	at org.apache.hadoop.mapred.gridmix.JobMonitor$MonitorThread.process(JobMonitor.java:134)
	at org.apache.hadoop.mapred.gridmix.JobMonitor$MonitorThread.run(JobMonitor.java:175)
10/08/25 11:05:20 WARN gridmix.JobMonitor: Lost job GRIDMIX00000
java.io.IOException: java.lang.InterruptedException
	at org.apache.hadoop.ipc.Client.call(Client.java:1007)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
	at org.apache.hadoop.mapred.$Proxy10.getJobStatus(Unknown Source)
	at org.apache.hadoop.mapred.JobClient$NetworkedJob.updateStatus(JobClient.java:250)
	at org.apache.hadoop.mapred.JobClient$NetworkedJob.isSuccessful(JobClient.java:339)
	at org.apache.hadoop.mapreduce.Job.isSuccessful(Job.java:332)
	at org.apache.hadoop.mapred.gridmix.JobMonitor$MonitorThread.process(JobMonitor.java:134)
	at org.apache.hadoop.mapred.gridmix.JobMonitor$MonitorThread.run(JobMonitor.java:175)
Caused by: java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1215)
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
	at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:754)
	at org.apache.hadoop.ipc.Client.call(Client.java:1001)
	... 7 more
{noformat}

I think the patch is doing the right thing here for the most part, but it should throw InterruptedIOException instead of a plain IOException. Then the caller can at least catch it distinctly from a normal IOException.

> exception while doing RPC I/O closes channel
> --------------------------------------------
>
>                 Key: HADOOP-6762
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6762
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: sam rash
>            Assignee: sam rash
>         Attachments: hadoop-6762-1.txt, hadoop-6762-2.txt, hadoop-6762-3.txt,
> hadoop-6762-4.txt, hadoop-6762-6.txt, hadoop-6762-7.txt, hadoop-6762-8.txt,
> hadoop-6762-9.txt
>
>
> If a single process creates two distinct FileSystem instances for the same NN
> using FileSystem.newInstance(), and one of them calls close(), its lease
> checker thread is interrupted. This interrupt races with the RPC
> namenode.renew() and can cause a ClosedByInterruptException. That closes the
> underlying channel, so the other filesystem, which shares the connection,
> gets errors.
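A minimal sketch of the suggested change, assuming the request write runs on a dedicated executor thread, as the `FutureTask.get` frame under `Client$Connection.sendParam` in the trace suggests. The class `InterruptibleSend`, the `Sender` interface, and this `sendParam` are hypothetical illustrations, not the actual `org.apache.hadoop.ipc.Client` code. When the waiting caller is interrupted, the sketch restores the thread's interrupt status and throws `InterruptedIOException`, so a caller such as a job monitor can tell an interrupt apart from a real I/O failure.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not the actual org.apache.hadoop.ipc.Client code.
final class InterruptibleSend {
    // Hypothetical stand-in for the socket write of an RPC request.
    interface Sender {
        void send() throws IOException;
    }

    // The write runs on this dedicated thread, so interrupting a caller never
    // interrupts a thread blocked in channel I/O (which would close the channel).
    private final ExecutorService sendExecutor = Executors.newSingleThreadExecutor();

    void sendParam(Sender sender) throws IOException {
        Future<?> pending = sendExecutor.submit(() -> {
            sender.send();
            return null;
        });
        try {
            pending.get();
        } catch (InterruptedException e) {
            // Restore the interrupt status and report it as a distinct I/O exception.
            // The submitted write is not cancelled, so it can still finish.
            Thread.currentThread().interrupt();
            InterruptedIOException iioe =
                new InterruptedIOException("interrupted waiting to send params to server");
            iioe.initCause(e);
            throw iioe;
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException(cause);
        }
    }

    void shutdown() {
        sendExecutor.shutdownNow();
    }

    // Interrupts a caller while its write is still blocked and reports how the call ended.
    static String demo() throws InterruptedException {
        InterruptibleSend client = new InterruptibleSend();
        CountDownLatch release = new CountDownLatch(1);
        String[] outcome = new String[1];
        Thread caller = new Thread(() -> {
            try {
                client.sendParam(() -> {
                    try {
                        release.await(); // simulate a write that is still in progress
                    } catch (InterruptedException e) {
                        throw new InterruptedIOException("send thread interrupted");
                    }
                });
                outcome[0] = "sent";
            } catch (InterruptedIOException e) {
                outcome[0] = "interrupted, flag=" + Thread.currentThread().isInterrupted();
            } catch (IOException e) {
                outcome[0] = "io error: " + e;
            }
        });
        caller.start();
        caller.interrupt();
        caller.join();
        release.countDown();
        client.shutdown();
        return outcome[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```

Because `InterruptedIOException` extends `IOException`, existing `catch (IOException e)` handlers keep compiling; callers that care about interrupts add a more specific catch clause ahead of them, as `demo()` does. Leaving the submitted write running instead of cancelling it is a design choice of this sketch, intended to avoid leaving a half-written request on the shared connection.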