[ https://issues.apache.org/jira/browse/HADOOP-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902556#action_12902556 ]

Todd Lipcon commented on HADOOP-6762:
-------------------------------------

Hi Sam,

With this patch, I see occasional failures of TestGridmixSubmission when the 
JobMonitor gets interrupted:

{noformat}
10/08/25 11:05:20 WARN ipc.Client: interrupted waiting to send params to server
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1215)
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:754)
        at org.apache.hadoop.ipc.Client.call(Client.java:1001)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
        at org.apache.hadoop.mapred.$Proxy10.getJobStatus(Unknown Source)
        at org.apache.hadoop.mapred.JobClient$NetworkedJob.updateStatus(JobClient.java:250)
        at org.apache.hadoop.mapred.JobClient$NetworkedJob.isSuccessful(JobClient.java:339)
        at org.apache.hadoop.mapreduce.Job.isSuccessful(Job.java:332)
        at org.apache.hadoop.mapred.gridmix.JobMonitor$MonitorThread.process(JobMonitor.java:134)
        at org.apache.hadoop.mapred.gridmix.JobMonitor$MonitorThread.run(JobMonitor.java:175)
10/08/25 11:05:20 WARN gridmix.JobMonitor: Lost job GRIDMIX00000
java.io.IOException: java.lang.InterruptedException
        at org.apache.hadoop.ipc.Client.call(Client.java:1007)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
        at org.apache.hadoop.mapred.$Proxy10.getJobStatus(Unknown Source)
        at org.apache.hadoop.mapred.JobClient$NetworkedJob.updateStatus(JobClient.java:250)
        at org.apache.hadoop.mapred.JobClient$NetworkedJob.isSuccessful(JobClient.java:339)
        at org.apache.hadoop.mapreduce.Job.isSuccessful(Job.java:332)
        at org.apache.hadoop.mapred.gridmix.JobMonitor$MonitorThread.process(JobMonitor.java:134)
        at org.apache.hadoop.mapred.gridmix.JobMonitor$MonitorThread.run(JobMonitor.java:175)
Caused by: java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1215)
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:754)
        at org.apache.hadoop.ipc.Client.call(Client.java:1001)
        ... 7 more
{noformat}

I think the patch is doing the right thing here for the most part, but it 
should throw InterruptedIOException instead of a plain IOException. Since 
InterruptedIOException is a subclass of IOException, existing handlers keep 
working, and the caller can at least catch the interrupt case separately.
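
To make the suggestion concrete, here is a rough sketch of the idea, not the actual patch. The class and method names (InterruptAwareCall, awaitSend, poll) are made up for illustration; only FutureTask, IOException and InterruptedIOException are real JDK types. Restoring the interrupt flag before rethrowing is my assumption about what callers would want.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

class InterruptAwareCall {
    // Hypothetical stand-in for the wait done while sending RPC params.
    static void awaitSend(FutureTask<Void> pending) throws IOException {
        try {
            pending.get();
        } catch (InterruptedException e) {
            // Assumption: keep the interrupt flag set so callers further up can see it.
            Thread.currentThread().interrupt();
            InterruptedIOException iioe =
                new InterruptedIOException("interrupted waiting to send params to server");
            iioe.initCause(e);
            throw iioe;
        } catch (ExecutionException e) {
            // A failure inside the send itself stays an ordinary IOException.
            throw new IOException(e.getCause());
        }
    }

    // Hypothetical caller, loosely modelled on a monitor thread polling job status.
    static String poll(FutureTask<Void> pending) {
        try {
            awaitSend(pending);
            return "ok";
        } catch (InterruptedIOException e) {
            // Must be listed before IOException, since it is a subclass.
            return "interrupted";
        } catch (IOException e) {
            return "lost";
        }
    }
}
```

With something like this, a caller such as JobMonitor could treat "interrupted" as a shutdown signal rather than logging a lost job.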

> exception while doing RPC I/O closes channel
> --------------------------------------------
>
>                 Key: HADOOP-6762
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6762
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: sam rash
>            Assignee: sam rash
>         Attachments: hadoop-6762-1.txt, hadoop-6762-2.txt, hadoop-6762-3.txt, 
> hadoop-6762-4.txt, hadoop-6762-6.txt, hadoop-6762-7.txt, hadoop-6762-8.txt, 
> hadoop-6762-9.txt
>
>
> If a single process creates two unique fileSystems to the same NN using 
> FileSystem.newInstance(), and one of them issues a close(), the leasechecker 
> thread is interrupted.  This interrupt races with the rpc namenode.renew() 
> and can cause a ClosedByInterruptException.  This closes the underlying 
> channel, and the other filesystem, which shares the connection, will get errors.
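
For reference, the channel-closing behaviour described in the issue can be reproduced outside Hadoop. This is only a sketch: a java.nio Pipe stands in for the shared RPC socket, and the class name SharedChannelInterrupt is made up. It relies on the documented JDK rule that a blocking operation on an interruptible channel, invoked while the thread's interrupt status is set, closes the channel and throws ClosedByInterruptException.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.Pipe;

class SharedChannelInterrupt {
    // Returns { first read hit ClosedByInterruptException,
    //           channel still open afterwards,
    //           later read by another user failed with a ClosedChannelException }.
    static boolean[] run() throws IOException {
        Pipe pipe = Pipe.open();
        Pipe.SourceChannel shared = pipe.source(); // stand-in for the shared connection

        boolean closedByInterrupt = false;
        Thread.currentThread().interrupt(); // simulate the interrupt racing with the I/O
        try {
            shared.read(ByteBuffer.allocate(1));
        } catch (ClosedByInterruptException e) {
            closedByInterrupt = true;
        } finally {
            Thread.interrupted(); // clear the flag so the next read is a clean attempt
        }

        boolean openAfter = shared.isOpen();

        boolean laterReadFailed = false;
        try {
            // A second, uninterrupted user of the same channel.
            shared.read(ByteBuffer.allocate(1));
        } catch (ClosedChannelException e) {
            laterReadFailed = true;
        }

        pipe.sink().close();
        return new boolean[] { closedByInterrupt, openAfter, laterReadFailed };
    }
}
```

The second read fails even though nothing interrupted it, which is the same shape of failure the other filesystem instance sees.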
