[ https://issues.apache.org/jira/browse/HADOOP-6498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802343#action_12802343 ]
Vinod K V commented on HADOOP-6498:
-----------------------------------

Though the fix is a step in the right direction for avoiding client hangs, we should first fix the original problem. A hanging heartbeat would suggest some kind of uncaught exception, such as an NPE, on the JobTracker side which, if left unhandled, may result in inconsistent state. Can you dig through the JT logs and see whether any such issue occurred?

> IPC client bug may cause rpc call hang
> ---------------------------------------
>
>                 Key: HADOOP-6498
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6498
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.18.3, 0.19.0, 0.19.1, 0.19.2, 0.20.0, 0.20.1
>            Reporter: Ruyue Ma
>            Assignee: Ruyue Ma
>            Priority: Critical
>             Fix For: 0.21.0
>
>         Attachments: hadoop-6498.patch
>
>
> I can reproduce an RPC call hang that occurs when the connection thread of
> the IPC client receives the response for an outstanding call.
> The stack when the hang occurs (TaskTracker):
> Waiting on org.apache.hadoop.ipc.client$c...@1c3cbb4b
> Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:485)
> org.apache.hadoop.ipc.Client.call(Client.java:691)
> org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
> org.apache.hadoop.mapred.$Proxy4.heartbeat(Unknown Source)
> org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1250)
> org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1082)
> org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1785)
> org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2796)
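
For anyone following along, the stack above shows the caller parked in Object.wait() inside Client.call, which returns only when another thread notifies it. The sketch below is not the attached patch and not Hadoop's actual code. It is a generic illustration, under my own assumptions, of how a caller can bound that wait so a response that never arrives surfaces as a timeout instead of a hang. The class PendingCall and its methods complete and awaitResponse are hypothetical names.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for one outstanding RPC call (not Hadoop's class). */
final class PendingCall {
    private Object value;
    private boolean done;

    /** Run by the connection (receiver) thread once the response is read. */
    synchronized void complete(Object response) {
        value = response;
        done = true;
        notifyAll(); // wake the caller blocked in awaitResponse()
    }

    /** Run by the caller; gives up after timeoutMillis instead of waiting forever. */
    synchronized Object awaitResponse(long timeoutMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        // Loop on the flag: wait() may return spuriously, and the response
        // may already have been delivered before we started waiting.
        while (!done) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                throw new TimeoutException("no response within " + timeoutMillis + " ms");
            }
            // Never pass 0 to wait(): wait(0) means "wait with no timeout".
            wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
        }
        return value;
    }

    public static void main(String[] args) throws Exception {
        // Normal case: a receiver thread delivers the response shortly after.
        PendingCall answered = new PendingCall();
        Thread receiver = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            answered.complete("heartbeat-response");
        });
        receiver.start();
        System.out.println(answered.awaitResponse(5000));
        receiver.join();

        // Failure case: nobody ever completes the call, so the wait times out.
        PendingCall lost = new PendingCall();
        try {
            lost.awaitResponse(200);
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }
    }
}
```

A bounded wait like this only keeps the client from blocking indefinitely. It does not explain why the response went missing, which is why the JobTracker logs still need checking.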