[ https://issues.apache.org/jira/browse/HADOOP-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-2492:
-------------------------------------

    Attachment: rpcexception.patch

This patch prints out the stack trace where the exception occurred. This will help debug the real cause of the problem.

> ConcurrentModificationException in org.apache.hadoop.ipc.Server.Responder
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-2492
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2492
>             Project: Hadoop
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 0.16.0
>            Reporter: Devaraj Das
>            Assignee: dhruba borthakur
>             Fix For: 0.16.0
>
>         Attachments: rpcexception.patch
>
>
> I was running hadoop on 800 machines and after running a couple of jobs, and
> running 100% of the maps of the current job, the JobTracker stopped
> responding - *all* tasktrackers were lost ... When I looked at the JT logs,
> these seemed alarming:
>
> 2007-12-26 19:18:30,185 WARN org.apache.hadoop.ipc.Server: Exception in Responder java.util.ConcurrentModificationException
>
> Following the above exception, I saw a whole lot of exceptions like:
>
> 2007-12-26 19:23:10,926 WARN org.apache.hadoop.ipc.Server: Call queue overflow discarding oldest call heartbeat([EMAIL PROTECTED], false, true, 1758) from 1.2.3.4:1234
>
> From the number of exceptions to do with call queue overflow, it seemed like
> the jobtracker was not processing RPCs after it got the
> ConcurrentModificationException, and around that time the tasktrackers
> started getting timeouts on RPCs...
> There were two occurrences of the ConcurrentModificationException but the
> first instance seemed to not have any effect on the call queue...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
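
For readers unfamiliar with why the stack trace matters here: the WARN line quoted above carries only the exception class name, so it does not show which collection was modified or from where. The sketch below is not the contents of rpcexception.patch (which is not shown in this message) and does not use Hadoop's actual Responder code. The class ResponderLoggingSketch and the methods stackTraceOf and processPending are hypothetical names, and the list of pending calls is an assumed stand-in. It only illustrates, in plain Java, how a ConcurrentModificationException can arise and how logging the full trace exposes the frame where it was thrown.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical illustration only; not Hadoop's org.apache.hadoop.ipc.Server code.
public class ResponderLoggingSketch {

    // Renders the full stack trace of a throwable as a string for a log line.
    static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    // Assumed stand-in for a loop over pending calls. Removing an element
    // from the list while a for-each loop iterates over it is a structural
    // modification, which the iterator detects on its next step.
    // Returns the log message if the exception occurred, otherwise null.
    static String processPending(List<String> pending) {
        try {
            for (String call : pending) {
                if (call.startsWith("heartbeat")) {
                    pending.remove(call);
                }
            }
            return null;
        } catch (ConcurrentModificationException e) {
            // Logging the trace, not just the class name, shows the caller.
            return "Exception in Responder " + stackTraceOf(e);
        }
    }

    public static void main(String[] args) {
        List<String> pending = new ArrayList<>(
                Arrays.asList("heartbeat-1", "heartbeat-2", "getTask-3"));
        System.out.println(processPending(pending));
    }
}
```

With the trace included, the log message names the method that was iterating when the modification was detected (here processPending), which is the kind of information the original one-line warning lacks.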