[ https://issues.apache.org/jira/browse/MAPREDUCE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799449#action_12799449 ]
Arun C Murthy commented on MAPREDUCE-1372: ------------------------------------------ Ok, being awake while debugging synchronization errors is (kinda) useful, so here goes: The primary cause for this bug is that we are resolving nodes (adding them to nodesAtMaxLevel) during job initialization - during which we cannot lock up the JobTracker. A reasonable fix is to queue the resolutions and resolve them under the JobTracker lock in a separate thread... this has the added benefit that we resolve more of the hosts simultaneously, there-by decreasing the number of forks we make via the usual ScriptBasedMapping. This actually needs a bit more work, we'll need the JIP to call 'wait' on itself while the hosts are being resolved and then the thread doing the resolutions will have to signal the JIP to continue. This is necessary since the JIP needs all nodes to be resolved to build up all it's cache tables for scheduling. Sigh. Thoughts? > ConcurrentModificationException in JobInProgress > ------------------------------------------------ > > Key: MAPREDUCE-1372 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1372 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker > Affects Versions: 0.20.1 > Reporter: Amareshwari Sriramadasu > Priority: Blocker > Fix For: 0.21.0 > > > We have seen the following ConcurrentModificationException in one of our > clusters > {noformat} > java.io.IOException: java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793) > at java.util.HashMap$KeyIterator.next(HashMap.java:828) > at > org.apache.hadoop.mapred.JobInProgress.findNewMapTask(JobInProgress.java:2018) > at > org.apache.hadoop.mapred.JobInProgress.obtainNewMapTask(JobInProgress.java:1077) > at > org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr.obtainNewTask(CapacityTaskScheduler.java:796) > at > org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.getTaskFromQueue(CapacityTaskScheduler.java:589) > at > org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.assignTasks(CapacityTaskScheduler.java:677) > at > org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.access$500(CapacityTaskScheduler.java:348) > at > org.apache.hadoop.mapred.CapacityTaskScheduler.addMapTask(CapacityTaskScheduler.java:1397) > at > org.apache.hadoop.mapred.CapacityTaskScheduler.assignTasks(CapacityTaskScheduler.java:1349) > at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2976) > at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953) > {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.