[
https://issues.apache.org/jira/browse/MAPREDUCE-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101035#comment-13101035
]
Siddharth Seth commented on MAPREDUCE-2954:
-------------------------------------------
Looks OK, but I am not sure about the large prime: it will almost certainly
cause the hash code to wrap around the integer range, which is likely not a
problem. We could revert to the Eclipse-generated default of 31.
bq. We should be able to do better if we analyse more on our IDs, but this
should work for now.
Completely agree with this, though: clusterTimestamp is in ms, and there's
unlikely to be a very large number of attemptIds and containers per app.
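
To make the multiplier question concrete, here is a minimal sketch of a hashCode over the fields mentioned above. The class name, field names, and the value of LARGE_PRIME are assumptions made for illustration; this is not the code from the attached patch. It shows that int overflow in Java wraps silently, so equal IDs still hash equally whichever multiplier is used.

```java
// Hypothetical stand-in for an ID with the fields mentioned above;
// this is not the actual ApplicationAttemptIdPBImpl code.
final class AttemptIdSketch {
    // Large prime chosen only for illustration; not the value from the patch.
    static final int LARGE_PRIME = 1000003;
    // Multiplier used by Eclipse-generated hashCode() methods.
    static final int ECLIPSE_DEFAULT = 31;

    private final long clusterTimestamp; // milliseconds
    private final int appId;
    private final int attemptId;

    AttemptIdSketch(long clusterTimestamp, int appId, int attemptId) {
        this.clusterTimestamp = clusterTimestamp;
        this.appId = appId;
        this.attemptId = attemptId;
    }

    // Folds the fields together with the given multiplier. Java int
    // arithmetic wraps silently on overflow, so the result stays
    // deterministic even when intermediate products exceed the int range.
    int hashWith(int prime) {
        int result = 1;
        result = prime * result + (int) (clusterTimestamp ^ (clusterTimestamp >>> 32));
        result = prime * result + appId;
        result = prime * result + attemptId;
        return result;
    }

    @Override
    public int hashCode() {
        return hashWith(ECLIPSE_DEFAULT);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof AttemptIdSketch)) return false;
        AttemptIdSketch other = (AttemptIdSketch) obj;
        return clusterTimestamp == other.clusterTimestamp
            && appId == other.appId
            && attemptId == other.attemptId;
    }
}
```

Either multiplier satisfies the hashCode contract. The choice only affects how evenly the values spread across hash buckets, which is why the wrap-around is likely harmless.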
> Deadlock in NM with threads racing for ApplicationAttemptId
> -----------------------------------------------------------
>
> Key: MAPREDUCE-2954
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2954
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.0, 0.24.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Siddharth Seth
> Priority: Critical
> Fix For: 0.23.0, 0.24.0
>
> Attachments: MAPREDUCE-2954-20110909.txt, MR2954_1.patch
>
>
> Found this:
> {code}
> Java stack information for the threads listed above:
> ===================================================
> "Thread-45":
> at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.getApplicationId(ApplicationAttemptIdPBImpl.java:101)
> - waiting to lock <0xb6a43ba0> (a org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl)
> at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.compareTo(ApplicationAttemptIdPBImpl.java:144)
> - locked <0xb6a443a0> (a org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl)
> at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.compareTo(ApplicationAttemptIdPBImpl.java:31)
> at org.apache.hadoop.yarn.api.records.impl.pb.ContainerIdPBImpl.compareTo(ContainerIdPBImpl.java:215)
> at org.apache.hadoop.yarn.api.records.impl.pb.ContainerIdPBImpl.compareTo(ContainerIdPBImpl.java:34)
> at java.util.concurrent.ConcurrentSkipListMap.doGet(ConcurrentSkipListMap.java:797)
> at java.util.concurrent.ConcurrentSkipListMap.get(ConcurrentSkipListMap.java:1640)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:360)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:355)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:113)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
> at java.lang.Thread.run(Thread.java:619)
> "Thread-30":
> at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.getApplicationId(ApplicationAttemptIdPBImpl.java:101)
> - waiting to lock <0xb6a443a0> (a org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl)
> at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.compareTo(ApplicationAttemptIdPBImpl.java:144)
> - locked <0xb6a43ba0> (a org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl)
> at org.apache.hadoop.yarn.api.records.impl.pb.ApplicationAttemptIdPBImpl.compareTo(ApplicationAttemptIdPBImpl.java:31)
> at org.apache.hadoop.yarn.api.records.impl.pb.ContainerIdPBImpl.compareTo(ContainerIdPBImpl.java:215)
> at org.apache.hadoop.yarn.api.records.impl.pb.ContainerIdPBImpl.compareTo(ContainerIdPBImpl.java:34)
> at java.util.concurrent.ConcurrentSkipListMap.doRemove(ConcurrentSkipListMap.java:1078)
> at java.util.concurrent.ConcurrentSkipListMap.remove(ConcurrentSkipListMap.java:1673)
> at java.util.concurrent.ConcurrentSkipListMap$Iter.remove(ConcurrentSkipListMap.java:2256)
> at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.getNodeStatus(NodeStatusUpdaterImpl.java:223)
> at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.access$300(NodeStatusUpdaterImpl.java:62)
> at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:262)
> Found 1 deadlock.
> {code}
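
The trace above reads as a lock-ordering cycle: each thread holds the monitor of one ApplicationAttemptIdPBImpl (taken in compareTo) while waiting for the monitor of the other (needed by getApplicationId). One way to avoid this kind of cycle, sketched below, is to make the ID an immutable value whose compareTo takes no locks at all. The class names and fields here are hypothetical, and this is not necessarily the approach taken in the attached patch.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical immutable ID: all fields are final, so compareTo can read
// both operands without synchronizing on either of them.
final class ImmutableAttemptId implements Comparable<ImmutableAttemptId> {
    private final long clusterTimestamp;
    private final int appId;
    private final int attemptId;

    ImmutableAttemptId(long clusterTimestamp, int appId, int attemptId) {
        this.clusterTimestamp = clusterTimestamp;
        this.appId = appId;
        this.attemptId = attemptId;
    }

    @Override
    public int compareTo(ImmutableAttemptId other) {
        int c = Long.compare(clusterTimestamp, other.clusterTimestamp);
        if (c != 0) return c;
        c = Integer.compare(appId, other.appId);
        if (c != 0) return c;
        return Integer.compare(attemptId, other.attemptId);
    }
}

final class LockFreeCompareDemo {
    // Mirrors the access pattern in the trace: one thread calls get() while
    // another removes entries from the same ConcurrentSkipListMap.
    // Returns the number of entries left once both threads have finished.
    static int run() {
        final int count = 1000;
        final ConcurrentSkipListMap<ImmutableAttemptId, String> map =
            new ConcurrentSkipListMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new ImmutableAttemptId(1L, 1, i), "container-" + i);
        }
        Thread reader = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                map.get(new ImmutableAttemptId(1L, 1, i));
            }
        });
        Thread remover = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                map.remove(new ImmutableAttemptId(1L, 1, i));
            }
        });
        reader.start();
        remover.start();
        try {
            reader.join();
            remover.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return map.size();
    }
}
```

Because neither compareTo nor any accessor it relies on acquires a monitor, two threads comparing the same pair of IDs in opposite order cannot block each other.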