[ https://issues.apache.org/jira/browse/MAPREDUCE-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe reassigned MAPREDUCE-6957:
-------------------------------------
    Assignee: Jooseong Kim

Thanks for the report and the patch!

bq. When the fetch succeeds, only the first map output gets committed through ShuffleSchedulerImpl.copySucceeded -> InMemoryMapOutput.commit, because commit() is gated by !finishedMaps[mapIndex].

This looks like another latent bug. If, for whatever reason, we try to report a completed fetch for a map that has already finished fetching, we should call output.abort() so the memory is unreserved. Even with the redundant fetching caused by the double put-back of known map outputs, that unreserve fix would have prevented the merge manager hang.

Would you mind updating the patch to address the missing unreserve? The rest of the patch looks good to me.

> shuffle hangs after a node manager connection timeout
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-6957
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6957
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Jooseong Kim
>            Assignee: Jooseong Kim
>     Attachments: MAPREDUCE-6957.001.patch
>
> After a connection failure from the reducer to the node manager, shuffles
> started to hang with the following message:
> {code}
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 - MergeManager
> returned status WAIT ...
> {code}
> Two problems lead to the hang.
> Problem 1.
> When a reducer has trouble connecting to the node manager, copyFromHost may
> call putBackKnownMapOutput on the same task attempt multiple times.
> Since MAPREDUCE-6303, there are two call sites of putBackKnownMapOutput in
> copyFromHost:
> 1. In the finally block of copyFromHost.
> 2. In the catch block of openShuffleUrl.
> When openShuffleUrl fails to connect, the catch block in copyFromHost runs
> and openShuffleUrl returns null.
> By the time openShuffleUrl returns null, putBackKnownMapOutput has already
> been called for all remaining map outputs.
> However, the finally block calls putBackKnownMapOutput one more time on the
> same map outputs.
> Problem 2. Problem 1 causes a leak in MergeManager.
> The problem occurs when multiple fetchers get the same set of map attempt
> outputs to fetch.
> Different fetchers reserve memory from MergeManager in Fetcher.copyMapOutput
> for the same map outputs.
> When the fetch succeeds, only the first map output gets committed through
> ShuffleSchedulerImpl.copySucceeded -> InMemoryMapOutput.commit, because
> commit() is gated by !finishedMaps[mapIndex].
> This can lead to a state where usedMemory > memoryLimit while
> commitMemory < mergeThreshold.
> This deadlocks the MergeManager: a merge is never triggered, yet the
> MergeManager cannot reserve additional space for map outputs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
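To make the accounting failure concrete, here is a toy Java model of the behavior described above. This is an illustrative sketch only, not Hadoop code: the names (reserve, commit, abort, finishedMaps, usedMemory, commitMemory) mirror the real MergeManagerImpl/ShuffleSchedulerImpl members discussed in the issue, but the logic is simplified and the numeric limits are made up. It shows how a duplicate fetch success leaks usedMemory when the duplicate is neither committed nor aborted, and how the suggested abort-based unreserve restores the balance.

```java
// Toy model of the MergeManager accounting in MAPREDUCE-6957.
// Assumed names and numbers; simplified from the real Hadoop classes.
public class MergeAccountingSketch {
    static final long MEMORY_LIMIT = 100;    // assumed in-memory shuffle limit
    static final long MERGE_THRESHOLD = 66;  // assumed merge trigger

    long usedMemory = 0;     // incremented by reserve(), decremented by abort
    long commitMemory = 0;   // incremented by commit()
    boolean[] finishedMaps = new boolean[1];

    // Models MergeManagerImpl.reserve(): false => the fetcher gets status WAIT.
    boolean reserve(long size) {
        if (usedMemory + size > MEMORY_LIMIT) {
            return false; // WAIT
        }
        usedMemory += size;
        return true;
    }

    // copySucceeded() BEFORE the fix: a duplicate success for an already
    // finished map neither commits nor unreserves, leaking usedMemory.
    void copySucceededBuggy(int mapIndex, long size) {
        if (!finishedMaps[mapIndex]) {
            finishedMaps[mapIndex] = true;
            commitMemory += size; // InMemoryMapOutput.commit()
        }
        // duplicate path: nothing happens -> the reservation is never released
    }

    // copySucceeded() WITH the suggested fix: the duplicate is aborted,
    // which unreserves its memory.
    void copySucceededFixed(int mapIndex, long size) {
        if (!finishedMaps[mapIndex]) {
            finishedMaps[mapIndex] = true;
            commitMemory += size;
        } else {
            usedMemory -= size; // output.abort() -> unreserve
        }
    }

    public static void main(String[] args) {
        long size = 40;

        // Problem 1 handed the same map output to two fetchers; both reserved.
        MergeAccountingSketch buggy = new MergeAccountingSketch();
        buggy.reserve(size);
        buggy.reserve(size);
        buggy.copySucceededBuggy(0, size);
        buggy.copySucceededBuggy(0, size);
        // usedMemory = 80, so a further reserve(40) returns WAIT, yet
        // commitMemory (40) < MERGE_THRESHOLD (66): no merge ever fires.
        // That is the Problem 2 deadlock.
        System.out.println("buggy: used=" + buggy.usedMemory
            + " committed=" + buggy.commitMemory
            + " nextReserve=" + buggy.reserve(size));

        MergeAccountingSketch fixed = new MergeAccountingSketch();
        fixed.reserve(size);
        fixed.reserve(size);
        fixed.copySucceededFixed(0, size);
        fixed.copySucceededFixed(0, size); // duplicate aborted, memory freed
        System.out.println("fixed: used=" + fixed.usedMemory
            + " committed=" + fixed.commitMemory
            + " nextReserve=" + fixed.reserve(size));
    }
}
```

Note how the fix never touches the commit path: committed outputs keep their memory until a merge, while only the redundant reservation is released, so usedMemory again reflects memory that can actually be merged or freed.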