[ https://issues.apache.org/jira/browse/MAPREDUCE-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164825#comment-16164825 ]
Hadoop QA commented on MAPREDUCE-6957:
--------------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 20m 27s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 15m 48s | trunk passed |
| +1 | compile | 0m 30s | trunk passed |
| +1 | checkstyle | 0m 20s | trunk passed |
| +1 | mvnsite | 0m 32s | trunk passed |
| +1 | findbugs | 0m 55s | trunk passed |
| +1 | javadoc | 0m 23s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 26s | the patch passed |
| +1 | compile | 0m 26s | the patch passed |
| +1 | javac | 0m 26s | the patch passed |
| +1 | checkstyle | 0m 18s | the patch passed |
| +1 | mvnsite | 0m 29s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 5s | the patch passed |
| +1 | javadoc | 0m 21s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 3m 0s | hadoop-mapreduce-client-core in the patch passed. |
| +1 | asflicense | 0m 14s | The patch does not generate ASF License warnings. |
| | | 45m 57s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:71bbb86 |
| JIRA Issue | MAPREDUCE-6957 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12886881/MAPREDUCE-6957.003.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 3b92538772e6 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / fa6cc43 |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7132/testReport/ |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7132/console |
| Powered by | Apache Yetus 0.5.0 http://yetus.apache.org |

This message was automatically generated.

> shuffle hangs after a node manager connection timeout
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-6957
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6957
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Jooseong Kim
>            Assignee: Jooseong Kim
>         Attachments: MAPREDUCE-6957.001.patch, MAPREDUCE-6957.002.patch,
>                      MAPREDUCE-6957.003.patch
>
>
> After a connection failure from the reducer to the node manager, shuffles
> started to hang with the following message:
> {code}
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 - MergeManager returned status WAIT ...
> {code}
> There are two problems that lead to the hang.
>
> Problem 1.
> When a reducer has trouble connecting to the node manager, copyFromHost may
> call putBackKnownMapOutput on the same task attempt multiple times.
> Since MAPREDUCE-6303, there are two places on the copyFromHost path that
> call putBackKnownMapOutput:
> 1. In the finally block of copyFromHost.
> 2. In the catch block of openShuffleUrl.
> When openShuffleUrl fails to connect, it handles the exception in its catch
> block and returns null to copyFromHost.
> By the time openShuffleUrl returns null, putBackKnownMapOutput has already
> been called for all remaining map outputs.
> However, the finally block of copyFromHost then calls putBackKnownMapOutput
> a second time on the same map outputs.
>
> Problem 2. Problem 1 causes a leak in MergeManager.
> The problem occurs when multiple fetchers are given the same set of map
> attempt outputs to fetch, which the duplicate put-backs from Problem 1 make
> possible.
> Different fetchers reserve memory from MergeManager in Fetcher.copyMapOutput
> for the same map outputs.
> When the fetch succeeds, only the first map output gets committed through
> ShuffleSchedulerImpl.copySucceeded -> InMemoryMapOutput.commit, because
> commit() is gated by !finishedMaps[mapIndex]. The memory reserved by the
> other fetchers for the same map output is never committed, so it stays
> counted as used.
> This may lead to a condition where usedMemory > memoryLimit while
> commitMemory < mergeThreshold.
> This puts the MergeManager into a deadlock: a merge is never triggered, and
> MergeManager cannot reserve additional space for map outputs.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
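To make the double put-back in Problem 1 concrete, here is a minimal Java sketch under stated assumptions. It is a toy model, not Hadoop code. The method names copyFromHost, openShuffleUrl and putBackKnownMapOutput mirror the issue text, but their bodies are hypothetical. So are the DoublePutBackSketch class, the list-backed known-map queue, the made-up attempt IDs, and the simulated ConnectException. The sketch shows that putting outputs back in both the catch block and the finally block queues each map attempt twice. It also shows one hypothetical guard that avoids this: clearing `remaining` after the catch-block put-back. Whether the attached patches take this approach is not stated in the issue.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model of the double put-back in Problem 1. Method names mirror the
 * issue text; the bodies are simplified assumptions, NOT Hadoop's Fetcher.
 */
public class DoublePutBackSketch {

  // Hypothetical stand-in for the scheduler's queue of known map outputs.
  // It is a List here, so putting the same attempt back twice duplicates it.
  static final List<String> knownMaps = new ArrayList<>();

  static void putBackKnownMapOutput(String mapId) {
    knownMaps.add(mapId);
  }

  // Always fails, as if the node manager connection timed out.
  static Object openShuffleUrl(Set<String> remaining, boolean clearAfterPutBack) {
    try {
      throw new ConnectException("simulated connection timeout");
    } catch (IOException ie) {
      // Put-back #1: every remaining map output goes back to the queue.
      for (String left : remaining) {
        putBackKnownMapOutput(left);
      }
      if (clearAfterPutBack) {
        remaining.clear(); // hypothetical guard: nothing left for the finally block
      }
    }
    return null;
  }

  static void copyFromHost(Set<String> remaining, boolean clearAfterPutBack) {
    try {
      Object input = openShuffleUrl(remaining, clearAfterPutBack);
      if (input == null) {
        return; // early return, but the finally block below still runs
      }
      // Reading the map outputs is omitted from this sketch.
    } finally {
      // Put-back #2: whatever is still in `remaining` is queued again.
      for (String left : remaining) {
        putBackKnownMapOutput(left);
      }
    }
  }

  /** Returns the known-map queue after one failed copyFromHost call. */
  static List<String> simulate(boolean clearAfterPutBack) {
    knownMaps.clear();
    Set<String> remaining = new LinkedHashSet<>(
        Arrays.asList("map_attempt_0", "map_attempt_1"));
    copyFromHost(remaining, clearAfterPutBack);
    return new ArrayList<>(knownMaps);
  }

  public static void main(String[] args) {
    System.out.println("without guard: " + simulate(false));
    System.out.println("with guard:    " + simulate(true));
  }
}
```

Without the guard, the queue ends up as [map_attempt_0, map_attempt_1, map_attempt_0, map_attempt_1]. In this toy model, each duplicate entry could then be handed to a different fetcher.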
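The memory accounting behind Problem 2 can be sketched with a similar toy model. Only the four quantities named in the issue come from the source: usedMemory, memoryLimit, commitMemory and mergeThreshold. Everything else is an assumption for illustration, not MergeManagerImpl's real code. That includes the MergeLeakSketch class, the 100/90/40-byte sizes, the rule that a reservation is refused only once usedMemory already exceeds memoryLimit, and the simplified merge. Suppose map A is fetched twice. The second reservation is never committed, so after map B the model sits at usedMemory = 120 > 100 while commitMemory = 80 < 90. The reservation for C is then refused, and nothing is left to trigger a merge.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Toy model of the MergeManager accounting in Problem 2. Field names follow
 * the issue text; sizes, the reserve rule, and the merge rule are assumptions.
 * This is NOT Hadoop's MergeManagerImpl.
 */
public class MergeLeakSketch {
  final long memoryLimit;
  final long mergeThreshold;
  long usedMemory = 0;   // bytes currently reserved by fetchers
  long commitMemory = 0; // bytes of committed in-memory map outputs
  final Set<String> finishedMaps = new HashSet<>();

  MergeLeakSketch(long memoryLimit, long mergeThreshold) {
    this.memoryLimit = memoryLimit;
    this.mergeThreshold = mergeThreshold;
  }

  /** Assumed rule: refuse (the fetcher must WAIT) only once usedMemory exceeds the limit. */
  boolean reserve(long size) {
    if (usedMemory > memoryLimit) {
      return false;
    }
    usedMemory += size; // a single reservation may overshoot memoryLimit
    return true;
  }

  /** Commits the output unless this map is already finished (the gate from the issue). */
  void copySucceeded(String mapId, long size) {
    if (!finishedMaps.add(mapId)) {
      return; // duplicate: reservation is neither committed nor released (the leak)
    }
    commitMemory += size;
    if (commitMemory >= mergeThreshold) {
      // Toy merge: committed outputs leave memory and their reservations are freed.
      usedMemory -= commitMemory;
      commitMemory = 0;
    }
  }

  /** Fetches maps A and B (40 bytes each); if duplicateA, two fetchers both fetch A. */
  static MergeLeakSketch afterAandB(boolean duplicateA) {
    MergeLeakSketch mm = new MergeLeakSketch(100, 90);
    mm.reserve(40);              // fetcher #1 reserves space for A
    if (duplicateA) {
      mm.reserve(40);            // fetcher #2 reserves space for the same A
    }
    mm.copySucceeded("A", 40);
    if (duplicateA) {
      mm.copySucceeded("A", 40); // ignored: A is already finished
    }
    mm.reserve(40);
    mm.copySucceeded("B", 40);
    return mm;
  }

  public static void main(String[] args) {
    for (boolean dup : new boolean[] {false, true}) {
      MergeLeakSketch mm = afterAandB(dup);
      long used = mm.usedMemory;
      long committed = mm.commitMemory;
      boolean reservedC = mm.reserve(40);
      System.out.printf("duplicate=%b usedMemory=%d commitMemory=%d reservedC=%b%n",
          dup, used, committed, reservedC);
    }
  }
}
```

In the healthy run, C's reservation succeeds, and committing C would push commitMemory past mergeThreshold and trigger the toy merge. In the leaky run, no further output can be reserved or committed, so the merge never happens.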