[ https://issues.apache.org/jira/browse/MAPREDUCE-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Mitic resolved MAPREDUCE-5512.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 1.3.0
                   1-win

Fix committed to branch-1 and branch-1-win.

> TaskTracker hung after failed reconnect to the JobTracker
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-5512
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5512
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 1.3.0
>            Reporter: Ivan Mitic
>            Assignee: Ivan Mitic
>             Fix For: 1-win, 1.3.0
>
>         Attachments: hadoop-tasktracker-RD00155DD09100.log,
> MAPREDUCE-5512.branch-1.patch, tt_Hung.txt
>
>
> TaskTracker hung after failed reconnect to the JobTracker.
> This is the problematic piece of code:
> {code}
>     this.distributedCacheManager = new TrackerDistributedCacheManager(
>         this.fConf, taskController);
>     this.distributedCacheManager.startCleanupThread();
>
>     this.jobClient = (InterTrackerProtocol)
>     UserGroupInformation.getLoginUser().doAs(
>         new PrivilegedExceptionAction<Object>() {
>       public Object run() throws IOException {
>         return RPC.waitForProxy(InterTrackerProtocol.class,
>             InterTrackerProtocol.versionID,
>             jobTrackAddr, fConf);
>       }
>     });
> {code}
> In case RPC.waitForProxy() throws, TrackerDistributedCacheManager cleanup
> thread will never be stopped, and given that it is a non daemon thread it
> will keep TT up forever.
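
The failure mode described in the issue can be reproduced and avoided in a
few lines of plain Java. The sketch below is illustrative only and is not the
committed patch. CleanupManager, connect() and initialize() are hypothetical
stand-ins for TrackerDistributedCacheManager, RPC.waitForProxy() and the
TaskTracker initialization code. It assumes one possible remedy: stop the
already started cleanup thread whenever the connection attempt throws.

```java
import java.io.IOException;

public class ReconnectSketch {

    /** Hypothetical stand-in for TrackerDistributedCacheManager. */
    static class CleanupManager {
        private final Thread cleanupThread = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(50); // pretend to do periodic cleanup work
                }
            } catch (InterruptedException e) {
                // interrupted by stopCleanupThread(): leave the loop and exit
            }
        }, "cleanup");

        void startCleanupThread() {
            // A non-daemon thread keeps the JVM alive until it terminates.
            cleanupThread.setDaemon(false);
            cleanupThread.start();
        }

        void stopCleanupThread() {
            cleanupThread.interrupt();
            try {
                cleanupThread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
        }

        boolean isRunning() {
            return cleanupThread.isAlive();
        }
    }

    /** Hypothetical stand-in for RPC.waitForProxy(); always fails here. */
    static Object connect() throws IOException {
        throw new IOException("simulated: JobTracker unreachable");
    }

    /** Starts the cleanup thread, then connects; undoes the start on failure. */
    static Object initialize(CleanupManager manager) throws IOException {
        manager.startCleanupThread();
        boolean connected = false;
        try {
            Object proxy = connect();
            connected = true;
            return proxy;
        } finally {
            if (!connected) {
                // Without this call the non-daemon thread would outlive
                // the failed initialization and keep the process up.
                manager.stopCleanupThread();
            }
        }
    }

    public static void main(String[] args) {
        CleanupManager manager = new CleanupManager();
        try {
            initialize(manager);
        } catch (IOException e) {
            System.out.println("connect failed: " + e.getMessage());
        }
        System.out.println("cleanup thread alive: " + manager.isRunning());
    }
}
```

Removing the stopCleanupThread() call from the finally block reproduces the
hang in miniature: main() returns, but the JVM does not exit because the
cleanup thread is still running.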