[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790907#action_12790907 ]
Todd Lipcon commented on MAPREDUCE-1213:
----------------------------------------

Patch looks mostly good to me. One comment: I think the name "moveAndDeleteLocalFiles" isn't quite descriptive enough. Perhaps "asyncDeletePathOnEachVolume" or something? The fact that the path is relative to *all* volumes and will be deleted on each of them is the key part I didn't understand on the first pass through the JIRA.

I agree with Vinod that it would be nice to share code with CleanupQueue, but I would be fine seeing that in another JIRA.

> TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1213
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: dhruba borthakur
>            Assignee: Zheng Shao
>         Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch, MAPREDUCE-1213.3.patch
>
>
> We are seeing that when we restart a tasktracker, it tries to recursively
> delete all the files in the distributed cache. It invokes
> FileUtil.fullyDelete(), which is very slow. This means that the
> TaskTracker cannot join the cluster for an extended period of time (up to 2
> hours for us). The problem is acute if the number of files in the distributed
> cache is a few thousand.
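
For readers following the naming discussion, here is a minimal sketch of what a method called "asyncDeletePathOnEachVolume" might do: take one relative path, resolve it against every local volume, move it aside with a cheap same-volume rename, and leave the slow recursive delete to a background thread. This is not the code from the attached patches. The class name AsyncVolumeCleaner, the "toBeDeleted" directory name, and the use of java.nio.file and ExecutorService are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical helper, for illustration only; not taken from the patch. */
class AsyncVolumeCleaner {
    private final ExecutorService executor;

    AsyncVolumeCleaner(ExecutorService executor) {
        this.executor = executor;
    }

    /**
     * For each volume root, renames volume/relativePath into a trash
     * directory on the same volume, then schedules a background delete.
     * Returns one Future per volume on which the path existed.
     */
    List<Future<?>> asyncDeletePathOnEachVolume(List<Path> volumes, String relativePath)
            throws IOException {
        List<Future<?>> pending = new ArrayList<>();
        for (Path volume : volumes) {
            Path target = volume.resolve(relativePath);
            if (!Files.exists(target)) {
                continue; // nothing to delete on this volume
            }
            Path trash = volume.resolve("toBeDeleted"); // assumed directory name
            Files.createDirectories(trash);
            // Unique name so repeated calls never collide inside the trash directory.
            Path doomed = trash.resolve(UUID.randomUUID().toString());
            // A rename within one volume is fast; the caller is not blocked by the delete.
            Files.move(target, doomed);
            pending.add(executor.submit(() -> {
                deleteRecursively(doomed);
                return null;
            }));
        }
        return pending;
    }

    /** Deletes a directory tree bottom-up: files first, then each emptied directory. */
    static void deleteRecursively(Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

In this sketch the original path is free for reuse as soon as the method returns, because the rename has already happened; only the contents of the trash directory remain to be cleaned up in the background.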