[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated MAPREDUCE-1213:
---
Hadoop Flags: [Incompatible change, Reviewed] (was: [Reviewed])
Release Note: Directories specified in mapred.local.dir that cannot be created now cause the TaskTracker to fail to start.

Marking as an incompatible change, as mapred.local.dirs that do not exist are no longer tolerated with this change. You'll get the following error and the TT will not start:

ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.io.IOException: Cannot create toBeDeleted in /doesNotExist

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
Fix For: 0.21.0
Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch, MAPREDUCE-1213.3.patch, MAPREDUCE-1213.4.patch, MAPREDUCE-1213.branch-0.20.2.patch, MAPREDUCE-1213.branch-0.20.patch

We are seeing that when we restart a TaskTracker, it tries to recursively delete all the files in the distributed cache. It invokes FileUtil.fullyDelete(), which is very slow. This means that the TaskTracker cannot join the cluster for an extended period of time (up to 2 hours for us). The problem is acute if the distributed cache contains a few thousand files.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
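The startup behavior described in the release note can be sketched roughly as follows. This is a minimal illustration and not the code from the patch: the class name LocalDirCheckSketch and the method ensureTrashDirs are hypothetical, and only the directory name toBeDeleted and the wording of the IOException message are taken from the error quoted above.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch of a fail-fast check over the configured local dirs. */
final class LocalDirCheckSketch {
    // Directory name as it appears in the error message of the release note.
    static final String TRASH = "toBeDeleted";

    /**
     * Tries to create the trash subdirectory in every local dir and
     * fails on the first one that cannot be created.
     */
    static void ensureTrashDirs(List<String> localDirs) throws IOException {
        for (String dir : localDirs) {
            File trash = new File(dir, TRASH);
            // mkdirs() returns false if the directory could not be created.
            if (!trash.isDirectory() && !trash.mkdirs()) {
                throw new IOException("Cannot create " + TRASH + " in " + dir);
            }
        }
    }
}
```

A caller that previously skipped unusable directories would now see the exception and abort startup, which is why the change is flagged as incompatible.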
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1213:
--
Attachment: MAPREDUCE-1213.branch-0.20.patch

Patch for 0.20.

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
Fix For: 0.22.0
Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch, MAPREDUCE-1213.3.patch, MAPREDUCE-1213.4.patch, MAPREDUCE-1213.branch-0.20.patch
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated MAPREDUCE-1213:
Resolution: Fixed
Fix Version/s: 0.22.0
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

I just committed this. Thanks Zheng.

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
Fix For: 0.22.0
Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch, MAPREDUCE-1213.3.patch, MAPREDUCE-1213.4.patch
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1213:
--
Attachment: MAPREDUCE-1213.4.patch

Changed function name to moveAndDeleteFromEachVolume. AsyncDelete may have a different meaning - users might still see the files when the function returns. This code actually moves the file first.

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch, MAPREDUCE-1213.3.patch, MAPREDUCE-1213.4.patch
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1213:
--
Status: Open (was: Patch Available)

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch, MAPREDUCE-1213.3.patch, MAPREDUCE-1213.4.patch
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1213:
--
Status: Patch Available (was: Open)

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch, MAPREDUCE-1213.3.patch, MAPREDUCE-1213.4.patch
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1213:
--
Attachment: MAPREDUCE-1213.3.patch

This one uses the newly-committed AsyncDiskService from common.

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch, MAPREDUCE-1213.3.patch
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1213:
--
Status: Open (was: Patch Available)

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch, MAPREDUCE-1213.3.patch
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1213:
--
Attachment: MAPREDUCE-1213.3.patch

This one uses the AsyncDiskService from common.

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch, MAPREDUCE-1213.3.patch
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1213:
--
Attachment: (was: MAPREDUCE-1213.3.patch)

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch, MAPREDUCE-1213.3.patch
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1213:
--
Attachment: (was: MAPREDUCE-1213.3.patch)

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1213:
--
Attachment: (was: MAPREDUCE-1213.3.patch)

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1213:
--
Attachment: MAPREDUCE-1213.3.patch

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch, MAPREDUCE-1213.3.patch
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1213:
--
Attachment: MAPREDUCE-1213.1.patch

This patch fixes the problem by moving the file first and removing it later asynchronously using a thread pool per volume.

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
Attachments: MAPREDUCE-1213.1.patch
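The move-first, delete-later idea in the comment above can be sketched as follows. This is an illustration under stated assumptions and not the code attached to the issue: the class MoveThenDeleteSketch and its methods are hypothetical, it uses java.nio rather than Hadoop's own file utilities, and it uses a single worker thread per volume where the patch speaks of a thread pool per volume. The directory name toBeDeleted is borrowed from the error message in the issue's release note.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: rename into a trash dir now, delete in the background. */
final class MoveThenDeleteSketch {
    static final String TRASH = "toBeDeleted";

    // One executor per volume, so a slow disk does not hold up the others.
    private final Map<Path, ExecutorService> pools = new HashMap<>();
    private final AtomicLong counter = new AtomicLong();

    MoveThenDeleteSketch(List<Path> volumes) throws IOException {
        for (Path volume : volumes) {
            Files.createDirectories(volume.resolve(TRASH));
            pools.put(volume, Executors.newSingleThreadExecutor());
        }
    }

    /** Moves volume/relative into the trash dir and schedules its deletion. */
    Future<?> moveAndDelete(Path volume, String relative) throws IOException {
        ExecutorService pool = pools.get(volume);
        if (pool == null) {
            throw new IllegalArgumentException("Unknown volume: " + volume);
        }
        Path source = volume.resolve(relative);
        // A unique prefix avoids name clashes inside the trash dir.
        Path target = volume.resolve(TRASH)
                .resolve(counter.incrementAndGet() + "_" + source.getFileName());
        // Rename within the same volume; the caller no longer sees the path.
        Files.move(source, target);
        // The expensive recursive delete runs off the caller's thread.
        return pool.submit(() -> {
            deleteRecursively(target);
            return null;
        });
    }

    static void deleteRecursively(Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc;
                }
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    void shutdown() {
        pools.values().forEach(ExecutorService::shutdown);
    }
}
```

Because the rename returns quickly, a restarting process could call moveAndDelete for the old cache directory and continue startup while the files are removed in the background.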
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1213:
--
Status: Patch Available (was: Open)

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
Attachments: MAPREDUCE-1213.1.patch
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1213:
--
Attachment: MAPREDUCE-1213.2.patch

Fixed the comments and reorganized the class.

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1213:
--
Summary: TaskTrackers restart is very slow because it deletes distributed cache directory synchronously (was: TaskTrackers restart is very slow because ti deletes distributed cache directory synchronously)

TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
--
Key: MAPREDUCE-1213
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao