[jira] [Updated] (MAPREDUCE-6840) Distcp to support cutoff time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-6840:
----------------------------------
    Attachment: MAPREDUCE-6840.1.patch

> Distcp to support cutoff time
> -----------------------------
>
>                 Key: MAPREDUCE-6840
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6840
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp
>    Affects Versions: 2.6.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>         Attachments: MAPREDUCE-6840.1.patch
>
> To ensure consistency in the datasets on HDFS, some projects, like file
> formats on Hive, do HDFS operations in a particular order. For example, if a
> file format uses an index file, a new version of the index file will only be
> written to HDFS after all files mentioned by the index are written to HDFS.
> When we do distcp, it's important to preserve that consistency so that we
> don't break those file formats.
> A typical solution is to create an HDFS Snapshot beforehand and only distcp
> the Snapshot. That works well if the user has superuser privilege to make
> the directory snapshottable.
> If not, it would be beneficial to have a cutoff time for distcp, so that
> distcp only copies files modified on or before that cutoff time.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-6840) Distcp to support cutoff time
Zheng Shao created MAPREDUCE-6840:
-------------------------------------

             Summary: Distcp to support cutoff time
                 Key: MAPREDUCE-6840
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6840
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: distcp
    Affects Versions: 2.6.0
            Reporter: Zheng Shao
            Assignee: Zheng Shao
            Priority: Minor

To ensure consistency in the datasets on HDFS, some projects, like file formats on Hive, do HDFS operations in a particular order. For example, if a file format uses an index file, a new version of the index file will only be written to HDFS after all files mentioned by the index are written to HDFS. When we do distcp, it's important to preserve that consistency so that we don't break those file formats.

A typical solution is to create an HDFS Snapshot beforehand and only distcp the Snapshot. That works well if the user has superuser privilege to make the directory snapshottable.

If not, it would be beneficial to have a cutoff time for distcp, so that distcp only copies files modified on or before that cutoff time.
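The proposed cutoff behavior amounts to a filter over the copy list: keep only files whose modification time is on or before the cutoff, so a newer index file is never copied without the data files it references. A minimal sketch; the `FileEntry` type and method names are illustrative stand-ins, not the actual DistCp API:

```java
import java.util.ArrayList;
import java.util.List;

public class CutoffFilter {
    // Simplified stand-in for a source-file listing entry (hypothetical, not Hadoop's FileStatus).
    static final class FileEntry {
        final String path;
        final long modTime; // epoch millis, as a FileStatus modification time would be
        FileEntry(String path, long modTime) { this.path = path; this.modTime = modTime; }
    }

    // Keep entries modified on or before the cutoff; newer files are skipped,
    // so an index written after the cutoff is never copied ahead of its data.
    static List<FileEntry> applyCutoff(List<FileEntry> entries, long cutoffMillis) {
        List<FileEntry> kept = new ArrayList<>();
        for (FileEntry e : entries) {
            if (e.modTime <= cutoffMillis) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<FileEntry> in = new ArrayList<>();
        in.add(new FileEntry("/data/part-0", 1000L));
        in.add(new FileEntry("/data/part-1", 2000L));
        in.add(new FileEntry("/data/index", 3000L)); // written after the cutoff
        List<FileEntry> out = applyCutoff(in, 2000L);
        if (out.size() != 2) throw new AssertionError("expected 2 entries, got " + out.size());
        System.out.println("kept=" + out.size()); // prints kept=2
    }
}
```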
[jira] [Commented] (MAPREDUCE-6009) Map-only job with new-api runs wrong OutputCommitter when cleanup scheduled in a reduce slot
[ https://issues.apache.org/jira/browse/MAPREDUCE-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638392#comment-14638392 ]

Zheng Shao commented on MAPREDUCE-6009:
---------------------------------------

We just hit this bug in an unpatched version of MR1. The situation is that HCatalog submits a map-only job and expects OutputCommitter.commitJob to create a Hive partition. Because of this bug, the Hive partition was never created. Our sanity check on the Hive table plus the workflow retry mechanism allowed this bug to run in production for a long time (wasting compute resources). It's great that this is fixed.

> Map-only job with new-api runs wrong OutputCommitter when cleanup scheduled
> in a reduce slot
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6009
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6009
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client, job submission
>    Affects Versions: 1.2.1
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>            Priority: Blocker
>             Fix For: 1.3.0, 1.2.2
>         Attachments: MAPREDUCE-6009.v01-branch-1.2.patch,
>                      MAPREDUCE-6009.v02-branch-1.2.patch
>
> In branch-1, job commit is executed in a JOB_CLEANUP task that may run in
> either a map or a reduce slot.
> In org.apache.hadoop.mapreduce.Job#setUseNewAPI there is logic setting the
> new-api flag only for reduce-ful jobs:
> {code}
> if (numReduces != 0) {
>   conf.setBooleanIfUnset("mapred.reducer.new-api",
>                          conf.get(oldReduceClass) == null);
>   ...
> {code}
> Therefore, when cleanup runs in a reduce slot, ReduceTask inits using the old
> API and runs the incorrect default OutputCommitter, instead of consulting
> the OutputFormat.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
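The quoted guard can be modeled in isolation to show why map-only jobs hit the bug: when `numReduces == 0` the reducer-side new-api flag is simply never set, so a cleanup attempt scheduled in a reduce slot initializes with the old-API default. This is a toy model of that conditional, not the real Job/JobConf classes:

```java
import java.util.HashMap;
import java.util.Map;

public class NewApiFlagModel {
    // Toy model of the branch-1 setUseNewAPI logic quoted in the issue:
    // the reducer new-api flag is set only when the job has reducers.
    static Map<String, Boolean> setUseNewAPI(int numReduces, boolean newApiReducerConfigured) {
        Map<String, Boolean> conf = new HashMap<>();
        conf.put("mapred.mapper.new-api", true); // mapper flag always set in this model
        if (numReduces != 0) {                   // the guard from MAPREDUCE-6009
            conf.put("mapred.reducer.new-api", newApiReducerConfigured);
        }
        return conf;
    }

    public static void main(String[] args) {
        // Map-only job: the reducer flag is absent, so a JOB_CLEANUP attempt
        // running in a reduce slot falls back to the old API and the wrong
        // default OutputCommitter.
        Map<String, Boolean> mapOnly = setUseNewAPI(0, true);
        if (mapOnly.containsKey("mapred.reducer.new-api"))
            throw new AssertionError("map-only job should leave the flag unset in this model");

        Map<String, Boolean> withReduces = setUseNewAPI(5, true);
        if (!withReduces.get("mapred.reducer.new-api"))
            throw new AssertionError("reduce-ful job sets the flag");
    }
}
```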
[jira] [Assigned] (MAPREDUCE-1144) JT should not hold lock while writing user history logs to DFS
[ https://issues.apache.org/jira/browse/MAPREDUCE-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao reassigned MAPREDUCE-1144:
-------------------------------------

    Assignee: (was: Zheng Shao)

> JT should not hold lock while writing user history logs to DFS
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-1144
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1144
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.1
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-1144-branch-1.2.patch
>
> I've seen behavior a few times now where the DFS is being slow for one reason
> or another, and the JT essentially locks up waiting on it while one thread
> tries for a long time to write history files out. The stack trace blocking
> everything is:
> Thread 210 (IPC Server handler 10 on 7277):
>   State: WAITING
>   Blocked count: 171424
>   Waited count: 1209604
>   Waiting on java.util.LinkedList@407dd154
>   Stack:
>     java.lang.Object.wait(Native Method)
>     java.lang.Object.wait(Object.java:485)
>     org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3122)
>     org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3202)
>     org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3151)
>     org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:67)
>     org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
>     sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:301)
>     sun.nio.cs.StreamEncoder.close(StreamEncoder.java:130)
>     java.io.OutputStreamWriter.close(OutputStreamWriter.java:216)
>     java.io.BufferedWriter.close(BufferedWriter.java:248)
>     java.io.PrintWriter.close(PrintWriter.java:295)
>     org.apache.hadoop.mapred.JobHistory$JobInfo.logFinished(JobHistory.java:1349)
>     org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:2167)
>     org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:2111)
>     org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:873)
>     org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:3598)
>     org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:2792)
>     org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2581)
>     sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
> We should try not to do external IO while holding the JT lock, and instead
> write the data to an in-memory buffer, drop the lock, and then write.
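The buffer-then-write pattern proposed in the last line of the issue can be sketched as follows: do only the cheap in-memory serialization while holding the lock, and perform the slow DFS write after releasing it. All names here are illustrative, not the actual JobTracker code:

```java
public class BufferThenWrite {
    private final Object jtLock = new Object();       // stands in for the JT monitor
    final StringBuilder sink = new StringBuilder();   // stands in for the DFS output stream

    void logFinished(String jobId, long finishTime) {
        String record;
        synchronized (jtLock) {
            // Only cheap, in-memory work happens while the lock is held.
            record = "JobFinished id=" + jobId + " time=" + finishTime + "\n";
        }
        // Potentially slow external I/O happens outside the lock, so a slow
        // DFS can no longer block heartbeats waiting on the JT lock.
        writeToDfs(record);
    }

    void writeToDfs(String record) {
        sink.append(record);
    }

    public static void main(String[] args) {
        BufferThenWrite jt = new BufferThenWrite();
        jt.logFinished("job_200911_0001", 42L);
        if (!jt.sink.toString().contains("job_200911_0001"))
            throw new AssertionError("record not written");
        System.out.println(jt.sink.toString().trim());
    }
}
```

The key design point is that the lock scope ends before any blocking call; the record is immutable once built, so no shared state is touched during the write.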
[jira] [Assigned] (MAPREDUCE-1144) JT should not hold lock while writing user history logs to DFS
[ https://issues.apache.org/jira/browse/MAPREDUCE-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao reassigned MAPREDUCE-1144:
-------------------------------------

    Assignee: Zheng Shao

> JT should not hold lock while writing user history logs to DFS
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-1144
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1144
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.1
>            Reporter: Todd Lipcon
>            Assignee: Zheng Shao
>         Attachments: MAPREDUCE-1144-branch-1.2.patch
[jira] Commented: (MAPREDUCE-1382) MRAsyncDiscService should tolerate missing local.dir
[ https://issues.apache.org/jira/browse/MAPREDUCE-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907441#action_12907441 ]

Zheng Shao commented on MAPREDUCE-1382:
---------------------------------------

I believe other logic in TaskTracker/JobTracker will fail and report in that case.

> MRAsyncDiscService should tolerate missing local.dir
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-1382
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1382
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Scott Chen
>            Assignee: Zheng Shao
>         Attachments: MAPREDUCE-1382.1.patch, MAPREDUCE-1382.2.patch,
>                      MAPREDUCE-1382.3.patch,
>                      MAPREDUCE-1382.branch-0.20.on.top.of.MAPREDUCE-1302.1.patch
>
> Currently, when some of the local.dir do not exist, MRAsyncDiscService will
> fail. It should only fail when all directories don't work.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1382) MRAsyncDiscService should tolerate missing local.dir
[ https://issues.apache.org/jira/browse/MAPREDUCE-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1382:
----------------------------------
    Attachment: MAPREDUCE-1382.3.patch

Fixed unit test.

> MRAsyncDiscService should tolerate missing local.dir
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-1382
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1382
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Scott Chen
>            Assignee: Zheng Shao
>         Attachments: MAPREDUCE-1382.1.patch, MAPREDUCE-1382.2.patch,
>                      MAPREDUCE-1382.3.patch,
>                      MAPREDUCE-1382.branch-0.20.on.top.of.MAPREDUCE-1302.1.patch
>
> Currently, when some of the local.dir do not exist, MRAsyncDiscService will
> fail. It should only fail when all directories don't work.
[jira] Updated: (MAPREDUCE-1382) MRAsyncDiscService should tolerate missing local.dir
[ https://issues.apache.org/jira/browse/MAPREDUCE-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1382:
----------------------------------
    Status: Patch Available  (was: Open)

> MRAsyncDiscService should tolerate missing local.dir
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-1382
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1382
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Scott Chen
>            Assignee: Zheng Shao
>         Attachments: MAPREDUCE-1382.1.patch, MAPREDUCE-1382.2.patch,
>                      MAPREDUCE-1382.3.patch,
>                      MAPREDUCE-1382.branch-0.20.on.top.of.MAPREDUCE-1302.1.patch
>
> Currently, when some of the local.dir do not exist, MRAsyncDiscService will
> fail. It should only fail when all directories don't work.
[jira] Updated: (MAPREDUCE-1382) MRAsyncDiscService should tolerate missing local.dir
[ https://issues.apache.org/jira/browse/MAPREDUCE-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1382:
----------------------------------
    Attachment: MAPREDUCE-1382.2.patch

Todd, you are right. LocalFileSystem will throw IOE in that case. This patch addresses Todd's concern.

> MRAsyncDiscService should tolerate missing local.dir
> ----------------------------------------------------
>
>                 Key: MAPREDUCE-1382
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1382
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Scott Chen
>            Assignee: Zheng Shao
>         Attachments: MAPREDUCE-1382.1.patch, MAPREDUCE-1382.2.patch,
>                      MAPREDUCE-1382.branch-0.20.on.top.of.MAPREDUCE-1302.1.patch
>
> Currently, when some of the local.dir do not exist, MRAsyncDiscService will
> fail. It should only fail when all directories don't work.
[jira] Updated: (MAPREDUCE-1887) MRAsyncDiskService does not properly absolutize volume root paths
[ https://issues.apache.org/jira/browse/MAPREDUCE-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1887:
----------------------------------
          Status: Resolved  (was: Patch Available)
    Hadoop Flags: [Reviewed]
    Release Note: MAPREDUCE-1887. MRAsyncDiskService now properly absolutizes volume root paths. (Aaron Kimball via zshao)
   Fix Version/s: 0.22.0
      Resolution: Fixed

Committed revision 957772. Thanks Aaron!

> MRAsyncDiskService does not properly absolutize volume root paths
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-1887
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1887
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>             Fix For: 0.22.0
>         Attachments: MAPREDUCE-1887.2.patch, MAPREDUCE-1887.3.patch,
>                      MAPREDUCE-1887.patch
>
> In MRAsyncDiskService, volume names are sometimes specified as relative
> paths, which are not converted to absolute paths. This can cause errors of
> the form "cannot delete <path> since it is outside of <volume root>" even
> though the actual path is inside the root.
[jira] Commented: (MAPREDUCE-1887) MRAsyncDiskService does not properly absolutize volume root paths
[ https://issues.apache.org/jira/browse/MAPREDUCE-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882372#action_12882372 ]

Zheng Shao commented on MAPREDUCE-1887:
---------------------------------------

Can you take a look at the failed contrib tests?

> MRAsyncDiskService does not properly absolutize volume root paths
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-1887
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1887
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1887.2.patch, MAPREDUCE-1887.3.patch,
>                      MAPREDUCE-1887.patch
[jira] Commented: (MAPREDUCE-1887) MRAsyncDiskService does not properly absolutize volume root paths
[ https://issues.apache.org/jira/browse/MAPREDUCE-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881875#action_12881875 ]

Zheng Shao commented on MAPREDUCE-1887:
---------------------------------------

Aaron, can you take a look at the unit test failures?

> MRAsyncDiskService does not properly absolutize volume root paths
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-1887
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1887
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1887.2.patch, MAPREDUCE-1887.3.patch,
>                      MAPREDUCE-1887.patch
[jira] Updated: (MAPREDUCE-1887) MRAsyncDiskService does not properly absolutize volume root paths
[ https://issues.apache.org/jira/browse/MAPREDUCE-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1887:
----------------------------------
    Status: Open  (was: Patch Available)

> MRAsyncDiskService does not properly absolutize volume root paths
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-1887
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1887
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1887.2.patch, MAPREDUCE-1887.patch
[jira] Commented: (MAPREDUCE-1887) MRAsyncDiskService does not properly absolutize volume root paths
[ https://issues.apache.org/jira/browse/MAPREDUCE-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881397#action_12881397 ]

Zheng Shao commented on MAPREDUCE-1887:
---------------------------------------

Code looks good. Can we change
{code}
+   * @param nonCanonicalVols The roots of the file system volumes, which may not
+   *          be canonical paths.
{code}
to
{code}
+   * @param nonCanonicalVols The roots of the file system volumes, which can be
+   *          absolute paths from root or relative path from cwd.
{code}
? I think the second one is easier to understand.

> MRAsyncDiskService does not properly absolutize volume root paths
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-1887
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1887
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-1887.2.patch, MAPREDUCE-1887.patch
[jira] Updated: (MAPREDUCE-1568) TrackerDistributedCacheManager should clean up cache in a background thread
[ https://issues.apache.org/jira/browse/MAPREDUCE-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1568:
----------------------------------
          Status: Resolved  (was: Patch Available)
    Hadoop Flags: [Reviewed]
    Release Note: MAPREDUCE-1568. TrackerDistributedCacheManager should clean up cache in a background thread. (Scott Chen via zshao)
      Resolution: Fixed

Committed. Thanks Scott!

> TrackerDistributedCacheManager should clean up cache in a background thread
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1568
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1568
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>         Attachments: MAPREDUCE-1568-v2.1.txt, MAPREDUCE-1568-v2.txt,
>                      MAPREDUCE-1568-v3.1.txt, MAPREDUCE-1568-v3.txt,
>                      MAPREDUCE-1568.txt
>
> Right now the TrackerDistributedCacheManager does the cleanup with the
> following code path:
> {code}
> TaskRunner.run() ->
>   TrackerDistributedCacheManager.setup() ->
>   TrackerDistributedCacheManager.getLocalCache() ->
>   TrackerDistributedCacheManager.deleteCache()
> {code}
> The deletion of the cache files can take a long time and it should not be
> done by a task. We suggest a separate thread that checks and cleans up the
> cache files.
[jira] Commented: (MAPREDUCE-1568) TrackerDistributedCacheManager should do deleteLocalPath asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860091#action_12860091 ]

Zheng Shao commented on MAPREDUCE-1568:
---------------------------------------

+1

> TrackerDistributedCacheManager should do deleteLocalPath asynchronously
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1568
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1568
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>         Attachments: MAPREDUCE-1568.txt
>
> TrackerDistributedCacheManager.deleteCache() has been improved:
> MAPREDUCE-1302 makes TrackerDistributedCacheManager rename the caches in the
> main thread and then delete them in the background.
> MAPREDUCE-1098 avoids global locking while doing the renaming (renaming lots
> of directories can also take a long time).
> But deleteLocalCache is still in the main thread of TaskRunner.run(). So it
> will still slow down the task which triggers the deletion (originally this
> would block all tasks, but that was fixed by MAPREDUCE-1098). Other tasks do
> not wait for the deletion; the task which triggers the deletion should not
> wait for it either. TrackerDistributedCacheManager should do
> deleteLocalPath() asynchronously.
[jira] Updated: (MAPREDUCE-1649) Compressed files with TextInputFormat does not work with CombineFileInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1649:
----------------------------------
    Attachment: MAPREDUCE-1649.1.branch-0.20.patch

A simple fix (MAPREDUCE-1649.1.branch-0.20.patch) is to ignore the splits that start with non-0 offset in {{TextInputFormat}} when the file is non-splittable.

> Compressed files with TextInputFormat does not work with
> CombineFileInputFormat
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-1649
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1649
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>         Attachments: MAPREDUCE-1649.1.branch-0.20.patch
>
> {{CombineFileInputFormat}} creates splits based on blocks, regardless of
> whether the underlying {{FileInputFormat}} is splittable or not.
> This means that we can have 2 or more splits for a compressed text file with
> {{TextInputFormat}}. For each of these splits,
> {{TextInputFormat.getRecordReader}} will return a {{RecordReader}} for the
> whole compressed file, thus causing duplicate input data.
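The fix described above amounts to a predicate over split chunks: for a non-splittable file, only the chunk starting at offset 0 yields records, while chunks at non-zero offsets are treated as empty so the file is not read twice. A simplified stand-in for the CombineFileSplit machinery, not the actual Hadoop classes:

```java
import java.util.ArrayList;
import java.util.List;

public class NonSplittableFilter {
    // Hypothetical stand-in for one (file, offset) chunk of a combined split.
    static final class Chunk {
        final String file;
        final long offset;
        final boolean splittable; // e.g. false for a gzipped text file
        Chunk(String file, long offset, boolean splittable) {
            this.file = file; this.offset = offset; this.splittable = splittable;
        }
    }

    // A chunk yields records unless it belongs to a non-splittable file and
    // starts past offset 0; skipping those avoids duplicating the whole file.
    static boolean yieldsRecords(Chunk c) {
        return c.splittable || c.offset == 0;
    }

    static List<Chunk> effectiveChunks(List<Chunk> chunks) {
        List<Chunk> out = new ArrayList<>();
        for (Chunk c : chunks) {
            if (yieldsRecords(c)) out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Chunk> chunks = new ArrayList<>();
        chunks.add(new Chunk("a.gz", 0, false));
        chunks.add(new Chunk("a.gz", 67108864, false)); // second block of the same gzip file
        chunks.add(new Chunk("b.txt", 67108864, true)); // plain text stays splittable
        if (effectiveChunks(chunks).size() != 2) throw new AssertionError();
    }
}
```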
[jira] Created: (MAPREDUCE-1649) Compressed files with TextInputFormat does not work with CombineFileInputFormat
Compressed files with TextInputFormat does not work with CombineFileInputFormat
-------------------------------------------------------------------------------

                 Key: MAPREDUCE-1649
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1649
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 0.20.2
            Reporter: Zheng Shao
            Assignee: Zheng Shao

{{CombineFileInputFormat}} creates splits based on blocks, regardless of whether the underlying {{FileInputFormat}} is splittable or not.

This means that we can have 2 or more splits for a compressed text file with {{TextInputFormat}}. For each of these splits, {{TextInputFormat.getRecordReader}} will return a {{RecordReader}} for the whole compressed file, thus causing duplicate input data.
[jira] Resolved: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao resolved MAPREDUCE-1501.
-----------------------------------
    Resolution: Fixed

Will open a new one to address this issue.

> FileInputFormat to support multi-level/recursive directory listing
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1501
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1501
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>         Attachments: MAPREDUCE-1501.1.branch-0.20.patch,
>                      MAPREDUCE-1501.1.trunk.patch
>
> As we have seen multiple times in the mailing list, users want to have the
> capability of getting all files out of a multi-level directory structure.
> 4/1/2008:
> http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/%3ce75c02ef0804011433x144813e6x2450da7883de3...@mail.gmail.com%3e
> 2/3/2009:
> http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200902.mbox/%3c7f80089c-3e7f-4330-90ba-6f1c5b0b0...@nist.gov%3e
> 6/2/2009:
> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906.mbox/%3c4a258a16.8050...@darose.net%3e
> One solution that our users had is to write a new FileInputFormat, but that
> means all existing FileInputFormat subclasses would need to be changed in
> order to support this feature.
> We can easily provide a JobConf option (which defaults to false) to
> {{FileInputFormat.listStatus(...)}} to recursively go into the directory
> structure.
[jira] Created: (MAPREDUCE-1577) FileInputFormat in the new mapreduce package to support multi-level/recursive directory listing
FileInputFormat in the new mapreduce package to support multi-level/recursive directory listing
-----------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-1577
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1577
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Zheng Shao
            Assignee: Zheng Shao

See MAPREDUCE-1501 for details.
[jira] Reopened: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao reopened MAPREDUCE-1501:
-----------------------------------

Reopened for Chris's comments.

> FileInputFormat to support multi-level/recursive directory listing
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1501
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1501
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>         Attachments: MAPREDUCE-1501.1.branch-0.20.patch,
>                      MAPREDUCE-1501.1.trunk.patch
[jira] Updated: (MAPREDUCE-1423) Improve performance of CombineFileInputFormat when multiple pools are configured
[ https://issues.apache.org/jira/browse/MAPREDUCE-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1423:
----------------------------------
            Tags: combinefileinputformat
      Resolution: Fixed
   Fix Version/s: 0.22.0
    Release Note: MAPREDUCE-1423. Improve performance of CombineFileInputFormat when multiple pools are configured. (Dhruba Borthakur via zshao)
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Committed. Thanks Dhruba!

> Improve performance of CombineFileInputFormat when multiple pools are
> configured
> ---------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1423
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1423
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.22.0
>         Attachments: CombineFileInputFormatPerformance.txt,
>                      CombineFileInputFormatPerformance.txt
>
> I have a map-reduce job that is using CombineFileInputFormat. It has
> configured 1 pools and 3 files. Creating the splits takes more than an
> hour. The reason is that CombineFileInputFormat.getSplits() converts the
> same path from String to Path object multiple times, once for each instance
> of a pool. Similarly, it calls Path.toUri() multiple times. This code can be
> optimized.
[jira] Commented: (MAPREDUCE-1538) TrackerDistributedCacheManager can fail because the number of subdirectories reaches system limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839131#action_12839131 ]

Zheng Shao commented on MAPREDUCE-1538:
---------------------------------------

+1

> TrackerDistributedCacheManager can fail because the number of subdirectories
> reaches system limit
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1538
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1538
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>         Attachments: MAPREDUCE-1538.patch
>
> TrackerDistributedCacheManager deletes the cached files when the size goes
> up to a configured number.
> But there is no such limit for the number of subdirectories. Therefore the
> number of subdirectories may grow large and exceed the system limit.
> This can prevent the TT from creating directories in getLocalCache and fail
> the tasks.
[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
[ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838023#action_12838023 ]

Zheng Shao commented on MAPREDUCE-1221:
---------------------------------------

Arun, do the explanations from Scott and Matei make sense to you? If it looks good to you, I would like to commit it.

> Kill tasks on a node if the free physical memory on that machine falls below
> a configured threshold
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tasktracker
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Scott Chen
>             Fix For: 0.22.0
>         Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch,
>                      MAPREDUCE-1221-v3.patch
>
> The TaskTracker currently supports killing tasks if the virtual memory of a
> task exceeds a set of configured thresholds. I would like to extend this
> feature to enable killing tasks if the physical memory used by that task
> exceeds a certain threshold.
> On a certain operating system (guess?), if user-space processes start using
> lots of memory, the machine hangs and dies quickly. This means that we would
> like to prevent map-reduce jobs from triggering this condition. From my
> understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) was
> designed to address this problem. This works well when most map-reduce jobs
> are Java jobs and have well-defined -Xmx parameters that specify the max
> virtual memory for each task. On the other hand, if each task forks off
> mappers/reducers written in other languages (python/php, etc), the total
> virtual memory usage of the process-subtree varies greatly. In these cases,
> it is better to use kill-tasks-using-physical-memory-limits.
[jira] Commented: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836963#action_12836963 ] Zheng Shao commented on MAPREDUCE-1501: --- Thanks Dhruba. I missed the part "and other hidden directories". We do call PathFilter on the sub directories as well (see addInputPathRecursively(...)). Is that good enough, or do we want to split the PathFilters for files and the PathFilters for directories? > FileInputFormat to support multi-level/recursive directory listing > -- > > Key: MAPREDUCE-1501 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1501 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: MAPREDUCE-1501.1.branch-0.20.patch, > MAPREDUCE-1501.1.trunk.patch > > > As we have seen multiple times in the mailing list, users want to have the > capability of getting all files out of a multi-level directory structure. > 4/1/2008: > http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/%3ce75c02ef0804011433x144813e6x2450da7883de3...@mail.gmail.com%3e > 2/3/2009: > http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200902.mbox/%3c7f80089c-3e7f-4330-90ba-6f1c5b0b0...@nist.gov%3e > 6/2/2009: > http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906.mbox/%3c4a258a16.8050...@darose.net%3e > One solution that our users had is to write a new FileInputFormat, but that > means all existing FileInputFormat subclasses need to be changed in order to > support this feature. > We can easily provide a JobConf option (which defaults to false) to > {{FileInputFormat.listStatus(...)}} to recursively go into directory > structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
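[Editor's sketch] The behavior described in the comment above, recursing into subdirectories while applying the same filter to directories and files, can be illustrated with plain java.io.File. The Filter interface and HIDDEN_FILTER below are illustrative stand-ins for Hadoop's PathFilter and its default hidden-file filter, not the actual addInputPathRecursively(...) code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of recursive listing with one filter applied to both
// directories and files, as discussed in MAPREDUCE-1501.
public class RecursiveLister {

  interface Filter { boolean accept(File f); }

  // Mirrors the convention of skipping entries whose names start with "." or "_".
  static final Filter HIDDEN_FILTER = f ->
      !f.getName().startsWith(".") && !f.getName().startsWith("_");

  /** Returns all files under dir, recursing into accepted subdirectories. */
  static List<File> listRecursively(File dir, Filter filter) {
    List<File> out = new ArrayList<>();
    collect(dir, filter, out);
    return out;
  }

  private static void collect(File dir, Filter filter, List<File> out) {
    File[] entries = dir.listFiles();
    if (entries == null) return;          // not a directory, or unreadable
    for (File e : entries) {
      if (!filter.accept(e)) continue;    // filter applied to dirs AND files
      if (e.isDirectory()) {
        collect(e, filter, out);
      } else {
        out.add(e);
      }
    }
  }

  public static void main(String[] args) {
    File root = new File(args.length > 0 ? args[0] : ".");
    for (File f : listRecursively(root, HIDDEN_FILTER)) {
      System.out.println(f);
    }
  }
}
```

Because the filter sees directories too, a rejected directory (for example `_logs`) prunes its whole subtree, which answers the question of whether separate filters are needed: a single filter already governs both levels.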
[jira] Commented: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836936#action_12836936 ] Zheng Shao commented on MAPREDUCE-1501: --- Thanks for the feedback Ian. I don't think FileSystem.listPath() returns "." or "..". If it does, I believe the current code in trunk will also break. The new unit test will also fail if that's the case. > FileInputFormat to support multi-level/recursive directory listing > -- > > Key: MAPREDUCE-1501 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1501 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: MAPREDUCE-1501.1.branch-0.20.patch, > MAPREDUCE-1501.1.trunk.patch > > > As we have seen multiple times in the mailing list, users want to have the > capability of getting all files out of a multi-level directory structure. > 4/1/2008: > http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/%3ce75c02ef0804011433x144813e6x2450da7883de3...@mail.gmail.com%3e > 2/3/2009: > http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200902.mbox/%3c7f80089c-3e7f-4330-90ba-6f1c5b0b0...@nist.gov%3e > 6/2/2009: > http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906.mbox/%3c4a258a16.8050...@darose.net%3e > One solution that our users had is to write a new FileInputFormat, but that > means all existing FileInputFormat subclasses need to be changed in order to > support this feature. > We can easily provide a JobConf option (which defaults to false) to > {{FileInputFormat.listStatus(...)}} to recursively go into directory > structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1501: -- Status: Patch Available (was: Open) There are 2 test failures but I don't think they are related. Resubmitting patch to get it tested again. > FileInputFormat to support multi-level/recursive directory listing > -- > > Key: MAPREDUCE-1501 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1501 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: MAPREDUCE-1501.1.branch-0.20.patch, > MAPREDUCE-1501.1.trunk.patch > > > As we have seen multiple times in the mailing list, users want to have the > capability of getting all files out of a multi-level directory structure. > 4/1/2008: > http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/%3ce75c02ef0804011433x144813e6x2450da7883de3...@mail.gmail.com%3e > 2/3/2009: > http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200902.mbox/%3c7f80089c-3e7f-4330-90ba-6f1c5b0b0...@nist.gov%3e > 6/2/2009: > http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906.mbox/%3c4a258a16.8050...@darose.net%3e > One solution that our users had is to write a new FileInputFormat, but that > means all existing FileInputFormat subclasses need to be changed in order to > support this feature. > We can easily provide a JobConf option (which defaults to false) to > {{FileInputFormat.listStatus(...)}} to recursively go into directory > structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1501: -- Status: Open (was: Patch Available) > FileInputFormat to support multi-level/recursive directory listing > -- > > Key: MAPREDUCE-1501 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1501 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: MAPREDUCE-1501.1.branch-0.20.patch, > MAPREDUCE-1501.1.trunk.patch > > > As we have seen multiple times in the mailing list, users want to have the > capability of getting all files out of a multi-level directory structure. > 4/1/2008: > http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/%3ce75c02ef0804011433x144813e6x2450da7883de3...@mail.gmail.com%3e > 2/3/2009: > http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200902.mbox/%3c7f80089c-3e7f-4330-90ba-6f1c5b0b0...@nist.gov%3e > 6/2/2009: > http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906.mbox/%3c4a258a16.8050...@darose.net%3e > One solution that our users had is to write a new FileInputFormat, but that > means all existing FileInputFormat subclasses need to be changed in order to > support this feature. > We can easily provide a JobConf option (which defaults to false) to > {{FileInputFormat.listStatus(...)}} to recursively go into directory > structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1504) SequenceFile.Reader constructor leaking resources
[ https://issues.apache.org/jira/browse/MAPREDUCE-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao resolved MAPREDUCE-1504. --- Resolution: Fixed Fixed in HADOOP-5476 > SequenceFile.Reader constructor leaking resources > - > > Key: MAPREDUCE-1504 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1504 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Zheng Shao > > When {{SequenceFile.Reader}} constructor throws an {{IOException}} (because > the file does not conform to {{SequenceFile}} format), we will have such a > problem. > The caller won't have a pointer to the reader because of the {{IOException}} > thrown. > We should call {{in.close()}} inside the constructor to make sure that we > don't leak resources (file descriptor and connection to the data node, etc). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
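[Editor's sketch] The fix pattern behind this issue, closing the underlying stream inside the constructor when initialization fails, since the caller never receives a reference it could clean up, can be sketched as follows. SafeReader is a hypothetical stand-in for SequenceFile.Reader, not the HADOOP-5476 code.

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch of the "close on constructor failure" pattern from MAPREDUCE-1504.
public class SafeReader implements Closeable {
  private final Closeable in;

  public SafeReader(Closeable in, boolean headerIsValid) throws IOException {
    boolean ok = false;
    try {
      if (!headerIsValid) {
        // e.g. the file does not conform to the expected format
        throw new IOException("file does not conform to the expected format");
      }
      this.in = in;
      ok = true;
    } finally {
      if (!ok) {
        // The caller never gets a reference to this object, so the
        // constructor itself must release the file descriptor.
        in.close();
      }
    }
  }

  @Override
  public void close() throws IOException {
    in.close();
  }

  public static void main(String[] args) {
    boolean[] closed = {false};
    Closeable stream = () -> closed[0] = true;
    try {
      new SafeReader(stream, false);
    } catch (IOException e) {
      System.out.println("constructor threw; stream closed = " + closed[0]);
    }
  }
}
```

Without the finally block, a malformed header would leave the file descriptor (and any datanode connection behind it) open until garbage collection, which is exactly the leak the issue describes.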
[jira] Updated: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1501: -- Attachment: MAPREDUCE-1501.1.trunk.patch > FileInputFormat to support multi-level/recursive directory listing > -- > > Key: MAPREDUCE-1501 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1501 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: MAPREDUCE-1501.1.branch-0.20.patch, > MAPREDUCE-1501.1.trunk.patch > > > As we have seen multiple times in the mailing list, users want to have the > capability of getting all files out of a multi-level directory structure. > 4/1/2008: > http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/%3ce75c02ef0804011433x144813e6x2450da7883de3...@mail.gmail.com%3e > 2/3/2009: > http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200902.mbox/%3c7f80089c-3e7f-4330-90ba-6f1c5b0b0...@nist.gov%3e > 6/2/2009: > http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906.mbox/%3c4a258a16.8050...@darose.net%3e > One solution that our users had is to write a new FileInputFormat, but that > means all existing FileInputFormat subclasses need to be changed in order to > support this feature. > We can easily provide a JobConf option (which defaults to false) to > {{FileInputFormat.listStatus(...)}} to recursively go into directory > structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1501: -- Status: Patch Available (was: Open) > FileInputFormat to support multi-level/recursive directory listing > -- > > Key: MAPREDUCE-1501 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1501 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: MAPREDUCE-1501.1.branch-0.20.patch, > MAPREDUCE-1501.1.trunk.patch > > > As we have seen multiple times in the mailing list, users want to have the > capability of getting all files out of a multi-level directory structure. > 4/1/2008: > http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/%3ce75c02ef0804011433x144813e6x2450da7883de3...@mail.gmail.com%3e > 2/3/2009: > http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200902.mbox/%3c7f80089c-3e7f-4330-90ba-6f1c5b0b0...@nist.gov%3e > 6/2/2009: > http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906.mbox/%3c4a258a16.8050...@darose.net%3e > One solution that our users had is to write a new FileInputFormat, but that > means all existing FileInputFormat subclasses need to be changed in order to > support this feature. > We can easily provide a JobConf option (which defaults to false) to > {{FileInputFormat.listStatus(...)}} to recursively go into directory > structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1501: -- Attachment: MAPREDUCE-1501.1.branch-0.20.patch > FileInputFormat to support multi-level/recursive directory listing > -- > > Key: MAPREDUCE-1501 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1501 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: MAPREDUCE-1501.1.branch-0.20.patch > > > As we have seen multiple times in the mailing list, users want to have the > capability of getting all files out of a multi-level directory structure. > 4/1/2008: > http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/%3ce75c02ef0804011433x144813e6x2450da7883de3...@mail.gmail.com%3e > 2/3/2009: > http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200902.mbox/%3c7f80089c-3e7f-4330-90ba-6f1c5b0b0...@nist.gov%3e > 6/2/2009: > http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906.mbox/%3c4a258a16.8050...@darose.net%3e > One solution that our users had is to write a new FileInputFormat, but that > means all existing FileInputFormat subclasses need to be changed in order to > support this feature. > We can easily provide a JobConf option (which defaults to false) to > {{FileInputFormat.listStatus(...)}} to recursively go into directory > structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao reassigned MAPREDUCE-1501: - Assignee: Zheng Shao > FileInputFormat to support multi-level/recursive directory listing > -- > > Key: MAPREDUCE-1501 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1501 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zheng Shao >Assignee: Zheng Shao > > As we have seen multiple times in the mailing list, users want to have the > capability of getting all files out of a multi-level directory structure. > 4/1/2008: > http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/%3ce75c02ef0804011433x144813e6x2450da7883de3...@mail.gmail.com%3e > 2/3/2009: > http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200902.mbox/%3c7f80089c-3e7f-4330-90ba-6f1c5b0b0...@nist.gov%3e > 6/2/2009: > http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906.mbox/%3c4a258a16.8050...@darose.net%3e > One solution that our users had is to write a new FileInputFormat, but that > means all existing FileInputFormat subclasses need to be changed in order to > support this feature. > We can easily provide a JobConf option (which defaults to false) to > {{FileInputFormat.listStatus(...)}} to recursively go into directory > structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
[ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1221: -- Status: Patch Available (was: Open) > Kill tasks on a node if the free physical memory on that machine falls below > a configured threshold > --- > > Key: MAPREDUCE-1221 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.22.0 >Reporter: dhruba borthakur >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, > MAPREDUCE-1221-v3.patch > > > The TaskTracker currently supports killing tasks if the virtual memory of a > task exceeds a set of configured thresholds. I would like to extend this > feature to enable killing tasks if the physical memory used by that task > exceeds a certain threshold. > On a certain operating system (guess?), if user space processes start using > lots of memory, the machine hangs and dies quickly. This means that we would > like to prevent map-reduce jobs from triggering this condition. From my > understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were > designed to address this problem. This works well when most map-reduce jobs > are Java jobs and have well-defined -Xmx parameters that specify the max > virtual memory for each task. On the other hand, if each task forks off > mappers/reducers written in other languages (python/php, etc), the total > virtual memory usage of the process-subtree varies greatly. In these cases, > it is better to use kill-tasks-using-physical-memory-limits. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
[ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835966#action_12835966 ] Zheng Shao commented on MAPREDUCE-1221: --- Scott, can you replace TAB with 2 spaces in your code? > Kill tasks on a node if the free physical memory on that machine falls below > a configured threshold > --- > > Key: MAPREDUCE-1221 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.22.0 >Reporter: dhruba borthakur >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch > > > The TaskTracker currently supports killing tasks if the virtual memory of a > task exceeds a set of configured thresholds. I would like to extend this > feature to enable killing tasks if the physical memory used by that task > exceeds a certain threshold. > On a certain operating system (guess?), if user space processes start using > lots of memory, the machine hangs and dies quickly. This means that we would > like to prevent map-reduce jobs from triggering this condition. From my > understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were > designed to address this problem. This works well when most map-reduce jobs > are Java jobs and have well-defined -Xmx parameters that specify the max > virtual memory for each task. On the other hand, if each task forks off > mappers/reducers written in other languages (python/php, etc), the total > virtual memory usage of the process-subtree varies greatly. In these cases, > it is better to use kill-tasks-using-physical-memory-limits. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold
[ https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1221: -- Status: Open (was: Patch Available) > Kill tasks on a node if the free physical memory on that machine falls below > a configured threshold > --- > > Key: MAPREDUCE-1221 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.22.0 >Reporter: dhruba borthakur >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch > > > The TaskTracker currently supports killing tasks if the virtual memory of a > task exceeds a set of configured thresholds. I would like to extend this > feature to enable killing tasks if the physical memory used by that task > exceeds a certain threshold. > On a certain operating system (guess?), if user space processes start using > lots of memory, the machine hangs and dies quickly. This means that we would > like to prevent map-reduce jobs from triggering this condition. From my > understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) were > designed to address this problem. This works well when most map-reduce jobs > are Java jobs and have well-defined -Xmx parameters that specify the max > virtual memory for each task. On the other hand, if each task forks off > mappers/reducers written in other languages (python/php, etc), the total > virtual memory usage of the process-subtree varies greatly. In these cases, > it is better to use kill-tasks-using-physical-memory-limits. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1504) SequenceFile.Reader constructor leaking resources
SequenceFile.Reader constructor leaking resources - Key: MAPREDUCE-1504 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1504 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Zheng Shao When {{SequenceFile.Reader}} constructor throws an {{IOException}} (because the file does not conform to {{SequenceFile}} format), we will have such a problem. The caller won't have a pointer to the reader because of the {{IOException}} thrown. We should call {{in.close()}} inside the constructor to make sure that we don't leak resources (file descriptor and connection to the data node, etc). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing
FileInputFormat to support multi-level/recursive directory listing -- Key: MAPREDUCE-1501 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1501 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Zheng Shao As we have seen multiple times in the mailing list, users want to have the capability of getting all files out of a multi-level directory structure. 4/1/2008: http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/%3ce75c02ef0804011433x144813e6x2450da7883de3...@mail.gmail.com%3e 2/3/2009: http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200902.mbox/%3c7f80089c-3e7f-4330-90ba-6f1c5b0b0...@nist.gov%3e 6/2/2009: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906.mbox/%3c4a258a16.8050...@darose.net%3e One solution that our users had is to write a new FileInputFormat, but that means all existing FileInputFormat subclasses need to be changed in order to support this feature. We can easily provide a JobConf option (which defaults to false) to {{FileInputFormat.listStatus(...)}} to recursively go into directory structure. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1374) Reduce memory footprint of FileSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802500#action_12802500 ] Zheng Shao commented on MAPREDUCE-1374: --- Right I think JT uses RawSplits. This issue is trying to fix the memory footprint of the JobClient. We call InputFormat.getSplits(job) which returns all splits in an array. This costs a lot of memory. I verified that this is still true for trunk. > Reduce memory footprint of FileSplit > > > Key: MAPREDUCE-1374 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.1, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Fix For: 0.21.0, 0.22.0 > > Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch, > MAPREDUCE-1374.3.patch > > > We can have many FileInput objects in the memory, depending on the number of > mappers. > It will save tons of memory on JobTracker and JobClient if we intern those > Strings for host names. > {code} > FileInputFormat.java: > for (NodeInfo host: hostList) { > // Strip out the port number from the host name > -retVal[index++] = host.node.getName().split(":")[0]; > +retVal[index++] = host.node.getName().split(":")[0].intern(); > if (index == replicationFactor) { > done = true; > break; > } > } > {code} > More on String.intern(): > http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html > It will also save a lot of memory by changing the class of {{file}} from > {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally > contains ~10 String fields. This will also be a huge saving. > {code} > private Path file; > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
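[Editor's sketch] The intern() change quoted in this issue is easy to demonstrate: each split(":") produces a fresh String object for the host name, while intern() makes equal host names share one canonical instance, so thousands of splits on the same node cost one String instead of thousands. hostOf is an illustrative helper, not the FileInputFormat code.

```java
// Demonstrates the String.intern() deduplication used by MAPREDUCE-1374.
public class InternDemo {

  /** Strips the port and interns the host name, as in the patch snippet. */
  static String hostOf(String nameWithPort) {
    return nameWithPort.split(":")[0].intern();
  }

  public static void main(String[] args) {
    String a = hostOf("datanode17:50010");
    String b = hostOf("datanode17:50075");
    // Equal host names from different splits are now the SAME object,
    // so only one copy is retained per distinct host.
    System.out.println(a == b); // prints "true"
  }
}
```

The same reasoning motivates storing the split's file as a String rather than a Path: a Path wraps a java.net.URI with many internal String fields, so one interned String per split is far cheaper.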
[jira] Updated: (MAPREDUCE-1382) MRAsyncDiscService should tolerate missing local.dir
[ https://issues.apache.org/jira/browse/MAPREDUCE-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1382: -- Attachment: MAPREDUCE-1382.branch-0.20.on.top.of.MAPREDUCE-1302.1.patch patch for 0.20 > MRAsyncDiscService should tolerate missing local.dir > > > Key: MAPREDUCE-1382 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1382 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Scott Chen >Assignee: Zheng Shao > Attachments: MAPREDUCE-1382.1.patch, > MAPREDUCE-1382.branch-0.20.on.top.of.MAPREDUCE-1302.1.patch > > > Currently when some of the local.dir do not exist, MRAsyncDiscService will > fail. It should only fail when all directories don't work. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1382) MRAsyncDiscService should tolerate missing local.dir
[ https://issues.apache.org/jira/browse/MAPREDUCE-1382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1382: -- Attachment: MAPREDUCE-1382.1.patch Added a test that tests deletion of non-existing files. > MRAsyncDiscService should tolerate missing local.dir > > > Key: MAPREDUCE-1382 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1382 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Scott Chen >Assignee: Zheng Shao > Attachments: MAPREDUCE-1382.1.patch > > > Currently when some of the local.dir do not exist, MRAsyncDiscService will > fail. It should only fail when all directories don't work. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1374) Reduce memory footprint of FileSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800969#action_12800969 ] Zheng Shao commented on MAPREDUCE-1374: --- Verified that the test error is not related: {code} java.lang.ClassNotFoundException: org.apache.hadoop.mapred.TestTTMemoryReporting at java.net.URLClassLoader$1.run(URLClassLoader.java:200) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:252) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:169) {code} > Reduce memory footprint of FileSplit > > > Key: MAPREDUCE-1374 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.1, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Fix For: 0.21.0, 0.22.0 > > Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch, > MAPREDUCE-1374.3.patch > > > We can have many FileInput objects in the memory, depending on the number of > mappers. > It will save tons of memory on JobTracker and JobClient if we intern those > Strings for host names. > {code} > FileInputFormat.java: > for (NodeInfo host: hostList) { > // Strip out the port number from the host name > -retVal[index++] = host.node.getName().split(":")[0]; > +retVal[index++] = host.node.getName().split(":")[0].intern(); > if (index == replicationFactor) { > done = true; > break; > } > } > {code} > More on String.intern(): > http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html > It will also save a lot of memory by changing the class of {{file}} from > {{Path}} to {{String}}. 
{{Path}} contains a {{java.net.URI}} which internally > contains ~10 String fields. This will also be a huge saving. > {code} > private Path file; > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1302: -- Attachment: MAPREDUCE-1302.branch-0.20.on.top.of.MAPREDUCE-1213.3.patch Fixed some unit test failures for hadoop-0.20. Note that this patch can only be applied to hadoop-0.20 after MAPREDUCE-1213 is applied. > TrackerDistributedCacheManager can delete file asynchronously > - > > Key: MAPREDUCE-1302 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch, > MAPREDUCE-1302.2.patch, MAPREDUCE-1302.3.patch, MAPREDUCE-1302.4.patch, > MAPREDUCE-1302.5.patch, > MAPREDUCE-1302.branch-0.20.on.top.of.MAPREDUCE-1213.2.patch, > MAPREDUCE-1302.branch-0.20.on.top.of.MAPREDUCE-1213.3.patch > > > With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to > delete files from distributed cache asynchronously. > That will help make task initialization faster, because task initialization > calls the code that localizes files into the cache and may delete some other > files. > The deletion can slow down the task initialization speed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
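[Editor's sketch] The AsyncDiskService idea referenced here, handing recursive deletions to a background thread pool so the caller (task initialization, or TaskTracker startup in MAPREDUCE-1213) returns immediately, can be sketched with java.util.concurrent alone. Class and method names are illustrative, not the actual MAPREDUCE-1213/1302 code.

```java
import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Minimal sketch of asynchronous recursive deletion, in the spirit of
// AsyncDiskService (MAPREDUCE-1213) as used by MAPREDUCE-1302.
public class AsyncDeleter {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);

  /** Schedules a recursive delete and returns immediately. */
  public void deleteAsync(File path) {
    pool.submit(() -> fullyDelete(path));
  }

  /** Recursive delete, analogous to FileUtil.fullyDelete(); false if anything remains. */
  static boolean fullyDelete(File f) {
    File[] children = f.listFiles();
    if (children != null) {
      for (File c : children) {
        fullyDelete(c);
      }
    }
    return f.delete();
  }

  /** Waits for pending deletions to finish (for orderly shutdown). */
  public void shutdown() {
    pool.shutdown();
    try {
      pool.awaitTermination(60, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) throws Exception {
    File dir = java.nio.file.Files.createTempDirectory("asyncdel").toFile();
    new File(dir, "data").createNewFile();
    AsyncDeleter d = new AsyncDeleter();
    d.deleteAsync(dir); // returns immediately; deletion proceeds in the background
    d.shutdown();       // block only at shutdown, not on each delete
    System.out.println("deleted = " + !dir.exists());
  }
}
```

The real service also has to rename directories out of the way before scheduling the delete, so a new localization cannot collide with a path that is still being removed; the sketch omits that step.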
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1213: -- Attachment: MAPREDUCE-1213.branch-0.20.2.patch Removed unnecessary changes for hadoop 0.20. > TaskTrackers restart is very slow because it deletes distributed cache > directory synchronously > -- > > Key: MAPREDUCE-1213 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.1 >Reporter: dhruba borthakur >Assignee: Zheng Shao > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch, > MAPREDUCE-1213.3.patch, MAPREDUCE-1213.4.patch, > MAPREDUCE-1213.branch-0.20.2.patch, MAPREDUCE-1213.branch-0.20.patch > > > We are seeing that when we restart a tasktracker, it tries to recursively > delete all the file in the distributed cache. It invoked > FileUtil.fullyDelete() which is very very slow. This means that the > TaskTracker cannot join the cluster for an extended period of time (upto 2 > hours for us). The problem is acute if the number of files in a distributed > cache is a few-thousands. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1374) Reduce memory footprint of FileSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1374: -- Status: Patch Available (was: Open) > Reduce memory footprint of FileSplit > > > Key: MAPREDUCE-1374 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.1, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Fix For: 0.21.0, 0.22.0 > > Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch, > MAPREDUCE-1374.3.patch > > > We can have many FileInput objects in the memory, depending on the number of > mappers. > It will save tons of memory on JobTracker and JobClient if we intern those > Strings for host names. > {code} > FileInputFormat.java: > for (NodeInfo host: hostList) { > // Strip out the port number from the host name > -retVal[index++] = host.node.getName().split(":")[0]; > +retVal[index++] = host.node.getName().split(":")[0].intern(); > if (index == replicationFactor) { > done = true; > break; > } > } > {code} > More on String.intern(): > http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html > It will also save a lot of memory by changing the class of {{file}} from > {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally > contains ~10 String fields. This will also be a huge saving. > {code} > private Path file; > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1374) Reduce memory footprint of FileSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1374: -- Status: Open (was: Patch Available) > Reduce memory footprint of FileSplit > > > Key: MAPREDUCE-1374 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.1, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Fix For: 0.21.0, 0.22.0 > > Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch, > MAPREDUCE-1374.3.patch > > > We can have many FileInput objects in the memory, depending on the number of > mappers. > It will save tons of memory on JobTracker and JobClient if we intern those > Strings for host names. > {code} > FileInputFormat.java: > for (NodeInfo host: hostList) { > // Strip out the port number from the host name > -retVal[index++] = host.node.getName().split(":")[0]; > +retVal[index++] = host.node.getName().split(":")[0].intern(); > if (index == replicationFactor) { > done = true; > break; > } > } > {code} > More on String.intern(): > http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html > It will also save a lot of memory by changing the class of {{file}} from > {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally > contains ~10 String fields. This will also be a huge saving. > {code} > private Path file; > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1374) Reduce memory footprint of FileSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1374: -- Status: Open (was: Patch Available) > Reduce memory footprint of FileSplit > > > Key: MAPREDUCE-1374 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.1, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Fix For: 0.21.0, 0.22.0 > > Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch, > MAPREDUCE-1374.3.patch > > > We can have many FileInput objects in the memory, depending on the number of > mappers. > It will save tons of memory on JobTracker and JobClient if we intern those > Strings for host names. > {code} > FileInputFormat.java: > for (NodeInfo host: hostList) { > // Strip out the port number from the host name > -retVal[index++] = host.node.getName().split(":")[0]; > +retVal[index++] = host.node.getName().split(":")[0].intern(); > if (index == replicationFactor) { > done = true; > break; > } > } > {code} > More on String.intern(): > http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html > It will also save a lot of memory by changing the class of {{file}} from > {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally > contains ~10 String fields. This will also be a huge saving. > {code} > private Path file; > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1374) Reduce memory footprint of FileSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1374: -- Status: Patch Available (was: Open) > Reduce memory footprint of FileSplit > > > Key: MAPREDUCE-1374 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.1, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Fix For: 0.21.0, 0.22.0 > > Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch, > MAPREDUCE-1374.3.patch > > > We can have many FileInput objects in the memory, depending on the number of > mappers. > It will save tons of memory on JobTracker and JobClient if we intern those > Strings for host names. > {code} > FileInputFormat.java: > for (NodeInfo host: hostList) { > // Strip out the port number from the host name > -retVal[index++] = host.node.getName().split(":")[0]; > +retVal[index++] = host.node.getName().split(":")[0].intern(); > if (index == replicationFactor) { > done = true; > break; > } > } > {code} > More on String.intern(): > http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html > It will also save a lot of memory by changing the class of {{file}} from > {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally > contains ~10 String fields. This will also be a huge saving. > {code} > private Path file; > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1333) Parallel running tasks on one single node may slow down the performance
[ https://issues.apache.org/jira/browse/MAPREDUCE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799895#action_12799895 ] Zheng Shao commented on MAPREDUCE-1333: --- How many CPU cores does each of the nodes have? > Parallel running tasks on one single node may slow down the performance > --- > > Key: MAPREDUCE-1333 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1333 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker, task, tasktracker >Affects Versions: 0.20.1 >Reporter: Zhaoning Zhang > > When I analyzed running-task performance, I found that tasks running in > parallel on a single node do not perform better than the same tasks run > serially. > We can set mapred.tasktracker.{map|reduce}.tasks.maximum = 1 individually, > but map AND reduce tasks will still run in parallel. > I wonder whether this holds true in real commercial clusters? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1374) Reduce memory footprint of FileSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1374: -- Attachment: MAPREDUCE-1374.3.patch Added comment before "Path getPath()" to address Todd's comment. > Reduce memory footprint of FileSplit > > > Key: MAPREDUCE-1374 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.1, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Fix For: 0.21.0, 0.22.0 > > Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch, > MAPREDUCE-1374.3.patch > > > We can have many FileInput objects in the memory, depending on the number of > mappers. > It will save tons of memory on JobTracker and JobClient if we intern those > Strings for host names. > {code} > FileInputFormat.java: > for (NodeInfo host: hostList) { > // Strip out the port number from the host name > -retVal[index++] = host.node.getName().split(":")[0]; > +retVal[index++] = host.node.getName().split(":")[0].intern(); > if (index == replicationFactor) { > done = true; > break; > } > } > {code} > More on String.intern(): > http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html > It will also save a lot of memory by changing the class of {{file}} from > {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally > contains ~10 String fields. This will also be a huge saving. > {code} > private Path file; > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1374) Reduce memory footprint of FileSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799890#action_12799890 ] Zheng Shao commented on MAPREDUCE-1374: --- Thanks Todd. Yes, I see the merit of adding a weak-reference map in the Path class. That would still consume several times more memory than a String, but it would help eliminate duplicate Path objects. > Reduce memory footprint of FileSplit > > > Key: MAPREDUCE-1374 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.1, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Fix For: 0.21.0, 0.22.0 > > Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch > > > We can have many FileInput objects in the memory, depending on the number of > mappers. > It will save tons of memory on JobTracker and JobClient if we intern those > Strings for host names. > {code} > FileInputFormat.java: > for (NodeInfo host: hostList) { > // Strip out the port number from the host name > -retVal[index++] = host.node.getName().split(":")[0]; > +retVal[index++] = host.node.getName().split(":")[0].intern(); > if (index == replicationFactor) { > done = true; > break; > } > } > {code} > More on String.intern(): > http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html > It will also save a lot of memory by changing the class of {{file}} from > {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally > contains ~10 String fields. This will also be a huge saving. > {code} > private Path file; > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
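The weak-reference map Todd suggested for the Path class could look roughly like the sketch below. The class name {{WeakInterner}} and its API are made up for illustration; this is not code from any attached patch.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Rough sketch of the weak-reference interning idea discussed above:
// duplicate values are collapsed to one canonical instance, but the pool does
// not pin entries in memory once no caller references them anymore.
// Storing a WeakReference as the value avoids the classic WeakHashMap pitfall
// where a value strongly references its own key and the entry never clears.
public class WeakInterner<T> {
    private final Map<T, WeakReference<T>> pool =
        new WeakHashMap<T, WeakReference<T>>();

    // Returns the canonical instance equal to 'value', registering it if new.
    public synchronized T intern(T value) {
        WeakReference<T> ref = pool.get(value);
        T canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            pool.put(value, new WeakReference<T>(value));
            canonical = value;
        }
        return canonical;
    }
}
```

Two equal but distinct instances interned through the same pool come back as the same object, which is what removes the duplicate Path overhead, while collection remains possible once splits are discarded.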
[jira] Updated: (MAPREDUCE-1374) Reduce memory footprint of FileSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1374: -- Attachment: MAPREDUCE-1374.2.patch Added test case. > Reduce memory footprint of FileSplit > > > Key: MAPREDUCE-1374 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.1, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Fix For: 0.21.0, 0.22.0 > > Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch > > > We can have many FileInput objects in the memory, depending on the number of > mappers. > It will save tons of memory on JobTracker and JobClient if we intern those > Strings for host names. > {code} > FileInputFormat.java: > for (NodeInfo host: hostList) { > // Strip out the port number from the host name > -retVal[index++] = host.node.getName().split(":")[0]; > +retVal[index++] = host.node.getName().split(":")[0].intern(); > if (index == replicationFactor) { > done = true; > break; > } > } > {code} > More on String.intern(): > http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html > It will also save a lot of memory by changing the class of {{file}} from > {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally > contains ~10 String fields. This will also be a huge saving. > {code} > private Path file; > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1374) Reduce memory footprint of FileSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1374: -- Fix Version/s: 0.22.0 0.21.0 Status: Patch Available (was: Open) > Reduce memory footprint of FileSplit > > > Key: MAPREDUCE-1374 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.1, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Fix For: 0.21.0, 0.22.0 > > Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch > > > We can have many FileInput objects in the memory, depending on the number of > mappers. > It will save tons of memory on JobTracker and JobClient if we intern those > Strings for host names. > {code} > FileInputFormat.java: > for (NodeInfo host: hostList) { > // Strip out the port number from the host name > -retVal[index++] = host.node.getName().split(":")[0]; > +retVal[index++] = host.node.getName().split(":")[0].intern(); > if (index == replicationFactor) { > done = true; > break; > } > } > {code} > More on String.intern(): > http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html > It will also save a lot of memory by changing the class of {{file}} from > {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally > contains ~10 String fields. This will also be a huge saving. > {code} > private Path file; > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1374) Reduce memory footprint of FileSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1374: -- Attachment: MAPREDUCE-1374.1.patch I am not sure whether I should create a new String[] in the constructor and then change the elements. Since file is private, this should be compatible with any other derived classes. > Reduce memory footprint of FileSplit > > > Key: MAPREDUCE-1374 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.1, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: MAPREDUCE-1374.1.patch > > > We can have many FileInput objects in the memory, depending on the number of > mappers. > It will save tons of memory on JobTracker and JobClient if we intern those > Strings for host names. > {code} > FileInputFormat.java: > for (NodeInfo host: hostList) { > // Strip out the port number from the host name > -retVal[index++] = host.node.getName().split(":")[0]; > +retVal[index++] = host.node.getName().split(":")[0].intern(); > if (index == replicationFactor) { > done = true; > break; > } > } > {code} > More on String.intern(): > http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html > It will also save a lot of memory by changing the class of {{file}} from > {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally > contains ~10 String fields. This will also be a huge saving. > {code} > private Path file; > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1374) Reduce memory footprint of FileSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799632#action_12799632 ] Zheng Shao commented on MAPREDUCE-1374: --- This experiment was done on hadoop-0.20. It shows the JobClient memory usage when submitting a map-reduce job with around 200K mappers. jmap before this patch (OOM before getting to the same stage as the second example):
{code}
 num   #instances    #bytes  class name
---------------------------------------
   1:      188870  18107344  [C
   2:      242616   9704640  java.lang.String
   3:       42850   6543408
   4:       73218   5271696  org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit
   5:       42850   5151504
   6:        3570   4693192
   7:       72077   3647360
   8:       73307   3518736  org.apache.hadoop.mapred.FileSplit
   9:       75424   3075008  [Ljava.lang.String;
  10:        3570   2818968
  11:        2741   2524096
 ...
  14:       10069   1449936  java.net.URI
 ...
  23:       10065    241560  org.apache.hadoop.fs.Path
{code}
jmap after this patch:
{code}
 num   #instances    #bytes  class name
---------------------------------------
   1:      199014  14329008  org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit
   2:      201801   9818856  [Ljava.lang.String;
   3:      199684   9584832  org.apache.hadoop.mapred.FileSplit
   4:       56594   8211632  [C
   5:       42851   6543872
   6:       42851   5151624
   7:        3570   4693616
   8:       72091   3648368
   9:        3570   2818968
  10:        2517   2675256  [Ljava.lang.Object;
  11:        4763   2531104  [I
  12:        2741   2524320
  13:       62275   2491000  java.lang.String
 ...
  31:         456     65664  java.net.URI
 ...
  69:         452     10848  org.apache.hadoop.fs.Path
{code}
String:FileSplit ratio: before this patch 3.3 : 1, after this patch 0.3 : 1. We reduced the number of String objects by 10 times! > Reduce memory footprint of FileSplit > > > Key: MAPREDUCE-1374 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.1, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > > We can have many FileInput objects in the memory, depending on the number of > mappers. > It will save tons of memory on JobTracker and JobClient if we intern those > Strings for host names. > {code} > FileInputFormat.java: > for (NodeInfo host: hostList) { > // Strip out the port number from the host name > -retVal[index++] = host.node.getName().split(":")[0]; > +retVal[index++] = host.node.getName().split(":")[0].intern(); > if (index == replicationFactor) { > done = true; > break; > } > } > {code} > More on String.intern(): > http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html > It will also save a lot of memory by changing the class of {{file}} from > {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally > contains ~10 String fields. This will also be a huge saving. > {code} > private Path file; > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
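The {{Path}}-to-{{String}} part of the change can be sketched as below. {{SlimFileSplit}} is a simplified stand-in used only for illustration, not the patched Hadoop class; in the real FileSplit, a {{getPath()}} accessor would rebuild a {{Path}} from the stored String on demand.

```java
// Simplified stand-in for the patched FileSplit: the file name is kept as a
// plain String internally, so each split carries one String instead of a
// Path wrapping a java.net.URI (~10 String fields). In the actual patch, a
// getPath() method would return new Path(file); here we just expose the
// String to keep the sketch self-contained.
public class SlimFileSplit {
    private final String file;   // was: private Path file;
    private final long start;
    private final long length;

    public SlimFileSplit(String file, long start, long length) {
        this.file = file;
        this.start = start;
        this.length = length;
    }

    // Callers pay a small allocation at read time (rebuilding a Path) in
    // exchange for a much smaller resident split on JobTracker/JobClient.
    public String getPathString() { return file; }
    public long getStart() { return start; }
    public long getLength() { return length; }
}
```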
[jira] Assigned: (MAPREDUCE-1374) Reduce memory footprint of FileSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao reassigned MAPREDUCE-1374: - Assignee: Zheng Shao > Reduce memory footprint of FileSplit > > > Key: MAPREDUCE-1374 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.1, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > > We can have many FileInput objects in the memory, depending on the number of > mappers. > It will save tons of memory on JobTracker and JobClient if we intern those > Strings for host names. > {code} > FileInputFormat.java: > for (NodeInfo host: hostList) { > // Strip out the port number from the host name > -retVal[index++] = host.node.getName().split(":")[0]; > +retVal[index++] = host.node.getName().split(":")[0].intern(); > if (index == replicationFactor) { > done = true; > break; > } > } > {code} > More on String.intern(): > http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html > It will also save a lot of memory by changing the class of {{file}} from > {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally > contains ~10 String fields. This will also be a huge saving. > {code} > private Path file; > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1374) Reduce memory footprint of FileSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1374: -- Description: We can have many FileInput objects in the memory, depending on the number of mappers. It will save tons of memory on JobTracker and JobClient if we intern those Strings for host names. {code} FileInputFormat.java: for (NodeInfo host: hostList) { // Strip out the port number from the host name -retVal[index++] = host.node.getName().split(":")[0]; +retVal[index++] = host.node.getName().split(":")[0].intern(); if (index == replicationFactor) { done = true; break; } } {code} More on String.intern(): http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html It will also save a lot of memory by changing the class of {{file}} from {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally contains ~10 String fields. This will also be a huge saving. {code} private Path file; {code} was: We can have many FileInput objects in the memory, depending on the number of mappers. It will save tons of memory on JobTracker and JobClient if we intern those Strings for host names. 
{code} FileInputFormat.java: for (NodeInfo host: hostList) { // Strip out the port number from the host name -retVal[index++] = host.node.getName().split(":")[0]; +retVal[index++] = host.node.getName().split(":")[0].intern(); if (index == replicationFactor) { done = true; break; } } {code} More on String.intern(): http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html Summary: Reduce memory footprint of FileSplit (was: FileSplit.hosts should have the host names "intern"ed) > Reduce memory footprint of FileSplit > > > Key: MAPREDUCE-1374 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.1, 0.21.0, 0.22.0 >Reporter: Zheng Shao > > We can have many FileInput objects in the memory, depending on the number of > mappers. > It will save tons of memory on JobTracker and JobClient if we intern those > Strings for host names. > {code} > FileInputFormat.java: > for (NodeInfo host: hostList) { > // Strip out the port number from the host name > -retVal[index++] = host.node.getName().split(":")[0]; > +retVal[index++] = host.node.getName().split(":")[0].intern(); > if (index == replicationFactor) { > done = true; > break; > } > } > {code} > More on String.intern(): > http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html > It will also save a lot of memory by changing the class of {{file}} from > {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally > contains ~10 String fields. This will also be a huge saving. > {code} > private Path file; > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1374) FileSplit.hosts should have the host names "intern"ed
FileSplit.hosts should have the host names "intern"ed - Key: MAPREDUCE-1374 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.20.1, 0.21.0, 0.22.0 Reporter: Zheng Shao We can have many FileInput objects in the memory, depending on the number of mappers. It will save tons of memory on JobTracker and JobClient if we intern those Strings for host names. {code} FileInputFormat.java: for (NodeInfo host: hostList) { // Strip out the port number from the host name -retVal[index++] = host.node.getName().split(":")[0]; +retVal[index++] = host.node.getName().split(":")[0].intern(); if (index == replicationFactor) { done = true; break; } } {code} More on String.intern(): http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
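The effect of interning the host names can be demonstrated with a small self-contained sketch; the class and method names below are made up for the example and only mirror the {{split(":")[0].intern()}} pattern from the snippet above, they are not Hadoop source.

```java
// Demonstrates why interning the host names saves memory: after intern(),
// every split that references the same host shares one canonical String
// instead of holding its own copy.
public class HostInternDemo {
    // Strip the port and intern, as in the proposed FileInputFormat change.
    static String hostOf(String nodeName) {
        return nodeName.split(":")[0].intern();
    }

    public static void main(String[] args) {
        String a = hostOf("datanode17.example.com:50010");
        String b = hostOf(new String("datanode17.example.com:50010"));
        // Interned copies are the same object (reference equality), so N
        // splits referencing the same host cost one String instead of N.
        System.out.println(a == b); // prints true; without intern() it is false
    }
}
```

With ~200K mappers and a small set of hosts, this collapses hundreds of thousands of host-name Strings down to one per distinct host.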
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1302: -- Attachment: (was: MAPREDUCE-1302.branch-0.20.on.top.of.MAPREDUCE-1213.patch) > TrackerDistributedCacheManager can delete file asynchronously > - > > Key: MAPREDUCE-1302 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch, > MAPREDUCE-1302.2.patch, MAPREDUCE-1302.3.patch, MAPREDUCE-1302.4.patch, > MAPREDUCE-1302.5.patch, > MAPREDUCE-1302.branch-0.20.on.top.of.MAPREDUCE-1213.2.patch > > > With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to > delete files from distributed cache asynchronously. > That will help make task initialization faster, because task initialization > calls the code that localizes files into the cache and may delete some other > files. > The deletion can slow down the task initialization speed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1302: -- Attachment: MAPREDUCE-1302.branch-0.20.on.top.of.MAPREDUCE-1213.2.patch > TrackerDistributedCacheManager can delete file asynchronously > - > > Key: MAPREDUCE-1302 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch, > MAPREDUCE-1302.2.patch, MAPREDUCE-1302.3.patch, MAPREDUCE-1302.4.patch, > MAPREDUCE-1302.5.patch, > MAPREDUCE-1302.branch-0.20.on.top.of.MAPREDUCE-1213.2.patch > > > With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to > delete files from distributed cache asynchronously. > That will help make task initialization faster, because task initialization > calls the code that localizes files into the cache and may delete some other > files. > The deletion can slow down the task initialization speed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1302: -- Attachment: MAPREDUCE-1302.branch-0.20.on.top.of.MAPREDUCE-1213.patch This patch is for branch-0.20 (on top of MAPREDUCE-1213.branch-0.20.patch) > TrackerDistributedCacheManager can delete file asynchronously > - > > Key: MAPREDUCE-1302 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch, > MAPREDUCE-1302.2.patch, MAPREDUCE-1302.3.patch, MAPREDUCE-1302.4.patch, > MAPREDUCE-1302.5.patch, > MAPREDUCE-1302.branch-0.20.on.top.of.MAPREDUCE-1213.patch > > > With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to > delete files from distributed cache asynchronously. > That will help make task initialization faster, because task initialization > calls the code that localizes files into the cache and may delete some other > files. > The deletion can slow down the task initialization speed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1213: -- Attachment: MAPREDUCE-1213.branch-0.20.patch Patch for 0.20. > TaskTrackers restart is very slow because it deletes distributed cache > directory synchronously > -- > > Key: MAPREDUCE-1213 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.1 >Reporter: dhruba borthakur >Assignee: Zheng Shao > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch, > MAPREDUCE-1213.3.patch, MAPREDUCE-1213.4.patch, > MAPREDUCE-1213.branch-0.20.patch > > > We are seeing that when we restart a tasktracker, it tries to recursively > delete all the files in the distributed cache. It invokes > FileUtil.fullyDelete(), which is very slow. This means that the > TaskTracker cannot join the cluster for an extended period of time (up to 2 > hours for us). The problem is acute if the number of files in the distributed > cache is in the few thousands. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1302: -- Status: Patch Available (was: Open) > TrackerDistributedCacheManager can delete file asynchronously > - > > Key: MAPREDUCE-1302 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch, > MAPREDUCE-1302.2.patch, MAPREDUCE-1302.3.patch, MAPREDUCE-1302.4.patch, > MAPREDUCE-1302.5.patch > > > With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to > delete files from distributed cache asynchronously. > That will help make task initialization faster, because task initialization > calls the code that localizes files into the cache and may delete some other > files. > The deletion can slow down the task initialization speed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1302: -- Status: Open (was: Patch Available) > TrackerDistributedCacheManager can delete file asynchronously > - > > Key: MAPREDUCE-1302 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch, > MAPREDUCE-1302.2.patch, MAPREDUCE-1302.3.patch, MAPREDUCE-1302.4.patch, > MAPREDUCE-1302.5.patch > > > With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to > delete files from distributed cache asynchronously. > That will help make task initialization faster, because task initialization > calls the code that localizes files into the cache and may delete some other > files. > The deletion can slow down the task initialization speed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1302: -- Attachment: MAPREDUCE-1302.5.patch Merged with latest trunk. Vinod, can you take a look? > TrackerDistributedCacheManager can delete file asynchronously > - > > Key: MAPREDUCE-1302 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0 >Reporter: Zheng Shao >Assignee: Zheng Shao > Attachments: MAPREDUCE-1302.0.patch, MAPREDUCE-1302.1.patch, > MAPREDUCE-1302.2.patch, MAPREDUCE-1302.3.patch, MAPREDUCE-1302.4.patch, > MAPREDUCE-1302.5.patch > > > With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to > delete files from distributed cache asynchronously. > That will help make task initialization faster, because task initialization > calls the code that localizes files into the cache and may delete some other > files. > The deletion can slow down the task initialization speed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
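The move-then-delete-in-background pattern that AsyncDiskService enables can be sketched as follows. The class name, the trash-directory convention, and the single-thread executor are all illustrative choices for this example, not the actual TrackerDistributedCacheManager or AsyncDiskService code.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of asynchronous deletion: the directory is atomically
// renamed out of the way (fast, same volume), so the caller can proceed
// immediately, and the slow recursive delete runs on a background thread.
// Names here are made up for the example.
public class AsyncDeleter {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    // Returns as soon as the rename completes; deletion happens later.
    public void deleteAsync(File dir, File trashRoot) throws IOException {
        File moved = new File(trashRoot, dir.getName() + "." + System.nanoTime());
        if (!dir.renameTo(moved)) {
            throw new IOException("rename failed: " + dir);
        }
        pool.submit(() -> fullyDelete(moved));
    }

    // Recursive delete, analogous in spirit to FileUtil.fullyDelete().
    private static void fullyDelete(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) fullyDelete(c);
        }
        f.delete();
    }

    public void shutdownAndWait() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

The rename is the key design choice: it frees the original path immediately (so re-localization or TaskTracker startup is not blocked), while the expensive recursive traversal is paid off the critical path.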
[jira] Commented: (MAPREDUCE-1186) While localizing a DistributedCache file, TT sets permissions recursively on the whole base-dir
[ https://issues.apache.org/jira/browse/MAPREDUCE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798781#action_12798781 ]

Zheng Shao commented on MAPREDUCE-1186:
---------------------------------------

Does this change mean that we can no longer package a bunch of Python scripts into a zip/jar file and let Hadoop unpack and run them?

> While localizing a DistributedCache file, TT sets permissions recursively on
> the whole base-dir
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1186
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1186
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Vinod K V
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.22.0
>         Attachments: patch-1186-1.txt, patch-1186-2.txt,
> patch-1186-3-ydist.txt, patch-1186-3-ydist.txt, patch-1186-3.txt,
> patch-1186-4.txt, patch-1186-5.txt, patch-1186-ydist.txt,
> patch-1186-ydist.txt, patch-1186.txt
>
> This is a performance problem.
[jira] Commented: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797404#action_12797404 ]

Zheng Shao commented on MAPREDUCE-1302:
---------------------------------------

Scott, I do use the value returned by #getRelativePathName() after comparing it with null. The other option is to have two functions, #isInVolume() and #getRelativePathName(). I prefer having a single function for simplicity.
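The two API shapes being weighed above can be sketched in Java. This is an illustrative sketch only, not the patch's actual code: the method names mirror the comment's #getRelativePathName()/#isInVolume(), but the bodies and the `VolumePaths` class are assumptions made for the example.

```java
import java.io.File;

public class VolumePaths {
    // Single-method shape: returns the path relative to the volume root,
    // or null when the path is not inside the volume. Callers test for
    // null, then use the returned relative name directly.
    public static String getRelativePathName(String path, String volumeRoot) {
        String normalized = new File(path).getAbsolutePath();
        String root = new File(volumeRoot).getAbsolutePath();
        if (!normalized.startsWith(root + File.separator)) {
            return null;
        }
        return normalized.substring(root.length() + 1);
    }

    // Two-method shape: a separate membership test, expressed here in
    // terms of the single method to keep one source of truth.
    public static boolean isInVolume(String path, String volumeRoot) {
        return getRelativePathName(path, volumeRoot) != null;
    }

    public static void main(String[] args) {
        System.out.println(getRelativePathName("/data/cache/job1/file", "/data/cache"));
        System.out.println(isInVolume("/other/file", "/data/cache"));
    }
}
```

The single-method shape avoids computing the relative name twice when the caller needs both the membership answer and the name, which matches the preference stated in the comment.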
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1302:
----------------------------------
    Status: Open  (was: Patch Available)
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1302:
----------------------------------
    Status: Patch Available  (was: Open)
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1302:
----------------------------------
    Attachment: MAPREDUCE-1302.4.patch

Modified according to Vinod's comments. I didn't change the test, but I did verify the deletion in the makeSureCleanedUp() method. I will deprecate JobConf.deleteLocalFiles(subdir) in a follow-up jira.
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1302:
----------------------------------
    Attachment: MAPREDUCE-1302.3.patch

Renamed SUBDIR to TOBEDELETED to avoid confusion.
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1302:
----------------------------------
    Status: Open  (was: Patch Available)
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1302:
----------------------------------
    Status: Patch Available  (was: Open)
[jira] Commented: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795205#action_12795205 ]

Zheng Shao commented on MAPREDUCE-1302:
---------------------------------------

MAPREDUCE-1141 is fixed by this patch.
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1302:
----------------------------------
    Status: Open  (was: Patch Available)
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1302:
----------------------------------
    Status: Patch Available  (was: Open)
[jira] Commented: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795143#action_12795143 ]

Zheng Shao commented on MAPREDUCE-1302:
---------------------------------------

The code doesn't create a single task for toBeDeleted; it goes through the toBeDeleted directory and creates one task per entry. The reasons are:

1. This allows parallel deletion of the contents inside toBeDeleted.
2. A single list call per volume shouldn't take too long.
3. If we created a single task for toBeDeleted, we would need to rename it to something else, recreate toBeDeleted, and then move the old directory to be a subdirectory inside the new toBeDeleted. That introduces additional intermediate states that may be hard to recover from.
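The one-task-per-entry scheme described above can be sketched as follows. This is a minimal illustration, not the patch itself: `AsyncCleanup`, `deleteContents`, and the inlined recursive delete (standing in for Hadoop's FileUtil.fullyDelete()) are names invented for the example.

```java
import java.io.File;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncCleanup {
    // List toBeDeleted once (one list call per volume) and submit one
    // deletion task per entry, so entries are removed in parallel.
    public static void deleteContents(File toBeDeleted, ExecutorService pool) {
        File[] entries = toBeDeleted.listFiles();
        if (entries == null) return;
        for (final File entry : entries) {
            pool.submit(() -> fullyDelete(entry));  // one task per entry
        }
    }

    // Minimal recursive delete, standing in for FileUtil.fullyDelete().
    static boolean fullyDelete(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) fullyDelete(c);
        }
        return f.delete();
    }

    public static void main(String[] args) throws Exception {
        File root = new File(System.getProperty("java.io.tmpdir"), "toBeDeleted-demo");
        new File(root, "a/b").mkdirs();
        new File(root, "c").mkdirs();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        deleteContents(root, pool);
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // toBeDeleted itself survives; only its contents are deleted,
        // which avoids the rename/recreate dance from reason 3 above.
        System.out.println(root.listFiles().length);
    }
}
```

Note how keeping toBeDeleted in place sidesteps the intermediate states mentioned in reason 3: there is never a moment where the directory is missing or half-renamed.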
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1302:
----------------------------------
    Attachment: MAPREDUCE-1302.2.patch

Added logic to remove the files inside toBeDeleted upon restart.
[jira] Commented: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794865#action_12794865 ]

Zheng Shao commented on MAPREDUCE-1302:
---------------------------------------

Good question. There is no special handling right now. I will list the directory and create one task per item returned.
[jira] Commented: (MAPREDUCE-1270) Hadoop C++ Extention
[ https://issues.apache.org/jira/browse/MAPREDUCE-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794299#action_12794299 ]

Zheng Shao commented on MAPREDUCE-1270:
---------------------------------------

Any progress on this?

> Hadoop C++ Extension
> --------------------
>
>                 Key: MAPREDUCE-1270
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1270
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 0.20.1
>         Environment: hadoop linux
>            Reporter: Wang Shouyan
>
> The Hadoop C++ extension is an internal project at Baidu. We started it for these reasons:
>   1. To provide a C++ API. We mostly used Streaming before, and we also tried PIPES, but we did not find PIPES more efficient than Streaming. So we think a new C++ extension is needed.
>   2. Even using PIPES or Streaming, it is hard to control the memory of the Hadoop map/reduce child JVM.
>   3. It costs a lot to read/write/sort TB/PB of data in Java, and when using PIPES or Streaming, a pipe or socket is not efficient enough to carry such huge data.
> What we want to do:
>   1. We do not use the map/reduce child JVM for any data processing; it just prepares the environment, starts the C++ mapper, tells the mapper which split it should handle, and reads reports from the mapper until it finishes. The mapper will read records, invoke the user-defined map, do the partition, write spills, combine, and merge into file.out. We think these operations can be done in C++ code.
>   2. The reducer is similar to the mapper; it is started after the sort finishes, reads from the sorted files, invokes the user-defined reduce, and writes to the user-defined record writer.
>   3. We also intend to rewrite shuffle and sort in C++, for efficiency and memory control.
> At first, 1 and 2, then 3.
> What's the difference from PIPES:
>   1. Yes, we will reuse most of the PIPES code.
>   2. And we should do it more completely: nothing changes in scheduling and management, but everything in execution.
[jira] Commented: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793314#action_12793314 ]

Zheng Shao commented on MAPREDUCE-1302:
---------------------------------------

The test errors are not related. The first one looks like a random failure - it didn't appear on the last Hudson run of the same patch. The next three are common to all patches' test results.

org.apache.hadoop.security.authorize.TestServiceLevelAuthorization.testServiceLevelAuthorization
org.apache.hadoop.streaming.TestStreamingExitStatus.testMapFailOk
org.apache.hadoop.streaming.TestStreamingExitStatus.testReduceFailOk
org.apache.hadoop.streaming.TestStreamingKeyValue.testCommandLine
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1302:
----------------------------------
    Status: Open  (was: Patch Available)
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1302:
----------------------------------
    Status: Patch Available  (was: Open)

Transient errors in Hudson ("user1" not found). Submitting again.
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1302:
----------------------------------
    Status: Open  (was: Patch Available)
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1302:
----------------------------------
    Status: Patch Available  (was: Open)
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1302:
----------------------------------
    Attachment: MAPREDUCE-1302.1.patch

This patch is on top of MAPREDUCE-1213, which is already committed.
[jira] Created: (MAPREDUCE-1303) Merge org.apache.hadoop.mapred.CleanupQueue with MRAsyncDiskService
Merge org.apache.hadoop.mapred.CleanupQueue with MRAsyncDiskService
-------------------------------------------------------------------

                 Key: MAPREDUCE-1303
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1303
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Zheng Shao
            Assignee: Zheng Shao

org.apache.hadoop.mapred.CleanupQueue is very similar to MRAsyncDiskService. We should be able to simplify the codebase by merging it into MRAsyncDiskService.
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1302:
----------------------------------
    Status: Patch Available  (was: Open)
[jira] Updated: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated MAPREDUCE-1302:
----------------------------------
    Attachment: MAPREDUCE-1302.0.patch

This patch includes MAPREDUCE-1213. It's just for demo purposes.
[jira] Created: (MAPREDUCE-1302) TrackerDistributedCacheManager can delete file asynchronously
TrackerDistributedCacheManager can delete file asynchronously
-------------------------------------------------------------

                 Key: MAPREDUCE-1302
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1302
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Zheng Shao
            Assignee: Zheng Shao

With the help of AsyncDiskService from MAPREDUCE-1213, we should be able to delete files from the distributed cache asynchronously.

That will help make task initialization faster, because task initialization calls the code that localizes files into the cache and may delete some other files. The deletion can slow task initialization down.
[jira] Commented: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791236#action_12791236 ]

Zheng Shao commented on MAPREDUCE-1213:
---------------------------------------

The contrib test failures do not seem to be related to this patch. I saw the same errors on Hudson in the results of other patches.

> TaskTrackers restart is very slow because it deletes distributed cache
> directory synchronously
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1213
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: dhruba borthakur
>            Assignee: Zheng Shao
>         Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch,
> MAPREDUCE-1213.3.patch, MAPREDUCE-1213.4.patch
>
> We are seeing that when we restart a TaskTracker, it tries to recursively
> delete all the files in the distributed cache. It invokes
> FileUtil.fullyDelete(), which is very slow. This means that the TaskTracker
> cannot join the cluster for an extended period of time (up to 2 hours for
> us). The problem is acute if the number of files in the distributed cache is
> in the few thousands.
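The core idea that makes the restart fast is rename-then-delete: instead of recursively deleting the cache directory at startup, move it into a to-be-deleted area with a single rename and let a background thread do the slow recursive delete later. A hedged sketch of that idea, with illustrative names that are not the actual MRAsyncDiskService API:

```java
import java.io.File;

public class MoveAndDeleteDemo {
    // Fast part: a single rename moves the doomed directory under
    // toBeDeleted/, so startup does not wait on a recursive delete.
    // The timestamp suffix keeps repeated moves from colliding.
    public static boolean moveForDeletion(File victim, File toBeDeleted) {
        toBeDeleted.mkdirs();
        File target = new File(toBeDeleted, victim.getName() + "-" + System.nanoTime());
        return victim.renameTo(target);  // cheap on the same filesystem
    }

    public static void main(String[] args) {
        File base = new File(System.getProperty("java.io.tmpdir"), "mrasync-demo");
        File cache = new File(base, "distcache");
        new File(cache, "job1").mkdirs();
        File toBeDeleted = new File(base, "toBeDeleted");
        System.out.println(moveForDeletion(cache, toBeDeleted));
        System.out.println(cache.exists());
        // A background thread would now walk toBeDeleted/ and delete
        // its entries without blocking the TaskTracker's startup.
    }
}
```

Note that File.renameTo() only stays cheap when source and target are on the same filesystem, which is why the to-be-deleted area lives on the same volume as the cache it cleans up.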
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1213: -- Status: Open (was: Patch Available)
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1213: -- Status: Patch Available (was: Open)
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1213: -- Attachment: MAPREDUCE-1213.4.patch Changed the function name to moveAndDeleteFromEachVolume. "AsyncDelete" could suggest a different meaning - users might still see the files when the function returns. This code actually moves the file first.
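The naming rationale above (move first, then delete) can be illustrated with plain java.nio.file calls; the class and method names here (MoveThenDeleteDemo, moveAside) are hypothetical, not from the patch. Once the move returns, the original path is already gone from the caller's point of view, even though the actual deletion has not happened yet - which is exactly why a name like "asyncDelete" would be misleading.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Shows the visibility guarantee of move-then-delete: after moveAside()
 *  returns, the original path no longer exists, so no caller can observe
 *  a half-deleted directory while the background delete runs. */
class MoveThenDeleteDemo {
  /** Move the target into a trash directory under a unique name.
   *  The slow recursive delete of the returned path can then run on any thread. */
  public static Path moveAside(Path target, Path trashDir) throws IOException {
    Files.createDirectories(trashDir);
    Path moved = trashDir.resolve(target.getFileName() + "." + System.nanoTime());
    Files.move(target, moved);  // cheap rename when source and trash share a filesystem
    return moved;
  }
}
```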
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1213: -- Attachment: MAPREDUCE-1213.3.patch
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1213: -- Attachment: (was: MAPREDUCE-1213.3.patch)
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1213: -- Attachment: MAPREDUCE-1213.3.patch
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1213: -- Attachment: (was: MAPREDUCE-1213.3.patch)
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1213: -- Status: Patch Available (was: Open)
[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously
[ https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated MAPREDUCE-1213: -- Attachment: MAPREDUCE-1213.3.patch This one uses the AsyncDiskService from common.