[jira] [Commented] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057637#comment-13057637 ] Harsh J commented on MAPREDUCE-2632: Thanks for clarifying, Ravi :) Avoid calling the partitioner when the numReduceTasks is 1. --- Key: MAPREDUCE-2632 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2632 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.23.0 Reporter: Ravi Teja Ch N V We can avoid the call to the partitioner when the number of reducers is 1. This will avoid unnecessary computation by the partitioner. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
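The shape of the optimization is simple: with a single reducer, every record necessarily lands in partition 0, so the (possibly expensive) user partitioner never needs to run. A minimal sketch, with names that loosely mirror MapTask's collector but are illustrative, not the actual patch:

```java
// Hypothetical sketch of the MAPREDUCE-2632 idea: skip the user partitioner
// when there is only one reduce task. The Partitioner shape mirrors Hadoop's
// interface; everything else here is a simplified stand-in.
public class Main {
    interface Partitioner<K, V> {
        int getPartition(K key, V value, int numPartitions);
    }

    static <K, V> int partitionFor(K key, V value,
                                   Partitioner<K, V> partitioner,
                                   int numReduceTasks) {
        // With one reducer every record goes to partition 0, so the
        // (possibly expensive) user partitioner need not be invoked.
        if (numReduceTasks == 1) {
            return 0;
        }
        return partitioner.getPartition(key, value, numReduceTasks);
    }

    public static void main(String[] args) {
        Partitioner<String, String> hashPartitioner =
            (k, v, n) -> (k.hashCode() & Integer.MAX_VALUE) % n;
        // One reducer: partitioner is never consulted.
        System.out.println(partitionFor("key", "val", hashPartitioner, 1)); // 0
        // Several reducers: normal hash partitioning.
        System.out.println(partitionFor("key", "val", hashPartitioner, 4));
    }
}
```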
[jira] [Updated] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Teja Ch N V updated MAPREDUCE-2632: Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Teja Ch N V updated MAPREDUCE-2632: Attachment: MAPREDUCE-2632.patch
[jira] [Commented] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057820#comment-13057820 ] Hadoop QA commented on MAPREDUCE-2632:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12484759/MAPREDUCE-2632.patch against trunk revision 1140942.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestMRCLI org.apache.hadoop.fs.TestFileSystem
-1 contrib tests. The patch failed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/433//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/433//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/433//console
This message is automatically generated.
[jira] [Commented] (MAPREDUCE-2629) Class loading quirk prevents inner class method compilation
[ https://issues.apache.org/jira/browse/MAPREDUCE-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057856#comment-13057856 ] Eric Caspole commented on MAPREDUCE-2629: On my test system I see about 2% CPU time in that method, which is removed with various versions of the patch. But since I spend a lot of time in I/O wait, I cannot see an improvement in the wall-clock run time of, say, an 8 GB terasort. I think this patch can be improved; I'll update it after I get back from vacation. Class loading quirk prevents inner class method compilation --- Key: MAPREDUCE-2629 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2629 Project: Hadoop Map/Reduce Issue Type: Improvement Components: task Affects Versions: 0.21.0, 0.22.0 Reporter: Eric Caspole Priority: Minor Attachments: MAPREDUCE-2629.patch Original Estimate: 24h Remaining Estimate: 24h While profiling jobs like terasort and gridmix, I noticed that a method org.apache.hadoop.mapreduce.task.ReduceContextImpl.access$000 is near the top. It turns out that this is because the ReduceContextImpl class has a member backupStore which is accessed from an inner class ReduceContextImpl$ValueIterator. Due to the way synthetic accessor methods work, every access of backupStore results in a call to access$000 on the outer class. For some portion of the run, backupStore is null and the BackupStore class has never been loaded by the reducer. Due to the way the HotSpot JVM inliner works, by default it will not inline a short method where the class of the return value object is unloaded - if you use a debug JVM with -XX:+PrintCompilation you will see a failure reason message like "unloaded signature classes". This causes every call to ReduceContextImpl.access$000 to be executed in the interpreter for the handful of bytecodes needed to return the null backupStore.
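The pattern the report describes can be reproduced in miniature. When an inner class reads a private field of its outer class, older javac versions compile the read into a synthetic outer-class method such as access$000; if the field's declared type has never been loaded, HotSpot may decline to inline that accessor. The class and field names below are simplified stand-ins, not the actual Hadoop sources, and note that JDK 11+ nestmates (JEP 181) avoid the synthetic accessor entirely:

```java
// Miniature of the quirk: an inner class reading a private outer field.
// On pre-nestmates javac, usesBackup() compiles to a call to a synthetic
// Main.access$000(Main) accessor; until the field's type is loaded, HotSpot
// may refuse to inline it ("unloaded signature classes" under
// -XX:+PrintCompilation), so it runs interpreted on every call.
public class Main {
    private Object backupStore;  // often null; its real type rarely loaded

    class ValueIterator {
        boolean usesBackup() {
            // Reads the outer private field through the synthetic accessor
            // (on older compilers).
            return backupStore != null;
        }
    }

    public static void main(String[] args) {
        Main outer = new Main();
        System.out.println(outer.new ValueIterator().usesBackup());  // false
    }
}
```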
[jira] [Commented] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057857#comment-13057857 ] Ravi Teja Ch N V commented on MAPREDUCE-2632: The failing tests are not related to this patch.
[jira] [Created] (MAPREDUCE-2633) MR-279: Add a getCounter(Enum) method to the Counters interface
MR-279: Add a getCounter(Enum) method to the Counters interface --- Key: MAPREDUCE-2633 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2633 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Environment: All Reporter: Josh Wills Priority: Minor I'm fixing a few TODOs I came across in TaskAttemptImpl.java related to the fact that the MRv2 Counters interface doesn't expose a getCounter(Enum) method for accessing a Counter using the enum's class as the group name and the enum's value as the name of the counter. Will add the patch momentarily.
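The lookup the issue describes can be sketched as follows. The Counters shape here is a simplified stand-in for the MRv2 interface, not the real API; the point is only how an enum constant maps to a group name (its declaring class) and a counter name (its constant name):

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of a getCounter(Enum) convenience method: the counter group
// is derived from the enum's class and the counter name from the enum
// constant. The Counters class below is illustrative, not the MRv2 API.
public class Main {
    enum JobCounter { TOTAL_LAUNCHED_MAPS, NUM_KILLED_MAPS }

    static class Counters {
        private final Map<String, Long> values = new HashMap<>();

        private static String keyFor(Enum<?> key) {
            // group = enum class name, name = enum constant name
            return key.getDeclaringClass().getName() + ":" + key.name();
        }

        long getCounter(Enum<?> key) {
            return values.getOrDefault(keyFor(key), 0L);
        }

        void increment(Enum<?> key, long amount) {
            values.merge(keyFor(key), amount, Long::sum);
        }
    }

    public static void main(String[] args) {
        Counters counters = new Counters();
        counters.increment(JobCounter.TOTAL_LAUNCHED_MAPS, 3);
        System.out.println(counters.getCounter(JobCounter.TOTAL_LAUNCHED_MAPS)); // 3
    }
}
```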
[jira] [Updated] (MAPREDUCE-2633) MR-279: Add a getCounter(Enum) method to the Counters interface
[ https://issues.apache.org/jira/browse/MAPREDUCE-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Wills updated MAPREDUCE-2633: Attachment: MAPREDUCE-2633.patch
[jira] [Commented] (MAPREDUCE-2243) Close all the file streams properly in a finally block to avoid their leakage.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057955#comment-13057955 ] Eli Collins commented on MAPREDUCE-2243: FYI, HADOOP-7428 is a case where the RTE is relevant. Close all the file streams properly in a finally block to avoid their leakage. - Key: MAPREDUCE-2243 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2243 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker, tasktracker Affects Versions: 0.20.1, 0.22.0, 0.23.0 Environment: NA Reporter: Bhallamudi Venkata Siva Kamesh Assignee: Devaraj K Priority: Minor Attachments: MAPREDUCE-2243.patch Original Estimate: 72h Remaining Estimate: 72h In the following classes, streams should be closed in a finally block to avoid their leakage in exceptional cases.
CompletedJobStatusStore.java:
dataOut.writeInt(events.length);
for (TaskCompletionEvent event : events) { event.write(dataOut); }
dataOut.close();
EventWriter.java:
encoder.flush();
out.close();
MapTask.java:
splitMetaInfo.write(out);
out.close();
TaskLog:
1) str = fis.readLine(); fis.close();
2) dos.writeBytes(Long.toString(new File(logLocation, LogName.SYSLOG.toString()).length() - prevLogLength) + "\n"); dos.close();
TotalOrderPartitioner.java:
while (reader.next(key, value)) { parts.add(key); key = ReflectionUtils.newInstance(keyClass, conf); }
reader.close();
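The fix the issue asks for is the classic try/finally shape: perform the writes inside try, and close in finally so the stream is released even when a write throws. A minimal self-contained sketch using an in-memory stream (the real patch applies the same shape to the files listed above; the helper name writeEvents is illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch of the finally-block pattern: the stream is closed on both the
// success and the exception path. An in-memory buffer stands in for the
// job-status file; any IOException is rethrown unchecked to keep the
// example compact.
public class Main {
    static byte[] writeEvents(int[] events) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(buf);
        try {
            try {
                dataOut.writeInt(events.length);
                for (int event : events) {
                    dataOut.writeInt(event);
                }
            } finally {
                dataOut.close();  // runs even if a write above throws
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // 3 ints written (length + 2 events) = 12 bytes
        System.out.println(writeEvents(new int[] {1, 2}).length);  // 12
    }
}
```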
[jira] [Commented] (MAPREDUCE-2629) Class loading quirk prevents inner class method compilation
[ https://issues.apache.org/jira/browse/MAPREDUCE-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057956#comment-13057956 ] Todd Lipcon commented on MAPREDUCE-2629: OK. I'm happy to +1 as is, but if you think there's an improvement, I'll hold off. Enjoy your vacation.
[jira] [Commented] (MAPREDUCE-279) Map-Reduce 2.0
[ https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058060#comment-13058060 ] Giridharan Kesavan commented on MAPREDUCE-279: Nigel/Arun, I can help set up a build on MR-279. Map-Reduce 2.0 -- Key: MAPREDUCE-279 URL: https://issues.apache.org/jira/browse/MAPREDUCE-279 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 0.23.0 Attachments: MR-279.patch, MR-279.patch, MR-279.sh, MR-279_MR_files_to_move.txt, capacity-scheduler-dark-theme.png, multi-column-stable-sort-default-theme.png, yarn-state-machine.job.dot, yarn-state-machine.job.png, yarn-state-machine.task-attempt.dot, yarn-state-machine.task-attempt.png, yarn-state-machine.task.dot, yarn-state-machine.task.png Re-factor MapReduce into a generic resource scheduler and a per-job, user-defined component that manages the application execution. Check it out by following [the instructions|http://goo.gl/rSJJC].
[jira] [Commented] (MAPREDUCE-2627) guava-r09 JAR file needs to be added to mapreduce.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058174#comment-13058174 ] Luke Lu commented on MAPREDUCE-2627: Ah, I see the problem is in branch-0.22. It was introduced by merging HDFS-941 from trunk, which pulls in a guava dependency in HDFS; the merge forgot to update ivy/hadoop-hdfs-template.xml. Trunk has the proper dependency in hadoop-common-template.xml, which suffices for trunk mapreduce. guava-r09 JAR file needs to be added to mapreduce. -- Key: MAPREDUCE-2627 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2627 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.22.0 Reporter: Plamen Jeliazkov Priority: Trivial Original Estimate: 24h Remaining Estimate: 24h Need to add the guava-r09.jar file into the mapreduce/build/ivy/lib/Hadoop/common directory; missing from build.
[jira] [Created] (MAPREDUCE-2634) MapReduce Performance Improvements using forced heartbeat
MapReduce Performance Improvements using forced heartbeat -- Key: MAPREDUCE-2634 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2634 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Abhijit Suresh Shingate Priority: Minor
Following are proposals for some performance optimizations over MapReduce.
1. Notify TaskTracker to send heartbeat when a new Job is submitted
a) Presently, when a new Job is submitted to the JobTracker, tasks are assigned to a TaskTracker only when that TaskTracker sends its heartbeat to the JobTracker.
b) Proposal: the JobTracker will notify all TaskTrackers to send a heartbeat whenever a new Job is submitted, so that the new Job's tasks can be assigned to all TaskTrackers immediately.
2. Execute Job Setup and Cleanup on the JobTracker JVM
a) Presently, Job Setup and Cleanup are carried out as separate tasks on a TaskTracker.
b) Launching a new JVM for Setup and Cleanup of the Job introduces some overhead, generally about 0.7 - 1.5 seconds.
c) Proposal: the JobTracker will execute the Job Setup and Cleanup tasks in the JobTracker JVM itself.
3. Request TaskTracker to send heartbeat when a Map Task is completed
a) Presently, the TaskTracker reports the status of completed Map Tasks as part of its heartbeat at a regular interval.
b) Proposal: the Map Task requests the TaskTracker to send a heartbeat to the JobTracker when the Map Task completes, so that Reduce tasks can quickly learn which map tasks have finished and copy their map outputs locally.
4. Request JobTracker to trigger committing of Reduce output when a Reduce Task has finished
a) Presently, the JobTracker asks the Reduce Task to commit its output to HDFS through a heartbeat response.
b) Proposal: the Reduce Task requests the TaskTracker to send a heartbeat to the JobTracker whenever the Reduce Task completes.
These optimizations might work on small clusters, but on big clusters they may be overhead. Please let us know your views.
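The "forced heartbeat" mechanism underlying proposals #1, #3, and #4 can be sketched as a heartbeat loop that normally wakes on a fixed interval but can be nudged to report immediately when an event arrives. All names below are illustrative, not TaskTracker code:

```java
// Sketch of an event-triggered ("forced") heartbeat: the loop waits up to a
// fixed interval, but requestHeartbeat() wakes it early so an event such as
// job submission or task completion is reported without waiting a full
// interval. Illustrative only; not the Hadoop implementation.
public class Main {
    private final Object lock = new Object();
    private boolean heartbeatRequested = false;
    int heartbeatsSent = 0;

    // Called by an event (job submitted, task finished) to force a beat now.
    void requestHeartbeat() {
        synchronized (lock) {
            heartbeatRequested = true;
            lock.notifyAll();
        }
    }

    // One iteration of the heartbeat loop: wait up to intervalMillis,
    // returning early if a forced heartbeat was requested.
    void awaitAndBeat(long intervalMillis) {
        synchronized (lock) {
            if (!heartbeatRequested) {
                try {
                    lock.wait(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            heartbeatRequested = false;
        }
        heartbeatsSent++;  // the status report would be sent here
    }

    public static void main(String[] args) {
        Main tracker = new Main();
        tracker.requestHeartbeat();   // event arrives before the wait
        tracker.awaitAndBeat(3000);   // returns immediately, no 3 s sleep
        System.out.println(tracker.heartbeatsSent);  // 1
    }
}
```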
[jira] [Updated] (MAPREDUCE-2634) MapReduce Performance Improvements using forced heartbeat
[ https://issues.apache.org/jira/browse/MAPREDUCE-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhijit Suresh Shingate updated MAPREDUCE-2634: Description: (reformatted the proposal text; content unchanged)
[jira] [Updated] (MAPREDUCE-2634) MapReduce Performance Improvements using forced heartbeat
[ https://issues.apache.org/jira/browse/MAPREDUCE-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhijit Suresh Shingate updated MAPREDUCE-2634: Description: (minor formatting tweak; content unchanged)
[jira] [Commented] (MAPREDUCE-2634) MapReduce Performance Improvements using forced heartbeat
[ https://issues.apache.org/jira/browse/MAPREDUCE-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058203#comment-13058203 ] Todd Lipcon commented on MAPREDUCE-2634: Proposal #1 seems like an interesting idea, but I'm skeptical that it will make a big difference, since we've already lowered the minimum heartbeat interval to 300ms in MAPREDUCE-1906. Proposal #2 seems scary since setup and cleanup may run user code, and running user code in the JobTracker JVM is insecure. Piggybacking those with other map tasks, though, is probably a good idea (for some reason I don't think we do this with JVM reuse today). Your proposals #3 and #4 are already implemented by MAPREDUCE-270, if I understand you correctly.
[jira] [Commented] (MAPREDUCE-2634) MapReduce Performance Improvements using forced heartbeat
[ https://issues.apache.org/jira/browse/MAPREDUCE-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058213#comment-13058213 ] M. C. Srivas commented on MAPREDUCE-2634: As Todd says, proposal #3 should remain as a task to be scheduled on a node like any other task. The setup/cleanup might consume enormous resources and/or take a long time to run.
[jira] [Commented] (MAPREDUCE-2634) MapReduce Performance Improvements using forced heartbeat
[ https://issues.apache.org/jira/browse/MAPREDUCE-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058214#comment-13058214 ] Owen O'Malley commented on MAPREDUCE-2634: Proposal #1 breaks the architecture by making a loop in the communication graph. That means you need to work very hard to ensure there aren't any distributed deadlocks. (We also need some way to minimize the chance that someone introduces one later when working on the code.) I don't see how you'll get enough performance out of that to risk the potential for deadlocks. Proposal #2 violates the security model of never running user code in the servers. That is pretty much a non-starter since it would require security managers in all of the servers, which would impose a huge performance degradation based on everything I've read about them. Proposal #3 has been there for a long time. If proposal #4 isn't already done, we should do it. A fast heartbeat for committing the output makes sense.