[jira] [Commented] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.

2011-06-30 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057637#comment-13057637
 ] 

Harsh J commented on MAPREDUCE-2632:


Thanks for clarifying, Ravi :)

 Avoid calling the partitioner when the numReduceTasks is 1.
 ---

 Key: MAPREDUCE-2632
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2632
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.23.0
Reporter: Ravi Teja Ch N V

 We can avoid the call to the partitioner when the number of reducers is 1.
 This will avoid unnecessary computation by the partitioner.
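
 The idea can be illustrated with a minimal Java sketch. The types
 SimplePartitioner, CountingHashPartitioner and PartitioningCollector below
 are hypothetical stand-ins invented for this illustration; they are not
 Hadoop classes, and the attached patch may implement the change differently.

```java
import java.util.ArrayList;
import java.util.List;

// Demo entry point; every type in this file is hypothetical, not a Hadoop API.
class PartitionerSkipDemo {
    public static void main(String[] args) {
        CountingHashPartitioner<String, Integer> partitioner =
                new CountingHashPartitioner<>();
        PartitioningCollector<String, Integer> collector =
                new PartitioningCollector<>(partitioner, 1);
        collector.collect("apple", 1);
        collector.collect("banana", 2);
        System.out.println("partitions = " + collector.assigned
                + ", partitioner calls = " + partitioner.calls);
    }
}

// Hypothetical stand-in for a partitioner interface.
interface SimplePartitioner<K, V> {
    int getPartition(K key, V value, int numPartitions);
}

// Hash partitioner that counts its invocations so the shortcut is observable.
class CountingHashPartitioner<K, V> implements SimplePartitioner<K, V> {
    int calls = 0;

    @Override
    public int getPartition(K key, V value, int numPartitions) {
        calls++;
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Hypothetical collector that chooses a partition for each emitted record.
class PartitioningCollector<K, V> {
    private final SimplePartitioner<K, V> partitioner;
    private final int numReduceTasks;
    final List<Integer> assigned = new ArrayList<>();

    PartitioningCollector(SimplePartitioner<K, V> partitioner, int numReduceTasks) {
        this.partitioner = partitioner;
        this.numReduceTasks = numReduceTasks;
    }

    void collect(K key, V value) {
        // With a single reducer every record belongs to partition 0,
        // so the partitioner call is skipped entirely.
        int partition = (numReduceTasks == 1)
                ? 0
                : partitioner.getPartition(key, value, numReduceTasks);
        assigned.add(partition);
    }
}
```

 One consequence of this design is that a custom partitioner with side
 effects would no longer observe records in single-reducer jobs.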

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.

2011-06-30 Thread Ravi Teja Ch N V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Teja Ch N V updated MAPREDUCE-2632:


Status: Patch Available  (was: Open)

 Avoid calling the partitioner when the numReduceTasks is 1.
 ---

 Key: MAPREDUCE-2632
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2632
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.23.0
Reporter: Ravi Teja Ch N V
 Attachments: MAPREDUCE-2632.patch


 We can avoid the call to the partitioner when the number of reducers is 1.
 This will avoid unnecessary computation by the partitioner.





[jira] [Updated] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.

2011-06-30 Thread Ravi Teja Ch N V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Teja Ch N V updated MAPREDUCE-2632:


Attachment: MAPREDUCE-2632.patch

 Avoid calling the partitioner when the numReduceTasks is 1.
 ---

 Key: MAPREDUCE-2632
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2632
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.23.0
Reporter: Ravi Teja Ch N V
 Attachments: MAPREDUCE-2632.patch


 We can avoid the call to the partitioner when the number of reducers is 1.
 This will avoid unnecessary computation by the partitioner.





[jira] [Commented] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.

2011-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057820#comment-13057820
 ] 

Hadoop QA commented on MAPREDUCE-2632:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12484759/MAPREDUCE-2632.patch
  against trunk revision 1140942.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.cli.TestMRCLI
  org.apache.hadoop.fs.TestFileSystem

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/433//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/433//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/433//console

This message is automatically generated.

 Avoid calling the partitioner when the numReduceTasks is 1.
 ---

 Key: MAPREDUCE-2632
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2632
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.23.0
Reporter: Ravi Teja Ch N V
 Attachments: MAPREDUCE-2632.patch


 We can avoid the call to the partitioner when the number of reducers is 1.
 This will avoid unnecessary computation by the partitioner.





[jira] [Commented] (MAPREDUCE-2629) Class loading quirk prevents inner class method compilation

2011-06-30 Thread Eric Caspole (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057856#comment-13057856
 ] 

Eric Caspole commented on MAPREDUCE-2629:
-

On my test system I see about 2% of CPU time in that method, which is removed 
with various versions of the patch. But since I spend a lot of time in I/O 
wait, I cannot see an improvement in the wall clock run time of, say, an 8 GB 
terasort. 

I think this patch can be improved; I'll update it after I get back from 
vacation.

 Class loading quirk prevents inner class method compilation
 ---

 Key: MAPREDUCE-2629
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2629
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 0.21.0, 0.22.0
Reporter: Eric Caspole
Priority: Minor
 Attachments: MAPREDUCE-2629.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 While profiling jobs like terasort and gridmix, I noticed that the
 method org.apache.hadoop.mapreduce.task.ReduceContextImpl.access$000
 is near the top. It turns out that this is because the
 ReduceContextImpl class has a member backupStore which is accessed
 from an inner class ReduceContextImpl$ValueIterator. Due to the way
 synthetic accessor methods work, every access of backupStore results
 in a call to access$000 on the outer class. For some portion of the
 run, backupStore is null and the BackupStore class has never been
 loaded by the reducer.
 Due to the way the HotSpot JVM inliner works, by default it will not
 inline a short method where the class of the return value object
 is unloaded. If you use a debug JVM with -XX:+PrintCompilation you
 will see a failure reason message like "unloaded signature classes".
 This causes every call to ReduceContextImpl.access$000 to be executed
 in the interpreter for the handful of bytecodes to return the null
 backupStore.
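
 The accessor mechanism described above can be reproduced outside Hadoop.
 The sketch below is an illustration only: the class names are hypothetical,
 and widening the field to package-private is one possible way to avoid the
 accessor, not necessarily what the attached patch does. Compilers targeting
 releases before Java 11 emit access$000 for the private variant; newer
 releases use nest-based access control and emit no accessor, so the first
 count printed depends on the compiler used.

```java
import java.lang.reflect.Method;

class SyntheticAccessorDemo {
    // Private field read from an inner class: javac releases before 11
    // generate a synthetic static accessor on the outer class for this read.
    static class WithPrivateField {
        private Object backupStore;   // stays null in this sketch

        class ValueIterator {
            boolean hasBackup() {
                return backupStore != null;
            }
        }
    }

    // Same shape, but the field is package-private, so the inner class
    // reads it directly and no accessor method is generated.
    static class WithPackagePrivateField {
        Object backupStore;           // stays null in this sketch

        class ValueIterator {
            boolean hasBackup() {
                return backupStore != null;
            }
        }
    }

    // Counts the compiler-generated (synthetic) methods declared on a class.
    static int syntheticMethodCount(Class<?> type) {
        int count = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("private field:         "
                + syntheticMethodCount(WithPrivateField.class));
        System.out.println("package-private field: "
                + syntheticMethodCount(WithPackagePrivateField.class));
    }
}
```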





[jira] [Commented] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.

2011-06-30 Thread Ravi Teja Ch N V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057857#comment-13057857
 ] 

Ravi Teja Ch N V commented on MAPREDUCE-2632:
-

The failing tests are not related to this patch.

 Avoid calling the partitioner when the numReduceTasks is 1.
 ---

 Key: MAPREDUCE-2632
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2632
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.23.0
Reporter: Ravi Teja Ch N V
 Attachments: MAPREDUCE-2632.patch


 We can avoid the call to the partitioner when the number of reducers is 1.
 This will avoid unnecessary computation by the partitioner.





[jira] [Created] (MAPREDUCE-2633) MR-279: Add a getCounter(Enum) method to the Counters interface

2011-06-30 Thread Josh Wills (JIRA)
MR-279: Add a getCounter(Enum) method to the Counters interface
---

 Key: MAPREDUCE-2633
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2633
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
 Environment: All
Reporter: Josh Wills
Priority: Minor


I'm fixing a few TODOs I came across in TaskAttemptImpl.java related to the 
fact that the MRv2 Counters interface doesn't expose a getCounter(Enum) method 
for accessing a Counter using the enum's class as the group name and the enum's 
value as the name of the counter.

Will add the patch momentarily.
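
The lookup described above might look roughly like the following sketch. 
SimpleCounters and DemoTaskCounter are hypothetical names made up for this 
illustration; the real MRv2 Counters interface and the attached patch may 
differ in names, types and return values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

class EnumCounterDemo {
    // Hypothetical counter enum, used only for this illustration.
    enum DemoTaskCounter { MAP_INPUT_RECORDS, MAP_OUTPUT_RECORDS }

    // Hypothetical stand-in for a Counters implementation:
    // group name -> counter name -> value.
    static class SimpleCounters {
        private final Map<String, Map<String, AtomicLong>> groups = new HashMap<>();

        // Lookup by explicit group and counter name; creates the counter on first use.
        AtomicLong getCounter(String group, String name) {
            return groups
                    .computeIfAbsent(group, g -> new HashMap<>())
                    .computeIfAbsent(name, n -> new AtomicLong());
        }

        // Enum overload: the enum's class name is the group name and the
        // constant's name is the counter name.
        AtomicLong getCounter(Enum<?> key) {
            return getCounter(key.getDeclaringClass().getName(), key.name());
        }
    }

    public static void main(String[] args) {
        SimpleCounters counters = new SimpleCounters();
        counters.getCounter(DemoTaskCounter.MAP_INPUT_RECORDS).incrementAndGet();
        long viaNames = counters
                .getCounter(DemoTaskCounter.class.getName(), "MAP_INPUT_RECORDS")
                .get();
        System.out.println("MAP_INPUT_RECORDS = " + viaNames);
    }
}
```

Both overloads resolve to the same underlying counter, so callers holding an 
enum constant do not need to build the group and counter names by hand.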





[jira] [Updated] (MAPREDUCE-2633) MR-279: Add a getCounter(Enum) method to the Counters interface

2011-06-30 Thread Josh Wills (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Wills updated MAPREDUCE-2633:
--

Attachment: MAPREDUCE-2633.patch

 MR-279: Add a getCounter(Enum) method to the Counters interface
 ---

 Key: MAPREDUCE-2633
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2633
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
 Environment: All
Reporter: Josh Wills
Priority: Minor
 Attachments: MAPREDUCE-2633.patch


 I'm fixing a few TODOs I came across in TaskAttemptImpl.java related to the 
 fact that the MRv2 Counters interface doesn't expose a getCounter(Enum) method 
 for accessing a Counter using the enum's class as the group name and the 
 enum's value as the name of the counter.
 Will add the patch momentarily.





[jira] [Commented] (MAPREDUCE-2243) Close all the file streams properly in a finally block to avoid their leakage.

2011-06-30 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057955#comment-13057955
 ] 

Eli Collins commented on MAPREDUCE-2243:


FYI, HADOOP-7428 is a case where the RTE is relevant.

 Close all the file streams properly in a finally block to avoid their leakage.
 -

 Key: MAPREDUCE-2243
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2243
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker, tasktracker
Affects Versions: 0.20.1, 0.22.0, 0.23.0
 Environment: NA
Reporter: Bhallamudi Venkata Siva Kamesh
Assignee: Devaraj K
Priority: Minor
 Attachments: MAPREDUCE-2243.patch

   Original Estimate: 72h
  Remaining Estimate: 72h

 In the following classes, streams should be closed in a finally block to
 avoid leaking them when an exception is thrown.
 CompletedJobStatusStore.java
 --
   dataOut.writeInt(events.length);
   for (TaskCompletionEvent event : events) {
     event.write(dataOut);
   }
   dataOut.close();
 EventWriter.java
 --
   encoder.flush();
   out.close();
 MapTask.java
 ---
   splitMetaInfo.write(out);
   out.close();
 TaskLog
 --
   1) str = fis.readLine();
      fis.close();
   2) dos.writeBytes(Long.toString(new File(logLocation, LogName.SYSLOG
          .toString()).length() - prevLogLength) + "\n");
      dos.close();
 TotalOrderPartitioner.java
 ---
   while (reader.next(key, value)) {
     parts.add(key);
     key = ReflectionUtils.newInstance(keyClass, conf);
   }
   reader.close();
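
 The pattern being requested can be sketched in plain Java as follows. The
 names TrackingStream and writeEvents are hypothetical and exist only for
 this illustration; the sketch mirrors the CompletedJobStatusStore snippet
 but is not the attached patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class CloseInFinallyDemo {
    // In-memory stream that records whether close() was called,
    // so the example can detect a leaked stream.
    static class TrackingStream extends ByteArrayOutputStream {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Writes the event count followed by each event. A negative event
    // simulates a failure in the middle of the write.
    static void writeEvents(OutputStream sink, int[] events) throws IOException {
        DataOutputStream dataOut = new DataOutputStream(sink);
        try {
            dataOut.writeInt(events.length);
            for (int event : events) {
                if (event < 0) {
                    throw new IOException("bad event: " + event);
                }
                dataOut.writeInt(event);
            }
        } finally {
            // Runs on both the normal and the exceptional path;
            // closing the wrapper also closes the underlying sink.
            dataOut.close();
        }
    }

    public static void main(String[] args) {
        TrackingStream sink = new TrackingStream();
        try {
            writeEvents(sink, new int[] {1, -1, 3});
        } catch (IOException e) {
            System.out.println("write failed: " + e.getMessage());
        }
        System.out.println("stream closed: " + sink.closed);
    }
}
```

 Note that an exception thrown by close() inside the finally block would
 replace the original exception, so real code may prefer a helper that
 closes quietly on the failure path.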





[jira] [Commented] (MAPREDUCE-2629) Class loading quirk prevents inner class method compilation

2011-06-30 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057956#comment-13057956
 ] 

Todd Lipcon commented on MAPREDUCE-2629:


OK. I'm happy to +1 as is, but if you think there's an improvement, I'll hold 
off. Enjoy your vacation.

 Class loading quirk prevents inner class method compilation
 ---

 Key: MAPREDUCE-2629
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2629
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 0.21.0, 0.22.0
Reporter: Eric Caspole
Priority: Minor
 Attachments: MAPREDUCE-2629.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 While profiling jobs like terasort and gridmix, I noticed that the
 method org.apache.hadoop.mapreduce.task.ReduceContextImpl.access$000
 is near the top. It turns out that this is because the
 ReduceContextImpl class has a member backupStore which is accessed
 from an inner class ReduceContextImpl$ValueIterator. Due to the way
 synthetic accessor methods work, every access of backupStore results
 in a call to access$000 on the outer class. For some portion of the
 run, backupStore is null and the BackupStore class has never been
 loaded by the reducer.
 Due to the way the HotSpot JVM inliner works, by default it will not
 inline a short method where the class of the return value object
 is unloaded. If you use a debug JVM with -XX:+PrintCompilation you
 will see a failure reason message like "unloaded signature classes".
 This causes every call to ReduceContextImpl.access$000 to be executed
 in the interpreter for the handful of bytecodes to return the null
 backupStore.





[jira] [Commented] (MAPREDUCE-279) Map-Reduce 2.0

2011-06-30 Thread Giridharan Kesavan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058060#comment-13058060
 ] 

Giridharan Kesavan commented on MAPREDUCE-279:
--

Nigel/Arun, I can help set up a build for MR-279.

 Map-Reduce 2.0
 --

 Key: MAPREDUCE-279
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.23.0

 Attachments: MR-279.patch, MR-279.patch, MR-279.sh, 
 MR-279_MR_files_to_move.txt, capacity-scheduler-dark-theme.png, 
 multi-column-stable-sort-default-theme.png, yarn-state-machine.job.dot, 
 yarn-state-machine.job.png, yarn-state-machine.task-attempt.dot, 
 yarn-state-machine.task-attempt.png, yarn-state-machine.task.dot, 
 yarn-state-machine.task.png


 Re-factor MapReduce into a generic resource scheduler and a per-job, 
 user-defined component that manages the application execution.
 Check it out by following [the instructions|http://goo.gl/rSJJC].





[jira] [Commented] (MAPREDUCE-2627) guava-r09 JAR file needs to be added to mapreduce.

2011-06-30 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058174#comment-13058174
 ] 

Luke Lu commented on MAPREDUCE-2627:


Ah, I see the problem is in branch-0.22. It was introduced by merging HDFS-941 
from trunk, which pulls a guava dependency into hdfs but did not update 
ivy/hadoop-hdfs-template.xml. Trunk already declares the dependency in 
hadoop-common-template.xml, which suffices for trunk mapreduce.

 guava-r09 JAR file needs to be added to mapreduce.
 --

 Key: MAPREDUCE-2627
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2627
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 0.22.0
Reporter: Plamen Jeliazkov
Priority: Trivial
   Original Estimate: 24h
  Remaining Estimate: 24h

 Need to add the guava-r09.jar file into the 
 mapreduce/build/ivy/lib/Hadoop/common directory; missing from build.





[jira] [Created] (MAPREDUCE-2634) MapReduce Performance Improvements using forced heartbeat

2011-06-30 Thread Abhijit Suresh Shingate (JIRA)
MapReduce Performance Improvements using forced heartbeat 
--

 Key: MAPREDUCE-2634
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2634
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Abhijit Suresh Shingate
Priority: Minor


The following proposals could improve MapReduce performance.

1. Notify TaskTracker to send a heartbeat when a new Job is submitted
  a) Presently, when a new Job is submitted to JobTracker, its tasks are 
assigned to a TaskTracker only when that TaskTracker sends a heartbeat to 
JobTracker.
  b) Proposal:
    (1). JobTracker will notify all TaskTrackers to send a heartbeat to 
JobTracker whenever a new Job is submitted, so that the Tasks of the new Job 
can be assigned to all TaskTrackers immediately.

2. Execute Job Setup and Cleanup on JobTracker JVM
  a) Presently, Job Setup and Cleanup are carried out as a separate task on 
TaskTracker.
  b) Launching a new JVM for Setup and Cleanup of the Job introduces some 
overhead, generally about 0.7 - 1.5 seconds.
  c) Proposal:
    (1). JobTracker will execute the Job Setup and Cleanup tasks in the 
JobTracker JVM only.

3. Request TaskTracker to send a heartbeat when the Map Task is completed
  a) Presently, TaskTracker reports the status of completed Map Tasks as part 
of its heartbeat at a regular interval.
  b) Proposal:
    (1). Map Task requests TaskTracker to send a heartbeat to JobTracker when 
the Map Task is completed, so that Reduce tasks can quickly learn which map 
tasks have finished and copy their outputs locally.

4. Request JobTracker to trigger committing of Reduce output when Reduce Task 
has finished
  a) Presently, JobTracker asks the Reduce Task to commit its output to HDFS 
through the heartbeat response.
  b) Proposal:
    (1). Reduce Task requests TaskTracker to send a heartbeat to JobTracker 
whenever the Reduce Task is completed.

These optimizations might work on small clusters, but on big clusters they may 
add overhead.

Please let us know your views.
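
As a rough illustration of the "forced heartbeat" idea on the sending side, 
the sketch below shows a wait that normally lasts the regular interval but 
returns early when an event asks for a heartbeat. HeartbeatTrigger and its 
methods are hypothetical names invented here; this is not TaskTracker code, 
and it does not cover how JobTracker would reach the TaskTrackers in 
proposal 1.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class ForcedHeartbeatDemo {
    // Hypothetical helper: lets a heartbeat loop sleep for the regular
    // interval but wake up early when some event requests a heartbeat.
    static class HeartbeatTrigger {
        private final Semaphore requests = new Semaphore(0);

        // Called by an event source, for example when a task completes.
        void requestHeartbeat() {
            requests.release();
        }

        // Waits up to intervalMs. Returns true if woken by a request,
        // false if the regular interval elapsed without one.
        boolean awaitNextHeartbeat(long intervalMs) throws InterruptedException {
            boolean forced = requests.tryAcquire(intervalMs, TimeUnit.MILLISECONDS);
            // Coalesce any other pending requests into this one heartbeat.
            requests.drainPermits();
            return forced;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HeartbeatTrigger trigger = new HeartbeatTrigger();
        trigger.requestHeartbeat();   // e.g. a map task just finished
        System.out.println("forced: " + trigger.awaitNextHeartbeat(3000));
        System.out.println("forced: " + trigger.awaitNextHeartbeat(50));
    }
}
```

Draining the pending requests means that a burst of task completions results 
in a single early heartbeat rather than one per task, which bounds the extra 
load on the receiver.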






[jira] [Updated] (MAPREDUCE-2634) MapReduce Performance Improvements using forced heartbeat

2011-06-30 Thread Abhijit Suresh Shingate (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhijit Suresh Shingate updated MAPREDUCE-2634:
---

Description: 
Following are the proposals which would cause some performance optimizations 
over MapReduce

*1. Notify TaskTracker to send a heartbeat when a new Job is submitted*
  a) Presently, when a new Job is submitted to JobTracker, its tasks are assigned 
to a TaskTracker only when that TaskTracker sends a heartbeat to JobTracker.
  b) Proposal:
- JobTracker will notify all TaskTrackers to send heartbeat to 
JobTracker whenever a new Job is submitted to JobTracker. So that the Tasks of 
the new Job can be immediately assigned to all TaskTrackers. 

*2. Execute Job Setup and Cleanup on JobTracker JVM*
  a) Presently, Job Setup and Cleanup are carried out as a separate task on 
TaskTracker.
  b) Launching a new JVM for Setup and Cleanup of the Job introduces some 
amount of overhead. It takes generally about 0.7 - 1.5 seconds.
  c) Proposal:
- JobTracker will execute the Job Setup and Cleanup tasks on the 
JobTracker JVM only.
*3. Request TaskTracker to send heartbeat when the Map Task is completed.*
  a) Presently TaskTracker reports status of completed Map Tasks as part of 
heartbeat at a regular interval.
  b) Proposal:
- Map Task requests TaskTracker to send heartbeat to JobTracker when 
Map Task is completed. So that Reduce task can quickly know which map task is 
finished and copy map outputs to local.
*4. Request JobTracker to trigger committing of Reduce output when Reduce Task 
has finished. *
  a) Presently JobTracker will ask the Reduce Task to commit its output to HDFS 
through heartbeat response.
  b) Proposal:
- Reduce Task requests TaskTracker to send heartbeat to JobTracker 
whenever Reduce Task is completed.

These optimizations might work on small clusters but on big clusters it may be 
overhead.

Please let us know your views.


  was:
Following are the proposals which would cause some performance optimizations 
over MapReduce

1. Notify TaskTracker to send a heartbeat when a new Job is submitted
  a) Presently, when a new Job is submitted to JobTracker, its tasks are assigned 
to a TaskTracker only when that TaskTracker sends a heartbeat to JobTracker.
  b) Proposal:
(1). JobTracker will notify all TaskTrackers to send heartbeat to 
JobTracker whenever a new Job is submitted to JobTracker. So that the Tasks of 
the new Job can be immediately assigned to all TaskTrackers. 

2. Execute Job Setup and Cleanup on JobTracker JVM
  a) Presently, Job Setup and Cleanup are carried out as a separate task on 
TaskTracker.
  b) Launching a new JVM for Setup and Cleanup of the Job introduces some 
amount of overhead. It takes generally about 0.7 - 1.5 seconds.
  c) Proposal:
(1). JobTracker will execute the Job Setup and Cleanup tasks on the 
JobTracker JVM only.
3. Request TaskTracker to send heartbeat when the Map Task is completed.
  a) Presently TaskTracker reports status of completed Map Tasks as part of 
heartbeat at a regular interval.
  b) Proposal:
   (1). Map Task requests TaskTracker to send heartbeat to JobTracker when Map 
Task is completed. So that Reduce task can quickly know which map task is 
finished and copy map outputs to local.
4. Request JobTracker to trigger committing of Reduce output when Reduce Task 
has finished. 
  a) Presently JobTracker will ask the Reduce Task to commit its output to HDFS 
through heartbeat response.
  b) Proposal:
   (1). Reduce Task requests TaskTracker to send heartbeat to JobTracker 
whenever Reduce Task is completed.

These optimizations might work on small clusters but on big clusters it may be 
overhead.

Please let us know your views.



 MapReduce Performance Improvements using forced heartbeat 
 --

 Key: MAPREDUCE-2634
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2634
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Abhijit Suresh Shingate
Priority: Minor
   Original Estimate: 168h
  Remaining Estimate: 168h

 Following are the proposals which would cause some performance optimizations 
 over MapReduce
 *1. Notify TaskTracker to send a heartbeat when a new Job is submitted*
   a) Presently, when a new Job is submitted to JobTracker, its tasks are 
 assigned to a TaskTracker only when that TaskTracker sends a heartbeat to 
 JobTracker.
   b) Proposal:
 - JobTracker will notify all TaskTrackers to send heartbeat to 
 JobTracker whenever a new Job is submitted to JobTracker. So that the Tasks 
 of the new Job can be immediately assigned to all TaskTrackers. 
 *2. Execute Job Setup and Cleanup on JobTracker JVM*
   a) Presently, Job Setup and Cleanup are carried out as a separate task on 
 TaskTracker.
   b) Launching a new JVM for Setup and 

[jira] [Updated] (MAPREDUCE-2634) MapReduce Performance Improvements using forced heartbeat

2011-06-30 Thread Abhijit Suresh Shingate (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhijit Suresh Shingate updated MAPREDUCE-2634:
---

Description: 
Following are the proposals which would cause some performance optimizations 
over MapReduce

*1. Notify TaskTracker to send a heartbeat when a new Job is submitted*
  a) Presently, when a new Job is submitted to JobTracker, its tasks are assigned 
to a TaskTracker only when that TaskTracker sends a heartbeat to JobTracker.
  b) Proposal:
- JobTracker will notify all TaskTrackers to send heartbeat to 
JobTracker whenever a new Job is submitted to JobTracker. So that the Tasks of 
the new Job can be immediately assigned to all TaskTrackers. 

*2. Execute Job Setup and Cleanup on JobTracker JVM*
  a) Presently, Job Setup and Cleanup are carried out as a separate task on 
TaskTracker.
  b) Launching a new JVM for Setup and Cleanup of the Job introduces some 
amount of overhead. It takes generally about 0.7 - 1.5 seconds.
  c) Proposal:
- JobTracker will execute the Job Setup and Cleanup tasks on the 
JobTracker JVM only.

*3. Request TaskTracker to send heartbeat when the Map Task is completed.*
  a) Presently TaskTracker reports status of completed Map Tasks as part of 
heartbeat at a regular interval.
  b) Proposal:
- Map Task requests TaskTracker to send heartbeat to JobTracker when 
Map Task is completed. So that Reduce task can quickly know which map task is 
finished and copy map outputs to local.

*4. Request JobTracker to trigger committing of Reduce output when Reduce Task 
has finished.*
  a) Presently JobTracker will ask the Reduce Task to commit its output to HDFS 
through heartbeat response.
  b) Proposal:
- Reduce Task requests TaskTracker to send heartbeat to JobTracker 
whenever Reduce Task is completed.


These optimizations might work on small clusters but on big clusters it may be 
overhead.

Please let us know your views.


  was:
Following are the proposals which would cause some performance optimizations 
over MapReduce

*1. Notify TaskTracker to send a heartbeat when a new Job is submitted*
  a) Presently, when a new Job is submitted to JobTracker, its tasks are assigned 
to a TaskTracker only when that TaskTracker sends a heartbeat to JobTracker.
  b) Proposal:
- JobTracker will notify all TaskTrackers to send heartbeat to 
JobTracker whenever a new Job is submitted to JobTracker. So that the Tasks of 
the new Job can be immediately assigned to all TaskTrackers. 

*2. Execute Job Setup and Cleanup on JobTracker JVM*
  a) Presently, Job Setup and Cleanup are carried out as a separate task on 
TaskTracker.
  b) Launching a new JVM for Setup and Cleanup of the Job introduces some 
amount of overhead. It takes generally about 0.7 - 1.5 seconds.
  c) Proposal:
- JobTracker will execute the Job Setup and Cleanup tasks on the 
JobTracker JVM only.
*3. Request TaskTracker to send heartbeat when the Map Task is completed.*
  a) Presently TaskTracker reports status of completed Map Tasks as part of 
heartbeat at a regular interval.
  b) Proposal:
- Map Task requests TaskTracker to send heartbeat to JobTracker when 
Map Task is completed. So that Reduce task can quickly know which map task is 
finished and copy map outputs to local.
*4. Request JobTracker to trigger committing of Reduce output when Reduce Task 
has finished. *
  a) Presently JobTracker will ask the Reduce Task to commit its output to HDFS 
through heartbeat response.
  b) Proposal:
- Reduce Task requests TaskTracker to send heartbeat to JobTracker 
whenever Reduce Task is completed.

These optimizations might work on small clusters but on big clusters it may be 
overhead.

Please let us know your views.



 MapReduce Performance Improvements using forced heartbeat 
 --

 Key: MAPREDUCE-2634
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2634
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Abhijit Suresh Shingate
Priority: Minor
   Original Estimate: 168h
  Remaining Estimate: 168h

 Following are the proposals which would cause some performance optimizations 
 over MapReduce
 *1. Notify TaskTracker to send a heartbeat when a new Job is submitted*
   a) Presently, when a new Job is submitted to JobTracker, its tasks are 
 assigned to a TaskTracker only when that TaskTracker sends a heartbeat to 
 JobTracker.
   b) Proposal:
 - JobTracker will notify all TaskTrackers to send heartbeat to 
 JobTracker whenever a new Job is submitted to JobTracker. So that the Tasks 
 of the new Job can be immediately assigned to all TaskTrackers. 
 *2. Execute Job Setup and Cleanup on JobTracker JVM*
   a) Presently, Job Setup and Cleanup are carried out as a separate task on 
 TaskTracker.
   b) Launching a new JVM 

[jira] [Commented] (MAPREDUCE-2634) MapReduce Performance Improvements using forced heartbeat

2011-06-30 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058203#comment-13058203
 ] 

Todd Lipcon commented on MAPREDUCE-2634:


Proposal #1 seems like an interesting idea, but I'm skeptical that it will make 
a big difference, since we've already lowered the minimum heartbeat interval to 
300ms in MAPREDUCE-1906.
Proposal #2 seems scary since setup and cleanup may run user code, and running 
user code in the JobTracker JVM is insecure. Piggybacking those with other map 
tasks, though, is probably a good idea (for some reason I don't think we do 
this with JVM reuse today).
Your proposals #3 and #4 are already implemented by MAPREDUCE-270, if I 
understand you correctly.

 MapReduce Performance Improvements using forced heartbeat 
 --

 Key: MAPREDUCE-2634
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2634
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Abhijit Suresh Shingate
Priority: Minor
   Original Estimate: 168h
  Remaining Estimate: 168h

 Following are the proposals which would cause some performance optimizations 
 over MapReduce
 *1. Notify TaskTracker to send a heartbeat when a new Job is submitted*
   a) Presently, when a new Job is submitted to JobTracker, its tasks are 
 assigned to a TaskTracker only when that TaskTracker sends a heartbeat to 
 JobTracker.
   b) Proposal:
 - JobTracker will notify all TaskTrackers to send heartbeat to 
 JobTracker whenever a new Job is submitted to JobTracker. So that the Tasks 
 of the new Job can be immediately assigned to all TaskTrackers. 
 *2. Execute Job Setup and Cleanup on JobTracker JVM*
   a) Presently, Job Setup and Cleanup are carried out as a separate task on 
 TaskTracker.
   b) Launching a new JVM for Setup and Cleanup of the Job introduces some 
 amount of overhead. It takes generally about 0.7 - 1.5 seconds.
   c) Proposal:
 - JobTracker will execute the Job Setup and Cleanup tasks on the 
 JobTracker JVM only.
 *3. Request TaskTracker to send heartbeat when the Map Task is completed.*
   a) Presently TaskTracker reports status of completed Map Tasks as part of 
 heartbeat at a regular interval.
   b) Proposal:
 - Map Task requests TaskTracker to send heartbeat to JobTracker when 
 Map Task is completed. So that Reduce task can quickly know which map task is 
 finished and copy map outputs to local.
 *4. Request JobTracker to trigger committing of Reduce output when Reduce 
 Task has finished.*
   a) Presently JobTracker will ask the Reduce Task to commit its output to 
 HDFS through heartbeat response.
   b) Proposal:
 - Reduce Task requests TaskTracker to send heartbeat to JobTracker 
 whenever Reduce Task is completed.
 These optimizations might work on small clusters but on big clusters it may 
 be overhead.
 Please let us know your views.





[jira] [Commented] (MAPREDUCE-2634) MapReduce Performance Improvements using forced heartbeat

2011-06-30 Thread M. C. Srivas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058213#comment-13058213
 ] 

M. C. Srivas commented on MAPREDUCE-2634:
-

As Todd says, the setup/cleanup in proposal #2 should remain a task to be 
scheduled on a node like any other task. The setup/cleanup might consume 
enormous resources and/or take a long time to run.






[jira] [Commented] (MAPREDUCE-2634) MapReduce Performance Improvements using forced heartbeat

2011-06-30 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058214#comment-13058214
 ] 

Owen O'Malley commented on MAPREDUCE-2634:
--

Proposal #1 breaks the architecture by making a loop in the communication 
graph. That means you need to work very hard to ensure there aren't any 
distributed deadlocks. (We also need some way to minimize the chance that 
someone introduces one later when working on the code.) I don't see how you'll 
get enough performance out of that to risk the potential for deadlocks.

Proposal #2 violates the security model of never running user code in the 
servers. That is pretty much a non-starter since it would require security 
managers in all of the servers, which would impose a huge performance 
degradation based on everything I've read about them.

Proposal #3 has been there for a long time.

If proposal #4 isn't already done, we should do it. A fast heartbeat for 
committing the output makes sense. 
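
The loop mentioned for proposal #1 can be illustrated with a small sketch.
It models "who initiates calls to whom" as a directed graph and checks it for
a cycle with a depth-first search. The graph below is a simplified
assumption about the daemons involved, not an exact map of Hadoop's RPC
protocols, and a cycle only signals where deadlock analysis would be needed,
not that a deadlock exists.

```python
def has_cycle(graph):
    """Return True if the directed graph (dict: node -> list of callees) has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for callee in graph.get(node, []):
            state = color.get(callee, WHITE)
            if state == GRAY:
                # Back edge: callee is already on the current path.
                return True
            if state == WHITE and visit(callee):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


# Assumed call directions today: every call flows towards the JobTracker.
current = {
    "Task": ["TaskTracker"],
    "TaskTracker": ["JobTracker"],
    "JobClient": ["JobTracker"],
    "JobTracker": [],
}

# Proposal #1 adds a call from the JobTracker back to the TaskTrackers.
with_proposal_1 = dict(current, JobTracker=["TaskTracker"])

print(has_cycle(current), has_cycle(with_proposal_1))
```

With the assumed edges, the current graph is acyclic, while the graph with
the JobTracker-to-TaskTracker notification contains the cycle
TaskTracker -> JobTracker -> TaskTracker.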

