[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-14 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888209#action_12888209
 ] 

Ravi Gummadi commented on MAPREDUCE-1925:
-

'git diff' couldn't capture the changes to the gzipped file in my attached patch.
I will remove the whole file and keep the expected output in an array in the 
test case itself --- as suggested by Amar offline.

 TestRumenJobTraces fails in trunk
 -

 Key: MAPREDUCE-1925
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tools/rumen
Affects Versions: 0.22.0
Reporter: Amareshwari Sriramadasu
Assignee: Ravi Gummadi
 Fix For: 0.22.0

 Attachments: 1925.patch


 TestRumenJobTraces failed with following error:
 Error Message
 the gold file contains more text at line 1 expected:<56> but was:<0>
 Stacktrace
   at 
 org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
 Full log of the failure is available at 
 http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-14 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888212#action_12888212
 ] 

Amar Kamat commented on MAPREDUCE-1925:
---

A few comments:
# The patch doesn't contain the changes to the 
{{v20-single-input-log-event-classes.text.gz}} file. 
# You can get rid of the {{inputLogStream}} variable to avoid future confusion.
# You can keep the resulting events list and the gold-standard list in memory. 
Instead of writing the test events into a file ({{result.txt}}) and then 
comparing the contents of 2 files ({{result.txt}} and 
{{v20-single-input-log-event-classes.text.gz}}), you can keep the contents of 
both files in memory and get rid of {{result.txt}} and 
{{v20-single-input-log-event-classes.text.gz}}. The test will be faster and 
also easier to change in future. See the sketch after these comments.
# If you decide to do the above then there is no need for {{tempDir}} and 
{{rootTempDir}}.
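
A rough sketch of what the in-memory comparison could look like (the gold 
entries and class names here are only placeholders, not the real expected 
events):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import static org.junit.Assert.assertEquals;

public class InMemoryGoldSketch {
  // Placeholder gold list; the real one would hold the event classes that
  // v20-single-input-log-event-classes.text.gz used to contain.
  static final List<String> GOLD = Arrays.asList(
      "EventClassA", "EventClassB", "EventClassC");

  static void verify(List<String> parsedEventClasses) {
    // Direct list comparison replaces diffing result.txt against the gold file.
    assertEquals("Unexpected sequence of history event classes",
        GOLD, parsedEventClasses);
  }

  public static void main(String[] args) {
    List<String> parsed = new ArrayList<String>(GOLD);  // stand-in for parser output
    verify(parsed);
  }
}
{code}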

 TestRumenJobTraces fails in trunk
 -

 Key: MAPREDUCE-1925
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tools/rumen
Affects Versions: 0.22.0
Reporter: Amareshwari Sriramadasu
Assignee: Ravi Gummadi
 Fix For: 0.22.0

 Attachments: 1925.patch


 TestRumenJobTraces failed with following error:
 Error Message
 the gold file contains more text at line 1 expected:<56> but was:<0>
 Stacktrace
   at 
 org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
 Full log of the failure is available at 
 http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1415) With streaming jobs and LinuxTaskController, the localized streaming binary has 571 permissions instead of 570

2010-07-14 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1415:
---

Status: Open  (was: Patch Available)

Patch needs to be updated to trunk.

 With streaming jobs and LinuxTaskController, the localized streaming binary 
 has 571 permissions instead of 570
 --

 Key: MAPREDUCE-1415
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1415
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming, security
Reporter: Vinod K V
Assignee: Amareshwari Sriramadasu
 Fix For: 0.22.0

 Attachments: patch-1415-1.txt, patch-1415-2.txt, patch-1415-3.txt, 
 patch-1415.txt


 After MAPREDUCE-856, all localized files are expected to have **0 permissions 
 for the sake of security.
 This was found by Karam while testing LinuxTaskController functionality after 
 MAPREDUCE-856.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1686) ClassNotFoundException for custom format classes provided in libjars

2010-07-14 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888221#action_12888221
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1686:


Paul, can you create the patch with the suggested change and a unit test, and 
upload it here? 

 ClassNotFoundException for custom format classes provided in libjars
 

 Key: MAPREDUCE-1686
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1686
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.20.2
Reporter: Paul Burkhardt
Priority: Minor

 The StreamUtil::goodClassOrNull method assumes user-provided classes have 
 package names and if not, they are part of the Hadoop Streaming package. For 
 example, using custom InputFormat or OutputFormat classes without package 
 names will fail with a ClassNotFound exception which is not indicative given 
 the classes are provided in the libjars option. Admittedly, most Java 
 classes should have a package name so this should rarely come up.
 Possible resolution options:
 1) modify the error message to include the actual classname that was 
 attempted in the goodClassOrNull method
 2) call the Configuration::getClassByName method first and if class not found 
 check for default package name and try the call again
 {code}
 public static Class goodClassOrNull(Configuration conf, String className,
     String defaultPackage) {
   Class clazz = null;
   try {
     clazz = conf.getClassByName(className);
   } catch (ClassNotFoundException cnf) {
   }
   if (clazz == null) {
     if (className.indexOf('.') == -1 && defaultPackage != null) {
       className = defaultPackage + "." + className;
       try {
         clazz = conf.getClassByName(className);
       } catch (ClassNotFoundException cnf) {
       }
     }
   }
   return clazz;
 }
 {code}
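
For illustration, a hypothetical call against the signature quoted above (the 
default package string below is just an example, not taken from the streaming 
code):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.streaming.StreamUtil;

public class GoodClassOrNullUsage {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // A bare class name falls back to the supplied default package.
    Class clazz = StreamUtil.goodClassOrNull(conf, "MyInputFormat",
        "org.apache.hadoop.streaming");
    System.out.println(clazz == null ? "not found" : clazz.getName());
  }
}
{code}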

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1865) [Rumen] Rumen should also support jobhistory files generated using trunk

2010-07-14 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-1865:
--

Attachment: mapreduce-1865-v1.7.1.patch

Attaching a slightly modified patch with changes to comments and assert 
messages.

 [Rumen] Rumen should also support jobhistory files generated using trunk
 

 Key: MAPREDUCE-1865
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1865
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tools/rumen
Affects Versions: 0.22.0
Reporter: Amar Kamat
Assignee: Amar Kamat
 Fix For: 0.22.0

 Attachments: mapreduce-1865-v1.2.patch, mapreduce-1865-v1.6.2.patch, 
 mapreduce-1865-v1.7.1.patch, mapreduce-1865-v1.7.patch


 Rumen code in trunk parses and processes only jobhistory files from pre-21 
 hadoop mapreduce clusters. It should also support jobhistory files generated 
 using trunk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1878) Add MRUnit documentation

2010-07-14 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888238#action_12888238
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1878:


I think the document can be added as package.html in the mrunit package instead 
of a .txt file, similar to all other packages. 

 Add MRUnit documentation
 

 Key: MAPREDUCE-1878
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1878
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/mrunit
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-1878.2.patch, MAPREDUCE-1878.patch


 A short user guide for MRUnit, written in asciidoc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1713) Utilities for system tests specific.

2010-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888243#action_12888243
 ] 

Hadoop QA commented on MAPREDUCE-1713:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449108/MAPREDUCE-1713.patch
  against trunk revision 962682.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/console

This message is automatically generated.

 Utilities for system tests specific.
 

 Key: MAPREDUCE-1713
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1713
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: test
Affects Versions: 0.21.0
Reporter: Vinay Kumar Thota
Assignee: Vinay Kumar Thota
 Attachments: 1713-ydist-security.patch, 1713-ydist-security.patch, 
 1713-ydist-security.patch, 1713-ydist-security.patch, 
 1713-ydist-security.patch, MAPREDUCE-1713.patch, MAPREDUCE-1713.patch, 
 MAPREDUCE-1713.patch, systemtestutils_MR1713.patch, 
 utilsforsystemtest_1713.patch


 1. A method for restarting the daemon with new configuration.
   public static void restartCluster(Hashtable<String,Long> props, String 
 confFile) throws Exception;
 2. A method for resetting the daemon with default configuration.
   public void resetCluster() throws Exception;
 3. A method for waiting until the daemon stops.
   public void waitForClusterToStop() throws Exception;
 4. A method for waiting until the daemon starts.
   public void waitForClusterToStart() throws Exception;
 5. A method for checking whether the job has started or not.
   public boolean isJobStarted(JobID id) throws IOException;
 6. A method for checking whether the task has started or not.
   public boolean isTaskStarted(TaskInfo taskInfo) throws IOException;
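
For illustration, a hypothetical test flow using the proposed utilities (the 
helper class and the property used below are placeholders, not the committed 
API):

{code}
import java.util.Hashtable;

public class ClusterRestartSketch {
  public static void main(String[] args) throws Exception {
    Hashtable<String, Long> props = new Hashtable<String, Long>();
    props.put("mapreduce.tasktracker.map.tasks.maximum", 4L);

    ClusterHelper cluster = new ClusterHelper();             // placeholder proxy
    ClusterHelper.restartCluster(props, "mapred-site.xml");  // proposed method 1
    cluster.waitForClusterToStart();                         // proposed method 4
    // ... run the workload under test ...
    cluster.resetCluster();                                  // proposed method 2
    cluster.waitForClusterToStart();
  }
}

// Stub mirroring the proposed signatures so the sketch compiles.
class ClusterHelper {
  static void restartCluster(Hashtable<String, Long> props, String confFile)
      throws Exception {}
  void resetCluster() throws Exception {}
  void waitForClusterToStop() throws Exception {}
  void waitForClusterToStart() throws Exception {}
}
{code}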

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1865) [Rumen] Rumen should also support jobhistory files generated using trunk

2010-07-14 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-1865:


Status: Patch Available  (was: Open)

 [Rumen] Rumen should also support jobhistory files generated using trunk
 

 Key: MAPREDUCE-1865
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1865
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tools/rumen
Affects Versions: 0.22.0
Reporter: Amar Kamat
Assignee: Amar Kamat
 Fix For: 0.22.0

 Attachments: mapreduce-1865-v1.2.patch, mapreduce-1865-v1.6.2.patch, 
 mapreduce-1865-v1.7.1.patch, mapreduce-1865-v1.7.patch


 Rumen code in trunk parses and processes only jobhistory files from pre-21 
 hadoop mapreduce clusters. It should also support jobhistory files generated 
 using trunk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1710) Process tree clean up of exceeding memory limit tasks.

2010-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888247#action_12888247
 ] 

Hadoop QA commented on MAPREDUCE-1710:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449101/MAPREDUCE-1710.patch
  against trunk revision 962682.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/console

This message is automatically generated.

 Process tree clean up of exceeding memory limit tasks.
 --

 Key: MAPREDUCE-1710
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1710
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: test
Affects Versions: 0.21.0
Reporter: Vinay Kumar Thota
Assignee: Vinay Kumar Thota
 Attachments: 1710-ydist_security.patch, 1710-ydist_security.patch, 
 1710-ydist_security.patch, MAPREDUCE-1710.patch, memorylimittask_1710.patch, 
 memorylimittask_1710.patch, memorylimittask_1710.patch, 
 memorylimittask_1710.patch, memorylimittask_1710.patch


 1. Submit a job which would spawn child processes, each of which exceeds the 
 memory limits. Let the job complete. Check that all the child processes are 
 killed; the overall job should fail.
 2. Submit a job which would spawn child processes, each of which exceeds the 
 memory limits. Kill/fail the job while in progress. 
 Check that all the child processes are killed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-14 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888249#action_12888249
 ] 

Hong Tang commented on MAPREDUCE-1925:
--

'git diff --text' will add the binary diff to the patch.





 TestRumenJobTraces fails in trunk
 -

 Key: MAPREDUCE-1925
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tools/rumen
Affects Versions: 0.22.0
Reporter: Amareshwari Sriramadasu
Assignee: Ravi Gummadi
 Fix For: 0.22.0

 Attachments: 1925.patch


 TestRumenJobTraces failed with following error:
 Error Message
 the gold file contains more text at line 1 expected:<56> but was:<0>
 Stacktrace
   at 
 org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
 Full log of the failure is available at 
 http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-14 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888257#action_12888257
 ] 

Ravi Gummadi commented on MAPREDUCE-1925:
-

Thanks Hong.
Will upload a new patch that removes that .gz file; the testcase itself will 
contain the expected list of events as an array of Strings.

 TestRumenJobTraces fails in trunk
 -

 Key: MAPREDUCE-1925
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tools/rumen
Affects Versions: 0.22.0
Reporter: Amareshwari Sriramadasu
Assignee: Ravi Gummadi
 Fix For: 0.22.0

 Attachments: 1925.patch


 TestRumenJobTraces failed with following error:
 Error Message
 the gold file contains more text at line 1 expected:<56> but was:<0>
 Stacktrace
   at 
 org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
 Full log of the failure is available at 
 http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-14 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-1925:


Attachment: 1925.v1.patch

Attaching new patch incorporating review comments.

 TestRumenJobTraces fails in trunk
 -

 Key: MAPREDUCE-1925
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tools/rumen
Affects Versions: 0.22.0
Reporter: Amareshwari Sriramadasu
Assignee: Ravi Gummadi
 Fix For: 0.22.0

 Attachments: 1925.patch, 1925.v1.patch


 TestRumenJobTraces failed with following error:
 Error Message
 the gold file contains more text at line 1 expected:<56> but was:<0>
 Stacktrace
   at 
 org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
 Full log of the failure is available at 
 http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1840) [Gridmix] Exploit/Add security features in GridMix

2010-07-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1840:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

I committed this.

Thanks to Amar, Rahul, and Hong

 [Gridmix] Exploit/Add security features in GridMix
 --

 Key: MAPREDUCE-1840
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1840
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/gridmix
Affects Versions: 0.22.0
Reporter: Amar Kamat
Assignee: Amar Kamat
 Fix For: 0.22.0

 Attachments: mapreduce-gridmix-fp-v1.3.3.patch, 
 mapreduce-gridmix-fp-v1.3.9.patch


 Use security information while replaying jobs in Gridmix. This includes
 - Support for multiple users
 - Submitting jobs as different users
 - Allowing usage of secure cluster (hdfs + mapreduce)
 - Support for multiple queues
 Other features include : 
 - Support for sleep job
 - Support for load job 
 + testcases for verifying all of the above changes

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAPREDUCE-1594) Support for Sleep Jobs in gridmix

2010-07-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas resolved MAPREDUCE-1594.
--

 Hadoop Flags: [Reviewed]
 Assignee: rahul k singh
Fix Version/s: 0.22.0
   Resolution: Fixed

Fixed in MAPREDUCE-1840

 Support for Sleep Jobs in gridmix
 -

 Key: MAPREDUCE-1594
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1594
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/gridmix
Reporter: rahul k singh
Assignee: rahul k singh
 Fix For: 0.22.0

 Attachments: 1376-5-yhadoop20-100-3.patch, 1594-diff-4-5.patch, 
 1594-yhadoop-20-1xx-1-2.patch, 1594-yhadoop-20-1xx-1-3.patch, 
 1594-yhadoop-20-1xx-1-4.patch, 1594-yhadoop-20-1xx-1-5.patch, 
 1594-yhadoop-20-1xx-1.patch, 1594-yhadoop-20-1xx.patch


 Support for Sleep jobs in gridmix

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAPREDUCE-1376) Support for varied user submission in Gridmix

2010-07-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas resolved MAPREDUCE-1376.
--

 Hadoop Flags: [Reviewed]
Fix Version/s: 0.22.0
   Resolution: Fixed

Fixed in MAPREDUCE-1840

 Support for varied user submission in Gridmix
 -

 Key: MAPREDUCE-1376
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1376
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/gridmix
Reporter: Chris Douglas
Assignee: Chris Douglas
 Fix For: 0.22.0

 Attachments: 1376-2-yhadoop-security.patch, 
 1376-3-yhadoop20.100.patch, 1376-4-yhadoop20.100.patch, 
 1376-5-yhadoop20-100.patch, 1376-yhadoop-security.patch, M1376-0.patch, 
 M1376-1.patch, M1376-2.patch, M1376-3.patch, M1376-4.patch


 Gridmix currently submits all synthetic jobs as the client user. It should be 
 possible to map users in the trace to a set of users appropriate for the 
 target cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAPREDUCE-1711) Gridmix should provide an option to submit jobs to the same queues as specified in the trace.

2010-07-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas resolved MAPREDUCE-1711.
--

 Hadoop Flags: [Reviewed]
Fix Version/s: 0.22.0
   Resolution: Fixed

Fixed in MAPREDUCE-1840

 Gridmix should provide an option to submit jobs to the same queues as 
 specified in the trace.
 -

 Key: MAPREDUCE-1711
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1711
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/gridmix
Reporter: Hong Tang
Assignee: rahul k singh
 Fix For: 0.22.0

 Attachments: diff-gridmix.patch, diff-rumen.patch, 
 MR-1711-yhadoop-20-1xx-2.patch, MR-1711-yhadoop-20-1xx-3.patch, 
 MR-1711-yhadoop-20-1xx-4.patch, MR-1711-yhadoop-20-1xx-5.patch, 
 MR-1711-yhadoop-20-1xx-6.patch, MR-1711-yhadoop-20-1xx-7.patch, 
 MR-1711-yhadoop-20-1xx.patch, MR-1711-Yhadoop-20-crossPort-1.patch, 
 MR-1711-Yhadoop-20-crossPort-2.patch, MR-1711-Yhadoop-20-crossPort.patch, 
 mr-1711-yhadoop-20.1xx-20100416.patch


 Gridmix should provide an option to submit jobs to the same queues as 
 specified in the trace.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAPREDUCE-1526) Cache the job related information while submitting the job , this would avoid many RPC calls to JobTracker.

2010-07-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas resolved MAPREDUCE-1526.
--

 Hadoop Flags: [Reviewed]
 Assignee: rahul k singh
Fix Version/s: 0.22.0
   Resolution: Fixed

Fixed in MAPREDUCE-1840

 Cache the job related information while submitting the job , this would avoid 
 many RPC calls to JobTracker.
 ---

 Key: MAPREDUCE-1526
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1526
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/gridmix
Reporter: rahul k singh
Assignee: rahul k singh
 Fix For: 0.22.0

 Attachments: 1526-yahadoop-20-101-2.patch, 
 1526-yahadoop-20-101-3.patch, 1526-yahadoop-20-101.patch, 
 1526-yhadoop-20-101-4.patch, 1526-yhadoop-20-101-4.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1940) [Rumen] Add appropriate switches to Folder and TraceBuilder w.r.t input and output files

2010-07-14 Thread Amar Kamat (JIRA)
[Rumen] Add appropriate switches to Folder and TraceBuilder w.r.t input and 
output files


 Key: MAPREDUCE-1940
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1940
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Reporter: Amar Kamat


Currently Folder and TraceBuilder expect the input and output to be the last 
arguments on the command line. It would be better to add special switches for 
the input and output files to avoid confusion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1912) [Rumen] Add a driver for Rumen tool

2010-07-14 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888300#action_12888300
 ] 

Ravi Gummadi commented on MAPREDUCE-1912:
-

Some comments:

(1) In build.xml, please change "${common.ivy.lib.dir} dir" to 
"${common.ivy.lib.dir} directory".

(2) In Folder.java, in the initialize() method, printUsage() should be called at 
the 2 places where IllegalArgumentException is thrown (just before throwing).

(3) In Rumen.java, please change "A Rumen tool fold/scale the trace" to "A 
Rumen tool to fold/scale the trace".

(4) In TraceBuilder.java, please reverse the conditions in the following while 
statement so that validation of the index is done before accessing the element at 
that index. {code}while (args[switchTop].startsWith("-") && switchTop < 
args.length){code}

(5) As you observed the bug, please make the necessary code change of moving 
"++switchTop;" out of the if statement in the above while loop --- to fix the bug 
of the infinite loop when some option that starts with "-" (and is not the same 
as "-demuxer") is given. See the sketch after these comments.

(6) In both places in TraceBuilder.java where printUsage() is called, you are 
checking only the case of zero arguments. We need to make sure that there
are at least 3 arguments in both places.
So change (a) if (0 == args.length) to if (args.length < 3) and (b) if 
(switchTop == args.length) to if (switchTop + 2 >= args.length).
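
A compilable sketch of the argument scan after (4) and (5) are applied (only 
switchTop, args and -demuxer come from the existing code; everything else is 
illustrative):

{code}
public class ArgScanSketch {
  public static void main(String[] args) {
    String demuxerClassName = null;
    int switchTop = 0;
    // Bound check first, so args[switchTop] is never read out of range.
    while (switchTop < args.length && args[switchTop].startsWith("-")) {
      if ("-demuxer".equals(args[switchTop]) && switchTop + 1 < args.length) {
        demuxerClassName = args[++switchTop];  // value following the switch
      }
      ++switchTop;  // advanced for every option, outside the if, per (5)
    }
    System.out.println("demuxer=" + demuxerClassName
        + ", non-switch args start at index " + switchTop);
  }
}
{code}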

 [Rumen] Add a driver for Rumen tool 
 

 Key: MAPREDUCE-1912
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1912
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tools/rumen
Affects Versions: 0.22.0
Reporter: Amar Kamat
Assignee: Amar Kamat
 Fix For: 0.22.0

 Attachments: mapreduce-1912-v1.1.patch


 Rumen, as a tool, has 2 entry points :
 - Trace builder
 - Folder
 It would be nice to have a single driver program and have 'trace-builder' and 
 'folder' as its options. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1896) [Herriot] New property for multi user list.

2010-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888307#action_12888307
 ] 

Hadoop QA commented on MAPREDUCE-1896:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12448436/MAPREDUCE-1896.patch
  against trunk revision 962682.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/console

This message is automatically generated.

 [Herriot] New property for multi user list.
 ---

 Key: MAPREDUCE-1896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1896
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: test
Affects Versions: 0.21.0
Reporter: Vinay Kumar Thota
Assignee: Vinay Kumar Thota
 Attachments: MAPREDUCE-1896.patch, MAPREDUCE-1896.patch, 
 MAPREDUCE-1896.patch


 Adding new property for multi user list.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1621) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output

2010-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888319#action_12888319
 ] 

Hadoop QA commented on MAPREDUCE-1621:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449214/patch-1621.txt
  against trunk revision 962682.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/console

This message is automatically generated.

 Streaming's TextOutputReader.getLastOutput throws NPE if it has never read 
 any output
 -

 Key: MAPREDUCE-1621
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1621
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.21.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.22.0

 Attachments: patch-1621.txt


 If TextOutputReader.readKeyValue() has never successfully read a line, then 
 its bytes member will be left null. Thus when logging a task failure, 
 PipeMapRed.getContext() can trigger an NPE when it calls 
 outReader_.getLastOutput().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1928) Dynamic information fed into Hadoop for controlling execution of a submitted job

2010-07-14 Thread Steven Lewis (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888332#action_12888332
 ] 

Steven Lewis commented on MAPREDUCE-1928:
-

Another possible use has to do with adjusting parameters to avoid failures. I 
have an issue where a reducer is running out of memory. If I were aware that 
certain keys lead to this failure, I could take steps such as sampling the data 
rather than processing the whole set, so I would add access to data about 
failures.

 Dynamic information fed into Hadoop for controlling execution of a submitted 
 job
 

 Key: MAPREDUCE-1928
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1928
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: job submission, jobtracker, tasktracker
Affects Versions: 0.20.3
Reporter: Raman Grover
   Original Estimate: 2016h
  Remaining Estimate: 2016h

 Currently the job submission protocol requires the job provider to put every 
 bit of information inside an instance of JobConf. The submitted information 
 includes the input data (hdfs path) , suspected resource requirement, number 
 of reducers etc.  This information is read by JobTracker as part of job 
 initialization. Once initialized, job is moved into a running state. From 
 this point, there is no mechanism for any additional information to be fed 
 into Hadoop infrastructure for controlling the job execution. 
The execution pattern for the job looks very much 
 static from this point. Using the size of input data and a few settings 
 inside JobConf, number of mappers is computed. Hadoop attempts at reading the 
 whole of data in parallel by launching parallel map tasks. Once map phase is 
 over, a known number of reduce tasks (supplied as part of  JobConf) are 
 started. 
 Parameters that control the job execution were set in JobConf prior to 
 reading the input data. As the map phase progresses, useful information based 
 upon the content of the input data surfaces and can be used in controlling 
 the further execution of the job. Let us walk through some of the examples 
 where additional information can be fed to Hadoop subsequent to job 
 submission for optimal execution of the job. 
 I) Process a part of the input , based upon the results decide if reading 
 more input is required  
 In a huge data set, user is interested in finding 'k' records that 
 satisfy a predicate, essentially sampling the data. In current 
 implementation, as the data is huge, a large no of mappers would be launched 
 consuming a significant fraction of the available map slots in the cluster. 
 Each map task would attempt at emitting a max of  'k' records. With N 
 mappers, we get N*k records out of which one can pick any k to form the final 
 result. 
This is not optimal as:
1)  A larger number of map slots get occupied initially, affecting other 
 jobs in the queue. 
2) If the selectivity of input data is very low, we essentially did not 
 need scanning the whole of data to form our result. 
 we could have finished by reading a fraction of input data, 
 monitoring the cardinality of the map output and determining if 
more input needs to be processed.  

Optimal way: If reading the whole of input requires N mappers, launch only 
 'M' initially. Allow them to complete. Based upon the statistics collected, 
 decide additional number of mappers to be launched next and so on until the 
 whole of input has been processed or enough records have been collected to 
 form the results, whichever is earlier. 
  
  
 II)  Here is some data, the remaining is yet to arrive, but you may start 
 with it, and receive more input later
  Consider a chain of 2 M-R jobs chained together such that the latter 
 reads the output of the former. The second MR job cannot be started until the 
 first has finished completely. This is essentially because Hadoop needs to be 
 told the complete information about the input before beginning the job. 
 The first M-R has produced enough data ( not finished yet) that can be 
 processed by another MR job and hence the other MR need not wait to grab the 
 whole of input before beginning. Input splits could be supplied later, but 
 of course before the copy/shuffle phase.
  
 III) Input data has undergone one round of processing by the map phase, we 
 have some stats, and can now say more about the resources 
 required further 
 Mappers can produce useful stats about their output, like the 
 cardinality, or produce a histogram describing the distribution of output. These 
 stats are available to the job provider (Hive/Pig/End User) who can 
 now determine with better accuracy the resources (memory 
 requirements) required in 

[jira] Created: (MAPREDUCE-1941) Need a servlet in JobTracker to stream contents of the job history file

2010-07-14 Thread Srikanth Sundarrajan (JIRA)
Need a servlet in JobTracker to stream contents of the job history file
---

 Key: MAPREDUCE-1941
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1941
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: jobtracker
Affects Versions: 0.22.0
Reporter: Srikanth Sundarrajan
Assignee: Srikanth Sundarrajan


There is no convenient mechanism to retrieve the contents of the job history 
file. Need a way to retrieve the job history file contents from Job Tracker. 

This can perhaps be implemented as a servlet on the Job tracker.

* Create a jsp/servlet that accepts job id as a request parameter
* Stream the contents of the history file corresponding to the job id, if the 
user has permissions to view the job details.
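
A rough sketch of what such a servlet might look like (all names and the 
lookup/ACL helpers below are placeholders, not part of an actual patch):

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class JobHistoryServletSketch extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    String jobId = req.getParameter("jobid");
    if (jobId == null || !callerMayView(req, jobId)) {
      resp.sendError(HttpServletResponse.SC_FORBIDDEN);
      return;
    }
    resp.setContentType("text/plain");
    InputStream in = openHistoryFile(jobId);  // placeholder lookup
    OutputStream out = resp.getOutputStream();
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) > 0) {
      out.write(buf, 0, n);
    }
    in.close();
  }

  private boolean callerMayView(HttpServletRequest req, String jobId) {
    return true;  // placeholder ACL check
  }

  private InputStream openHistoryFile(String jobId) throws IOException {
    throw new IOException("placeholder: resolve " + jobId + " to its history file");
  }
}
{code}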

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1686) ClassNotFoundException for custom format classes provided in libjars

2010-07-14 Thread Paul Burkhardt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888358#action_12888358
 ] 

Paul Burkhardt commented on MAPREDUCE-1686:
---

Okay, I'll try and do that.

Paul



 ClassNotFoundException for custom format classes provided in libjars
 

 Key: MAPREDUCE-1686
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1686
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.20.2
Reporter: Paul Burkhardt
Priority: Minor

 The StreamUtil::goodClassOrNull method assumes user-provided classes have 
 package names and if not, they are part of the Hadoop Streaming package. For 
 example, using custom InputFormat or OutputFormat classes without package 
 names will fail with a ClassNotFound exception which is not indicative given 
 the classes are provided in the libjars option. Admittedly, most Java 
 classes should have a package name so this should rarely come up.
 Possible resolution options:
 1) modify the error message to include the actual classname that was 
 attempted in the goodClassOrNull method
 2) call the Configuration::getClassByName method first and if class not found 
 check for default package name and try the call again
 {code}
 public static Class goodClassOrNull(Configuration conf, String className,
     String defaultPackage) {
   Class clazz = null;
   try {
     clazz = conf.getClassByName(className);
   } catch (ClassNotFoundException cnf) {
   }
   if (clazz == null) {
     if (className.indexOf('.') == -1 && defaultPackage != null) {
       className = defaultPackage + "." + className;
       try {
         clazz = conf.getClassByName(className);
       } catch (ClassNotFoundException cnf) {
       }
     }
   }
   return clazz;
 }
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1911) Fix errors in -info option in streaming

2010-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888399#action_12888399
 ] 

Hadoop QA commented on MAPREDUCE-1911:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449235/patch-1911-1.txt
  against trunk revision 963986.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/console

This message is automatically generated.

 Fix errors in -info option in streaming
 ---

 Key: MAPREDUCE-1911
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1911
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.22.0

 Attachments: patch-1911-1.txt, patch-1911.txt


 Here are some of the findings by Karam while verifying -info option in 
 streaming:
 # We need to add Optional for the -mapper, -reducer, -combiner and -file options.
 # For the -inputformat and -outputformat options, we should put Optional in the 
 prefix for the sake of uniformity.
 # We need to remove the -cluster description.
 # The -help option is not displayed in the usage message.
 # When displaying the message for the -info or -help options, we should not display 
 "Streaming Job Failed!"; also the exit code should be 0 in case of the -help/-info 
 option.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath

2010-07-14 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888433#action_12888433
 ] 

Doug Cutting commented on MAPREDUCE-1938:
-

Two thoughts:
 1. In general, we need to better separate the kernel from the library.  
CombineFileInputFormat is library code and should be easy to update without 
updating the cluster.  Long-term, only kernel code should be hardwired on the 
classpath of tasks, with library and user code both specified per job.  There 
should be no default version of library classes for a task: tasks should always 
specify their required libraries.  Is there a Jira for this?  I know Tom's 
expressed interest in working on this.
 2. We should permit user code to depend on different versions of things than 
the kernel does.  For example, user code might rely on a different version of 
HttpClient or Avro than that used by MapReduce.  This should be possible if 
instances of classes from these are not passed between user and kernel code, 
e.g., as long as Avro and HttpClient classes are not a part of the MapReduce 
API.  In this case classloaders (probably via OSGI) could permit this.

 Ability for having user's classes take precedence over the system classes for 
 tasks' classpath
 --

 Key: MAPREDUCE-1938
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: job submission, task, tasktracker
Reporter: Devaraj Das
 Fix For: 0.22.0

 Attachments: mr-1938-bp20.patch


 It would be nice to have the ability in MapReduce to allow users to specify 
 for their jobs alternate implementations of classes that are already defined 
 in the MapReduce libraries. For example, an alternate implementation for 
 CombineFileInputFormat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath

2010-07-14 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888436#action_12888436
 ] 

Owen O'Malley commented on MAPREDUCE-1938:
--

I think that the default for this should be on.

Rather than add HADOOP_CLIENT_CLASSPATH, let's make a new variable 
HADOOP_USER_CLASSPATH_LAST. If it is defined, we add HADOOP_CLASSPATH to the 
tail like we currently do. Otherwise it is added to the front.
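
A minimal sketch of that ordering rule, assuming it is applied wherever the 
child classpath is assembled (variable names and the helper are illustrative; 
only HADOOP_CLASSPATH and HADOOP_USER_CLASSPATH_LAST come from the comment 
above):

{code}
import java.util.ArrayList;
import java.util.List;

public class ClasspathOrderSketch {
  static List<String> buildClasspath(List<String> systemEntries, String userClasspath) {
    boolean userLast = System.getenv("HADOOP_USER_CLASSPATH_LAST") != null;
    List<String> cp = new ArrayList<String>();
    if (!userLast && userClasspath != null) {
      cp.add(userClasspath);   // default: user classes go first
    }
    cp.addAll(systemEntries);
    if (userLast && userClasspath != null) {
      cp.add(userClasspath);   // opt-out keeps today's ordering
    }
    return cp;
  }
}
{code}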

 Ability for having user's classes take precedence over the system classes for 
 tasks' classpath
 --

 Key: MAPREDUCE-1938
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: job submission, task, tasktracker
Reporter: Devaraj Das
 Fix For: 0.22.0

 Attachments: mr-1938-bp20.patch


 It would be nice to have the ability in MapReduce to allow users to specify 
 for their jobs alternate implementations of classes that are already defined 
 in the MapReduce libraries. For example, an alternate implementation for 
 CombineFileInputFormat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath

2010-07-14 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888445#action_12888445
 ] 

Owen O'Malley commented on MAPREDUCE-1938:
--

Doug,

I agree that the kernel code should be split out from libraries; however, that 
work is much more involved. I don't see a problem with putting the user's code 
first. It is not a security concern. The user's code is only run as the user. 
Furthermore, it doesn't actually stop them from loading system classes. They 
can exec a new jvm with a new class path of their own choosing.

Therefore, by putting the user's classes last, all that we've done is make it 
harder for the user to implement hot fixes in their own jobs. That doesn't seem 
like a good goal.

 Ability for having user's classes take precedence over the system classes for 
 tasks' classpath
 --

 Key: MAPREDUCE-1938
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: job submission, task, tasktracker
Reporter: Devaraj Das
 Fix For: 0.22.0

 Attachments: mr-1938-bp20.patch


 It would be nice to have the ability in MapReduce to allow users to specify 
 for their jobs alternate implementations of classes that are already defined 
 in the MapReduce libraries. For example, an alternate implementation for 
 CombineFileInputFormat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1933) Create automated testcase for tasktracker dealing with corrupted disk.

2010-07-14 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888468#action_12888468
 ] 

Konstantin Boudnik commented on MAPREDUCE-1933:
---

bq. prop.put("mapred.local.dir", 
"/grid/0/dev/tmp/mapred/mapred-local,/grid/1/dev/tmp/mapred/mapred-local,/grid/2/dev/tmp/mapred/mapred-local,/grid/3/dev/tmp/mapred/mapred-local");

Absolutely; besides, this particular parameter should be set by a normal MR 
config already. 

Also, please don't use string literals for configuration parameters. There was 
a significant effort in 0.21 to have all configuration keys refactored to named 
constants. Use them instead.
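
For example (assuming the 0.21 constant for the local-dir key, e.g. 
MRConfig.LOCAL_DIR, is available on the test classpath):

{code}
import java.util.Hashtable;

import org.apache.hadoop.mapreduce.MRConfig;

public class NamedKeySketch {
  public static void main(String[] args) {
    Hashtable<String, String> prop = new Hashtable<String, String>();
    // Named constant instead of the "mapred.local.dir" string literal.
    prop.put(MRConfig.LOCAL_DIR,
        "/grid/0/dev/tmp/mapred/mapred-local,/grid/1/dev/tmp/mapred/mapred-local");
  }
}
{code}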

 Create automated testcase for tasktracker dealing with corrupted disk.
 --

 Key: MAPREDUCE-1933
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1933
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Reporter: Iyappan Srinivasan
Assignee: Iyappan Srinivasan
 Attachments: TestCorruptedDiskJob.java


 After the TaskTracker has already run some tasks successfully, corrupt a 
 disk by making the corresponding mapred.local.dir unreadable/unwritable. 
 Make sure that jobs continue to succeed even though some tasks scheduled 
 there fail. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1919) [Herriot] Test for verification of per cache file ref count.

2010-07-14 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888472#action_12888472
 ] 

Konstantin Boudnik commented on MAPREDUCE-1919:
---

I want to disagree with the suggestion of moving this little method to a helper 
class. It doesn't make much sense to create a wrapper around the well-known 
ToolRunner interface - it just creates confusion. Why don't you simply use 
{{int exitCode = ToolRunner.run(job, tool, jobArgs)}}? Why do you need a 
method to wrap a call to another one?

Also, please consider trimming the imports list - it is overly detailed.
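
For illustration, the direct call from a test (NoOpTool is only a stand-in so 
the snippet compiles; the real test would pass its own Tool and arguments):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class DirectToolRunnerSketch {
  static class NoOpTool extends Configured implements Tool {
    public int run(String[] args) { return 0; }
  }

  public static void main(String[] jobArgs) throws Exception {
    // No wrapper method: the test just calls ToolRunner.run(...) itself.
    int exitCode = ToolRunner.run(new Configuration(), new NoOpTool(), jobArgs);
    System.out.println("exit code = " + exitCode);
  }
}
{code}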

 [Herriot] Test for verification of per cache file ref  count.
 -

 Key: MAPREDUCE-1919
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1919
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: test
Reporter: Vinay Kumar Thota
Assignee: Vinay Kumar Thota
 Attachments: 1919-ydist-security.patch, MAPREDUCE-1919.patch


 It covers the following scenarios.
 1. Run the job with two distributed cache files and verify whether the job 
 succeeds or not.
 2. Run the job with distributed cache files and remove one cache file from 
 the DFS once it is localized. Verify whether the job fails or not.
 3. Run the job with two distributed cache files where the size of one file 
 is larger than local.cache.size. Verify whether the job succeeds or 
 not.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1942) 'compile-fault-inject' should never be called directly.

2010-07-14 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated MAPREDUCE-1942:
--

Attachment: MAPREDUCE-1942.patch

The fix.

  'compile-fault-inject' should never be called directly.
 

 Key: MAPREDUCE-1942
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1942
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 0.21.0
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
Priority: Minor
 Attachments: MAPREDUCE-1942.patch


 Similar to HDFS-1299: prevent calls to helper targets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1942) 'compile-fault-inject' should never be called directly.

2010-07-14 Thread Konstantin Boudnik (JIRA)
 'compile-fault-inject' should never be called directly.


 Key: MAPREDUCE-1942
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1942
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 0.21.0
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
Priority: Minor


Similar to HDFS-1299: prevent calls to helper targets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath

2010-07-14 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888482#action_12888482
 ] 

Doug Cutting commented on MAPREDUCE-1938:
-

Owen, I agree with your analysis.  I'm just trying to put this patch in context 
of these other related discussions.

This patch addresses some issues relevant to separation of kernel & library.  
In common cases one can merely provide an alternate version of the library 
class in one's job.  Fully separating kernel & library with a well-defined, 
minimal kernel API is clearly aesthetically better.  Are there use cases that 
full separation would enable that this patch will not?  I think mostly it will 
just make it clear which classes are safe to replace with updated versions and 
which are not.  Does that sound right?

The issue of user versions of libraries that the kernel uses (like Avro, log4j, 
HttpClient, etc.) is not entirely addressed by this patch.  If the user's 
version is backwards compatible with the kernel's version then this patch is 
sufficient.  But if the user's version of a library makes incompatible changes 
then we'd need a classloader/OSGI solution.  Even then, I think it only works 
if user and kernel code do not interchange instances of classes defined by 
these libraries.  A minimal kernel API will help reduce that risk.  Does this 
analysis sound right?

I'm trying to understand how far this patch gets us towards those goals: what 
it solves and what it doesn't.

 Ability for having user's classes take precedence over the system classes for 
 tasks' classpath
 --

 Key: MAPREDUCE-1938
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: job submission, task, tasktracker
Reporter: Devaraj Das
 Fix For: 0.22.0

 Attachments: mr-1938-bp20.patch


 It would be nice to have the ability in MapReduce to allow users to specify 
 for their jobs alternate implementations of classes that are already defined 
 in the MapReduce libraries. For example, an alternate implementation for 
 CombineFileInputFormat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1928) Dynamic information fed into Hadoop for controlling execution of a submitted job

2010-07-14 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888503#action_12888503
 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1928:
--

to add to #1 - we may be able to change the split size based on the observed 
selectivity of an ongoing job (i.e. add splits with larger/smaller size 
depending on stats from the first set of splits). It's possible that Hadoop may 
want to do this as part of the basic framework (by exploiting any mechanisms 
provided here).

This is a huge win for a framework like Hive. It would drastically reduce the 
amount of wasted work (limit N queries) and the spawning of unnecessarily large 
numbers of mappers (unknown selectivity) - just to name two obvious use cases. 

Can you supply a more concrete proposal in terms of api changes?
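
Purely as a strawman to make that question concrete, an incremental-input API 
might look roughly like the following; none of these types or method names exist 
in Hadoop, they are invented for illustration:

{code}
// Strawman sketch only -- the interface and its methods are made up to make the
// discussion concrete; they are not part of Hadoop or of any proposal attached here.
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobID;

/** Hypothetical client-side handle for feeding a running job additional input. */
interface RunningJobInput {
  /** Add splits to a job that is already running (must happen before the shuffle begins). */
  void addSplits(JobID job, List<InputSplit> splits) throws IOException;

  /** Tell the framework that no further splits will arrive, so the map phase can close. */
  void finishInput(JobID job) throws IOException;

  /** Stats from completed maps (e.g. records out / records in), used to decide how much more to read. */
  float observedSelectivity(JobID job) throws IOException;
}
{code}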

 Dynamic information fed into Hadoop for controlling execution of a submitted 
 job
 

 Key: MAPREDUCE-1928
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1928
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: job submission, jobtracker, tasktracker
Affects Versions: 0.20.3
Reporter: Raman Grover
   Original Estimate: 2016h
  Remaining Estimate: 2016h

 Currently the job submission protocol requires the job provider to put every 
 bit of information inside an instance of JobConf. The submitted information 
 includes the input data (hdfs path) , suspected resource requirement, number 
 of reducers etc.  This information is read by JobTracker as part of job 
 initialization. Once initialized, job is moved into a running state. From 
 this point, there is no mechanism for any additional information to be fed 
 into Hadoop infrastructure for controlling the job execution. 
The execution pattern for the job looks very much 
 static from this point. Using the size of input data and a few settings 
 inside JobConf, number of mappers is computed. Hadoop attempts at reading the 
 whole of data in parallel by launching parallel map tasks. Once map phase is 
 over, a known number of reduce tasks (supplied as part of  JobConf) are 
 started. 
 Parameters that control the job execution were set in JobConf prior to 
 reading the input data. As the map phase progresses, useful information based 
 upon the content of the input data surfaces and can be used in controlling 
 the further execution of the job. Let us walk through some of the examples 
 where additional information can be fed to Hadoop subsequent to job 
 submission for optimal execution of the job. 
 I) Process a part of the input; based upon the results, decide if reading 
 more input is required.
 In a huge data set, the user is interested in finding 'k' records that 
 satisfy a predicate, essentially sampling the data. In the current 
 implementation, as the data is huge, a large number of mappers would be 
 launched, consuming a significant fraction of the available map slots in the 
 cluster. Each map task would attempt to emit a maximum of 'k' records. With N 
 mappers, we get N*k records out of which one can pick any k to form the final 
 result. 
    This is not optimal as:
    1) A large number of map slots get occupied initially, affecting other 
 jobs in the queue. 
    2) If the selectivity of the input data is very low, we essentially did 
 not need to scan the whole of the data to form our result; we could have 
 finished by reading a fraction of the input data, monitoring the cardinality 
 of the map output and determining if more input needs to be processed.  

Optimal way: If reading the whole of the input requires N mappers, launch only 
 'M' initially. Allow them to complete. Based upon the statistics collected, 
 decide the additional number of mappers to be launched next, and so on, until 
 the whole of the input has been processed or enough records have been 
 collected to form the results, whichever is earlier. 
  
  
 II)  Here is some data, the remaining is yet to arrive, but you may start 
 with it, and receive more input later
  Consider a chain of 2 M-R jobs such that the latter 
 reads the output of the former. The second MR job cannot be started until the 
 first has finished completely. This is essentially because Hadoop needs to be 
 told the complete information about the input before beginning the job. 
 The first M-R has produced enough data (though not finished yet) that can be 
 processed by another MR job, and hence the other MR need not wait to grab the 
 whole of the input before beginning.  Input splits could be supplied later, 
 but of course before the copy/shuffle phase.
  
 III) Input data has undergone one round of processing by the map phase; with 
 some stats in hand, we can now estimate the resources 
 required further. 
  

[jira] Updated: (MAPREDUCE-1733) Authentication between pipes processes and java counterparts.

2010-07-14 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated MAPREDUCE-1733:


Status: Patch Available  (was: Open)

 Authentication between pipes processes and java counterparts.
 -

 Key: MAPREDUCE-1733
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1733
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: MR-1733-y20.1.patch, MR-1733-y20.2.patch, 
 MR-1733-y20.3.patch, MR-1733.5.patch


 The connection between a pipe process and its parent java process should be 
 authenticated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1812) New properties for suspend and resume process.

2010-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888511#action_12888511
 ] 

Hadoop QA commented on MAPREDUCE-1812:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449207/MAPREDUCE-1812.patch
  against trunk revision 963986.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/console

This message is automatically generated.

 New properties for suspend and resume process.
 --

 Key: MAPREDUCE-1812
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1812
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: test
Affects Versions: 0.21.0
Reporter: Vinay Kumar Thota
Assignee: Vinay Kumar Thota
 Attachments: MAPREDUCE-1812.patch, MAPREDUCE-1812.patch


 Adding new properties in system-test-mr.xml file for suspend and resume 
 process.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath

2010-07-14 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-1938:
---

Attachment: mr-1938-bp20.1.patch

Addressing Owen's comment on the shell script part of the patch. 

Doug, this patch is a first step towards letting users use their own versions 
of library-provided implementations for things like CombineFileInputFormat. The 
use case is to allow specific implementations of library classes for certain 
classes of jobs. 

This doesn't aim to address the kernel/library separation in its entirety. So 
yes, if the user puts a class on the classpath that isn't compatible with the 
kernel, then tasks will fail or produce obscure/inconsistent results, but that 
will affect only that job, and the user would notice it soon (hopefully). Did I 
understand your concern right?

 Ability for having user's classes take precedence over the system classes for 
 tasks' classpath
 --

 Key: MAPREDUCE-1938
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: job submission, task, tasktracker
Reporter: Devaraj Das
 Fix For: 0.22.0

 Attachments: mr-1938-bp20.1.patch, mr-1938-bp20.patch


 It would be nice to have the ability in MapReduce to allow users to specify 
 for their jobs alternate implementations of classes that are already defined 
 in the MapReduce libraries. For example, an alternate implementation for 
 CombineFileInputFormat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath

2010-07-14 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888532#action_12888532
 ] 

Doug Cutting commented on MAPREDUCE-1938:
-

 Did I understand your concern right?

I don't have specific concerns about this patch.  Sorry for any confusion in 
that regard.  I thought it worthwhile to discuss how this change relates to 
other changes that are contemplated.  It seems not inconsistent, provides some 
of the benefits, and is considerably simpler; in short, a good thing.

 Ability for having user's classes take precedence over the system classes for 
 tasks' classpath
 --

 Key: MAPREDUCE-1938
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: job submission, task, tasktracker
Reporter: Devaraj Das
 Fix For: 0.22.0

 Attachments: mr-1938-bp20.1.patch, mr-1938-bp20.patch


 It would be nice to have the ability in MapReduce to allow users to specify 
 for their jobs alternate implementations of classes that are already defined 
 in the MapReduce libraries. For example, an alternate implementation for 
 CombineFileInputFormat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath

2010-07-14 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888536#action_12888536
 ] 

Owen O'Malley commented on MAPREDUCE-1938:
--

This patch basically puts the user in charge of their job. They can leave the 
safety switch set, in which case they get the current behavior. But if they turn 
off the safety, their classes go ahead of the ones installed on the cluster. 
That means that they can break things, but all they can break is their own 
tasks.

After we do the split of core from library, you still need this switch. There 
will always be the possibility of needing to patch something in the core, 
because even MapTask has bugs. *smile* After splitting them apart, we can put 
the library code at the very end:

safety on:  core, user, library
safety off: user, core, library

This patch is just about providing the safety switch.
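
A minimal sketch of the ordering described above, assuming a boolean job 
property; the property name used here is an assumption for illustration, not 
necessarily what the patch defines:

{code}
// Rough sketch of the classpath ordering described above, not the actual
// TaskRunner code. The property name is assumed for illustration only.
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;

public class ClasspathOrderSketch {
  static List<String> buildClasspath(Configuration conf, List<String> core,
                                     List<String> user, List<String> library) {
    boolean userFirst = conf.getBoolean("mapreduce.user.classpath.first", false);
    List<String> cp = new ArrayList<String>();
    if (userFirst) {            // safety off: user, core, library
      cp.addAll(user);
      cp.addAll(core);
    } else {                    // safety on:  core, user, library
      cp.addAll(core);
      cp.addAll(user);
    }
    cp.addAll(library);
    return cp;
  }
}
{code}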

 Ability for having user's classes take precedence over the system classes for 
 tasks' classpath
 --

 Key: MAPREDUCE-1938
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: job submission, task, tasktracker
Reporter: Devaraj Das
 Fix For: 0.22.0

 Attachments: mr-1938-bp20.1.patch, mr-1938-bp20.patch


 It would be nice to have the ability in MapReduce to allow users to specify 
 for their jobs alternate implementations of classes that are already defined 
 in the MapReduce libraries. For example, an alternate implementation for 
 CombineFileInputFormat. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2010-07-14 Thread Mahadev konar (JIRA)
Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes


 Key: MAPREDUCE-1943
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Mahadev konar
Assignee: Mahadev konar
 Fix For: 0.22.0


We have come across issues in production clusters wherein users abuse counters, 
status report messages and split sizes. One such case was when one of the users 
had 100 million counters. This leads to the jobtracker going out of memory and 
becoming unresponsive. In this jira I am proposing to put sane limits on the 
status report length, the number of counters and the size of block locations 
returned by the input split. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2010-07-14 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1943:
-

Fix Version/s: (was: 0.22.0)

 Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
 

 Key: MAPREDUCE-1943
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Mahadev konar
Assignee: Mahadev konar

 We have come across issues in production clusters wherein users abuse 
 counters, status report messages and split sizes. One such case was when one 
 of the users had 100 million counters. This leads to the jobtracker going out 
 of memory and becoming unresponsive. In this jira I am proposing to put sane 
 limits on the status report length, the number of counters and the size of 
 block locations returned by the input split. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1942) 'compile-fault-inject' should never be called directly.

2010-07-14 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888548#action_12888548
 ] 

Eli Collins commented on MAPREDUCE-1942:


+1

  'compile-fault-inject' should never be called directly.
 

 Key: MAPREDUCE-1942
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1942
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 0.21.0
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
Priority: Minor
 Attachments: MAPREDUCE-1942.patch


 Similar to HDFS-1299: prevent calls to helper targets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2010-07-14 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888557#action_12888557
 ] 

Scott Chen commented on MAPREDUCE-1943:
---

+1 to the idea. We have seen huge split sizes kill the JT. This will help.

 Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
 

 Key: MAPREDUCE-1943
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Mahadev konar
Assignee: Mahadev konar

 We have come across issues in production clusters wherein users abuse 
 counters, status report messages and split sizes. One such case was when one 
 of the users had 100 million counters. This leads to the jobtracker going out 
 of memory and becoming unresponsive. In this jira I am proposing to put sane 
 limits on the status report length, the number of counters and the size of 
 block locations returned by the input split. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1848) Put number of speculative, data local, rack local tasks in JobTracker metrics

2010-07-14 Thread Dmytro Molkov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888553#action_12888553
 ] 

Dmytro Molkov commented on MAPREDUCE-1848:
--

Patch looks good to me

 Put number of speculative, data local, rack local tasks in JobTracker metrics
 -

 Key: MAPREDUCE-1848
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1848
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Affects Versions: 0.22.0
Reporter: Scott Chen
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1848-20100614.txt, 
 MAPREDUCE-1848-20100617.txt, MAPREDUCE-1848-20100623.txt


 It will be nice that we can collect these information in JobTracker metrics

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2010-07-14 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1943:
-

Attachment: MAPREDUCE-1521-0.20-yahoo.patch

This patch imposes some limits.

The following are the limits it imposes:

1) The number of counters per group is limited to 40. If the counters exceed 
that amount they are dropped silently.
2) The number of counter groups is restricted to 40. Again, if the groups 
exceed the limit they are dropped silently.
3) The string size of the counter name is restricted to 64 characters.
4) The string size of the group name is restricted to 128 characters.
5) The number of block locations returned by a split is restricted to 100; this 
can be changed with a configuration parameter. 
6) The reporter.setStatus() string size is limited to 512 characters.

I haven't added tests yet; will upload one shortly. Also, this patch is for the 
yahoo 0.20 branch. I will upload one for trunk shortly.
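
A rough sketch of the kind of silent-drop/truncate checks described above; the 
class and method names are invented and the constants simply mirror the numbers 
listed, so this is not the patch itself:

{code}
// Sketch only: illustrates the silent-drop / truncate behavior described above.
// Class and method names are invented; the actual enforcement would live in the
// Counters / split handling code touched by the patch.
public class JobLimitsSketch {
  static final int MAX_COUNTERS_PER_GROUP = 40;
  static final int MAX_GROUPS = 40;
  static final int MAX_COUNTER_NAME_LEN = 64;
  static final int MAX_GROUP_NAME_LEN = 128;
  static final int MAX_STATUS_LEN = 512;

  /** Truncate a counter/group/status string to its limit instead of failing the task. */
  static String truncate(String name, int limit) {
    return (name == null || name.length() <= limit) ? name : name.substring(0, limit);
  }

  /** Drop counters silently once a group is full, as described in the comment above. */
  static boolean canAddCounter(int countersInGroup) {
    return countersInGroup < MAX_COUNTERS_PER_GROUP;
  }
}
{code}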

 Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
 

 Key: MAPREDUCE-1943
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Mahadev konar
Assignee: Mahadev konar
 Attachments: MAPREDUCE-1521-0.20-yahoo.patch


 We have come across issues in production clusters wherein users abuse 
 counters, status report messages and split sizes. One such case was when one 
 of the users had 100 million counters. This leads to the jobtracker going out 
 of memory and becoming unresponsive. In this jira I am proposing to put sane 
 limits on the status report length, the number of counters and the size of 
 block locations returned by the input split. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2010-07-14 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1943:
-

Attachment: MAPREDUCE-1943-0.20-yahoo.patch

Attached the wrong file earlier. :)

 Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
 

 Key: MAPREDUCE-1943
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Mahadev konar
Assignee: Mahadev konar
 Attachments: MAPREDUCE-1943-0.20-yahoo.patch


 We have come across issues in production clusters wherein users abuse 
 counters, status report messages and split sizes. One such case was when one 
 of the users had 100 million counters. This leads to the jobtracker going out 
 of memory and becoming unresponsive. In this jira I am proposing to put sane 
 limits on the status report length, the number of counters and the size of 
 block locations returned by the input split. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2010-07-14 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1943:
-

Attachment: (was: MAPREDUCE-1521-0.20-yahoo.patch)

 Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
 

 Key: MAPREDUCE-1943
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Mahadev konar
Assignee: Mahadev konar
 Attachments: MAPREDUCE-1943-0.20-yahoo.patch


 We have come across issues in production clusters wherein users abuse 
 counters, status report messages and split sizes. One such case was when one 
 of the users had 100 million counters. This leads to the jobtracker going out 
 of memory and becoming unresponsive. In this jira I am proposing to put sane 
 limits on the status report length, the number of counters and the size of 
 block locations returned by the input split. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker Jobtracker

2010-07-14 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated MAPREDUCE-1906:
---

Status: Open  (was: Patch Available)

Re-submit for Hudson.

 Lower minimum heartbeat interval for tasktracker  Jobtracker
 -

 Key: MAPREDUCE-1906
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1906
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.20.2, 0.20.1
Reporter: Scott Carey
 Attachments: MAPREDUCE-1906-0.21-v2.patch, MAPREDUCE-1906-0.21.patch


 I get a 0% to 15% performance increase for smaller clusters by making the 
 heartbeat throttle stop penalizing clusters with less than 300 nodes.
 Between 0.19 and 0.20, the default minimum heartbeat interval increased from 
 2s to 3s.   If a JobTracker is throttled at 100 heartbeats / sec for large 
 clusters, why should a cluster with 10 nodes be throttled to 3.3 heartbeats 
 per second?  
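
For reference, a back-of-the-envelope sketch of the throttling arithmetic being 
questioned above; this approximates the JobTracker's interval calculation and is 
not the exact code or the proposed patch:

{code}
// Approximation of the heartbeat throttling discussed above, for illustration only.
// The constants mirror the figures in the description (3s floor, ~100 heartbeats/sec).
public class HeartbeatMathSketch {
  static final int MIN_INTERVAL_MS = 3000;     // 0.20 default minimum (was 2s in 0.19)
  static final int TRACKERS_PER_SECOND = 100;  // target load of ~100 heartbeats/sec at the JT

  static int nextHeartbeatIntervalMs(int clusterSize) {
    int scaled = (int) (1000 * Math.ceil((double) clusterSize / TRACKERS_PER_SECOND));
    return Math.max(scaled, MIN_INTERVAL_MS);
  }

  public static void main(String[] args) {
    // A 10-node cluster is pinned at the 3s floor: 10 trackers / 3s = ~3.3 heartbeats/sec
    System.out.println(nextHeartbeatIntervalMs(10));   // 3000
    // A 1000-node cluster is throttled by the 100/sec target: 10s interval
    System.out.println(nextHeartbeatIntervalMs(1000)); // 10000
  }
}
{code}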

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker Jobtracker

2010-07-14 Thread Scott Carey (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Carey updated MAPREDUCE-1906:
---

Status: Patch Available  (was: Open)

Re-submit for Hudson.

 Lower minimum heartbeat interval for tasktracker  Jobtracker
 -

 Key: MAPREDUCE-1906
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1906
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.20.2, 0.20.1
Reporter: Scott Carey
 Attachments: MAPREDUCE-1906-0.21-v2.patch, MAPREDUCE-1906-0.21.patch


 I get a 0% to 15% performance increase for smaller clusters by making the 
 heartbeat throttle stop penalizing clusters with less than 300 nodes.
 Between 0.19 and 0.20, the default minimum heartbeat interval increased from 
 2s to 3s.   If a JobTracker is throttled at 100 heartbeats / sec for large 
 clusters, why should a cluster with 10 nodes be throttled to 3.3 heartbeats 
 per second?  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1730) Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire.

2010-07-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888623#action_12888623
 ] 

Hadoop QA commented on MAPREDUCE-1730:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449081/MAPREDUCE-1730.patch
  against trunk revision 963986.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/console

This message is automatically generated.

 Automate test scenario for successful/killed jobs' memory is properly removed 
 from jobtracker after these jobs retire.
 --

 Key: MAPREDUCE-1730
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1730
 Project: Hadoop Map/Reduce
  Issue Type: Test
Affects Versions: 0.21.0
Reporter: Iyappan Srinivasan
Assignee: Iyappan Srinivasan
 Attachments: MAPREDUCE-1730.patch, MAPREDUCE-1730.patch, 
 MAPREDUCE-1730.patch, TestJobRetired.patch, TestJobRetired.patch, 
 TestRetiredJobs-ydist-security-patch.txt, 
 TestRetiredJobs-ydist-security-patch.txt, TestRetiredJobs.patch


 Automate, using the Herriot framework, the test scenario that successful/killed 
 jobs' memory is properly removed from the jobtracker after these jobs retire.
 This should test that when successful and failed jobs are retired, their 
 JobInProgress objects are removed properly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2010-07-14 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888693#action_12888693
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1943:


Limiting task diagnostic info and status is done in MAPREDUCE-1482.

 Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
 

 Key: MAPREDUCE-1943
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Mahadev konar
Assignee: Mahadev konar
 Attachments: MAPREDUCE-1943-0.20-yahoo.patch


 We have come across issues in production clusters wherein users abuse 
 counters, status report messages and split sizes. One such case was when one 
 of the users had 100 million counters. This leads to the jobtracker going out 
 of memory and becoming unresponsive. In this jira I am proposing to put sane 
 limits on the status report length, the number of counters and the size of 
 block locations returned by the input split. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1896) [Herriot] New property for multi user list.

2010-07-14 Thread Vinay Kumar Thota (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888694#action_12888694
 ] 

Vinay Kumar Thota commented on MAPREDUCE-1896:
--

I could see two failures and they are unrelated to this patch. I don't think 
the patch could cause these failures because its scope is just adding a new 
property in an xml file.

 [Herriot] New property for multi user list.
 ---

 Key: MAPREDUCE-1896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1896
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: test
Affects Versions: 0.21.0
Reporter: Vinay Kumar Thota
Assignee: Vinay Kumar Thota
 Attachments: MAPREDUCE-1896.patch, MAPREDUCE-1896.patch, 
 MAPREDUCE-1896.patch


 Adding new property for multi user list.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1941) Need a servlet in JobTracker to stream contents of the job history file

2010-07-14 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888697#action_12888697
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1941:


This can be done in the job client itself, no? The history URL is already 
available in JobStatus. 

 Need a servlet in JobTracker to stream contents of the job history file
 ---

 Key: MAPREDUCE-1941
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1941
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: jobtracker
Affects Versions: 0.22.0
Reporter: Srikanth Sundarrajan
Assignee: Srikanth Sundarrajan

 There is no convenient mechanism to retrieve the contents of the job history 
 file. Need a way to retrieve the job history file contents from the JobTracker. 
 This can perhaps be implemented as a servlet on the JobTracker.
 * Create a jsp/servlet that accepts the job id as a request parameter
 * Stream the contents of the history file corresponding to the job id, if the 
 user has permissions to view the job details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1911) Fix errors in -info option in streaming

2010-07-14 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888702#action_12888702
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1911:


Test failures are because of MAPREDUCE-1834 and MAPREDUCE-1925

 Fix errors in -info option in streaming
 ---

 Key: MAPREDUCE-1911
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1911
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.22.0

 Attachments: patch-1911-1.txt, patch-1911.txt


 Here are some of the findings by Karam while verifying the -info option in 
 streaming:
 # We need to add Optional for the -mapper, -reducer, -combiner and -file 
 options.
 # For the -inputformat and -outputformat options, we should put Optional in 
 the prefix for the sake of uniformity.
 # We need to remove the -cluster description.
 # The -help option is not displayed in the usage message.
 # When displaying the message for the -info or -help options, we should not 
 display Streaming Job Failed!; also the exit code should be 0 in case of the 
 -help/-info option.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1621) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output

2010-07-14 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1621:
---

Status: Open  (was: Patch Available)

Many tests failed because of NoClassDefFoundError. Re-submitting to Hudson.

 Streaming's TextOutputReader.getLastOutput throws NPE if it has never read 
 any output
 -

 Key: MAPREDUCE-1621
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1621
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.21.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.22.0

 Attachments: patch-1621.txt


 If TextOutputReader.readKeyValue() has never successfully read a line, then 
 its bytes member will be left null. Thus when logging a task failure, 
 PipeMapRed.getContext() can trigger an NPE when it calls 
 outReader_.getLastOutput().
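
A sketch of the obvious guard for this case, assuming the existing bytes field 
in TextOutputReader; this illustrates the failure mode and is not the attached 
patch:

{code}
// Illustration only: if readKeyValue() never succeeded, the bytes field stays null,
// so getLastOutput() must tolerate that instead of dereferencing it.
public String getLastOutput() {
  if (bytes == null) {
    return null;                 // nothing was ever read from the task's output
  }
  try {
    return new String(bytes, "UTF-8");
  } catch (java.io.UnsupportedEncodingException e) {
    return "<undecodable>";
  }
}
{code}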

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1621) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output

2010-07-14 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1621:
---

Status: Patch Available  (was: Open)

 Streaming's TextOutputReader.getLastOutput throws NPE if it has never read 
 any output
 -

 Key: MAPREDUCE-1621
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1621
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.21.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.22.0

 Attachments: patch-1621.txt


 If TextOutputReader.readKeyValue() has never successfully read a line, then 
 its bytes member will be left null. Thus when logging a task failure, 
 PipeMapRed.getContext() can trigger an NPE when it calls 
 outReader_.getLastOutput().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1812) New properties for suspend and resume process.

2010-07-14 Thread Vinay Kumar Thota (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888704#action_12888704
 ] 

Vinay Kumar Thota commented on MAPREDUCE-1812:
--

I could see 6 failures and they are unrelated to this patch. I don't think the 
patch could cause these failures because its scope is just adding new 
properties in an xml file.

 New properties for suspend and resume process.
 --

 Key: MAPREDUCE-1812
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1812
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: test
Affects Versions: 0.21.0
Reporter: Vinay Kumar Thota
Assignee: Vinay Kumar Thota
 Attachments: MAPREDUCE-1812.patch, MAPREDUCE-1812.patch


 Adding new properties in system-test-mr.xml file for suspend and resume 
 process.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1730) Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire.

2010-07-14 Thread Iyappan Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888706#action_12888706
 ] 

Iyappan Srinivasan commented on MAPREDUCE-1730:
---

The two errors are unrelated to the patch. 

 Automate test scenario for successful/killed jobs' memory is properly removed 
 from jobtracker after these jobs retire.
 --

 Key: MAPREDUCE-1730
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1730
 Project: Hadoop Map/Reduce
  Issue Type: Test
Affects Versions: 0.21.0
Reporter: Iyappan Srinivasan
Assignee: Iyappan Srinivasan
 Attachments: MAPREDUCE-1730.patch, MAPREDUCE-1730.patch, 
 MAPREDUCE-1730.patch, TestJobRetired.patch, TestJobRetired.patch, 
 TestRetiredJobs-ydist-security-patch.txt, 
 TestRetiredJobs-ydist-security-patch.txt, TestRetiredJobs.patch


 Automate, using the Herriot framework, the test scenario that successful/killed 
 jobs' memory is properly removed from the jobtracker after these jobs retire.
 This should test that when successful and failed jobs are retired, their 
 JobInProgress objects are removed properly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1941) Need a servlet in JobTracker to stream contents of the job history file

2010-07-14 Thread Srikanth Sundarrajan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888710#action_12888710
 ] 

Srikanth Sundarrajan commented on MAPREDUCE-1941:
-

{quote}
This can be done in the job client itself, no? The history URL is already 
available in JobStatus. 
{quote} 

While the history file name may be available through JobStatus, the history 
file is owned by the user who runs the job tracker. However, access to the 
history file should be governed by JobACL.VIEW_JOB. Hence the request for a 
separate servlet to provide the job history file contents.  
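
A minimal sketch of what such a servlet might look like, assuming a 'jobid' 
request parameter and a JobACL.VIEW_JOB check; the class name and helper methods 
are assumptions, not the eventual implementation:

{code}
// Illustrative sketch only. The servlet name, the "jobid" parameter and the
// checkAccess(...) / openHistoryFile(...) helpers are assumptions; the actual
// servlet would be defined by the patch for this JIRA.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class JobHistoryServletSketch extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    String jobId = req.getParameter("jobid");
    if (jobId == null) {
      resp.sendError(HttpServletResponse.SC_BAD_REQUEST, "missing jobid");
      return;
    }
    // Hypothetical check: only stream the file if the caller has JobACL.VIEW_JOB.
    if (!checkAccess(req.getRemoteUser(), jobId)) {
      resp.sendError(HttpServletResponse.SC_UNAUTHORIZED, "not permitted to view " + jobId);
      return;
    }
    resp.setContentType("text/plain");
    InputStream history = openHistoryFile(jobId);   // hypothetical helper
    OutputStream out = resp.getOutputStream();
    byte[] buf = new byte[8192];
    int n;
    while ((n = history.read(buf)) > 0) {
      out.write(buf, 0, n);
    }
    history.close();
  }

  private boolean checkAccess(String user, String jobId) { return false; /* placeholder */ }
  private InputStream openHistoryFile(String jobId) { throw new UnsupportedOperationException(); }
}
{code}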

 Need a servlet in JobTracker to stream contents of the job history file
 ---

 Key: MAPREDUCE-1941
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1941
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: jobtracker
Affects Versions: 0.22.0
Reporter: Srikanth Sundarrajan
Assignee: Srikanth Sundarrajan

 There is no convenient mechanism to retrieve the contents of the job history 
 file. Need a way to retrieve the job history file contents from the JobTracker. 
 This can perhaps be implemented as a servlet on the JobTracker.
 * Create a jsp/servlet that accepts the job id as a request parameter
 * Stream the contents of the history file corresponding to the job id, if the 
 user has permissions to view the job details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.