[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888209#action_12888209 ] Ravi Gummadi commented on MAPREDUCE-1925: - 'git diff' couldn't get the changes to the gzipped file in my patch attached. I will remove the whole file and will have the expected output in an array in the test case itself --- as suggested by Amar offline. TestRumenJobTraces fails in trunk - Key: MAPREDUCE-1925 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amareshwari Sriramadasu Assignee: Ravi Gummadi Fix For: 0.22.0 Attachments: 1925.patch TestRumenJobTraces failed with following error: Error Message the gold file contains more text at line 1 expected:56 but was:0 Stacktrace at org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294) Full log of the failure is available at http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888212#action_12888212 ] Amar Kamat commented on MAPREDUCE-1925: --- A few comments:
# The patch doesn't contain the changes to the {{v20-single-input-log-event-classes.text.gz}} file.
# You can get rid of the {{inputLogStream}} variable to avoid future confusion.
# You can make the resulting events list and the gold-standard list in-memory. Instead of writing the test events into a file ({{result.txt}}) and then comparing the contents of 2 files ({{result.txt}} and {{v20-single-input-log-event-classes.text.gz}}), you can keep the contents of both files in memory and get rid of {{result.txt}} and {{v20-single-input-log-event-classes.text.gz}}. The test will be faster and also easier to change in the future.
# If you do the above, there is no need for {{tempDir}} and {{rootTempDir}}.

TestRumenJobTraces fails in trunk - Key: MAPREDUCE-1925 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amareshwari Sriramadasu Assignee: Ravi Gummadi Fix For: 0.22.0 Attachments: 1925.patch

TestRumenJobTraces failed with the following error: Error Message: the gold file contains more text at line 1 expected:<56> but was:<0> Stacktrace: at org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294) Full log of the failure is available at http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/
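Amar's in-memory suggestion (comment 3 above) might look roughly like the following sketch. The class, the method, and the event names are hypothetical illustrations, not the actual TestRumenJobTraces code.

```java
import java.util.Arrays;
import java.util.List;

public class InMemoryGoldCompare {
    // Hedged sketch: keep the gold-standard event list in memory instead of
    // reading it from a gzipped gold file, and compare it directly against
    // the list of event class names produced by the parser.
    static boolean eventsMatch(List<String> expected, List<String> actual) {
        return expected.equals(actual);
    }

    public static void main(String[] args) {
        // Hypothetical event class names, standing in for real parser output.
        List<String> gold = Arrays.asList("JobSubmitted", "TaskStarted", "TaskFinished");
        List<String> parsed = Arrays.asList("JobSubmitted", "TaskStarted", "TaskFinished");
        System.out.println(eventsMatch(gold, parsed));
    }
}
```

Keeping both lists in memory would remove the need for {{result.txt}}, the .gz gold file, {{tempDir}}, and {{rootTempDir}}, as the comment suggests.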
[jira] Updated: (MAPREDUCE-1415) With streaming jobs and LinuxTaskController, the localized streaming binary has 571 permissions instead of 570
[ https://issues.apache.org/jira/browse/MAPREDUCE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1415: --- Status: Open (was: Patch Available) Patch needs to be updated to trunk. With streaming jobs and LinuxTaskController, the localized streaming binary has 571 permissions instead of 570 -- Key: MAPREDUCE-1415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1415 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming, security Reporter: Vinod K V Assignee: Amareshwari Sriramadasu Fix For: 0.22.0 Attachments: patch-1415-1.txt, patch-1415-2.txt, patch-1415-3.txt, patch-1415.txt After MAPREDUCE-856, all localized files are expected to have **0 permissions for the sake of security. This was found by Karam while testing LinuxTaskController functionality after MAPREDUCE-856. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1686) ClassNotFoundException for custom format classes provided in libjars
[ https://issues.apache.org/jira/browse/MAPREDUCE-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888221#action_12888221 ] Amareshwari Sriramadasu commented on MAPREDUCE-1686: Paul, can you create a patch with the suggested change and a unit test, and upload it here?

ClassNotFoundException for custom format classes provided in libjars Key: MAPREDUCE-1686 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1686 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.20.2 Reporter: Paul Burkhardt Priority: Minor

The StreamUtil::goodClassOrNull method assumes user-provided classes have package names and, if not, that they are part of the Hadoop Streaming package. For example, using custom InputFormat or OutputFormat classes without package names will fail with a ClassNotFoundException, which is not indicative given that the classes are provided in the libjars option. Admittedly, most Java classes should have a package name so this should rarely come up. Possible resolution options: 1) modify the error message to include the actual classname that was attempted in the goodClassOrNull method 2) call the Configuration::getClassByName method first and if the class is not found check for the default package name and try the call again
{code}
public static Class goodClassOrNull(Configuration conf, String className, String defaultPackage) {
  Class clazz = null;
  try {
    clazz = conf.getClassByName(className);
  } catch (ClassNotFoundException cnf) {
  }
  if (clazz == null) {
    if (className.indexOf('.') == -1 && defaultPackage != null) {
      className = defaultPackage + "." + className;
      try {
        clazz = conf.getClassByName(className);
      } catch (ClassNotFoundException cnf) {
      }
    }
  }
  return clazz;
}
{code}
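A runnable approximation of resolution option (2) is sketched below. Class.forName stands in for Configuration::getClassByName, since Hadoop's Configuration is not available in this self-contained sketch; the class name is otherwise the same shape as the snippet in the issue.

```java
public class ClassResolver {
    // Hedged sketch of goodClassOrNull: try the class name as given; if it
    // has no package and a default package was supplied, retry with the
    // default package prefixed. Returns null when neither lookup succeeds.
    static Class<?> goodClassOrNull(String className, String defaultPackage) {
        Class<?> clazz = tryLoad(className);
        if (clazz == null && className.indexOf('.') == -1 && defaultPackage != null) {
            clazz = tryLoad(defaultPackage + "." + className);
        }
        return clazz;
    }

    private static Class<?> tryLoad(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException cnf) {
            return null; // caller decides how to report the failure
        }
    }

    public static void main(String[] args) {
        System.out.println(goodClassOrNull("String", "java.lang"));
    }
}
```

Per option (1), a caller that receives null could then include the attempted class name in its error message instead of a generic ClassNotFound report.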
[jira] Updated: (MAPREDUCE-1865) [Rumen] Rumen should also support jobhistory files generated using trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amar Kamat updated MAPREDUCE-1865: -- Attachment: mapreduce-1865-v1.7.1.patch Attaching a slightly modified patch with changes to comments and assert messages. [Rumen] Rumen should also support jobhistory files generated using trunk Key: MAPREDUCE-1865 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1865 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amar Kamat Assignee: Amar Kamat Fix For: 0.22.0 Attachments: mapreduce-1865-v1.2.patch, mapreduce-1865-v1.6.2.patch, mapreduce-1865-v1.7.1.patch, mapreduce-1865-v1.7.patch Rumen code in trunk parses and process only jobhistory files from pre-21 hadoop mapreduce clusters. It should also support jobhistory files generated using trunk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1878) Add MRUnit documentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888238#action_12888238 ] Amareshwari Sriramadasu commented on MAPREDUCE-1878: I think the document can be added as package.html in mrunit package instead of .txt file, similar to all other packages. Add MRUnit documentation Key: MAPREDUCE-1878 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1878 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/mrunit Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1878.2.patch, MAPREDUCE-1878.patch A short user guide for MRUnit, written in asciidoc. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1713) Utilities for system tests specific.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888243#action_12888243 ] Hadoop QA commented on MAPREDUCE-1713: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449108/MAPREDUCE-1713.patch against trunk revision 962682. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/console This message is automatically generated. Utilities for system tests specific. 
Key: MAPREDUCE-1713 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1713 Project: Hadoop Map/Reduce Issue Type: Task Components: test Affects Versions: 0.21.0 Reporter: Vinay Kumar Thota Assignee: Vinay Kumar Thota Attachments: 1713-ydist-security.patch, 1713-ydist-security.patch, 1713-ydist-security.patch, 1713-ydist-security.patch, 1713-ydist-security.patch, MAPREDUCE-1713.patch, MAPREDUCE-1713.patch, MAPREDUCE-1713.patch, systemtestutils_MR1713.patch, utilsforsystemtest_1713.patch

1. A method for restarting the daemon with a new configuration. public static void restartCluster(Hashtable<String,Long> props, String confFile) throws Exception;
2. A method for resetting the daemon to the default configuration. public void resetCluster() throws Exception;
3. A method for waiting until the daemon stops. public void waitForClusterToStop() throws Exception;
4. A method for waiting until the daemon starts. public void waitForClusterToStart() throws Exception;
5. A method for checking whether the job has started or not. public boolean isJobStarted(JobID id) throws IOException;
6. A method for checking whether the task has started or not. public boolean isTaskStarted(TaskInfo taskInfo) throws IOException;
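Wait-style utilities like items 3 and 4 are typically a poll-with-timeout loop. The sketch below is a hypothetical shape, not the Herriot code; the BooleanSupplier stands in for the real daemon-status check, which is not shown.

```java
import java.util.function.BooleanSupplier;

public class ClusterWait {
    // Hedged sketch of waitForClusterToStart()/waitForClusterToStop():
    // poll the status check until it reports the desired state or the
    // timeout elapses. Returns whether the desired state was reached.
    static boolean waitFor(BooleanSupplier inDesiredState, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!inDesiredState.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            Thread.sleep(pollMs);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(waitFor(() -> true, 100, 10));
    }
}
```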
[jira] Updated: (MAPREDUCE-1865) [Rumen] Rumen should also support jobhistory files generated using trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi updated MAPREDUCE-1865: Status: Patch Available (was: Open) [Rumen] Rumen should also support jobhistory files generated using trunk Key: MAPREDUCE-1865 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1865 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amar Kamat Assignee: Amar Kamat Fix For: 0.22.0 Attachments: mapreduce-1865-v1.2.patch, mapreduce-1865-v1.6.2.patch, mapreduce-1865-v1.7.1.patch, mapreduce-1865-v1.7.patch Rumen code in trunk parses and process only jobhistory files from pre-21 hadoop mapreduce clusters. It should also support jobhistory files generated using trunk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1710) Process tree clean up of exceeding memory limit tasks.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888247#action_12888247 ] Hadoop QA commented on MAPREDUCE-1710: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449101/MAPREDUCE-1710.patch against trunk revision 962682. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/console This message is automatically generated. Process tree clean up of exceeding memory limit tasks. 
-- Key: MAPREDUCE-1710 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1710 Project: Hadoop Map/Reduce Issue Type: Task Components: test Affects Versions: 0.21.0 Reporter: Vinay Kumar Thota Assignee: Vinay Kumar Thota Attachments: 1710-ydist_security.patch, 1710-ydist_security.patch, 1710-ydist_security.patch, MAPREDUCE-1710.patch, memorylimittask_1710.patch, memorylimittask_1710.patch, memorylimittask_1710.patch, memorylimittask_1710.patch, memorylimittask_1710.patch

1. Submit a job which would spawn child processes, each of which exceeds the memory limits. Let the job complete. Check that all the child processes are killed; the overall job should fail.
2. Submit a job which would spawn child processes, each of which exceeds the memory limits. Kill/fail the job while in progress. Check that all the child processes are killed.
[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888249#action_12888249 ] Hong Tang commented on MAPREDUCE-1925: -- Git diff --text will add binary diff to the patch. TestRumenJobTraces fails in trunk - Key: MAPREDUCE-1925 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amareshwari Sriramadasu Assignee: Ravi Gummadi Fix For: 0.22.0 Attachments: 1925.patch TestRumenJobTraces failed with following error: Error Message the gold file contains more text at line 1 expected:56 but was:0 Stacktrace at org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294) Full log of the failure is available at http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888257#action_12888257 ] Ravi Gummadi commented on MAPREDUCE-1925: - Thanks Hong. Will upload new patch which removes that .gz file and the testcase itself contains the expected list of events as array of Strings. TestRumenJobTraces fails in trunk - Key: MAPREDUCE-1925 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amareshwari Sriramadasu Assignee: Ravi Gummadi Fix For: 0.22.0 Attachments: 1925.patch TestRumenJobTraces failed with following error: Error Message the gold file contains more text at line 1 expected:56 but was:0 Stacktrace at org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294) Full log of the failure is available at http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi updated MAPREDUCE-1925: Attachment: 1925.v1.patch Attaching new patch incorporating review comments. TestRumenJobTraces fails in trunk - Key: MAPREDUCE-1925 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amareshwari Sriramadasu Assignee: Ravi Gummadi Fix For: 0.22.0 Attachments: 1925.patch, 1925.v1.patch TestRumenJobTraces failed with following error: Error Message the gold file contains more text at line 1 expected:56 but was:0 Stacktrace at org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294) Full log of the failure is available at http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1840) [Gridmix] Exploit/Add security features in GridMix
[ https://issues.apache.org/jira/browse/MAPREDUCE-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1840: - Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed I committed this. Thanks to Amar, Rahul, and Hong [Gridmix] Exploit/Add security features in GridMix -- Key: MAPREDUCE-1840 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1840 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/gridmix Affects Versions: 0.22.0 Reporter: Amar Kamat Assignee: Amar Kamat Fix For: 0.22.0 Attachments: mapreduce-gridmix-fp-v1.3.3.patch, mapreduce-gridmix-fp-v1.3.9.patch Use security information while replaying jobs in Gridmix. This includes - Support for multiple users - Submitting jobs as different users - Allowing usage of secure cluster (hdfs + mapreduce) - Support for multiple queues Other features include : - Support for sleep job - Support for load job + testcases for verifying all of the above changes -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1594) Support for Sleep Jobs in gridmix
[ https://issues.apache.org/jira/browse/MAPREDUCE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas resolved MAPREDUCE-1594. -- Hadoop Flags: [Reviewed] Assignee: rahul k singh Fix Version/s: 0.22.0 Resolution: Fixed Fixed in MAPREDUCE-1840 Support for Sleep Jobs in gridmix - Key: MAPREDUCE-1594 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1594 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/gridmix Reporter: rahul k singh Assignee: rahul k singh Fix For: 0.22.0 Attachments: 1376-5-yhadoop20-100-3.patch, 1594-diff-4-5.patch, 1594-yhadoop-20-1xx-1-2.patch, 1594-yhadoop-20-1xx-1-3.patch, 1594-yhadoop-20-1xx-1-4.patch, 1594-yhadoop-20-1xx-1-5.patch, 1594-yhadoop-20-1xx-1.patch, 1594-yhadoop-20-1xx.patch Support for Sleep jobs in gridmix -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1376) Support for varied user submission in Gridmix
[ https://issues.apache.org/jira/browse/MAPREDUCE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas resolved MAPREDUCE-1376. -- Hadoop Flags: [Reviewed] Fix Version/s: 0.22.0 Resolution: Fixed Fixed in MAPREDUCE-1840 Support for varied user submission in Gridmix - Key: MAPREDUCE-1376 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1376 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/gridmix Reporter: Chris Douglas Assignee: Chris Douglas Fix For: 0.22.0 Attachments: 1376-2-yhadoop-security.patch, 1376-3-yhadoop20.100.patch, 1376-4-yhadoop20.100.patch, 1376-5-yhadoop20-100.patch, 1376-yhadoop-security.patch, M1376-0.patch, M1376-1.patch, M1376-2.patch, M1376-3.patch, M1376-4.patch Gridmix currently submits all synthetic jobs as the client user. It should be possible to map users in the trace to a set of users appropriate for the target cluster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1711) Gridmix should provide an option to submit jobs to the same queues as specified in the trace.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas resolved MAPREDUCE-1711. -- Hadoop Flags: [Reviewed] Fix Version/s: 0.22.0 Resolution: Fixed Fixed in MAPREDUCE-1840 Gridmix should provide an option to submit jobs to the same queues as specified in the trace. - Key: MAPREDUCE-1711 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1711 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/gridmix Reporter: Hong Tang Assignee: rahul k singh Fix For: 0.22.0 Attachments: diff-gridmix.patch, diff-rumen.patch, MR-1711-yhadoop-20-1xx-2.patch, MR-1711-yhadoop-20-1xx-3.patch, MR-1711-yhadoop-20-1xx-4.patch, MR-1711-yhadoop-20-1xx-5.patch, MR-1711-yhadoop-20-1xx-6.patch, MR-1711-yhadoop-20-1xx-7.patch, MR-1711-yhadoop-20-1xx.patch, MR-1711-Yhadoop-20-crossPort-1.patch, MR-1711-Yhadoop-20-crossPort-2.patch, MR-1711-Yhadoop-20-crossPort.patch, mr-1711-yhadoop-20.1xx-20100416.patch Gridmix should provide an option to submit jobs to the same queues as specified in the trace. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1526) Cache the job related information while submitting the job , this would avoid many RPC calls to JobTracker.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas resolved MAPREDUCE-1526. -- Hadoop Flags: [Reviewed] Assignee: rahul k singh Fix Version/s: 0.22.0 Resolution: Fixed Fixed in MAPREDUCE-1840 Cache the job related information while submitting the job , this would avoid many RPC calls to JobTracker. --- Key: MAPREDUCE-1526 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1526 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/gridmix Reporter: rahul k singh Assignee: rahul k singh Fix For: 0.22.0 Attachments: 1526-yahadoop-20-101-2.patch, 1526-yahadoop-20-101-3.patch, 1526-yahadoop-20-101.patch, 1526-yhadoop-20-101-4.patch, 1526-yhadoop-20-101-4.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1940) [Rumen] Add appropriate switches to Folder and TraceBuilder w.r.t input and output files
[Rumen] Add appropriate switches to Folder and TraceBuilder w.r.t input and output files Key: MAPREDUCE-1940 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1940 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tools/rumen Reporter: Amar Kamat Currently Folder and TraceBuilder expect the input and output to be the last arguments in the command line. It would be better to add special switches to the input and output files to avoid confusion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1912) [Rumen] Add a driver for Rumen tool
[ https://issues.apache.org/jira/browse/MAPREDUCE-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888300#action_12888300 ] Ravi Gummadi commented on MAPREDUCE-1912: - Some comments:
(1) In build.xml, please change "${common.ivy.lib.dir} dir" to "${common.ivy.lib.dir} directory".
(2) In Folder.java, in the initialize() method, printUsage() should be called at the 2 places where IllegalArgumentException is thrown (just before throwing).
(3) In Rumen.java, please change "A Rumen tool fold/scale the trace" to "A Rumen tool to fold/scale the trace".
(4) In TraceBuilder.java, please reverse the conditions in the following while statement so that validation of the index is done before accessing the element at that index: {code}while (args[switchTop].startsWith("-") && switchTop < args.length){code}
(5) As you observed the bug, please make the necessary code change of moving ++switchTop; out of the if statement in the above while loop --- to fix the bug of the infinite loop when some option that starts with "-" (and is not the same as -demuxer) is given.
(6) In both places in TraceBuilder.java where printUsage() is called, you are checking the case of zero remaining arguments only. We need to make sure that there are at least 3 arguments in both places. So change (a) if (0 == args.length) to if (args.length < 3) and (b) if (switchTop == args.length) to if (switchTop + 2 >= args.length).

[Rumen] Add a driver for Rumen tool Key: MAPREDUCE-1912 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1912 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amar Kamat Assignee: Amar Kamat Fix For: 0.22.0 Attachments: mapreduce-1912-v1.1.patch

Rumen, as a tool, has 2 entry points: - Trace builder - Folder It would be nice to have a single driver program and have 'trace-builder' and 'folder' as its options.
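Comments (4) and (5) together amount to the loop shape sketched below. This is an illustrative sketch, not the actual TraceBuilder code; in particular, the assumption that -demuxer consumes one value argument is the author's reading of the comments, not confirmed by the issue text.

```java
public class SwitchParser {
    // Hedged sketch of the corrected argument scan: the bounds check comes
    // before the array access, and switchTop advances on every iteration so
    // an unrecognized "-" option can no longer cause an infinite loop.
    static int skipSwitches(String[] args) {
        int switchTop = 0;
        while (switchTop < args.length && args[switchTop].startsWith("-")) {
            if (args[switchTop].equalsIgnoreCase("-demuxer")) {
                switchTop++; // assumed: a recognized switch also consumes its value
            }
            ++switchTop; // moved out of the if, per comment (5)
        }
        return switchTop; // index of the first non-switch argument
    }

    public static void main(String[] args) {
        System.out.println(skipSwitches(new String[]{"-demuxer", "SomeDemuxer", "trace", "out"}));
    }
}
```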
[jira] Commented: (MAPREDUCE-1896) [Herriot] New property for multi user list.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888307#action_12888307 ] Hadoop QA commented on MAPREDUCE-1896: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12448436/MAPREDUCE-1896.patch against trunk revision 962682. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/console This message is automatically generated. [Herriot] New property for multi user list. --- Key: MAPREDUCE-1896 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1896 Project: Hadoop Map/Reduce Issue Type: Task Components: test Affects Versions: 0.21.0 Reporter: Vinay Kumar Thota Assignee: Vinay Kumar Thota Attachments: MAPREDUCE-1896.patch, MAPREDUCE-1896.patch, MAPREDUCE-1896.patch Adding new property for multi user list. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1621) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output
[ https://issues.apache.org/jira/browse/MAPREDUCE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888319#action_12888319 ] Hadoop QA commented on MAPREDUCE-1621: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449214/patch-1621.txt against trunk revision 962682. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/console This message is automatically generated. 
Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output - Key: MAPREDUCE-1621 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1621 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.21.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.22.0 Attachments: patch-1621.txt If TextOutputReader.readKeyValue() has never successfully read a line, then its bytes member will be left null. Thus when logging a task failure, PipeMapRed.getContext() can trigger an NPE when it calls outReader_.getLastOutput(). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1928) Dynamic information fed into Hadoop for controlling execution of a submitted job
[ https://issues.apache.org/jira/browse/MAPREDUCE-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888332#action_12888332 ] Steven Lewis commented on MAPREDUCE-1928: - Another possible use has to do with adjusting parameters to avoid failures. I have an issue where a reducer is running out of memory. If I were aware that certain keys lead to this failure, I could take steps such as sampling the data rather than processing the whole set, so I would add access to data about failures.

Dynamic information fed into Hadoop for controlling execution of a submitted job Key: MAPREDUCE-1928 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1928 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, jobtracker, tasktracker Affects Versions: 0.20.3 Reporter: Raman Grover Original Estimate: 2016h Remaining Estimate: 2016h

Currently the job submission protocol requires the job provider to put every bit of information inside an instance of JobConf. The submitted information includes the input data (hdfs path), suspected resource requirement, number of reducers etc. This information is read by the JobTracker as part of job initialization. Once initialized, the job is moved into a running state. From this point, there is no mechanism for any additional information to be fed into the Hadoop infrastructure to control the job's execution. The execution pattern for the job looks entirely static from this point. Using the size of the input data and a few settings inside JobConf, the number of mappers is computed. Hadoop attempts to read the whole of the data in parallel by launching parallel map tasks. Once the map phase is over, a known number of reduce tasks (supplied as part of JobConf) are started. Parameters that control the job execution were set in JobConf prior to reading the input data.
As the map phase progresses, useful information based upon the content of the input data surfaces and can be used in controlling the further execution of the job. Let us walk through some examples where additional information can be fed to Hadoop subsequent to job submission for optimal execution of the job. I) Process a part of the input; based upon the results, decide if reading more input is required. In a huge data set, the user is interested in finding 'k' records that satisfy a predicate, essentially sampling the data. In the current implementation, as the data is huge, a large number of mappers would be launched, consuming a significant fraction of the available map slots in the cluster. Each map task would attempt to emit a max of 'k' records. With N mappers, we get N*k records, out of which one can pick any k to form the final result. This is not optimal as: 1) A larger number of map slots get occupied initially, affecting other jobs in the queue. 2) If the selectivity of the input data is very low, we did not need to scan the whole of the data to form our result. We could have finished by reading a fraction of the input data, monitoring the cardinality of the map output, and determining if more input needs to be processed. Optimal way: If reading the whole of the input requires N mappers, launch only 'M' initially. Allow them to complete. Based upon the statistics collected, decide the additional number of mappers to be launched next, and so on until the whole of the input has been processed or enough records have been collected to form the results, whichever is earlier. II) Here is some data, the remaining is yet to arrive, but you may start with it, and receive more input later. Consider a chain of 2 M-R jobs chained together such that the latter reads the output of the former. The second MR job cannot be started until the first has finished completely. This is essentially because Hadoop needs to be told the complete information about the input before beginning the job. 
The first M-R has produced enough data (not finished yet) that can be processed by another MR job, and hence the other MR need not wait to grab the whole of the input before beginning. Input splits could be supplied later, but of course before the copy/shuffle phase. III) Input data has undergone one round of processing by the map phase; with some stats available, one can now speak of the resources required further. Mappers can produce useful stats about their output, like the cardinality, or produce a histogram describing the distribution of the output. These stats are available to the job provider (Hive/Pig/End User), who can now determine with better accuracy the resources (memory requirements) required in
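Strategy (I) above can be illustrated with a toy, non-Hadoop loop: scan the input in batches (the analogue of launching M map tasks at a time) and stop as soon as k records satisfy the predicate, instead of scanning everything up front. All names and the even-number predicate below are illustrative assumptions, not a proposed API.

```java
// Toy sketch of incremental input processing: each outer iteration stands in
// for one wave of M map tasks; we stop early once k matching records exist.
import java.util.ArrayList;
import java.util.List;

public class IncrementalSampling {
  public static List<Integer> sample(int[] data, int k, int batchSize) {
    List<Integer> hits = new ArrayList<Integer>();
    for (int start = 0; start < data.length && hits.size() < k; start += batchSize) {
      // "Launch" one batch; in Hadoop this would be the next wave of mappers.
      int end = Math.min(start + batchSize, data.length);
      for (int i = start; i < end && hits.size() < k; i++) {
        if (data[i] % 2 == 0) { // predicate: keep even numbers
          hits.add(data[i]);
        }
      }
    }
    return hits;
  }
}
```

With low selectivity the loop finishes after a fraction of the input, which is exactly the map-slot saving argued for in the description.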
[jira] Created: (MAPREDUCE-1941) Need a servlet in JobTracker to stream contents of the job history file
Need a servlet in JobTracker to stream contents of the job history file --- Key: MAPREDUCE-1941 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1941 Project: Hadoop Map/Reduce Issue Type: New Feature Components: jobtracker Affects Versions: 0.22.0 Reporter: Srikanth Sundarrajan Assignee: Srikanth Sundarrajan There is no convenient mechanism to retrieve the contents of the job history file. Need a way to retrieve the job history file contents from Job Tracker. This can perhaps be implemented as a servlet on the Job tracker. * Create a jsp/servlet that accepts job id as a request parameter * Stream the contents of the history file corresponding to the job id, if user has permissions to view the job details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
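The core of the proposed servlet is copying the history file to the response stream once the job id is resolved and permissions are checked. Below is a hedged sketch of just that streaming step; the directory layout, file naming, and class name are assumptions for illustration, not JobTracker API.

```java
// Hedged sketch: look up the history file for a job id and copy it to an
// output stream (e.g. the servlet's response stream). Permission checks on
// the requesting user are assumed to happen before this method is called.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class HistoryStreamer {
  private final File historyDir;

  public HistoryStreamer(File historyDir) {
    this.historyDir = historyDir;
  }

  // Copies the history file for jobId into out, chunk by chunk.
  public void stream(String jobId, OutputStream out) throws IOException {
    File history = new File(historyDir, jobId); // assumed naming scheme
    byte[] buf = new byte[8192];
    InputStream in = new FileInputStream(history);
    try {
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    } finally {
      in.close();
    }
  }
}
```

A jsp/servlet wrapper would parse the job id request parameter, run the permission check, and then call something like `stream(jobId, response.getOutputStream())`.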
[jira] Commented: (MAPREDUCE-1686) ClassNotFoundException for custom format classes provided in libjars
[ https://issues.apache.org/jira/browse/MAPREDUCE-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888358#action_12888358 ] Paul Burkhardt commented on MAPREDUCE-1686: --- Okay, I'll try and do that. Paul ClassNotFoundException for custom format classes provided in libjars Key: MAPREDUCE-1686 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1686 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.20.2 Reporter: Paul Burkhardt Priority: Minor The StreamUtil::goodClassOrNull method assumes user-provided classes have package names and, if not, that they are part of the Hadoop Streaming package. For example, using custom InputFormat or OutputFormat classes without package names will fail with a ClassNotFoundException, which is not indicative given the classes are provided in the libjars option. Admittedly, most Java classes should have a package name, so this should rarely come up. Possible resolution options: 1) modify the error message to include the actual classname that was attempted in the goodClassOrNull method 2) call the Configuration::getClassByName method first and, if the class is not found, check for the default package name and try the call again:
{code}
public static Class goodClassOrNull(Configuration conf, String className, String defaultPackage) {
  Class clazz = null;
  try {
    clazz = conf.getClassByName(className);
  } catch (ClassNotFoundException cnf) {
  }
  if (clazz == null) {
    if (className.indexOf('.') == -1 && defaultPackage != null) {
      className = defaultPackage + "." + className;
      try {
        clazz = conf.getClassByName(className);
      } catch (ClassNotFoundException cnf) {
      }
    }
  }
  return clazz;
}
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1911) Fix errors in -info option in streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888399#action_12888399 ] Hadoop QA commented on MAPREDUCE-1911: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449235/patch-1911-1.txt against trunk revision 963986. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/console This message is automatically generated. 
Fix errors in -info option in streaming --- Key: MAPREDUCE-1911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1911 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.22.0 Attachments: patch-1911-1.txt, patch-1911.txt Here are some of the findings by Karam while verifying the -info option in streaming: # We need to add Optional for the -mapper, -reducer, -combiner and -file options. # For the -inputformat and -outputformat options, we should put Optional in the prefix for the sake of uniformity. # We need to remove the -cluster description. # The -help option is not displayed in the usage message. # When displaying the message for the -info or -help options, we should not display Streaming Job Failed!; also, the exit code should be 0 in case of the -help/-info option. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888433#action_12888433 ] Doug Cutting commented on MAPREDUCE-1938: - Two thoughts: 1. In general, we need to better separate the kernel from the library. CombineFileInputFormat is library code and should be easy to update without updating the cluster. Long-term, only kernel code should be hardwired on the classpath of tasks, with library and user code both specified per job. There should be no default version of library classes for a task: tasks should always specify their required libraries. Is there a Jira for this? I know Tom's expressed interest in working on this. 2. We should permit user code to depend on different versions of things than the kernel does. For example, user code might rely on a different version of HttpClient or Avro than that used by MapReduce. This should be possible if instances of classes from these are not passed between user and kernel code, e.g., as long as Avro and HttpClient classes are not a part of the MapReduce API. In this case classloaders (probably via OSGI) could permit this. Ability for having user's classes take precedence over the system classes for tasks' classpath -- Key: MAPREDUCE-1938 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, task, tasktracker Reporter: Devaraj Das Fix For: 0.22.0 Attachments: mr-1938-bp20.patch It would be nice to have the ability in MapReduce to allow users to specify for their jobs alternate implementations of classes that are already defined in the MapReduce libraries. For example, an alternate implementation for CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888436#action_12888436 ] Owen O'Malley commented on MAPREDUCE-1938: -- I think that the default for this should be on. Rather than add HADOOP_CLIENT_CLASSPATH, let's make a new variable HADOOP_USER_CLASSPATH_LAST. If it is defined, we add HADOOP_CLASSPATH to the tail like we currently do. Otherwise it is added to the front. Ability for having user's classes take precedence over the system classes for tasks' classpath -- Key: MAPREDUCE-1938 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, task, tasktracker Reporter: Devaraj Das Fix For: 0.22.0 Attachments: mr-1938-bp20.patch It would be nice to have the ability in MapReduce to allow users to specify for their jobs alternate implementations of classes that are already defined in the MapReduce libraries. For example, an alternate implementation for CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
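Owen's proposal is an ordering switch in the launcher script. The sketch below is hedged: only the ordering logic follows his comment; the jar names and the SYSTEM_CLASSPATH variable are illustrative stand-ins for whatever bin/hadoop actually builds.

```shell
# Hedged sketch of the HADOOP_USER_CLASSPATH_LAST switch Owen describes.
SYSTEM_CLASSPATH="hadoop-core.jar:hadoop-mapred.jar"  # assumed system jars
HADOOP_CLASSPATH="my-fixed-lib.jar"                   # the user's jars

if [ -n "$HADOOP_USER_CLASSPATH_LAST" ]; then
  # variable defined: keep today's behavior, user jars appended at the tail
  CLASSPATH="${SYSTEM_CLASSPATH}:${HADOOP_CLASSPATH}"
else
  # proposed default: user jars first, so user classes shadow system classes
  CLASSPATH="${HADOOP_CLASSPATH}:${SYSTEM_CLASSPATH}"
fi
echo "$CLASSPATH"
```

Because Java resolves classes in classpath order, putting the user's jars first is what lets a job ship its own CombineFileInputFormat (or any hot fix) without touching the cluster install.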
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888445#action_12888445 ] Owen O'Malley commented on MAPREDUCE-1938: -- Doug, I agree that the kernel code should be split out from libraries, however, that work is much more involved. I don't see a problem with putting the user's code first. It is not a security concern. The user's code is only run as the user. Furthermore, it doesn't actually stop them from loading system classes. They can exec a new jvm with a new class path of their own choosing. Therefore, by putting the user's classes last all that we've done is make it harder for the user to implement hot fixes in their own jobs. That doesn't seem like a good goal. Ability for having user's classes take precedence over the system classes for tasks' classpath -- Key: MAPREDUCE-1938 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, task, tasktracker Reporter: Devaraj Das Fix For: 0.22.0 Attachments: mr-1938-bp20.patch It would be nice to have the ability in MapReduce to allow users to specify for their jobs alternate implementations of classes that are already defined in the MapReduce libraries. For example, an alternate implementation for CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1933) Create automated testcase for tasktracker dealing with corrupted disk.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888468#action_12888468 ] Konstantin Boudnik commented on MAPREDUCE-1933: --- bq. prop.put("mapred.local.dir", "/grid/0/dev/tmp/mapred/mapred-local,/grid/1/dev/tmp/mapred/mapred-local,/grid/2/dev/tmp/mapred/mapred-local,/grid/3/dev/tmp/mapred/mapred-local"); Absolutely; besides, this particular parameter should be set by a normal MR config already. Also, please don't use string literals for configuration parameters. There was a significant effort in 0.21 to have all configuration keys refactored to named constants. Use them instead. Create automated testcase for tasktracker dealing with corrupted disk. -- Key: MAPREDUCE-1933 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1933 Project: Hadoop Map/Reduce Issue Type: Test Components: test Reporter: Iyappan Srinivasan Assignee: Iyappan Srinivasan Attachments: TestCorruptedDiskJob.java After the TaskTracker has already run some tasks successfully, corrupt a disk by making the corresponding mapred.local.dir unreadable/unwritable. Make sure that jobs continue to succeed even though some tasks scheduled there fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
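The named-constant refactoring requested above looks like the following. The constant here is a local stand-in for illustration; in 0.21 the real keys live in constants classes under org.apache.hadoop.mapreduce rather than in scattered string literals.

```java
// Sketch of using a named constant instead of an inline configuration-key
// literal: a typo in the constant name fails at compile time, while a typo
// inside a "mapred.local.dir" string literal fails silently at runtime.
import java.util.Properties;

public class ConfigKeysExample {
  // One named constant per configuration key (stand-in for the real class).
  static final String MAPRED_LOCAL_DIR = "mapred.local.dir";

  public static void main(String[] args) {
    Properties prop = new Properties();
    prop.put(MAPRED_LOCAL_DIR, "/grid/0/dev/tmp/mapred/mapred-local");
    System.out.println(prop.getProperty(MAPRED_LOCAL_DIR));
  }
}
```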
[jira] Commented: (MAPREDUCE-1919) [Herriot] Test for verification of per cache file ref count.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888472#action_12888472 ] Konstantin Boudnik commented on MAPREDUCE-1919: --- I want to disagree with the suggestion on moving this little method to a helper class. It doesn't make much sense to create a wrapper around a well-known ToolRunner interface - it just creates confusion. Why don't you simply use {{int exitCode = ToolRunner.run(job, tool, jobArgs)}} ? Why do you need a method to wrap a call to another one? Also, please consider optimizing the imports list - it is overly detailed. [Herriot] Test for verification of per cache file ref count. - Key: MAPREDUCE-1919 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1919 Project: Hadoop Map/Reduce Issue Type: Task Components: test Reporter: Vinay Kumar Thota Assignee: Vinay Kumar Thota Attachments: 1919-ydist-security.patch, MAPREDUCE-1919.patch It covers the following scenarios. 1. Run the job with two distributed cache files and verify whether the job succeeded or not. 2. Run the job with distributed cache files and remove one cache file from the DFS when it is localized. Verify whether the job failed or not. 3. Run the job with two distributed cache files where the size of one file is larger than local.cache.size. Verify whether the job succeeded or not. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1942) 'compile-fault-inject' should never be called directly.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated MAPREDUCE-1942: -- Attachment: MAPREDUCE-1942.patch The fix. 'compile-fault-inject' should never be called directly. Key: MAPREDUCE-1942 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1942 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.21.0 Reporter: Konstantin Boudnik Assignee: Konstantin Boudnik Priority: Minor Attachments: MAPREDUCE-1942.patch Similar to HDFS-1299: prevent calls to helper targets. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1942) 'compile-fault-inject' should never be called directly.
'compile-fault-inject' should never be called directly. Key: MAPREDUCE-1942 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1942 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.21.0 Reporter: Konstantin Boudnik Assignee: Konstantin Boudnik Priority: Minor Similar to HDFS-1299: prevent calls to helper targets. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888482#action_12888482 ] Doug Cutting commented on MAPREDUCE-1938: - Owen, I agree with your analysis. I'm just trying to put this patch in the context of these other related discussions. This patch addresses some issues relevant to the separation of kernel and library. In common cases one can merely provide an alternate version of the library class in one's job. Fully separating kernel and library with a well-defined, minimal kernel API is clearly aesthetically better. Are there use cases that full separation will enable that this patch will not? I think mostly it will just make it clear which classes are safe to replace with updated versions and which are not. Does that sound right? The issue of user versions of libraries that the kernel uses (like Avro, log4j, HttpClient, etc.) is not entirely addressed by this patch. If the user's version is backwards compatible with the kernel's version then this patch is sufficient. But if the user's version of a library makes incompatible changes then we'd need a classloader/OSGI solution. Even then, I think it only works if user and kernel code do not interchange instances of classes defined by these libraries. A minimal kernel API will help reduce that risk. Does this analysis sound right? I'm trying to understand how far this patch gets us towards those goals: what it solves and what it doesn't. 
Ability for having user's classes take precedence over the system classes for tasks' classpath -- Key: MAPREDUCE-1938 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, task, tasktracker Reporter: Devaraj Das Fix For: 0.22.0 Attachments: mr-1938-bp20.patch It would be nice to have the ability in MapReduce to allow users to specify for their jobs alternate implementations of classes that are already defined in the MapReduce libraries. For example, an alternate implementation for CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1928) Dynamic information fed into Hadoop for controlling execution of a submitted job
[ https://issues.apache.org/jira/browse/MAPREDUCE-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888503#action_12888503 ] Joydeep Sen Sarma commented on MAPREDUCE-1928: -- to add to #1 - we may be able to change the split size based on the observed selectivity of an ongoing job (ie. add splits with larger/smaller size depending on stats from the first set of splits). It's possible that Hadoop may want to do this as part of the basic framework (by exploiting any mechanisms provided here). This is a huge win for a framework like Hive. It would drastically reduce the amount of wasted work (limit N queries) and the spawning of an unnecessarily large number of mappers (unknown selectivity) - just to name two obvious use cases. Can you supply a more concrete proposal in terms of api changes? Dynamic information fed into Hadoop for controlling execution of a submitted job Key: MAPREDUCE-1928 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1928 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, jobtracker, tasktracker Affects Versions: 0.20.3 Reporter: Raman Grover Original Estimate: 2016h Remaining Estimate: 2016h Currently the job submission protocol requires the job provider to put every bit of information inside an instance of JobConf. The submitted information includes the input data (hdfs path), suspected resource requirements, number of reducers etc. This information is read by the JobTracker as part of job initialization. Once initialized, the job is moved into a running state. From this point, there is no mechanism for any additional information to be fed into the Hadoop infrastructure for controlling the job execution. The execution pattern for the job looks very much static from this point. Using the size of the input data and a few settings inside JobConf, the number of mappers is computed. Hadoop attempts to read the whole of the data in parallel by launching parallel map tasks. 
Once the map phase is over, a known number of reduce tasks (supplied as part of JobConf) are started. Parameters that control the job execution were set in JobConf prior to reading the input data. As the map phase progresses, useful information based upon the content of the input data surfaces and can be used in controlling the further execution of the job. Let us walk through some examples where additional information can be fed to Hadoop subsequent to job submission for optimal execution of the job. I) Process a part of the input; based upon the results, decide if reading more input is required. In a huge data set, the user is interested in finding 'k' records that satisfy a predicate, essentially sampling the data. In the current implementation, as the data is huge, a large number of mappers would be launched, consuming a significant fraction of the available map slots in the cluster. Each map task would attempt to emit a max of 'k' records. With N mappers, we get N*k records, out of which one can pick any k to form the final result. This is not optimal as: 1) A larger number of map slots get occupied initially, affecting other jobs in the queue. 2) If the selectivity of the input data is very low, we did not need to scan the whole of the data to form our result. We could have finished by reading a fraction of the input data, monitoring the cardinality of the map output, and determining if more input needs to be processed. Optimal way: If reading the whole of the input requires N mappers, launch only 'M' initially. Allow them to complete. Based upon the statistics collected, decide the additional number of mappers to be launched next, and so on until the whole of the input has been processed or enough records have been collected to form the results, whichever is earlier. II) Here is some data, the remaining is yet to arrive, but you may start with it, and receive more input later. Consider a chain of 2 M-R jobs chained together such that the latter reads the output of the former. 
The second MR job cannot be started until the first has finished completely. This is essentially because Hadoop needs to be told the complete information about the input before beginning the job. The first M-R has produced enough data (not finished yet) that can be processed by another MR job, and hence the other MR need not wait to grab the whole of the input before beginning. Input splits could be supplied later, but of course before the copy/shuffle phase. III) Input data has undergone one round of processing by the map phase; with some stats available, one can now speak of the resources required further
[jira] Updated: (MAPREDUCE-1733) Authentication between pipes processes and java counterparts.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated MAPREDUCE-1733: Status: Patch Available (was: Open) Authentication between pipes processes and java counterparts. - Key: MAPREDUCE-1733 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1733 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: MR-1733-y20.1.patch, MR-1733-y20.2.patch, MR-1733-y20.3.patch, MR-1733.5.patch The connection between a pipe process and its parent java process should be authenticated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1812) New properties for suspend and resume process.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888511#action_12888511 ] Hadoop QA commented on MAPREDUCE-1812: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449207/MAPREDUCE-1812.patch against trunk revision 963986. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/console This message is automatically generated. New properties for suspend and resume process. -- Key: MAPREDUCE-1812 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1812 Project: Hadoop Map/Reduce Issue Type: Task Components: test Affects Versions: 0.21.0 Reporter: Vinay Kumar Thota Assignee: Vinay Kumar Thota Attachments: MAPREDUCE-1812.patch, MAPREDUCE-1812.patch Adding new properties in system-test-mr.xml file for suspend and resume process. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-1938: --- Attachment: mr-1938-bp20.1.patch Addressing Owen's comment on the shell script part of the patch. Doug, this patch is a first step towards letting users use their own versions of library-provided implementations for things like CombineFileInputFormat. The use case is to allow for specific implementations of library classes for certain classes of jobs. This doesn't aim to address the kernel/library separation in its entirety. So yes, if the user puts a class on the classpath that doesn't work with the kernel compatibly then tasks will fail, or produce obscure/inconsistent results, but that will affect only that job, and the user would notice that soon (hopefully). Did I understand your concern right? Ability for having user's classes take precedence over the system classes for tasks' classpath -- Key: MAPREDUCE-1938 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, task, tasktracker Reporter: Devaraj Das Fix For: 0.22.0 Attachments: mr-1938-bp20.1.patch, mr-1938-bp20.patch It would be nice to have the ability in MapReduce to allow users to specify for their jobs alternate implementations of classes that are already defined in the MapReduce libraries. For example, an alternate implementation for CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888532#action_12888532 ] Doug Cutting commented on MAPREDUCE-1938: - Did i understand your concern right? I don't have specific concerns about this patch. Sorry for any confusion in that regard. I thought it worthwhile to discuss how this change relates to other changes that are contemplated. It seems not inconsistent, provides some of the benefits, and is considerably simpler; in short, a good thing. Ability for having user's classes take precedence over the system classes for tasks' classpath -- Key: MAPREDUCE-1938 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, task, tasktracker Reporter: Devaraj Das Fix For: 0.22.0 Attachments: mr-1938-bp20.1.patch, mr-1938-bp20.patch It would be nice to have the ability in MapReduce to allow users to specify for their jobs alternate implementations of classes that are already defined in the MapReduce libraries. For example, an alternate implementation for CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888536#action_12888536 ] Owen O'Malley commented on MAPREDUCE-1938: -- This patch basically puts the user in charge of their job. They can leave the safety switch set in which case they get the current behavior. But if they turn off the safety, their classes go ahead of the ones installed on the cluster. That means that they can break things, but all they can break is their own tasks. After we do the split of core from library, you still need this switch. There will always be the possibility of needing to patch something in the core, because even MapTask has bugs. *smile* After splitting them apart, we can put the library code at the very end safety on: core, user, library safety off: user, core, library This patch is just about providing the safety switch. Ability for having user's classes take precedence over the system classes for tasks' classpath -- Key: MAPREDUCE-1938 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, task, tasktracker Reporter: Devaraj Das Fix For: 0.22.0 Attachments: mr-1938-bp20.1.patch, mr-1938-bp20.patch It would be nice to have the ability in MapReduce to allow users to specify for their jobs alternate implementations of classes that are already defined in the MapReduce libraries. For example, an alternate implementation for CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes Key: MAPREDUCE-1943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 0.22.0 We have come across issues in production clusters wherein users abuse counters, statusreport messages and split sizes. One such case was when one of the users had 100 million counters. This leads to jobtracker going out of memory and being unresponsive. In this jira I am proposing to put sane limits on the status report length, the number of counters and the size of block locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1943: - Fix Version/s: (was: 0.22.0) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes Key: MAPREDUCE-1943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Mahadev konar Assignee: Mahadev konar We have come across issues in production clusters wherein users abuse counters, statusreport messages and split sizes. One such case was when one of the users had 100 million counters. This leads to jobtracker going out of memory and being unresponsive. In this jira I am proposing to put sane limits on the status report length, the number of counters and the size of block locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1942) 'compile-fault-inject' should never be called directly.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888548#action_12888548 ] Eli Collins commented on MAPREDUCE-1942: +1 'compile-fault-inject' should never be called directly. Key: MAPREDUCE-1942 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1942 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.21.0 Reporter: Konstantin Boudnik Assignee: Konstantin Boudnik Priority: Minor Attachments: MAPREDUCE-1942.patch Similar to HDFS-1299: prevent calls to helper targets. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888557#action_12888557 ] Scott Chen commented on MAPREDUCE-1943: --- +1 to the idea. We have seen huge split sizes kill the JT. This will help. Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes Key: MAPREDUCE-1943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Mahadev konar Assignee: Mahadev konar We have come across issues in production clusters wherein users abuse counters, statusreport messages and split sizes. One such case was when one of the users had 100 million counters. This leads to jobtracker going out of memory and being unresponsive. In this jira I am proposing to put sane limits on the status report length, the number of counters and the size of block locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1848) Put number of speculative, data local, rack local tasks in JobTracker metrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888553#action_12888553 ] Dmytro Molkov commented on MAPREDUCE-1848: -- Patch looks good to me Put number of speculative, data local, rack local tasks in JobTracker metrics - Key: MAPREDUCE-1848 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1848 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-1848-20100614.txt, MAPREDUCE-1848-20100617.txt, MAPREDUCE-1848-20100623.txt It will be nice that we can collect these information in JobTracker metrics -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1943: - Attachment: MAPREDUCE-1521-0.20-yahoo.patch This patch imposes some limits. The following are the limits it imposes: 1) The number of counters per group is limited to 40. If counters are added beyond that amount, they are dropped silently. 2) The number of counter groups is restricted to 40. Again, if there are more groups than the limit, they are dropped silently. 3) The string size of a counter name is restricted to 64 characters. 4) The string size of a group name is restricted to 128 characters. 5) The number of block locations returned by a split is restricted to 100; this can be changed with a configuration parameter. 6) Limit the reporter.setStatus() string size to 512 characters. I haven't added tests yet. Will upload one shortly. Also, this patch is for the yahoo 0.20 branch. I will upload one for the trunk shortly. Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes Key: MAPREDUCE-1943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Mahadev konar Assignee: Mahadev konar Attachments: MAPREDUCE-1521-0.20-yahoo.patch We have come across issues in production clusters wherein users abuse counters, statusreport messages and split sizes. One such case was when one of the users had 100 million counters. This leads to jobtracker going out of memory and being unresponsive. In this jira I am proposing to put sane limits on the status report length, the number of counters and the size of block locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
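The "drop silently" policy for limits 1 and 3 above can be sketched as follows. This is a hypothetical standalone class, not the attached patch — the real change wires similar checks into org.apache.hadoop.mapred.Counters, and the constants mirror the values quoted in the comment:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LimitedCounterGroup {
    // Values from the comment above: 40 counters per group,
    // counter names capped at 64 characters.
    static final int MAX_COUNTERS_PER_GROUP = 40;
    static final int MAX_COUNTER_NAME_LENGTH = 64;

    private final Map<String, Long> counters = new LinkedHashMap<>();

    void increment(String name, long amount) {
        if (name.length() > MAX_COUNTER_NAME_LENGTH) {
            return;   // over-long name: dropped silently
        }
        if (!counters.containsKey(name)
                && counters.size() >= MAX_COUNTERS_PER_GROUP) {
            return;   // group already full: new counter dropped silently
        }
        counters.merge(name, amount, Long::sum);
    }

    int size() {
        return counters.size();
    }
}
```

Existing counters keep accumulating after the group fills up; only additional distinct counters are discarded, which is what bounds the JobTracker's memory per job.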
[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1943: - Attachment: MAPREDUCE-1943-0.20-yahoo.patch attached the wrong file.. :) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes Key: MAPREDUCE-1943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Mahadev konar Assignee: Mahadev konar Attachments: MAPREDUCE-1943-0.20-yahoo.patch We have come across issues in production clusters wherein users abuse counters, statusreport messages and split sizes. One such case was when one of the users had 100 million counters. This leads to jobtracker going out of memory and being unresponsive. In this jira I am proposing to put sane limits on the status report length, the number of counters and the size of block locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1943: - Attachment: (was: MAPREDUCE-1521-0.20-yahoo.patch) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes Key: MAPREDUCE-1943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Mahadev konar Assignee: Mahadev konar Attachments: MAPREDUCE-1943-0.20-yahoo.patch We have come across issues in production clusters wherein users abuse counters, statusreport messages and split sizes. One such case was when one of the users had 100 million counters. This leads to jobtracker going out of memory and being unresponsive. In this jira I am proposing to put sane limits on the status report length, the number of counters and the size of block locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker Jobtracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated MAPREDUCE-1906: --- Status: Open (was: Patch Available) re-submit for hudson. Lower minimum heartbeat interval for tasktracker Jobtracker - Key: MAPREDUCE-1906 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1906 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.20.2, 0.20.1 Reporter: Scott Carey Attachments: MAPREDUCE-1906-0.21-v2.patch, MAPREDUCE-1906-0.21.patch I get a 0% to 15% performance increase for smaller clusters by making the heartbeat throttle stop penalizing clusters with less than 300 nodes. Between 0.19 and 0.20, the default minimum heartbeat interval increased from 2s to 3s. If a JobTracker is throttled at 100 heartbeats / sec for large clusters, why should a cluster with 10 nodes be throttled to 3.3 heartbeats per second? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker Jobtracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated MAPREDUCE-1906: --- Status: Patch Available (was: Open) re-submit for hudson. Lower minimum heartbeat interval for tasktracker Jobtracker - Key: MAPREDUCE-1906 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1906 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.20.2, 0.20.1 Reporter: Scott Carey Attachments: MAPREDUCE-1906-0.21-v2.patch, MAPREDUCE-1906-0.21.patch I get a 0% to 15% performance increase for smaller clusters by making the heartbeat throttle stop penalizing clusters with less than 300 nodes. Between 0.19 and 0.20, the default minimum heartbeat interval increased from 2s to 3s. If a JobTracker is throttled at 100 heartbeats / sec for large clusters, why should a cluster with 10 nodes be throttled to 3.3 heartbeats per second? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
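The throttle Scott is describing can be sketched as follows. Method and parameter names here are assumptions, not the JobTracker's actual identifiers: the interval is scaled with cluster size so the JobTracker sees at most a target number of heartbeats per second, then clamped to a configured minimum — and it is that minimum (3s in 0.20) that penalizes small clusters:

```java
public class HeartbeatInterval {
    // Scale the per-tasktracker interval so that clusterSize nodes
    // produce at most maxHeartbeatsPerSecond heartbeats in aggregate,
    // then clamp to the configured floor.
    static int intervalMillis(int clusterSize, int minIntervalMillis,
                              int maxHeartbeatsPerSecond) {
        int scaled = (int) Math.ceil(
                (double) clusterSize / maxHeartbeatsPerSecond) * 1000;
        return Math.max(scaled, minIntervalMillis);
    }
}
```

With a 3000 ms floor, a 10-node cluster is held to one heartbeat per node every 3 seconds even though the 100/sec budget would allow far more; lowering the floor lets small clusters heartbeat at the rate the budget actually permits, which is where the reported 0-15% speedup comes from.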
[jira] Commented: (MAPREDUCE-1730) Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888623#action_12888623 ] Hadoop QA commented on MAPREDUCE-1730: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449081/MAPREDUCE-1730.patch against trunk revision 963986. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/console This message is automatically generated. Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire. 
-- Key: MAPREDUCE-1730 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1730 Project: Hadoop Map/Reduce Issue Type: Test Affects Versions: 0.21.0 Reporter: Iyappan Srinivasan Assignee: Iyappan Srinivasan Attachments: MAPREDUCE-1730.patch, MAPREDUCE-1730.patch, MAPREDUCE-1730.patch, TestJobRetired.patch, TestJobRetired.patch, TestRetiredJobs-ydist-security-patch.txt, TestRetiredJobs-ydist-security-patch.txt, TestRetiredJobs.patch Automate using herriot framework, test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire. This should test when successful and failed jobs are retired, their jobInProgress object are removed properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888693#action_12888693 ] Amareshwari Sriramadasu commented on MAPREDUCE-1943: Limiting task diagnostic info and status are done in MAPREDUCE-1482. Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes Key: MAPREDUCE-1943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Mahadev konar Assignee: Mahadev konar Attachments: MAPREDUCE-1943-0.20-yahoo.patch We have come across issues in production clusters wherein users abuse counters, statusreport messages and split sizes. One such case was when one of the users had 100 million counters. This leads to jobtracker going out of memory and being unresponsive. In this jira I am proposing to put sane limits on the status report length, the number of counters and the size of block locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1896) [Herriot] New property for multi user list.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888694#action_12888694 ] Vinay Kumar Thota commented on MAPREDUCE-1896: -- I could see two failures, and they are unrelated to this patch. I don't think the patch could cause these failures, because its scope is just adding a new property in an XML file. [Herriot] New property for multi user list. --- Key: MAPREDUCE-1896 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1896 Project: Hadoop Map/Reduce Issue Type: Task Components: test Affects Versions: 0.21.0 Reporter: Vinay Kumar Thota Assignee: Vinay Kumar Thota Attachments: MAPREDUCE-1896.patch, MAPREDUCE-1896.patch, MAPREDUCE-1896.patch Adding new property for multi user list. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1941) Need a servlet in JobTracker to stream contents of the job history file
[ https://issues.apache.org/jira/browse/MAPREDUCE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888697#action_12888697 ] Amareshwari Sriramadasu commented on MAPREDUCE-1941: This can be done in Job client itself, no? History url is already available in JobStatus. Need a servlet in JobTracker to stream contents of the job history file --- Key: MAPREDUCE-1941 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1941 Project: Hadoop Map/Reduce Issue Type: New Feature Components: jobtracker Affects Versions: 0.22.0 Reporter: Srikanth Sundarrajan Assignee: Srikanth Sundarrajan There is no convenient mechanism to retrieve the contents of the job history file. Need a way to retrieve the job history file contents from Job Tracker. This can perhaps be implemented as a servlet on the Job tracker. * Create a jsp/servlet that accepts job id as a request parameter * Stream the contents of the history file corresponding to the job id, if user has permissions to view the job details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1911) Fix errors in -info option in streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888702#action_12888702 ] Amareshwari Sriramadasu commented on MAPREDUCE-1911: Test failures are because of MAPREDUCE-1834 and MAPREDUCE-1925 Fix errors in -info option in streaming --- Key: MAPREDUCE-1911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1911 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.22.0 Attachments: patch-1911-1.txt, patch-1911.txt Here are some of the findings by Karam while verifying -info option in streaming: # We need to add Optional for -mapper, -reducer, -combiner and -file options. # For -inputformat and -outputformat options, we should put Optional in the prefix for the sake of uniformity. # We need to remove the -cluster description. # -help option is not displayed in usage message. # When displaying the message for -info or -help options, we should not display Streaming Job Failed!; also, the exit code should be 0 in case of the -help/-info option. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1621) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output
[ https://issues.apache.org/jira/browse/MAPREDUCE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1621: --- Status: Open (was: Patch Available) Many tests failed because of NoClassDefFoundError. Re-submitting to hudson Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output - Key: MAPREDUCE-1621 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1621 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.21.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.22.0 Attachments: patch-1621.txt If TextOutputReader.readKeyValue() has never successfully read a line, then its bytes member will be left null. Thus when logging a task failure, PipeMapRed.getContext() can trigger an NPE when it calls outReader_.getLastOutput(). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
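The shape of the guard needed here can be sketched as follows. This is an assumption about the fix, not the contents of patch-1621.txt: the bytes member stays null until the first successful read, so getLastOutput must check for that before dereferencing:

```java
public class TextOutputReaderSketch {
    // Stand-in for streaming's TextOutputReader. bytes is only
    // assigned once readKeyValue() has successfully read a line.
    private byte[] bytes;

    void readKeyValue(String line) {   // simplified reader stub
        bytes = line.getBytes();
    }

    String getLastOutput() {
        // Guard against the never-read case instead of NPE-ing when
        // PipeMapRed.getContext() logs a task failure.
        return bytes == null ? null : new String(bytes);
    }
}
```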
[jira] Updated: (MAPREDUCE-1621) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output
[ https://issues.apache.org/jira/browse/MAPREDUCE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1621: --- Status: Patch Available (was: Open) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output - Key: MAPREDUCE-1621 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1621 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.21.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.22.0 Attachments: patch-1621.txt If TextOutputReader.readKeyValue() has never successfully read a line, then its bytes member will be left null. Thus when logging a task failure, PipeMapRed.getContext() can trigger an NPE when it calls outReader_.getLastOutput(). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1812) New properties for suspend and resume process.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888704#action_12888704 ] Vinay Kumar Thota commented on MAPREDUCE-1812: -- I could see 6 failures, and they are unrelated to this patch. I don't think the patch could cause these failures, because its scope is just adding new properties in an XML file. New properties for suspend and resume process. -- Key: MAPREDUCE-1812 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1812 Project: Hadoop Map/Reduce Issue Type: Task Components: test Affects Versions: 0.21.0 Reporter: Vinay Kumar Thota Assignee: Vinay Kumar Thota Attachments: MAPREDUCE-1812.patch, MAPREDUCE-1812.patch Adding new properties in system-test-mr.xml file for suspend and resume process. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1730) Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888706#action_12888706 ] Iyappan Srinivasan commented on MAPREDUCE-1730: --- The two errors are unrelated to the patch. Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire. -- Key: MAPREDUCE-1730 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1730 Project: Hadoop Map/Reduce Issue Type: Test Affects Versions: 0.21.0 Reporter: Iyappan Srinivasan Assignee: Iyappan Srinivasan Attachments: MAPREDUCE-1730.patch, MAPREDUCE-1730.patch, MAPREDUCE-1730.patch, TestJobRetired.patch, TestJobRetired.patch, TestRetiredJobs-ydist-security-patch.txt, TestRetiredJobs-ydist-security-patch.txt, TestRetiredJobs.patch Automate using herriot framework, test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire. This should test when successful and failed jobs are retired, their jobInProgress object are removed properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1941) Need a servlet in JobTracker to stream contents of the job history file
[ https://issues.apache.org/jira/browse/MAPREDUCE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888710#action_12888710 ] Srikanth Sundarrajan commented on MAPREDUCE-1941: - {quote} This can be done in Job client itself, no? History url is already available in JobStatus. {quote} While the history file name may be available through JobStatus, the history file is owned by the user who runs the job tracker. However, access to the history file should be governed by JobACL.VIEW_JOB. Hence the request to have a separate servlet to provide the job history file contents. Need a servlet in JobTracker to stream contents of the job history file --- Key: MAPREDUCE-1941 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1941 Project: Hadoop Map/Reduce Issue Type: New Feature Components: jobtracker Affects Versions: 0.22.0 Reporter: Srikanth Sundarrajan Assignee: Srikanth Sundarrajan There is no convenient mechanism to retrieve the contents of the job history file. Need a way to retrieve the job history file contents from Job Tracker. This can perhaps be implemented as a servlet on the Job tracker. * Create a jsp/servlet that accepts job id as a request parameter * Stream the contents of the history file corresponding to the job id, if user has permissions to view the job details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
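The core of the proposed servlet — check the caller's view permission, then stream the history file — could look like the hypothetical sketch below. Nothing here is from a patch: the AclChecker interface is a stub standing in for the JobACL.VIEW_JOB check mentioned above, and the real code would live behind an HttpServlet taking the job id as a request parameter:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class JobHistoryStreamer {
    // Stub for the JobACL.VIEW_JOB check the servlet would perform.
    interface AclChecker {
        boolean canView(String user, String jobId);
    }

    // Copy the history file's bytes to the response stream, but only
    // after the ACL check passes; the file itself is owned by the
    // JobTracker user, so the servlet mediates all access.
    static void stream(String user, String jobId, InputStream history,
                       AclChecker acl, OutputStream response) {
        if (!acl.canView(user, jobId)) {
            throw new SecurityException(user + " may not view " + jobId);
        }
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = history.read(buf)) != -1) {
                response.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // servlet would map to a 500
        }
    }
}
```

Routing reads through the JobTracker this way keeps the file's ownership unchanged while still letting authorized users fetch its contents over HTTP.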