[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888209#action_12888209 ] Ravi Gummadi commented on MAPREDUCE-1925: - 'git diff' couldn't get the changes to the gzipped file in my patch attached. I will remove the whole file and will have the expected output in an array in the test case itself --- as suggested by Amar offline. TestRumenJobTraces fails in trunk - Key: MAPREDUCE-1925 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amareshwari Sriramadasu Assignee: Ravi Gummadi Fix For: 0.22.0 Attachments: 1925.patch TestRumenJobTraces failed with following error: Error Message the gold file contains more text at line 1 expected:56 but was:0 Stacktrace at org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294) Full log of the failure is available at http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888212#action_12888212 ] Amar Kamat commented on MAPREDUCE-1925: --- A few comments:
# The patch doesn't contain the changes to the {{v20-single-input-log-event-classes.text.gz}} file.
# You can get rid of the {{inputLogStream}} variable to avoid future confusion.
# You can make the resulting events list and the gold-standard list in-memory. Instead of writing the test events into a file ({{result.txt}}) and then comparing the contents of 2 files ({{result.txt}} and {{v20-single-input-log-event-classes.text.gz}}), you can keep the contents of both files in memory and get rid of {{result.txt}} and {{v20-single-input-log-event-classes.text.gz}}. The test will be faster and also easier to change in the future.
# If you do the above, there is no need for {{tempDir}} and {{rootTempDir}}.

TestRumenJobTraces fails in trunk - Key: MAPREDUCE-1925 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amareshwari Sriramadasu Assignee: Ravi Gummadi Fix For: 0.22.0 Attachments: 1925.patch

TestRumenJobTraces failed with the following error: Error Message: the gold file contains more text at line 1 expected:<56> but was:<0> Stacktrace: at org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294) Full log of the failure is available at http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/
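Amar's in-memory suggestion (comment 3 above) might look roughly like the following sketch. The class, the method, and the event names are hypothetical illustrations, not the actual TestRumenJobTraces code.

```java
import java.util.Arrays;
import java.util.List;

public class InMemoryGoldCompare {
    // Hedged sketch: keep the gold-standard event list in memory instead of
    // reading it from a gzipped gold file, and compare it directly against
    // the list of event class names produced by the parser.
    static boolean eventsMatch(List<String> expected, List<String> actual) {
        return expected.equals(actual);
    }

    public static void main(String[] args) {
        // Hypothetical event class names, standing in for real parser output.
        List<String> gold = Arrays.asList("JobSubmitted", "TaskStarted", "TaskFinished");
        List<String> parsed = Arrays.asList("JobSubmitted", "TaskStarted", "TaskFinished");
        System.out.println(eventsMatch(gold, parsed));
    }
}
```

Keeping both lists in memory would remove the need for {{result.txt}}, the .gz gold file, {{tempDir}}, and {{rootTempDir}}, as the comment suggests.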
[jira] Updated: (MAPREDUCE-1415) With streaming jobs and LinuxTaskController, the localized streaming binary has 571 permissions instead of 570
[ https://issues.apache.org/jira/browse/MAPREDUCE-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1415: --- Status: Open (was: Patch Available) Patch needs to be updated to trunk. With streaming jobs and LinuxTaskController, the localized streaming binary has 571 permissions instead of 570 -- Key: MAPREDUCE-1415 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1415 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming, security Reporter: Vinod K V Assignee: Amareshwari Sriramadasu Fix For: 0.22.0 Attachments: patch-1415-1.txt, patch-1415-2.txt, patch-1415-3.txt, patch-1415.txt After MAPREDUCE-856, all localized files are expected to have **0 permissions for the sake of security. This was found by Karam while testing LinuxTaskController functionality after MAPREDUCE-856. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1686) ClassNotFoundException for custom format classes provided in libjars
[ https://issues.apache.org/jira/browse/MAPREDUCE-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888221#action_12888221 ] Amareshwari Sriramadasu commented on MAPREDUCE-1686: Paul, can you create a patch with the suggested change and a unit test, and upload it here?

ClassNotFoundException for custom format classes provided in libjars Key: MAPREDUCE-1686 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1686 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.20.2 Reporter: Paul Burkhardt Priority: Minor

The StreamUtil::goodClassOrNull method assumes user-provided classes have package names and, if not, that they are part of the Hadoop Streaming package. For example, using custom InputFormat or OutputFormat classes without package names will fail with a ClassNotFoundException, which is not indicative given that the classes are provided in the libjars option. Admittedly, most Java classes should have a package name so this should rarely come up. Possible resolution options: 1) modify the error message to include the actual classname that was attempted in the goodClassOrNull method 2) call the Configuration::getClassByName method first and if the class is not found check for the default package name and try the call again
{code}
public static Class goodClassOrNull(Configuration conf, String className, String defaultPackage) {
  Class clazz = null;
  try {
    clazz = conf.getClassByName(className);
  } catch (ClassNotFoundException cnf) {
  }
  if (clazz == null) {
    if (className.indexOf('.') == -1 && defaultPackage != null) {
      className = defaultPackage + "." + className;
      try {
        clazz = conf.getClassByName(className);
      } catch (ClassNotFoundException cnf) {
      }
    }
  }
  return clazz;
}
{code}
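A runnable approximation of resolution option (2) is sketched below. Class.forName stands in for Configuration::getClassByName, since Hadoop's Configuration is not available in this self-contained sketch; the class name is otherwise the same shape as the snippet in the issue.

```java
public class ClassResolver {
    // Hedged sketch of goodClassOrNull: try the class name as given; if it
    // has no package and a default package was supplied, retry with the
    // default package prefixed. Returns null when neither lookup succeeds.
    static Class<?> goodClassOrNull(String className, String defaultPackage) {
        Class<?> clazz = tryLoad(className);
        if (clazz == null && className.indexOf('.') == -1 && defaultPackage != null) {
            clazz = tryLoad(defaultPackage + "." + className);
        }
        return clazz;
    }

    private static Class<?> tryLoad(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException cnf) {
            return null; // caller decides how to report the failure
        }
    }

    public static void main(String[] args) {
        System.out.println(goodClassOrNull("String", "java.lang"));
    }
}
```

Per option (1), a caller that receives null could then include the attempted class name in its error message instead of a generic ClassNotFound report.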
[jira] Updated: (MAPREDUCE-1865) [Rumen] Rumen should also support jobhistory files generated using trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amar Kamat updated MAPREDUCE-1865: -- Attachment: mapreduce-1865-v1.7.1.patch Attaching a slightly modified patch with changes to comments and assert messages. [Rumen] Rumen should also support jobhistory files generated using trunk Key: MAPREDUCE-1865 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1865 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amar Kamat Assignee: Amar Kamat Fix For: 0.22.0 Attachments: mapreduce-1865-v1.2.patch, mapreduce-1865-v1.6.2.patch, mapreduce-1865-v1.7.1.patch, mapreduce-1865-v1.7.patch Rumen code in trunk parses and process only jobhistory files from pre-21 hadoop mapreduce clusters. It should also support jobhistory files generated using trunk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1878) Add MRUnit documentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888238#action_12888238 ] Amareshwari Sriramadasu commented on MAPREDUCE-1878: I think the document can be added as package.html in mrunit package instead of .txt file, similar to all other packages. Add MRUnit documentation Key: MAPREDUCE-1878 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1878 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/mrunit Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1878.2.patch, MAPREDUCE-1878.patch A short user guide for MRUnit, written in asciidoc. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1713) Utilities for system tests specific.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888243#action_12888243 ] Hadoop QA commented on MAPREDUCE-1713: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449108/MAPREDUCE-1713.patch against trunk revision 962682. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/console This message is automatically generated. Utilities for system tests specific. 
Key: MAPREDUCE-1713 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1713 Project: Hadoop Map/Reduce Issue Type: Task Components: test Affects Versions: 0.21.0 Reporter: Vinay Kumar Thota Assignee: Vinay Kumar Thota Attachments: 1713-ydist-security.patch, 1713-ydist-security.patch, 1713-ydist-security.patch, 1713-ydist-security.patch, 1713-ydist-security.patch, MAPREDUCE-1713.patch, MAPREDUCE-1713.patch, MAPREDUCE-1713.patch, systemtestutils_MR1713.patch, utilsforsystemtest_1713.patch

1. A method for restarting the daemon with a new configuration. public static void restartCluster(Hashtable<String,Long> props, String confFile) throws Exception;
2. A method for resetting the daemon to the default configuration. public void resetCluster() throws Exception;
3. A method for waiting until the daemon stops. public void waitForClusterToStop() throws Exception;
4. A method for waiting until the daemon starts. public void waitForClusterToStart() throws Exception;
5. A method for checking whether the job has started or not. public boolean isJobStarted(JobID id) throws IOException;
6. A method for checking whether the task has started or not. public boolean isTaskStarted(TaskInfo taskInfo) throws IOException;
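Wait-style utilities like items 3 and 4 are typically a poll-with-timeout loop. The sketch below is a hypothetical shape, not the Herriot code; the BooleanSupplier stands in for the real daemon-status check, which is not shown.

```java
import java.util.function.BooleanSupplier;

public class ClusterWait {
    // Hedged sketch of waitForClusterToStart()/waitForClusterToStop():
    // poll the status check until it reports the desired state or the
    // timeout elapses. Returns whether the desired state was reached.
    static boolean waitFor(BooleanSupplier inDesiredState, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!inDesiredState.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            Thread.sleep(pollMs);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(waitFor(() -> true, 100, 10));
    }
}
```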
[jira] Updated: (MAPREDUCE-1865) [Rumen] Rumen should also support jobhistory files generated using trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi updated MAPREDUCE-1865: Status: Patch Available (was: Open) [Rumen] Rumen should also support jobhistory files generated using trunk Key: MAPREDUCE-1865 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1865 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amar Kamat Assignee: Amar Kamat Fix For: 0.22.0 Attachments: mapreduce-1865-v1.2.patch, mapreduce-1865-v1.6.2.patch, mapreduce-1865-v1.7.1.patch, mapreduce-1865-v1.7.patch Rumen code in trunk parses and process only jobhistory files from pre-21 hadoop mapreduce clusters. It should also support jobhistory files generated using trunk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1710) Process tree clean up of exceeding memory limit tasks.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888247#action_12888247 ] Hadoop QA commented on MAPREDUCE-1710: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449101/MAPREDUCE-1710.patch against trunk revision 962682. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/console This message is automatically generated. Process tree clean up of exceeding memory limit tasks. 
-- Key: MAPREDUCE-1710 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1710 Project: Hadoop Map/Reduce Issue Type: Task Components: test Affects Versions: 0.21.0 Reporter: Vinay Kumar Thota Assignee: Vinay Kumar Thota Attachments: 1710-ydist_security.patch, 1710-ydist_security.patch, 1710-ydist_security.patch, MAPREDUCE-1710.patch, memorylimittask_1710.patch, memorylimittask_1710.patch, memorylimittask_1710.patch, memorylimittask_1710.patch, memorylimittask_1710.patch

1. Submit a job which would spawn child processes, each of which exceeds the memory limits. Let the job complete. Check that all the child processes are killed; the overall job should fail.
2. Submit a job which would spawn child processes, each of which exceeds the memory limits. Kill/fail the job while in progress. Check that all the child processes are killed.
[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888249#action_12888249 ] Hong Tang commented on MAPREDUCE-1925: -- Git diff --text will add binary diff to the patch. TestRumenJobTraces fails in trunk - Key: MAPREDUCE-1925 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amareshwari Sriramadasu Assignee: Ravi Gummadi Fix For: 0.22.0 Attachments: 1925.patch TestRumenJobTraces failed with following error: Error Message the gold file contains more text at line 1 expected:56 but was:0 Stacktrace at org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294) Full log of the failure is available at http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888257#action_12888257 ] Ravi Gummadi commented on MAPREDUCE-1925: - Thanks Hong. Will upload new patch which removes that .gz file and the testcase itself contains the expected list of events as array of Strings. TestRumenJobTraces fails in trunk - Key: MAPREDUCE-1925 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amareshwari Sriramadasu Assignee: Ravi Gummadi Fix For: 0.22.0 Attachments: 1925.patch TestRumenJobTraces failed with following error: Error Message the gold file contains more text at line 1 expected:56 but was:0 Stacktrace at org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294) Full log of the failure is available at http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi updated MAPREDUCE-1925: Attachment: 1925.v1.patch Attaching new patch incorporating review comments. TestRumenJobTraces fails in trunk - Key: MAPREDUCE-1925 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amareshwari Sriramadasu Assignee: Ravi Gummadi Fix For: 0.22.0 Attachments: 1925.patch, 1925.v1.patch TestRumenJobTraces failed with following error: Error Message the gold file contains more text at line 1 expected:56 but was:0 Stacktrace at org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294) Full log of the failure is available at http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1840) [Gridmix] Exploit/Add security features in GridMix
[ https://issues.apache.org/jira/browse/MAPREDUCE-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1840: - Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed I committed this. Thanks to Amar, Rahul, and Hong [Gridmix] Exploit/Add security features in GridMix -- Key: MAPREDUCE-1840 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1840 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/gridmix Affects Versions: 0.22.0 Reporter: Amar Kamat Assignee: Amar Kamat Fix For: 0.22.0 Attachments: mapreduce-gridmix-fp-v1.3.3.patch, mapreduce-gridmix-fp-v1.3.9.patch Use security information while replaying jobs in Gridmix. This includes - Support for multiple users - Submitting jobs as different users - Allowing usage of secure cluster (hdfs + mapreduce) - Support for multiple queues Other features include : - Support for sleep job - Support for load job + testcases for verifying all of the above changes -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1594) Support for Sleep Jobs in gridmix
[ https://issues.apache.org/jira/browse/MAPREDUCE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas resolved MAPREDUCE-1594. -- Hadoop Flags: [Reviewed] Assignee: rahul k singh Fix Version/s: 0.22.0 Resolution: Fixed Fixed in MAPREDUCE-1840 Support for Sleep Jobs in gridmix - Key: MAPREDUCE-1594 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1594 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/gridmix Reporter: rahul k singh Assignee: rahul k singh Fix For: 0.22.0 Attachments: 1376-5-yhadoop20-100-3.patch, 1594-diff-4-5.patch, 1594-yhadoop-20-1xx-1-2.patch, 1594-yhadoop-20-1xx-1-3.patch, 1594-yhadoop-20-1xx-1-4.patch, 1594-yhadoop-20-1xx-1-5.patch, 1594-yhadoop-20-1xx-1.patch, 1594-yhadoop-20-1xx.patch Support for Sleep jobs in gridmix -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1376) Support for varied user submission in Gridmix
[ https://issues.apache.org/jira/browse/MAPREDUCE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas resolved MAPREDUCE-1376. -- Hadoop Flags: [Reviewed] Fix Version/s: 0.22.0 Resolution: Fixed Fixed in MAPREDUCE-1840 Support for varied user submission in Gridmix - Key: MAPREDUCE-1376 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1376 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/gridmix Reporter: Chris Douglas Assignee: Chris Douglas Fix For: 0.22.0 Attachments: 1376-2-yhadoop-security.patch, 1376-3-yhadoop20.100.patch, 1376-4-yhadoop20.100.patch, 1376-5-yhadoop20-100.patch, 1376-yhadoop-security.patch, M1376-0.patch, M1376-1.patch, M1376-2.patch, M1376-3.patch, M1376-4.patch Gridmix currently submits all synthetic jobs as the client user. It should be possible to map users in the trace to a set of users appropriate for the target cluster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1711) Gridmix should provide an option to submit jobs to the same queues as specified in the trace.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas resolved MAPREDUCE-1711. -- Hadoop Flags: [Reviewed] Fix Version/s: 0.22.0 Resolution: Fixed Fixed in MAPREDUCE-1840 Gridmix should provide an option to submit jobs to the same queues as specified in the trace. - Key: MAPREDUCE-1711 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1711 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/gridmix Reporter: Hong Tang Assignee: rahul k singh Fix For: 0.22.0 Attachments: diff-gridmix.patch, diff-rumen.patch, MR-1711-yhadoop-20-1xx-2.patch, MR-1711-yhadoop-20-1xx-3.patch, MR-1711-yhadoop-20-1xx-4.patch, MR-1711-yhadoop-20-1xx-5.patch, MR-1711-yhadoop-20-1xx-6.patch, MR-1711-yhadoop-20-1xx-7.patch, MR-1711-yhadoop-20-1xx.patch, MR-1711-Yhadoop-20-crossPort-1.patch, MR-1711-Yhadoop-20-crossPort-2.patch, MR-1711-Yhadoop-20-crossPort.patch, mr-1711-yhadoop-20.1xx-20100416.patch Gridmix should provide an option to submit jobs to the same queues as specified in the trace. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1526) Cache the job related information while submitting the job , this would avoid many RPC calls to JobTracker.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas resolved MAPREDUCE-1526. -- Hadoop Flags: [Reviewed] Assignee: rahul k singh Fix Version/s: 0.22.0 Resolution: Fixed Fixed in MAPREDUCE-1840 Cache the job related information while submitting the job , this would avoid many RPC calls to JobTracker. --- Key: MAPREDUCE-1526 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1526 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/gridmix Reporter: rahul k singh Assignee: rahul k singh Fix For: 0.22.0 Attachments: 1526-yahadoop-20-101-2.patch, 1526-yahadoop-20-101-3.patch, 1526-yahadoop-20-101.patch, 1526-yhadoop-20-101-4.patch, 1526-yhadoop-20-101-4.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1940) [Rumen] Add appropriate switches to Folder and TraceBuilder w.r.t input and output files
[Rumen] Add appropriate switches to Folder and TraceBuilder w.r.t input and output files Key: MAPREDUCE-1940 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1940 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tools/rumen Reporter: Amar Kamat Currently Folder and TraceBuilder expect the input and output to be the last arguments in the command line. It would be better to add special switches to the input and output files to avoid confusion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1912) [Rumen] Add a driver for Rumen tool
[ https://issues.apache.org/jira/browse/MAPREDUCE-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888300#action_12888300 ] Ravi Gummadi commented on MAPREDUCE-1912: - Some comments:
(1) In build.xml, please change "${common.ivy.lib.dir} dir" to "${common.ivy.lib.dir} directory".
(2) In Folder.java, in the initialize() method, printUsage() should be called at the 2 places where IllegalArgumentException is thrown (just before throwing).
(3) In Rumen.java, please change "A Rumen tool fold/scale the trace" to "A Rumen tool to fold/scale the trace".
(4) In TraceBuilder.java, please reverse the conditions in the following while statement so that validation of the index is done before accessing the element at that index: {code}while (args[switchTop].startsWith("-") && switchTop < args.length){code}
(5) As you observed the bug, please make the necessary code change of moving ++switchTop; out of the if statement in the above while loop --- to fix the bug of the infinite loop when some option that starts with "-" (and is not the same as -demuxer) is given.
(6) In both places in TraceBuilder.java where printUsage() is called, you are checking the case of zero remaining arguments only. We need to make sure that there are at least 3 arguments in both places. So change (a) if (0 == args.length) to if (args.length < 3) and (b) if (switchTop == args.length) to if (switchTop + 2 >= args.length).

[Rumen] Add a driver for Rumen tool Key: MAPREDUCE-1912 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1912 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tools/rumen Affects Versions: 0.22.0 Reporter: Amar Kamat Assignee: Amar Kamat Fix For: 0.22.0 Attachments: mapreduce-1912-v1.1.patch

Rumen, as a tool, has 2 entry points: - Trace builder - Folder It would be nice to have a single driver program and have 'trace-builder' and 'folder' as its options.
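Comments (4) and (5) together amount to the loop shape sketched below. This is an illustrative sketch, not the actual TraceBuilder code; in particular, the assumption that -demuxer consumes one value argument is the author's reading of the comments, not confirmed by the issue text.

```java
public class SwitchParser {
    // Hedged sketch of the corrected argument scan: the bounds check comes
    // before the array access, and switchTop advances on every iteration so
    // an unrecognized "-" option can no longer cause an infinite loop.
    static int skipSwitches(String[] args) {
        int switchTop = 0;
        while (switchTop < args.length && args[switchTop].startsWith("-")) {
            if (args[switchTop].equalsIgnoreCase("-demuxer")) {
                switchTop++; // assumed: a recognized switch also consumes its value
            }
            ++switchTop; // moved out of the if, per comment (5)
        }
        return switchTop; // index of the first non-switch argument
    }

    public static void main(String[] args) {
        System.out.println(skipSwitches(new String[]{"-demuxer", "SomeDemuxer", "trace", "out"}));
    }
}
```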
[jira] Commented: (MAPREDUCE-1896) [Herriot] New property for multi user list.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888307#action_12888307 ] Hadoop QA commented on MAPREDUCE-1896: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12448436/MAPREDUCE-1896.patch against trunk revision 962682. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/console This message is automatically generated. [Herriot] New property for multi user list. --- Key: MAPREDUCE-1896 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1896 Project: Hadoop Map/Reduce Issue Type: Task Components: test Affects Versions: 0.21.0 Reporter: Vinay Kumar Thota Assignee: Vinay Kumar Thota Attachments: MAPREDUCE-1896.patch, MAPREDUCE-1896.patch, MAPREDUCE-1896.patch Adding new property for multi user list. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1621) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output
[ https://issues.apache.org/jira/browse/MAPREDUCE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888319#action_12888319 ] Hadoop QA commented on MAPREDUCE-1621: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449214/patch-1621.txt against trunk revision 962682. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/console This message is automatically generated. 
Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output - Key: MAPREDUCE-1621 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1621 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.21.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.22.0 Attachments: patch-1621.txt If TextOutputReader.readKeyValue() has never successfully read a line, then its bytes member will be left null. Thus when logging a task failure, PipeMapRed.getContext() can trigger an NPE when it calls outReader_.getLastOutput(). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1928) Dynamic information fed into Hadoop for controlling execution of a submitted job
[ https://issues.apache.org/jira/browse/MAPREDUCE-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888332#action_12888332 ] Steven Lewis commented on MAPREDUCE-1928: - Another possible use has to do with adjusting parameters to avoid failures. I have an issue where a reducer is running out of memory. If I were aware that certain keys lead to this failure, I could take steps such as sampling the data rather than processing the whole set, so I would add access to data about failures.

Dynamic information fed into Hadoop for controlling execution of a submitted job Key: MAPREDUCE-1928 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1928 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, jobtracker, tasktracker Affects Versions: 0.20.3 Reporter: Raman Grover Original Estimate: 2016h Remaining Estimate: 2016h

Currently the job submission protocol requires the job provider to put every bit of information inside an instance of JobConf. The submitted information includes the input data (hdfs path), suspected resource requirement, number of reducers etc. This information is read by the JobTracker as part of job initialization. Once initialized, the job is moved into a running state. From this point, there is no mechanism for any additional information to be fed into the Hadoop infrastructure to control the job's execution. The execution pattern for the job looks entirely static from this point. Using the size of the input data and a few settings inside JobConf, the number of mappers is computed. Hadoop attempts to read the whole of the data in parallel by launching parallel map tasks. Once the map phase is over, a known number of reduce tasks (supplied as part of JobConf) are started. Parameters that control the job execution were set in JobConf prior to reading the input data.
As the map phase progresses, useful information based upon the content of the input data surfaces and can be used in controlling the further execution of the job. Let us walk through some examples where additional information can be fed to Hadoop subsequent to job submission for optimal execution of the job. I) Process a part of the input; based upon the results, decide if reading more input is required. In a huge data set, the user is interested in finding 'k' records that satisfy a predicate, essentially sampling the data. In the current implementation, as the data is huge, a large number of mappers would be launched, consuming a significant fraction of the available map slots in the cluster. Each map task would attempt to emit a max of 'k' records. With N mappers, we get N*k records, out of which one can pick any k to form the final result. This is not optimal as: 1) A larger number of map slots get occupied initially, affecting other jobs in the queue. 2) If the selectivity of the input data is very low, we did not need to scan the whole of the data to form our result. We could have finished by reading a fraction of the input data, monitoring the cardinality of the map output, and determining if more input needs to be processed. Optimal way: If reading the whole of the input requires N mappers, launch only 'M' initially. Allow them to complete. Based upon the statistics collected, decide the additional number of mappers to be launched next, and so on until the whole of the input has been processed or enough records have been collected to form the results, whichever is earlier. II) Here is some data, the remaining is yet to arrive, but you may start with it, and receive more input later. Consider a chain of 2 M-R jobs chained together such that the latter reads the output of the former. The second MR job cannot be started until the first has finished completely. This is essentially because Hadoop needs to be told the complete information about the input before beginning the job. 
The first M-R has produced enough data (not finished yet) that can be processed by another MR job, and hence the other MR need not wait to grab the whole of the input before beginning. Input splits could be supplied later, but of course before the copy/shuffle phase. III) Input data has undergone one round of processing by the map phase; with some stats available, one can now speak of the resources required further. Mappers can produce useful stats about their output, like the cardinality, or produce a histogram describing the distribution of the output. These stats are available to the job provider (Hive/Pig/End User), who can now determine with better accuracy the resources (memory requirements) required in
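Strategy (I) above can be illustrated with a toy, non-Hadoop loop: scan the input in batches (the analogue of launching M map tasks at a time) and stop as soon as k records satisfy the predicate, instead of scanning everything up front. All names and the even-number predicate below are illustrative assumptions, not a proposed API.

```java
// Toy sketch of incremental input processing: each outer iteration stands in
// for one wave of M map tasks; we stop early once k matching records exist.
import java.util.ArrayList;
import java.util.List;

public class IncrementalSampling {
  public static List<Integer> sample(int[] data, int k, int batchSize) {
    List<Integer> hits = new ArrayList<Integer>();
    for (int start = 0; start < data.length && hits.size() < k; start += batchSize) {
      // "Launch" one batch; in Hadoop this would be the next wave of mappers.
      int end = Math.min(start + batchSize, data.length);
      for (int i = start; i < end && hits.size() < k; i++) {
        if (data[i] % 2 == 0) { // predicate: keep even numbers
          hits.add(data[i]);
        }
      }
    }
    return hits;
  }
}
```

With low selectivity the loop finishes after a fraction of the input, which is exactly the map-slot saving argued for in the description.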
[jira] Created: (MAPREDUCE-1941) Need a servlet in JobTracker to stream contents of the job history file
Need a servlet in JobTracker to stream contents of the job history file --- Key: MAPREDUCE-1941 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1941 Project: Hadoop Map/Reduce Issue Type: New Feature Components: jobtracker Affects Versions: 0.22.0 Reporter: Srikanth Sundarrajan Assignee: Srikanth Sundarrajan There is no convenient mechanism to retrieve the contents of the job history file. Need a way to retrieve the job history file contents from Job Tracker. This can perhaps be implemented as a servlet on the Job tracker. * Create a jsp/servlet that accepts job id as a request parameter * Stream the contents of the history file corresponding to the job id, if user has permissions to view the job details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
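The core of the proposed servlet is copying the history file to the response stream once the job id is resolved and permissions are checked. Below is a hedged sketch of just that streaming step; the directory layout, file naming, and class name are assumptions for illustration, not JobTracker API.

```java
// Hedged sketch: look up the history file for a job id and copy it to an
// output stream (e.g. the servlet's response stream). Permission checks on
// the requesting user are assumed to happen before this method is called.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class HistoryStreamer {
  private final File historyDir;

  public HistoryStreamer(File historyDir) {
    this.historyDir = historyDir;
  }

  // Copies the history file for jobId into out, chunk by chunk.
  public void stream(String jobId, OutputStream out) throws IOException {
    File history = new File(historyDir, jobId); // assumed naming scheme
    byte[] buf = new byte[8192];
    InputStream in = new FileInputStream(history);
    try {
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    } finally {
      in.close();
    }
  }
}
```

A jsp/servlet wrapper would parse the job id request parameter, run the permission check, and then call something like `stream(jobId, response.getOutputStream())`.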
[jira] Commented: (MAPREDUCE-1686) ClassNotFoundException for custom format classes provided in libjars
[ https://issues.apache.org/jira/browse/MAPREDUCE-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888358#action_12888358 ] Paul Burkhardt commented on MAPREDUCE-1686: --- Okay, I'll try and do that. Paul ClassNotFoundException for custom format classes provided in libjars Key: MAPREDUCE-1686 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1686 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.20.2 Reporter: Paul Burkhardt Priority: Minor The StreamUtil::goodClassOrNull method assumes user-provided classes have package names and, if not, that they are part of the Hadoop Streaming package. For example, using custom InputFormat or OutputFormat classes without package names will fail with a ClassNotFoundException, which is not indicative given the classes are provided in the libjars option. Admittedly, most Java classes should have a package name, so this should rarely come up. Possible resolution options: 1) modify the error message to include the actual classname that was attempted in the goodClassOrNull method 2) call the Configuration::getClassByName method first and, if the class is not found, check for the default package name and try the call again:
{code}
public static Class goodClassOrNull(Configuration conf, String className, String defaultPackage) {
  Class clazz = null;
  try {
    clazz = conf.getClassByName(className);
  } catch (ClassNotFoundException cnf) {
  }
  if (clazz == null) {
    if (className.indexOf('.') == -1 && defaultPackage != null) {
      className = defaultPackage + "." + className;
      try {
        clazz = conf.getClassByName(className);
      } catch (ClassNotFoundException cnf) {
      }
    }
  }
  return clazz;
}
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1911) Fix errors in -info option in streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888399#action_12888399 ] Hadoop QA commented on MAPREDUCE-1911: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449235/patch-1911-1.txt against trunk revision 963986. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/console This message is automatically generated. 
Fix errors in -info option in streaming --- Key: MAPREDUCE-1911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1911 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.22.0 Attachments: patch-1911-1.txt, patch-1911.txt Here are some of the findings by Karam while verifying the -info option in streaming: # We need to add Optional for the -mapper, -reducer, -combiner and -file options. # For the -inputformat and -outputformat options, we should put Optional in the prefix for the sake of uniformity. # We need to remove the -cluster description. # The -help option is not displayed in the usage message. # When displaying the message for the -info or -help options, we should not display Streaming Job Failed!; also, the exit code should be 0 in case of the -help/-info option. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888433#action_12888433 ] Doug Cutting commented on MAPREDUCE-1938: - Two thoughts: 1. In general, we need to better separate the kernel from the library. CombineFileInputFormat is library code and should be easy to update without updating the cluster. Long-term, only kernel code should be hardwired on the classpath of tasks, with library and user code both specified per job. There should be no default version of library classes for a task: tasks should always specify their required libraries. Is there a Jira for this? I know Tom's expressed interest in working on this. 2. We should permit user code to depend on different versions of things than the kernel does. For example, user code might rely on a different version of HttpClient or Avro than that used by MapReduce. This should be possible if instances of classes from these are not passed between user and kernel code, e.g., as long as Avro and HttpClient classes are not a part of the MapReduce API. In this case classloaders (probably via OSGI) could permit this. Ability for having user's classes take precedence over the system classes for tasks' classpath -- Key: MAPREDUCE-1938 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, task, tasktracker Reporter: Devaraj Das Fix For: 0.22.0 Attachments: mr-1938-bp20.patch It would be nice to have the ability in MapReduce to allow users to specify for their jobs alternate implementations of classes that are already defined in the MapReduce libraries. For example, an alternate implementation for CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888436#action_12888436 ] Owen O'Malley commented on MAPREDUCE-1938: -- I think that the default for this should be on. Rather than add HADOOP_CLIENT_CLASSPATH, let's make a new variable HADOOP_USER_CLASSPATH_LAST. If it is defined, we add HADOOP_CLASSPATH to the tail like we currently do. Otherwise it is added to the front. Ability for having user's classes take precedence over the system classes for tasks' classpath -- Key: MAPREDUCE-1938 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, task, tasktracker Reporter: Devaraj Das Fix For: 0.22.0 Attachments: mr-1938-bp20.patch It would be nice to have the ability in MapReduce to allow users to specify for their jobs alternate implementations of classes that are already defined in the MapReduce libraries. For example, an alternate implementation for CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
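Owen's proposal is an ordering switch in the launcher script. The sketch below is hedged: only the ordering logic follows his comment; the jar names and the SYSTEM_CLASSPATH variable are illustrative stand-ins for whatever bin/hadoop actually builds.

```shell
# Hedged sketch of the HADOOP_USER_CLASSPATH_LAST switch Owen describes.
SYSTEM_CLASSPATH="hadoop-core.jar:hadoop-mapred.jar"  # assumed system jars
HADOOP_CLASSPATH="my-fixed-lib.jar"                   # the user's jars

if [ -n "$HADOOP_USER_CLASSPATH_LAST" ]; then
  # variable defined: keep today's behavior, user jars appended at the tail
  CLASSPATH="${SYSTEM_CLASSPATH}:${HADOOP_CLASSPATH}"
else
  # proposed default: user jars first, so user classes shadow system classes
  CLASSPATH="${HADOOP_CLASSPATH}:${SYSTEM_CLASSPATH}"
fi
echo "$CLASSPATH"
```

Because Java resolves classes in classpath order, putting the user's jars first is what lets a job ship its own CombineFileInputFormat (or any hot fix) without touching the cluster install.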
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888445#action_12888445 ] Owen O'Malley commented on MAPREDUCE-1938: -- Doug, I agree that the kernel code should be split out from libraries, however, that work is much more involved. I don't see a problem with putting the user's code first. It is not a security concern. The user's code is only run as the user. Furthermore, it doesn't actually stop them from loading system classes. They can exec a new jvm with a new class path of their own choosing. Therefore, by putting the user's classes last all that we've done is make it harder for the user to implement hot fixes in their own jobs. That doesn't seem like a good goal. Ability for having user's classes take precedence over the system classes for tasks' classpath -- Key: MAPREDUCE-1938 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, task, tasktracker Reporter: Devaraj Das Fix For: 0.22.0 Attachments: mr-1938-bp20.patch It would be nice to have the ability in MapReduce to allow users to specify for their jobs alternate implementations of classes that are already defined in the MapReduce libraries. For example, an alternate implementation for CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1933) Create automated testcase for tasktracker dealing with corrupted disk.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888468#action_12888468 ] Konstantin Boudnik commented on MAPREDUCE-1933: --- bq. prop.put("mapred.local.dir", "/grid/0/dev/tmp/mapred/mapred-local,/grid/1/dev/tmp/mapred/mapred-local,/grid/2/dev/tmp/mapred/mapred-local,/grid/3/dev/tmp/mapred/mapred-local"); Absolutely; besides, this particular parameter should be set by a normal MR config already. Also, please don't use string literals for configuration parameters. There was a significant effort in 0.21 to have all configuration keys refactored to named constants. Use them instead. Create automated testcase for tasktracker dealing with corrupted disk. -- Key: MAPREDUCE-1933 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1933 Project: Hadoop Map/Reduce Issue Type: Test Components: test Reporter: Iyappan Srinivasan Assignee: Iyappan Srinivasan Attachments: TestCorruptedDiskJob.java After the TaskTracker has already run some tasks successfully, corrupt a disk by making the corresponding mapred.local.dir unreadable/unwritable. Make sure that jobs continue to succeed even though some tasks scheduled there fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
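The named-constant refactoring requested above looks like the following. The constant here is a local stand-in for illustration; in 0.21 the real keys live in constants classes under org.apache.hadoop.mapreduce rather than in scattered string literals.

```java
// Sketch of using a named constant instead of an inline configuration-key
// literal: a typo in the constant name fails at compile time, while a typo
// inside a "mapred.local.dir" string literal fails silently at runtime.
import java.util.Properties;

public class ConfigKeysExample {
  // One named constant per configuration key (stand-in for the real class).
  static final String MAPRED_LOCAL_DIR = "mapred.local.dir";

  public static void main(String[] args) {
    Properties prop = new Properties();
    prop.put(MAPRED_LOCAL_DIR, "/grid/0/dev/tmp/mapred/mapred-local");
    System.out.println(prop.getProperty(MAPRED_LOCAL_DIR));
  }
}
```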
[jira] Commented: (MAPREDUCE-1919) [Herriot] Test for verification of per cache file ref count.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888472#action_12888472 ] Konstantin Boudnik commented on MAPREDUCE-1919: --- I want to disagree with the suggestion on moving this little method to a helper class. It doesn't make much sense to create a wrapper around a well-known ToolRunner interface - it just creates confusion. Why don't you simply use {{int exitCode = ToolRunner.run(job, tool, jobArgs)}} ? Why do you need a method to wrap a call to another one? Also, please consider optimizing the imports list - it is overly detailed. [Herriot] Test for verification of per cache file ref count. - Key: MAPREDUCE-1919 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1919 Project: Hadoop Map/Reduce Issue Type: Task Components: test Reporter: Vinay Kumar Thota Assignee: Vinay Kumar Thota Attachments: 1919-ydist-security.patch, MAPREDUCE-1919.patch It covers the following scenarios. 1. Run the job with two distributed cache files and verify whether the job succeeded or not. 2. Run the job with distributed cache files and remove one cache file from the DFS when it is localized. Verify whether the job failed or not. 3. Run the job with two distributed cache files where the size of one file is larger than local.cache.size. Verify whether the job succeeded or not. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1942) 'compile-fault-inject' should never be called directly.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated MAPREDUCE-1942: -- Attachment: MAPREDUCE-1942.patch The fix. 'compile-fault-inject' should never be called directly. Key: MAPREDUCE-1942 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1942 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.21.0 Reporter: Konstantin Boudnik Assignee: Konstantin Boudnik Priority: Minor Attachments: MAPREDUCE-1942.patch Similar to HDFS-1299: prevent calls to helper targets. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1942) 'compile-fault-inject' should never be called directly.
'compile-fault-inject' should never be called directly. Key: MAPREDUCE-1942 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1942 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.21.0 Reporter: Konstantin Boudnik Assignee: Konstantin Boudnik Priority: Minor Similar to HDFS-1299: prevent calls to helper targets. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888482#action_12888482 ] Doug Cutting commented on MAPREDUCE-1938: - Owen, I agree with your analysis. I'm just trying to put this patch in the context of these other related discussions. This patch addresses some issues relevant to the separation of kernel and library. In common cases one can merely provide an alternate version of the library class in one's job. Fully separating kernel and library with a well-defined, minimal kernel API is clearly aesthetically better. Are there use cases that full separation will enable that this patch will not? I think mostly it will just make it clear which classes are safe to replace with updated versions and which are not. Does that sound right? The issue of user versions of libraries that the kernel uses (like Avro, log4j, HttpClient, etc.) is not entirely addressed by this patch. If the user's version is backwards compatible with the kernel's version then this patch is sufficient. But if the user's version of a library makes incompatible changes then we'd need a classloader/OSGI solution. Even then, I think it only works if user and kernel code do not interchange instances of classes defined by these libraries. A minimal kernel API will help reduce that risk. Does this analysis sound right? I'm trying to understand how far this patch gets us towards those goals: what it solves and what it doesn't. 
Ability for having user's classes take precedence over the system classes for tasks' classpath -- Key: MAPREDUCE-1938 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, task, tasktracker Reporter: Devaraj Das Fix For: 0.22.0 Attachments: mr-1938-bp20.patch It would be nice to have the ability in MapReduce to allow users to specify for their jobs alternate implementations of classes that are already defined in the MapReduce libraries. For example, an alternate implementation for CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1928) Dynamic information fed into Hadoop for controlling execution of a submitted job
[ https://issues.apache.org/jira/browse/MAPREDUCE-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888503#action_12888503 ] Joydeep Sen Sarma commented on MAPREDUCE-1928: -- to add to #1 - we may be able to change the split size based on the observed selectivity of an ongoing job (ie. add splits with larger/smaller size depending on stats from the first set of splits). It's possible that Hadoop may want to do this as part of the basic framework (by exploiting any mechanisms provided here). This is a huge win for a framework like Hive. It would drastically reduce the amount of wasted work (limit N queries) and the spawning of an unnecessarily large number of mappers (unknown selectivity) - just to name two obvious use cases. Can you supply a more concrete proposal in terms of api changes? Dynamic information fed into Hadoop for controlling execution of a submitted job Key: MAPREDUCE-1928 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1928 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, jobtracker, tasktracker Affects Versions: 0.20.3 Reporter: Raman Grover Original Estimate: 2016h Remaining Estimate: 2016h Currently the job submission protocol requires the job provider to put every bit of information inside an instance of JobConf. The submitted information includes the input data (hdfs path), suspected resource requirements, number of reducers etc. This information is read by the JobTracker as part of job initialization. Once initialized, the job is moved into a running state. From this point, there is no mechanism for any additional information to be fed into the Hadoop infrastructure for controlling the job execution. The execution pattern for the job looks very much static from this point. Using the size of the input data and a few settings inside JobConf, the number of mappers is computed. Hadoop attempts to read the whole of the data in parallel by launching parallel map tasks. 
Once the map phase is over, a known number of reduce tasks (supplied as part of JobConf) are started. Parameters that control the job execution were set in JobConf prior to reading the input data. As the map phase progresses, useful information based upon the content of the input data surfaces and can be used in controlling the further execution of the job. Let us walk through some examples where additional information can be fed to Hadoop subsequent to job submission for optimal execution of the job. I) Process a part of the input; based upon the results, decide if reading more input is required. In a huge data set, the user is interested in finding 'k' records that satisfy a predicate, essentially sampling the data. In the current implementation, as the data is huge, a large number of mappers would be launched, consuming a significant fraction of the available map slots in the cluster. Each map task would attempt to emit a max of 'k' records. With N mappers, we get N*k records, out of which one can pick any k to form the final result. This is not optimal as: 1) A larger number of map slots get occupied initially, affecting other jobs in the queue. 2) If the selectivity of the input data is very low, we did not need to scan the whole of the data to form our result. We could have finished by reading a fraction of the input data, monitoring the cardinality of the map output, and determining if more input needs to be processed. Optimal way: If reading the whole of the input requires N mappers, launch only 'M' initially. Allow them to complete. Based upon the statistics collected, decide the additional number of mappers to be launched next, and so on until the whole of the input has been processed or enough records have been collected to form the results, whichever is earlier. II) Here is some data, the remaining is yet to arrive, but you may start with it, and receive more input later. Consider a chain of 2 M-R jobs chained together such that the latter reads the output of the former. 
The second MR job cannot be started until the first has finished completely. This is essentially because Hadoop needs to be told the complete information about the input before beginning the job. The first M-R has produced enough data (not finished yet) that can be processed by another MR job, and hence the other MR need not wait to grab the whole of the input before beginning. Input splits could be supplied later, but of course before the copy/shuffle phase. III) Input data has undergone one round of processing by the map phase; with some stats available, one can now speak of the resources required further
[jira] Updated: (MAPREDUCE-1733) Authentication between pipes processes and java counterparts.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated MAPREDUCE-1733: Status: Patch Available (was: Open) Authentication between pipes processes and java counterparts. - Key: MAPREDUCE-1733 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1733 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: MR-1733-y20.1.patch, MR-1733-y20.2.patch, MR-1733-y20.3.patch, MR-1733.5.patch The connection between a pipe process and its parent java process should be authenticated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1812) New properties for suspend and resume process.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888511#action_12888511 ] Hadoop QA commented on MAPREDUCE-1812: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449207/MAPREDUCE-1812.patch against trunk revision 963986. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/console This message is automatically generated. New properties for suspend and resume process. -- Key: MAPREDUCE-1812 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1812 Project: Hadoop Map/Reduce Issue Type: Task Components: test Affects Versions: 0.21.0 Reporter: Vinay Kumar Thota Assignee: Vinay Kumar Thota Attachments: MAPREDUCE-1812.patch, MAPREDUCE-1812.patch Adding new properties in system-test-mr.xml file for suspend and resume process. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-1938: --- Attachment: mr-1938-bp20.1.patch Addressing Owen's comment on the shell script part of the patch. Doug, this patch is a first step towards letting users use their own versions of library-provided implementations for things like CombineFileInputFormat. The use case is to allow for specific implementations of library classes for certain classes of jobs. This doesn't aim to address the kernel/library separation in its entirety. So yes, if the user puts a class on the classpath that doesn't work with the kernel compatibly then tasks will fail, or produce obscure/inconsistent results, but that will affect only that job, and the user would notice that soon (hopefully). Did I understand your concern right? Ability for having user's classes take precedence over the system classes for tasks' classpath -- Key: MAPREDUCE-1938 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, task, tasktracker Reporter: Devaraj Das Fix For: 0.22.0 Attachments: mr-1938-bp20.1.patch, mr-1938-bp20.patch It would be nice to have the ability in MapReduce to allow users to specify for their jobs alternate implementations of classes that are already defined in the MapReduce libraries. For example, an alternate implementation for CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888532#action_12888532 ] Doug Cutting commented on MAPREDUCE-1938: - Did i understand your concern right? I don't have specific concerns about this patch. Sorry for any confusion in that regard. I thought it worthwhile to discuss how this change relates to other changes that are contemplated. It seems not inconsistent, provides some of the benefits, and is considerably simpler; in short, a good thing. Ability for having user's classes take precedence over the system classes for tasks' classpath -- Key: MAPREDUCE-1938 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, task, tasktracker Reporter: Devaraj Das Fix For: 0.22.0 Attachments: mr-1938-bp20.1.patch, mr-1938-bp20.patch It would be nice to have the ability in MapReduce to allow users to specify for their jobs alternate implementations of classes that are already defined in the MapReduce libraries. For example, an alternate implementation for CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888536#action_12888536 ] Owen O'Malley commented on MAPREDUCE-1938: -- This patch basically puts the user in charge of their job. They can leave the safety switch set in which case they get the current behavior. But if they turn off the safety, their classes go ahead of the ones installed on the cluster. That means that they can break things, but all they can break is their own tasks. After we do the split of core from library, you still need this switch. There will always be the possibility of needing to patch something in the core, because even MapTask has bugs. *smile* After splitting them apart, we can put the library code at the very end safety on: core, user, library safety off: user, core, library This patch is just about providing the safety switch. Ability for having user's classes take precedence over the system classes for tasks' classpath -- Key: MAPREDUCE-1938 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 Project: Hadoop Map/Reduce Issue Type: New Feature Components: job submission, task, tasktracker Reporter: Devaraj Das Fix For: 0.22.0 Attachments: mr-1938-bp20.1.patch, mr-1938-bp20.patch It would be nice to have the ability in MapReduce to allow users to specify for their jobs alternate implementations of classes that are already defined in the MapReduce libraries. For example, an alternate implementation for CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes Key: MAPREDUCE-1943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 0.22.0 We have come across issues in production clusters wherein users abuse counters, statusreport messages and split sizes. One such case was when one of the users had 100 million counters. This leads to jobtracker going out of memory and being unresponsive. In this jira I am proposing to put sane limits on the status report length, the number of counters and the size of block locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1943: - Fix Version/s: (was: 0.22.0) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes Key: MAPREDUCE-1943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Mahadev konar Assignee: Mahadev konar We have come across issues in production clusters wherein users abuse counters, statusreport messages and split sizes. One such case was when one of the users had 100 million counters. This leads to jobtracker going out of memory and being unresponsive. In this jira I am proposing to put sane limits on the status report length, the number of counters and the size of block locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1942) 'compile-fault-inject' should never be called directly.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888548#action_12888548 ] Eli Collins commented on MAPREDUCE-1942: +1 'compile-fault-inject' should never be called directly. Key: MAPREDUCE-1942 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1942 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.21.0 Reporter: Konstantin Boudnik Assignee: Konstantin Boudnik Priority: Minor Attachments: MAPREDUCE-1942.patch Similar to HDFS-1299: prevent calls to helper targets. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888557#action_12888557 ] Scott Chen commented on MAPREDUCE-1943: --- +1 to the idea. We have seen huge split sizes kill the JT. This will help. Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes Key: MAPREDUCE-1943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Mahadev konar Assignee: Mahadev konar We have come across issues in production clusters wherein users abuse counters, statusreport messages and split sizes. One such case was when one of the users had 100 million counters. This leads to jobtracker going out of memory and being unresponsive. In this jira I am proposing to put sane limits on the status report length, the number of counters and the size of block locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1848) Put number of speculative, data local, rack local tasks in JobTracker metrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888553#action_12888553 ] Dmytro Molkov commented on MAPREDUCE-1848: -- Patch looks good to me Put number of speculative, data local, rack local tasks in JobTracker metrics - Key: MAPREDUCE-1848 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1848 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.22.0 Reporter: Scott Chen Assignee: Scott Chen Fix For: 0.22.0 Attachments: MAPREDUCE-1848-20100614.txt, MAPREDUCE-1848-20100617.txt, MAPREDUCE-1848-20100623.txt It will be nice that we can collect these information in JobTracker metrics -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1943: - Attachment: MAPREDUCE-1521-0.20-yahoo.patch This patch imposes some limits. The following are the limits it imposes: 1) The number of counters per group is limited to 40. If counters are added beyond that amount, they are dropped silently. 2) The number of counter groups is restricted to 40. Again, if there are more groups than the limit, they are dropped silently. 3) The string size of a counter name is restricted to 64 characters. 4) The string size of a group name is restricted to 128 characters. 5) The number of block locations returned by a split is restricted to 100; this can be changed with a configuration parameter. 6) Limit the reporter.setStatus() string size to 512 characters. I haven't added tests yet. Will upload one shortly. Also, this patch is for the yahoo 0.20 branch. I will upload one for the trunk shortly. Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes Key: MAPREDUCE-1943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Mahadev konar Assignee: Mahadev konar Attachments: MAPREDUCE-1521-0.20-yahoo.patch We have come across issues in production clusters wherein users abuse counters, statusreport messages and split sizes. One such case was when one of the users had 100 million counters. This leads to jobtracker going out of memory and being unresponsive. In this jira I am proposing to put sane limits on the status report length, the number of counters and the size of block locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
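The "drop silently" policy for limits 1 and 3 above can be sketched as follows. This is a hypothetical standalone class, not the attached patch — the real change wires similar checks into org.apache.hadoop.mapred.Counters, and the constants mirror the values quoted in the comment:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LimitedCounterGroup {
    // Values from the comment above: 40 counters per group,
    // counter names capped at 64 characters.
    static final int MAX_COUNTERS_PER_GROUP = 40;
    static final int MAX_COUNTER_NAME_LENGTH = 64;

    private final Map<String, Long> counters = new LinkedHashMap<>();

    void increment(String name, long amount) {
        if (name.length() > MAX_COUNTER_NAME_LENGTH) {
            return;   // over-long name: dropped silently
        }
        if (!counters.containsKey(name)
                && counters.size() >= MAX_COUNTERS_PER_GROUP) {
            return;   // group already full: new counter dropped silently
        }
        counters.merge(name, amount, Long::sum);
    }

    int size() {
        return counters.size();
    }
}
```

Existing counters keep accumulating after the group fills up; only additional distinct counters are discarded, which is what bounds the JobTracker's memory per job.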
[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1943: - Attachment: MAPREDUCE-1943-0.20-yahoo.patch attached the wrong file.. :) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes Key: MAPREDUCE-1943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Mahadev konar Assignee: Mahadev konar Attachments: MAPREDUCE-1943-0.20-yahoo.patch We have come across issues in production clusters wherein users abuse counters, statusreport messages and split sizes. One such case was when one of the users had 100 million counters. This leads to jobtracker going out of memory and being unresponsive. In this jira I am proposing to put sane limits on the status report length, the number of counters and the size of block locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1943: - Attachment: (was: MAPREDUCE-1521-0.20-yahoo.patch) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes Key: MAPREDUCE-1943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Mahadev konar Assignee: Mahadev konar Attachments: MAPREDUCE-1943-0.20-yahoo.patch We have come across issues in production clusters wherein users abuse counters, statusreport messages and split sizes. One such case was when one of the users had 100 million counters. This leads to jobtracker going out of memory and being unresponsive. In this jira I am proposing to put sane limits on the status report length, the number of counters and the size of block locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker Jobtracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated MAPREDUCE-1906: --- Status: Open (was: Patch Available) re-submit for hudson. Lower minimum heartbeat interval for tasktracker Jobtracker - Key: MAPREDUCE-1906 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1906 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.20.2, 0.20.1 Reporter: Scott Carey Attachments: MAPREDUCE-1906-0.21-v2.patch, MAPREDUCE-1906-0.21.patch I get a 0% to 15% performance increase for smaller clusters by making the heartbeat throttle stop penalizing clusters with less than 300 nodes. Between 0.19 and 0.20, the default minimum heartbeat interval increased from 2s to 3s. If a JobTracker is throttled at 100 heartbeats / sec for large clusters, why should a cluster with 10 nodes be throttled to 3.3 heartbeats per second? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker Jobtracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated MAPREDUCE-1906: --- Status: Patch Available (was: Open) re-submit for hudson. Lower minimum heartbeat interval for tasktracker Jobtracker - Key: MAPREDUCE-1906 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1906 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.20.2, 0.20.1 Reporter: Scott Carey Attachments: MAPREDUCE-1906-0.21-v2.patch, MAPREDUCE-1906-0.21.patch I get a 0% to 15% performance increase for smaller clusters by making the heartbeat throttle stop penalizing clusters with less than 300 nodes. Between 0.19 and 0.20, the default minimum heartbeat interval increased from 2s to 3s. If a JobTracker is throttled at 100 heartbeats / sec for large clusters, why should a cluster with 10 nodes be throttled to 3.3 heartbeats per second? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
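The throttle Scott is describing can be sketched as follows. Method and parameter names here are assumptions, not the JobTracker's actual identifiers: the interval is scaled with cluster size so the JobTracker sees at most a target number of heartbeats per second, then clamped to a configured minimum — and it is that minimum (3s in 0.20) that penalizes small clusters:

```java
public class HeartbeatInterval {
    // Scale the per-tasktracker interval so that clusterSize nodes
    // produce at most maxHeartbeatsPerSecond heartbeats in aggregate,
    // then clamp to the configured floor.
    static int intervalMillis(int clusterSize, int minIntervalMillis,
                              int maxHeartbeatsPerSecond) {
        int scaled = (int) Math.ceil(
                (double) clusterSize / maxHeartbeatsPerSecond) * 1000;
        return Math.max(scaled, minIntervalMillis);
    }
}
```

With a 3000 ms floor, a 10-node cluster is held to one heartbeat per node every 3 seconds even though the 100/sec budget would allow far more; lowering the floor lets small clusters heartbeat at the rate the budget actually permits, which is where the reported 0-15% speedup comes from.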
[jira] Commented: (MAPREDUCE-1730) Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888623#action_12888623 ] Hadoop QA commented on MAPREDUCE-1730: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449081/MAPREDUCE-1730.patch against trunk revision 963986. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/console This message is automatically generated. Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire. 
-- Key: MAPREDUCE-1730 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1730 Project: Hadoop Map/Reduce Issue Type: Test Affects Versions: 0.21.0 Reporter: Iyappan Srinivasan Assignee: Iyappan Srinivasan Attachments: MAPREDUCE-1730.patch, MAPREDUCE-1730.patch, MAPREDUCE-1730.patch, TestJobRetired.patch, TestJobRetired.patch, TestRetiredJobs-ydist-security-patch.txt, TestRetiredJobs-ydist-security-patch.txt, TestRetiredJobs.patch Automate using herriot framework, test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire. This should test when successful and failed jobs are retired, their jobInProgress object are removed properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888693#action_12888693 ] Amareshwari Sriramadasu commented on MAPREDUCE-1943: Limiting task diagnostic info and status are done in MAPREDUCE-1482. Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes Key: MAPREDUCE-1943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Mahadev konar Assignee: Mahadev konar Attachments: MAPREDUCE-1943-0.20-yahoo.patch We have come across issues in production clusters wherein users abuse counters, statusreport messages and split sizes. One such case was when one of the users had 100 million counters. This leads to jobtracker going out of memory and being unresponsive. In this jira I am proposing to put sane limits on the status report length, the number of counters and the size of block locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1896) [Herriot] New property for multi user list.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888694#action_12888694 ] Vinay Kumar Thota commented on MAPREDUCE-1896: -- I could see two failures, and they are unrelated to this patch. I don't think the patch could cause these failures, because its scope is just adding a new property in an XML file. [Herriot] New property for multi user list. --- Key: MAPREDUCE-1896 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1896 Project: Hadoop Map/Reduce Issue Type: Task Components: test Affects Versions: 0.21.0 Reporter: Vinay Kumar Thota Assignee: Vinay Kumar Thota Attachments: MAPREDUCE-1896.patch, MAPREDUCE-1896.patch, MAPREDUCE-1896.patch Adding new property for multi user list. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1941) Need a servlet in JobTracker to stream contents of the job history file
[ https://issues.apache.org/jira/browse/MAPREDUCE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888697#action_12888697 ] Amareshwari Sriramadasu commented on MAPREDUCE-1941: This can be done in Job client itself, no? History url is already available in JobStatus. Need a servlet in JobTracker to stream contents of the job history file --- Key: MAPREDUCE-1941 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1941 Project: Hadoop Map/Reduce Issue Type: New Feature Components: jobtracker Affects Versions: 0.22.0 Reporter: Srikanth Sundarrajan Assignee: Srikanth Sundarrajan There is no convenient mechanism to retrieve the contents of the job history file. Need a way to retrieve the job history file contents from Job Tracker. This can perhaps be implemented as a servlet on the Job tracker. * Create a jsp/servlet that accepts job id as a request parameter * Stream the contents of the history file corresponding to the job id, if user has permissions to view the job details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1911) Fix errors in -info option in streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888702#action_12888702 ] Amareshwari Sriramadasu commented on MAPREDUCE-1911: Test failures are because of MAPREDUCE-1834 and MAPREDUCE-1925 Fix errors in -info option in streaming --- Key: MAPREDUCE-1911 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1911 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Reporter: Amareshwari Sriramadasu Assignee: Amareshwari Sriramadasu Fix For: 0.22.0 Attachments: patch-1911-1.txt, patch-1911.txt Here are some of the findings by Karam while verifying -info option in streaming: # We need to add Optional for -mapper, -reducer, -combiner and -file options. # For -inputformat and -outputformat options, we should put Optional in the prefix for the sake of uniformity. # We need to remove the -cluster description. # -help option is not displayed in usage message. # When displaying the message for -info or -help options, we should not display Streaming Job Failed!; also, the exit code should be 0 in case of the -help/-info option. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1621) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output
[ https://issues.apache.org/jira/browse/MAPREDUCE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1621: --- Status: Open (was: Patch Available) Many tests failed because of NoClassDefFoundError. Re-submitting to hudson Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output - Key: MAPREDUCE-1621 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1621 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.21.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.22.0 Attachments: patch-1621.txt If TextOutputReader.readKeyValue() has never successfully read a line, then its bytes member will be left null. Thus when logging a task failure, PipeMapRed.getContext() can trigger an NPE when it calls outReader_.getLastOutput(). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
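The shape of the guard needed here can be sketched as follows. This is an assumption about the fix, not the contents of patch-1621.txt: the bytes member stays null until the first successful read, so getLastOutput must check for that before dereferencing:

```java
public class TextOutputReaderSketch {
    // Stand-in for streaming's TextOutputReader. bytes is only
    // assigned once readKeyValue() has successfully read a line.
    private byte[] bytes;

    void readKeyValue(String line) {   // simplified reader stub
        bytes = line.getBytes();
    }

    String getLastOutput() {
        // Guard against the never-read case instead of NPE-ing when
        // PipeMapRed.getContext() logs a task failure.
        return bytes == null ? null : new String(bytes);
    }
}
```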
[jira] Updated: (MAPREDUCE-1621) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output
[ https://issues.apache.org/jira/browse/MAPREDUCE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1621: --- Status: Patch Available (was: Open) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output - Key: MAPREDUCE-1621 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1621 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.21.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.22.0 Attachments: patch-1621.txt If TextOutputReader.readKeyValue() has never successfully read a line, then its bytes member will be left null. Thus when logging a task failure, PipeMapRed.getContext() can trigger an NPE when it calls outReader_.getLastOutput(). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1812) New properties for suspend and resume process.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888704#action_12888704 ] Vinay Kumar Thota commented on MAPREDUCE-1812: -- I could see 6 failures, and they are unrelated to this patch. I don't think the patch could cause these failures, because its scope is just adding new properties in an XML file. New properties for suspend and resume process. -- Key: MAPREDUCE-1812 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1812 Project: Hadoop Map/Reduce Issue Type: Task Components: test Affects Versions: 0.21.0 Reporter: Vinay Kumar Thota Assignee: Vinay Kumar Thota Attachments: MAPREDUCE-1812.patch, MAPREDUCE-1812.patch Adding new properties in system-test-mr.xml file for suspend and resume process. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1730) Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888706#action_12888706 ] Iyappan Srinivasan commented on MAPREDUCE-1730: --- The two errors are unrelated to the patch. Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire. -- Key: MAPREDUCE-1730 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1730 Project: Hadoop Map/Reduce Issue Type: Test Affects Versions: 0.21.0 Reporter: Iyappan Srinivasan Assignee: Iyappan Srinivasan Attachments: MAPREDUCE-1730.patch, MAPREDUCE-1730.patch, MAPREDUCE-1730.patch, TestJobRetired.patch, TestJobRetired.patch, TestRetiredJobs-ydist-security-patch.txt, TestRetiredJobs-ydist-security-patch.txt, TestRetiredJobs.patch Automate using herriot framework, test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire. This should test when successful and failed jobs are retired, their jobInProgress object are removed properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1941) Need a servlet in JobTracker to stream contents of the job history file
[ https://issues.apache.org/jira/browse/MAPREDUCE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12888710#action_12888710 ] Srikanth Sundarrajan commented on MAPREDUCE-1941: - {quote} This can be done in Job client itself, no? History url is already available in JobStatus. {quote} While the history file name may be available through JobStatus, the history file is owned by the user who runs the job tracker. However, access to the history file should be governed by JobACL.VIEW_JOB. Hence the request to have a separate servlet to provide the job history file contents. Need a servlet in JobTracker to stream contents of the job history file --- Key: MAPREDUCE-1941 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1941 Project: Hadoop Map/Reduce Issue Type: New Feature Components: jobtracker Affects Versions: 0.22.0 Reporter: Srikanth Sundarrajan Assignee: Srikanth Sundarrajan There is no convenient mechanism to retrieve the contents of the job history file. Need a way to retrieve the job history file contents from Job Tracker. This can perhaps be implemented as a servlet on the Job tracker. * Create a jsp/servlet that accepts job id as a request parameter * Stream the contents of the history file corresponding to the job id, if user has permissions to view the job details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
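The core of the proposed servlet — check the caller's view permission, then stream the history file — could look like the hypothetical sketch below. Nothing here is from a patch: the AclChecker interface is a stub standing in for the JobACL.VIEW_JOB check mentioned above, and the real code would live behind an HttpServlet taking the job id as a request parameter:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class JobHistoryStreamer {
    // Stub for the JobACL.VIEW_JOB check the servlet would perform.
    interface AclChecker {
        boolean canView(String user, String jobId);
    }

    // Copy the history file's bytes to the response stream, but only
    // after the ACL check passes; the file itself is owned by the
    // JobTracker user, so the servlet mediates all access.
    static void stream(String user, String jobId, InputStream history,
                       AclChecker acl, OutputStream response) {
        if (!acl.canView(user, jobId)) {
            throw new SecurityException(user + " may not view " + jobId);
        }
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = history.read(buf)) != -1) {
                response.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);  // servlet would map to a 500
        }
    }
}
```

Routing reads through the JobTracker this way keeps the file's ownership unchanged while still letting authorized users fetch its contents over HTTP.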