[jira] Commented: (MAPREDUCE-1941) Need a servlet in JobTracker to stream contents of the job history file
[ https://issues.apache.org/jira/browse/MAPREDUCE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888710#action_12888710 ] Srikanth Sundarrajan commented on MAPREDUCE-1941: - {quote} This can be done in Job client itself, no? History url is already available in JobStatus. {quote} While the history file name may be available through JobStatus, the history file is owned by the user who runs the job tracker. However, access to the history file should be governed by JobACL.VIEW_JOB. Hence the request for a separate servlet to serve the job history file contents. > Need a servlet in JobTracker to stream contents of the job history file > --- > > Key: MAPREDUCE-1941 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1941 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: jobtracker >Affects Versions: 0.22.0 >Reporter: Srikanth Sundarrajan >Assignee: Srikanth Sundarrajan > > There is no convenient mechanism to retrieve the contents of the job history > file. Need a way to retrieve the job history file contents from Job Tracker. > This can perhaps be implemented as a servlet on the Job tracker. > * Create a jsp/servlet that accepts job id as a request parameter > * Stream the contents of the history file corresponding to the job id, if > user has permissions to view the job details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
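The check-ACL-then-stream flow described in the comment can be sketched as follows. This is a minimal illustration, not the JobTracker code: the `AclChecker` interface and `streamHistory` helper are hypothetical stand-ins for the JobACL.VIEW_JOB check and the servlet's response output stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class HistoryStreamer {
    // Hypothetical hook for the JobACL.VIEW_JOB decision.
    interface AclChecker {
        boolean canViewJob(String user, String jobId);
    }

    // Verify the requesting user may view the job, then stream the
    // JT-owned history file to the caller on the user's behalf.
    static void streamHistory(String user, String jobId, Path historyFile,
                              AclChecker acl, OutputStream out) throws IOException {
        if (!acl.canViewJob(user, jobId)) {
            throw new SecurityException(user + " may not view " + jobId);
        }
        Files.copy(historyFile, out); // stream file contents to the response
    }
}
```

In a real servlet the job id would arrive as a request parameter and `out` would be the servlet response stream, per the issue description.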
[jira] Updated: (MAPREDUCE-1554) If user name contains '_', then searching of jobs based on user name on job history web UI doesn't work
[ https://issues.apache.org/jira/browse/MAPREDUCE-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi updated MAPREDUCE-1554: Description: If user name contains underscore as part of it, then searching of jobs based on user name on job history web UI doesn't work. This is because in code, everywhere {code}split("_"){code} is done on history file name to get user name. And other parts of history file name also should *not* be obtained by using split("_"). (was: If user name contains '_', then searching of jobs based on user name on job history web UI doesn't work. This is because in code, everywhere split("_") is done on history file name to get user name. And other parts of history file name also should not be obtained by using split("_").) > If user name contains '_', then searching of jobs based on user name on job > history web UI doesn't work > --- > > Key: MAPREDUCE-1554 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1554 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Ravi Gummadi > > If user name contains underscore as part of it, then searching of jobs based > on user name on job history web UI doesn't work. This is because in code, > everywhere {code}split("_"){code} is done on history file name to get user > name. And other parts of history file name also should *not* be obtained by > using split("_"). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
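To see why split("_") breaks, consider a history file name of the illustrative form `<jt-host>_<start-time>_<job-id>_<user>_<job-name>`, where the job id itself contains underscores (e.g. `job_201007151234_0001`). The sketch below contrasts the fragile index-based parse with one anchored on the job-id shape; the regex assumes an underscore-free job name, a simplification the real fix would have to handle more robustly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HistoryFileNameParse {
    // Naive parse: count '_'-separated fields. The index is only
    // correct when the user name itself contains no '_'.
    static String userBySplit(String fileName) {
        return fileName.split("_")[5];
    }

    // Anchor on the job-id shape (job_<digits>_<digits>) instead of
    // counting fields; the greedy group then keeps any '_' inside the
    // user name. Assumes the trailing job name holds no underscore.
    private static final Pattern NAME =
        Pattern.compile("^[^_]+_\\d+_job_\\d+_\\d+_(.+)_[^_]+$");

    static String userByAnchor(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.matches() ? m.group(1) : null;
    }
}
```

For `jt1_1279000000000_job_201007151234_0001_ravi_gummadi_wordcount`, the split-based parse yields `ravi` while the anchored parse recovers `ravi_gummadi`.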
[jira] Commented: (MAPREDUCE-1730) Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888706#action_12888706 ] Iyappan Srinivasan commented on MAPREDUCE-1730: --- The two errors are unrelated to the patch. > Automate test scenario for successful/killed jobs' memory is properly removed > from jobtracker after these jobs retire. > -- > > Key: MAPREDUCE-1730 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1730 > Project: Hadoop Map/Reduce > Issue Type: Test >Affects Versions: 0.21.0 >Reporter: Iyappan Srinivasan >Assignee: Iyappan Srinivasan > Attachments: MAPREDUCE-1730.patch, MAPREDUCE-1730.patch, > MAPREDUCE-1730.patch, TestJobRetired.patch, TestJobRetired.patch, > TestRetiredJobs-ydist-security-patch.txt, > TestRetiredJobs-ydist-security-patch.txt, TestRetiredJobs.patch > > > Automate using herriot framework, test scenario for successful/killed jobs' > memory is properly removed from jobtracker after these jobs retire. > This should test when successful and failed jobs are retired, their > jobInProgress object are removed properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1812) New properties for suspend and resume process.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888704#action_12888704 ] Vinay Kumar Thota commented on MAPREDUCE-1812: -- I could see 6 failures, and they are unrelated to this patch. I don't think the patch could cause these failures, because it only adds new properties to an XML file. > New properties for suspend and resume process. > -- > > Key: MAPREDUCE-1812 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1812 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: test >Affects Versions: 0.21.0 >Reporter: Vinay Kumar Thota >Assignee: Vinay Kumar Thota > Attachments: MAPREDUCE-1812.patch, MAPREDUCE-1812.patch > > > Adding new properties in system-test-mr.xml file for suspend and resume > process. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1621) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output
[ https://issues.apache.org/jira/browse/MAPREDUCE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1621: --- Status: Open (was: Patch Available) Many tests failed because of NoClassDefFoundError. Re-submitting to hudson > Streaming's TextOutputReader.getLastOutput throws NPE if it has never read > any output > - > > Key: MAPREDUCE-1621 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1621 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.21.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: 0.22.0 > > Attachments: patch-1621.txt > > > If TextOutputReader.readKeyValue() has never successfully read a line, then > its bytes member will be left null. Thus when logging a task failure, > PipeMapRed.getContext() can trigger an NPE when it calls > outReader_.getLastOutput(). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
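The failure mode is a plain uninitialized-field NPE; a null-safe `getLastOutput` along the lines below would avoid it. This is a simplified sketch modeled on the description, not the actual streaming source.

```java
import java.nio.charset.StandardCharsets;

public class TextOutputReaderSketch {
    private byte[] bytes; // stays null until a line is successfully read

    // Stand-in for TextOutputReader.readKeyValue() succeeding once.
    void recordLine(String line) {
        bytes = line.getBytes(StandardCharsets.UTF_8);
    }

    // Null-safe: report "no output yet" instead of letting
    // PipeMapRed.getContext() hit an NPE while logging a task failure.
    String getLastOutput() {
        return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
    }
}
```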
[jira] Updated: (MAPREDUCE-1621) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output
[ https://issues.apache.org/jira/browse/MAPREDUCE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1621: --- Status: Patch Available (was: Open) > Streaming's TextOutputReader.getLastOutput throws NPE if it has never read > any output > - > > Key: MAPREDUCE-1621 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1621 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.21.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: 0.22.0 > > Attachments: patch-1621.txt > > > If TextOutputReader.readKeyValue() has never successfully read a line, then > its bytes member will be left null. Thus when logging a task failure, > PipeMapRed.getContext() can trigger an NPE when it calls > outReader_.getLastOutput(). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1911) Fix errors in -info option in streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888702#action_12888702 ] Amareshwari Sriramadasu commented on MAPREDUCE-1911: Test failures are because of MAPREDUCE-1834 and MAPREDUCE-1925. > Fix errors in -info option in streaming > --- > > Key: MAPREDUCE-1911 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1911 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu > Fix For: 0.22.0 > > Attachments: patch-1911-1.txt, patch-1911.txt > > > Here are some of the findings by Karam while verifying -info option in > streaming: > # We need to add "Optional" for -mapper, -reducer, -combiner and -file options. > # For -inputformat and -outputformat options, we should put "Optional" in the > prefix for the sake of uniformity. > # We need to remove the -cluster description. > # -help option is not displayed in usage message. > # When displaying the message for -info or -help options, we should not display > "Streaming Job Failed!"; also, the exit code should be 0 in case of -help/-info > option. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1941) Need a servlet in JobTracker to stream contents of the job history file
[ https://issues.apache.org/jira/browse/MAPREDUCE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888697#action_12888697 ] Amareshwari Sriramadasu commented on MAPREDUCE-1941: This can be done in Job client itself, no? History url is already available in JobStatus. > Need a servlet in JobTracker to stream contents of the job history file > --- > > Key: MAPREDUCE-1941 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1941 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: jobtracker >Affects Versions: 0.22.0 >Reporter: Srikanth Sundarrajan >Assignee: Srikanth Sundarrajan > > There is no convenient mechanism to retrieve the contents of the job history > file. Need a way to retrieve the job history file contents from Job Tracker. > This can perhaps be implemented as a servlet on the Job tracker. > * Create a jsp/servlet that accepts job id as a request parameter > * Stream the contents of the history file corresponding to the job id, if > user has permissions to view the job details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1896) [Herriot] New property for multi user list.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888694#action_12888694 ] Vinay Kumar Thota commented on MAPREDUCE-1896: -- I could see two failures, and they are unrelated to this patch. I don't think the patch could cause these failures, because it only adds a new property to an XML file. > [Herriot] New property for multi user list. > --- > > Key: MAPREDUCE-1896 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1896 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: test >Affects Versions: 0.21.0 >Reporter: Vinay Kumar Thota >Assignee: Vinay Kumar Thota > Attachments: MAPREDUCE-1896.patch, MAPREDUCE-1896.patch, > MAPREDUCE-1896.patch > > > Adding new property for multi user list. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888693#action_12888693 ] Amareshwari Sriramadasu commented on MAPREDUCE-1943: Limiting task diagnostic info and status are done in MAPREDUCE-1482. > Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes > > > Key: MAPREDUCE-1943 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Mahadev konar >Assignee: Mahadev konar > Attachments: MAPREDUCE-1943-0.20-yahoo.patch > > > We have come across issues in production clusters wherein users abuse > counters, statusreport messages and split sizes. One such case was when one > of the users had 100 million counters. This leads to jobtracker going out of > memory and being unresponsive. In this jira I am proposing to put sane limits > on the status report length, the number of counters and the size of block > locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1730) Automate test scenario for successful/killed jobs' memory is properly removed from jobtracker after these jobs retire.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888623#action_12888623 ] Hadoop QA commented on MAPREDUCE-1730:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12449081/MAPREDUCE-1730.patch
against trunk revision 963986.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300/console

This message is automatically generated.

> Automate test scenario for successful/killed jobs' memory is properly removed > from jobtracker after these jobs retire.
> -- > > Key: MAPREDUCE-1730 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1730 > Project: Hadoop Map/Reduce > Issue Type: Test >Affects Versions: 0.21.0 >Reporter: Iyappan Srinivasan >Assignee: Iyappan Srinivasan > Attachments: MAPREDUCE-1730.patch, MAPREDUCE-1730.patch, > MAPREDUCE-1730.patch, TestJobRetired.patch, TestJobRetired.patch, > TestRetiredJobs-ydist-security-patch.txt, > TestRetiredJobs-ydist-security-patch.txt, TestRetiredJobs.patch > > > Automate using herriot framework, test scenario for successful/killed jobs' > memory is properly removed from jobtracker after these jobs retire. > This should test when successful and failed jobs are retired, their > jobInProgress object are removed properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated MAPREDUCE-1906: --- Status: Patch Available (was: Open) re-submit for hudson. > Lower minimum heartbeat interval for tasktracker > Jobtracker > - > > Key: MAPREDUCE-1906 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1906 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.2, 0.20.1 >Reporter: Scott Carey > Attachments: MAPREDUCE-1906-0.21-v2.patch, MAPREDUCE-1906-0.21.patch > > > I get a 0% to 15% performance increase for smaller clusters by making the > heartbeat throttle stop penalizing clusters with less than 300 nodes. > Between 0.19 and 0.20, the default minimum heartbeat interval increased from > 2s to 3s. If a JobTracker is throttled at 100 heartbeats / sec for large > clusters, why should a cluster with 10 nodes be throttled to 3.3 heartbeats > per second? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
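The arithmetic behind the 3.3 heartbeats/sec figure can be made concrete. Below is a sketch of the throttling policy as described in the issue (a hypothetical helper, not the JobTracker's actual code): the interval is the larger of a fixed floor and a load-based value, so below roughly 300 nodes the floor dominates and small clusters are penalized.

```java
public class HeartbeatMath {
    // Interval = max(floor, time needed to keep the JobTracker at or
    // under maxHeartbeatsPerSec across the whole cluster).
    static int intervalMillis(int clusterSize, int minIntervalMillis,
                              int maxHeartbeatsPerSec) {
        int loadBased = (int) Math.ceil(1000.0 * clusterSize / maxHeartbeatsPerSec);
        return Math.max(minIntervalMillis, loadBased);
    }
}
```

With a 3 s floor and a 100 heartbeats/sec cap, a 10-node cluster gets a 3000 ms interval, i.e. 10/3 ≈ 3.3 heartbeats/sec; only at 300 nodes does the load-based term take over.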
[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Carey updated MAPREDUCE-1906: --- Status: Open (was: Patch Available) re-submit for hudson. > Lower minimum heartbeat interval for tasktracker > Jobtracker > - > > Key: MAPREDUCE-1906 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1906 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.2, 0.20.1 >Reporter: Scott Carey > Attachments: MAPREDUCE-1906-0.21-v2.patch, MAPREDUCE-1906-0.21.patch > > > I get a 0% to 15% performance increase for smaller clusters by making the > heartbeat throttle stop penalizing clusters with less than 300 nodes. > Between 0.19 and 0.20, the default minimum heartbeat interval increased from > 2s to 3s. If a JobTracker is throttled at 100 heartbeats / sec for large > clusters, why should a cluster with 10 nodes be throttled to 3.3 heartbeats > per second? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1943: - Attachment: (was: MAPREDUCE-1521-0.20-yahoo.patch) > Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes > > > Key: MAPREDUCE-1943 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Mahadev konar >Assignee: Mahadev konar > Attachments: MAPREDUCE-1943-0.20-yahoo.patch > > > We have come across issues in production clusters wherein users abuse > counters, statusreport messages and split sizes. One such case was when one > of the users had 100 million counters. This leads to jobtracker going out of > memory and being unresponsive. In this jira I am proposing to put sane limits > on the status report length, the number of counters and the size of block > locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1943: - Attachment: MAPREDUCE-1943-0.20-yahoo.patch attached the wrong file.. :) > Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes > > > Key: MAPREDUCE-1943 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Mahadev konar >Assignee: Mahadev konar > Attachments: MAPREDUCE-1943-0.20-yahoo.patch > > > We have come across issues in production clusters wherein users abuse > counters, statusreport messages and split sizes. One such case was when one > of the users had 100 million counters. This leads to jobtracker going out of > memory and being unresponsive. In this jira I am proposing to put sane limits > on the status report length, the number of counters and the size of block > locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1943: - Attachment: MAPREDUCE-1521-0.20-yahoo.patch This patch imposes the following limits: 1) The number of counters per group is limited to 40; counters beyond that amount are dropped silently. 2) The number of counter groups is restricted to 40; again, groups beyond the limit are dropped silently. 3) The string size of a counter name is restricted to 64 characters. 4) The string size of a group name is restricted to 128 characters. 5) The number of block locations returned by a split is restricted to 100; this can be changed with a configuration parameter. 6) The reporter.setStatus() string size is limited to 512 characters. I haven't added tests yet; will upload one shortly. Also, this patch is for the Yahoo 0.20 branch. I will upload one for trunk shortly. > Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes > > > Key: MAPREDUCE-1943 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Mahadev konar >Assignee: Mahadev konar > Attachments: MAPREDUCE-1521-0.20-yahoo.patch > > > We have come across issues in production clusters wherein users abuse > counters, statusreport messages and split sizes. One such case was when one > of the users had 100 million counters. This leads to jobtracker going out of > memory and being unresponsive. In this jira I am proposing to put sane limits > on the status report length, the number of counters and the size of block > locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
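The drop-silently policy behind the counter limits above can be sketched as follows. This is illustrative only: the class name, the name-truncation choice, and the exact constants are assumptions for the sketch, not the patch's actual Counters changes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedCounterGroup {
    static final int MAX_COUNTERS_PER_GROUP = 40;
    static final int MAX_COUNTER_NAME_LENGTH = 64;

    private final Map<String, Long> counters = new LinkedHashMap<>();

    void increment(String name, long delta) {
        if (name.length() > MAX_COUNTER_NAME_LENGTH) {
            name = name.substring(0, MAX_COUNTER_NAME_LENGTH); // cap name size
        }
        // New counters past the cap are dropped silently; existing ones
        // keep accumulating, so a runaway job cannot exhaust JT memory.
        if (!counters.containsKey(name) && counters.size() >= MAX_COUNTERS_PER_GROUP) {
            return;
        }
        counters.merge(name, delta, Long::sum);
    }

    int size() { return counters.size(); }
    long value(String name) { return counters.getOrDefault(name, 0L); }
}
```

The same cap-and-drop shape would apply at the group level (limit 2) and to status-report strings (limit 6).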
[jira] Commented: (MAPREDUCE-1848) Put number of speculative, data local, rack local tasks in JobTracker metrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888553#action_12888553 ] Dmytro Molkov commented on MAPREDUCE-1848: -- Patch looks good to me > Put number of speculative, data local, rack local tasks in JobTracker metrics > - > > Key: MAPREDUCE-1848 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1848 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Affects Versions: 0.22.0 >Reporter: Scott Chen >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1848-20100614.txt, > MAPREDUCE-1848-20100617.txt, MAPREDUCE-1848-20100623.txt > > > It will be nice that we can collect these information in JobTracker metrics -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888557#action_12888557 ] Scott Chen commented on MAPREDUCE-1943: --- +1 to the idea. We have seen huge split-sizes kill the JT. This will help. > Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes > > > Key: MAPREDUCE-1943 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Mahadev konar >Assignee: Mahadev konar > > We have come across issues in production clusters wherein users abuse > counters, statusreport messages and split sizes. One such case was when one > of the users had 100 million counters. This leads to jobtracker going out of > memory and being unresponsive. In this jira I am proposing to put sane limits > on the status report length, the number of counters and the size of block > locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1942) 'compile-fault-inject' should never be called directly.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888548#action_12888548 ] Eli Collins commented on MAPREDUCE-1942: +1 > 'compile-fault-inject' should never be called directly. > > > Key: MAPREDUCE-1942 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1942 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: build >Affects Versions: 0.21.0 >Reporter: Konstantin Boudnik >Assignee: Konstantin Boudnik >Priority: Minor > Attachments: MAPREDUCE-1942.patch > > > Similar to HDFS-1299: prevent calls to helper targets. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-1943: - Fix Version/s: (was: 0.22.0) > Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes > > > Key: MAPREDUCE-1943 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Mahadev konar >Assignee: Mahadev konar > > We have come across issues in production clusters wherein users abuse > counters, statusreport messages and split sizes. One such case was when one > of the users had 100 million counters. This leads to jobtracker going out of > memory and being unresponsive. In this jira I am proposing to put sane limits > on the status report length, the number of counters and the size of block > locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes Key: MAPREDUCE-1943 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 0.22.0 We have come across issues in production clusters wherein users abuse counters, statusreport messages and split sizes. One such case was when one of the users had 100 million counters. This leads to jobtracker going out of memory and being unresponsive. In this jira I am proposing to put sane limits on the status report length, the number of counters and the size of block locations returned by the input split. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888536#action_12888536 ] Owen O'Malley commented on MAPREDUCE-1938: -- This patch basically puts the user in charge of their job. They can leave the safety switch set, in which case they get the current behavior. But if they turn off the safety, their classes go ahead of the ones installed on the cluster. That means that they can break things, but all they can break is their own tasks. After we do the split of core from library, you still need this switch. There will always be the possibility of needing to patch something in the core, because even MapTask has bugs. *smile* After splitting them apart, we can put the library code at the very end: safety on: core, user, library; safety off: user, core, library. This patch is just about providing the safety switch. > Ability for having user's classes take precedence over the system classes for > tasks' classpath > -- > > Key: MAPREDUCE-1938 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: job submission, task, tasktracker >Reporter: Devaraj Das > Fix For: 0.22.0 > > Attachments: mr-1938-bp20.1.patch, mr-1938-bp20.patch > > > It would be nice to have the ability in MapReduce to allow users to specify > for their jobs alternate implementations of classes that are already defined > in the MapReduce libraries. For example, an alternate implementation for > CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
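The two orderings Owen describes can be sketched as a tiny helper. This is hypothetical: the actual switch is a job configuration flag, and the separate "library" segment only exists once the core/library split he mentions has happened.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskClasspath {
    // safety on : core, user, library  (current default behavior)
    // safety off: user, core, library  (user classes take precedence)
    static List<String> order(boolean safetyOn, List<String> core,
                              List<String> user, List<String> library) {
        List<String> cp = new ArrayList<>();
        if (safetyOn) {
            cp.addAll(core);
            cp.addAll(user);
        } else {
            cp.addAll(user);
            cp.addAll(core);
        }
        cp.addAll(library); // library always last in either mode
        return cp;
    }
}
```

Either way a broken user class can only break that user's own tasks, which is the point of the comment.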
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888532#action_12888532 ] Doug Cutting commented on MAPREDUCE-1938: - > Did i understand your concern right? I don't have specific concerns about this patch. Sorry for any confusion in that regard. I thought it worthwhile to discuss how this change relates to other changes that are contemplated. It seems not inconsistent, provides some of the benefits, and is considerably simpler; in short, a good thing. > Ability for having user's classes take precedence over the system classes for > tasks' classpath > -- > > Key: MAPREDUCE-1938 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: job submission, task, tasktracker >Reporter: Devaraj Das > Fix For: 0.22.0 > > Attachments: mr-1938-bp20.1.patch, mr-1938-bp20.patch > > > It would be nice to have the ability in MapReduce to allow users to specify > for their jobs alternate implementations of classes that are already defined > in the MapReduce libraries. For example, an alternate implementation for > CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-1938: --- Attachment: mr-1938-bp20.1.patch Addressing Owen's comment on the shell script part of the patch. Doug, this patch is a first step towards letting users use their own versions of library-provided implementations for things like CombineFileInputFormat. The use case is to allow for specific implementations of library classes for certain classes of jobs. This doesn't aim to address the kernel/library separation in its entirety. So yes, if the user puts a class on the classpath that doesn't work compatibly with the kernel, then tasks will fail, or produce obscure/inconsistent results, but that will affect only that job, and the user would notice that soon (hopefully). Did I understand your concern right? > Ability for having user's classes take precedence over the system classes for > tasks' classpath > -- > > Key: MAPREDUCE-1938 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: job submission, task, tasktracker >Reporter: Devaraj Das > Fix For: 0.22.0 > > Attachments: mr-1938-bp20.1.patch, mr-1938-bp20.patch > > > It would be nice to have the ability in MapReduce to allow users to specify > for their jobs alternate implementations of classes that are already defined > in the MapReduce libraries. For example, an alternate implementation for > CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1812) New properties for suspend and resume process.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888511#action_12888511 ] Hadoop QA commented on MAPREDUCE-1812: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449207/MAPREDUCE-1812.patch against trunk revision 963986. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/299/console This message is automatically generated. > New properties for suspend and resume process. > -- > > Key: MAPREDUCE-1812 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1812 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: test >Affects Versions: 0.21.0 >Reporter: Vinay Kumar Thota >Assignee: Vinay Kumar Thota > Attachments: MAPREDUCE-1812.patch, MAPREDUCE-1812.patch > > > Adding new properties in system-test-mr.xml file for suspend and resume > process. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1733) Authentication between pipes processes and java counterparts.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated MAPREDUCE-1733: Status: Patch Available (was: Open) > Authentication between pipes processes and java counterparts. > - > > Key: MAPREDUCE-1733 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1733 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Jitendra Nath Pandey >Assignee: Jitendra Nath Pandey > Attachments: MR-1733-y20.1.patch, MR-1733-y20.2.patch, > MR-1733-y20.3.patch, MR-1733.5.patch > > > The connection between a pipe process and its parent java process should be > authenticated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1928) Dynamic information fed into Hadoop for controlling execution of a submitted job
[ https://issues.apache.org/jira/browse/MAPREDUCE-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888503#action_12888503 ] Joydeep Sen Sarma commented on MAPREDUCE-1928: -- To add to #1 - we may be able to change the split size based on the observed selectivity of an ongoing job (i.e. add splits with larger/smaller size depending on stats from the first set of splits). It's possible that Hadoop may want to do this as part of the basic framework (by exploiting any mechanisms provided here). This is a huge win for a framework like Hive. It would drastically reduce the amount of wasted work (limit N queries) and the spawning of an unnecessarily large number of mappers (unknown selectivity) - just to name two obvious use cases. Can you supply a more concrete proposal in terms of API changes? > Dynamic information fed into Hadoop for controlling execution of a submitted > job > > > Key: MAPREDUCE-1928 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1928 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: job submission, jobtracker, tasktracker >Affects Versions: 0.20.3 >Reporter: Raman Grover > Original Estimate: 2016h > Remaining Estimate: 2016h > > Currently the job submission protocol requires the job provider to put every > bit of information inside an instance of JobConf. The submitted information > includes the input data (hdfs path), suspected resource requirement, number > of reducers etc. This information is read by JobTracker as part of job > initialization. Once initialized, job is moved into a running state. From > this point, there is no mechanism for any additional information to be fed > into Hadoop infrastructure for controlling the job execution. >The execution pattern for the job looks very much > static from this point. Using the size of input data and a few settings > inside JobConf, number of mappers is computed. 
Hadoop attempts to read the > whole of the data in parallel by launching parallel map tasks. Once the map phase is > over, a known number of reduce tasks (supplied as part of JobConf) are > started. > Parameters that control the job execution were set in JobConf prior to > reading the input data. As the map phase progresses, useful information based > upon the content of the input data surfaces and can be used in controlling > the further execution of the job. Let us walk through some of the examples > where additional information can be fed to Hadoop subsequent to job > submission for optimal execution of the job. > I) "Process a part of the input, based upon the results decide if reading > more input is required " > In a huge data set, user is interested in finding 'k' records that > satisfy a predicate, essentially sampling the data. In current > implementation, as the data is huge, a large number of mappers would be launched > consuming a significant fraction of the available map slots in the cluster. > Each map task would attempt to emit a max of 'k' records. With N > mappers, we get N*k records out of which one can pick any k to form the final > result. >This is not optimal as: >1) A larger number of map slots get occupied initially, affecting other > jobs in the queue. >2) If the selectivity of input data is very low, we essentially did not > need to scan the whole of the data to form our result. > we could have finished by reading a fraction of input data, > monitoring the cardinality of the map output and determining if >more input needs to be processed. > >Optimal way: If reading the whole of input requires N mappers, launch only > 'M' initially. Allow them to complete. Based upon the statistics collected, > decide the additional number of mappers to be launched next and so on until the > whole of input has been processed or enough records have been collected to > form the results, whichever is earlier. 
> > > II) "Here is some data, the remaining is yet to arrive, but you may start > with it, and receive more input later" > Consider a chain of 2 M-R jobs such that the latter > reads the output of the former. The second MR job cannot be started until the > first has finished completely. This is essentially because Hadoop needs to be > told the complete information about the input before beginning the job. > The first M-R has produced enough data (not finished yet) that can be > processed by another MR job and hence the other MR need not wait to grab the > whole of input before beginning. Input splits could be supplied later, but > of course before the copy/shuffle phase. > > III) " Input data has undergone one round of processing by map phase, have
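The "launch only 'M' mappers, then decide" scheme in use case I above can be sketched as a driver loop. Everything here is hypothetical: Hadoop has no such API today, and the `runWave` lambda merely stands in for "run a wave of map tasks and report how many matching records they emitted".

```java
import java.util.function.IntUnaryOperator;

// Hypothetical driver loop for use case I: launch mappers in waves of
// waveSize splits and stop once k matching records have been collected
// or the input is exhausted, whichever comes first.
public class AdaptiveLaunch {
    static int splitsProcessed; // total splits consumed so far

    static long collect(int totalSplits, int waveSize, long k, IntUnaryOperator runWave) {
        long collected = 0;
        splitsProcessed = 0;
        while (collected < k && splitsProcessed < totalSplits) {
            int wave = Math.min(waveSize, totalSplits - splitsProcessed);
            // "Run" one wave of map tasks; the operator returns the
            // number of records this wave emitted.
            collected += runWave.applyAsInt(wave);
            splitsProcessed += wave;
        }
        return collected;
    }
}
```

With high selectivity only the first wave runs and most map slots are never occupied; with selectivity zero the loop degenerates to scanning all N splits, exactly as today.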
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888482#action_12888482 ] Doug Cutting commented on MAPREDUCE-1938: - Owen, I agree with your analysis. I'm just trying to put this patch in the context of these other related discussions. This patch addresses some issues relevant to separation of kernel & library. In common cases one can merely provide an alternate version of the library class in one's job. Fully separating kernel & library with a well-defined, minimal kernel API is clearly aesthetically better. Are there use cases that it will enable that this patch will not? I think mostly it will just make it clear which classes are safe to replace with updated versions and which are not. Does that sound right? The issue of user versions of libraries that the kernel uses (like Avro, log4j, HttpClient, etc.) is not entirely addressed by this patch. If the user's version is backwards compatible with the kernel's version then this patch is sufficient. But if the user's version of a library makes incompatible changes then we'd need a classloader/OSGI solution. Even then, I think it only works if user and kernel code do not interchange instances of classes defined by these libraries. A minimal kernel API will help reduce that risk. Does this analysis sound right? I'm trying to understand how far this patch gets us towards those goals: what it solves and what it doesn't. 
> Ability for having user's classes take precedence over the system classes for > tasks' classpath > -- > > Key: MAPREDUCE-1938 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: job submission, task, tasktracker >Reporter: Devaraj Das > Fix For: 0.22.0 > > Attachments: mr-1938-bp20.patch > > > It would be nice to have the ability in MapReduce to allow users to specify > for their jobs alternate implementations of classes that are already defined > in the MapReduce libraries. For example, an alternate implementation for > CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1942) 'compile-fault-inject' should never be called directly.
'compile-fault-inject' should never be called directly. Key: MAPREDUCE-1942 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1942 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.21.0 Reporter: Konstantin Boudnik Assignee: Konstantin Boudnik Priority: Minor Similar to HDFS-1299: prevent calls to helper targets. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1942) 'compile-fault-inject' should never be called directly.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated MAPREDUCE-1942: -- Attachment: MAPREDUCE-1942.patch The fix. > 'compile-fault-inject' should never be called directly. > > > Key: MAPREDUCE-1942 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1942 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: build >Affects Versions: 0.21.0 >Reporter: Konstantin Boudnik >Assignee: Konstantin Boudnik >Priority: Minor > Attachments: MAPREDUCE-1942.patch > > > Similar to HDFS-1299: prevent calls to helper targets. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1919) [Herriot] Test for verification of per cache file ref count.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888472#action_12888472 ] Konstantin Boudnik commented on MAPREDUCE-1919: --- I want to disagree with the suggestion on moving this little method to a helper class. It doesn't make much sense to create a wrapper around the well-known ToolRunner interface - it just creates confusion. Why don't you simply use {{int exitCode = ToolRunner.run(job, tool, jobArgs)}}? Why do you need a method to wrap a call to another one? Also, please consider trimming the imports list - it is overly detailed. > [Herriot] Test for verification of per cache file ref count. > - > > Key: MAPREDUCE-1919 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1919 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: test >Reporter: Vinay Kumar Thota >Assignee: Vinay Kumar Thota > Attachments: 1919-ydist-security.patch, MAPREDUCE-1919.patch > > > It covers the following scenarios. > 1. Run the job with two distributed cache files and verify whether the job > succeeds or not. > 2. Run the job with distributed cache files and remove one cache file from > the DFS when it is localized. Verify whether the job fails or not. > 3. Run the job with two distributed cache files where the size of one file > is larger than local.cache.size. Verify whether the job succeeds or > not. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1933) Create automated testcase for tasktracker dealing with corrupted disk.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888468#action_12888468 ] Konstantin Boudnik commented on MAPREDUCE-1933: --- bq. prop.put("mapred.local.dir", "/grid/0/dev/tmp/mapred/mapred-local,/grid/1/dev/tmp/mapred/mapred-local,/grid/2/dev/tmp/mapred/mapred-local,/grid/3/dev/tmp/mapred/mapred-local"); Absolutely. Besides, this particular parameter should be set by the normal MR config already. Also, please don't use string literals for configuration parameters. There was a significant effort in 0.21 to have all configuration keys refactored to named constants. Use them instead. > Create automated testcase for tasktracker dealing with corrupted disk. > -- > > Key: MAPREDUCE-1933 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1933 > Project: Hadoop Map/Reduce > Issue Type: Test > Components: test >Reporter: Iyappan Srinivasan >Assignee: Iyappan Srinivasan > Attachments: TestCorruptedDiskJob.java > > > After the TaskTracker has already run some tasks successfully, "corrupt" a > disk by making the corresponding mapred.local.dir unreadable/unwritable. > Make sure that jobs continue to succeed even though some tasks scheduled > there fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888445#action_12888445 ] Owen O'Malley commented on MAPREDUCE-1938: -- Doug, I agree that the kernel code should be split out from libraries, however, that work is much more involved. I don't see a problem with putting the user's code first. It is not a security concern. The user's code is only run as the user. Furthermore, it doesn't actually stop them from loading system classes. They can exec a new jvm with a new class path of their own choosing. Therefore, by putting the user's classes last all that we've done is make it harder for the user to implement hot fixes in their own jobs. That doesn't seem like a good goal. > Ability for having user's classes take precedence over the system classes for > tasks' classpath > -- > > Key: MAPREDUCE-1938 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: job submission, task, tasktracker >Reporter: Devaraj Das > Fix For: 0.22.0 > > Attachments: mr-1938-bp20.patch > > > It would be nice to have the ability in MapReduce to allow users to specify > for their jobs alternate implementations of classes that are already defined > in the MapReduce libraries. For example, an alternate implementation for > CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888436#action_12888436 ] Owen O'Malley commented on MAPREDUCE-1938: -- I think that the default for this should be on. Rather than add HADOOP_CLIENT_CLASSPATH, let's make a new variable HADOOP_USER_CLASSPATH_LAST. If it is defined, we add HADOOP_CLASSPATH to the tail like we currently do. Otherwise it is added to the front. > Ability for having user's classes take precedence over the system classes for > tasks' classpath > -- > > Key: MAPREDUCE-1938 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: job submission, task, tasktracker >Reporter: Devaraj Das > Fix For: 0.22.0 > > Attachments: mr-1938-bp20.patch > > > It would be nice to have the ability in MapReduce to allow users to specify > for their jobs alternate implementations of classes that are already defined > in the MapReduce libraries. For example, an alternate implementation for > CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
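The HADOOP_USER_CLASSPATH_LAST proposal boils down to a single ordering decision. As a rough illustration only (the actual change would live in the bin/hadoop shell script; the helper below is hypothetical), the intended semantics might look like:

```java
// Illustrative sketch of the classpath ordering rule discussed above.
// By default the user's entries go first so their classes shadow the
// system ones; the userLast flag (the analogue of defining
// HADOOP_USER_CLASSPATH_LAST) restores the old tail-append behavior.
public class ClasspathOrder {
    static String buildClasspath(String systemCp, String userCp, boolean userLast) {
        if (userCp == null || userCp.isEmpty()) {
            return systemCp; // nothing user-supplied to add
        }
        return userLast ? systemCp + ":" + userCp
                        : userCp + ":" + systemCp;
    }

    public static void main(String[] args) {
        System.out.println(buildClasspath("hadoop-core.jar", "my-job.jar", false));
        System.out.println(buildClasspath("hadoop-core.jar", "my-job.jar", true));
    }
}
```

This keeps the escape hatch Owen describes: a job that genuinely needs the system classes to win just defines the variable, and nothing else changes.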
[jira] Commented: (MAPREDUCE-1938) Ability for having user's classes take precedence over the system classes for tasks' classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888433#action_12888433 ] Doug Cutting commented on MAPREDUCE-1938: - Two thoughts: 1. In general, we need to better separate the kernel from the library. CombineFileInputFormat is library code and should be easy to update without updating the cluster. Long-term, only kernel code should be hardwired on the classpath of tasks, with library and user code both specified per job. There should be no default version of library classes for a task: tasks should always specify their required libraries. Is there a Jira for this? I know Tom's expressed interest in working on this. 2. We should permit user code to depend on different versions of things than the kernel does. For example, user code might rely on a different version of HttpClient or Avro than that used by MapReduce. This should be possible if instances of classes from these are not passed between user and kernel code, e.g., as long as Avro and HttpClient classes are not a part of the MapReduce API. In this case classloaders (probably via OSGI) could permit this. > Ability for having user's classes take precedence over the system classes for > tasks' classpath > -- > > Key: MAPREDUCE-1938 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1938 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: job submission, task, tasktracker >Reporter: Devaraj Das > Fix For: 0.22.0 > > Attachments: mr-1938-bp20.patch > > > It would be nice to have the ability in MapReduce to allow users to specify > for their jobs alternate implementations of classes that are already defined > in the MapReduce libraries. For example, an alternate implementation for > CombineFileInputFormat. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1911) Fix errors in -info option in streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888399#action_12888399 ] Hadoop QA commented on MAPREDUCE-1911: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449235/patch-1911-1.txt against trunk revision 963986. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/298/console This message is automatically generated. 
> Fix errors in -info option in streaming > --- > > Key: MAPREDUCE-1911 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1911 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Reporter: Amareshwari Sriramadasu >Assignee: Amareshwari Sriramadasu > Fix For: 0.22.0 > > Attachments: patch-1911-1.txt, patch-1911.txt > > > Here are some of the findings by Karam while verifying -info option in > streaming: > # We need to add "Optional" for -mapper, -reducer, -combiner and -file options. > # For -inputformat and -outputformat options, we should put "Optional" in the > prefix for the sake of uniformity. > # We need to remove the -cluster description. > # -help option is not displayed in the usage message. > # When displaying the message for -info or -help options, we should not display > "Streaming Job Failed!"; also the exit code should be 0 in case of the -help/-info > option. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1686) ClassNotFoundException for custom format classes provided in libjars
[ https://issues.apache.org/jira/browse/MAPREDUCE-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888358#action_12888358 ] Paul Burkhardt commented on MAPREDUCE-1686: --- Okay, I'll try and do that. Paul > ClassNotFoundException for custom format classes provided in libjars > > > Key: MAPREDUCE-1686 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1686 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.20.2 >Reporter: Paul Burkhardt >Priority: Minor > > The StreamUtil::goodClassOrNull method assumes user-provided classes have > package names and if not, they are part of the Hadoop Streaming package. For > example, using custom InputFormat or OutputFormat classes without package > names will fail with a ClassNotFound exception which is not indicative given > the classes are provided in the libjars option. Admittedly, most Java > packages should have a package name so this should rarely come up. > Possible resolution options: > 1) modify the error message to include the actual classname that was > attempted in the goodClassOrNull method > 2) call the Configuration::getClassByName method first and if class not found > check for default package name and try the call again > {code} > public static Class goodClassOrNull(Configuration conf, String className, > String defaultPackage) { > Class clazz = null; > try { > clazz = conf.getClassByName(className); > } catch (ClassNotFoundException cnf) { > } > if (clazz == null) { > if (className.indexOf('.') == -1 && defaultPackage != null) { > className = defaultPackage + "." + className; > try { > clazz = conf.getClassByName(className); > } catch (ClassNotFoundException cnf) { > } > } > } > return clazz; > } > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
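For reference, the two resolution options in the description could be combined so that a failed lookup reports every name that was tried instead of returning null silently. In this sketch, Class.forName stands in for Configuration.getClassByName so the example is self-contained; it is not the actual streaming code.

```java
// Sketch of goodClassOrNull with better diagnostics: try the name as
// given, then with the default package, and if both fail surface the
// names that were attempted rather than failing silently.
public class ClassResolver {
    static Class<?> goodClass(String className, String defaultPackage)
            throws ClassNotFoundException {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException first) {
            if (className.indexOf('.') == -1 && defaultPackage != null) {
                String qualified = defaultPackage + "." + className;
                try {
                    return Class.forName(qualified);
                } catch (ClassNotFoundException second) {
                    // Option 1 from the description: name what was attempted.
                    throw new ClassNotFoundException(
                        "tried " + className + " and " + qualified, second);
                }
            }
            throw first;
        }
    }
}
```

A caller such as streaming's option parser would then see "tried MyFormat and org.apache.hadoop.streaming.MyFormat" instead of a bare ClassNotFoundException, which directly addresses the complaint that the error is not indicative when classes come in via -libjars.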
[jira] Created: (MAPREDUCE-1941) Need a servlet in JobTracker to stream contents of the job history file
Need a servlet in JobTracker to stream contents of the job history file --- Key: MAPREDUCE-1941 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1941 Project: Hadoop Map/Reduce Issue Type: New Feature Components: jobtracker Affects Versions: 0.22.0 Reporter: Srikanth Sundarrajan Assignee: Srikanth Sundarrajan There is no convenient mechanism to retrieve the contents of the job history file. Need a way to retrieve the job history file contents from Job Tracker. This can perhaps be implemented as a servlet on the Job tracker. * Create a jsp/servlet that accepts job id as a request parameter * Stream the contents of the history file corresponding to the job id, if user has permissions to view the job details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
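A minimal sketch of what the proposed servlet's core logic might look like. All names here are hypothetical; the real implementation would resolve the history file from the job id on disk and evaluate JobACL.VIEW_JOB, whereas this sketch reduces both to in-memory stand-ins to show the shape of the check-then-stream flow.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Set;

// Hypothetical core of the proposed servlet: verify the requesting user
// against the job's view ACL, then stream the raw history file bytes to
// the response. historyFiles stands in for the on-disk history store and
// viewAcl for the JobACL.VIEW_JOB evaluation.
public class HistoryStreamer {
    static void streamHistory(Map<String, byte[]> historyFiles, String jobId,
                              String user, Set<String> viewAcl, OutputStream out) {
        if (!viewAcl.contains(user)) {
            throw new SecurityException(user + " may not view " + jobId);
        }
        byte[] contents = historyFiles.get(jobId);
        if (contents == null) {
            throw new IllegalArgumentException("unknown job " + jobId);
        }
        try {
            out.write(contents); // stream to the servlet response
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The key point of the design, as argued in the comment above, is that the history file itself stays owned by the JobTracker user; only this gatekeeper decides who may read it.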
[jira] Commented: (MAPREDUCE-1928) Dynamic information fed into Hadoop for controlling execution of a submitted job
[ https://issues.apache.org/jira/browse/MAPREDUCE-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888332#action_12888332 ] Steven Lewis commented on MAPREDUCE-1928: - Another possible use has to do with adjusting parameters to avoid failures. I have an issue where a reducer is running out of memory. If I were aware that certain keys lead to this failure, I could take steps such as sampling the data rather than processing the whole set, so I would add access to data about failures. > Dynamic information fed into Hadoop for controlling execution of a submitted > job > > > Key: MAPREDUCE-1928 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1928 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: job submission, jobtracker, tasktracker >Affects Versions: 0.20.3 >Reporter: Raman Grover > Original Estimate: 2016h > Remaining Estimate: 2016h > > Currently the job submission protocol requires the job provider to put every > bit of information inside an instance of JobConf. The submitted information > includes the input data (hdfs path), suspected resource requirement, number > of reducers etc. This information is read by JobTracker as part of job > initialization. Once initialized, job is moved into a running state. From > this point, there is no mechanism for any additional information to be fed > into Hadoop infrastructure for controlling the job execution. >The execution pattern for the job looks very much > static from this point. Using the size of input data and a few settings > inside JobConf, number of mappers is computed. Hadoop attempts to read the > whole of the data in parallel by launching parallel map tasks. Once the map phase is > over, a known number of reduce tasks (supplied as part of JobConf) are > started. > Parameters that control the job execution were set in JobConf prior to > reading the input data. 
As the map phase progresses, useful information based > upon the content of the input data surfaces and can be used in controlling > the further execution of the job. Let us walk through some of the examples > where additional information can be fed to Hadoop subsequent to job > submission for optimal execution of the job. > I) "Process a part of the input, based upon the results decide if reading > more input is required " > In a huge data set, user is interested in finding 'k' records that > satisfy a predicate, essentially sampling the data. In current > implementation, as the data is huge, a large number of mappers would be launched > consuming a significant fraction of the available map slots in the cluster. > Each map task would attempt to emit a max of 'k' records. With N > mappers, we get N*k records out of which one can pick any k to form the final > result. >This is not optimal as: >1) A larger number of map slots get occupied initially, affecting other > jobs in the queue. >2) If the selectivity of input data is very low, we essentially did not > need to scan the whole of the data to form our result. > we could have finished by reading a fraction of input data, > monitoring the cardinality of the map output and determining if >more input needs to be processed. > >Optimal way: If reading the whole of input requires N mappers, launch only > 'M' initially. Allow them to complete. Based upon the statistics collected, > decide the additional number of mappers to be launched next and so on until the > whole of input has been processed or enough records have been collected to > form the results, whichever is earlier. > > > II) "Here is some data, the remaining is yet to arrive, but you may start > with it, and receive more input later" > Consider a chain of 2 M-R jobs such that the latter > reads the output of the former. The second MR job cannot be started until the > first has finished completely. 
This is essentially because Hadoop needs to be > told the complete information about the input before beginning the job. > The first M-R has produced enough data (not finished yet) that can be > processed by another MR job and hence the other MR need not wait to grab the > whole of input before beginning. Input splits could be supplied later, but > of course before the copy/shuffle phase. > > III) " Input data has undergone one round of processing by map phase, have > some stats, can now say of the resources > required further" >Mappers can produce useful stats about their output, like the > cardinality or produce a histogram describing distribution of output. These > stats are available to the job provider (Hive/Pig/End User) who can > now determ
[jira] Commented: (MAPREDUCE-1621) Streaming's TextOutputReader.getLastOutput throws NPE if it has never read any output
[ https://issues.apache.org/jira/browse/MAPREDUCE-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888319#action_12888319 ] Hadoop QA commented on MAPREDUCE-1621: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449214/patch-1621.txt against trunk revision 962682. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/597/console This message is automatically generated. 
> Streaming's TextOutputReader.getLastOutput throws NPE if it has never read > any output > - > > Key: MAPREDUCE-1621 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1621 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.21.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon > Fix For: 0.22.0 > > Attachments: patch-1621.txt > > > If TextOutputReader.readKeyValue() has never successfully read a line, then > its bytes member will be left null. Thus when logging a task failure, > PipeMapRed.getContext() can trigger an NPE when it calls > outReader_.getLastOutput(). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
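The fix presumably amounts to a null guard in getLastOutput(). A simplified sketch, with the field and method names following the description above and the surrounding streaming classes omitted:

```java
import java.nio.charset.StandardCharsets;

// Sketch of the null guard the patch needs: if readKeyValue() never
// populated `bytes`, return null so PipeMapRed.getContext() can log
// "no output" instead of hitting an NPE.
public class TextOutputReaderSketch {
    byte[] bytes; // last line read, or null if nothing was ever read

    public String getLastOutput() {
        if (bytes == null) {
            return null; // never read any output
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```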
[jira] Commented: (MAPREDUCE-1896) [Herriot] New property for multi user list.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888307#action_12888307 ] Hadoop QA commented on MAPREDUCE-1896: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12448436/MAPREDUCE-1896.patch against trunk revision 962682. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/297/console This message is automatically generated. > [Herriot] New property for multi user list. > --- > > Key: MAPREDUCE-1896 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1896 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: test >Affects Versions: 0.21.0 >Reporter: Vinay Kumar Thota >Assignee: Vinay Kumar Thota > Attachments: MAPREDUCE-1896.patch, MAPREDUCE-1896.patch, > MAPREDUCE-1896.patch > > > Adding new property for multi user list. -- This message is automatically generated by JIRA. 
[jira] Commented: (MAPREDUCE-1912) [Rumen] Add a driver for Rumen tool
[ https://issues.apache.org/jira/browse/MAPREDUCE-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888300#action_12888300 ] Ravi Gummadi commented on MAPREDUCE-1912: - Some comments:
(1) In build.xml, please change ${common.ivy.lib.dir dir} to ${common.ivy.lib.dir} directory.
(2) In Folder.java, in the initialize() method, printUsage() should be called at the 2 places where IllegalArgumentException is thrown (just before throwing).
(3) In Rumen.java, please change "A Rumen tool fold/scale the trace" to "A Rumen tool to fold/scale the trace".
(4) In TraceBuilder.java, please reverse the conditions in the following while statement so that the index is validated before the element at that index is accessed: {code}while (args[switchTop].startsWith("-") && switchTop < args.length){code}
(5) As you observed the bug, please move "++switchTop;" out of the if statement in the above while loop, to fix the infinite loop that occurs when an option that starts with "-" (and is not the same as -demuxer) is given.
(6) In both places in TraceBuilder.java where printUsage() is called, only the case of zero arguments is checked. We need to make sure that there are at least 3 arguments in both places. So change (a) "if (0 == args.length)" to "if (args.length < 3)" and (b) "if (switchTop == args.length)" to "if (switchTop+2 >= args.length)".
> [Rumen] Add a driver for Rumen tool > > > Key: MAPREDUCE-1912 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1912 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tools/rumen >Affects Versions: 0.22.0 >Reporter: Amar Kamat >Assignee: Amar Kamat > Fix For: 0.22.0 > > Attachments: mapreduce-1912-v1.1.patch > > > Rumen, as a tool, has 2 entry points : > - Trace builder > - Folder > It would be nice to have a single driver program and have 'trace-builder' and > 'folder' as its options. -- This message is automatically generated by JIRA. 
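Taken together, review comments (4), (5), and (6) amount to the loop shape sketched below. This is a self-contained illustration under assumed names (the class and method are hypothetical, and "-demuxer" is assumed to be the only value-taking option), not the actual TraceBuilder code:

```java
// Sketch of the corrected option-parsing loop from the review comments.
public class TraceBuilderArgsSketch {

  /**
   * Returns the index of the first non-switch argument, or -1 when the
   * command line is invalid (the real tool would call printUsage()).
   */
  public static int findFirstNonSwitch(String[] args) {
    // Comment (6a): require at least 3 arguments up front.
    if (args.length < 3) {
      return -1;
    }
    int switchTop = 0;
    // Comment (4): check the bound BEFORE indexing, so the && short-circuits
    // instead of throwing ArrayIndexOutOfBoundsException.
    while (switchTop < args.length && args[switchTop].startsWith("-")) {
      if ("-demuxer".equalsIgnoreCase(args[switchTop])) {
        ++switchTop;  // consume the option's value as well
      }
      // Comment (5): advance unconditionally so an unrecognized "-" option
      // cannot spin the loop forever.
      ++switchTop;
    }
    // Comment (6b): at least 3 arguments must remain after the switches.
    if (switchTop + 2 >= args.length) {
      return -1;
    }
    return switchTop;
  }
}
```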
[jira] Created: (MAPREDUCE-1940) [Rumen] Add appropriate switches to Folder and TraceBuilder w.r.t input and output files
[Rumen] Add appropriate switches to Folder and TraceBuilder w.r.t input and output files Key: MAPREDUCE-1940 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1940 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tools/rumen Reporter: Amar Kamat Currently Folder and TraceBuilder expect the input and output to be the last arguments on the command line. It would be better to add dedicated switches for the input and output files to avoid confusion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1526) Cache the job related information while submitting the job , this would avoid many RPC calls to JobTracker.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas resolved MAPREDUCE-1526. -- Hadoop Flags: [Reviewed] Assignee: rahul k singh Fix Version/s: 0.22.0 Resolution: Fixed Fixed in MAPREDUCE-1840 > Cache the job related information while submitting the job , this would avoid > many RPC calls to JobTracker. > --- > > Key: MAPREDUCE-1526 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1526 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/gridmix >Reporter: rahul k singh >Assignee: rahul k singh > Fix For: 0.22.0 > > Attachments: 1526-yahadoop-20-101-2.patch, > 1526-yahadoop-20-101-3.patch, 1526-yahadoop-20-101.patch, > 1526-yhadoop-20-101-4.patch, 1526-yhadoop-20-101-4.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1376) Support for varied user submission in Gridmix
[ https://issues.apache.org/jira/browse/MAPREDUCE-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas resolved MAPREDUCE-1376. -- Hadoop Flags: [Reviewed] Fix Version/s: 0.22.0 Resolution: Fixed Fixed in MAPREDUCE-1840 > Support for varied user submission in Gridmix > - > > Key: MAPREDUCE-1376 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1376 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/gridmix >Reporter: Chris Douglas >Assignee: Chris Douglas > Fix For: 0.22.0 > > Attachments: 1376-2-yhadoop-security.patch, > 1376-3-yhadoop20.100.patch, 1376-4-yhadoop20.100.patch, > 1376-5-yhadoop20-100.patch, 1376-yhadoop-security.patch, M1376-0.patch, > M1376-1.patch, M1376-2.patch, M1376-3.patch, M1376-4.patch > > > Gridmix currently submits all synthetic jobs as the client user. It should be > possible to map users in the trace to a set of users appropriate for the > target cluster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1711) Gridmix should provide an option to submit jobs to the same queues as specified in the trace.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas resolved MAPREDUCE-1711. -- Hadoop Flags: [Reviewed] Fix Version/s: 0.22.0 Resolution: Fixed Fixed in MAPREDUCE-1840 > Gridmix should provide an option to submit jobs to the same queues as > specified in the trace. > - > > Key: MAPREDUCE-1711 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1711 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/gridmix >Reporter: Hong Tang >Assignee: rahul k singh > Fix For: 0.22.0 > > Attachments: diff-gridmix.patch, diff-rumen.patch, > MR-1711-yhadoop-20-1xx-2.patch, MR-1711-yhadoop-20-1xx-3.patch, > MR-1711-yhadoop-20-1xx-4.patch, MR-1711-yhadoop-20-1xx-5.patch, > MR-1711-yhadoop-20-1xx-6.patch, MR-1711-yhadoop-20-1xx-7.patch, > MR-1711-yhadoop-20-1xx.patch, MR-1711-Yhadoop-20-crossPort-1.patch, > MR-1711-Yhadoop-20-crossPort-2.patch, MR-1711-Yhadoop-20-crossPort.patch, > mr-1711-yhadoop-20.1xx-20100416.patch > > > Gridmix should provide an option to submit jobs to the same queues as > specified in the trace. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1594) Support for Sleep Jobs in gridmix
[ https://issues.apache.org/jira/browse/MAPREDUCE-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas resolved MAPREDUCE-1594. -- Hadoop Flags: [Reviewed] Assignee: rahul k singh Fix Version/s: 0.22.0 Resolution: Fixed Fixed in MAPREDUCE-1840 > Support for Sleep Jobs in gridmix > - > > Key: MAPREDUCE-1594 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1594 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/gridmix >Reporter: rahul k singh >Assignee: rahul k singh > Fix For: 0.22.0 > > Attachments: 1376-5-yhadoop20-100-3.patch, 1594-diff-4-5.patch, > 1594-yhadoop-20-1xx-1-2.patch, 1594-yhadoop-20-1xx-1-3.patch, > 1594-yhadoop-20-1xx-1-4.patch, 1594-yhadoop-20-1xx-1-5.patch, > 1594-yhadoop-20-1xx-1.patch, 1594-yhadoop-20-1xx.patch > > > Support for Sleep jobs in gridmix -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1840) [Gridmix] Exploit/Add security features in GridMix
[ https://issues.apache.org/jira/browse/MAPREDUCE-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-1840: - Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed I committed this. Thanks to Amar, Rahul, and Hong > [Gridmix] Exploit/Add security features in GridMix > -- > > Key: MAPREDUCE-1840 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1840 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/gridmix >Affects Versions: 0.22.0 >Reporter: Amar Kamat >Assignee: Amar Kamat > Fix For: 0.22.0 > > Attachments: mapreduce-gridmix-fp-v1.3.3.patch, > mapreduce-gridmix-fp-v1.3.9.patch > > > Use security information while replaying jobs in Gridmix. This includes > - Support for multiple users > - Submitting jobs as different users > - Allowing usage of secure cluster (hdfs + mapreduce) > - Support for multiple queues > Other features include : > - Support for sleep job > - Support for load job > + testcases for verifying all of the above changes -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi updated MAPREDUCE-1925: Attachment: 1925.v1.patch Attaching new patch incorporating review comments. > TestRumenJobTraces fails in trunk > - > > Key: MAPREDUCE-1925 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tools/rumen >Affects Versions: 0.22.0 >Reporter: Amareshwari Sriramadasu >Assignee: Ravi Gummadi > Fix For: 0.22.0 > > Attachments: 1925.patch, 1925.v1.patch > > > TestRumenJobTraces failed with following error: > Error Message > the gold file contains more text at line 1 expected:<56> but was:<0> > Stacktrace > at > org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294) > Full log of the failure is available at > http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888257#action_12888257 ] Ravi Gummadi commented on MAPREDUCE-1925: - Thanks Hong. Will upload a new patch that removes that .gz file, so that the testcase itself contains the expected list of events as an array of Strings. > TestRumenJobTraces fails in trunk > - > > Key: MAPREDUCE-1925 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tools/rumen >Affects Versions: 0.22.0 >Reporter: Amareshwari Sriramadasu >Assignee: Ravi Gummadi > Fix For: 0.22.0 > > Attachments: 1925.patch > > > TestRumenJobTraces failed with following error: > Error Message > the gold file contains more text at line 1 expected:<56> but was:<0> > Stacktrace > at > org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294) > Full log of the failure is available at > http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
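The approach described (inlining the gold data in the test instead of shipping a .gz fixture) can be sketched as below. The class name and the placeholder event lines are hypothetical; the real patch would embed the actual Hadoop-20 job history lines:

```java
// Sketch: keep the expected parser output inline as a String[] so the
// test needs no binary fixture, then compare the parsed events line by line.
public class GoldInlineSketch {
  // Placeholder entries; the real test would hold the actual history lines.
  static final String[] EXPECTED_EVENTS = {
    "event-line-1",
    "event-line-2"
  };

  /** Returns true iff the parsed lines match the inline gold data exactly. */
  static boolean matches(String[] actual) {
    if (actual.length != EXPECTED_EVENTS.length) return false;
    for (int i = 0; i < actual.length; i++) {
      if (!EXPECTED_EVENTS[i].equals(actual[i])) return false;
    }
    return true;
  }
}
```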
[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888249#action_12888249 ] Hong Tang commented on MAPREDUCE-1925: -- Git diff --text will add binary diff to the patch. > TestRumenJobTraces fails in trunk > - > > Key: MAPREDUCE-1925 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tools/rumen >Affects Versions: 0.22.0 >Reporter: Amareshwari Sriramadasu >Assignee: Ravi Gummadi > Fix For: 0.22.0 > > Attachments: 1925.patch > > > TestRumenJobTraces failed with following error: > Error Message > the gold file contains more text at line 1 expected:<56> but was:<0> > Stacktrace > at > org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294) > Full log of the failure is available at > http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1710) Process tree clean up of exceeding memory limit tasks.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888247#action_12888247 ] Hadoop QA commented on MAPREDUCE-1710: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449101/MAPREDUCE-1710.patch against trunk revision 962682. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/596/console This message is automatically generated. > Process tree clean up of exceeding memory limit tasks. 
> -- > > Key: MAPREDUCE-1710 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1710 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: test >Affects Versions: 0.21.0 >Reporter: Vinay Kumar Thota >Assignee: Vinay Kumar Thota > Attachments: 1710-ydist_security.patch, 1710-ydist_security.patch, > 1710-ydist_security.patch, MAPREDUCE-1710.patch, memorylimittask_1710.patch, > memorylimittask_1710.patch, memorylimittask_1710.patch, > memorylimittask_1710.patch, memorylimittask_1710.patch > > > 1. Submit a job which would spawn child processes and each of the child > processes exceeds the memory limits. Let the job complete . Check if all the > child processes are killed, the overall job should fail. > 2. Submit a job which would spawn child processes and each of the child > processes exceeds the memory limits. Kill/fail the job while in progress. > Check if all the child processes are killed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1865) [Rumen] Rumen should also support jobhistory files generated using trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi updated MAPREDUCE-1865: Status: Patch Available (was: Open) > [Rumen] Rumen should also support jobhistory files generated using trunk > > > Key: MAPREDUCE-1865 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1865 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tools/rumen >Affects Versions: 0.22.0 >Reporter: Amar Kamat >Assignee: Amar Kamat > Fix For: 0.22.0 > > Attachments: mapreduce-1865-v1.2.patch, mapreduce-1865-v1.6.2.patch, > mapreduce-1865-v1.7.1.patch, mapreduce-1865-v1.7.patch > > > Rumen code in trunk parses and process only jobhistory files from pre-21 > hadoop mapreduce clusters. It should also support jobhistory files generated > using trunk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1713) Utilities for system tests specific.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888243#action_12888243 ] Hadoop QA commented on MAPREDUCE-1713: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12449108/MAPREDUCE-1713.patch against trunk revision 962682. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/296/console This message is automatically generated. > Utilities for system tests specific. 
> > > Key: MAPREDUCE-1713 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1713 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: test >Affects Versions: 0.21.0 >Reporter: Vinay Kumar Thota >Assignee: Vinay Kumar Thota > Attachments: 1713-ydist-security.patch, 1713-ydist-security.patch, > 1713-ydist-security.patch, 1713-ydist-security.patch, > 1713-ydist-security.patch, MAPREDUCE-1713.patch, MAPREDUCE-1713.patch, > MAPREDUCE-1713.patch, systemtestutils_MR1713.patch, > utilsforsystemtest_1713.patch > > > 1. A method for restarting the daemon with new configuration. > public static void restartCluster(Hashtable props, String > confFile) throws Exception; > 2. A method for resetting the daemon with default configuration. > public void resetCluster() throws Exception; > 3. A method for waiting until daemon to stop. > public void waitForClusterToStop() throws Exception; > 4. A method for waiting until daemon to start. > public void waitForClusterToStart() throws Exception; > 5. A method for checking the job whether it has started or not. > public boolean isJobStarted(JobID id) throws IOException; > 6. A method for checking the task whether it has started or not. > public boolean isTaskStarted(TaskInfo taskInfo) throws IOException; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
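As an illustration of the "wait until the daemon stops/starts" utilities listed above, a generic poll-until helper might look like the following. The class name is hypothetical and the daemon state check (in Herriot, an RPC to the cluster) is abstracted behind a BooleanSupplier:

```java
import java.util.function.BooleanSupplier;

// Hedged sketch of a poll-until-condition helper, the core of methods like
// waitForClusterToStop()/waitForClusterToStart() in the proposal above.
public class PollUtil {
  /**
   * Polls the condition every intervalMillis until it is true or
   * timeoutMillis elapses; returns the final state of the condition.
   */
  public static boolean waitFor(BooleanSupplier condition,
                                long timeoutMillis, long intervalMillis)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      if (condition.getAsBoolean()) {
        return true;  // daemon reached the desired state
      }
      Thread.sleep(intervalMillis);
    }
    return condition.getAsBoolean();  // one final check at timeout
  }
}
```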
[jira] Commented: (MAPREDUCE-1878) Add MRUnit documentation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888238#action_12888238 ] Amareshwari Sriramadasu commented on MAPREDUCE-1878: I think the document can be added as package.html in the mrunit package instead of a .txt file, similar to all other packages. > Add MRUnit documentation > > > Key: MAPREDUCE-1878 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1878 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/mrunit >Reporter: Aaron Kimball >Assignee: Aaron Kimball > Attachments: MAPREDUCE-1878.2.patch, MAPREDUCE-1878.patch > > > A short user guide for MRUnit, written in asciidoc. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1865) [Rumen] Rumen should also support jobhistory files generated using trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amar Kamat updated MAPREDUCE-1865: -- Attachment: mapreduce-1865-v1.7.1.patch Attaching a slightly modified patch with changes to comments and assert messages. > [Rumen] Rumen should also support jobhistory files generated using trunk > > > Key: MAPREDUCE-1865 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1865 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tools/rumen >Affects Versions: 0.22.0 >Reporter: Amar Kamat >Assignee: Amar Kamat > Fix For: 0.22.0 > > Attachments: mapreduce-1865-v1.2.patch, mapreduce-1865-v1.6.2.patch, > mapreduce-1865-v1.7.1.patch, mapreduce-1865-v1.7.patch > > > Rumen code in trunk parses and process only jobhistory files from pre-21 > hadoop mapreduce clusters. It should also support jobhistory files generated > using trunk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.