[jira] [Commented] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often
[ https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425493#comment-13425493 ] Hadoop QA commented on MAPREDUCE-3943: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538489/MR3943_trunk.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 17 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2680//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2680//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2680//console This message is automatically generated. 
> RM-NM secret-keys should be randomly generated and rolled every so often > > > Key: MAPREDUCE-3943 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3943 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: mrv2, security >Affects Versions: 0.23.0 >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Critical > Attachments: MAPREDUCE-3943-20120416.txt, MR3943.txt, MR3943.txt, > MR3943_branch-23.txt, MR3943_branch-23.txt, MR3943_trunk.txt, MR3943_trunk.txt > > > - RM should generate the master-key randomly > - The master-key should roll every so often > - NM should remember old expired keys so that already doled out > container-requests can be satisfied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
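The three requirements in the issue description (random generation, periodic rolling, and the NM remembering an expired key) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the attached patches: the class names MasterKey, MasterKeyRoller and NodeKeyStore, the 20-byte key length, and the choice to retain exactly one previous key are all hypothetical.

```java
import java.security.SecureRandom;

/** Hypothetical key holder; not a class from the MAPREDUCE-3943 patch. */
final class MasterKey {
    final int keyId;
    private final byte[] secret;

    MasterKey(int keyId, byte[] secret) {
        this.keyId = keyId;
        this.secret = secret.clone();
    }

    byte[] secret() {
        return secret.clone();
    }
}

/** RM side (assumed design): creates a random key and replaces it on each roll. */
final class MasterKeyRoller {
    private final SecureRandom random = new SecureRandom();
    private int nextId = 0;
    private MasterKey current;

    MasterKeyRoller() {
        roll();
    }

    /** Generates a fresh random secret under a new key id. */
    synchronized MasterKey roll() {
        byte[] secret = new byte[20]; // key length is an assumption
        random.nextBytes(secret);
        current = new MasterKey(nextId++, secret);
        return current;
    }

    synchronized MasterKey current() {
        return current;
    }
}

/** NM side (assumed design): keeps the current key plus the one it replaced. */
final class NodeKeyStore {
    private MasterKey current;
    private MasterKey previous;

    /** Called when the RM hands the NM a key; an unchanged key id is ignored. */
    synchronized void update(MasterKey key) {
        if (current != null && current.keyId == key.keyId) {
            return;
        }
        previous = current;
        current = key;
    }

    /** Returns the key for the given id, or null if it is no longer retained. */
    synchronized MasterKey lookup(int keyId) {
        if (current != null && current.keyId == keyId) {
            return current;
        }
        if (previous != null && previous.keyId == keyId) {
            return previous;
        }
        return null;
    }
}
```

In this sketch a container request signed with the key that was current just before a roll can still be verified through lookup(), while a key that is two rolls old is rejected. How many old keys to keep, and for how long, is a policy decision the sketch does not settle.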
[jira] [Updated] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often
[ https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3943: -- Attachment: MR3943_trunk.txt Updated to fix the javadoc warning and one findbugs warning. The remaining 4 findbugs warnings are from the FairScheduler and are unrelated to this patch. > RM-NM secret-keys should be randomly generated and rolled every so often > > > Key: MAPREDUCE-3943 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3943 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: mrv2, security >Affects Versions: 0.23.0 >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Critical > Attachments: MAPREDUCE-3943-20120416.txt, MR3943.txt, MR3943.txt, > MR3943_branch-23.txt, MR3943_branch-23.txt, MR3943_trunk.txt, MR3943_trunk.txt > > > - RM should generate the master-key randomly > - The master-key should roll every so often > - NM should remember old expired keys so that already doled out > container-requests can be satisfied.
[jira] [Updated] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often
[ https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3943: -- Status: Patch Available (was: Open) > RM-NM secret-keys should be randomly generated and rolled every so often > > > Key: MAPREDUCE-3943 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3943 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: mrv2, security >Affects Versions: 0.23.0 >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Critical > Attachments: MAPREDUCE-3943-20120416.txt, MR3943.txt, MR3943.txt, > MR3943_branch-23.txt, MR3943_branch-23.txt, MR3943_trunk.txt, MR3943_trunk.txt > > > - RM should generate the master-key randomly > - The master-key should roll every so often > - NM should remember old expired keys so that already doled out > container-requests can be satisfied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often
[ https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3943: -- Attachment: MR3943_branch-23.txt > RM-NM secret-keys should be randomly generated and rolled every so often > > > Key: MAPREDUCE-3943 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3943 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: mrv2, security >Affects Versions: 0.23.0 >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Critical > Attachments: MAPREDUCE-3943-20120416.txt, MR3943.txt, MR3943.txt, > MR3943_branch-23.txt, MR3943_branch-23.txt, MR3943_trunk.txt > > > - RM should generate the master-key randomly > - The master-key should roll every so often > - NM should remember old expired keys so that already doled out > container-requests can be satisfied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often
[ https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3943: -- Status: Open (was: Patch Available) > RM-NM secret-keys should be randomly generated and rolled every so often > > > Key: MAPREDUCE-3943 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3943 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: mrv2, security >Affects Versions: 0.23.0 >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Critical > Attachments: MAPREDUCE-3943-20120416.txt, MR3943.txt, MR3943.txt, > MR3943_branch-23.txt, MR3943_branch-23.txt, MR3943_trunk.txt > > > - RM should generate the master-key randomly > - The master-key should roll every so often > - NM should remember old expired keys so that already doled out > container-requests can be satisfied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often
[ https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425469#comment-13425469 ] Hadoop QA commented on MAPREDUCE-3943: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538477/MR3943_trunk.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 17 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. +1 contrib tests. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2679//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2679//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2679//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2679//console This message is automatically generated. > RM-NM secret-keys should be randomly generated and rolled every so often > > > Key: MAPREDUCE-3943 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3943 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: mrv2, security >Affects Versions: 0.23.0 >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Critical > Attachments: MAPREDUCE-3943-20120416.txt, MR3943.txt, MR3943.txt, > MR3943_branch-23.txt, MR3943_trunk.txt > > > - RM should generate the master-key randomly > - The master-key should roll every so often > - NM should remember old expired keys so that already doled out > container-requests can be satisfied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often
[ https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3943: -- Attachment: MR3943_trunk.txt Up-merged patch for trunk. Also adds some securityEnabled checks. > RM-NM secret-keys should be randomly generated and rolled every so often > > > Key: MAPREDUCE-3943 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3943 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: mrv2, security >Affects Versions: 0.23.0 >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Critical > Attachments: MAPREDUCE-3943-20120416.txt, MR3943.txt, MR3943.txt, > MR3943_branch-23.txt, MR3943_trunk.txt > > > - RM should generate the master-key randomly > - The master-key should roll every so often > - NM should remember old expired keys so that already doled out > container-requests can be satisfied.
[jira] [Updated] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often
[ https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3943: -- Attachment: MR3943_branch-23.txt Patch for branch-23. > RM-NM secret-keys should be randomly generated and rolled every so often > > > Key: MAPREDUCE-3943 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3943 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: mrv2, security >Affects Versions: 0.23.0 >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Critical > Attachments: MAPREDUCE-3943-20120416.txt, MR3943.txt, MR3943.txt, > MR3943_branch-23.txt, MR3943_trunk.txt > > > - RM should generate the master-key randomly > - The master-key should roll every so often > - NM should remember old expired keys so that already doled out > container-requests can be satisfied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often
[ https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3943: -- Status: Patch Available (was: Open) > RM-NM secret-keys should be randomly generated and rolled every so often > > > Key: MAPREDUCE-3943 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3943 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: mrv2, security >Affects Versions: 0.23.0 >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Critical > Attachments: MAPREDUCE-3943-20120416.txt, MR3943.txt, MR3943.txt, > MR3943_branch-23.txt, MR3943_trunk.txt > > > - RM should generate the master-key randomly > - The master-key should roll every so often > - NM should remember old expired keys so that already doled out > container-requests can be satisfied. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425435#comment-13425435 ] Bo Wang commented on MAPREDUCE-4495: Hi Mayank, Keeping the state in memory is just for the first version. WF state will be serialized to HDFS in JSON format. We can update the file once a job is done, or serialize multiple versions, so that only the unfinished jobs need to be rerun. As for the interface, I currently follow Oozie's workflow.xml, though I expect more types of interfaces can be supported. Thanks, Bo > Workflow Application Master in YARN > --- > > Key: MAPREDUCE-4495 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 2.0.0-alpha >Reporter: Bo Wang >Assignee: Bo Wang > > It is useful to have a workflow application master, which will be capable of > running a DAG of jobs. The workflow client submits a DAG request to the AM > and then the AM will manage the life cycle of this application in terms of > requesting the needed resources from the RM, and starting, monitoring and > retrying the application's individual tasks. > Compared to running Oozie with the current MapReduce Application Master, > these are some of the advantages: > - Less number of consumed resources, since only one application master will > be spawned for the whole workflow. > - Reuse of resources, since the same resources can be used by multiple > consecutive jobs in the workflow (no need to request/wait for resources for > every individual job from the central RM). > - More optimization opportunities in terms of collective resource requests. > - Optimization opportunities in terms of rewriting and composing jobs in the > workflow (e.g. pushing down Mappers). > - This Application Master can be reused/extended by higher systems like Pig > and hive to provide an optimized way of running their workflows.
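The idea in the comment above, recording per-job state so that a retry only reruns the unfinished jobs, might look something like the following. This is a rough sketch and not the proposed design: WorkflowState and its methods are hypothetical names, the JSON is written by hand to keep the example self-contained, and job names are assumed to contain no characters that need JSON escaping. Writing the resulting string to HDFS is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-workflow state; not part of any posted patch. */
final class WorkflowState {
    enum Status { PENDING, SUCCEEDED, FAILED }

    // Insertion order is kept so that jobs are listed in the order they were added.
    private final Map<String, Status> jobs = new LinkedHashMap<>();

    void addJob(String name) {
        jobs.put(name, Status.PENDING);
    }

    void mark(String name, Status status) {
        if (!jobs.containsKey(name)) {
            throw new IllegalArgumentException("unknown job: " + name);
        }
        jobs.put(name, status);
    }

    /** Every job that has not succeeded has to run (again) on a retry. */
    List<String> jobsToRerun() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Status> e : jobs.entrySet()) {
            if (e.getValue() != Status.SUCCEEDED) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Minimal JSON rendering; assumes job names need no escaping. */
    String toJson() {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Status> e : jobs.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":\"").append(e.getValue()).append('"');
            first = false;
        }
        return sb.append('}').toString();
    }
}
```

With a 10-job workflow in which only the last job failed, jobsToRerun() would return just that one job, which is the case Mayank raises in this thread. A real implementation would also need the DAG edges, since a rerun job may depend on outputs of jobs that already succeeded.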
[jira] [Assigned] (MAPREDUCE-2454) Allow external sorter plugin for MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur reassigned MAPREDUCE-2454: - Assignee: Mariappan Asokan Mariappan, the latest patch does not apply to trunk, would you post a new patch rebased to trunk? Thx > Allow external sorter plugin for MR > --- > > Key: MAPREDUCE-2454 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 2.0.0-alpha, 3.0.0, 2.2.0-alpha >Reporter: Mariappan Asokan >Assignee: Mariappan Asokan >Priority: Minor > Labels: features, performance, plugin, sort > Attachments: HadoopSortPlugin.pdf, KeyValueIterator.java, > MR-2454-trunkPatchPreview.gz, MapOutputSorter.java, > MapOutputSorterAbstract.java, ReduceInputSorter.java, mapreduce-2454.patch, > mr-2454-on-mr-279-build82.patch.gz > > > Define interfaces and some abstract classes in the Hadoop framework to > facilitate external sorter plugins both on the Map and Reduce sides. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425421#comment-13425421 ] Mayank Bansal commented on MAPREDUCE-4495: -- Hi, I am not sure keeping the state in memory is the right thing to do; an HDFS file is OK. The reason is that if you have one workflow DAG of 10 MapReduce jobs and the 10th job fails for some reason, then you need to run the whole DAG again if you want to retry. In some cases you may need to rerun everything, but in most cases you don't, so persisting the state is very useful. I also want to understand the interface exposed to the user. Is it the same as workflow.xml in Oozie? Thanks, Mayank > Workflow Application Master in YARN > --- > > Key: MAPREDUCE-4495 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 2.0.0-alpha >Reporter: Bo Wang >Assignee: Bo Wang > > It is useful to have a workflow application master, which will be capable of > running a DAG of jobs. The workflow client submits a DAG request to the AM > and then the AM will manage the life cycle of this application in terms of > requesting the needed resources from the RM, and starting, monitoring and > retrying the application's individual tasks. > Compared to running Oozie with the current MapReduce Application Master, > these are some of the advantages: > - Less number of consumed resources, since only one application master will > be spawned for the whole workflow. > - Reuse of resources, since the same resources can be used by multiple > consecutive jobs in the workflow (no need to request/wait for resources for > every individual job from the central RM). > - More optimization opportunities in terms of collective resource requests. > - Optimization opportunities in terms of rewriting and composing jobs in the > workflow (e.g. pushing down Mappers). 
> - This Application Master can be reused/extended by higher systems like Pig > and hive to provide an optimized way of running their workflows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4497) JobTracker webUI "Kill Selected Jobs" button has no effect
[ https://issues.apache.org/jira/browse/MAPREDUCE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] George Datskos updated MAPREDUCE-4497: -- Attachment: JT-webui-modify.patch > JobTracker webUI "Kill Selected Jobs" button has no effect > -- > > Key: MAPREDUCE-4497 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4497 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 1.0.3 >Reporter: George Datskos >Priority: Minor > Attachments: JT-webui-modify.patch > > > When enabling the *webinterface.private.actions* property to true, the > JobTracker displays additional buttons allowing the user to (1) kill jobs or > (2) change the priority of a job. However, an erroneous interaction between > the HTML (produced by mapred/JSPUtil.java) and sorttable.js leads to these > two form buttons having no effect because the {{}} {{}} is moved > down below the submit buttons (by sorttable.js). sorttable.js was introduced > by MAPREDUCE-1118 (so all versions from v0.20.203 are probably affected). In > JSPUtil.java, the form element is placed inside the table and spans multiple > {{}} and {{}} which is incorrect. Placing the form around the table > fixes this bug (see patch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4497) JobTracker webUI "Kill Selected Jobs" button has no effect
George Datskos created MAPREDUCE-4497: - Summary: JobTracker webUI "Kill Selected Jobs" button has no effect Key: MAPREDUCE-4497 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4497 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 1.0.3 Reporter: George Datskos Priority: Minor When enabling the *webinterface.private.actions* property to true, the JobTracker displays additional buttons allowing the user to (1) kill jobs or (2) change the priority of a job. However, an erroneous interaction between the HTML (produced by mapred/JSPUtil.java) and sorttable.js leads to these two form buttons having no effect because the {{}} {{}} is moved down below the submit buttons (by sorttable.js). sorttable.js was introduced by MAPREDUCE-1118 (so all versions from v0.20.203 are probably affected). In JSPUtil.java, the form element is placed inside the table and spans multiple {{}} and {{}} which is incorrect. Placing the form around the table fixes this bug (see patch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
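The fix described above, placing the form around the table rather than inside it, can be illustrated with a small HTML-generating method. This is a hedged sketch and not the contents of JT-webui-modify.patch or of JSPUtil.java: the class name JobTableHtml, the form action, and the input names jobCheckBox and killJobs are assumptions made for the example, and job ids are assumed to need no HTML escaping.

```java
/** Hypothetical renderer showing a form that wraps the whole sortable table. */
final class JobTableHtml {
    static String render(String[] jobIds) {
        StringBuilder sb = new StringBuilder();
        // The form opens before the table, so a script that rearranges
        // rows inside the table cannot move the form element itself.
        sb.append("<form action=\"jobtracker.jsp\" method=\"POST\">\n");
        sb.append("<table class=\"sortable\">\n");
        sb.append("<thead><tr><th>Select</th><th>Job ID</th></tr></thead>\n");
        sb.append("<tbody>\n");
        for (String id : jobIds) {
            sb.append("<tr><td><input type=\"checkbox\" name=\"jobCheckBox\" value=\"")
              .append(id)
              .append("\"></td><td>")
              .append(id)
              .append("</td></tr>\n");
        }
        sb.append("</tbody>\n</table>\n");
        // The submit button sits after the table but still inside the form.
        sb.append("<input type=\"submit\" name=\"killJobs\" value=\"Kill Selected Jobs\">\n");
        sb.append("</form>\n");
        return sb.toString();
    }
}
```

Because the checkboxes and the submit button are all descendants of one form element that encloses the table, reordering rows does not detach them from the form.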
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425412#comment-13425412 ] Bo Wang commented on MAPREDUCE-4495: [~mayank_bansal] I will post a design document soon. For the place to store WF state, initially it will be kept in memory. Later it could be persisted to a file in HDFS. DAG AM will run a single WF, so no DB is required here. > Workflow Application Master in YARN > --- > > Key: MAPREDUCE-4495 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 2.0.0-alpha >Reporter: Bo Wang >Assignee: Bo Wang > > It is useful to have a workflow application master, which will be capable of > running a DAG of jobs. The workflow client submits a DAG request to the AM > and then the AM will manage the life cycle of this application in terms of > requesting the needed resources from the RM, and starting, monitoring and > retrying the application's individual tasks. > Compared to running Oozie with the current MapReduce Application Master, > these are some of the advantages: > - Less number of consumed resources, since only one application master will > be spawned for the whole workflow. > - Reuse of resources, since the same resources can be used by multiple > consecutive jobs in the workflow (no need to request/wait for resources for > every individual job from the central RM). > - More optimization opportunities in terms of collective resource requests. > - Optimization opportunities in terms of rewriting and composing jobs in the > workflow (e.g. pushing down Mappers). > - This Application Master can be reused/extended by higher systems like Pig > and hive to provide an optimized way of running their workflows. -- This message is automatically generated by JIRA. 
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425406#comment-13425406 ] Alejandro Abdelnur commented on MAPREDUCE-4495: --- @arun, I assume you've meant ' You'd have to contribute to Hadoop whatever you want to use from Oozie', right? If so, I'm good with that, I don't care where the code lives if I can use it. I've handed over to Bo a version of OOZIE-593 patch that does not have Oozie dependencies. > Workflow Application Master in YARN > --- > > Key: MAPREDUCE-4495 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 2.0.0-alpha >Reporter: Bo Wang >Assignee: Bo Wang > > It is useful to have a workflow application master, which will be capable of > running a DAG of jobs. The workflow client submits a DAG request to the AM > and then the AM will manage the life cycle of this application in terms of > requesting the needed resources from the RM, and starting, monitoring and > retrying the application's individual tasks. > Compared to running Oozie with the current MapReduce Application Master, > these are some of the advantages: > - Less number of consumed resources, since only one application master will > be spawned for the whole workflow. > - Reuse of resources, since the same resources can be used by multiple > consecutive jobs in the workflow (no need to request/wait for resources for > every individual job from the central RM). > - More optimization opportunities in terms of collective resource requests. > - Optimization opportunities in terms of rewriting and composing jobs in the > workflow (e.g. pushing down Mappers). > - This Application Master can be reused/extended by higher systems like Pig > and hive to provide an optimized way of running their workflows. -- This message is automatically generated by JIRA. 
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425398#comment-13425398 ] Arun C Murthy commented on MAPREDUCE-4495: -- Also, pls note that we cannot have a dependency on Oozie. You'd have to contribute whatever you want to use to Oozie. > Workflow Application Master in YARN > --- > > Key: MAPREDUCE-4495 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 2.0.0-alpha >Reporter: Bo Wang >Assignee: Bo Wang > > It is useful to have a workflow application master, which will be capable of > running a DAG of jobs. The workflow client submits a DAG request to the AM > and then the AM will manage the life cycle of this application in terms of > requesting the needed resources from the RM, and starting, monitoring and > retrying the application's individual tasks. > Compared to running Oozie with the current MapReduce Application Master, > these are some of the advantages: > - Less number of consumed resources, since only one application master will > be spawned for the whole workflow. > - Reuse of resources, since the same resources can be used by multiple > consecutive jobs in the workflow (no need to request/wait for resources for > every individual job from the central RM). > - More optimization opportunities in terms of collective resource requests. > - Optimization opportunities in terms of rewriting and composing jobs in the > workflow (e.g. pushing down Mappers). > - This Application Master can be reused/extended by higher systems like Pig > and hive to provide an optimized way of running their workflows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425394#comment-13425394 ]

Arun C Murthy commented on MAPREDUCE-4495:

bq. That's great; however, can you please post the initial design document for that, and your idea of how users will be exposed to the interface?

Agreed. Please do that first. It's much easier to talk through that than reams of code - let's avoid a situation like the one we have in MAPREDUCE-4334.

> Workflow Application Master in YARN
>
> Key: MAPREDUCE-4495
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Affects Versions: 2.0.0-alpha
> Reporter: Bo Wang
> Assignee: Bo Wang
>
> It is useful to have a workflow application master, which will be capable of running a DAG of jobs. The workflow client submits a DAG request to the AM, and the AM then manages the life cycle of this application: requesting the needed resources from the RM, and starting, monitoring and retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, these are some of the advantages:
> - Fewer consumed resources, since only one application master will be spawned for the whole workflow.
> - Reuse of resources, since the same resources can be used by multiple consecutive jobs in the workflow (no need to request/wait for resources for every individual job from the central RM).
> - More optimization opportunities in terms of collective resource requests.
> - Optimization opportunities in terms of rewriting and composing jobs in the workflow (e.g. pushing down Mappers).
> - This Application Master can be reused/extended by higher-level systems like Pig and Hive to provide an optimized way of running their workflows.
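As a rough illustration of the "DAG of jobs" idea in the MAPREDUCE-4495 description above, the sketch below orders jobs so that each one runs only after the jobs it depends on. It is not taken from any patch on the issue: the class name `WorkflowDagSketch`, the method `executionOrder`, and the job names in the usage note are all hypothetical, and a real AM would also have to request containers and handle retries.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not part of any MAPREDUCE-4495 patch. */
final class WorkflowDagSketch {

  /**
   * Returns one valid run order for the jobs (Kahn's algorithm).
   * The map gives, for each job, the jobs it depends on.
   * Throws if there is a cycle or a dependency on an undeclared job.
   */
  static List<String> executionOrder(Map<String, List<String>> dependsOn) {
    Map<String, Integer> unmet = new LinkedHashMap<>();          // unmet dependencies per job
    Map<String, List<String>> dependents = new LinkedHashMap<>(); // reverse edges
    for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
      unmet.put(e.getKey(), e.getValue().size());
      for (String dep : e.getValue()) {
        dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
      }
    }

    // Jobs with no dependencies can start immediately.
    Deque<String> ready = new ArrayDeque<>();
    for (Map.Entry<String, Integer> e : unmet.entrySet()) {
      if (e.getValue() == 0) {
        ready.add(e.getKey());
      }
    }

    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String job = ready.remove();
      order.add(job);
      // Finishing this job may unblock the jobs that were waiting on it.
      for (String next : dependents.getOrDefault(job, Collections.<String>emptyList())) {
        if (unmet.merge(next, -1, Integer::sum) == 0) {
          ready.add(next);
        }
      }
    }
    if (order.size() != unmet.size()) {
      throw new IllegalArgumentException("cycle or undeclared dependency in workflow");
    }
    return order;
  }
}
```

For example, with "clean" depending on "extract" and "join" depending on both, `executionOrder` places "extract" first and "join" last. A single AM walking such an order is what would let it reuse containers across consecutive jobs, as the description suggests.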
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425383#comment-13425383 ]

Karthik Kambatla commented on MAPREDUCE-4334:

+1 on design - 2(b), and the patch looks good.

> Add support for CPU isolation/monitoring of containers
>
> Key: MAPREDUCE-4334
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4334
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Reporter: Arun C Murthy
> Assignee: Andrew Ferguson
> Attachments: MAPREDUCE-4334-executor-v1.patch, MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, MAPREDUCE-4334-pre2-with_cpu.patch, MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre3-with_cpu.patch, MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, mapreduce-4334-design-doc-v2.txt, mapreduce-4334-design-doc.txt
>
> Once we get in MAPREDUCE-4327, it will be important to actually enforce limits on CPU consumption of containers.
> Several options spring to mind:
> # taskset (RHEL5+)
> # cgroups (RHEL6+)
[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425370#comment-13425370 ]

Alejandro Abdelnur commented on MAPREDUCE-4334:

I like the current patch: it does not add complexity, and it will be trivial to wire it up with MAPREDUCE-4327 once CPU units are part of resources.
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425366#comment-13425366 ]

Mayank Bansal commented on MAPREDUCE-4495:

That's great; however, can you please post the initial design document for that, and your idea of how users will be exposed to the interface? Where are you planning to store the state, given that Oozie keeps it in the DB?

Thanks,
Mayank
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425361#comment-13425361 ]

Bo Wang commented on MAPREDUCE-4495:

Thanks for the suggestion, Mayank. Several components in Oozie are helpful here. I plan to use the new wflib posted in OOZIE-593 by Alejandro.
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425359#comment-13425359 ]

Bo Wang commented on MAPREDUCE-4495:

Hi Robert, thanks for offering to help. I am working on the first version and will let you know once it works.
[jira] [Commented] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425346#comment-13425346 ]

Mayank Bansal commented on MAPREDUCE-4342:

Yes, for archive files there is a JIRA, MAPREDUCE-4349, and I am working on it. For Hadoop 0.22 we didn't need to do anything special; however, I have added a test case for that. I am still investigating trunk.

Thanks,
Mayank

> Distributed Cache gives inconsistent result if cache files get deleted from task tracker
>
> Key: MAPREDUCE-4342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4342
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.22.0, 1.0.3, trunk
> Reporter: Mayank Bansal
> Assignee: Mayank Bansal
> Fix For: 2.2.0-alpha
> Attachments: MAPREDUCE-4342-22-1.patch, MAPREDUCE-4342-22-2.patch, MAPREDUCE-4342-22-3.patch, MAPREDUCE-4342-22.patch, MAPREDUCE-4342-trunk-v2.patch, MAPREDUCE-4342-trunk-v3.patch, MAPREDUCE-4342-trunk.patch
[jira] [Updated] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur updated MAPREDUCE-4342:

Resolution: Fixed
Fix Version/s: 2.2.0-alpha
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Thanks Mayank. Committed to trunk and branch-2.
[jira] [Commented] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425343#comment-13425343 ]

Alejandro Abdelnur commented on MAPREDUCE-4342:

+1. Is there a follow-up JIRA to address Robert's concerns about archives and corrupt files?
[jira] [Commented] (MAPREDUCE-4234) SortValidator.java is incompatible with multi-user or parallel use (due to a /tmp file with static name)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425326#comment-13425326 ]

Hadoop QA commented on MAPREDUCE-4234:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538433/MR-4234.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:
    org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2677//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2677//console

This message is automatically generated.

> SortValidator.java is incompatible with multi-user or parallel use (due to a /tmp file with static name)
>
> Key: MAPREDUCE-4234
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4234
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: examples
> Affects Versions: 0.23.3, trunk
> Reporter: Randy Clayton
> Assignee: Robert Joseph Evans
> Priority: Minor
> Attachments: MAPREDUCE-4234.patch, MR-4234.txt, MR-4234.txt
>
> The checkRecords method in SortValidator.java creates a file in the /tmp/sortvalidator directory using a static filename. This can result in failures due to name collisions when the hadoop-mapreduce-client-jobclient-*-tests jar is used by more than one task or one user simultaneously. We use this jar when testing compression codecs, and after we started running tests in parallel (four at a time to reduce overall test time) we started experiencing random test failures due to name collisions. Creating a random or unique per-thread filename may resolve this issue. We have developed a change to introduce per-use unique file names.
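The issue description above suggests a random or unique filename in place of the static one. A minimal sketch of that idea follows, assuming a local file system and the JDK's `java.nio.file.Files.createTempFile`; the actual patch is not shown in this thread and may well use Hadoop's `FileSystem` API instead. The class name `UniqueScratchFileSketch` and the `sortvalidator-` prefix are illustrative choices only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch; not the change attached to MAPREDUCE-4234. */
final class UniqueScratchFileSketch {

  /**
   * Creates a new, empty file with a unique name under parentDir.
   * Concurrent callers (other threads, tasks or users) each get their own
   * file, so a shared static name can no longer collide.
   */
  static Path newScratchFile(Path parentDir) {
    try {
      Files.createDirectories(parentDir); // no-op if the directory already exists
      return Files.createTempFile(parentDir, "sortvalidator-", ".tmp");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Two calls with the same parent directory return two different paths, which is the property the parallel test runs described above need.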
[jira] [Commented] (MAPREDUCE-4491) Encryption and Key Protection
[ https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425320#comment-13425320 ]

Benoy Antony commented on MAPREDUCE-4491:

To Alejandro's questions:

1) If using a compression codec for encryption, do you lose the compression capabilities when encrypting, or will it work as a composition?

What I have done is to first compress and then encrypt. Compression is currently hardcoded to ZIP. I can expose this as a configuration option with a choice of {UNCOMPRESSED, ZIP, ZLIB, BZIP2}; this is an enhancement that I can add. I have also provided a DistributedSplitter so that files can be split into smaller files. I am not aware of a way to chain multiple compression codecs, though that would have been a desirable capability in this case.

2) For the keystores, are you proposing to store them in HDFS and use file system permissions to protect them?

Actually, I am not proposing to store them in HDFS. The keystores themselves are encrypted, and a password is required to read keys from them. In the use cases that I have encountered, the keystores were external to the cluster. They were either on the CLI machine from which the jobs were submitted, or on a separate machine from which the keys were retrieved based on the user's credentials. (Alfredo was used in this regard to fetch keys via a web service.) So there are two schemes that I have supported:
1) reading keys from a Java keystore
2) reading keys from a web-service-based keystore ("Safe")

> Encryption and Key Protection
>
> Key: MAPREDUCE-4491
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: documentation, security, task-controller, tasktracker
> Reporter: Benoy Antony
> Assignee: Benoy Antony
> Attachments: Hadoop_Encryption.pdf
>
> When dealing with sensitive data, it is required to keep the data encrypted wherever it is stored. A common use case is to pull encrypted data out of a datasource and store it in HDFS for analysis. The keys are stored in an external keystore.
> The feature adds a customizable framework to integrate different types of keystores, support for Java KeyStore, read keys from keystores, and transport keys from JobClient to Tasks.
> The feature adds PGP encryption as a codec and additional utilities to perform encryption-related steps.
> The design document is attached. It explains the requirements, design and use cases.
> Kindly review and comment. Collaboration is very much welcome.
> I have a tested patch for this for 1.1 and will upload it soon as initial work for further refinement.
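The "first compress and then encrypt" ordering from Benoy's reply to Alejandro can be pictured as two stacked streams. The sketch below is only an illustration under loud assumptions: it uses GZIP and AES-GCM from the JDK as stand-ins, because the JDK ships no PGP or Hadoop codec classes, and the class and method names (`CompressThenEncryptSketch`, `seal`, `open`) are invented here. It is not the PGPCodec from the proposed patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical sketch: GZIP + AES-GCM stand in for the ZIP + PGP pair in the thread. */
final class CompressThenEncryptSketch {
  private static final int IV_BYTES = 12;
  private static final int TAG_BITS = 128;

  /** Compresses first, then encrypts. Output layout: IV followed by ciphertext. */
  static byte[] seal(byte[] plain, SecretKey key) {
    try {
      byte[] iv = new byte[IV_BYTES];
      new SecureRandom().nextBytes(iv); // fresh IV for every message
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
      ByteArrayOutputStream sink = new ByteArrayOutputStream();
      sink.write(iv);
      // Bytes written here are gzipped, and the gzipped bytes are then encrypted.
      try (OutputStream out = new GZIPOutputStream(new CipherOutputStream(sink, cipher))) {
        out.write(plain);
      }
      return sink.toByteArray();
    } catch (GeneralSecurityException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Reverses seal(): decrypts first, then decompresses. */
  static byte[] open(byte[] sealed, SecretKey key) {
    try {
      byte[] iv = Arrays.copyOfRange(sealed, 0, IV_BYTES);
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
      InputStream body = new ByteArrayInputStream(sealed, IV_BYTES, sealed.length - IV_BYTES);
      ByteArrayOutputStream plain = new ByteArrayOutputStream();
      try (InputStream in = new GZIPInputStream(new CipherInputStream(body, cipher))) {
        byte[] buf = new byte[4096];
        for (int n; (n = in.read(buf)) != -1; ) {
          plain.write(buf, 0, n);
        }
      }
      return plain.toByteArray();
    } catch (GeneralSecurityException | IOException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

The order matters because well-encrypted data looks random and hardly compresses, so compressing after encryption would gain little. That may be why chaining codecs came up as a desirable capability in the comment.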
[jira] [Commented] (MAPREDUCE-4496) AM logs link is missing user name
[ https://issues.apache.org/jira/browse/MAPREDUCE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425304#comment-13425304 ]

Jason Lowe commented on MAPREDUCE-4496:

This causes two problems:
# While the AM is running, the stderr, stdout, and syslog links have to be clicked twice to display the page containing the log data.
# When the AM is no longer running and log aggregation is enabled, the link no longer properly redirects to the history server. Instead it results in the error message: "Cannot get container logs without an app owner".

> AM logs link is missing user name
>
> Key: MAPREDUCE-4496
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4496
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, mrv2
> Affects Versions: 0.23.3, 2.2.0-alpha
> Reporter: Jason Lowe
> Assignee: Jason Lowe
>
> The link to the ApplicationMaster's logs on the MRAppMaster's web page is missing the user name.
[jira] [Commented] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425303#comment-13425303 ]

Hadoop QA commented on MAPREDUCE-4342:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538436/MAPREDUCE-4342-trunk-v3.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2678//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2678//console

This message is automatically generated.
[jira] [Updated] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Ferguson updated MAPREDUCE-4334:

Attachment: MAPREDUCE-4334-executor-v4.patch

Updated version of executor-v3 which moves the actual wrapping of the launched command inside the "wrapCommand" method (which previously returned a value to prefix onto the launched command).
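To picture the change described for executor-v4, here is a guess at what a command-wrapping method might look like when it returns the whole wrapped command rather than just a prefix. The patch itself is not quoted in this thread, so the signature below is an assumption, as are the class name `WrapCommandSketch` and the `cpuList` parameter. The only external fact relied on is that `taskset -c <cpu-list> <command>` runs a command pinned to the listed CPUs; the cgroups option from the issue description is not sketched.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; the real wrapCommand in the patch may differ. */
final class WrapCommandSketch {

  /**
   * Returns a new command line that runs the given command pinned to cpuList
   * via taskset. The caller's list is left untouched.
   */
  static List<String> wrapCommand(List<String> command, String cpuList) {
    List<String> wrapped = new ArrayList<>(command.size() + 3);
    wrapped.add("taskset");
    wrapped.add("-c");
    wrapped.add(cpuList);    // e.g. "0-1" or "0,2"
    wrapped.addAll(command); // the original launch command follows unchanged
    return wrapped;
  }
}
```

Doing the wrapping inside the method keeps callers from having to know where a prefix goes, which would matter if another implementation needed to do more than prepend arguments.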
[jira] [Updated] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Bansal updated MAPREDUCE-4342:

Attachment: MAPREDUCE-4342-trunk-v3.patch

Thanks, Alejandro. I misunderstood your comments; thanks for the clarification. It makes sense, and I am attaching the updated patch.

Thanks,
Mayank
[jira] [Commented] (MAPREDUCE-4491) Encryption and Key Protection
[ https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425286#comment-13425286 ]

Benoy Antony commented on MAPREDUCE-4491:

To Rob's questions:

Different encryption keys for different files:
At this point, the PGPCodec supports only one secret key/key pair for all input files. What we need is the ability to specify secret keys/key pairs per input file. Another enhancement would be to specify secret keys/key pairs per phase, like map->output, reduce->output. As you mentioned, this mapping has to be specified via configuration. I'll try to add these two enhancements.

Decryption/encryption of different columns within the same file:
This is actually left to the MapReduce programmer, who has to do the decryption/encryption of the fields programmatically. The programmer can choose to use different keys for different fields in the MapReduce program. Multiple keys can be retrieved from the keystore, and these keys can be retrieved in the mapper/reducer using the credentials API.

In a higher-level interface like Hive, it may be possible to add additional metadata to specify the key name. Another reviewer has also recommended adding this capability to Hive, to identify an encrypted field and specify the key (name of the key) to be used to decrypt/encrypt it.

Thanks for the review and recommendations, Rob. Please let me know if I have not answered the question correctly.
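The reply to Rob mentions retrieving multiple keys from a keystore and using different keys for different fields. A minimal sketch of the keystore half of that, using only the JDK's `java.security.KeyStore`, is shown below. Everything specific in it is an assumption made for illustration: the class name, the aliases used in the usage note, the in-memory JCEKS store, and the fixed demo key bytes. How the keys would travel to mappers and reducers through the credentials API is not shown.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;
import java.util.Arrays;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch: one named key per field, looked up by alias. */
final class FieldKeyLookupSketch {

  /** Builds an in-memory JCEKS keystore holding one AES key per alias (demo only). */
  static KeyStore demoKeyStore(char[] password, String... aliases) {
    try {
      KeyStore ks = KeyStore.getInstance("JCEKS");
      ks.load(null, password); // a null stream starts an empty keystore
      byte fill = 1;
      for (String alias : aliases) {
        byte[] raw = new byte[16];
        Arrays.fill(raw, fill++); // fixed bytes for the demo; real keys must be random
        ks.setEntry(alias,
            new KeyStore.SecretKeyEntry(new SecretKeySpec(raw, "AES")),
            new KeyStore.PasswordProtection(password));
      }
      return ks;
    } catch (GeneralSecurityException | IOException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Reads the key stored under alias; the password is needed to recover it. */
  static SecretKey keyFor(KeyStore ks, String alias, char[] password) {
    try {
      Key key = ks.getKey(alias, password);
      if (key == null) {
        throw new IllegalArgumentException("no key for alias " + alias);
      }
      return (SecretKey) key;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

A job could then, for instance, call `keyFor(ks, "ssn-key", password)` for one column and `keyFor(ks, "email-key", password)` for another. The mapping from field or input file to alias would come from configuration, as the comment proposes.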
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425280#comment-13425280 ]

Mayank Bansal commented on MAPREDUCE-4495:

That's a good idea; however, I would suggest using the Oozie workflow library to do that. Please let me know if you need any help in that regard; I can collaborate with you.

Thanks,
Mayank
[jira] [Created] (MAPREDUCE-4496) AM logs link is missing user name
Jason Lowe created MAPREDUCE-4496:

Summary: AM logs link is missing user name
Key: MAPREDUCE-4496
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4496
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.2.0-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe

The link to the ApplicationMaster's logs on the MRAppMaster's web page is missing the user name.
[jira] [Updated] (MAPREDUCE-4234) SortValidator.java is incompatible with multi-user or parallel use (due to a /tmp file with static name)
[ https://issues.apache.org/jira/browse/MAPREDUCE-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4234:

Attachment: MR-4234.txt

The failure looks spurious to me. I have not been able to reproduce it. I have upmerged and am posting the updated patch.
[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425262#comment-13425262 ] Robert Joseph Evans commented on MAPREDUCE-4495: I have been thinking about this a lot and I am very much +1 on this. It is not at the top of my priority list yet, but if you want help on this I would be very happy to collaborate on it with you. > Workflow Application Master in YARN > --- > > Key: MAPREDUCE-4495 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 2.0.0-alpha >Reporter: Bo Wang >Assignee: Bo Wang > > It is useful to have a workflow application master, which will be capable of > running a DAG of jobs. The workflow client submits a DAG request to the AM > and then the AM will manage the life cycle of this application in terms of > requesting the needed resources from the RM, and starting, monitoring and > retrying the application's individual tasks. > Compared to running Oozie with the current MapReduce Application Master, > these are some of the advantages: > - Fewer consumed resources, since only one application master will > be spawned for the whole workflow. > - Reuse of resources, since the same resources can be used by multiple > consecutive jobs in the workflow (no need to request/wait for resources for > every individual job from the central RM). > - More optimization opportunities in terms of collective resource requests. > - Optimization opportunities in terms of rewriting and composing jobs in the > workflow (e.g. pushing down Mappers). > - This Application Master can be reused/extended by higher-level systems like Pig > and Hive to provide an optimized way of running their workflows. -- This message is automatically generated by JIRA. 
[jira] [Updated] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ferguson updated MAPREDUCE-4334: --- Attachment: mapreduce-4334-design-doc-v2.txt just a quick update to the design doc. at two points I wrote "create cgroups" when I meant "mount cgroups"; also fixes a typo. sorry for the spam! thanks, Andrew > Add support for CPU isolation/monitoring of containers > -- > > Key: MAPREDUCE-4334 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4334 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Reporter: Arun C Murthy >Assignee: Andrew Ferguson > Attachments: MAPREDUCE-4334-executor-v1.patch, > MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, > MAPREDUCE-4334-pre1.patch, MAPREDUCE-4334-pre2-with_cpu.patch, > MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre3-with_cpu.patch, > MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, > mapreduce-4334-design-doc-v2.txt, mapreduce-4334-design-doc.txt > > > Once we get in MAPREDUCE-4327, it will be important to actually enforce > limits on CPU consumption of containers. > Several options spring to mind: > # taskset (RHEL5+) > # cgroups (RHEL6+) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (MAPREDUCE-4408) allow jobs to set a JAR that is in the distributed cached
[ https://issues.apache.org/jira/browse/MAPREDUCE-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur reassigned MAPREDUCE-4408: - Assignee: Robert Kanter (was: Alejandro Abdelnur) Reassigning it to Robert, as he is doing the Oozie JIRA that requires this one. > allow jobs to set a JAR that is in the distributed cached > - > > Key: MAPREDUCE-4408 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4408 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv1, mrv2 >Affects Versions: 1.0.3, 2.0.0-alpha >Reporter: Alejandro Abdelnur >Assignee: Robert Kanter > > Setting a job JAR with JobConf.setJar(String) and Job.setJar(String) assumes > that the JAR is local to the client submitting the job, thus it triggers > copying the JAR to HDFS and injecting it into the distributed cache. > AFAIK, this is the only way to use uber JARs (JARs with JARs inside) in MR > jobs. > For jobs launched by Oozie, all JARs are already in HDFS. In order for Oozie > to support uber JARs (OOZIE-654) there should be a way to specify as the job JAR a > JAR that is already in HDFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ferguson updated MAPREDUCE-4334: --- Attachment: MAPREDUCE-4334-executor-v3.patch Updated version of "executor-v2" patch, which uses cgexec and hooks into the ContainersLauncher. See previously attached design doc for further details. thanks! Andrew > Add support for CPU isolation/monitoring of containers > -- > > Key: MAPREDUCE-4334 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4334 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Reporter: Arun C Murthy >Assignee: Andrew Ferguson > Attachments: MAPREDUCE-4334-executor-v1.patch, > MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, > MAPREDUCE-4334-pre1.patch, MAPREDUCE-4334-pre2-with_cpu.patch, > MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre3-with_cpu.patch, > MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, > mapreduce-4334-design-doc.txt > > > Once we get in MAPREDUCE-4327, it will be important to actually enforce > limits on CPU consumption of containers. > Several options spring to mind: > # taskset (RHEL5+) > # cgroups (RHEL6+) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4493) Distibuted Cache Compatability Issues
[ https://issues.apache.org/jira/browse/MAPREDUCE-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-4493: --- Attachment: MR-4493.txt This patch deprecates the configs and functions associated with turning symlinks on. It also updates the docs. > Distibuted Cache Compatability Issues > - > > Key: MAPREDUCE-4493 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4493 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0 >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans >Priority: Critical > Attachments: MR-4493.txt > > > The distributed cache does not work like it does in 1.0. > mapreduce.job.cache.symlink.create is completely ignored and symlinks are > always created no matter what. Files and archives without a fragment will > also have symlinks created. > If two cache archives or cache files happen to have the same name, or same > symlink fragment only the last one in the list is localized. > The localCacheArchives and LocalCacheFiles are not set correctly when these > duplicates happen causing off by one or more errors for anyone trying to use > them. > The reality is that use of symlinking is so common currently that these > incompatibilities are not that likely to show up, but we still need to fix > them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
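The name-collision case described above (two cache entries with the same file name or the same symlink fragment) can be checked on the client side before a job is submitted. The sketch below is not Hadoop code: the class CacheLinkNameCheck and its methods are hypothetical, and it assumes hierarchical URIs whose link name is the fragment when one is given and the last path segment otherwise.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical pre-submission check; not part of the Hadoop API. */
final class CacheLinkNameCheck {
    private CacheLinkNameCheck() {}

    /** Link name assumed for a cache entry: the URI fragment if present, else the file name. */
    static String linkName(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath(); // assumes a hierarchical URI, so the path is not null
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Returns every link name that is claimed by more than one entry. */
    static Set<String> collidingNames(List<URI> uris) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (URI uri : uris) {
            String name = linkName(uri);
            if (!seen.add(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<URI> entries = Arrays.asList(
            URI.create("hdfs://nn/a/dict.txt"),
            URI.create("hdfs://nn/b/dict.txt"),
            URI.create("hdfs://nn/c/lib.jar#mylib"));
        System.out.println("Colliding link names: " + collidingNames(entries));
    }
}
```

A job that reports no colliding names here would not depend on which of two same-named entries ends up localized.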
[jira] [Updated] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers
[ https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ferguson updated MAPREDUCE-4334: --- Attachment: mapreduce-4334-design-doc.txt Design document outlining the two primary designs proposed here, as well as an alternate version of the second. Summarizes pros/cons discussed earlier in the JIRA. More data, including screenshots from a live demo available here: http://www.cs.brown.edu/~adf/files/CgroupsPresentation.pptx > Add support for CPU isolation/monitoring of containers > -- > > Key: MAPREDUCE-4334 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4334 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Reporter: Arun C Murthy >Assignee: Andrew Ferguson > Attachments: MAPREDUCE-4334-executor-v1.patch, > MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-pre1.patch, > MAPREDUCE-4334-pre2-with_cpu.patch, MAPREDUCE-4334-pre2.patch, > MAPREDUCE-4334-pre3-with_cpu.patch, MAPREDUCE-4334-pre3.patch, > MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, > mapreduce-4334-design-doc.txt > > > Once we get in MAPREDUCE-4327, it will be important to actually enforce > limits on CPU consumption of containers. > Several options spring to mind: > # taskset (RHEL5+) > # cgroups (RHEL6+) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4491) Encryption and Key Protection
[ https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425244#comment-13425244 ] Alejandro Abdelnur commented on MAPREDUCE-4491: --- Benoy, I've done a quick read of the doc. A couple of initial questions: * If using a compression codec for encryption, do you lose the compression capabilities when using encryption, or will it work as a composition? * For the keystores, are you proposing to store them in HDFS and use file system permissions to protect them? I'm not sure if I understood this part correctly. If that is the case, then HDFS-3637 would ensure secure transfer. I'll read the design doc in more detail later this week. > Encryption and Key Protection > - > > Key: MAPREDUCE-4491 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: documentation, security, task-controller, tasktracker >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: Hadoop_Encryption.pdf > > > When dealing with sensitive data, it is required to keep the data encrypted > wherever it is stored. Common use case is to pull encrypted data out of a > datasource and store in HDFS for analysis. The keys are stored in an external > keystore. > The feature adds a customizable framework to integrate different types of > keystores, support for Java KeyStore, read keys from keystores, and transport > keys from JobClient to Tasks. > The feature adds PGP encryption as a codec and additional utilities to > perform encryption related steps. > The design document is attached. It explains the requirement, design and use > cases. > Kindly review and comment. Collaboration is very much welcome. > I have a tested patch for this for 1.1 and will upload it soon as an initial > work for further refinement. -- This message is automatically generated by JIRA. 
[jira] [Created] (MAPREDUCE-4495) Workflow Application Master in YARN
Bo Wang created MAPREDUCE-4495: -- Summary: Workflow Application Master in YARN Key: MAPREDUCE-4495 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha Reporter: Bo Wang Assignee: Bo Wang It is useful to have a workflow application master, which will be capable of running a DAG of jobs. The workflow client submits a DAG request to the AM and then the AM will manage the life cycle of this application in terms of requesting the needed resources from the RM, and starting, monitoring and retrying the application's individual tasks. Compared to running Oozie with the current MapReduce Application Master, these are some of the advantages: - Fewer consumed resources, since only one application master will be spawned for the whole workflow. - Reuse of resources, since the same resources can be used by multiple consecutive jobs in the workflow (no need to request/wait for resources for every individual job from the central RM). - More optimization opportunities in terms of collective resource requests. - Optimization opportunities in terms of rewriting and composing jobs in the workflow (e.g. pushing down Mappers). - This Application Master can be reused/extended by higher-level systems like Pig and Hive to provide an optimized way of running their workflows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425225#comment-13425225 ] Alejandro Abdelnur commented on MAPREDUCE-4342: --- Mayank, I don't see my previous second comment being addressed. What I meant was the following. Instead of doing:
{code}
public void handle(ResourceEvent event) {
  LocalResourceRequest req = event.getLocalResourceRequest();
  LocalizedResource rsrc = localrsrc.get(req);
  if (rsrc != null
      && (event.getType() == ResourceEventType.LOCALIZED
          || event.getType() == ResourceEventType.REQUEST)
      && (!isResourcePresent(rsrc))) {
    LOG.info("Resource " + rsrc.getLocalPath() + " is missing, localizing it again");
    localrsrc.remove(req);
    rsrc = null;
  }
  switch (event.getType()) {
  case REQUEST:
  case LOCALIZED:
    if (null == rsrc) {
      rsrc = new LocalizedResource(req, dispatcher);
      localrsrc.put(req, rsrc);
    }
    break;
{code}
Do:
{code}
public void handle(ResourceEvent event) {
  LocalResourceRequest req = event.getLocalResourceRequest();
  LocalizedResource rsrc = localrsrc.get(req);
  switch (event.getType()) {
  case REQUEST:
  case LOCALIZED:
    if (rsrc != null && !isResourcePresent(rsrc)) {
      LOG.info("Resource " + rsrc.getLocalPath() + " is missing, localizing it again");
      localrsrc.remove(req);
      rsrc = null;
    }
    if (null == rsrc) {
      rsrc = new LocalizedResource(req, dispatcher);
      localrsrc.put(req, rsrc);
    }
    break;
{code}
> Distributed Cache gives inconsistent result if cache files get deleted from > task tracker > - > > Key: MAPREDUCE-4342 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4342 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.22.0, 1.0.3, trunk >Reporter: Mayank Bansal >Assignee: Mayank Bansal > Attachments: MAPREDUCE-4342-22-1.patch, MAPREDUCE-4342-22-2.patch, > MAPREDUCE-4342-22-3.patch, MAPREDUCE-4342-22.patch, > MAPREDUCE-4342-trunk-v2.patch, MAPREDUCE-4342-trunk.patch > > -- This message is automatically generated by JIRA. 
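The restructuring suggested in the comment above can be tried out in isolation with a small stand-in model. Everything in this sketch is hypothetical: the class ResourceTrackerSketch, its string-based requests and paths, and the injected presence check only mimic the shape of the quoted handle method and are not the NodeManager classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Simplified stand-in for the tracker discussed above; all names are hypothetical. */
final class ResourceTrackerSketch {
    enum EventType { REQUEST, LOCALIZED, RELEASE }

    private final Map<String, String> localPaths = new HashMap<>();
    private final Predicate<String> isPresentOnDisk;
    private int relocalizations = 0;

    ResourceTrackerSketch(Predicate<String> isPresentOnDisk) {
        this.isPresentOnDisk = isPresentOnDisk;
    }

    void handle(EventType type, String request) {
        String path = localPaths.get(request);
        switch (type) {
            case REQUEST:
            case LOCALIZED:
                // The presence check lives inside the case block, so no outer
                // test on the event type is needed.
                if (path != null && !isPresentOnDisk.test(path)) {
                    localPaths.remove(request);
                    path = null;
                    relocalizations++;
                }
                if (path == null) {
                    localPaths.put(request, "/local/" + request);
                }
                break;
            default:
                // Other event types never trigger the on-disk check.
                break;
        }
    }

    /** Number of times a tracked resource was found missing and dropped. */
    int relocalizations() {
        return relocalizations;
    }

    public static void main(String[] args) {
        // Pretend every file has been deleted from disk.
        ResourceTrackerSketch tracker = new ResourceTrackerSketch(path -> false);
        tracker.handle(EventType.REQUEST, "job.jar");
        tracker.handle(EventType.REQUEST, "job.jar");
        System.out.println("relocalizations = " + tracker.relocalizations());
    }
}
```

Keeping the check next to the only two event types that need it removes the duplicated event-type test of the first variant.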
[jira] [Commented] (MAPREDUCE-4444) nodemanager fails to start when one of the local-dirs is bad
[ https://issues.apache.org/jira/browse/MAPREDUCE-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425155#comment-13425155 ] Hadoop QA commented on MAPREDUCE-4444: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538396/MAPREDUCE-.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2676//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2676//console This message is automatically generated. > nodemanager fails to start when one of the local-dirs is bad > > > Key: MAPREDUCE-4444 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4444 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: nodemanager >Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0 >Reporter: Nathan Roberts >Assignee: Jason Lowe >Priority: Blocker > Attachments: MAPREDUCE-.patch > > -- This message is automatically generated by JIRA. 
[jira] [Updated] (MAPREDUCE-4444) nodemanager fails to start when one of the local-dirs is bad
[ https://issues.apache.org/jira/browse/MAPREDUCE-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-4444: -- Target Version/s: 0.23.3, 2.2.0-alpha Status: Patch Available (was: Open) > nodemanager fails to start when one of the local-dirs is bad > > > Key: MAPREDUCE-4444 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4444 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.0.0-alpha, 0.23.3, 3.0.0 >Reporter: Nathan Roberts >Assignee: Jason Lowe >Priority: Blocker > Attachments: MAPREDUCE-.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4444) nodemanager fails to start when one of the local-dirs is bad
[ https://issues.apache.org/jira/browse/MAPREDUCE-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-4444: -- Attachment: MAPREDUCE-.patch Patch that changes LocalDirsHandlerService to check for bad directories during init so they're removed from the list of directories before subsequent init code tries to access them. > nodemanager fails to start when one of the local-dirs is bad > > > Key: MAPREDUCE-4444 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4444 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: nodemanager >Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0 >Reporter: Nathan Roberts >Assignee: Jason Lowe >Priority: Blocker > Attachments: MAPREDUCE-.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
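The approach described in the comment above, dropping bad directories from the configured list before any later initialization code touches them, can be sketched roughly as follows. This is not the LocalDirsHandlerService code: the class LocalDirFilterSketch, the method usableDirs, and the exact checks (exists, is a directory, readable, writable, executable) are assumptions made for illustration, and the real patch may test different conditions.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of filtering bad local directories at startup. */
final class LocalDirFilterSketch {
    private LocalDirFilterSketch() {}

    /** Returns the configured directories that pass a basic health check, in order. */
    static List<String> usableDirs(List<String> configured) {
        List<String> good = new ArrayList<>();
        for (String name : configured) {
            File dir = new File(name);
            // Assumed health check: the path must be an existing directory that
            // the current process can read, write and traverse.
            if (dir.isDirectory() && dir.canRead() && dir.canWrite() && dir.canExecute()) {
                good.add(name);
            }
        }
        return good;
    }

    public static void main(String[] args) {
        // Pass candidate directories on the command line to see which ones survive.
        System.out.println(usableDirs(Arrays.asList(args)));
    }
}
```

Running the filter first means one unusable directory shrinks the list instead of aborting startup.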
[jira] [Commented] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425129#comment-13425129 ] Hadoop QA commented on MAPREDUCE-4342: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538384/MAPREDUCE-4342-trunk-v2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2675//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2675//console This message is automatically generated. > Distributed Cache gives inconsistent result if cache files get deleted from > task tracker > - > > Key: MAPREDUCE-4342 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4342 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.22.0, 1.0.3, trunk >Reporter: Mayank Bansal >Assignee: Mayank Bansal > Attachments: MAPREDUCE-4342-22-1.patch, MAPREDUCE-4342-22-2.patch, > MAPREDUCE-4342-22-3.patch, MAPREDUCE-4342-22.patch, > MAPREDUCE-4342-trunk-v2.patch, MAPREDUCE-4342-trunk.patch > > -- This message is automatically generated by JIRA. 
[jira] [Assigned] (MAPREDUCE-3726) jobstatus.getjobfile should return jobtracker copy of job.xml instead of .staging copy of job.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal reassigned MAPREDUCE-3726: Assignee: Mayank Bansal > jobstatus.getjobfile should return jobtracker copy of job.xml instead of > .staging copy of job.xml > - > > Key: MAPREDUCE-3726 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3726 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, jobtracker >Affects Versions: 0.22.1 >Reporter: Mayank Bansal >Assignee: Mayank Bansal >Priority: Minor > > jobstatus.getjobfile should return jobtracker copy of job.xml instead of > .staging copy of job.xml -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4482) Backport MR sort plugin(MAPREDUCE-2454) to Hadoop 1.1.x
[ https://issues.apache.org/jira/browse/MAPREDUCE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mariappan Asokan updated MAPREDUCE-4482: Affects Version/s: (was: 1.1.1) (was: 1.1.0) 1.2.0 > Backport MR sort plugin(MAPREDUCE-2454) to Hadoop 1.1.x > --- > > Key: MAPREDUCE-4482 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4482 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: mrv1 >Affects Versions: 1.2.0 >Reporter: Mariappan Asokan > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated MAPREDUCE-4342: - Attachment: MAPREDUCE-4342-trunk-v2.patch Thanks, Alejandro, for your review and comments. This patch incorporates all your comments. Thanks, Mayank > Distributed Cache gives inconsistent result if cache files get deleted from > task tracker > - > > Key: MAPREDUCE-4342 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4342 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.22.0, 1.0.3, trunk >Reporter: Mayank Bansal >Assignee: Mayank Bansal > Attachments: MAPREDUCE-4342-22-1.patch, MAPREDUCE-4342-22-2.patch, > MAPREDUCE-4342-22-3.patch, MAPREDUCE-4342-22.patch, > MAPREDUCE-4342-trunk-v2.patch, MAPREDUCE-4342-trunk.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425047#comment-13425047 ] Alejandro Abdelnur commented on MAPREDUCE-4342: --- The log message should be something like 'Resource ### is missing, localizing it again'. If the check is moved inside the case block for REQUEST/LOCALIZED, there is no need for the outer IF check. > Distributed Cache gives inconsistent result if cache files get deleted from > task tracker > - > > Key: MAPREDUCE-4342 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4342 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.22.0, 1.0.3, trunk >Reporter: Mayank Bansal >Assignee: Mayank Bansal > Attachments: MAPREDUCE-4342-22-1.patch, MAPREDUCE-4342-22-2.patch, > MAPREDUCE-4342-22-3.patch, MAPREDUCE-4342-22.patch, MAPREDUCE-4342-trunk.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4456) LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating symlinks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424946#comment-13424946 ] Hadoop QA commented on MAPREDUCE-4456: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12538358/MR-4456.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified test files. -1 javac. The applied patch generated 2049 javac compiler warnings (more than the trunk's current 2048 warnings). +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2674//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2674//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2674//console This message is automatically generated. 
> LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating > symlinks > > > Key: MAPREDUCE-4456 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4456 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans > Attachments: MR-4456.txt, MR-4456.txt
>
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 1
>   at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:194)
>   at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:154)
>   at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:620)
>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:385)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1212)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1212)
>   at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
>   at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:233)
>   at java.lang.Thread.run(Thread.java:619)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
> {noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4493) Distibuted Cache Compatability Issues
[ https://issues.apache.org/jira/browse/MAPREDUCE-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-4493: --- Priority: Critical (was: Blocker) Dropping severity because this is just going to be a configuration change. > Distibuted Cache Compatability Issues > - > > Key: MAPREDUCE-4493 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4493 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0 >Reporter: Robert Joseph Evans >Assignee: Robert Joseph Evans >Priority: Critical > > The distributed cache does not work like it does in 1.0. > mapreduce.job.cache.symlink.create is completely ignored and symlinks are > always created no matter what. Files and archives without a fragment will > also have symlinks created. > If two cache archives or cache files happen to have the same name, or same > symlink fragment only the last one in the list is localized. > The localCacheArchives and LocalCacheFiles are not set correctly when these > duplicates happen causing off by one or more errors for anyone trying to use > them. > The reality is that use of symlinking is so common currently that these > incompatibilities are not that likely to show up, but we still need to fix > them. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4493) Distibuted Cache Compatability Issues
[ https://issues.apache.org/jira/browse/MAPREDUCE-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424929#comment-13424929 ]

Robert Joseph Evans commented on MAPREDUCE-4493:

Sorry, I meant documentation, not configuration.

> Distibuted Cache Compatability Issues
>
> Key: MAPREDUCE-4493
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4493
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0
> Reporter: Robert Joseph Evans
> Assignee: Robert Joseph Evans
> Priority: Critical
>
> The distributed cache does not work like it does in 1.0.
> mapreduce.job.cache.symlink.create is completely ignored, and symlinks are
> always created no matter what. Files and archives without a fragment will
> also have symlinks created.
> If two cache archives or cache files happen to have the same name, or the same
> symlink fragment, only the last one in the list is localized.
> The localCacheArchives and LocalCacheFiles are not set correctly when these
> duplicates happen, causing errors of off by one or more for anyone trying to
> use them.
> The reality is that symlinking is currently so common that these
> incompatibilities are not that likely to show up, but we still need to fix
> them.
[jira] [Updated] (MAPREDUCE-4456) LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating symlinks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4456:
---
Status: Patch Available (was: Open)

> LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating symlinks
>
> Key: MAPREDUCE-4456
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4456
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
> Reporter: Robert Joseph Evans
> Assignee: Robert Joseph Evans
> Attachments: MR-4456.txt, MR-4456.txt
>
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 1
>     at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:194)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:154)
>     at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:620)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:385)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1212)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1212)
>     at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
>     at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:233)
>     at java.lang.Thread.run(Thread.java:619)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
> {noformat}
[jira] [Updated] (MAPREDUCE-4456) LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating symlinks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4456:
---
Attachment: MR-4456.txt

This patch fixes the test failure.

> LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating symlinks
>
> Key: MAPREDUCE-4456
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4456
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
> Reporter: Robert Joseph Evans
> Assignee: Robert Joseph Evans
> Attachments: MR-4456.txt, MR-4456.txt
>
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 1
>     at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:194)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:154)
>     at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:620)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:385)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1212)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1212)
>     at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
>     at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:233)
>     at java.lang.Thread.run(Thread.java:619)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
> {noformat}
[jira] [Commented] (MAPREDUCE-4456) LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating symlinks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424917#comment-13424917 ]

Robert Joseph Evans commented on MAPREDUCE-4456:

I talked with Arun on MAPREDUCE-4493. He feels that the current MR2 behavior is correct and that we should just document the differences. I am fine with going that route, so I will just update the test to expect the new behavior and then document that behavior on the other JIRA.

> LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating symlinks
>
> Key: MAPREDUCE-4456
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4456
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
> Reporter: Robert Joseph Evans
> Assignee: Robert Joseph Evans
> Attachments: MR-4456.txt
>
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 1
>     at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:194)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:154)
>     at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:620)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:385)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1212)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1212)
>     at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
>     at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:233)
>     at java.lang.Thread.run(Thread.java:619)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
> {noformat}
[jira] [Updated] (MAPREDUCE-4010) TestWritableJobConf fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-4010:
---
Fix Version/s: 0.23.3

I pulled this into branch-0.23.

> TestWritableJobConf fails on trunk
>
> Key: MAPREDUCE-4010
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4010
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.24.0
> Reporter: Jason Lowe
> Assignee: Alejandro Abdelnur
> Priority: Critical
> Fix For: 0.23.3, 2.0.0-alpha
>
> Attachments: MAPREDUCE-4010.patch, MAPREDUCE-4010.patch,
> MAPREDUCE-4010.patch
>
> TestWritableJobConf is currently failing two tests on trunk:
> * testEmptyConfiguration
> * testNonEmptyConfiguration
> Appears to have been caused by HADOOP-8167.
[jira] [Commented] (MAPREDUCE-4039) Sort Avoidance
[ https://issues.apache.org/jira/browse/MAPREDUCE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424741#comment-13424741 ]

Ahmed Radwan commented on MAPREDUCE-4039:

bq. when sort plugin is finished, this patch will need be modified.

Sure, Anty. Are you already working on updating the patch with the pluggable MapOutputBuffer and Shuffle, then?

> Sort Avoidance
>
> Key: MAPREDUCE-4039
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4039
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: mrv2
> Affects Versions: 0.23.2
> Reporter: anty.rao
> Assignee: anty
> Priority: Minor
> Fix For: 0.23.2
>
> Attachments: IndexedCountingSortable.java,
> MAPREDUCE-4039-branch-0.23.2.patch, MAPREDUCE-4039-branch-0.23.2.patch,
> MAPREDUCE-4039-branch-0.23.2.patch
>
> Inspired by
> [Tenzing|http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/37200.pdf],
> in 5.1 MapReduce Enhancements:
> {quote}*Sort Avoidance*. Certain operators such as hash join
> and hash aggregation require shuffling, but not sorting. The
> MapReduce API was enhanced to automatically turn off
> sorting for these operations. When sorting is turned off, the
> mapper feeds data to the reducer which directly passes the
> data to the Reduce() function bypassing the intermediate
> sorting step. This makes many SQL operators significantly
> more efficient.{quote}
> Many applications need only aggregation, not sorting. Using sorting to
> achieve aggregation is costly and inefficient. Without sorting, the
> application can use a hash table or hash map to aggregate efficiently. But
> the application should bear in mind that reduce memory is limited: it is
> itself responsible for managing reduce-side memory and guarding against
> running out of memory. The map-side combiner is not supported; as a
> workaround, you can do hash aggregation on the map side.
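A minimal sketch of the hash aggregation described above, assuming string keys and long values that are summed. The class and method names are hypothetical and are not part of the attached patch; the point is only that records can arrive in any order and still be aggregated correctly.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of reduce-side aggregation without sorted input. */
final class HashAggregation {

    /** Sums values per key; the input does not need to be grouped or sorted. */
    static Map<String, Long> sumByKey(Iterable<Map.Entry<String, Long>> records) {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, Long> record : records) {
            // merge() inserts the value for a new key, or adds it to the running total.
            totals.merge(record.getKey(), record.getValue(), Long::sum);
        }
        return totals;
    }
}
```

The map holds one entry per distinct key, which is why the description warns that the application has to keep the number of live keys within the memory available to the reduce task.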
> The main points of the sort avoidance implementation are:
> # Add a boolean configuration parameter, ??mapreduce.sort.avoidance??, to
> turn the sort avoidance workflow on or off. The two workflows coexist.
> # Key/value pairs emitted by the map function are sorted by partition only,
> using a more efficient sorting algorithm: counting sort.
> # Map-side merge uses a byte-level merge that just concatenates bytes from
> the generated spills (read bytes in, write bytes out), without the key/value
> serialization/deserialization and comparison overhead that the current
> version incurs.
> # Reduce can start as soon as any map output is available, in contrast to
> the sort workflow, which must wait until all map outputs are fetched and
> merged.
> # Map output in memory can be consumed directly by reduce. When reduce can't
> keep up with the incoming map outputs, the in-memory merge thread kicks in
> and merges in-memory map outputs onto disk.
> # On-disk files are read sequentially to feed reduce, in contrast to the
> current implementation, which reads multiple files concurrently and causes
> many disk seeks. Map output in memory takes precedence over on-disk files in
> feeding the reduce function.
> I have already implemented this feature on Hadoop CDH3U3 and done some
> performance evaluation; see [https://github.com/hanborq/hadoop] for details.
> Now I'm willing to port it to YARN. Comments are welcome.
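The second point above, sorting map output by partition only, can be sketched with a plain counting sort. This is an illustration under stated assumptions, not the code in the attached IndexedCountingSortable.java: the class name, the method name, and the representation of records as an array of partition numbers are all hypothetical.

```java
/** Hypothetical illustration of counting sort keyed on partition number. */
final class PartitionCountingSort {

    /**
     * Returns record indices ordered by partition. Records in the same
     * partition keep their arrival order, and no keys are ever compared.
     */
    static int[] sortByPartition(int[] partitionOf, int numPartitions) {
        // start[p + 1] first counts the records in partition p.
        int[] start = new int[numPartitions + 1];
        for (int p : partitionOf) {
            start[p + 1]++;
        }
        // Prefix sums turn the counts into the first output slot of each partition.
        for (int p = 0; p < numPartitions; p++) {
            start[p + 1] += start[p];
        }
        // Scatter each record index into the next free slot of its partition.
        int[] order = new int[partitionOf.length];
        for (int i = 0; i < partitionOf.length; i++) {
            order[start[partitionOf[i]]++] = i;
        }
        return order;
    }
}
```

The work is linear in the number of records plus the number of partitions, whereas a comparison sort on (partition, key) also pays for key comparisons that this workflow never needs.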