date:20120730

[jira] [Commented] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often

2012-07-30 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425493#comment-13425493
 ] 

Hadoop QA commented on MAPREDUCE-3943:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12538489/MR3943_trunk.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 17 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 4 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app
hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common
hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2680//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2680//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2680//console

This message is automatically generated.

> RM-NM secret-keys should be randomly generated and rolled every so often
> 
>
> Key: MAPREDUCE-3943
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3943
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: mrv2, security
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: MAPREDUCE-3943-20120416.txt, MR3943.txt, MR3943.txt, 
> MR3943_branch-23.txt, MR3943_branch-23.txt, MR3943_trunk.txt, MR3943_trunk.txt
>
>
>  - RM should generate the master-key randomly
>  - The master-key should roll every so often
>  - NM should remember old expired keys so that already doled out 
> container-requests can be satisfied.
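The three requirements above can be illustrated with a minimal, self-contained Java sketch. It is not the MAPREDUCE-3943 patch: the class names `MasterKey`, `RollingKeyManager` and `NodeKeyStore` are hypothetical, HMAC-SHA1 from the JDK's `javax.crypto` is an assumed algorithm, the NM keeps a fixed-size window of previous keys instead of expiring them by time, and how a new key reaches the NM is not modeled.

```java
import java.security.InvalidKeyException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.crypto.KeyGenerator;
import javax.crypto.Mac;
import javax.crypto.SecretKey;

/** A master key tagged with an id, so a verifier can tell which key signed a token. */
final class MasterKey {
    final int keyId;
    final SecretKey key;

    MasterKey(int keyId, SecretKey key) {
        this.keyId = keyId;
        this.key = key;
    }
}

/** RM side (hypothetical): generates random master keys and rolls them on demand. */
final class RollingKeyManager {
    static final String ALGORITHM = "HmacSHA1"; // assumed algorithm for this sketch
    private final KeyGenerator generator;
    private int nextKeyId = 0;
    private MasterKey current;

    RollingKeyManager() {
        try {
            // KeyGenerator draws key material from a SecureRandom, so keys are random.
            generator = KeyGenerator.getInstance(ALGORITHM);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        current = newKey();
    }

    private MasterKey newKey() {
        return new MasterKey(nextKeyId++, generator.generateKey());
    }

    /** Replaces the current key with a freshly generated one; a timer would call this. */
    synchronized MasterKey roll() {
        current = newKey();
        return current;
    }

    synchronized MasterKey current() {
        return current;
    }

    /** Computes a token password as an HMAC of the token identifier. */
    static byte[] sign(MasterKey k, byte[] tokenId) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(k.key);
            return mac.doFinal(tokenId);
        } catch (NoSuchAlgorithmException | InvalidKeyException e) {
            throw new IllegalStateException(e);
        }
    }
}

/** NM side (hypothetical): keeps the current key plus a bounded window of old keys. */
final class NodeKeyStore {
    private final int maxOldKeys;
    private final Deque<MasterKey> keys = new ArrayDeque<>(); // newest first

    NodeKeyStore(int maxOldKeys) {
        this.maxOldKeys = maxOldKeys;
    }

    synchronized void update(MasterKey k) {
        keys.addFirst(k);
        while (keys.size() > maxOldKeys + 1) {
            keys.removeLast(); // forget keys that fell out of the window
        }
    }

    /** Accepts a token only if it was signed by a key the NM still remembers. */
    synchronized boolean verify(int keyId, byte[] tokenId, byte[] password) {
        for (MasterKey k : keys) {
            if (k.keyId == keyId) {
                // Constant-time comparison of the expected and presented passwords.
                return MessageDigest.isEqual(RollingKeyManager.sign(k, tokenId), password);
            }
        }
        return false;
    }
}
```

Because each token names the id of the key that signed it, the NM can still verify container requests the RM handed out just before a roll, while anything signed with a key that has left the window is rejected.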

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often

2012-07-30 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3943:
--

Attachment: MR3943_trunk.txt

Updated to fix the javadoc warning and one of the Findbugs warnings. The
remaining 4 Findbugs warnings are from the FairScheduler and are unrelated to
this patch.

[jira] [Updated] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often

2012-07-30 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3943:
--

Status: Patch Available  (was: Open)

[jira] [Updated] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often

2012-07-30 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3943:
--

Attachment: MR3943_branch-23.txt

[jira] [Updated] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often

2012-07-30 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3943:
--

Status: Open  (was: Patch Available)

[jira] [Commented] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often

2012-07-30 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425469#comment-13425469
 ] 

Hadoop QA commented on MAPREDUCE-3943:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12538477/MR3943_trunk.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 17 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 javadoc.  The javadoc tool appears to have generated 1 warning message.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 5 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app
hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common
hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2679//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2679//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2679//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2679//console

[jira] [Updated] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often

2012-07-30 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3943:
--

Attachment: MR3943_trunk.txt

Up-merged patch for trunk. Also adds some securityEnabled checks.

[jira] [Updated] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often

2012-07-30 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3943:
--

Attachment: MR3943_branch-23.txt

Patch for branch-23.

[jira] [Updated] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often

2012-07-30 Thread Siddharth Seth (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3943:
--

Status: Patch Available  (was: Open)

[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN

2012-07-30 Thread Bo Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425435#comment-13425435
 ] 

Bo Wang commented on MAPREDUCE-4495:


Hi Mayank,

Keeping the state in memory is just for the first version. WF state will be
serialized to HDFS in JSON format. We can update the file once a job is done, or
serialize multiple versions, so that only the unfinished jobs need to be rerun.
As for the interface, I currently follow Oozie's workflow.xml, though I expect
more types of interfaces to be supported.

Thanks,
Bo
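The resume behaviour described here (rerun only the unfinished jobs) can be sketched as follows, assuming the persisted WF state boils down to the set of job names that already completed; the JSON format itself is not modeled. `WorkflowResume` and `jobsToRun` are hypothetical names, and the ordering uses Kahn's topological sort, which also rejects a workflow that contains a cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: decides which workflow jobs to run after an AM restart. */
final class WorkflowResume {
    /**
     * @param deps      job name -> names of the jobs it depends on (the DAG)
     * @param completed jobs recorded as finished in the persisted workflow state
     * @return unfinished jobs, ordered so every job comes after its dependencies
     */
    static List<String> jobsToRun(Map<String, List<String>> deps, Set<String> completed) {
        // Kahn's topological sort: count unmet dependencies per job.
        Map<String, Integer> pending = new LinkedHashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            pending.putIfAbsent(e.getKey(), 0);
            for (String parent : e.getValue()) {
                pending.putIfAbsent(parent, 0);
                pending.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(parent, p -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> toRun = new ArrayList<>();
        int visited = 0;
        while (!ready.isEmpty()) {
            String job = ready.poll();
            visited++;
            if (!completed.contains(job)) {
                toRun.add(job); // finished jobs are skipped instead of rerun
            }
            for (String child : dependents.getOrDefault(job, Collections.emptyList())) {
                if (pending.merge(child, -1, Integer::sum) == 0) {
                    ready.add(child);
                }
            }
        }
        if (visited != pending.size()) {
            throw new IllegalArgumentException("workflow graph contains a cycle");
        }
        return toRun;
    }
}
```

For example, if `b` and `c` depend on `a`, `d` depends on both `b` and `c`, and `a` and `b` are recorded as done, only `c` and then `d` are returned.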



> Workflow Application Master in YARN
> ---
>
> Key: MAPREDUCE-4495
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 2.0.0-alpha
>Reporter: Bo Wang
>Assignee: Bo Wang
>
> It is useful to have a workflow application master, which will be capable of 
> running a DAG of jobs. The workflow client submits a DAG request to the AM 
> and then the AM will manage the life cycle of this application in terms of 
> requesting the needed resources from the RM, and starting, monitoring and 
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, 
> these are some of the advantages:
>  - Fewer consumed resources, since only one application master will
> be spawned for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple 
> consecutive jobs in the workflow (no need to request/wait for resources for 
> every individual job from the central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the 
> workflow (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems like
> Pig and Hive to provide an optimized way of running their workflows.

[jira] [Assigned] (MAPREDUCE-2454) Allow external sorter plugin for MR

2012-07-30 Thread Alejandro Abdelnur (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur reassigned MAPREDUCE-2454:
-

Assignee: Mariappan Asokan

Mariappan, the latest patch does not apply to trunk. Would you post a new patch
rebased on trunk? Thx

> Allow external sorter plugin for MR
> ---
>
> Key: MAPREDUCE-2454
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 2.0.0-alpha, 3.0.0, 2.2.0-alpha
>Reporter: Mariappan Asokan
>Assignee: Mariappan Asokan
>Priority: Minor
>  Labels: features, performance, plugin, sort
> Attachments: HadoopSortPlugin.pdf, KeyValueIterator.java, 
> MR-2454-trunkPatchPreview.gz, MapOutputSorter.java, 
> MapOutputSorterAbstract.java, ReduceInputSorter.java, mapreduce-2454.patch, 
> mr-2454-on-mr-279-build82.patch.gz
>
>
> Define interfaces and some abstract classes in the Hadoop framework to 
> facilitate external sorter plugins both on the Map and Reduce sides.
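One way such a plugin point could look, as a hedged sketch: the framework depends only on an interface, ships a default implementation, and instantiates an alternative sorter by class name. `ExternalSorter`, `InMemorySorter` and `SorterLoader` are hypothetical; they do not reproduce the interfaces in the attached `MapOutputSorter.java` or `ReduceInputSorter.java`, and no configuration property name is assumed.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical plugin contract: the framework talks to a sorter only through this. */
interface ExternalSorter<K, V> {
    void initialize(Comparator<? super K> keyComparator);

    void add(K key, V value);

    /** Returns every record added so far, ordered by key. */
    List<Map.Entry<K, V>> sortedRecords();
}

/** Default implementation the framework would fall back to: sort in memory. */
final class InMemorySorter<K, V> implements ExternalSorter<K, V> {
    private final List<Map.Entry<K, V>> buffer = new ArrayList<>();
    private Comparator<? super K> comparator;

    @Override
    public void initialize(Comparator<? super K> keyComparator) {
        this.comparator = keyComparator;
    }

    @Override
    public void add(K key, V value) {
        buffer.add(new AbstractMap.SimpleEntry<>(key, value));
    }

    @Override
    public List<Map.Entry<K, V>> sortedRecords() {
        List<Map.Entry<K, V>> copy = new ArrayList<>(buffer);
        copy.sort((a, b) -> comparator.compare(a.getKey(), b.getKey())); // stable sort
        return copy;
    }
}

/** Loads the sorter named in configuration, or the default when none is set. */
final class SorterLoader {
    @SuppressWarnings("unchecked")
    static <K, V> ExternalSorter<K, V> load(String className) {
        if (className == null || className.isEmpty()) {
            return new InMemorySorter<>();
        }
        try {
            Class<?> cls = Class.forName(className);
            return (ExternalSorter<K, V>) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load sorter " + className, e);
        }
    }
}
```

Loading by class name keeps the framework free of compile-time dependencies on third-party sorters; a real map-side sorter would also have to spill to disk, which this in-memory default does not attempt.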

[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN

2012-07-30 Thread Mayank Bansal (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425421#comment-13425421
 ] 

Mayank Bansal commented on MAPREDUCE-4495:
--

Hi,

I am not sure keeping the state in memory is the right thing to do; an HDFS
file is OK.

The reason is that if you have a workflow DAG of 10 MapReduce jobs and the 10th
job fails for some reason, you need to run the whole DAG again if you want to
retry.

In some cases you may need to rerun everything, but in most cases you don't,
so persisting the state is very useful.

I would also like to understand the user-facing interface. Is it the same as
workflow.xml in Oozie?

Thanks,
Mayank

[jira] [Updated] (MAPREDUCE-4497) JobTracker webUI "Kill Selected Jobs" button has no effect

2012-07-30 Thread George Datskos (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Datskos updated MAPREDUCE-4497:
--

Attachment: JT-webui-modify.patch

> JobTracker webUI "Kill Selected Jobs" button has no effect
> --
>
> Key: MAPREDUCE-4497
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4497
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 1.0.3
>Reporter: George Datskos
>Priority: Minor
> Attachments: JT-webui-modify.patch
>
>
> When the *webinterface.private.actions* property is set to true, the
> JobTracker displays additional buttons allowing the user to (1) kill jobs or
> (2) change the priority of a job.  However, an erroneous interaction between
> the HTML (produced by mapred/JSPUtil.java) and sorttable.js leads to these
> two form buttons having no effect because the {{}} {{}} is moved
> down below the submit buttons (by sorttable.js). sorttable.js was introduced
> by MAPREDUCE-1118 (so all versions from v0.20.203 are probably affected). In
> JSPUtil.java, the form element is placed inside the table and spans multiple
> {{}} and {{}}, which is incorrect. Placing the form around the table
> fixes this bug (see patch).

[jira] [Created] (MAPREDUCE-4497) JobTracker webUI "Kill Selected Jobs" button has no effect

2012-07-30 Thread George Datskos (JIRA)

George Datskos created MAPREDUCE-4497:
-

 Summary: JobTracker webUI "Kill Selected Jobs" button has no effect
 Key: MAPREDUCE-4497
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4497
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1.0.3
Reporter: George Datskos
Priority: Minor


When the *webinterface.private.actions* property is set to true, the
JobTracker displays additional buttons allowing the user to (1) kill jobs or
(2) change the priority of a job.  However, an erroneous interaction between
the HTML (produced by mapred/JSPUtil.java) and sorttable.js leads to these two
form buttons having no effect because the {{}} {{}} is moved down
below the submit buttons (by sorttable.js). sorttable.js was introduced by
MAPREDUCE-1118 (so all versions from v0.20.203 are probably affected). In
JSPUtil.java, the form element is placed inside the table and spans multiple
{{}} and {{}}, which is incorrect. Placing the form around the table
fixes this bug (see patch).
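The fix (placing the form around the table instead of inside it) can be illustrated with a small Java renderer. This is only a sketch: `JobTableHtml`, the form `action`, and the input names are hypothetical and are not taken from `JSPUtil.java` or the attached patch.

```java
import java.util.List;

/** Hypothetical renderer showing the fix: the form encloses the whole table. */
final class JobTableHtml {
    /** Minimal HTML escaping for text placed in attributes and cells. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String render(List<String> jobIds) {
        StringBuilder sb = new StringBuilder();
        // Open the form BEFORE the table: a form that starts inside a table and
        // spans several rows is invalid HTML (the layout reported as broken).
        sb.append("<form action=\"jobtracker.jsp\" method=\"post\">\n");
        sb.append("<table class=\"sortable\">\n");
        sb.append("<tr><th>Select</th><th>Job ID</th></tr>\n");
        for (String id : jobIds) {
            String safe = escape(id);
            sb.append("<tr><td><input type=\"checkbox\" name=\"jobCheckBox\" value=\"")
              .append(safe).append("\"></td><td>").append(safe).append("</td></tr>\n");
        }
        sb.append("</table>\n");
        sb.append("<input type=\"submit\" value=\"Kill Selected Jobs\">\n");
        // Close the form only after the table and the submit button.
        sb.append("</form>\n");
        return sb.toString();
    }
}
```

With the form boundaries outside the table, reordering the table's rows should not separate the checkboxes from the submit button.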

[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN

2012-07-30 Thread Bo Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425412#comment-13425412
 ] 

Bo Wang commented on MAPREDUCE-4495:


[~mayank_bansal] I will post a design document soon. As for where to store the
WF state, initially it will be kept in memory; later it could be persisted to a
file in HDFS. The DAG AM will run a single WF, so no DB is required here.

[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN

2012-07-30 Thread Alejandro Abdelnur (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425406#comment-13425406
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4495:
---

@arun, I assume you meant 'You'd have to contribute to Hadoop whatever you want
to use from Oozie', right? If so, I'm good with that; I don't care where the
code lives as long as I can use it. I've handed over to Bo a version of the
OOZIE-593 patch that does not have Oozie dependencies.

[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN

2012-07-30 Thread Arun C Murthy (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425398#comment-13425398
 ] 

Arun C Murthy commented on MAPREDUCE-4495:
--

Also, pls note that we cannot have a dependency on Oozie. You'd have to 
contribute whatever you want to use to Oozie.

[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN

2012-07-30 Thread Arun C Murthy (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425394#comment-13425394
 ] 

Arun C Murthy commented on MAPREDUCE-4495:
--

bq. That's great; however, can you please post the initial design document for 
that and your idea of how the user will be exposed to the interface?

Agreed. Please do that first. It's much easier to talk through that than reams 
of code - let's avoid a situation like the one we have in MAPREDUCE-4334.



[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-30 Thread Karthik Kambatla (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425383#comment-13425383
 ] 

Karthik Kambatla commented on MAPREDUCE-4334:
-

+1 on design - 2(b), and the patch looks good.

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: MAPREDUCE-4334
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4334
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Andrew Ferguson
> Attachments: MAPREDUCE-4334-executor-v1.patch, 
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, 
> MAPREDUCE-4334-executor-v4.patch, MAPREDUCE-4334-pre1.patch, 
> MAPREDUCE-4334-pre2-with_cpu.patch, MAPREDUCE-4334-pre2.patch, 
> MAPREDUCE-4334-pre3-with_cpu.patch, MAPREDUCE-4334-pre3.patch, 
> MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, 
> mapreduce-4334-design-doc-v2.txt, mapreduce-4334-design-doc.txt
>
>
> Once we get in MAPREDUCE-4327, it will be important to actually enforce 
> limits on CPU consumption of containers. 
> Several options spring to mind:
> # taskset (RHEL5+)
> # cgroups (RHEL6+)


[jira] [Commented] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-30 Thread Alejandro Abdelnur (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425370#comment-13425370
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4334:
---

I like the current patch: it does not add complexity, and it will be trivial to 
wire it up with MAPREDUCE-4327 once CPU units are part of resources.


[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN

2012-07-30 Thread Mayank Bansal (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425366#comment-13425366
 ] 

Mayank Bansal commented on MAPREDUCE-4495:
--

That's great; however, can you please post the initial design document for that 
and your idea of how the user will be exposed to the interface?

Where are you planning to store the state, given that Oozie stores it in a database?

Thanks,
Mayank


[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN

2012-07-30 Thread Bo Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425361#comment-13425361
 ] 

Bo Wang commented on MAPREDUCE-4495:


Thanks for the suggestion, Mayank. Several components in Oozie would be 
helpful. I plan to use the new wflib posted in OOZIE-593 by Alejandro. 


[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN

2012-07-30 Thread Bo Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425359#comment-13425359
 ] 

Bo Wang commented on MAPREDUCE-4495:


Hi Robert, thanks for offering to help. I am working on the first version and 
will let you know once it works.


[jira] [Commented] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker

2012-07-30 Thread Mayank Bansal (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425346#comment-13425346
 ] 

Mayank Bansal commented on MAPREDUCE-4342:
--

Yes, there is a JIRA for archive files:
MAPREDUCE-4349

I am working on that. For Hadoop 0.22 we didn't need to do anything special; 
however, I have added a test case for it. I am investigating the trunk case.

Thanks,
Mayank

> Distributed Cache gives inconsistent result if cache files get deleted from 
> task tracker 
> -
>
> Key: MAPREDUCE-4342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0, 1.0.3, trunk
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Fix For: 2.2.0-alpha
>
> Attachments: MAPREDUCE-4342-22-1.patch, MAPREDUCE-4342-22-2.patch, 
> MAPREDUCE-4342-22-3.patch, MAPREDUCE-4342-22.patch, 
> MAPREDUCE-4342-trunk-v2.patch, MAPREDUCE-4342-trunk-v3.patch, 
> MAPREDUCE-4342-trunk.patch
>
>



[jira] [Updated] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker

2012-07-30 Thread Alejandro Abdelnur (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated MAPREDUCE-4342:
--

   Resolution: Fixed
Fix Version/s: 2.2.0-alpha
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks Mayank. Committed to trunk and branch-2.


[jira] [Commented] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker

2012-07-30 Thread Alejandro Abdelnur (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425343#comment-13425343
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4342:
---

+1.

Is there a follow-up JIRA to address Robert's concerns about archives and corrupt 
files?

> Distributed Cache gives inconsistent result if cache files get deleted from 
> task tracker 
> -
>
> Key: MAPREDUCE-4342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0, 1.0.3, trunk
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: MAPREDUCE-4342-22-1.patch, MAPREDUCE-4342-22-2.patch, 
> MAPREDUCE-4342-22-3.patch, MAPREDUCE-4342-22.patch, 
> MAPREDUCE-4342-trunk-v2.patch, MAPREDUCE-4342-trunk-v3.patch, 
> MAPREDUCE-4342-trunk.patch
>
>



[jira] [Commented] (MAPREDUCE-4234) SortValidator.java is incompatible with multi-user or parallel use (due to a /tmp file with static name)

2012-07-30 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425326#comment-13425326
 ] 

Hadoop QA commented on MAPREDUCE-4234:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12538433/MR-4234.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  
org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2677//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2677//console

This message is automatically generated.

> SortValidator.java is incompatible with multi-user or parallel use (due to a 
> /tmp file with static name)
> 
>
> Key: MAPREDUCE-4234
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4234
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: examples
>Affects Versions: 0.23.3, trunk
>Reporter: Randy Clayton
>Assignee: Robert Joseph Evans
>Priority: Minor
> Attachments: MAPREDUCE-4234.patch, MR-4234.txt, MR-4234.txt
>
>
> The SortValidator.java file checkRecords method creates a file in the 
> /tmp/sortvalidator directory using a static filename. This can result in 
> failures due to name collisions when the 
> hadoop-mapreduce-client-jobclient-*-tests jar is used by more than one task 
> or one user simultaneously. We use this jar when testing compression codecs 
> and after we started running tests in parallel (four at a time to reduce 
> overall test time) we started experiencing random test failures due to name 
> collisions. Creating a random or unique per thread filename may resolve this 
> issue. We have developed a change to introduce per use unique file names. 
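A minimal sketch of the per-use unique file name idea described above, in Java. It does not reproduce the MR-4234 patch; the class name `UniqueScratchFiles` and the `checkRecords-` prefix are hypothetical. It relies on `java.nio.file.Files.createTempFile`, which atomically creates a new file whose name did not exist before the call, so parallel tasks or users sharing the directory cannot collide on the name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper (not the MR-4234 patch): creates a scratch file with a
 * unique name instead of a fixed one, so concurrent runs do not collide.
 */
public final class UniqueScratchFiles {
    private UniqueScratchFiles() {}

    /**
     * Atomically creates a new, empty file under baseDir whose name starts with
     * prefix. Files.createTempFile guarantees the name did not already exist.
     */
    public static Path create(Path baseDir, String prefix) throws IOException {
        // The shared parent may already exist; createDirectories is a no-op then.
        Files.createDirectories(baseDir);
        return Files.createTempFile(baseDir, prefix, ".tmp");
    }
}
```

With a shared parent such as /tmp/sortvalidator, directory permissions can still get in the way when several users are involved; a per-user base directory sidesteps that.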


[jira] [Commented] (MAPREDUCE-4491) Encryption and Key Protection

2012-07-30 Thread Benoy Antony (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425320#comment-13425320
 ] 

Benoy Antony commented on MAPREDUCE-4491:
-

To Alejandro's questions:

1) If you use a compression codec for encryption, do you lose the compression 
capabilities when using encryption, or will it work as a composition?
What I have done is to compress first and then encrypt. Compression is currently 
hardcoded to ZIP; I can expose it as a configuration option with a choice of 
{UNCOMPRESSED, ZIP, ZLIB, BZIP2}. This is an enhancement that I can add.
I have also provided a DistributedSplitter so that files can be split into 
smaller files.
I am not aware of a way to chain multiple compression codecs, though it would 
have been a desirable capability in this case. 
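The ordering above (compress first, then encrypt) matters because good ciphertext looks random and barely compresses. A minimal Java sketch of that ordering, with loud assumptions: it uses the JDK's Deflater streams for the "ZIP" step and AES-GCM from the standard JCE in place of the PGP codec in the patch; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical sketch: AES-GCM stands in for the PGP codec used in the patch. */
public final class CompressThenEncrypt {
    private static final int IV_LEN = 12;     // 96-bit IV, the usual choice for GCM
    private static final int TAG_BITS = 128;  // GCM authentication tag length
    private static final SecureRandom RNG = new SecureRandom();

    private CompressThenEncrypt() {}

    public static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        return kg.generateKey();
    }

    /** Deflate first, then encrypt; the output layout is IV followed by ciphertext. */
    public static byte[] seal(byte[] plain, SecretKey key) throws IOException, GeneralSecurityException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(compressed)) {
            dos.write(plain);  // closing the stream finishes the deflate output
        }
        byte[] iv = new byte[IV_LEN];
        RNG.nextBytes(iv);     // fresh random IV for every message
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(compressed.toByteArray());
        byte[] out = new byte[IV_LEN + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    /** Reverse order: decrypt (which also verifies the GCM tag), then inflate. */
    public static byte[] open(byte[] sealed, SecretKey key) throws IOException, GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, sealed, 0, IV_LEN));
        byte[] compressed = c.doFinal(sealed, IV_LEN, sealed.length - IV_LEN);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }
}
```

Swapping the order (encrypt, then compress) would still round-trip, but the compression step would achieve almost nothing.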

2) For the keystores, are you proposing to store them in HDFS and use file system 
permissions to protect them?

Actually, I am not proposing to store them in HDFS. The keystores themselves 
are encrypted, and a password is required to read keys from them. 

In the use cases that I have encountered, the keystores were external to the 
cluster. They were either on the CLI machine from which the jobs were submitted 
or on a separate machine from which the keys were retrieved based on the user's 
credentials (Alfredo was used to fetch the keys via a web service).
So there are two schemes that I have supported:
  1) reading keys from a Java keystore
  2) reading keys from a web-service-based keystore ("Safe")
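For the first scheme, reading a secret key from a password-protected Java keystore looks roughly like the following. This is a sketch that uses only the standard `java.security.KeyStore` API. The JCEKS store type (chosen because plain JKS cannot hold secret keys), the class name `KeystoreKeys`, and the alias are assumptions, not what the patch does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;
import javax.crypto.SecretKey;

/** Hypothetical helper around the standard KeyStore API. */
public final class KeystoreKeys {
    private KeystoreKeys() {}

    /** Writes a new password-protected JCEKS file holding one secret key under alias. */
    public static void save(Path file, char[] storePass, String alias, SecretKey secret, char[] keyPass)
            throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance("JCEKS");  // JCEKS can hold SecretKey entries; JKS cannot
        ks.load(null, storePass);                     // null stream creates an empty keystore
        ks.setEntry(alias, new KeyStore.SecretKeyEntry(secret),
                new KeyStore.PasswordProtection(keyPass));
        try (OutputStream out = Files.newOutputStream(file)) {
            ks.store(out, storePass);
        }
    }

    /**
     * Loads the keystore (the store password is used for the integrity check) and
     * reads one key (the key password unseals the entry). Returns null if alias is absent.
     */
    public static Key load(Path file, char[] storePass, String alias, char[] keyPass)
            throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        try (InputStream in = Files.newInputStream(file)) {
            ks.load(in, storePass);
        }
        return ks.getKey(alias, keyPass);
    }
}
```

Nothing here touches HDFS: the keystore file can live on the submitting CLI machine, and only the keys that a job needs would be shipped to tasks.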





> Encryption and Key Protection
> -
>
> Key: MAPREDUCE-4491
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: documentation, security, task-controller, tasktracker
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: Hadoop_Encryption.pdf
>
>
> When dealing with sensitive data, the data must be kept encrypted 
> wherever it is stored. A common use case is to pull encrypted data out of a 
> data source and store it in HDFS for analysis. The keys are stored in an external 
> keystore. 
> The feature adds a customizable framework for integrating different types of 
> keystores, support for the Java KeyStore, reading keys from keystores, and 
> transporting keys from the JobClient to tasks.
> The feature also adds PGP encryption as a codec, plus additional utilities to 
> perform encryption-related steps.
> The design document is attached. It explains the requirement, design and use 
> cases.
> Kindly review and comment. Collaboration is very much welcome.
> I have a tested patch for this for 1.1 and will upload it soon as an initial 
> work for further refinement. 


[jira] [Commented] (MAPREDUCE-4496) AM logs link is missing user name

2012-07-30 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425304#comment-13425304
 ] 

Jason Lowe commented on MAPREDUCE-4496:
---

This causes two problems:

# While the AM is running, the stderr, stdout, and syslog links have to be 
clicked twice to display the page containing the log data.
# When the AM is no longer running and log aggregation is enabled, the link no 
longer properly redirects to the history server.  Instead it results in the 
error message: "Cannot get container logs without an app owner".

> AM logs link is missing user name
> -
>
> Key: MAPREDUCE-4496
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4496
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.3, 2.2.0-alpha
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>
> The link to the ApplicationMaster's logs on the MRAppMaster's web page is 
> missing the user name.


[jira] [Commented] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker

2012-07-30 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425303#comment-13425303
 ] 

Hadoop QA commented on MAPREDUCE-4342:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12538436/MAPREDUCE-4342-trunk-v3.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2678//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2678//console


[jira] [Updated] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-30 Thread Andrew Ferguson (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated MAPREDUCE-4334:
---

Attachment: MAPREDUCE-4334-executor-v4.patch

Updated version of executor-v3, which moves the actual wrapping of the launched 
command inside the "wrapCommand" method (which previously returned a value to 
prefix onto the launched command).
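A hypothetical sketch of why handing the whole command to a `wrapCommand`-style method is more flexible than asking it for a prefix. None of these names come from the patch. A wrapper that owns the full argument list decides where the original command goes, and wrappers can be composed. `TasksetWrapper` uses `taskset -c` (from the RHEL5+ option in the description). `CgroupWrapper` assumes a cgroup v1 `tasks` file whose path is supplied by the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical: the executor passes the complete command and gets the complete command back. */
interface CommandWrapper {
    List<String> wrapCommand(List<String> command);

    /** Applies inner first, then outer; owning the whole list is what makes this possible. */
    static CommandWrapper compose(CommandWrapper outer, CommandWrapper inner) {
        return cmd -> outer.wrapCommand(inner.wrapCommand(cmd));
    }
}

/** Pins the launched process to a CPU list: taskset -c <cpuList> cmd... */
final class TasksetWrapper implements CommandWrapper {
    private final String cpuList;  // e.g. "0-1" or "2,3"

    TasksetWrapper(String cpuList) { this.cpuList = cpuList; }

    @Override
    public List<String> wrapCommand(List<String> command) {
        List<String> out = new ArrayList<>(Arrays.asList("taskset", "-c", cpuList));
        out.addAll(command);
        return out;
    }
}

/**
 * Moves the shell into a cgroup by writing its PID to the tasks file, then exec's
 * the original command so the container process keeps that PID.
 */
final class CgroupWrapper implements CommandWrapper {
    private final String tasksFile;  // assumed cgroup v1 tasks file path

    CgroupWrapper(String tasksFile) { this.tasksFile = tasksFile; }

    @Override
    public List<String> wrapCommand(List<String> command) {
        // sh -c SCRIPT sh TASKSFILE CMD...: $1 is the tasks file; after shift, "$@"
        // is the original argv, so nothing has to be re-quoted into one string.
        List<String> out = new ArrayList<>(Arrays.asList(
                "sh", "-c", "echo $$ > \"$1\" && shift && exec \"$@\"", "sh", tasksFile));
        out.addAll(command);
        return out;
    }
}
```

For example, `CommandWrapper.compose(new CgroupWrapper(path), new TasksetWrapper("0-1"))` produces a command that joins the cgroup and then runs `taskset -c 0-1` on the original command.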


[jira] [Updated] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker

2012-07-30 Thread Mayank Bansal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated MAPREDUCE-4342:
-

Attachment: MAPREDUCE-4342-trunk-v3.patch

Thanks, Alejandro. I misunderstood your comments.

Thanks for the clarification.

It makes sense; I am attaching the updated patch.

Thanks,
Mayank


[jira] [Commented] (MAPREDUCE-4491) Encryption and Key Protection

2012-07-30 Thread Benoy Antony (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425286#comment-13425286
 ] 

Benoy Antony commented on MAPREDUCE-4491:
-

To Rob's questions :

Different encryption keys for different files: At this point, the PGPCodec 
supports only one secret key/key pair for all input files. 
What we need is the ability to specify a secret key/key pair per input file. 
Another enhancement would be to specify a secret key/key pair for each phase, 
such as map->output and reduce->output.
As you mentioned, this mapping has to be specified via configuration.
I'll try to add these two enhancements. 
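Here is one possible shape for the per-file mapping, once it is driven by configuration. It is a hypothetical resolver from input-path prefixes to keystore aliases: the longest matching prefix wins, and a default alias preserves the current single-key behaviour. The class name, the aliases, and the prefix-matching rule are all assumptions. The configuration property names that would populate the map are not defined here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical: chooses a keystore alias for an input path. */
public final class KeyAliasResolver {
    private final Map<String, String> prefixToAlias;
    private final String defaultAlias;

    public KeyAliasResolver(Map<String, String> prefixToAlias, String defaultAlias) {
        this.prefixToAlias = new HashMap<>(prefixToAlias);
        this.defaultAlias = defaultAlias;
    }

    /** The longest configured directory prefix containing path wins; otherwise the default. */
    public String aliasFor(String path) {
        String bestAlias = defaultAlias;
        int bestLen = -1;
        for (Map.Entry<String, String> e : prefixToAlias.entrySet()) {
            String prefix = e.getKey();
            // Match on a directory boundary so "/data/pii" does not match "/data/piix".
            String dir = prefix.endsWith("/") ? prefix : prefix + "/";
            boolean matches = path.equals(prefix) || path.startsWith(dir);
            if (matches && prefix.length() > bestLen) {
                bestLen = prefix.length();
                bestAlias = e.getValue();
            }
        }
        return bestAlias;
    }
}
```

The same lookup could be keyed on a phase name (map output, reduce output) instead of a path. That variant is not shown here.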

Decryption/encryption of different columns within the same file: This is 
actually left to the MapReduce programmer, who has to do the 
decryption/encryption of the fields programmatically. The programmer can choose 
to use different keys for different fields in the MapReduce program. Multiple 
keys can be retrieved from the keystore, and these keys can then be accessed in 
the mapper/reducer using the credentials API.
In a higher-level interface like Hive, it may be possible to add metadata to 
specify the key name. Another reviewer has also recommended adding this 
capability to Hive, to identify an encrypted field and specify the key (by name) 
to be used to decrypt/encrypt it.

Thanks for the review and recommendations, Rob. Please let me know if I have 
not answered your questions correctly.


[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN

2012-07-30 Thread Mayank Bansal (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425280#comment-13425280
 ] 

Mayank Bansal commented on MAPREDUCE-4495:
--

That's a good idea; however, I would suggest using the Oozie workflow library to 
do that. Please let me know if you need any help in that regard; I can 
collaborate with you.

Thanks,
Mayank

> Workflow Application Master in YARN
> ---
>
> Key: MAPREDUCE-4495
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 2.0.0-alpha
>Reporter: Bo Wang
>Assignee: Bo Wang
>
> It is useful to have a workflow application master, which will be capable of 
> running a DAG of jobs. The workflow client submits a DAG request to the AM 
> and then the AM will manage the life cycle of this application in terms of 
> requesting the needed resources from the RM, and starting, monitoring and 
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, 
> these are some of the advantages:
>  - Fewer consumed resources, since only one application master will 
> be spawned for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple 
> consecutive jobs in the workflow (no need to request/wait for resources for 
> every individual job from the central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the 
> workflow (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems like 
> Pig and Hive to provide an optimized way of running their workflows.

[jira] [Created] (MAPREDUCE-4496) AM logs link is missing user name

2012-07-30 Thread Jason Lowe (JIRA)

Jason Lowe created MAPREDUCE-4496:
-

 Summary: AM logs link is missing user name
 Key: MAPREDUCE-4496
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4496
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.3, 2.2.0-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe


The link to the ApplicationMaster's logs on the MRAppMaster's web page is 
missing the user name.
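A minimal sketch of what a logs link carrying the user name could look like. It 
assumes the NodeManager web UI serves container logs under 
/node/containerlogs/<containerId>/<user>; the helper name and that path layout 
are assumptions to verify against the Hadoop version in use, not the eventual 
fix for this issue.

```java
// Hypothetical helper, not Hadoop API. Assumes NodeManager container logs are
// served at http://<nm-http-address>/node/containerlogs/<containerId>/<user>.
public class AmLogsLink {

  static String amLogsUrl(String nmHttpAddress, String containerId, String user) {
    // The user segment is the part this issue reports as missing from the link.
    if (user == null || user.isEmpty()) {
      throw new IllegalArgumentException("user name is required for the AM logs link");
    }
    return "http://" + nmHttpAddress + "/node/containerlogs/" + containerId + "/" + user;
  }

  public static void main(String[] args) {
    System.out.println(amLogsUrl("nm-host:8042", "container_1_0001_01_000001", "alice"));
  }
}
```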

[jira] [Updated] (MAPREDUCE-4234) SortValidator.java is incompatible with multi-user or parallel use (due to a /tmp file with static name)

2012-07-30 Thread Robert Joseph Evans (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4234:
---

Attachment: MR-4234.txt

The failure looks spurious to me. I have not been able to reproduce it.  I have 
upmerged and am posting the updated patch.

> SortValidator.java is incompatible with multi-user or parallel use (due to a 
> /tmp file with static name)
> 
>
> Key: MAPREDUCE-4234
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4234
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: examples
>Affects Versions: 0.23.3, trunk
>Reporter: Randy Clayton
>Assignee: Robert Joseph Evans
>Priority: Minor
> Attachments: MAPREDUCE-4234.patch, MR-4234.txt, MR-4234.txt
>
>
> The checkRecords method in SortValidator.java creates a file in the 
> /tmp/sortvalidator directory using a static filename. This can result in 
> failures due to name collisions when the 
> hadoop-mapreduce-client-jobclient-*-tests jar is used by more than one task 
> or one user simultaneously. We use this jar when testing compression codecs, 
> and after we started running tests in parallel (four at a time to reduce 
> overall test time) we started experiencing random test failures due to name 
> collisions. Creating a random or unique per-thread filename may resolve this 
> issue. We have developed a change to introduce unique per-use file names. 
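One way to get unique per-use names with only the JDK is to let 
java.nio.file.Files.createTempFile choose the name, since it atomically creates 
a file that did not already exist. This is an illustration, not the attached 
patch; the class name and the "sortvalidator-" prefix are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: give each run its own scratch file instead of a fixed name.
public class UniqueScratchFile {

  // Creates baseDir if needed, then a uniquely named file inside it.
  static Path createScratchFile(Path baseDir) throws IOException {
    Files.createDirectories(baseDir);
    // createTempFile generates a fresh name and fails rather than reuse an existing file.
    return Files.createTempFile(baseDir, "sortvalidator-", ".tmp");
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("sortvalidator-run-");
    System.out.println(createScratchFile(dir));
    System.out.println(createScratchFile(dir));
  }
}
```

A shared base directory such as /tmp/sortvalidator can still break multi-user 
runs if the first user creates it without write permission for others; 
Files.createTempDirectory sidesteps that by giving each run its own directory.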

[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN

2012-07-30 Thread Robert Joseph Evans (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425262#comment-13425262
 ] 

Robert Joseph Evans commented on MAPREDUCE-4495:


I have been thinking about this a lot and I am very much +1 on this.  It is not 
on the top of my priority list yet, but if you want help on this I would be 
very happy to collaborate on it with you.

> Workflow Application Master in YARN
> ---
>
> Key: MAPREDUCE-4495
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 2.0.0-alpha
>Reporter: Bo Wang
>Assignee: Bo Wang
>
> It is useful to have a workflow application master, which will be capable of 
> running a DAG of jobs. The workflow client submits a DAG request to the AM 
> and then the AM will manage the life cycle of this application in terms of 
> requesting the needed resources from the RM, and starting, monitoring and 
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, 
> these are some of the advantages:
>  - Fewer consumed resources, since only one application master will 
> be spawned for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple 
> consecutive jobs in the workflow (no need to request/wait for resources for 
> every individual job from the central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the 
> workflow (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems like 
> Pig and Hive to provide an optimized way of running their workflows.

[jira] [Updated] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-30 Thread Andrew Ferguson (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated MAPREDUCE-4334:
---

Attachment: mapreduce-4334-design-doc-v2.txt

Just a quick update to the design doc. In two places I wrote "create cgroups" 
when I meant "mount cgroups"; this version also fixes a typo. Sorry for the spam!

thanks,
Andrew

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: MAPREDUCE-4334
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4334
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Andrew Ferguson
> Attachments: MAPREDUCE-4334-executor-v1.patch, 
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, 
> MAPREDUCE-4334-pre1.patch, MAPREDUCE-4334-pre2-with_cpu.patch, 
> MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre3-with_cpu.patch, 
> MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, 
> mapreduce-4334-design-doc-v2.txt, mapreduce-4334-design-doc.txt
>
>
> Once MAPREDUCE-4327 gets in, it will be important to actually enforce 
> limits on CPU consumption of containers. 
> Several options spring to mind:
> # taskset (RHEL5+)
> # cgroups (RHEL6+)

[jira] [Assigned] (MAPREDUCE-4408) allow jobs to set a JAR that is in the distributed cached

2012-07-30 Thread Alejandro Abdelnur (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur reassigned MAPREDUCE-4408:
-

Assignee: Robert Kanter  (was: Alejandro Abdelnur)

Reassigning it to Robert, as he is working on the Oozie JIRA that requires this one.

> allow jobs to set a JAR that is in the distributed cached
> -
>
> Key: MAPREDUCE-4408
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4408
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2
>Affects Versions: 1.0.3, 2.0.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Robert Kanter
>
> Setting a job JAR with JobConf.setJar(String) and Job.setJar(String) assumes 
> that the JAR is local to the client submitting the job, so it triggers 
> copying the JAR to HDFS and injecting it into the distributed cache.
> AFAIK, this is the only way to use uber JARs (JARs with JARs inside) in MR 
> jobs.
> For jobs launched by Oozie, all JARs are already in HDFS. In order for Oozie 
> to support uber JARs (OOZIE-654), there should be a way to specify, as the job 
> JAR, a JAR that is already in HDFS.
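To make the local-versus-HDFS distinction concrete, here is a hypothetical 
helper (not part of JobConf or Job) that treats a JAR path with no scheme or a 
file: scheme as client-local, and anything else as already on a remote file 
system. It assumes JAR locations are given as URI-style paths.

```java
import java.net.URI;

// Hypothetical helper, not Hadoop API: decides whether a job JAR must be
// uploaded from the client or can be referenced where it already lives.
public class JobJarLocation {

  static boolean needsUpload(String jar) {
    String scheme = URI.create(jar).getScheme();
    // No scheme, or file:, means the JAR sits on the submitting client's disk.
    return scheme == null || "file".equals(scheme);
  }

  public static void main(String[] args) {
    System.out.println(needsUpload("/home/alice/app.jar"));
    System.out.println(needsUpload("hdfs://nn:8020/user/oozie/app.jar"));
  }
}
```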

[jira] [Updated] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-30 Thread Andrew Ferguson (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated MAPREDUCE-4334:
---

Attachment: MAPREDUCE-4334-executor-v3.patch

Updated version of "executor-v2" patch, which uses cgexec and hooks into the 
ContainersLauncher. See previously attached design doc for further details.

thanks!
Andrew
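As an illustration of the cgexec approach, the sketch below prefixes a container 
launch command with cgexec so the process starts inside a cpu cgroup. cgexec is 
invoked as "cgexec -g <controllers>:<path> command [args]"; the cgroup path 
(e.g. hadoop-yarn/<containerId>) and the class name are assumptions, and the 
cgroup must already exist under a mounted cpu controller. This is not the code 
from the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: wrap a launch command so it runs in a per-container cpu cgroup.
// The "hadoop-yarn/<containerId>" layout is a hypothetical convention.
public class CgexecLaunch {

  static List<String> wrapWithCgexec(String cgroupPath, List<String> command) {
    List<String> wrapped = new ArrayList<>();
    wrapped.add("cgexec");
    wrapped.add("-g");
    wrapped.add("cpu:" + cgroupPath); // controller:path, as cgexec expects
    wrapped.addAll(command);
    return wrapped;
  }

  public static void main(String[] args) {
    List<String> cmd = wrapWithCgexec("hadoop-yarn/container_01",
        Arrays.asList("bash", "launch_container.sh"));
    // The result could be handed to a ProcessBuilder by the launcher.
    System.out.println(String.join(" ", cmd));
  }
}
```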

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: MAPREDUCE-4334
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4334
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Andrew Ferguson
> Attachments: MAPREDUCE-4334-executor-v1.patch, 
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-executor-v3.patch, 
> MAPREDUCE-4334-pre1.patch, MAPREDUCE-4334-pre2-with_cpu.patch, 
> MAPREDUCE-4334-pre2.patch, MAPREDUCE-4334-pre3-with_cpu.patch, 
> MAPREDUCE-4334-pre3.patch, MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, 
> mapreduce-4334-design-doc.txt
>
>
> Once MAPREDUCE-4327 gets in, it will be important to actually enforce 
> limits on CPU consumption of containers. 
> Several options spring to mind:
> # taskset (RHEL5+)
> # cgroups (RHEL6+)

[jira] [Updated] (MAPREDUCE-4493) Distibuted Cache Compatability Issues

2012-07-30 Thread Robert Joseph Evans (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4493:
---

Attachment: MR-4493.txt

This patch deprecates the configs and functions associated with turning 
symlinks on.  It also updates the docs.

> Distibuted Cache Compatability Issues
> -
>
> Key: MAPREDUCE-4493
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4493
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>Priority: Critical
> Attachments: MR-4493.txt
>
>
> The distributed cache does not work like it does in 1.0.
> mapreduce.job.cache.symlink.create is completely ignored and symlinks are 
> always created no matter what.  Files and archives without a fragment will 
> also have symlinks created.
> If two cache archives or cache files happen to have the same name, or the same 
> symlink fragment, only the last one in the list is localized.
> The localCacheArchives and LocalCacheFiles are not set correctly when these 
> duplicates occur, causing off-by-one (or larger) errors for anyone trying to 
> use them.
> The reality is that use of symlinking is so common currently that these 
> incompatibilities are not that likely to show up, but we still need to fix 
> them.
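The name collision described above can be illustrated with a small sketch that 
assumes a link is named after the URI fragment when one is present and 
otherwise after the file name, and then reports names claimed by more than one 
cache URI. The class and method names are hypothetical; this is not the MR code 
that does the localization.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: find cache entries whose link names would shadow each other.
public class CacheLinkNames {

  // Fragment if present, else the last path component (assumed naming rule).
  static String linkName(URI uri) {
    if (uri.getFragment() != null) {
      return uri.getFragment();
    }
    String path = uri.getPath();
    return path.substring(path.lastIndexOf('/') + 1);
  }

  // Returns each link name that is claimed by more than one URI, once.
  static List<String> duplicateLinkNames(List<URI> uris) {
    Map<String, Integer> counts = new HashMap<>();
    List<String> dups = new ArrayList<>();
    for (URI uri : uris) {
      String name = linkName(uri);
      int n = counts.merge(name, 1, Integer::sum);
      if (n == 2) {
        dups.add(name); // report on the first collision only
      }
    }
    return dups;
  }
}
```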

[jira] [Updated] (MAPREDUCE-4334) Add support for CPU isolation/monitoring of containers

2012-07-30 Thread Andrew Ferguson (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Ferguson updated MAPREDUCE-4334:
---

Attachment: mapreduce-4334-design-doc.txt

Design document outlining the two primary designs proposed here, as well as an 
alternate version of the second. Summarizes pros/cons discussed earlier in the 
JIRA.

More data, including screenshots from a live demo, is available here: 
http://www.cs.brown.edu/~adf/files/CgroupsPresentation.pptx

> Add support for CPU isolation/monitoring of containers
> --
>
> Key: MAPREDUCE-4334
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4334
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Andrew Ferguson
> Attachments: MAPREDUCE-4334-executor-v1.patch, 
> MAPREDUCE-4334-executor-v2.patch, MAPREDUCE-4334-pre1.patch, 
> MAPREDUCE-4334-pre2-with_cpu.patch, MAPREDUCE-4334-pre2.patch, 
> MAPREDUCE-4334-pre3-with_cpu.patch, MAPREDUCE-4334-pre3.patch, 
> MAPREDUCE-4334-v1.patch, MAPREDUCE-4334-v2.patch, 
> mapreduce-4334-design-doc.txt
>
>
> Once MAPREDUCE-4327 gets in, it will be important to actually enforce 
> limits on CPU consumption of containers. 
> Several options spring to mind:
> # taskset (RHEL5+)
> # cgroups (RHEL6+)

[jira] [Commented] (MAPREDUCE-4491) Encryption and Key Protection

2012-07-30 Thread Alejandro Abdelnur (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425244#comment-13425244
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4491:
---

Benoy, I've done a quick read of the doc. A couple of initial questions:

* If a compression codec is used for encryption, do you lose the compression 
capabilities when using encryption, or will it work as a composition?
* For the keystores, are you proposing to store them in HDFS and use file system 
permissions to protect them? I'm not sure if I understood this part correctly. 
If that is the case, then HDFS-3637 would ensure secure transfer.

I'll read the design doc in more detail later this week.
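On the composition question: with plain JDK streams, compression and encryption 
can be stacked by compressing first and then encrypting the compressed bytes 
(ciphertext is essentially incompressible, so the order matters). The sketch 
below uses AES/CBC from javax.crypto purely as a stand-in for the proposed PGP 
codec; the class name is hypothetical, and CBC alone provides no integrity 
protection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.SecureRandom;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Sketch: gzip on the outside of the write stack, AES underneath it.
public class CompressThenEncrypt {

  // data -> gzip -> AES -> sink, so the ciphertext holds compressed bytes.
  static byte[] seal(byte[] data, SecretKey key, byte[] iv) throws Exception {
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    // Closing the gzip stream writes its trailer, then finalizes the cipher.
    try (OutputStream out = new GZIPOutputStream(new CipherOutputStream(sink, cipher))) {
      out.write(data);
    }
    return sink.toByteArray();
  }

  // Reverses the stack: decrypt first, then decompress.
  static byte[] open(byte[] sealed, SecretKey key, byte[] iv) throws Exception {
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
    try (InputStream in = new GZIPInputStream(
        new CipherInputStream(new ByteArrayInputStream(sealed), cipher))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    }
  }

  public static void main(String[] args) throws Exception {
    KeyGenerator gen = KeyGenerator.getInstance("AES");
    gen.init(128);
    SecretKey key = gen.generateKey();
    byte[] iv = new byte[16];
    new SecureRandom().nextBytes(iv);
    byte[] sealed = seal("some sensitive record".getBytes("UTF-8"), key, iv);
    System.out.println(new String(open(sealed, key, iv), "UTF-8"));
  }
}
```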



> Encryption and Key Protection
> -
>
> Key: MAPREDUCE-4491
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: documentation, security, task-controller, tasktracker
>Reporter: Benoy Antony
>Assignee: Benoy Antony
> Attachments: Hadoop_Encryption.pdf
>
>
> When dealing with sensitive data, it is required to keep the data encrypted 
> wherever it is stored. A common use case is to pull encrypted data out of a 
> datasource and store it in HDFS for analysis. The keys are stored in an 
> external keystore. 
> The feature adds a customizable framework to integrate different types of 
> keystores, support for Java KeyStore, read keys from keystores, and transport 
> keys from JobClient to Tasks.
> The feature adds PGP encryption as a codec and additional utilities to 
> perform encryption related steps.
> The design document is attached. It explains the requirement, design and use 
> cases.
> Kindly review and comment. Collaboration is very much welcome.
> I have a tested patch for this for 1.1 and will upload it soon as an initial 
> work for further refinement. 

[jira] [Created] (MAPREDUCE-4495) Workflow Application Master in YARN

2012-07-30 Thread Bo Wang (JIRA)

Bo Wang created MAPREDUCE-4495:
--

 Summary: Workflow Application Master in YARN
 Key: MAPREDUCE-4495
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.0-alpha
Reporter: Bo Wang
Assignee: Bo Wang


It is useful to have a workflow application master, which will be capable of 
running a DAG of jobs. The workflow client submits a DAG request to the AM and 
then the AM will manage the life cycle of this application in terms of 
requesting the needed resources from the RM, and starting, monitoring and 
retrying the application's individual tasks.

Compared to running Oozie with the current MapReduce Application Master, these 
are some of the advantages:
 - Fewer consumed resources, since only one application master will be 
spawned for the whole workflow.
 - Reuse of resources, since the same resources can be used by multiple 
consecutive jobs in the workflow (no need to request/wait for resources for 
every individual job from the central RM).
 - More optimization opportunities in terms of collective resource requests.
 - Optimization opportunities in terms of rewriting and composing jobs in the 
workflow (e.g. pushing down Mappers).
 - This Application Master can be reused/extended by higher-level systems like 
Pig and Hive to provide an optimized way of running their workflows.
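The core scheduling step of such a workflow AM, deciding which jobs in the DAG 
can be launched once their dependencies finish, can be sketched with Kahn's 
topological sort. This is an illustration, not part of the proposal: jobs are 
plain string names, and the dependency-map format is an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: compute a valid launch order for a DAG of jobs (Kahn's algorithm).
public class WorkflowDag {

  // deps maps each job to the jobs it depends on; every job must be a key.
  static List<String> launchOrder(Map<String, List<String>> deps) {
    Map<String, Integer> pending = new HashMap<>();          // unmet dependency counts
    Map<String, List<String>> dependents = new HashMap<>();  // reverse edges
    for (Map.Entry<String, List<String>> e : deps.entrySet()) {
      pending.put(e.getKey(), e.getValue().size());
      for (String parent : e.getValue()) {
        if (!deps.containsKey(parent)) {
          throw new IllegalArgumentException("unknown job: " + parent);
        }
        dependents.computeIfAbsent(parent, k -> new ArrayList<>()).add(e.getKey());
      }
    }
    Deque<String> ready = new ArrayDeque<>();
    for (Map.Entry<String, Integer> e : pending.entrySet()) {
      if (e.getValue() == 0) {
        ready.add(e.getKey()); // jobs with no dependencies can start at once
      }
    }
    List<String> order = new ArrayList<>();
    while (!ready.isEmpty()) {
      String job = ready.poll();
      order.add(job);
      for (String child : dependents.getOrDefault(job, Collections.emptyList())) {
        int left = pending.merge(child, -1, Integer::sum);
        if (left == 0) {
          ready.add(child); // all of child's inputs are now done
        }
      }
    }
    if (order.size() != deps.size()) {
      throw new IllegalArgumentException("workflow contains a cycle");
    }
    return order;
  }
}
```

A real AM would launch everything in the ready queue concurrently and only 
release dependents as jobs actually complete; the linear order here just shows 
the dependency bookkeeping.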

[jira] [Commented] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker

2012-07-30 Thread Alejandro Abdelnur (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425225#comment-13425225
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4342:
---

Mayank, I don't see my previous second comment being addressed. I meant the 
following.

Instead of doing:

{code}
  public void handle(ResourceEvent event) {
    LocalResourceRequest req = event.getLocalResourceRequest();
    LocalizedResource rsrc = localrsrc.get(req);
    if (rsrc != null
        && (event.getType() == ResourceEventType.LOCALIZED
            || event.getType() == ResourceEventType.REQUEST)
        && (!isResourcePresent(rsrc))) {
      LOG.info("Resource " + rsrc.getLocalPath()
          + " is missing, localizing it again");
      localrsrc.remove(req);
      rsrc = null;
    }
    switch (event.getType()) {
    case REQUEST:
    case LOCALIZED:
      if (null == rsrc) {
        rsrc = new LocalizedResource(req, dispatcher);
        localrsrc.put(req, rsrc);
      }
      break;
    // ... remaining cases unchanged
{code}

Do:

{code}
  public void handle(ResourceEvent event) {
    LocalResourceRequest req = event.getLocalResourceRequest();
    LocalizedResource rsrc = localrsrc.get(req);
    switch (event.getType()) {
    case REQUEST:
    case LOCALIZED:
      if (rsrc != null && !isResourcePresent(rsrc)) {
        LOG.info("Resource " + rsrc.getLocalPath()
            + " is missing, localizing it again");
        localrsrc.remove(req);
        rsrc = null;
      }
      if (null == rsrc) {
        rsrc = new LocalizedResource(req, dispatcher);
        localrsrc.put(req, rsrc);
      }
      break;
    // ... remaining cases unchanged
{code}



> Distributed Cache gives inconsistent result if cache files get deleted from 
> task tracker 
> -
>
> Key: MAPREDUCE-4342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0, 1.0.3, trunk
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: MAPREDUCE-4342-22-1.patch, MAPREDUCE-4342-22-2.patch, 
> MAPREDUCE-4342-22-3.patch, MAPREDUCE-4342-22.patch, 
> MAPREDUCE-4342-trunk-v2.patch, MAPREDUCE-4342-trunk.patch
>
>


[jira] [Commented] (MAPREDUCE-4444) nodemanager fails to start when one of the local-dirs is bad

2012-07-30 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425155#comment-13425155
 ] 

Hadoop QA commented on MAPREDUCE-:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12538396/MAPREDUCE-.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2676//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2676//console

This message is automatically generated.

> nodemanager fails to start when one of the local-dirs is bad
> 
>
> Key: MAPREDUCE-
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0
>Reporter: Nathan Roberts
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-.patch
>
>


[jira] [Updated] (MAPREDUCE-4444) nodemanager fails to start when one of the local-dirs is bad

2012-07-30 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-:
--

Target Version/s: 0.23.3, 2.2.0-alpha
  Status: Patch Available  (was: Open)

> nodemanager fails to start when one of the local-dirs is bad
> 
>
> Key: MAPREDUCE-
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.0-alpha, 0.23.3, 3.0.0
>Reporter: Nathan Roberts
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-.patch
>
>


[jira] [Updated] (MAPREDUCE-4444) nodemanager fails to start when one of the local-dirs is bad

2012-07-30 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-:
--

Attachment: MAPREDUCE-.patch

Patch that changes LocalDirsHandlerService to check for bad directories during 
init so they're removed from the list of directories before subsequent init 
code tries to access them.
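A rough sketch of the idea: filter out unusable directories before anything 
else touches them. The helper is hypothetical and simplified (it only checks 
that each directory exists or can be created and is writable); it is not the 
LocalDirsHandlerService code from the patch.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: keep only local dirs that are usable at startup.
public class GoodLocalDirs {

  static List<String> usableDirs(List<String> configured) {
    List<String> good = new ArrayList<>();
    for (String dir : configured) {
      File f = new File(dir);
      // Create missing dirs if possible; keep only writable directories.
      if ((f.isDirectory() || f.mkdirs()) && f.canWrite()) {
        good.add(dir);
      }
    }
    return good;
  }

  public static void main(String[] args) {
    System.out.println(usableDirs(Arrays.asList(System.getProperty("java.io.tmpdir"))));
  }
}
```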

> nodemanager fails to start when one of the local-dirs is bad
> 
>
> Key: MAPREDUCE-
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0
>Reporter: Nathan Roberts
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-.patch
>
>


[jira] [Commented] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker

2012-07-30 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425129#comment-13425129
 ] 

Hadoop QA commented on MAPREDUCE-4342:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12538384/MAPREDUCE-4342-trunk-v2.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2675//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2675//console

This message is automatically generated.

> Distributed Cache gives inconsistent result if cache files get deleted from 
> task tracker 
> -
>
> Key: MAPREDUCE-4342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0, 1.0.3, trunk
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: MAPREDUCE-4342-22-1.patch, MAPREDUCE-4342-22-2.patch, 
> MAPREDUCE-4342-22-3.patch, MAPREDUCE-4342-22.patch, 
> MAPREDUCE-4342-trunk-v2.patch, MAPREDUCE-4342-trunk.patch
>
>


[jira] [Assigned] (MAPREDUCE-3726) jobstatus.getjobfile should return jobtracker copy of job.xml instead of .staging copy of job.xml

2012-07-30 Thread Mayank Bansal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal reassigned MAPREDUCE-3726:


Assignee: Mayank Bansal

> jobstatus.getjobfile should return jobtracker copy of job.xml instead of 
> .staging copy of job.xml
> -
>
> Key: MAPREDUCE-3726
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3726
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, jobtracker
>Affects Versions: 0.22.1
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
>Priority: Minor
>
> jobstatus.getjobfile should return jobtracker copy of job.xml instead of 
> .staging copy of job.xml

[jira] [Updated] (MAPREDUCE-4482) Backport MR sort plugin(MAPREDUCE-2454) to Hadoop 1.1.x

2012-07-30 Thread Mariappan Asokan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mariappan Asokan updated MAPREDUCE-4482:


Affects Version/s: (was: 1.1.1)
   (was: 1.1.0)
   1.2.0

> Backport MR sort plugin(MAPREDUCE-2454) to Hadoop 1.1.x
> ---
>
> Key: MAPREDUCE-4482
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4482
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: mrv1
>Affects Versions: 1.2.0
>Reporter: Mariappan Asokan
>


[jira] [Updated] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker

2012-07-30 Thread Mayank Bansal (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated MAPREDUCE-4342:
-

Attachment: MAPREDUCE-4342-trunk-v2.patch

Thanks, Alejandro, for your review and comments.
I have incorporated all of them.

Thanks,
Mayank



> Distributed Cache gives inconsistent result if cache files get deleted from 
> task tracker 
> -
>
> Key: MAPREDUCE-4342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0, 1.0.3, trunk
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: MAPREDUCE-4342-22-1.patch, MAPREDUCE-4342-22-2.patch, 
> MAPREDUCE-4342-22-3.patch, MAPREDUCE-4342-22.patch, 
> MAPREDUCE-4342-trunk-v2.patch, MAPREDUCE-4342-trunk.patch
>
>


[jira] [Commented] (MAPREDUCE-4342) Distributed Cache gives inconsistent result if cache files get deleted from task tracker

2012-07-30 Thread Alejandro Abdelnur (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425047#comment-13425047
 ] 

Alejandro Abdelnur commented on MAPREDUCE-4342:
---

The log message should be something like 'Resource ### is missing, localizing
it again'.

If the check is moved into the case block for REQUEST/LOCALIZED, there is no
need for the outer IF check.
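For illustration, here is a minimal Java sketch of the structure suggested above, assuming a simplified tracker that maps a resource key to the local copy it was previously localized to. The class, the `EventType` enum and the `handle`/`markLocalized` methods are hypothetical, not the code in the attached patches. The presence check sits inside the `REQUEST` case, so no outer `if` is needed, and the log line uses the suggested wording.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical, simplified tracker: shows where the presence check goes,
// not the actual classes changed by MAPREDUCE-4342.
class CacheTrackerSketch {
    enum EventType { REQUEST, RELEASE }

    private static final Logger LOG = Logger.getLogger(CacheTrackerSketch.class.getName());

    // resource key -> local path of the previously localized copy
    private final Map<String, File> localized = new HashMap<>();

    /** Returns true when the caller must (re)download the resource. */
    boolean handle(EventType type, String key) {
        switch (type) {
            case REQUEST: {
                File local = localized.get(key);
                // The check lives inside the REQUEST case, so no outer IF is needed.
                if (local != null && !local.exists()) {
                    LOG.info("Resource " + local + " is missing, localizing it again");
                    localized.remove(key); // forget the stale entry
                    local = null;
                }
                return local == null;
            }
            case RELEASE:
                return false;
            default:
                throw new IllegalArgumentException("Unknown event " + type);
        }
    }

    /** Records that a download for this key finished at the given path. */
    void markLocalized(String key, File path) {
        localized.put(key, path);
    }
}
```

Dropping the stale entry and treating the resource as new lets the normal download path handle re-localization, rather than adding a separate recovery path.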

> Distributed Cache gives inconsistent result if cache files get deleted from 
> task tracker 
> -
>
> Key: MAPREDUCE-4342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0, 1.0.3, trunk
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
> Attachments: MAPREDUCE-4342-22-1.patch, MAPREDUCE-4342-22-2.patch, 
> MAPREDUCE-4342-22-3.patch, MAPREDUCE-4342-22.patch, MAPREDUCE-4342-trunk.patch
>
>



[jira] [Commented] (MAPREDUCE-4456) LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating symlinks

2012-07-30 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424946#comment-13424946
 ] 

Hadoop QA commented on MAPREDUCE-4456:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12538358/MR-4456.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified test 
files.

-1 javac.  The applied patch generated 2049 javac compiler warnings (more 
than the trunk's current 2048 warnings).

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2674//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2674//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2674//console

This message is automatically generated.

> LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating 
> symlinks
> 
>
> Key: MAPREDUCE-4456
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4456
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Attachments: MR-4456.txt, MR-4456.txt
>
>
> {noformat}
> java.lang.ArrayIndexOutOfBoundsException: 1
> at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:194)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:154)
> at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:620)
> at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:385)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1212)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1212)
> at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:336)
> at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:233)
> at java.lang.Thread.run(Thread.java:619)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
> {noformat}


[jira] [Updated] (MAPREDUCE-4493) Distibuted Cache Compatability Issues

2012-07-30 Thread Robert Joseph Evans (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4493:
---

Priority: Critical  (was: Blocker)

Dropping severity because this is just going to be a configuration change.

> Distibuted Cache Compatability Issues
> -
>
> Key: MAPREDUCE-4493
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4493
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>Priority: Critical
>
> The distributed cache does not work like it does in 1.0.
> mapreduce.job.cache.symlink.create is completely ignored and symlinks are 
> always created no matter what.  Files and archives without a fragment will 
> also have symlinks created.
> If two cache archives or cache files happen to have the same name or the same
> symlink fragment, only the last one in the list is localized.
> The localCacheArchives and LocalCacheFiles are not set correctly when these
> duplicates happen, causing off-by-one (or worse) errors for anyone trying to
> use them.
> The reality is that use of symlinking is so common currently that these 
> incompatibilities are not that likely to show up, but we still need to fix 
> them.
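A small Java sketch of the duplicate-name problem described above. The helper names are hypothetical and this is not Hadoop's localization code. It assumes, as the description implies, that 1.0 created a symlink only when mapreduce.job.cache.symlink.create was set and the URI carried a fragment, and it assumes the link name is the fragment if present, otherwise the last path component. Keying localized entries by link name keeps only the last entry per name, so a list built from that map is shorter than the requested list, and code that indexes it by the original position sees shifted entries.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical helpers, not Hadoop's distributed cache code.
class CacheLinkNames {
    /** Assumed naming rule: the URI fragment if present, else the last path component. */
    static String linkName(URI uri) {
        if (uri.getFragment() != null) {
            return uri.getFragment();
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** 1.0-style rule as the description implies: link only if the flag is on and a fragment is given. */
    static boolean shouldLink(URI uri, boolean symlinkCreate) {
        return symlinkCreate && uri.getFragment() != null;
    }

    /** Keying by link name silently keeps only the last entry per name. */
    static Map<String, URI> collapseByName(List<URI> uris) {
        Map<String, URI> byName = new LinkedHashMap<>();
        for (URI u : uris) {
            byName.put(linkName(u), u); // a later duplicate overwrites an earlier one
        }
        return byName;
    }

    /** Names that occur more than once; a safer implementation could reject or report these. */
    static List<String> duplicateNames(List<URI> uris) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (URI u : uris) {
            counts.merge(linkName(u), 1, Integer::sum);
        }
        List<String> dups = new ArrayList<>();
        counts.forEach((name, n) -> {
            if (n > 1) {
                dups.add(name);
            }
        });
        return dups;
    }
}
```

With four requested entries where three resolve to the same link name, `collapseByName` returns only two, which is the kind of length mismatch that produces the off-by-one errors mentioned above.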


[jira] [Commented] (MAPREDUCE-4493) Distibuted Cache Compatability Issues

2012-07-30 Thread Robert Joseph Evans (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424929#comment-13424929
 ] 

Robert Joseph Evans commented on MAPREDUCE-4493:


Sorry, I meant documentation, not configuration.

> Distibuted Cache Compatability Issues
> -
>
> Key: MAPREDUCE-4493
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4493
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.3, 2.0.0-alpha, 3.0.0
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
>Priority: Critical
>


[jira] [Updated] (MAPREDUCE-4456) LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating symlinks

2012-07-30 Thread Robert Joseph Evans (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4456:
---

Status: Patch Available  (was: Open)

> LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating 
> symlinks
> 
>
> Key: MAPREDUCE-4456
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4456
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Attachments: MR-4456.txt, MR-4456.txt
>
>


[jira] [Updated] (MAPREDUCE-4456) LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating symlinks

2012-07-30 Thread Robert Joseph Evans (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4456:
---

Attachment: MR-4456.txt

This patch fixes the test failure.

> LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating 
> symlinks
> 
>
> Key: MAPREDUCE-4456
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4456
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Attachments: MR-4456.txt, MR-4456.txt
>
>


[jira] [Commented] (MAPREDUCE-4456) LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating symlinks

2012-07-30 Thread Robert Joseph Evans (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424917#comment-13424917
 ] 

Robert Joseph Evans commented on MAPREDUCE-4456:


I talked with Arun on MAPREDUCE-4493. He feels that the current MR2 behavior
is correct and that we should just document the differences. I am fine with
going that route, so I will just update the test to expect the new behavior
and then document that behavior on the other JIRA.

> LocalDistributedCacheManager can get an ArrayIndexOutOfBounds when creating 
> symlinks
> 
>
> Key: MAPREDUCE-4456
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4456
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Attachments: MR-4456.txt
>
>


[jira] [Updated] (MAPREDUCE-4010) TestWritableJobConf fails on trunk

2012-07-30 Thread Robert Joseph Evans (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-4010:
---

Fix Version/s: 0.23.3

I pulled this into branch-0.23.

> TestWritableJobConf fails on trunk
> --
>
> Key: MAPREDUCE-4010
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4010
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.24.0
>Reporter: Jason Lowe
>Assignee: Alejandro Abdelnur
>Priority: Critical
> Fix For: 0.23.3, 2.0.0-alpha
>
> Attachments: MAPREDUCE-4010.patch, MAPREDUCE-4010.patch, 
> MAPREDUCE-4010.patch
>
>
> TestWritableJobConf is currently failing two tests on trunk:
> * testEmptyConfiguration
> * testNonEmptyConfiguration
> Appears to have been caused by HADOOP-8167.


[jira] [Commented] (MAPREDUCE-4039) Sort Avoidance

2012-07-30 Thread Ahmed Radwan (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424741#comment-13424741
 ] 

Ahmed Radwan commented on MAPREDUCE-4039:
-

bq. when sort plugin is finished, this patch will need be modified.

Sure, Anty. Are you already working on updating the patch with a pluggable
MapOutputBuffer and Shuffle, then?

> Sort Avoidance
> --
>
> Key: MAPREDUCE-4039
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4039
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: mrv2
>Affects Versions: 0.23.2
>Reporter: anty.rao
>Assignee: anty
>Priority: Minor
> Fix For: 0.23.2
>
> Attachments: IndexedCountingSortable.java, 
> MAPREDUCE-4039-branch-0.23.2.patch, MAPREDUCE-4039-branch-0.23.2.patch, 
> MAPREDUCE-4039-branch-0.23.2.patch
>
>
> Inspired by 
> [Tenzing|http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/37200.pdf],
>  in Section 5.1, MapReduce Enhancements:
> {quote}*Sort Avoidance*. Certain operators such as hash join
> and hash aggregation require shuffling, but not sorting. The
> MapReduce API was enhanced to automatically turn off
> sorting for these operations. When sorting is turned off, the
> mapper feeds data to the reducer which directly passes the
> data to the Reduce() function bypassing the intermediate
> sorting step. This makes many SQL operators significantly
> more efficient.{quote}
> There are a lot of applications that need only aggregation, not sorting.
> Using sorting to achieve aggregation is costly and inefficient. Without
> sorting, an application can use a hash table or hash map to do aggregation
> efficiently. However, the application should bear in mind that reduce memory
> is limited: it is responsible for managing reduce-side memory itself and must
> guard against running out of memory. The map-side combiner is not supported;
> as a workaround, you can do hash aggregation on the map side instead.
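A minimal Java sketch of the hash aggregation the description refers to, assuming summed long values keyed by string. `HashAggregator` and its `maxEntries` guard are hypothetical and do not correspond to a Hadoop API; the guard stands in for the application-managed memory limit mentioned above. Because the input is not sorted, records for the same key can arrive anywhere in the stream, so the whole table has to stay in memory until the input is exhausted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: hash aggregation over unsorted (key, value) pairs.
// Not Hadoop's Reducer API; maxEntries is an illustrative memory guard.
class HashAggregator {
    private final Map<String, Long> sums = new HashMap<>();
    private final int maxEntries;

    HashAggregator(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Adds one record; keys may arrive in any order because nothing was sorted. */
    void add(String key, long value) {
        if (!sums.containsKey(key) && sums.size() >= maxEntries) {
            // The application, not the framework, decides what happens here
            // (spill, flush partial results, or fail). This sketch fails fast.
            throw new IllegalStateException("hash table full: " + maxEntries + " keys");
        }
        sums.merge(key, value, Long::sum);
    }

    /** Aggregated totals; only complete once every input record has been added. */
    Map<String, Long> result() {
        return sums;
    }
}
```

The same class could be used inside a mapper before emitting, which corresponds to the map-side workaround for the unsupported combiner.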
> The main points of the sort-avoidance implementation are the following:
> # Add a boolean configuration parameter ??mapreduce.sort.avoidance?? to turn
> the sort-avoidance workflow on or off. The two workflows coexist.
> # Key/value pairs emitted by the map function are sorted by partition only,
> using a more efficient sorting algorithm: counting sort.
> # The map-side merge uses a byte-level merge that simply concatenates bytes
> from the generated spills (read bytes in, write bytes out), without the
> key/value serialization/deserialization and comparison overhead that the
> current version incurs.
> # Reduce can start as soon as any map output is available, in contrast to
> the sort workflow, which must wait until all map outputs are fetched and
> merged.
> # Map output in memory can be consumed directly by reduce. When reduce cannot
> keep up with the incoming map outputs, the in-memory merge thread kicks in
> and merges in-memory map outputs onto disk.
> # On-disk files are read sequentially to feed reduce, in contrast to the
> current implementation, which reads multiple files concurrently and causes
> many disk seeks. Map output in memory takes precedence over on-disk files in
> feeding the reduce function.
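The counting sort in the second point can be sketched in Java as follows, assuming each emitted record is represented only by its partition id (a hypothetical stand-in for the real map-side buffer layout). It runs in O(n + P) time for n records and P partitions with no key comparisons, and it is stable, so records keep their emit order within a partition.

```java
import java.util.Arrays;

// Sketch only: a stable counting sort of map-output records by partition id.
// The int[] layout (one partition id per record, in emit order) is hypothetical.
class PartitionCountingSort {
    /** order[k] is the record index at sorted position k; partition p spans [offsets[p], offsets[p+1]). */
    static final class Result {
        final int[] order;
        final int[] offsets;

        Result(int[] order, int[] offsets) {
            this.order = order;
            this.offsets = offsets;
        }
    }

    static Result sortByPartition(int[] partitionOf, int numPartitions) {
        // Pass 1: count records per partition. No key comparisons are needed.
        int[] counts = new int[numPartitions];
        for (int p : partitionOf) {
            counts[p]++;
        }
        // Prefix sums give each partition's start; offsets[numPartitions] == n.
        int[] offsets = new int[numPartitions + 1];
        for (int p = 0; p < numPartitions; p++) {
            offsets[p + 1] = offsets[p] + counts[p];
        }
        // Pass 2: place each record at its partition's next free slot.
        // Walking records in emit order keeps the sort stable within a partition.
        int[] next = Arrays.copyOf(offsets, numPartitions);
        int[] order = new int[partitionOf.length];
        for (int i = 0; i < partitionOf.length; i++) {
            order[next[partitionOf[i]]++] = i;
        }
        return new Result(order, offsets);
    }
}
```

For example, partition ids `{2, 0, 1, 0, 2}` with three partitions yield the record order `{1, 3, 2, 0, 4}` and offsets `{0, 2, 3, 5}`.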
> I have already implemented this feature based on Hadoop CDH3U3 and done some
> performance evaluation; see [https://github.com/hanborq/hadoop] for details.
> Now I'm willing to port it to YARN. Comments are welcome.

