[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427585#comment-13427585
 ] 

Chris A. Mattmann commented on MAPREDUCE-4495:
----------------------------------------------

Hi Josh:

bq. @Chris, thanks for clarifying your meaning. I think the line that threw me 
for a curve in your original comment was "Putting code into Apache Hadoop is 
the same as putting code in yet-to-be-named-Apache-Incubator-project," since 
for that to be true, we would need for the incubating project to have, at the 
very least, a source code repository created by infrastructure.

And again, getting the repo created is no big deal. Most mentors can do this 
themselves on day 1, as soon as the project is accepted. It's a simple svn mkdir 
https://svn.apache.org/repos/asf/...

Besides that, regarding your statistics, I wouldn't put much stock in the 
quality of that data, since:

* JIRA issues aren't always closed the minute/second/millisecond the work is 
done. I've been involved in lots of projects where an issue is finished, and 
then the issue is closed days, weeks, even months later (e.g., "oh, I forgot to 
close that issue...")

* Not all work in INFRA is a JIRA issue.

* The JIRA SVN plugin requires that the committer tag the SVN commit with the 
JIRA issue ID, and not everything is tracked in Subversion.

Thus, unfortunately, garbage in, garbage out.

                
> Workflow Application Master in YARN
> -----------------------------------
>
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
>
> It is useful to have a workflow application master, which will be capable of 
> running a DAG of jobs. The workflow client submits a DAG request to the AM 
> and then the AM will manage the life cycle of this application in terms of 
> requesting the needed resources from the RM, and starting, monitoring and 
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, 
> these are some of the advantages:
>  - Fewer consumed resources, since only one application master will 
> be spawned for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple 
> consecutive jobs in the workflow (no need to request/wait for resources for 
> every individual job from the central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the 
> workflow (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems 
> like Pig and Hive to provide an optimized way of running their workflows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
