[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427496#comment-13427496
 ] 

Arun C Murthy commented on MAPREDUCE-4495:
------------------------------------------

For disclosure, I had another chat with Alejandro. I'll let Alejandro post his 
responses of course.

I did this because it's easier to talk things through offline than to debate 
them on JIRA. 

My proposal to him was that I, as someone who has a vested interest in seeing 
Apache Hadoop YARN succeed, would be willing to take care of the 
'bureaucracy' of establishing a new Incubator project (the Incubator proposal, 
mentoring, project setup, etc.).

The reason I am willing to do that is that I believe making this 'DAG-AM' 
generic and much more than MapReduce will, in turn, aid adoption of YARN itself 
- my primary goal.

Thus, I can get started on the 'infra' work so that Alejandro can concentrate 
on shipping the code and then we can collaborate via the Incubator project.

This should address Alejandro's concerns about the overhead and also the 
concerns Chris raised about umbrella projects.


Thoughts?

                
> Workflow Application Master in YARN
> -----------------------------------
>
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
>
> It is useful to have a workflow application master, which will be capable of 
> running a DAG of jobs. The workflow client submits a DAG request to the AM 
> and then the AM will manage the life cycle of this application in terms of 
> requesting the needed resources from the RM, and starting, monitoring and 
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, 
> these are some of the advantages:
>  - Fewer resources consumed, since only one application master will 
> be spawned for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple 
> consecutive jobs in the workflow (no need to request/wait for resources for 
> every individual job from the central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the 
> workflow (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems 
> like Pig and Hive to provide an optimized way of running their workflows.
