[jira] [Commented] (MAPREDUCE-4495) Workflow Application Master in YARN

Chris A. Mattmann (JIRA) Thu, 02 Aug 2012 08:48:05 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427398#comment-13427398
 ]


Chris A. Mattmann commented on MAPREDUCE-4495:
----------------------------------------------

Hi Alejandro:

bq. as we all know, doing a few JIRAs to add new functionality to Hadoop is 
completely different from bootstrapping an incubator project. 

Actually, no we don't all know this.

What's the overhead for an Incubator project, other than:

# writing up a proposal here: http://wiki.apache.org/incubator/YourProposalName 
using the existing template
# finding 3 ASF members (of which there are already 3+ discussing this) willing 
to mentor
# picking one of them as a Champion

From there, putting code into Apache Hadoop is the same as putting code 
into a yet-to-be-named Apache Incubator project.

So, sorry, I don't see the overhead.

Regarding your other points:

bq. We can easily avoid circular dependencies, thus we should. 

Sorry, that's not always the best bet -- if you're speaking technically, fine, 
but Apache is a home for communities, and you have to deal with community 
issues, as well as technical ones. 

{quote}
Regarding your 3 questions, only time will tell. I'm not opposed to a separate 
project, I'm just worried about starting a new project without a need for it, 
and only time and adoption will tell (I'm trying not to go 'project happy' 
here). At that point we can split it from Hadoop as it has been done with other 
projects.
{quote}

I think you're missing the point. The ASF will not allow umbrella projects -- 
it has pushed such cases out into separate TLPs. If you are 
trying to start a new project at the ASF and it has a distinct community, 
separate release cycle, etc. (the questions I asked you), then the 
Apache Hadoop PMC should be very leery about accepting the contribution into 
Hadoop and should instead gently nudge :) you guys down the distinct-project route.



                
> Workflow Application Master in YARN
> -----------------------------------
>
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
>
> It is useful to have a workflow application master, which will be capable of 
> running a DAG of jobs. The workflow client submits a DAG request to the AM 
> and then the AM will manage the life cycle of this application in terms of 
> requesting the needed resources from the RM, and starting, monitoring and 
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, 
> these are some of the advantages:
>  - Fewer consumed resources, since only one application master will 
> be spawned for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple 
> consecutive jobs in the workflow (no need to request/wait for resources for 
> every individual job from the central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the 
> workflow (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher systems like Pig 
> and Hive to provide an optimized way of running their workflows.
