[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427484#comment-13427484
 ] 

Josh Wills commented on MAPREDUCE-4495:
---------------------------------------

@Chris, thanks for clarifying your meaning. I think the line that threw me for 
a curve in your original comment was "Putting code into Apache Hadoop is the 
same as putting code in yet-to-be-named-Apache-Incubator-project," since for 
that to be true, we would need for the incubating project to have, at the very 
least, a source code repository created by infrastructure.

And of course, we shouldn't rely on anyone's anecdotal experience, especially 
when we have tools that can show us resolution times for infrastructure issues 
tagged with Subversion going all the way back to 2005:

Quarter Closed  Days    Days/Closed
Q1/2005 1       17      17
Q2/2005 18      459     25
Q3/2005 24      403     16
Q4/2005 12      98      8
Q1/2006 8       104     13
Q2/2006 8       149     18
Q3/2006 8       483     60
Q4/2006 10      745     74
Q1/2007 5       733     146
Q2/2007 4       27      6
Q3/2007 1       9       9
Q4/2007 1       0       0
Q1/2008 13      328     25
Q2/2008 7       90      12
Q3/2008 3       65      21
Q4/2008 7       519     74
Q1/2009 9       1393    154
Q2/2009 4       92      23
Q3/2009 8       409     51
Q4/2009 9       934     103
Q1/2010 9       42      4
Q2/2010 13      749     57
Q3/2010 7       92      13
Q4/2010 17      1086    63
Q1/2011 11      102     9
Q2/2011 11      82      7
Q3/2011 17      96      5
Q4/2011 10      72      7
Q1/2012 20      72      3
Q2/2012 13      129     9
Q3/2012 2       10      5

Or for Git, since it was added in 2009:

Quarter Closed  Days    Days/Closed
Q1/2009 2       6       3
Q2/2009 22      158     7
Q3/2009 17      249     14
Q4/2009 4       114     28
Q1/2010 12      266     22
Q2/2010 11      80      7
Q3/2010 15      271     18
Q4/2010 17      112     6
Q1/2011 15      379     25
Q2/2011 14      7       0
Q3/2011 33      2281    69
Q4/2011 33      632     19
Q1/2012 23      403     17
Q2/2012 28      491     17
Q3/2012 18      1074    59

Sigh. The median would be so much more useful, right?
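
To make the arithmetic behind the tables concrete, here is a minimal Python sketch. The first part recomputes the last column for three rows copied from the Subversion table; the column appears to be total days divided by issues closed, truncated to an integer. The second part uses an invented per-issue breakdown (the source only gives totals per quarter, so these individual values are purely hypothetical) to show how one long-running ticket can dominate the mean while leaving the median small.

```python
from statistics import mean, median

# (issues closed, total days to resolution) copied from the Subversion table.
rows = {
    "Q2/2005": (18, 459),
    "Q1/2007": (5, 733),
    "Q1/2009": (9, 1393),
}


def days_per_closed(closed, total_days):
    """Average days per closed issue, truncated to an integer (assumed to
    match how the last column of the tables was produced)."""
    return total_days // closed if closed else 0


for quarter, (closed, total_days) in rows.items():
    print(quarter, days_per_closed(closed, total_days))

# HYPOTHETICAL per-issue resolution times: invented for illustration only.
# They are chosen so the count (9) and the sum (1393) match the Q1/2009 row,
# but the real per-issue values are not in the source.
hypothetical_days = [1, 1, 2, 2, 3, 3, 4, 5, 1372]

print("mean:", int(mean(hypothetical_days)))
print("median:", median(hypothetical_days))
```

Under that made-up breakdown the mean lands at the table's figure for the quarter while the median is only a few days, which is the reason a median column would say more about the typical ticket.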
                
> Workflow Application Master in YARN
> -----------------------------------
>
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
>
> It is useful to have a workflow application master, which will be capable of 
> running a DAG of jobs. The workflow client submits a DAG request to the AM 
> and then the AM will manage the life cycle of this application in terms of 
> requesting the needed resources from the RM, and starting, monitoring and 
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, 
> these are some of the advantages:
>  - Fewer resources consumed, since only one application master will be 
> spawned for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple 
> consecutive jobs in the workflow (no need to request/wait for resources for 
> every individual job from the central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the 
> workflow (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems 
> like Pig and Hive to provide an optimized way of running their workflows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
