[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490736#comment-16490736 ]

Peter Cseh commented on OOZIE-1178:
-----------------------------------

Yeah, this is about running and managing the workflow execution from a YARN AM instead of the Oozie server, to make Oozie more scalable. There are some issues with this, though:
# How do we avoid DDoSing the database? If every WFAM communicates with the Oozie server to talk to the database, would it help scalability at all?
# As [~andras.piros] mentioned, there are some issues with synchronous actions:
## How would we run the ssh action? Would it even be supported?
## Email and FS actions look more manageable.
# How do we handle getting and injecting delegation tokens into the WFAM for every action? We certainly don't want to distribute the Oozie keytab within the YARN cluster.

There are some crazy upsides to this, though: it would open up the possibility of executing far more dynamic workflows (e.g. workflows defined by code), as user code would run more contained.

> Workflow Application Master in YARN
> -----------------------------------
>
>                 Key: OOZIE-1178
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1178
>             Project: Oozie
>          Issue Type: New Feature
>            Reporter: Bo Wang
>            Priority: Major
>         Attachments: MAPREDUCE-4495-v1.1.patch, MAPREDUCE-4495-v1.patch, MapReduceWorkflowAM.pdf, yapp_proposal.txt
>
>
> It is useful to have a workflow application master, which will be capable of
> running a DAG of jobs. The workflow client submits a DAG request to the AM
> and then the AM will manage the life cycle of this application in terms of
> requesting the needed resources from the RM, and starting, monitoring and
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master,
> these are some of the advantages:
>  - Fewer consumed resources, since only one application master will
> be spawned for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple
> consecutive jobs in the workflow (no need to request/wait for resources for
> every individual job from the central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the
> workflow (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems like Pig
> and Hive to provide an optimized way of running their workflows.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
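Editorial illustration: the issue description quoted above says the AM receives a DAG of jobs and then starts, monitors and retries the individual tasks. The following is only a minimal sketch of that control loop under simplifying assumptions. The names `run_workflow`, `run_job` and `max_retries` are hypothetical and do not come from the attached patches or from any YARN or Oozie API; jobs run one at a time, and every dependency is assumed to appear as a key of `dag`.

```python
from collections import deque


def run_workflow(dag, run_job, max_retries=2):
    """Run every job in `dag` in dependency order, retrying failures.

    dag: dict mapping a job name to the list of job names it depends on.
    run_job: callable taking a job name, returning True on success.
    Returns the list of job names in the order they completed.
    """
    # Count unfinished dependencies per job and index each job's dependents.
    pending = {job: len(deps) for job, deps in dag.items()}
    dependents = {job: [] for job in dag}
    for job, deps in dag.items():
        for dep in deps:
            dependents[dep].append(job)

    # Jobs without dependencies can start immediately.
    ready = deque(sorted(job for job, count in pending.items() if count == 0))
    completed = []
    while ready:
        job = ready.popleft()
        # One initial attempt plus up to `max_retries` retries.
        for _ in range(max_retries + 1):
            if run_job(job):
                break
        else:
            raise RuntimeError(f"job {job!r} failed after {max_retries} retries")
        completed.append(job)
        # Completing a job may unblock the jobs that depend on it.
        for child in dependents[job]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(completed) != len(dag):
        raise ValueError("workflow graph contains a cycle")
    return completed
```

A real workflow AM would presumably launch independent jobs concurrently in containers obtained from the RM; the sequential loop here only shows the ordering and retry logic.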
[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490722#comment-16490722 ]

Andras Piros commented on OOZIE-1178:
-------------------------------------

[~dbist13] not exactly. This JIRA is about having the whole workflow (all applications) run on YARN in a single {{WorkflowAM}} ApplicationMaster, whereas [*Oozie On YARN*|https://issues.apache.org/jira/browse/OOZIE-1770] was about having one workflow application's launcher run on YARN as a {{LauncherAM}} ApplicationMaster. I wouldn't close it for that reason.

Another question is whether we really want to support something like that; meanwhile, we have workflow actions that are meant to run on one of the Oozie servers (synchronous actions) and cannot be run directly in a YARN NodeManager container in any case.

[~gezapeti] what are your two cents?
[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490688#comment-16490688 ]

Artem Ervits commented on OOZIE-1178:
-------------------------------------

[~andras.piros] with OYA this can be closed, no?
[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581199#comment-13581199 ]

Tianyou Li commented on OOZIE-1178:
-----------------------------------

Hi [~acmurthy],

Is there any progress on submitting the YAPP proposal to the Apache Incubator? The team here is hoping to participate in YAPP project development and make contributions. By chance I saw the Tez project (http://wiki.apache.org/incubator/TezProposal); do you see any relationship between YAPP and Tez?

Thanks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562018#comment-13562018 ]

Andrew Purtell commented on OOZIE-1178:
---------------------------------------

Hi [~acmurthy]. I'd love to, but I must decline; unfortunately I will have full-time commitments to HBase and HDFS issues at least through 2013.
[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561469#comment-13561469 ]

Tianyou Li commented on OOZIE-1178:
-----------------------------------

[~bowang] Thanks Bo, looking forward to working together with you too!
[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561468#comment-13561468 ]

Tianyou Li commented on OOZIE-1178:
-----------------------------------

[~acmurthy] Thanks for your response, and thank you for accepting us as part of the project; we look forward to working with you and others on this. If there is anything we can do to help you with the legwork, please let us know. We will wait to hear from you on how and when we can collaborate on this in the community. Meanwhile we will continue to build up the solution, and once it is ready we will be glad to share the code with the community.
[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561429#comment-13561429 ]

Bo Wang commented on OOZIE-1178:
--------------------------------

Hi Tianyou, glad to hear from you! I am a 3rd-year PhD student at Stanford and have been working on this JIRA since my internship at Cloudera last summer. I look forward to collaborating on this project.

Hi Arun, I'd definitely love to contribute to this and bring it into production.
[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561374#comment-13561374 ]

Arun C Murthy commented on OOZIE-1178:
--------------------------------------

[~tianyou...@gmail.com] That is great to hear! I'd love to get started too.

I volunteer to take this forward and do the legwork etc.

Andrew - I presume you would be interested too? Tucu? Bo?
[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561353#comment-13561353 ]

Tianyou Li commented on OOZIE-1178:
-----------------------------------

Hello,

This is Tianyou Li from Intel. This JIRA, and the YAPP proposal specifically, have recently been brought to our attention. We think YAPP is very interesting and would be eager to participate in the development of the software and the creation of a viable developer and user community.

We have been working on hosting Hive execution plans in a specialized AM that can run MR jobs internally according to a DAG supplied by the Hive front end. Initial performance tests show attractive numbers that seem to bear out the approach. Our current plan is to finish a production-ready specialized Hive AM for executing plans (job DAGs), then work on managing reuse of a scalable pool of persistent containers for the executors, and also on reuse of the specialized Hive AM so that the AM does not need to be instantiated for every query.

However, it would be great if, rather than focusing on a specialized Hive AM exclusively, we could contribute our efforts to something useful to Hive, Pig, Oozie, and the many other new efforts that could benefit. We hope it is a suitable time to consider collaboration, before we make any further progress. We would like to contribute our work in some form, but more importantly our ongoing efforts.

Both myself (Tianyou Li, tianyou...@gmail.com) and my colleague Yi Liu (hitli...@gmail.com) have been doing the work described above internally and would like to volunteer as additional initial committers on the YAPP proposal, with the backing of our employer Intel. Others in our team are Apache committers and PMC members, so we are aware of the responsibilities and are committed to fulfilling them.

Thank you for your kind consideration.
[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559188#comment-13559188 ]

Bo Wang commented on OOZIE-1178:
--------------------------------

Hi Arun, I'd love to keep on working on WfAM. But I think the discussion on where to put it is not resolved yet.
[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559186#comment-13559186 ]

Bo Wang commented on OOZIE-1178:
--------------------------------

Hi Andrew,

bq. Would it be possible to refresh the patch on this JIRA?

The code is in an internal repo at Cloudera, but I am now back at school and have no access to it.

{quote}
But for V2 and V3 when an AM is launched by the WF AM and not directly by the RM the WF AM must take over some responsibilities of the RM. I am curious how many of those responsibilities it will take over. I am also curious about what modifications will be required to other AMs so that they can interact with both the WF AM and also the RM directly.

bq. Would it be possible this could be handled by a RM<->AM delegation API, with consideration for when the RM can kill a delegate not responding sufficiently to its responsibilities?
{quote}

This is a good question. WfAM takes over responsibilities including monitoring child AMs, killing/restarting child AMs in case of failure, etc. One of the design principles is to allow AMs to run under WfAM without modification. In other words, AMs should just treat WfAM as the "RM": resource requests/releases should all be sent to WfAM instead. WfAM then determines how to serve these requests (either locally or by forwarding them to the RM).

When a WfAM is not responding (for a period long enough to warrant a restart), the RM should kill the WfAM together with all the containers allocated to it. These containers include child AMs and workers. When a child AM is not responding, WfAM can trigger the kill and restart for it.

bq. Finally, it would be interesting and useful if something like the WFAM proposed on this issue could maintain a persistent pool of workers...

Yes, maintaining a (relatively) persistent pool of workers can reduce the scheduling cost. This is a great benefit of WfAM.

Your comment reminds me of a discussion of an RM<->WfAM protocol in one of the early design meetings. This protocol allows the RM to distinguish WfAMs from other AMs, so each WfAM can report to the RM the idle resources it retains (for possible reallocation). Then, when there is a global shortage of resources, the RM can request WfAMs to release withheld resources. This protocol is not included in the proposal because of the potentially big changes to the RM.
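Editorial illustration: the comment above describes child AMs treating the WfAM as their "RM", with the WfAM serving each request locally or forwarding it to the real RM, and a (not proposed) protocol in which the WfAM reports idle resources and releases them on request. The toy model below sketches that bookkeeping only. The class and method names (`WorkflowAM`, `allocate`, `release`, `idle_count`, `reclaim`) and the `request_from_rm` callback are hypothetical; they are not YARN APIs and are not taken from the attached patches.

```python
class WorkflowAM:
    """Toy model of a WfAM that child AMs treat as their ResourceManager."""

    def __init__(self, request_from_rm):
        # request_from_rm(count) stands in for asking the real RM for
        # `count` containers; it returns a list of container ids.
        self._request_from_rm = request_from_rm
        self._idle = []  # containers retained by the WfAM for reuse

    def allocate(self, count):
        """Serve a child AM's request locally first, then forward the rest."""
        local = [self._idle.pop() for _ in range(min(count, len(self._idle)))]
        missing = count - len(local)
        forwarded = self._request_from_rm(missing) if missing else []
        return local + forwarded

    def release(self, containers):
        """A child AM hands containers back; keep them for later jobs."""
        self._idle.extend(containers)

    def idle_count(self):
        """Number of idle containers the WfAM would report to the RM."""
        return len(self._idle)

    def reclaim(self, count):
        """Give up to `count` idle containers back during a global shortage."""
        count = min(count, len(self._idle))
        reclaimed, self._idle = self._idle[:count], self._idle[count:]
        return reclaimed
```

In this model, a second job that needs three containers after a first job released two would reuse those two and trigger a single forwarded request for one more, which is the scheduling saving the comment attributes to a persistent worker pool. Liveness monitoring and kill/restart of child AMs are left out.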
[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557491#comment-13557491 ]

Arun C Murthy commented on OOZIE-1178:
--------------------------------------

Moved to Oozie.

If there is interest in doing 'yapp', I'm happy to drive it - but I don't want to do it without a go-ahead from the people actually working on it. Bo?
[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN
[ https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557489#comment-13557489 ]

Andrew Purtell commented on OOZIE-1178:
---------------------------------------

And interestingly suddenly this is moved from MAPREDUCE to OOZIE.