[jira] [Commented] (OOZIE-674) resolveInstanceRange doesn't work for EL extensions

2013-01-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560576#comment-13560576
 ] 

Hadoop QA commented on OOZIE-674:
-

Testing JIRA OOZIE-674

Cleaning local svn workspace

 resolveInstanceRange doesn't work for EL extensions
 ---

 Key: OOZIE-674
 URL: https://issues.apache.org/jira/browse/OOZIE-674
 Project: Oozie
  Issue Type: Bug
Reporter: Shwetha G S
Assignee: Shwetha G S
  Labels: EL, extension
 Attachments: OOZIE-674.patch, OOZIE-674-ver2.patch


 I have an EL extension, today(0,0), which maps to the start of the day of the 
 nominal time. It is used to specify startInstance, endInstance and instance 
 in the dataIn and dataOut of a coordinator.
 In CoordCommandUtils.resolveInstanceRange(), getInstanceNumber has to return 
 the instance number with respect to current. So, for the 
 coord-action-create-inst context, I have mapped today to current, and hence 
 getInstanceNumber returns the correct number. But later in 
 resolveInstanceRange(), getFuncType is called with the startInstance value, 
 which is today in this case; it maps to UNEXPECTED and throws an exception. 
 getFuncType should instead be passed the result of evaluating the expression 
 in the coord-action-create-inst context.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (OOZIE-674) resolveInstanceRange doesn't work for EL extensions

2013-01-23 Thread Shwetha G S (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560646#comment-13560646
 ] 

Shwetha G S commented on OOZIE-674:
---

https://reviews.apache.org/r/9065/



[jira] [Commented] (OOZIE-1186) Image load for Job DAG visualization should handle resources better

2013-01-23 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560876#comment-13560876
 ] 

Rohini Palaniswamy commented on OOZIE-1186:
---

Do not close response.getOutputStream(); Tomcat takes care of that. 
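
A minimal sketch of that advice, assuming the image is written with javax.imageio.ImageIO (the thread does not say which API the servlet actually uses). ImageStreamer and writePng are hypothetical names. The method writes to a stream owned by the caller, flushes it, and leaves closing to the owner, which in the servlet case is the container.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.imageio.ImageIO;

// Hypothetical helper, not taken from the Oozie code base.
class ImageStreamer {

    // Writes the image as PNG to a caller-owned stream.
    // The stream is flushed but deliberately not closed here.
    static void writePng(BufferedImage img, OutputStream out) throws IOException {
        ImageIO.write(img, "png", out);
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Keep ImageIO from using a temporary cache file for this demo.
        ImageIO.setUseCache(false);
        // Tiny stand-in image; a real DAG rendering would be far larger.
        BufferedImage img = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        // Stand-in for response.getOutputStream().
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writePng(img, out);
        System.out.println("wrote " + out.size() + " bytes, stream left open");
    }
}
```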

 Image load for Job DAG visualization should handle resources better
 ---

 Key: OOZIE-1186
 URL: https://issues.apache.org/jira/browse/OOZIE-1186
 Project: Oozie
  Issue Type: Bug
Affects Versions: trunk, 3.3.1
Reporter: Mona Chitnis
Assignee: Mona Chitnis
 Fix For: trunk, 3.3.1

 Attachments: OOZIE-1186.patch


 The Job DAG visualization loads an image into memory to be streamed on 
 outputstream. However, it does not free up memory and I/O resources leading 
 to 'Out of Java heap space' errors.



[jira] [Commented] (OOZIE-1160) Oozie web-console to display all job URLs spawned by Pig

2013-01-23 Thread Mona Chitnis (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560923#comment-13560923
 ] 

Mona Chitnis commented on OOZIE-1160:
-

+1'ed on Reviewboard. Committed to trunk

 Oozie web-console to display all job URLs spawned by Pig
 

 Key: OOZIE-1160
 URL: https://issues.apache.org/jira/browse/OOZIE-1160
 Project: Oozie
  Issue Type: Improvement
Reporter: Mona Chitnis
Assignee: Mona Chitnis
 Attachments: OOZIE-1160v7.patch, Screen Shot 2013-01-14 at 8.20.52 
 PM.png

   Original Estimate: 72h
  Remaining Estimate: 72h

 The Oozie web UI only displays the console URL of the Pig launcher job. Users 
 need access to clickable console URLs of all the spawned 'child' jobs for 
 quicker debugging.
 To elaborate a bit more on the changes:
 The Workflow action JSON response object returned would now have an 
 additional field called e.g. 'pigUrls', which I'm planning as a 
 comma-separated string of all the child URLs. The web console JavaScript 
 code then obtains this field and splits it into clickable fields, dynamically 
 displayed based on the quantity. So this is mainly a client-side API change, 
 making use of that field. 
 Thoughts?
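
 The splitting step described above could look roughly like the following. This is only an illustration of the proposed data format, written in Java rather than the web console's JavaScript. ChildUrlField, splitChildUrls and the example URLs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the comma-separated 'pigUrls' format.
class ChildUrlField {

    // Splits a comma-separated field into individual URLs,
    // trimming whitespace and skipping empty entries.
    static List<String> splitChildUrls(String field) {
        List<String> urls = new ArrayList<>();
        if (field == null) {
            return urls;
        }
        for (String part : field.split(",")) {
            String url = part.trim();
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up URLs, for illustration only.
        String field = "http://jt.example/job_1, http://jt.example/job_2";
        for (String url : splitChildUrls(field)) {
            System.out.println(url);
        }
    }
}
```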



[jira] [Resolved] (OOZIE-1160) Oozie web-console to display all job URLs spawned by Pig

2013-01-23 Thread Mona Chitnis (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mona Chitnis resolved OOZIE-1160.
-

Resolution: Fixed



[jira] [Commented] (OOZIE-1186) Image load for Job DAG visualization should handle resources better

2013-01-23 Thread Virag Kothari (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561093#comment-13561093
 ] 

Virag Kothari commented on OOZIE-1186:
--

From the Javadoc, dispose() is not required when the paint() method of a 
component is used 
(http://docs.oracle.com/javase/6/docs/api/java/awt/Graphics.html#dispose()).
From the code below, it seems that constructing the BufferedImage can occupy 
a lot of memory if d.width and d.height are huge, as each pixel is an int.
{code}
BufferedImage img = new BufferedImage(d.width, d.height, BufferedImage.TYPE_INT_RGB);
{code}
I don't think img.flush() will free the data buffer holding the pixel data 
(http://docs.oracle.com/javase/6/docs/api/java/awt/Image.html#flush()).
So, I am not sure which portion of the patch is actually releasing the 
resources and making a difference in memory consumption.
It seems that creating the BufferedImage can itself lead to an OOM if multiple 
servlets are creating this image simultaneously.
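
To put a rough number on the pixel-data concern, here is a back-of-the-envelope sketch. It assumes one 4-byte int per pixel, as for TYPE_INT_RGB, and ignores object overhead. The dimensions and the number of concurrent requests are made up for illustration, and ImageMemoryEstimate is a hypothetical name.

```java
// Hypothetical helper for estimating BufferedImage pixel storage.
class ImageMemoryEstimate {

    // Bytes of pixel data for an image that stores one int per pixel.
    static long estimatePixelBytes(int width, int height) {
        return (long) width * height * Integer.BYTES;
    }

    public static void main(String[] args) {
        int width = 4000;    // assumed DAG image width
        int height = 3000;   // assumed DAG image height
        int concurrent = 5;  // assumed number of simultaneous servlet requests

        long perImage = estimatePixelBytes(width, height);
        System.out.println("per image: " + perImage + " bytes");
        System.out.println("for " + concurrent + " requests: "
                + (perImage * concurrent) + " bytes");
    }
}
```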



 

 




[jira] [Commented] (OOZIE-1186) Image load for Job DAG visualization should handle resources better

2013-01-23 Thread Virag Kothari (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561096#comment-13561096
 ] 

Virag Kothari commented on OOZIE-1186:
--

A comment not related to the patch; I saw this while looking at the code:
{code}
public void finalize() {
    // No-op; just to avoid finalizer attack
    // as the constructor is throwing an exception
}
{code}
To avoid the finalizer attack, the method should be final. As it's a one-line 
change, it would be good to have this added as part of the patch.
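
A minimal sketch of the suggested change, using a hypothetical class GuardedResource rather than the actual Oozie class. The attack relies on a subclass overriding finalize() to get hold of an instance whose constructor threw; declaring the no-op finalizer final rules out that override.

```java
// Hypothetical class, not the one from the patch.
class GuardedResource {
    private final int size;

    GuardedResource(int size) {
        // The constructor may throw, leaving a partially constructed object.
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        this.size = size;
    }

    int size() {
        return size;
    }

    // Final no-op finalizer: a subclass cannot override it to capture
    // a reference to an instance whose constructor failed.
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected final void finalize() {
        // Intentionally empty.
    }

    public static void main(String[] args) {
        try {
            new GuardedResource(-1);
        } catch (IllegalArgumentException e) {
            System.out.println("construction rejected: " + e.getMessage());
        }
    }
}
```

With the method final, a subclass that tries to declare its own finalize() fails to compile.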



[jira] [Created] (OOZIE-1187) reduce memory usage of SLA query (invoked by CLI command) to avoid OOM

2013-01-23 Thread Ryota Egashira (JIRA)
Ryota Egashira created OOZIE-1187:
-

 Summary: reduce memory usage of SLA query (invoked by CLI command) 
to avoid OOM
 Key: OOZIE-1187
 URL: https://issues.apache.org/jira/browse/OOZIE-1187
 Project: Oozie
  Issue Type: Bug
  Components: core
Affects Versions: trunk
Reporter: Ryota Egashira
 Fix For: trunk


Running oozie sla -len 1000 caused an OOM in the Y! setting.
This JIRA is to do the following:
1) use JDBCFetchPlan to reduce memory usage in SLAEventsGetJPAExecutor
 -just like we are doing for the oozie jobs command
2) enforce a max cap (say, 1000) on the -len parameter that the user sets 
 -needs a documentation change to notify users
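
A minimal sketch of item 2, assuming the cap is applied by clamping the requested value instead of rejecting it (the JIRA does not say which). SlaQueryLimit, capLen and MAX_LEN are hypothetical names, and 1000 is the example value from the text above.

```java
// Hypothetical helper, not taken from the Oozie code base.
class SlaQueryLimit {

    // Example cap from the JIRA text ("say, 1000").
    static final int MAX_LEN = 1000;

    // Returns the number of records to fetch, never more than MAX_LEN.
    static int capLen(int requestedLen) {
        if (requestedLen < 1) {
            throw new IllegalArgumentException("len must be at least 1");
        }
        return Math.min(requestedLen, MAX_LEN);
    }

    public static void main(String[] args) {
        System.out.println(capLen(50));    // below the cap, unchanged
        System.out.println(capLen(5000));  // above the cap, clamped to MAX_LEN
    }
}
```

Clamping silently changes what the user asked for, which is why the documentation note in item 2 matters; rejecting oversized values with an error would be the stricter alternative.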




Review Request: OOZIE-1186 Image load for Job DAG visualization should handle resources better

2013-01-23 Thread Mona Chitnis

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9079/
---

Review request for oozie.


Description
---

https://issues.apache.org/jira/browse/OOZIE-1186


This addresses bug OOZIE-1186.
https://issues.apache.org/jira/browse/OOZIE-1186


Diffs
-

  trunk/core/src/main/java/org/apache/oozie/servlet/V1JobServlet.java 1437616 
  trunk/core/src/main/java/org/apache/oozie/util/GraphGenerator.java 1437616 
  trunk/core/src/test/java/org/apache/oozie/util/TestGraphGenerator.java 
1437616 

Diff: https://reviews.apache.org/r/9079/diff/


Testing
---

Unit tests done, plus end-to-end testing using the YourKit memory profiler.


Thanks,

Mona Chitnis



[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN

2013-01-23 Thread Tianyou Li (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561353#comment-13561353
 ] 

Tianyou Li commented on OOZIE-1178:
---

Hello,

This is Tianyou Li from Intel. This JIRA, and the YAPP proposal specifically, 
have been recently brought to our attention. We think YAPP is very interesting 
and would be eager to participate in the development of the software and 
creation of a viable developer and user community. We have been working on 
hosting Hive execution plans in a specialized AM that can run MR jobs 
internally according to a DAG supplied by the Hive front end. Initial 
performance tests show attractive numbers that seem to bear out the approach. 
Our current plan is to finish a production ready specialized Hive AM for 
executing plans (job DAGs), and then work on managing reuse of a scalable pool 
of persistent containers for the executors, and also reuse of the specialized 
Hive AM so the AM does not need to be instantiated for every query.

However it would be great if, rather than focus on a specialized Hive AM 
exclusively, we could contribute efforts to something useful to Hive, Pig, 
Oozie, and many other new efforts that could benefit. We hope it is a suitable 
time to consider collaboration, before we make any further progress. We would 
like to contribute our work in some form, but more importantly our ongoing 
efforts. Both myself (Tianyou Li, tianyou...@gmail.com) and my colleague Yi Liu 
(hitli...@gmail.com) have been doing the above described work internally and 
would like to volunteer as additional initial committers on the YAPP proposal, 
with the backing of our employer Intel. Others in our team are Apache 
committers and PMC members, so we are aware of the responsibilities and are 
committed to fulfilling them.

Thank you for your kind consideration.


 Workflow Application Master in YARN
 ---

 Key: OOZIE-1178
 URL: https://issues.apache.org/jira/browse/OOZIE-1178
 Project: Oozie
  Issue Type: New Feature
Reporter: Bo Wang
 Attachments: MAPREDUCE-4495-v1.1.patch, MAPREDUCE-4495-v1.patch, 
 MapReduceWorkflowAM.pdf, yapp_proposal.txt


 It is useful to have a workflow application master, which will be capable of 
 running a DAG of jobs. The workflow client submits a DAG request to the AM 
 and then the AM will manage the life cycle of this application in terms of 
 requesting the needed resources from the RM, and starting, monitoring and 
 retrying the application's individual tasks.
 Compared to running Oozie with the current MapReduce Application Master, 
 these are some of the advantages:
  - Fewer consumed resources, since only one application master will be 
 spawned for the whole workflow.
  - Reuse of resources, since the same resources can be used by multiple 
 consecutive jobs in the workflow (no need to request/wait for resources for 
 every individual job from the central RM).
  - More optimization opportunities in terms of collective resource requests.
  - Optimization opportunities in terms of rewriting and composing jobs in the 
 workflow (e.g. pushing down Mappers).
  - This Application Master can be reused/extended by higher-level systems 
 like Pig and Hive to provide an optimized way of running their workflows.



[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN

2013-01-23 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561374#comment-13561374
 ] 

Arun C Murthy commented on OOZIE-1178:
--

[~tianyou...@gmail.com] That is great to hear! I'd love to get started too. I 
volunteer to take this fwd to do the legwork etc. Andrew - I presume you would 
be interested too? Tucu? Bo?



[jira] [Updated] (OOZIE-1186) Image load for Job DAG visualization should handle resources better

2013-01-23 Thread Mona Chitnis (JIRA)

 [ 
https://issues.apache.org/jira/browse/OOZIE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mona Chitnis updated OOZIE-1186:


Attachment: HeapMemoryExistingTrunkCode.png
HeapMemoryWithPatchedCode.png

See the attachments for snapshots from the memory profiler, with a WF job (DAG 
image of about 1MB in size) run with and without the patch. I hit reload on the 
'show=graph' command 5 times in both cases to see how multiple servlet 
responses are handled.

The basic observation is that occupied heap memory keeps increasing without 
the patch, whereas it stays lower and constant with the patch.

I am still digging further into any lingering references to the BufferedImage 
object that are not released after every image load. The Javadoc for 
img.flush() and Graphics2D.dispose() does not seem to indicate any obvious 
release of memory in our case.





[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN

2013-01-23 Thread Bo Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561429#comment-13561429
 ] 

Bo Wang commented on OOZIE-1178:


Hi Tianyou, glad to hear from you! I am a 3rd-year PhD student at Stanford and 
have been working on this JIRA since my internship at Cloudera last summer. I 
look forward to collaborating on this project.

Hi Arun, I'd definitely love to contribute to this and get it into 
production.



[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN

2013-01-23 Thread Tianyou Li (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561468#comment-13561468
 ] 

Tianyou Li commented on OOZIE-1178:
---

[~acmurthy] Thanks for your response, and thank you for accepting us as part of 
the project. We look forward to working with you and others on this. If there 
is anything we can do to help you with the legwork, please let us know. We will 
wait to hear from you on how and when we can collaborate on this in the 
community. Meanwhile, we will continue to build up the solution, and once it is 
ready we will be glad to share the code with the community.




[jira] [Commented] (OOZIE-1178) Workflow Application Master in YARN

2013-01-23 Thread Tianyou Li (JIRA)

[ 
https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561469#comment-13561469
 ] 

Tianyou Li commented on OOZIE-1178:
---

[~bowang] Thanks Bo, looking forward to working together with you too!
