[ https://issues.apache.org/jira/browse/OOZIE-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-1770:
---------------------------------
    Attachment: oya.patch
                oya-rm-screenshot.jpg

We wrote the code against CDH, but I was able to port it back to Apache 
trunk with no conflicts other than the pom changes.  I decided that it was better 
to make this available sooner rather than spend too much time spiffing it up.

Here’s a brief overview of the design:
Oozie has an unmanaged AM “pool” that it uses for submitting jobs.  We need a 
pool because we have to create an AM for each user who submits a job (we 
adapted some code from Llama for this).  When Oozie wants to submit a job, instead of 
submitting an MR launcher job, it creates or gets one of these AMs, uses it 
to create a Yarn container, and then runs the launcher in that container.  
During our testing, we were using a Java action that launches a simple MR job.  
In the screenshot below, you can see that we have the one “OozieServer” AM, and 
then 3 MAPREDUCE applications, from when we ran the workflow 3 times.  The 
OozieServer AM was reused each time to submit the MR jobs, and there’s no 
longer a Launcher Job.
!oya-rm-screenshot.jpg!
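A minimal Java sketch of the per-user pool idea, assuming a simple map keyed by user name. UnmanagedAm, AmPool, and launchContainer are hypothetical names, not classes from oya.patch, and the container launch is faked with a string id so the example runs without Hadoop on the classpath.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an unmanaged AM registered with the RM on behalf of one user.
final class UnmanagedAm {
    private final String user;
    private final AtomicInteger launched = new AtomicInteger();

    UnmanagedAm(String user) {
        this.user = user;
    }

    String user() {
        return user;
    }

    // The real code would ask YARN for a container and start the launcher in it;
    // here we only return a fake container id to show that the same AM is reused.
    String launchContainer(String actionId) {
        return "container_" + user + "_" + launched.incrementAndGet() + "_" + actionId;
    }

    int launchedCount() {
        return launched.get();
    }
}

// Hypothetical pool: one AM per submitting user, created lazily and then reused.
final class AmPool {
    private final Map<String, UnmanagedAm> amsByUser = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    UnmanagedAm getOrCreate(String user) {
        // computeIfAbsent runs the factory at most once per key, even under concurrency.
        return amsByUser.computeIfAbsent(user, u -> {
            created.incrementAndGet();
            return new UnmanagedAm(u);
        });
    }

    int createdCount() {
        return created.get();
    }
}
```

With this shape, running the same workflow three times as one user creates a single AM that launches three containers, which matches what the screenshot shows for the “OozieServer” AM.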

Given that this was more of a proof-of-concept and we didn’t have a lot of 
time, we didn’t redo the launcher code.  It still uses LauncherMapper; I just 
hacked in some extra methods for running it outside of a map task so we could 
run it in the container.  This is definitely an area where we can improve 
things a lot.  One major thing to keep in mind is that the container gives us a 
shell.  Right now we use it to start a JVM that runs the LauncherMapper code, but 
it would probably make sense to see whether we can skip the JVM and run most actions 
directly in the shell.
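A rough Java sketch of what choosing between “start a JVM” and “run directly in the shell” could look like when building the container command. ActionKind, ContainerCommandBuilder, and the launcher class name are hypothetical; this only builds the argument list and does not reflect how oya.patch actually constructs its launch context.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical action kinds; only used to decide how the container command is built.
enum ActionKind { JAVA, SHELL }

final class ContainerCommandBuilder {
    private final String javaHome;
    private final String classpath;
    // Hypothetical entry point wrapping the LauncherMapper logic outside of a map task.
    private final String launcherClass;

    ContainerCommandBuilder(String javaHome, String classpath, String launcherClass) {
        this.javaHome = javaHome;
        this.classpath = classpath;
        this.launcherClass = launcherClass;
    }

    // Builds the argv that the container's shell would run for one action.
    List<String> build(ActionKind kind, String target, List<String> args) {
        List<String> cmd = new ArrayList<>();
        if (kind == ActionKind.SHELL) {
            // Possible improvement: run the user's script directly, no JVM in between.
            cmd.add("/bin/bash");
            cmd.add(target);
        } else {
            // Current proof-of-concept path: start a JVM that runs the launcher code.
            cmd.add(javaHome + "/bin/java");
            cmd.add("-cp");
            cmd.add(classpath);
            cmd.add(launcherClass);
            cmd.add(target);
        }
        cmd.addAll(args);
        return cmd;
    }
}
```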

Interestingly, a container that doesn’t do much (e.g. a “Hello World” 
Java action) runs so fast that Oozie is now the bottleneck.  The callback 
comes in before the action has had a chance to transition to RUNNING, so Oozie 
complains.  We worked around this by adding a delay; we’ll probably want to improve 
this.
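A small Java sketch of the race and a bounded “wait and re-check” variant of the delay workaround. CallbackHandler, ActionStatus, and the retry parameters are hypothetical; the patch simply adds a delay, and Oozie's real callback path and action states are more involved than this.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

enum ActionStatus { PREP, RUNNING, DONE }

// Hypothetical callback handler, not Oozie's actual callback servlet.
final class CallbackHandler {
    private final Map<String, ActionStatus> statuses = new ConcurrentHashMap<>();
    private final long delayMillis;
    private final int maxAttempts;

    CallbackHandler(long delayMillis, int maxAttempts) {
        this.delayMillis = delayMillis;
        this.maxAttempts = maxAttempts;
    }

    void setStatus(String actionId, ActionStatus s) {
        statuses.put(actionId, s);
    }

    ActionStatus status(String actionId) {
        return statuses.getOrDefault(actionId, ActionStatus.PREP);
    }

    // Returns true if the callback was applied, false if the action never reached RUNNING in time.
    boolean onCallback(String actionId) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Atomically move RUNNING -> DONE; fails if the action is still PREP.
            if (statuses.replace(actionId, ActionStatus.RUNNING, ActionStatus.DONE)) {
                return true;
            }
            // The container finished before Oozie marked the action RUNNING: wait, then re-check.
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

Compared with a fixed delay, re-checking in a loop returns as soon as the action is RUNNING and gives up after a known upper bound.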

I had to remove MR1 support, obviously.  This greatly simplifies the build 
because we don’t need hadooplibs anymore.  I also had to raise the Hadoop version 
from 2.3.0 to 2.4.0+.

For now, I made it so that if 
you set {{ooze.use.jobclient.launch=true}} in oozie-site, Oozie will use the 
old MR launcher job behavior instead of the container behavior.  We can probably 
get rid of this option eventually.
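A minimal Java sketch of the switch, assuming oozie-site has already been loaded into a plain map. LaunchMode and LaunchModeSelector are hypothetical names, the property key is copied verbatim from this comment, and the real code would read the setting through Oozie's own configuration classes.

```java
import java.util.Map;

enum LaunchMode { MR_LAUNCHER_JOB, YARN_CONTAINER }

final class LaunchModeSelector {
    // Property name copied verbatim from the comment above.
    static final String USE_JOBCLIENT_LAUNCH = "ooze.use.jobclient.launch";

    // Defaults to the new container behavior unless the property is explicitly "true".
    static LaunchMode select(Map<String, String> oozieSite) {
        boolean useJobClient =
            Boolean.parseBoolean(oozieSite.getOrDefault(USE_JOBCLIENT_LAUNCH, "false"));
        return useJobClient ? LaunchMode.MR_LAUNCHER_JOB : LaunchMode.YARN_CONTAINER;
    }
}
```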

Here’s a to do list of things that need to be fixed or improved:
- The container currently runs as the Yarn user regardless of who submitted the 
workflow.  So, permissions-wise, workflows only work if you submit them as the 
Yarn user.  
- The shell action should not start a JVM
- Actions should be refactored/cleaned-up/simplified to not need all the stuff 
LauncherMapper is doing
        -- Regardless of whether or not we still start a JVM, there’s a bunch 
of stuff LauncherMapper does that we should get rid of or simplify or change
        -- The Shell action definitely doesn’t need a JVM; once the user 
problem is fixed, it should also finally run as the proper user!
- The NMToken expires after 10-15 minutes, after which the AMs in the AM pool 
can no longer submit containers
- The Oozie kill command still expects an MR job and doesn’t work; it needs to 
use the container id
- The status checking code is pretty hacky.  I modified it to check the 
container status, but it doesn’t check whether the job actually failed
- The callback is pretty hacky.  Previously, the Launcher’s MR AM sent the callback 
to Oozie; now I had to make the container do this
        -- It may make sense to come up with a different mechanism for this
- I haven’t tried any recovery stuff or Oozie HA
- I had to add two columns: one for the node ID host and one for the node ID 
port.  OozieDBCLI needs to be updated to create these during an upgrade.  
Creating a new database works fine, though, because OpenJPA handles it.
- We’ve only tested with the Java action.  Most of the other actions should 
work with some minor tweaking (need to set expected extra env vars, etc).  The 
MR action probably won’t work because of the swapping optimization; it probably 
makes sense to get rid of that.
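For the status-checking item, a hedged Java sketch of mapping a finished container's exit status to an action outcome instead of only looking at whether the container completed. ContainerPhase loosely mirrors YARN's container states and the zero-means-success convention is an assumption; all names here are hypothetical. A zero exit only means the launcher process exited cleanly, so the launcher would also need to exit non-zero when the user's code fails.

```java
// Simplified local mirror of container lifecycle states (assumption, not the YARN enum itself).
enum ContainerPhase { NEW, RUNNING, COMPLETE }

enum LauncherOutcome { PENDING, RUNNING, SUCCEEDED, FAILED }

final class ContainerStatusMapper {
    // Assumed convention: exit status 0 means the launcher finished successfully.
    static final int SUCCESS_EXIT = 0;

    static LauncherOutcome toOutcome(ContainerPhase phase, int exitStatus) {
        switch (phase) {
            case NEW:
                return LauncherOutcome.PENDING;
            case RUNNING:
                return LauncherOutcome.RUNNING;
            case COMPLETE:
                // Completion alone is not success: look at how the process exited.
                return exitStatus == SUCCESS_EXIT ? LauncherOutcome.SUCCEEDED : LauncherOutcome.FAILED;
            default:
                throw new IllegalArgumentException("unknown phase " + phase);
        }
    }
}
```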

The oya.patch has all of the changes.

> Create Oozie Application Master for YARN
> ----------------------------------------
>
>                 Key: OOZIE-1770
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1770
>             Project: Oozie
>          Issue Type: New Feature
>            Reporter: Bowen Zhang
>            Assignee: Bowen Zhang
>         Attachments: oya-rm-screenshot.jpg, oya.patch
>
>
> After the first release of Oozie on Hadoop 2, it would be good if users could 
> set the execution engine in the Oozie conf, be it a YARN AM or traditional MR. We can 
> target this for after the Oozie 4.1 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)