Re: MiniOozie for local dryrun or other options for doing dryrun of oozie workflows?

2016-12-05 Thread Serega Sheypak
Yeah, I see it. I found a way to test the workflow locally, but it's suuuper
complex. I have to start local MR, local HDFS and local Oozie instances. Then
I mock the workflow's XML actions on the fly with my test actions and run the
workflow. It's super complex and fragile, unfortunately...
I'll try to reach the dev group. I found the LiteWorkflow class; it seems to be
the engine that executes the nodes, and it's super lightweight.
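
For the record, the Oozie side of the harness boils down to something like
this (a rough sketch using the LocalOozie test helper from oozie-core; the
mini MR/HDFS setup and the action mocking are omitted, and the paths and
properties are placeholders from my setup):

import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;
import org.apache.oozie.local.LocalOozie;

public class LocalWorkflowTestSketch {
    public static void main(String[] args) throws Exception {
        LocalOozie.start();                            // embedded Oozie, no real cluster services
        try {
            OozieClient client = LocalOozie.getClient();
            Properties conf = client.createConfiguration();
            // placeholder app path; the workflow's actions are swapped for mocks beforehand
            conf.setProperty(OozieClient.APP_PATH, "hdfs://localhost:9000/user/test/app");
            conf.setProperty("nameNode", "hdfs://localhost:9000");
            conf.setProperty("jobTracker", "localhost:9001");
            String jobId = client.run(conf);           // submit and start the workflow
            WorkflowJob job = client.getJobInfo(jobId);
            System.out.println(job.getStatus());       // inspect how the flow went
        } finally {
            LocalOozie.stop();
        }
    }
}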

2016-12-05 14:11 GMT+01:00 Andras Piros :

> Hi Serega,
>
> as per the *Oozie documentation* (section Dryrun_of_Workflow_Job) we can
> see that the -dryrun option neither creates nor runs a job.
>
> As for the killer feature request, I think it's not possible ATM.
>
> Regards,
>
> Andras
>
> --
> Andras PIROS
> Software Engineer
> 
>
> On Thu, Dec 1, 2016 at 8:33 PM, Serega Sheypak 
> wrote:
>
> > Hi, did anyone make it work properly in their project?
> > I need to do dry runs for my workflows.
> > The use case is:
> > A user writes a workflow and wants to:
> > 1. Check if it is valid
> > 2. Do a dryrun and see how it flows without executing the steps.
> >
> > Let's say I have a workflow with three steps:
> >
> > 1. distcp data from $A to $B
> > 2. run a spark action with $B as input
> > 3. distcp $B to $C
> >
> > I want to do a dryrun and check how my variables were interpolated in the
> > workflow.
> > The killer feature is: I want to imitate a spark action failure and check
> > what my kill node looks like.
> >
>


Exception when setting up Oozie HA using virtual IP

2016-12-05 Thread Dongying Jiao
Hi:
Do you have detailed steps for setting up Oozie HA using a virtual IP?
I set up Oozie HA using a virtual IP with server-1 and server-2
(active-active). When we take down server-1, any submitted Oozie job fails
with the stack trace below. If both are up, there is no issue.
ERROR RecoveryService$RecoveryRunnable:517 - SERVER[] USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[-] ACTION[-] Exception, / by zero
java.lang.ArithmeticException: / by zero
at
org.apache.oozie.service.ZKJobsConcurrencyService.checkJobIdForServer(ZKJobsConcurrencyService.java:167)
at
org.apache.oozie.service.ZKJobsConcurrencyService.isJobIdForThisServer(ZKJobsConcurrencyService.java:129)
at
org.apache.oozie.service.RecoveryService$RecoveryRunnable.runWFRecovery(RecoveryService.java:362)
at
org.apache.oozie.service.RecoveryService$RecoveryRunnable.run(RecoveryService.java:146)
at
org.apache.oozie.service.SchedulerService$2.run(SchedulerService.java:175)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

It seems server-2 can't get the Oozie server list from ZooKeeper. The
ZooKeeper connection string is already added to oozie-site.
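
That would line up with the "/ by zero": as far as I understand, job ids are
partitioned across the live servers by taking a number modulo the size of the
server list, so an empty list from ZooKeeper divides by zero. A rough sketch
of that kind of check (an illustration only, not the actual Oozie source):

import java.util.List;

public class JobPartitionSketch {
    // Illustration: map a workflow id like "0000123-161205...-W" to one of
    // the live Oozie servers by its sequence number modulo the server count.
    public static boolean isJobIdForThisServer(String jobId, List<String> liveServers, int myIndex) {
        int seq = Integer.parseInt(jobId.substring(0, jobId.indexOf('-')));
        // throws java.lang.ArithmeticException: / by zero when liveServers is empty
        return seq % liveServers.size() == myIndex;
    }
}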

Thanks

Best Regards,
Dongying Jiao


Question about spark-bagel in oozie 4.3.0

2016-12-05 Thread Dongying Jiao
Hi:
I noticed Oozie 4.3.0 adds the spark-bagel lib to the Spark sharelib,
compared to Oozie 4.2.0.
The official Spark site says this module is deprecated and superseded by
GraphX, so why do we add this deprecated component when GraphX is already in
the sharelib?

I also want to use Spark 2.0 with Oozie 4.3.0. As there is no spark-bagel
module in Spark 2.x, is there any risk if I delete this module from Oozie
4.3.0?

Thanks very much

Best Regards,
Dongying Jiao


How to run and manage jobs like Storm, Flink, or Python that don't have a built-in action plugin in Oozie

2016-12-05 Thread prateek arora
Hi team

As far as I know, Oozie supports several types of Hadoop jobs, such as Java
map-reduce, streaming map-reduce, Pig, Hive, Sqoop, Spark and DistCp, as well
as system-specific jobs such as Java programs and shell scripts.
But I want to run and manage jobs like Storm, Flink, Python, and some deep
learning and machine learning jobs.
Is it possible to support these types of jobs in Oozie? If yes, how can I do
that?
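
From reading the code, every built-in action type seems to extend the
abstract ActionExecutor class and get registered via the
oozie.service.ActionService.executor.ext.classes property, so a custom action
could start from a skeleton like this (a sketch only, with a hypothetical
<flink> action element; I haven't verified all the lifecycle details, and for
one-off jobs wrapping the command in a <shell> action looks like the simpler
route):

import org.apache.oozie.action.ActionExecutor;
import org.apache.oozie.action.ActionExecutorException;
import org.apache.oozie.client.WorkflowAction;

// Sketch of a custom executor for a hypothetical <flink> workflow action.
public class FlinkActionExecutor extends ActionExecutor {

    public FlinkActionExecutor() {
        super("flink");   // the element name used in workflow.xml
    }

    @Override
    public void start(Context context, WorkflowAction action) throws ActionExecutorException {
        // submit the Flink job here, then record its external id, tracker
        // URI and console URL so Oozie can track it
        context.setStartData("flink-job-id", "flink-master:6123", "http://flink-ui:8081");
    }

    @Override
    public void check(Context context, WorkflowAction action) throws ActionExecutorException {
        // poll the external system; report back via context.setExecutionData(...)
    }

    @Override
    public void end(Context context, WorkflowAction action) throws ActionExecutorException {
        context.setEndData(WorkflowAction.Status.OK, WorkflowAction.Status.OK.toString());
    }

    @Override
    public void kill(Context context, WorkflowAction action) throws ActionExecutorException {
        context.setEndData(WorkflowAction.Status.KILLED, "KILLED");
    }

    @Override
    public boolean isCompleted(String externalStatus) {
        return true;   // sketch: treat any reported external status as terminal
    }
}

Is extending ActionExecutor like this the recommended way?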

Regards
Prateek


Re: Change NN and JobTracker dynamically during runtime

2016-12-05 Thread mdk-swandha
You mean I have to set env variables for each job/workflow execution, and
then they will be picked up by Oozie? And I should set them in my service
(the service which finds the best cluster)?

For example, let's say I have 3 clusters:
- When a job is sent via Oozie/Hue/Zeppelin/Livy etc., it is mapped to one
cluster and jobs always go there. Call this the default cluster.
- I have a service which determines the best cluster for a given job,
considering various attributes (availability, data locality, network
bandwidth etc.).
- This service exposes an API; the caller just passes the required
parameters (job/input/output/queue etc.) and the service returns the best
available cluster (see the sketch after this list).
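
Concretely, the caller-side code I have in mind looks roughly like this (a
sketch only; the placement-service call and the nameNode/jobTracker property
names are assumptions from my setup):

import java.util.Properties;
import org.apache.oozie.client.OozieClient;

public class RoutedSubmitSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical placement-service call, e.g.:
        // ClusterInfo best = placementService.findBestCluster(jobSpec);
        String nameNode   = "hdfs://best-cluster-nn:8020";   // placeholder endpoints
        String jobTracker = "best-cluster-rm:8032";

        OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");
        Properties conf = oozie.createConfiguration();
        conf.setProperty(OozieClient.APP_PATH, nameNode + "/user/dipesh/app");
        // workflow.xml references ${nameNode} and ${jobTracker}, so the
        // routing happens purely through the submission properties
        conf.setProperty("nameNode", nameNode);
        conf.setProperty("jobTracker", jobTracker);
        System.out.println("Submitted " + oozie.run(conf));
    }
}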

With what I have above, I feel the calling code should live in the caller
(Oozie/Zeppelin/any application); that keeps things simple and isolates the
JT's default behavior. It also won't disrupt the existing jobs running on
these clusters by introducing new settings. Maybe I'm missing how you are
advising to create the load balancer setting for the JT and configure it at
runtime. Can you please tell me more about how this can be done?

Thanks.
-Dipesh



On Mon, Dec 5, 2016 at 10:59 AM, Andras Piros 
wrote:

> Hi Dipesh,
>
> during workflow / job submission you can define variables inside
> job.properties, coming e.g. from env vars, that are used in workflow.xml.
> That covers the flexibility part.
>
> Can you tell me a use case where runtime routing to different JT / NN
> instances via Oozie (rather than, e.g., via a load balancer configured at
> runtime) is better?
>
> Thanks,
>
> Andras
>
> --
> Andras PIROS
> Software Engineer
> 
>
> On Mon, Dec 5, 2016 at 7:45 PM, mdk-swandha 
> wrote:
>
> > Hi Alex,
> >
> > The idea is to call this external service which will find the best
> > cluster and inform the caller. So today this caller is Oozie; tomorrow it
> > will be Zeppelin or any other application.
> >
> > How can I provide multiple JT and NN addresses in job.properties? You
> > mean during job/workflow creation? Will I still need to overwrite
> > job.properties or provide these values somewhere dynamically?
> >
> > Thanks.
> > -Dipesh
> >
> > On Mon, Dec 5, 2016 at 5:24 AM, Andras Piros 
> > wrote:
> >
> > > Hi Dipesh,
> > >
> > > it seems like a bad idea to programmatically change the job-tracker or
> > > name-node properties - it's just not the task of Oozie to determine the
> > > exact JT or NN instances Oozie should use.
> > >
> > > Instead, I'd rather set up a load balancer for the JT and another one
> > > for the NN, and provide those addresses in Oozie's job.properties. That
> > > way, we separate concerns - the load balancer can choose the JT or NN
> > > node at runtime, e.g. on a round-robin basis.
> > >
> > > Regards,
> > >
> > > Andras
> > >
> > > --
> > > Andras PIROS
> > > Software Engineer
> > > 
> > >
> > > On Thu, Dec 1, 2016 at 9:29 PM, mdk-swandha 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I have a use case like this - in a multi-cluster (Hadoop cluster)
> > > > environment, if I would like to send a job/Oozie workflow to a
> > > > desired cluster at runtime, how can this be done?
> > > >
> > > > I see that there is a JavaActionExecutor class which reads the NN and
> > > > JobTracker in its createBaseHadoopConf method.
> > > >
> > > > All Hadoop ActionExecutors are derived from JavaActionExecutor, so
> > > > this seems to be a place where I can insert my code. How can I add my
> > > > hook without disrupting the original flow?
> > > >
> > > > One option is to derive a new JavaActionExecutor, override the
> > > > createBaseHadoopConf method, and then derive all ActionExecutors from
> > > > my new JavaActionExecutor. That doesn't seem elegant to me, so I
> > > > thought I'd ask here.
> > > >
> > > > Any input will be useful.
> > > >
> > > > Thanks.
> > > > -Dipesh
> > > >
> > >
> >
>

