Hi Neil,

Q1: Oozie doesn't support this feature yet. I think it would be useful in many
cases. Would you please file a JIRA for it?

Q2: A timeout value of '1' should be fine. The timeout applies only to the data
dependency check. If the data does not become available within the 'timeout'
interval, the action moves to the TIMEDOUT state. I think in your case the data
directories for your thousands of actions are already available, so they move
to the READY state. There is no transition from READY to TIMEDOUT. More
importantly, you are using an older version of Oozie. The latest Apache
release, 3.1.3, has a throttling mechanism that prevents thousands of actions
from being created at once. You can also override the default throttling
value. Please see the following link:
http://incubator.apache.org/oozie/docs/3.1.3/docs/CoordinatorFunctionalSpec.html

and search for "throttle" under materialization.
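
For illustration, a throttle setting in the coordinator might look like the
sketch below. This is not taken from your job: the value 10 is an arbitrary
placeholder, and I am assuming the <throttle> element needs the
uri:oozie:coordinator:0.2 schema or later, so please check it against the spec
linked above.

```xml
<coordinator-app name="my-coord-job" frequency="1" start="${start}"
end="${end}" timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
    <controls>
      <timeout>1</timeout>
      <concurrency>1</concurrency>
      <execution>LAST_ONLY</execution>
      <!-- Placeholder value (assumption): limits how many actions are
           materialized and waiting at one time -->
      <throttle>10</throttle>
    </controls>
    ...SNIP...
</coordinator-app>
```

As far as I recall, the server-wide default comes from the
oozie.service.coord.default.throttle property in oozie-site.xml. Treat that
name as an assumption and confirm it in oozie-default.xml for your version.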

Hope this helps.

Regards,
Mohammad


----- Original Message -----
From: Neil Yalowitz <[email protected]>
To: [email protected]
Cc: 
Sent: Tuesday, March 13, 2012 9:56 AM
Subject: coordinator start and timeout

Hi all!  Two part question...


1) Can the "start" value in the coordinator-app tag be a dynamic value?
i.e., the current datetime

Example:

<coordinator-app name="my-coord-job" frequency="1" start="${start}"
end="${end}" timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
...SNIP...
</coordinator-app>

I would like that ${start} tag to be the current datetime when I submit the
job.  If I recall correctly, 2.x versions of Oozie did not support this
feature... is it still the case?

The danger there is if I neglect to update the value for "start=xxxxx" in
the job.properties file, then I may kick off a job with a "start" value
that is far in the past and materialize many, many jobs to catch up.  I
recently committed this error and kicked off a coord with the start value
in the past by 6 months... In my use case, I really never want to process a
backlog so if the "start" can be set to NOW then it would be ideal.


2) Is there an issue with setting a timeout value to 1?  It seems that the
timeout is being ignored in my job.

Example:

<coordinator-app name="my-coord-job" frequency="1" start="${start}"
end="${end}" timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
    <controls>
      <timeout>1</timeout>
      <concurrency>1</concurrency>
      <execution>LAST_ONLY</execution>
    </controls>
    ...SNIP...
</coordinator-app>

In this case, if my start value is far in the past (as I mentioned in
question #1) then it may materialize many, many workflows... not a huge
problem if the timeout is being honored since the jobs should quickly
transition to TIMEDOUT.  What I'm seeing, however, is many jobs holding in
the READY state.  The jobs are in both the far past and future... and it's
over 100,000 jobs sitting in the READY state!  The Oozie UI is barely
responsive with that many tasks to handle.

Is there a case where a job with timeout=1 would stay in the READY state
beyond one minute?  Perhaps I'm just missing something here.


Neil
