http://git-wip-us.apache.org/repos/asf/oozie/blob/6a6f2199/docs/src/site/twiki/CoordinatorFunctionalSpec.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/CoordinatorFunctionalSpec.twiki 
b/docs/src/site/twiki/CoordinatorFunctionalSpec.twiki
deleted file mode 100644
index d31d1aa..0000000
--- a/docs/src/site/twiki/CoordinatorFunctionalSpec.twiki
+++ /dev/null
@@ -1,4890 +0,0 @@
-
-
-[::Go back to Oozie Documentation Index::](index.html)
-
------
-
-# Oozie Coordinator Specification
-
-The goal of this document is to define a coordinator engine system specialized 
in submitting workflows based on time and data triggers.
-
-<!-- MACRO{toc|fromDepth=1|toDepth=4} -->
-
-## Changelog
-
-**03/JUL/2013**
-
-   * Appendix A, Added new coordinator schema 0.4, sla schema 0.2 and changed 
schemas ordering to newest first
-
-**07/JAN/2013**
-
-   * 6.8 Added section on new EL functions for datasets defined with HCatalog
-
-**26/JUL/2012**
-
-   * Appendix A, updated XML schema 0.4 to include `parameters` element
-   * 6.5 Updated to mention about `parameters` element as of schema 0.4
-
-**23/NOV/2011:**
-
-   * Update execution order typo
-
-**05/MAY/2011:**
-
-   * Update coordinator schema 0.2
-
-**09/MAR/2011:**
-
-   * Update coordinator status
-
-**02/DEC/2010:**
-
-   * Update coordinator done-flag
-
-**26/AUG/2010:**
-
-   * Update coordinator rerun
-
-**09/JUN/2010:**
-
-   * Clean up unsupported functions
-
-**02/JUN/2010:**
-
-   * Update all EL functions in CoordFunctionalSpec with "coord:" prefix
-
-**02/OCT/2009:**
-
-   * Added Appendix A, Oozie Coordinator XML-Schema
-   * Change #5.3., Datasets definition supports 'include' element
-
-**29/SEP/2009:**
-
-   * Change #4.4.1, added `${coord:endOfDays(int n)}` EL function
-   * Change #4.4.2, added `${coord:endOfMonths(int n)}` EL function
-
-**11/SEP/2009:**
-
-   * Change #6.6.4. `${coord:tzOffset()}` EL function now returns offset in 
minutes. Added more explanation on behavior
-   * Removed 'oozie' URL from action workflow invocation, per arch review 
feedback coord&wf run on the same instance
-
-**07/SEP/2009:**
-
-   * Full rewrite of sections #4 and #7
-   * Added sections #6.1.7, #6.6.2, #6.6.3 & #6.6.4
-   * Rewording through the spec definitions
-   * Updated all examples and syntax to latest changes
-
-**03/SEP/2009:**
-
-   * Change #2. Definitions. Some rewording in the definitions
-   * Change #6.6.4. Replaced `${coord:next(int n)}` with `${coord:version(int 
n)}` EL Function
-   * Added #6.6.5. Dataset Instance Resolution for Instances Before the 
Initial Instance
-
-## 1. Coordinator Overview
-
-Users typically run map-reduce, hadoop-streaming, hdfs and/or Pig jobs on the grid. Several of these jobs can be combined to form a workflow job. [Oozie, Hadoop Workflow System](https://issues.apache.org/jira/browse/HADOOP-5303) defines a workflow system that runs such jobs.
-
-Commonly, workflow jobs are run based on regular time intervals and/or data 
availability. And, in some cases, they can be triggered by an external event.
-
-Expressing the condition(s) that trigger a workflow job can be modeled as a predicate that has to be satisfied. The workflow job is started after the predicate is satisfied. A predicate can reference data, time and/or external events. In the future, the model can be extended to support additional event types.
-
-It is also necessary to connect workflow jobs that run regularly, but at different time intervals. The outputs of multiple subsequent runs of a workflow become the input to the next workflow. For example, the outputs of the last 4 runs of a workflow that runs every 15 minutes become the input of another workflow that runs every 60 minutes. Chaining these workflows together is referred to as a data application pipeline.
-
-The Oozie **Coordinator** system allows the user to define and execute 
recurrent and interdependent workflow jobs (data application pipelines).
-
-Real world data application pipelines have to account for reprocessing, late 
processing, catchup, partial processing, monitoring, notification and SLAs.
-
-This document defines the functional specification for the Oozie Coordinator 
system.
-
-## 2. Definitions
-
-**Actual time:** The actual time indicates the time when something actually 
happens.
-
-**Nominal time:** The nominal time specifies the time when something should 
happen. In theory the nominal time and the actual time should match, however, 
in practice due to delays the actual time may occur later than the nominal time.
-
-**Dataset:** Collection of data referred to by a logical name. A dataset normally has several instances of data and each one of them can be referred to individually. Each dataset instance is represented by a unique set of URIs.
-
-**Synchronous Dataset:** Synchronous dataset instances are generated at fixed time intervals and there is a dataset instance associated with each time interval. Synchronous dataset instances are identified by their nominal time.
-For example, in the case of a HDFS based dataset, the nominal time would be 
somewhere in the file path of the
-dataset instance: `hdfs://foo:8020/usr/logs/2009/04/15/23/30`. In the case of 
HCatalog table partitions, the nominal time
-would be part of some partition values: 
`hcat://bar:8020/mydb/mytable/year=2009;month=04;dt=15;region=us`.
-
-**Coordinator Action:** A coordinator action is a workflow job that is started 
when a set of conditions are met (input dataset instances are available).
-
-**Coordinator Application:** A coordinator application defines the conditions 
under which coordinator actions should be created (the frequency) and when the 
actions can be started. The coordinator application also defines a start and an 
end time. Normally, coordinator applications are parameterized. A Coordinator 
application is written in XML.
-
-**Coordinator Job:** A coordinator job is an executable instance of a 
coordination definition. A job submission is done by submitting a job 
configuration that resolves all parameters in the application definition.
-
-**Data pipeline:** A data pipeline is a connected set of coordinator 
applications that consume and produce interdependent datasets.
-
-**Coordinator Definition Language:** The language used to describe datasets 
and coordinator applications.
-
-**Coordinator Engine:** A system that executes coordinator jobs.
-
-## 3. Expression Language for Parameterization
-
-Coordinator application definitions can be parameterized with variables, 
built-in constants and built-in functions.
-
-At execution time all the parameters are resolved into concrete values.
-
-The parameterization of workflow definitions is done using JSP Expression Language syntax from the [JSP 2.0 Specification (JSP.2.3)](http://jcp.org/aboutJava/communityprocess/final/jsr152/index.html), allowing not only variables as parameters but also functions and complex expressions.
-
-EL expressions can be used in XML attribute values and XML text element 
values. They cannot be used in XML element and XML attribute names.
-
-Refer to section #6.5 'Parameterization of Coordinator Applications' for more 
details.
-
-## 4. Datetime, Frequency and Time-Period Representation
-
-Oozie processes coordinator jobs in a fixed timezone with no DST (typically `UTC`); this timezone is referred to as the 'Oozie processing timezone'.
-
-The Oozie processing timezone is used to resolve coordinator jobs start/end 
times, job pause times and the initial-instance
-of datasets. Also, all coordinator dataset instance URI templates are resolved 
to a datetime in the Oozie processing
-time-zone.
-
-All the datetimes used in coordinator applications and job parameters to 
coordinator applications must be specified
-in the Oozie processing timezone. If Oozie processing timezone is `UTC`, the 
qualifier is  **Z**. If Oozie processing
-time zone is other than `UTC`, the qualifier must be the GMT offset, 
`(+/-)####`.
-
-For example, a datetime in `UTC`  is `2012-08-12T00:00Z`, the same datetime in 
`GMT+5:30` is `2012-08-12T05:30+0530`.
-
-For simplicity, the rest of this specification uses `UTC` datetimes.
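
The qualifier rules above can be illustrated with a small hypothetical helper (the function name and signature are not part of Oozie; this is only a sketch of the formatting convention):

```python
# Hypothetical helper illustrating the datetime qualifier rules above;
# not part of Oozie itself.
from datetime import datetime, timedelta, timezone

def oozie_datetime(utc_dt, offset_minutes=0):
    """Format a UTC instant in the Oozie processing timezone, minute precision."""
    if offset_minutes == 0:
        # A UTC processing timezone uses the 'Z' qualifier
        return utc_dt.strftime("%Y-%m-%dT%H:%MZ")
    # A non-UTC processing timezone uses a GMT offset qualifier, (+/-)####
    local = utc_dt.replace(tzinfo=timezone.utc).astimezone(
        timezone(timedelta(minutes=offset_minutes)))
    sign = "+" if offset_minutes >= 0 else "-"
    m = abs(offset_minutes)
    return local.strftime("%Y-%m-%dT%H:%M") + f"{sign}{m // 60:02d}{m % 60:02d}"

print(oozie_datetime(datetime(2012, 8, 12, 0, 0)))       # 2012-08-12T00:00Z
print(oozie_datetime(datetime(2012, 8, 12, 0, 0), 330))  # 2012-08-12T05:30+0530
```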
-
-<a name="datetime"></a>
-### 4.1. Datetime
-
-If the Oozie processing timezone is `UTC`, all datetime values are always in
-[UTC](http://en.wikipedia.org/wiki/Coordinated_Universal_Time) down to a 
minute precision, 'YYYY-MM-DDTHH:mmZ'.
-
-For example `2009-08-10T13:10Z` is August 10th 2009 at 13:10 UTC.
-
-If the Oozie processing timezone is a GMT offset `GMT(+/-)####`, all datetime 
values are always in
-[ISO 8601](http://en.wikipedia.org/wiki/ISO_8601) in the corresponding GMT 
offset down to a minute precision,
-'YYYY-MM-DDTHH:mmGMT(+/-)####'.
-
-For example `2009-08-10T13:10+0530` is August 10th 2009 at 13:10 GMT+0530, 
India timezone.
-
-#### 4.1.1 End of the day in Datetime Values
-
-It is valid to express the end of day as a '24:00' hour (i.e. 
`2009-08-10T24:00Z`).
-
-However, for all calculations and display, Oozie resolves such dates as the 
zero hour of the following day
-(i.e. `2009-08-11T00:00Z`).
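
The resolution rule can be sketched as follows (a hypothetical helper, not Oozie's code):

```python
# Hypothetical sketch of the end-of-day resolution rule; not Oozie's code.
from datetime import datetime, timedelta

def resolve_end_of_day(value):
    """Resolve a 'YYYY-MM-DDT24:00Z' datetime to the zero hour of the next day."""
    date_part, time_part = value.rstrip("Z").split("T")
    if time_part == "24:00":
        next_day = datetime.strptime(date_part, "%Y-%m-%d") + timedelta(days=1)
        return next_day.strftime("%Y-%m-%dT00:00Z")
    return value

print(resolve_end_of_day("2009-08-10T24:00Z"))  # 2009-08-11T00:00Z
```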
-
-### 4.2. Timezone Representation
-
-There is no widely accepted standard to identify timezones.
-
-Oozie Coordinator will understand the following timezone identifiers:
-
-   * Generic NON-DST timezone identifier: `GMT[+/-]##:##` (i.e.: GMT+05:30)
-   * UTC timezone identifier: `UTC` (i.e.: 2009-06-06T00:00Z)
-   * ZoneInfo identifiers, with DST support, understood by Java JDK (about 600 
IDs) (i.e.: America/Los_Angeles)
-
-Due to DST shifts such as PST to PDT, it is preferred that the GMT, UTC or Region/City timezone notation is used in favor of a direct three-letter ID (PST, PDT, BST, etc.). For example, America/Los_Angeles switches from PST to PDT at a DST shift. If used directly, PST will not handle the shift when time switches to PDT.
-
-Oozie Coordinator must provide a tool for developers to list all supported 
timezone identifiers.
-
-### 4.3. Timezones and Daylight-Saving
-
-While the Oozie coordinator engine works in a fixed timezone with no DST (typically `UTC`), it provides DST support for coordinator applications.
-
-The baseline datetime for datasets and coordinator applications is expressed in UTC. The baseline datetime is the time of the first occurrence.
-
-Datasets and coordinator applications also contain a timezone indicator.
-
-The use of UTC as the baseline enables a simple way of mixing and matching datasets and coordinator applications that use a different timezone by just adding the timezone offset.
-
-The timezone indicator enables Oozie coordinator engine to properly compute 
frequencies that are daylight-saving sensitive. For example: a daily frequency 
can be 23, 24 or 25 hours for timezones that observe daylight-saving. Weekly 
and monthly frequencies are also affected by this as the number of hours in the 
day may change.
-
-Section #7 'Handling Timezones and Daylight Saving Time' explains how 
coordinator applications can be written to handle timezones and 
daylight-saving-time properly.
-
-### 4.4. Frequency and Time-Period Representation
-
-Frequency is used to capture the periodic intervals at which datasets are produced and coordinator applications are scheduled to run.
-
-This time-period representation is also used to specify non-recurrent time-periods, for example a timeout interval.
-
-For datasets and coordinator applications the frequency time-period is applied 
`N` times to the baseline datetime to compute recurrent times.
-
-Frequency is always expressed in minutes.
-
-Because the number of minutes in a day may vary for timezones that observe daylight saving time, constants cannot be used to express frequencies greater than a day for datasets and coordinator applications in such timezones. For such use cases, Oozie coordinator provides 2 EL functions, `${coord:days(int n)}` and `${coord:months(int n)}`.
-
-Frequencies can be expressed using EL constants and EL functions that evaluate to a positive integer number.
-
-Coordinator Frequencies can also be expressed using cron syntax.
-
-**<font color="#008000"> Examples: </font>**
-
-| **EL Constant** | **Value** | **Example** |
-| --- | --- | --- |
-| `${coord:minutes(int n)}` | _n_ | `${coord:minutes(45)}` --> `45` |
-| `${coord:hours(int n)}` | _n * 60_ | `${coord:hours(3)}` --> `180` |
-| `${coord:days(int n)}` | _variable_ | `${coord:days(2)}` --> minutes in 2 
full days from the current date |
-| `${coord:months(int n)}` | _variable_ | `${coord:months(1)}` --> minutes in 1 full month from the current date |
-| `${cron syntax}` | _variable_ | `${0,10 15 * * 2-6}` --> a job that runs 
every weekday at 3:00pm and 3:10pm UTC time|
-
-Note that, though the `${coord:days(int n)}` and `${coord:months(int n)}` EL functions calculate minutes precisely (including variations due to daylight saving time) when used for frequency representation, one day is calculated as 24 hours and one month as 30 days for simplicity when they are specified for the coordinator timeout interval.
-
-#### 4.4.1. The coord:days(int n) and coord:endOfDays(int n) EL functions
-
-The `${coord:days(int n)}` and `${coord:endOfDays(int n)}` EL functions should 
be used to handle day based frequencies.
-
-Constant values should not be used to indicate a day based frequency (every 1 
day, every 1 week, etc) because the number of hours in
-every day is not always the same for timezones that observe daylight-saving 
time.
-
-It is good practice to always use these EL functions instead of a constant expression (i.e. `24 * 60`), even if the timezone for which the application is being written does not observe daylight saving time. This makes the application robust against legislation changes and also makes it portable across timezones.
-
-##### 4.4.1.1. The coord:days(int n) EL function
-
-The `${coord:days(int n)}` EL function returns the number of minutes for 'n' 
complete days starting with the day of the specified nominal time for which the 
computation is being done.
-
-The `${coord:days(int n)}` EL function includes **all** the minutes of the 
current day, regardless of the time of the day of the current nominal time.
-
-**<font color="#008000"> Examples: </font>**
-
-| **Starting Nominal UTC time** | **Timezone** | **Usage**  | **Value** | 
**First Occurrence** | **Comments** |
-| --- | --- | --- | --- | --- | --- |
-| `2009-01-01T08:00Z` | `UTC` | `${coord:days(1)}` | 1440 | 
`2009-01-01T08:00Z` | total minutes on 2009JAN01 UTC time |
-| `2009-01-01T08:00Z` | `America/Los_Angeles` | `${coord:days(1)}` | 1440 | 
`2009-01-01T08:00Z` | total minutes in 2009JAN01 PST8PDT time |
-| `2009-01-01T08:00Z` | `America/Los_Angeles` | `${coord:days(2)}` | 2880 | 
`2009-01-01T08:00Z` | total minutes in 2009JAN01 and 2009JAN02 PST8PDT time |
-| |||||
-| `2009-03-08T08:00Z` | `UTC` | `${coord:days(1)}` | 1440 | 
`2009-03-08T08:00Z` | total minutes on 2009MAR08 UTC time |
-| `2009-03-08T08:00Z` | `Europe/London` | `${coord:days(1)}` | 1440 | 
`2009-03-08T08:00Z` | total minutes in 2009MAR08 BST1BDT time |
-| `2009-03-08T08:00Z` | `America/Los_Angeles` | `${coord:days(1)}` | 1380 | 
`2009-03-08T08:00Z` | total minutes in 2009MAR08 PST8PDT time <br/> (2009MAR08 
is DST switch in the US) |
-| `2009-03-08T08:00Z` | `UTC` | `${coord:days(2)}` | 2880 | 
`2009-03-08T08:00Z` | total minutes in 2009MAR08 and 2009MAR09 UTC time |
-| `2009-03-08T08:00Z` | `America/Los_Angeles` | `${coord:days(2)}` | 2820 | 
`2009-03-08T08:00Z` | total minutes in 2009MAR08 and 2009MAR09 PST8PDT time 
<br/> (2009MAR08 is DST switch in the US) |
-| `2009-03-09T08:00Z` | `America/Los_Angeles` | `${coord:days(1)}` | 1440 | 
`2009-03-09T07:00Z` | total minutes in 2009MAR09 PST8PDT time <br/> (2009MAR08 
is DST ON, frequency tick is earlier in UTC) |
-
-For all these examples, the first occurrence of the frequency will be at 
`08:00Z` (UTC time).
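
The values in the table above can be reproduced with a short sketch using Python's `zoneinfo` zone rules (an illustration of the semantics only, not Oozie's implementation):

```python
# Illustrative sketch of ${coord:days(n)} semantics; not Oozie's implementation.
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def coord_days_minutes(nominal_utc, tz_name, n):
    """Minutes in n complete local days, starting at local midnight of the
    day containing the nominal time; DST shifts shorten or lengthen days."""
    tz = ZoneInfo(tz_name)
    utc = ZoneInfo("UTC")
    local = nominal_utc.replace(tzinfo=utc).astimezone(tz)
    start = local.replace(hour=0, minute=0)  # midnight of the current local day
    end = start + timedelta(days=n)          # same wall-clock time, n days later
    # Convert both ends to UTC so the subtraction reflects real elapsed time
    return int((end.astimezone(utc) - start.astimezone(utc)).total_seconds()) // 60

print(coord_days_minutes(datetime(2009, 1, 1, 8, 0), "UTC", 1))                  # 1440
print(coord_days_minutes(datetime(2009, 3, 8, 8, 0), "America/Los_Angeles", 1))  # 1380
print(coord_days_minutes(datetime(2009, 3, 8, 8, 0), "America/Los_Angeles", 2))  # 2820
```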
-
-##### 4.4.1.2. The coord:endOfDays(int n) EL function
-
-The `${coord:endOfDays(int n)}` EL function is identical to the 
`${coord:days(int n)}` except that it shifts the first occurrence to the end of 
the day for the specified timezone before computing the interval in minutes.
-
-**<font color="#008000"> Examples: </font>**
-
-| **Starting Nominal UTC time** | **Timezone** | **Usage**  | **Value** | 
**First Occurrence** | **Comments** |
-| --- | --- | --- | --- | --- | --- |
-| `2009-01-01T08:00Z` | `UTC` | `${coord:endOfDays(1)}` | 1440 | 
`2009-01-02T00:00Z` | first occurrence in 2009JAN02 00:00 UTC time, <br/> first 
occurrence shifted to the end of the UTC day |
-| `2009-01-01T08:00Z` | `America/Los_Angeles` | `${coord:endOfDays(1)}` | 1440 
| `2009-01-02T08:00Z` | first occurrence in 2009JAN02 08:00 UTC time, <br/> 
first occurrence shifted to the end of the PST8PDT day |
-| `2009-01-01T08:01Z` | `America/Los_Angeles` | `${coord:endOfDays(1)}` | 1440 
| `2009-01-02T08:00Z` | first occurrence in 2009JAN02 08:00 UTC time, <br/> 
first occurrence shifted to the end of the PST8PDT day |
-| `2009-01-01T18:00Z` | `America/Los_Angeles` | `${coord:endOfDays(1)}` | 1440 
| `2009-01-02T08:00Z` | first occurrence in 2009JAN02 08:00 UTC time, <br/> 
first occurrence shifted to the end of the PST8PDT day |
-| |||||
-| `2009-03-07T09:00Z` | `America/Los_Angeles` | `${coord:endOfDays(1)}` | 1380 
| `2009-03-08T08:00Z` | first occurrence in 2009MAR08 08:00 UTC time <br/> 
first occurrence shifted to the end of the PST8PDT day |
-| `2009-03-08T07:00Z` | `America/Los_Angeles` | `${coord:endOfDays(1)}` | 1440 
| `2009-03-08T08:00Z` | first occurrence in 2009MAR08 08:00 UTC time <br/> 
first occurrence shifted to the end of the PST8PDT day |
-| `2009-03-09T07:00Z` | `America/Los_Angeles` | `${coord:endOfDays(1)}` | 1440 
| `2009-03-10T07:00Z` | first occurrence in 2009MAR10 07:00 UTC time <br/> 
(2009MAR08 is DST switch in the US), <br/> first occurrence shifted to the end 
of the PST8PDT day |
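
The first-occurrence shift shown in these rows can be sketched the same way (an illustration under the same assumptions, not Oozie's implementation):

```python
# Illustrative sketch of the ${coord:endOfDays(n)} first-occurrence shift.
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def end_of_days_first_occurrence(nominal_utc, tz_name):
    """Shift a nominal UTC time forward to the next local midnight and
    return it as a UTC datetime string."""
    tz = ZoneInfo(tz_name)
    utc = ZoneInfo("UTC")
    local = nominal_utc.replace(tzinfo=utc).astimezone(tz)
    # The next local midnight, even when the nominal time is already midnight
    next_midnight = (local + timedelta(days=1)).replace(hour=0, minute=0)
    return next_midnight.astimezone(utc).strftime("%Y-%m-%dT%H:%MZ")

print(end_of_days_first_occurrence(datetime(2009, 1, 1, 8, 0),
                                   "America/Los_Angeles"))  # 2009-01-02T08:00Z
```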
-
-
-```
-<coordinator-app name="hello-coord" frequency="${coord:days(1)}"
-                  start="2009-01-02T08:00Z" end="2009-01-04T08:00Z" 
timezone="America/Los_Angeles"
-                 xmlns="uri:oozie:coordinator:0.5">
-      <controls>
-        <timeout>10</timeout>
-        <concurrency>${concurrency_level}</concurrency>
-        <execution>${execution_order}</execution>
-        <throttle>${materialization_throttle}</throttle>
-      </controls>
-
-      <datasets>
-       <dataset name="din" frequency="${coord:endOfDays(1)}"
-                initial-instance="2009-01-02T08:00Z" 
timezone="America/Los_Angeles">
-         
<uri-template>${baseFsURI}/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}</uri-template>
-        </dataset>
-       <dataset name="dout" frequency="${coord:minutes(30)}"
-                initial-instance="2009-01-02T08:00Z" timezone="UTC">
-         
<uri-template>${baseFsURI}/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}</uri-template>
-        </dataset>
-      </datasets>
-
-      <input-events>
-         <data-in name="input" dataset="din">
-                               <instance>${coord:current(0)}</instance>
-         </data-in>
-      </input-events>
-
-      <output-events>
-         <data-out name="output" dataset="dout">
-                               <instance>${coord:current(1)}</instance>
-         </data-out>
-      </output-events>
-
-      <action>
-        <workflow>
-          <app-path>${wf_app_path}</app-path>
-          <configuration>
-              <property>
-              <name>wfInput</name>
-              <value>${coord:dataIn('input')}</value>
-            </property>
-            <property>
-              <name>wfOutput</name>
-              <value>${coord:dataOut('output')}</value>
-            </property>
-         </configuration>
-       </workflow>
-      </action>
- </coordinator-app>
-```
-
-#### 4.4.2. The coord:months(int n) and coord:endOfMonths(int n) EL functions
-
-The `${coord:months(int n)}` and `${coord:endOfMonths(int n)}` EL functions 
should be used to handle month based frequencies.
-
-Constant values cannot be used to indicate a month based frequency because the number of days in a month changes month to month and on leap years; plus, the number of hours in every day of the month is not always the same for timezones that observe daylight-saving time.
-
-##### 4.4.2.1. The coord:months(int n) EL function
-
-The `${coord:months(int n)}` EL function returns the number of minutes for 'n' 
complete months starting with the month of the current nominal time for which 
the computation is being done.
-
-The `${coord:months(int n)}` EL function includes **all** the minutes of the 
current month, regardless of the day of the month of the current nominal time.
-
-**<font color="#008000"> Examples: </font>**
-
-| **Starting Nominal UTC time** | **Timezone** | **Usage**  | **Value** | 
**First Occurrence** | **Comments** |
-| --- | --- | --- | --- | --- | --- |
-| `2009-01-01T08:00Z` | `UTC` | `${coord:months(1)}` | 44640 | 
`2009-01-01T08:00Z` |total minutes for 2009JAN UTC time |
-| `2009-01-01T08:00Z` | `America/Los_Angeles` | `${coord:months(1)}` | 44640 | 
`2009-01-01T08:00Z` | total minutes in 2009JAN PST8PDT time |
-| `2009-01-01T08:00Z` | `America/Los_Angeles` | `${coord:months(2)}` | 84960 | 
`2009-01-01T08:00Z` | total minutes in 2009JAN and 2009FEB PST8PDT time |
-| |||||
-| `2009-03-08T08:00Z` | `UTC` | `${coord:months(1)}` | 44640 | 
`2009-03-08T08:00Z` | total minutes on 2009MAR UTC time |
-| `2009-03-08T08:00Z` | `Europe/London` | `${coord:months(1)}` | 44580 | 
`2009-03-08T08:00Z` | total minutes in 2009MAR BST1BDT time <br/> (2009MAR29 is 
DST switch in Europe) |
-| `2009-03-08T08:00Z` | `America/Los_Angeles` | `${coord:months(1)}` | 44580 | 
`2009-03-08T08:00Z` | total minutes in 2009MAR PST8PDT time <br/> (2009MAR08 is 
DST switch in the US) |
-| `2009-03-08T08:00Z` | `UTC` | `${coord:months(2)}` | 87840 | 
`2009-03-08T08:00Z` | total minutes in 2009MAR and 2009APR UTC time |
-| `2009-03-08T08:00Z` | `America/Los_Angeles` | `${coord:months(2)}` | 87780 | 
`2009-03-08T08:00Z` | total minutes in 2009MAR and 2009APR PST8PDT time <br/> 
(2009MAR08 is DST switch in US) |
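
The month values in the table can be reproduced with a sketch analogous to the one for days (an illustration of the semantics only, not Oozie's implementation):

```python
# Illustrative sketch of ${coord:months(n)} semantics; not Oozie's implementation.
from datetime import datetime
from zoneinfo import ZoneInfo

def coord_months_minutes(nominal_utc, tz_name, n):
    """Minutes in n complete local months, starting at the first local
    midnight of the month containing the nominal time."""
    tz = ZoneInfo(tz_name)
    utc = ZoneInfo("UTC")
    local = nominal_utc.replace(tzinfo=utc).astimezone(tz)
    start = local.replace(day=1, hour=0, minute=0)
    month0 = start.month - 1 + n   # advance n calendar months in local time
    end = start.replace(year=start.year + month0 // 12, month=month0 % 12 + 1)
    # Convert both ends to UTC so DST shifts inside the span are counted
    return int((end.astimezone(utc) - start.astimezone(utc)).total_seconds()) // 60

print(coord_months_minutes(datetime(2009, 1, 1, 8, 0), "UTC", 1))                  # 44640
print(coord_months_minutes(datetime(2009, 3, 8, 8, 0), "America/Los_Angeles", 1))  # 44580
```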
-
-##### 4.4.2.2. The coord:endOfMonths(int n) EL function
-
-The `${coord:endOfMonths(int n)}` EL function is identical to the 
`${coord:months(int n)}` except that it shifts the first occurrence to the end 
of the month for the specified timezone before computing the interval in 
minutes.
-
-**<font color="#008000"> Examples: </font>**
-
-| **Starting Nominal UTC time** | **Timezone** | **Usage**  | **Value** | 
**First Occurrence** | **Comments** |
-| --- | --- | --- | --- | --- | --- |
-| `2009-01-01T00:00Z` | `UTC` | `${coord:endOfMonths(1)}` | 40320 | 
`2009-02-01T00:00Z` | first occurrence in 2009FEB 00:00 UTC time |
-| `2009-01-01T08:00Z` | `UTC` | `${coord:endOfMonths(1)}` | 40320 | 
`2009-02-01T00:00Z` | first occurrence in 2009FEB 00:00 UTC time |
-| `2009-01-31T08:00Z` | `UTC` | `${coord:endOfMonths(1)}` | 40320 | 
`2009-02-01T00:00Z` | first occurrence in 2009FEB 00:00 UTC time |
-| `2009-01-01T08:00Z` | `America/Los_Angeles` | `${coord:endOfMonths(1)}` | 
40320 | `2009-02-01T08:00Z` | first occurrence in 2009FEB 08:00 UTC time |
-| `2009-02-02T08:00Z` | `America/Los_Angeles` | `${coord:endOfMonths(1)}` | 
44580  | `2009-03-01T08:00Z` | first occurrence in 2009MAR 08:00 UTC time |
-| `2009-02-01T08:00Z` | `America/Los_Angeles` | `${coord:endOfMonths(1)}` | 
44580  | `2009-03-01T08:00Z` | first occurrence in 2009MAR 08:00 UTC time |
-
-
-```
-<coordinator-app name="hello-coord" frequency="${coord:months(1)}"
-                  start="2009-01-02T08:00Z" end="2009-04-02T08:00Z" 
timezone="America/Los_Angeles"
-                 xmlns="uri:oozie:coordinator:0.5">
-      <controls>
-        <timeout>10</timeout>
-        <concurrency>${concurrency_level}</concurrency>
-        <execution>${execution_order}</execution>
-        <throttle>${materialization_throttle}</throttle>
-      </controls>
-
-      <datasets>
-       <dataset name="din" frequency="${coord:endOfMonths(1)}"
-                initial-instance="2009-01-02T08:00Z" 
timezone="America/Los_Angeles">
-         
<uri-template>${baseFsURI}/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}</uri-template>
-        </dataset>
-       <dataset name="dout" frequency="${coord:minutes(30)}"
-                initial-instance="2009-01-02T08:00Z" timezone="UTC">
-         
<uri-template>${baseFsURI}/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}</uri-template>
-        </dataset>
-      </datasets>
-
-      <input-events>
-         <data-in name="input" dataset="din">
-                               <instance>${coord:current(0)}</instance>
-         </data-in>
-      </input-events>
-
-      <output-events>
-         <data-out name="output" dataset="dout">
-                               <instance>${coord:current(1)}</instance>
-         </data-out>
-      </output-events>
-
-      <action>
-        <workflow>
-          <app-path>${wf_app_path}</app-path>
-          <configuration>
-              <property>
-              <name>wfInput</name>
-              <value>${coord:dataIn('input')}</value>
-            </property>
-            <property>
-              <name>wfOutput</name>
-              <value>${coord:dataOut('output')}</value>
-            </property>
-         </configuration>
-       </workflow>
-      </action>
- </coordinator-app>
-```
-
-#### 4.4.3. The coord:endOfWeeks(int n) EL function
-
-The `${coord:endOfWeeks(int n)}` EL function shifts the first occurrence to the start of the week for the specified timezone before computing the interval in minutes. The start of the week depends on Java's implementation of [Calendar.getFirstDayOfWeek()](https://docs.oracle.com/javase/8/docs/api/java/util/Calendar.html#getFirstDayOfWeek--), i.e. the first day of the week is SUNDAY in the U.S. and MONDAY in France.
-
-**<font color="#008000"> Examples: </font>**
-
-| **Starting Nominal UTC time** | **Timezone** | **Usage**  | **Value** | 
**First Occurrence** | **Comments** |
-| --- | --- | --- | --- | --- | --- |
| `2017-01-04T00:00Z` | `UTC` | `${coord:endOfWeeks(1)}` | 10080 | `2017-01-08T00:00Z` | first occurrence on 2017JAN08 00:00 UTC time |
-| `2017-01-04T08:00Z` | `UTC` | `${coord:endOfWeeks(1)}` | 10080 | 
`2017-01-08T08:00Z` | first occurrence on 2017JAN08 08:00 UTC time |
-| `2017-01-06T08:00Z` | `UTC` | `${coord:endOfWeeks(1)}` | 10080 | 
`2017-01-08T08:00Z` | first occurrence on 2017JAN08 08:00 UTC time |
-| `2017-01-04T08:00Z` | `America/Los_Angeles` | `${coord:endOfWeeks(1)}` | 
10080 | `2017-01-08T08:00Z` | first occurrence in 2017JAN08 08:00 UTC time |
-| `2017-01-06T08:00Z` | `America/Los_Angeles` | `${coord:endOfWeeks(1)}` | 
10080 | `2017-01-08T08:00Z` | first occurrence in 2017JAN08 08:00 UTC time |
-
-
-```
-<coordinator-app name="hello-coord" frequency="${coord:endOfWeeks(1)}"
-                  start="2017-01-04T08:00Z" end="2017-12-31T08:00Z" 
timezone="America/Los_Angeles"
-                 xmlns="uri:oozie:coordinator:0.5">
-      <controls>
-        <timeout>10</timeout>
-        <concurrency>${concurrency_level}</concurrency>
-        <execution>${execution_order}</execution>
-        <throttle>${materialization_throttle}</throttle>
-      </controls>
-
-      <datasets>
-       <dataset name="din" frequency="${coord:endOfWeeks(1)}"
-                initial-instance="2017-01-01T08:00Z" 
timezone="America/Los_Angeles">
-         <uri-template>${baseFsURI}/${YEAR}/${MONTH}/${DAY}</uri-template>
-        </dataset>
-       <dataset name="dout" frequency="${coord:endOfWeeks(1)}"
-                initial-instance="2017-01-01T08:00Z" timezone="UTC">
-         <uri-template>${baseFsURI}/${YEAR}/${MONTH}/${DAY}</uri-template>
-        </dataset>
-      </datasets>
-
-      <input-events>
-         <data-in name="input" dataset="din">
-            <instance>${coord:current(0)}</instance>
-         </data-in>
-      </input-events>
-
-      <output-events>
-         <data-out name="output" dataset="dout">
-            <instance>${coord:current(1)}</instance>
-         </data-out>
-      </output-events>
-
-      <action>
-        <workflow>
-          <app-path>${wf_app_path}</app-path>
-          <configuration>
-              <property>
-              <name>wfInput</name>
-              <value>${coord:dataIn('input')}</value>
-            </property>
-            <property>
-              <name>wfOutput</name>
-              <value>${coord:dataOut('output')}</value>
-            </property>
-         </configuration>
-       </workflow>
-      </action>
- </coordinator-app>
-```
-
-#### 4.4.4. Cron syntax in coordinator frequency
-
-Oozie has historically allowed only very basic forms of scheduling: You could 
choose
-to run jobs separated by a certain number of minutes, hours, days or weeks. 
That's
-all. This works fine for processes that need to run continuously all year like 
building
-a search index to power an online website.
-
-However, there are a lot of cases that don't fit this model. For example, 
maybe you
-want to export data to a reporting system used during the day by business 
analysts.
-It would be wasteful to run the jobs when no analyst is going to take 
advantage of
-the new information, such as overnight. You might want a policy that says 
"only run
-these jobs on weekdays between 6AM and 8PM". Previous versions of Oozie didn't 
support
-this kind of complex scheduling policy without requiring multiple identical 
coordinators.
-Cron-scheduling improves the user experience in this area, allowing for a lot 
more flexibility.
-
-Cron is a standard time-based job scheduling mechanism in unix-like operating systems. It is used extensively by system administrators to set up jobs and maintain software environments. Cron syntax generally consists of five fields: minutes, hours, day of month, month, and day of week, respectively, although multiple variations do exist.
-
-
-```
-<coordinator-app name="cron-coord" frequency="0/10 1/2 * * *" start="${start}" end="${end}" timezone="UTC"
-                 xmlns="uri:oozie:coordinator:0.2">
-        <action>
-        <workflow>
-            <app-path>${workflowAppUri}</app-path>
-            <configuration>
-                <property>
-                    <name>jobTracker</name>
-                    <value>${jobTracker}</value>
-                </property>
-                <property>
-                    <name>nameNode</name>
-                    <value>${nameNode}</value>
-                </property>
-                <property>
-                    <name>queueName</name>
-                    <value>${queueName}</value>
-                </property>
-            </configuration>
-        </workflow>
-    </action>
-</coordinator-app>
-```
-
-Cron expressions consist of 5 required fields, described respectively as follows:
-
-| **Field name** | **Allowed Values** | **Allowed Special Characters**  |
-| --- | --- | --- |
-| `Minutes` | `0-59` | , - * / |
-| `Hours` | `0-23` | , - * / |
-| `Day-of-month` | `1-31` | , - * ? / L W |
-| `Month` | `1-12 or JAN-DEC` | , - * / |
-| `Day-of-Week` | `1-7 or SUN-SAT` | , - * ? / L #|
-
-The '*' character is used to specify all values. For example, "*" in the minute field means "every minute".
-
-The '?' character is allowed for the day-of-month and day-of-week fields. It 
is used to specify 'no specific value'.
-This is useful when you need to specify something in one of the two fields, 
but not the other.
-
-The '-' character is used to specify ranges. For example, "10-12" in the hour field means "the hours 10, 11 and 12".
-
-The ',' character is used to specify additional values. For example 
"MON,WED,FRI" in the day-of-week field means
-"the days Monday, Wednesday, and Friday".
-
-The '/' character is used to specify increments. For example "0/15" in the minutes field means "the minutes 0, 15, 30, and 45",
-and "5/15" in the minutes field means "the minutes 5, 20, 35, and 50". Specifying '*' before the '/' is equivalent to
-specifying 0 as the starting value.
-Essentially, for each field in the expression, there is a set of numbers that can be turned on or off.
-For minutes, the numbers range from 0 to 59. For hours 0 to 23, for days of the month 1 to 31, and for months 1 to 12.
-The "/" character simply helps you turn on every "nth" value in the given set, starting from the value before the '/'.
-Thus "7/6" in the month field only turns on month "7"; it does NOT mean every 6th month. Please note that subtlety.
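The increment semantics above can be sketched in Python. This is a hypothetical helper for numeric fields only (named values like `JAN` or `MON` and the special characters `? L W #` are not handled); Oozie itself delegates parsing to its Quartz-derived `CronExpression`:

```python
# Sketch (not part of Oozie): expand one numeric cron field into the set of
# values it turns on. Handles '*', 'a', 'a-b', comma lists, and '/step'.
def expand_field(expr, lo, hi):
    values = set()
    for part in expr.split(","):
        step = 1
        if "/" in part:
            part, step_s = part.split("/")
            step = int(step_s)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            a, b = part.split("-")
            start, end = int(a), int(b)
        else:
            # A bare value before '/' starts there and runs to the field max.
            start, end = int(part), hi
            if step == 1:
                end = start  # plain single value, no increment
        values.update(range(start, end + 1, step))
    return sorted(values)
```

Note how `expand_field("7/6", 1, 12)` yields only `[7]`: the next candidate, 13, falls outside the month range, which is exactly the subtlety described above.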
-
-The 'L' character is allowed for the day-of-month and day-of-week fields. This 
character is short-hand for "last",
-but it has different meaning in each of the two fields.
-For example, the value "L" in the day-of-month field means "the last day of 
the month" - day 31 for January, day 28 for
-February on non-leap years.
-If used in the day-of-week field by itself, it simply means "7" or "SAT".
-But if used in the day-of-week field after another value, it means "the last 
xxx day of the month" - for example
-"6L" means "the last Friday of the month".
-You can also specify an offset from the last day of the month, such as "L-3" 
which would mean the third-to-last day of the
-calendar month.
-When using the 'L' option, it is important not to specify lists, or ranges of 
values, as you'll get confusing/unexpected results.
-
-The 'W' character is allowed for the day-of-month field. This character is 
used to specify the weekday (Monday-Friday)
-nearest the given day.
-As an example, if you were to specify "15W" as the value for the day-of-month 
field, the meaning is:
-"the nearest weekday to the 15th of the month". So if the 15th is a Saturday, 
the trigger will fire on Friday the 14th.
-If the 15th is a Sunday, the trigger will fire on Monday the 16th. If the 15th 
is a Tuesday, then it will fire on Tuesday the 15th.
-However if you specify "1W" as the value for day-of-month, and the 1st is a 
Saturday, the trigger will fire on Monday the 3rd,
-as it will not 'jump' over the boundary of a month's days.
-The 'W' character can only be specified when the day-of-month is a single day, 
not a range or list of days.
-
-The 'L' and 'W' characters can also be combined for the day-of-month 
expression to yield 'LW', which translates to
-"last weekday of the month".
-
-The '#' character is allowed for the day-of-week field. This character is used 
to specify "the nth" XXX day of the month.
-For example, the value of "6#3" in the day-of-week field means the third 
Friday of the month (day 6 = Friday and "#3" =
-the 3rd one in the month).
-Other examples: "2#1" = the first Monday of the month and "4#5" = the fifth 
Wednesday of the month.
-Note that if you specify "#5" and there is not 5 of the given day-of-week in 
the month, then no firing will occur that month.
-If the '#' character is used, there can only be one expression in the 
day-of-week field ("3#1,6#3" is not valid,
-since there are two expressions).
-
-The legal characters and the names of months and days of the week are not case 
sensitive.
-
-If a user specifies invalid cron syntax, for example "0 10 30 2 *" to run something on Feb 30th, the coordinator job
-will not be created and an invalid coordinator frequency parse exception will be thrown.
-
-If a user has a coordinator job that materializes no action during run time, for example a frequency of "0 10 * * *" with
-a start time of 2013-10-18T21:00Z and end time of 2013-10-18T22:00Z, the coordinator job submission will be rejected and
-an invalid coordinator attribute exception will be thrown.
-
-**<font color="#008000"> Examples: </font>**
-
-| **Cron Expression** | **Meaning** |
-| --- | --- |
-| `10 9 * * *` | Runs every day at 9:10am |
-| `10,30,45 9 * * *` | Runs every day at 9:10am, 9:30am, and 9:45am |
-| `0 * 30 JAN 2-6` | Runs at minute 0 of every hour on weekdays and on the 30th of January |
-| `0/20 9-17 * * 2-5` | Runs every Mon, Tue, Wed, and Thu at minutes 0, 20, 40 from 9am to 5pm |
-| `1 2 L-3 * *` | Runs on the third-to-last day of every month at 2:01am |
-| `1 2 6W 3 ?` | Runs on the nearest weekday to March 6th every year at 2:01am |
-| `1 2 * 3 3#2` | Runs on the second Tuesday of March at 2:01am every year |
-| `0 10,13 * * MON-FRI` | Runs every weekday at 10am and 1pm |
-
-
-NOTES:
-
-    Cron expressions and syntax in Oozie are inspired by Quartz:
-    http://quartz-scheduler.org/api/2.0.0/org/quartz/CronExpression.html.
-    However, there is a major difference between Quartz cron and Oozie cron: Oozie cron doesn't have a "Seconds" field,
-    since everything in Oozie functions at minute granularity at most. Everything related to Oozie cron syntax should be
-    based on this documentation.
-
-    Cron expressions use the Oozie server processing timezone. Since the default Oozie processing timezone is UTC, if you
-    want to run a job every weekday at 10am in Tokyo, Japan (UTC+9), your cron expression should be "0 1 * * 2-6" instead
-    of the "0 10 * * 2-6" you might expect.
-
-    Overflowing ranges are supported but strongly discouraged - that is, having a larger number on the left hand side
-    than the right. You might use 22-2 to catch 10 o'clock at night until 2 o'clock in the morning, or NOV-FEB.
-    It is very important to note that overuse of overflowing ranges creates ranges that don't make sense, and no effort
-    has been made to determine which interpretation CronExpression chooses. An example would be "0 14-6 ? * FRI-MON".
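The timezone adjustment in the note above is plain modular arithmetic on the hour field. A minimal sketch (the helper name is mine, and it deliberately ignores the case where the shift would also move the day-of-week fields across midnight):

```python
# Sketch (not part of Oozie): convert the hour field of a cron expression from
# a local timezone to the Oozie server processing timezone (UTC by default).
# Only the hour is adjusted; day rollover across midnight is not handled.
def shift_cron_hour(local_hour, utc_offset_hours):
    return (local_hour - utc_offset_hours) % 24

# 10am in Tokyo (UTC+9) must be scheduled as hour 1 on a UTC Oozie server,
# matching the "0 1 * * 2-6" expression in the note above.
```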
-
-## 5. Dataset
-
-A dataset is a collection of data referred to by a logical name.
-
-A dataset instance is a particular occurrence of a dataset and it is 
represented by a unique set of URIs. A dataset instance can be individually 
referred. Dataset instances for datasets containing ranges are identified by a 
set of unique URIs, otherwise a dataset instance is identified by a single 
unique URI.
-
-Datasets are typically defined in some central place for a business domain and 
can be accessed by the coordinator. Because of this, they can be defined once 
and used many times.
-
-A dataset is a synchronous (produced at regular time intervals, it has an 
expected frequency) input.
-
-A dataset instance is considered to be immutable while it is being consumed by 
coordinator jobs.
-
-### 5.1. Synchronous Datasets
-
-Instances of synchronous datasets are produced at regular time intervals, at 
an expected frequency. They are also referred to as "clocked datasets".
-
-Synchronous dataset instances are identified by their nominal creation time. 
The nominal creation time is normally specified in the dataset instance URI.
-
-A synchronous dataset definition contains the following information:
-
-   * **<font color="#0000ff"> name: </font>** The dataset name. It must be a 
valid Java identifier.
-   * **<font color="#0000ff"> frequency: </font>** It represents the rate, in minutes, at which data is _periodically_ created. The granularity is in minutes and can be expressed using EL expressions, for example: ${5 * HOUR}.
-   * **<font color="#0000ff"> initial-instance: </font>** The UTC datetime of 
the initial instance of the dataset. The initial-instance also provides the 
baseline datetime to compute instances of the dataset using multiples of the 
frequency.
-   * **<font color="#0000ff"> timezone:</font>** The timezone of the dataset.
-   * **<font color="#0000ff"> uri-template:</font>** The URI template that 
identifies the dataset and can be resolved into concrete URIs to identify a 
particular dataset instance. The URI template is constructed using:
-      * **<font color="#0000ff"> constants </font>** See the allowable EL Time 
Constants below. Ex: ${YEAR}/${MONTH}.
-      * **<font color="#0000ff"> variables </font>** Variables must be resolved at the time a coordinator job is submitted to the coordinator engine. They are normally provided as job parameters (configuration properties). Ex: ${market}/${language}
-   * **<font color="#0000ff"> done-flag:</font>** This flag denotes when a 
dataset instance is ready to be consumed.
-      * If the done-flag is omitted the coordinator will wait for the presence 
of a _SUCCESS file in the directory (Note: MapReduce jobs create this on 
successful completion automatically).
-      * If the done-flag is present but empty, then the existence of the 
directory itself indicates that the dataset is ready.
-      * If the done-flag is present but non-empty, Oozie will check for the presence of the named file within the directory, and the dataset instance will be considered ready (done) when the file exists.
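The three done-flag cases reduce to a single existence check. A minimal sketch (the function is a hypothetical stand-in for Oozie's HDFS checks, shown here against a local filesystem):

```python
import os

# Sketch (not Oozie's actual implementation): decide whether a dataset
# instance directory is ready, following the three done-flag rules above.
def instance_is_ready(instance_dir, done_flag=None):
    """done_flag None  -> flag omitted in the XML (defaults to '_SUCCESS');
    done_flag ''    -> the directory's existence alone marks readiness;
    anything else   -> that file must exist inside the directory."""
    if done_flag is None:
        done_flag = "_SUCCESS"
    if done_flag == "":
        return os.path.isdir(instance_dir)
    return os.path.isfile(os.path.join(instance_dir, done_flag))
```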
-
-The following EL constants can be used within synchronous dataset URI 
templates:
-
-| **EL Constant** | **Resulting Format** | **Comments**  |
-| --- | --- | --- |
-| `YEAR` | _YYYY_ | 4 digits representing the year |
-| `MONTH` | _MM_ | 2 digits representing the month of the year, January = 1 |
-| `DAY` | _DD_ | 2 digits representing the day of the month |
-| `HOUR` | _HH_ | 2 digits representing the hour of the day, in 24 hour 
format, 0 - 23 |
-| `MINUTE` | _mm_ | 2 digits representing the minute of the hour, 0 - 59 |
-
-**<font color="#800080">Syntax: </font>**
-
-
-```
-  <dataset name="[NAME]" frequency="[FREQUENCY]"
-           initial-instance="[DATETIME]" timezone="[TIMEZONE]">
-    <uri-template>[URI TEMPLATE]</uri-template>
-    <done-flag>[FILE NAME]</done-flag>
-  </dataset>
-```
-
-IMPORTANT: The values of the EL constants in the dataset URIs (in HDFS) are 
expected in UTC. Oozie Coordinator takes care of the timezone conversion when 
performing calculations.
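The EL time constants substitute into the template as zero-padded fields of the instance's nominal UTC time. A quick sketch (a hypothetical helper; the real resolution is performed by the Oozie server, including the timezone conversion noted above):

```python
from datetime import datetime

# Sketch (not Oozie code): substitute the EL time constants of a dataset
# URI template with the zero-padded fields of a nominal UTC datetime.
def resolve_template(template, nominal):
    subs = {
        "${YEAR}": "%04d" % nominal.year,
        "${MONTH}": "%02d" % nominal.month,
        "${DAY}": "%02d" % nominal.day,
        "${HOUR}": "%02d" % nominal.hour,
        "${MINUTE}": "%02d" % nominal.minute,
    }
    for constant, value in subs.items():
        template = template.replace(constant, value)
    return template
```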
-
-**<font color="#008000"> Examples: </font>**
-
-1. **A dataset produced once every day at 00:15 PST8PDT and done-flag is set 
to empty:**
-
-
-    ```
-      <dataset name="logs" frequency="${coord:days(1)}"
-               initial-instance="2009-02-15T08:15Z" 
timezone="America/Los_Angeles">
-        <uri-template>
-          hdfs://foo:8020/app/logs/${market}/${YEAR}${MONTH}/${DAY}/data
-        </uri-template>
-        <done-flag></done-flag>
-      </dataset>
-    ```
-
-
-    The dataset would resolve to the following URIs, and the Coordinator looks for the existence of the directory itself:
-
-
-    ```
-      [market] will be replaced with the user-given property.
-
-      hdfs://foo:8020/app/logs/[market]/200902/15/data
-      hdfs://foo:8020/app/logs/[market]/200902/16/data
-      hdfs://foo:8020/app/logs/[market]/200902/17/data
-      ...
-    ```
-
-
-2. **A dataset available on the 10th of each month and done-flag is default 
'_SUCCESS':**
-
-
-    ```
-      <dataset name="stats" frequency="${coord:months(1)}"
-               initial-instance="2009-01-10T10:00Z" 
timezone="America/Los_Angeles">
-        
<uri-template>hdfs://foo:8020/usr/app/stats/${YEAR}/${MONTH}/data</uri-template>
-      </dataset>
-    ```
-
-    The dataset would resolve to the following URIs:
-
-
-    ```
-      hdfs://foo:8020/usr/app/stats/2009/01/data
-      hdfs://foo:8020/usr/app/stats/2009/02/data
-      hdfs://foo:8020/usr/app/stats/2009/03/data
-      ...
-    ```
-
-    The dataset instances are not ready until '_SUCCESS' exists in each path:
-
-
-    ```
-      hdfs://foo:8020/usr/app/stats/2009/01/data/_SUCCESS
-      hdfs://foo:8020/usr/app/stats/2009/02/data/_SUCCESS
-      hdfs://foo:8020/usr/app/stats/2009/03/data/_SUCCESS
-      ...
-    ```
-
-
-3. **A dataset available at the end of every quarter and done-flag is 
'trigger.dat':**
-
-
-    ```
-      <dataset name="stats" frequency="${coord:months(3)}"
-               initial-instance="2009-01-31T20:00Z" 
timezone="America/Los_Angeles">
-        <uri-template>
-          hdfs://foo:8020/usr/app/stats/${YEAR}/${MONTH}/data
-        </uri-template>
-        <done-flag>trigger.dat</done-flag>
-      </dataset>
-    ```
-
-    The dataset would resolve to the following URIs:
-
-
-    ```
-      hdfs://foo:8020/usr/app/stats/2009/01/data
-      hdfs://foo:8020/usr/app/stats/2009/04/data
-      hdfs://foo:8020/usr/app/stats/2009/07/data
-      ...
-    ```
-
-    The dataset instances are not ready until 'trigger.dat' exists in each 
path:
-
-
-    ```
-      hdfs://foo:8020/usr/app/stats/2009/01/data/trigger.dat
-      hdfs://foo:8020/usr/app/stats/2009/04/data/trigger.dat
-      hdfs://foo:8020/usr/app/stats/2009/07/data/trigger.dat
-      ...
-    ```
-
-
-4. **Normally the URI template of a dataset has a precision similar to the 
frequency:**
-
-
-    ```
-      <dataset name="logs" frequency="${coord:days(1)}"
-               initial-instance="2009-01-01T10:30Z" 
timezone="America/Los_Angeles">
-        <uri-template>
-          hdfs://foo:8020/usr/app/logs/${YEAR}/${MONTH}/${DAY}/data
-        </uri-template>
-      </dataset>
-    ```
-
-    The dataset would resolve to the following URIs:
-
-
-    ```
-      hdfs://foo:8020/usr/app/logs/2009/01/01/data
-      hdfs://foo:8020/usr/app/logs/2009/01/02/data
-      hdfs://foo:8020/usr/app/logs/2009/01/03/data
-      ...
-    ```
-
-5. **However, if the URI template has a finer precision than the dataset 
frequency:**
-
-
-    ```
-      <dataset name="logs" frequency="${coord:days(1)}"
-               initial-instance="2009-01-01T10:30Z" 
timezone="America/Los_Angeles">
-        <uri-template>
-          
hdfs://foo:8020/usr/app/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}/data
-        </uri-template>
-      </dataset>
-    ```
-
-    The dataset resolves to the following URIs with fixed values for the finer 
precision template variables:
-
-
-    ```
-      hdfs://foo:8020/usr/app/logs/2009/01/01/10/30/data
-      hdfs://foo:8020/usr/app/logs/2009/01/02/10/30/data
-      hdfs://foo:8020/usr/app/logs/2009/01/03/10/30/data
-      ...
-    ```
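The instance URIs in the examples above are just the initial-instance time advanced by whole multiples of the frequency, substituted into the template. A sketch under that assumption for a daily dataset (ignoring DST handling, which Oozie performs using the dataset timezone; the helper name is mine):

```python
from datetime import datetime, timedelta

# Sketch (ignores DST; Oozie uses the dataset timezone for that): materialize
# the first n instance URIs of a daily dataset from its initial-instance.
def daily_instance_uris(template, initial, n):
    uris = []
    for i in range(n):
        t = initial + timedelta(days=i)
        uris.append(template
                    .replace("${YEAR}", "%04d" % t.year)
                    .replace("${MONTH}", "%02d" % t.month)
                    .replace("${DAY}", "%02d" % t.day)
                    .replace("${HOUR}", "%02d" % t.hour)
                    .replace("${MINUTE}", "%02d" % t.minute))
    return uris
```

Run against example 5's template and initial-instance, this reproduces the fixed `10/30` hour and minute components in every resolved URI.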
-
-### 5.2. Dataset URI-Template types
-
-Each dataset URI could be an HDFS path URI denoting an HDFS directory: `hdfs://foo:8020/usr/logs/20090415` or an
-HCatalog partition URI identifying a set of table partitions: `hcat://bar:8020/logsDB/logsTable/dt=20090415;region=US`.
-
-HCatalog enables table and storage management for Pig, Hive and MapReduce. The format to specify an HCatalog table partition URI is
-`hcat://[metastore server]:[port]/[database name]/[table name]/[partkey1]=[value];[partkey2]=[value];...`
-
-For example,
-
-```
-  <dataset name="logs" frequency="${coord:days(1)}"
-           initial-instance="2009-02-15T08:15Z" timezone="America/Los_Angeles">
-    <uri-template>
-      
hcat://myhcatmetastore:9080/database1/table1/myfirstpartitionkey=myfirstvalue;mysecondpartitionkey=mysecondvalue
-    </uri-template>
-    <done-flag></done-flag>
-  </dataset>
-```
-
-### 5.3. Asynchronous Datasets
-   * TBD
-
-### 5.4. Dataset Definitions
-
-Dataset definitions are grouped in XML files.
-**IMPORTANT:** Please note that if an XML namespace version is specified for the coordinator-app element in the coordinator.xml file, no namespace needs to be defined separately for the datasets element (even if the dataset is defined in a separate file). Specifying it in multiple places might result in XML errors when submitting the coordinator job.
-
-**<font color="#800080">Syntax: </font>**
-
-
-```
- <!-- Synchronous datasets -->
-<datasets>
-  <include>[SHARED_DATASETS]</include>
-  ...
-  <dataset name="[NAME]" frequency="[FREQUENCY]"
-           initial-instance="[DATETIME]" timezone="[TIMEZONE]">
-    <uri-template>[URI TEMPLATE]</uri-template>
-  </dataset>
-  ...
-</datasets>
-```
-
-**<font color="#008000"> Example: </font>**
-
-
-```
-<datasets>
-
-  <include>hdfs://foo:8020/app/dataset-definitions/globallogs.xml</include>
-
-  <dataset name="logs" frequency="${coord:hours(12)}"
-           initial-instance="2009-02-15T08:15Z" timezone="America/Los_Angeles">
-    <uri-template>
-    hdfs://foo:8020/app/logs/${market}/${YEAR}${MONTH}/${DAY}/${HOUR}/${MINUTE}/data
-    </uri-template>
-  </dataset>
-
-  <dataset name="stats" frequency="${coord:months(1)}"
-           initial-instance="2009-01-10T10:00Z" timezone="America/Los_Angeles">
-    <uri-template>hdfs://foo:8020/usr/app/stats/${YEAR}/${MONTH}/data</uri-template>
-  </dataset>
-
-</datasets>
-```
-
-## 6. Coordinator Application
-
-### 6.1. Concepts
-
-#### 6.1.1. Coordinator Application
-
-A coordinator application is a program that triggers actions (commonly 
workflow jobs) when a set of conditions are met. Conditions can be a time 
frequency, the availability of new dataset instances or other external events.
-
-Types of coordinator applications:
-
-   * **Synchronous:** Its coordinator actions are created at specified time 
intervals.
-
-Coordinator applications are normally parameterized.
-
-#### 6.1.2. Coordinator Job
-
-To create a coordinator job, a job configuration that resolves all coordinator 
application parameters must be provided to the coordinator engine.
-
-A coordinator job is a running instance of a coordinator application, executing from a start time to an end time. The start
-time must be earlier than the end time.
-
-At any time, a coordinator job is in one of the following statuses: **PREP, RUNNING, RUNNINGWITHERROR, PREPSUSPENDED, SUSPENDED, SUSPENDEDWITHERROR, PREPPAUSED, PAUSED, PAUSEDWITHERROR, SUCCEEDED, DONEWITHERROR, KILLED, FAILED**.
-
-Valid coordinator job status transitions are:
-
-   * **PREP --> PREPSUSPENDED | PREPPAUSED | RUNNING | KILLED**
-   * **RUNNING --> RUNNINGWITHERROR | SUSPENDED | PAUSED | SUCCEEDED | KILLED**
-   * **RUNNINGWITHERROR --> RUNNING | SUSPENDEDWITHERROR | PAUSEDWITHERROR | 
DONEWITHERROR | KILLED | FAILED**
-   * **PREPSUSPENDED --> PREP | KILLED**
-   * **SUSPENDED --> RUNNING | KILLED**
-   * **SUSPENDEDWITHERROR --> RUNNINGWITHERROR | KILLED**
-   * **PREPPAUSED --> PREP | KILLED**
-   * **PAUSED --> SUSPENDED | RUNNING | KILLED**
-   * **PAUSEDWITHERROR --> SUSPENDEDWITHERROR | RUNNINGWITHERROR | KILLED**
-   * **FAILED | KILLED --> IGNORED**
-   * **IGNORED --> RUNNING**
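The transition list above can be captured as a simple lookup table; a sketch (the names are illustrative, not Oozie's internal API):

```python
# Sketch (not Oozie's internal API): the valid coordinator job status
# transitions from the list above, as a lookup table plus a validator.
VALID_JOB_TRANSITIONS = {
    "PREP": {"PREPSUSPENDED", "PREPPAUSED", "RUNNING", "KILLED"},
    "RUNNING": {"RUNNINGWITHERROR", "SUSPENDED", "PAUSED", "SUCCEEDED", "KILLED"},
    "RUNNINGWITHERROR": {"RUNNING", "SUSPENDEDWITHERROR", "PAUSEDWITHERROR",
                         "DONEWITHERROR", "KILLED", "FAILED"},
    "PREPSUSPENDED": {"PREP", "KILLED"},
    "SUSPENDED": {"RUNNING", "KILLED"},
    "SUSPENDEDWITHERROR": {"RUNNINGWITHERROR", "KILLED"},
    "PREPPAUSED": {"PREP", "KILLED"},
    "PAUSED": {"SUSPENDED", "RUNNING", "KILLED"},
    "PAUSEDWITHERROR": {"SUSPENDEDWITHERROR", "RUNNINGWITHERROR", "KILLED"},
    "FAILED": {"IGNORED"},
    "KILLED": {"IGNORED"},
    "IGNORED": {"RUNNING"},
}

def can_transition(src, dst):
    # Terminal statuses such as SUCCEEDED have no outgoing transitions.
    return dst in VALID_JOB_TRANSITIONS.get(src, set())
```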
-
-When a coordinator job is submitted, oozie parses the coordinator job XML. 
Oozie then creates a record for the coordinator with status **PREP** and 
returns a unique ID. The coordinator is also started immediately if pause time 
is not set.
-
-When a user requests to suspend a coordinator job that is in **PREP** state, oozie puts the job in status **PREPSUSPENDED**. Similarly, when the pause time is reached for a coordinator job in **PREP** status, oozie puts the job in status **PREPPAUSED**.
-
-Conversely, when a user requests to resume a **PREPSUSPENDED** coordinator job, oozie puts the job in status **PREP**. And when the pause time is reset for a coordinator job in **PREPPAUSED** status, oozie puts the job in status **PREP**.
-
-When a coordinator job starts, oozie puts the job in status **RUNNING** and starts materializing workflow jobs based on the job frequency. If any workflow job goes to **FAILED/KILLED/TIMEDOUT** state, the coordinator job is put in **RUNNINGWITHERROR** status.
-
-When a user requests to kill a coordinator job, oozie puts the job in status 
**KILLED** and it sends kill to all submitted workflow jobs.
-
-When a user requests to suspend a coordinator job that is in **RUNNING** 
status, oozie puts the job in status **SUSPENDED** and it suspends all 
submitted workflow jobs. Similarly, when a user requests to suspend a 
coordinator job that is in **RUNNINGWITHERROR** status, oozie puts the job in 
status **SUSPENDEDWITHERROR** and it suspends all submitted workflow jobs.
-
-When the pause time is reached for a coordinator job that is in **RUNNING** status, oozie puts the job in status **PAUSED**. Similarly, when the pause time is reached for a coordinator job that is in **RUNNINGWITHERROR** status, oozie puts the job in status **PAUSEDWITHERROR**.
-
-Conversely, when a user requests to resume a **SUSPENDED** coordinator job, oozie puts the job in status **RUNNING**. Also, when a user requests to resume a **SUSPENDEDWITHERROR** coordinator job, oozie puts the job in status **RUNNINGWITHERROR**. And when the pause time is reset for a coordinator job and the job status is **PAUSED**, oozie puts the job in status **RUNNING**. Also, when the pause time is reset for a coordinator job and the job status is **PAUSEDWITHERROR**, oozie puts the job in status **RUNNINGWITHERROR**.
-
-A coordinator job creates workflow jobs (commonly coordinator actions) only for the duration of the coordinator job and only if the coordinator job is in **RUNNING** status. If the coordinator job has been suspended, when resumed it will create all the coordinator actions that should have been created during the time it was suspended; actions will not be lost, they will be delayed.
-
-When the coordinator job materialization finishes and all workflow jobs finish, oozie updates the coordinator status accordingly.
-For example, if all workflows are **SUCCEEDED**, oozie puts the coordinator job into **SUCCEEDED** status.
-If all workflows are **FAILED**, oozie puts the coordinator job into **FAILED** status. If all workflows are **KILLED**, the coordinator
-job status changes to **KILLED**. However, if the workflow jobs finish with a mix of **SUCCEEDED** and any of **KILLED**, **FAILED** or
-**TIMEDOUT**, oozie puts the coordinator job into **DONEWITHERROR**. If all coordinator actions are **TIMEDOUT**, oozie puts the
-coordinator job into **DONEWITHERROR**.
-
-A coordinator job in **FAILED** or **KILLED** status can be changed to 
**IGNORED** status. A coordinator job in **IGNORED** status can be changed to
- **RUNNING** status.
-
-#### 6.1.3. Coordinator Action
-
-A coordinator job creates and executes coordinator actions.
-
-A coordinator action is normally a workflow job that consumes and produces 
dataset instances.
-
-Once a coordinator action is created (this is also referred to as the action being materialized), the coordinator action will be in **WAITING** status until all required inputs for execution are satisfied or until the waiting times out.
-
-##### 6.1.3.1. Coordinator Action Creation (Materialization)
-
-A coordinator job has one driver event that determines the creation 
(materialization) of its coordinator actions (typically a workflow job).
-
-   * For synchronous coordinator jobs the driver event is the frequency of the 
coordinator job.
-
-##### 6.1.3.2. Coordinator Action Status
-
-Once a coordinator action has been created (materialized) the coordinator 
action qualifies for execution. At this point, the action status is **WAITING**.
-
-A coordinator action in **WAITING** status must wait until all its input events are available before it is ready for execution. When a coordinator action is ready for execution its status is **READY**.
-
-A coordinator action in **WAITING** status may timeout before it becomes ready 
for execution. Then the action status is **TIMEDOUT**.
-
-A coordinator action may remain in **READY** status for a while, without 
starting execution, due to the concurrency execution policies of the 
coordinator job.
-
-A coordinator action in **READY** or **WAITING** status changes to **SKIPPED** 
status if the execution strategy is LAST_ONLY and the
-current time is past the next action's nominal time.  See section 6.3 for more 
details.
-
-A coordinator action in **READY** or **WAITING** status changes to **SKIPPED** 
status if the execution strategy is NONE and the
-current time is past the action's nominal time + 1 minute.  See section 6.3 
for more details.
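The two skipping rules above differ only in the cutoff they compare against. A sketch (the helper name and the minute-granularity assumption are mine, not Oozie code):

```python
from datetime import datetime, timedelta

# Sketch (illustrative, not Oozie code): decide whether a WAITING/READY action
# should move to SKIPPED under the LAST_ONLY and NONE execution strategies.
def should_skip(strategy, now, nominal_time, next_nominal_time):
    if strategy == "LAST_ONLY":
        # Skip once the next action's nominal time has passed.
        return now > next_nominal_time
    if strategy == "NONE":
        # Skip once the action's own nominal time is more than a minute old.
        return now > nominal_time + timedelta(minutes=1)
    return False  # FIFO / LIFO never skip on time
```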
-
-A coordinator action in **READY** status changes to **SUBMITTED** status if the total number of current **RUNNING** and **SUBMITTED** actions is less than the concurrency execution limit.
-
-A coordinator action in **SUBMITTED** status changes to **RUNNING** status when the workflow engine starts execution of the coordinator action.
-
-A coordinator action is in **RUNNING** status until the associated workflow 
job completes its execution. Depending on the workflow job completion status, 
the coordinator action will be in **SUCCEEDED**, **KILLED** or **FAILED** 
status.
-
-A coordinator action in **WAITING**, **READY**, **SUBMITTED** or **RUNNING** 
status can be killed, changing to **KILLED** status.
-
-A coordinator action in **SUBMITTED** or **RUNNING** status can also fail, 
changing to **FAILED** status.
-
-A coordinator action in **FAILED**, **KILLED**, or **TIMEDOUT** status can be 
changed to **IGNORED** status. A coordinator action in **IGNORED** status can be
- rerun, changing to **WAITING** status.
-
-Valid coordinator action status transitions are:
-
-   * **WAITING --> READY | TIMEDOUT | SKIPPED | KILLED**
-   * **READY --> SUBMITTED | SKIPPED | KILLED**
-   * **SUBMITTED --> RUNNING | KILLED | FAILED**
-   * **RUNNING --> SUCCEEDED | KILLED | FAILED**
-   * **FAILED | KILLED | TIMEDOUT --> IGNORED**
-   * **IGNORED --> WAITING**
-
-#### 6.1.4. Input Events
-
-The input events of a coordinator application specify the input conditions that are required in order to execute a coordinator action.
-
-In the current specification input events are restricted to dataset instance availability.
-
-All the dataset instances defined as input events must be available for the coordinator action to be ready for execution ( **READY** status).
-
-Input events are normally parameterized. For example, the last 24 hourly 
instances of the 'searchlogs' dataset.
-
-Input events can refer to multiple instances of multiple datasets. For example, the last 24 hourly instances of the 'searchlogs' dataset and the last weekly instance of the 'celebrityRumours' dataset.
-
-#### 6.1.5. Output Events
-
-A coordinator action can produce one or more dataset instances as output.
-
-Dataset instances produced as output by one coordinator action may be consumed as input by coordinator actions of other coordinator jobs.
-
-The chaining of coordinator jobs via the datasets they produce and consume is referred to as a **data pipeline.**
-
-In the current specification coordinator job output events are restricted to 
dataset instances.
-
-#### 6.1.6. Coordinator Action Execution Policies
-
-The execution policies for the actions of a coordinator job can be defined in 
the coordinator application.
-
-   * Timeout: A coordinator job can specify the timeout for its coordinator actions, that is, how long the coordinator action will be in *WAITING* or *READY* status before giving up on its execution.
-   * Concurrency: A coordinator job can specify the concurrency for its coordinator actions, that is, how many coordinator actions are allowed to run concurrently ( **RUNNING** status) before the coordinator engine starts throttling them.
-   * Execution strategy: A coordinator job can specify the execution strategy of its coordinator actions when there is a backlog of coordinator actions in the coordinator engine. The different execution strategies are 'oldest first', 'newest first', 'none' and 'last one only'. A backlog normally happens because of delayed input data, concurrency control or manual re-runs of coordinator jobs.
-   * Throttle: A coordinator job can specify the materialization or creation throttle value for its coordinator actions, that is, the maximum number of coordinator actions allowed to be in WAITING state concurrently.
-
-#### 6.1.7. Data Pipeline Application
-
-Commonly, multiple workflow applications are chained together to form a more 
complex application.
-
-Workflow applications are run on a regular basis, each of them at its own frequency. The data consumed and produced by these workflow applications is relative to the nominal time of the workflow job that is processing the data. This is a **coordinator application**.
-
-The output of multiple workflow jobs of a single workflow application is then consumed by a single workflow job of another workflow application; this is done on a regular basis as well. These workflow jobs are triggered by recurrent actions of coordinator jobs. This is a set of **coordinator jobs** that inter-depend on each other via the data they produce and consume.
-
-This set of interdependent **coordinator applications** is referred to as a **data pipeline application**.
-
-### 6.2. Synchronous Coordinator Application Example
-
-   * The `checkouts` synchronous dataset is created every 15 minutes by an 
online checkout store.
-   * The `hourlyRevenue` synchronous dataset is created every hour and 
contains the hourly revenue.
-   * The `dailyRevenue` synchronous dataset is created every day and contains 
the daily revenue.
-   * The `monthlyRevenue` synchronous dataset is created every month and 
contains the monthly revenue.
-
-   * The `revenueCalculator-wf` workflow consumes checkout data and produces 
as output the corresponding revenue.
-   * The `rollUpRevenue-wf` workflow consumes revenue data and produces a 
consolidated output.
-
-   * The `hourlyRevenue-coord` coordinator job triggers, every hour, a 
`revenueCalculator-wf` workflow. It specifies as input the last 4 `checkouts` 
dataset instances and it specifies as output a new instance of the 
`hourlyRevenue` dataset.
-   * The `dailyRollUpRevenue-coord` coordinator job triggers, every day, a 
`rollUpRevenue-wf` workflow. It specifies as input the last 24 `hourlyRevenue` 
dataset instances and it specifies as output a new instance of the 
`dailyRevenue` dataset.
-   * The `monthlyRollUpRevenue-coord` coordinator job triggers, once a month, a `rollUpRevenue-wf` workflow. It specifies as input all the `dailyRevenue` dataset instances of the month and it specifies as output a new instance of the `monthlyRevenue` dataset.
-
-This example describes all the components that form a data pipeline: datasets, coordinator jobs and coordinator actions (workflows).
-
-The coordinator actions (the workflows) are completely agnostic of datasets and their frequencies; they just use them as input and output data (i.e. HDFS files or directories). Furthermore, as the example shows, the same workflow can be used to process similar datasets of different frequencies.
-
-The frequency of the `hourlyRevenue-coord` coordinator job is 1 hour; this means that every hour a coordinator action is created. A coordinator action will be executed only when the 4 `checkouts` dataset instances for the corresponding last hour are available; until then the coordinator action will remain as created (materialized), in **WAITING** status. Once the 4 dataset instances for the corresponding last hour are available, the coordinator action will be executed and it will start a `revenueCalculator-wf` workflow job.
-
-### 6.3. Synchronous Coordinator Application Definition
-
-A synchronous coordinator application is defined by a name, start time and end time, the frequency of creation of its coordinator actions, the input events, the output events and action control information:
-
-   * **<font color="#0000ff"> start: </font>** The start datetime for the job. 
Starting at this time actions will be materialized. Refer to section #3 
'Datetime Representation' for syntax details.
-   * **<font color="#0000ff"> end: </font>** The end datetime for the job, after which actions will stop being materialized. Refer to section #3 'Datetime Representation' for syntax details.
-   * **<font color="#0000ff"> timezone:</font>** The timezone of the 
coordinator application.
-   * **<font color="#0000ff"> frequency: </font>** The frequency, in minutes, 
to materialize actions. Refer to section #4 'Time Interval Representation' for 
syntax details.
-   * Control information:
-      * **<font color="#0000ff"> timeout: </font>** The maximum time, in minutes, that a materialized action will wait for the additional conditions to be satisfied before being discarded. A timeout of `0` indicates that if all the input events are not satisfied at the time of action materialization, the action should timeout immediately. A timeout of `-1` indicates no timeout: the materialized action will wait forever for the other conditions to be satisfied. The default value is `-1`. The timeout can only cause a `WAITING` action to transition to `TIMEDOUT`; once the data dependency is satisfied, a `WAITING` action transitions to `READY`, and the timeout no longer has any effect, even if the action hasn't transitioned to `SUBMITTED` or `RUNNING` when it expires.
-      * **<font color="#0000ff"> concurrency: </font>** The maximum number of 
actions for this job that can be running at the same time. This value allows 
materializing and submitting multiple instances of the coordinator app, and 
allows operations to catch up on delayed processing. The default value is `1`.
-      * **<font color="#0000ff"> execution: </font>** Specifies the execution 
order if multiple instances of the coordinator job have satisfied their 
execution criteria. Valid values are:
-         * `FIFO` (oldest first) *default*.
-         * `LIFO` (newest first).
-         * `LAST_ONLY` (see explanation below).
-         * `NONE` (see explanation below).
-      * **<font color="#0000ff"> throttle: </font>** The maximum number of 
coordinator actions allowed to be in `WAITING` state concurrently. The default 
value is `12`.
-   * **<font color="#0000ff"> datasets: </font>** The datasets the coordinator 
application uses.
-   * **<font color="#0000ff"> input-events: </font>** The coordinator job 
input events.
-      * **<font color="#0000ff"> data-in: </font>** It defines one job input 
condition that resolves to one or more instances of a dataset.
-         * **<font color="#0000ff"> name: </font>** input condition name.
-         * **<font color="#0000ff"> dataset: </font>** dataset name.
-         * **<font color="#0000ff"> instance: </font>** refers to a single 
dataset instance (the time for a synchronous dataset).
-         * **<font color="#0000ff"> start-instance: </font>** refers to the 
beginning of an instance range (the time for a synchronous dataset).
-         * **<font color="#0000ff"> end-instance: </font>** refers to the end 
of an instance range (the time for a synchronous dataset).
-   * **<font color="#0000ff"> output-events: </font>** The coordinator job 
output events.
-      * **<font color="#0000ff"> data-out: </font>** It defines one job output 
that resolves to a dataset instance.
-         * **<font color="#0000ff"> name: </font>** output name.
-         * **<font color="#0000ff"> dataset: </font>** dataset name.
-         * **<font color="#0000ff"> instance: </font>** dataset instance that 
will be generated by coordinator action.
-         * **<font color="#0000ff"> nocleanup: </font>** when true, disables 
cleanup of the output dataset on rerun, even if the nocleanup option is not 
used in the CLI command.
-   * **<font color="#0000ff"> action: </font>** The coordinator action to 
execute.
-      * **<font color="#0000ff"> workflow: </font>** The workflow job 
invocation. Workflow job properties can refer to the defined data-in and 
data-out elements.
-
-**LAST_ONLY:** While `FIFO` and `LIFO` simply specify the order in which READY 
actions should be executed, `LAST_ONLY` can actually
-cause some actions to be SKIPPED and is a little harder to understand.  When 
`LAST_ONLY` is set, an action that is `WAITING`
-or `READY` will be `SKIPPED` when the current time is past the next action's 
nominal time.  For example, suppose action 1 and 2
-are both `READY`, the current time is 5:00pm, and action 2's nominal time is 
5:10pm.  In 10 minutes from now, at 5:10pm, action 1
-will become `SKIPPED`, assuming it doesn't transition to `SUBMITTED` (or a 
terminal state) before then.  This sounds similar to the
-timeout control, but there are some important differences:
-
-   * The timeout time is configurable while the `LAST_ONLY` time is 
effectively the frequency.
-   * Reaching the timeout causes an action to transition to `TIMEDOUT`, which 
will cause the Coordinator Job to become `RUNNINGWITHERROR` and eventually 
`DONEWITHERROR`.  With `LAST_ONLY`, an action becomes `SKIPPED` and the 
Coordinator Job remains `RUNNING` and eventually `DONE`.
-   * The timeout looks at satisfying the data dependency, while `LAST_ONLY` 
looks at the action itself.  This means that the timeout can only cause a 
transition from `WAITING`, while `LAST_ONLY` can cause a transition from 
`WAITING` or `READY`.
-
-`LAST_ONLY` is useful if you want a recurring job, but do not actually care 
about the individual instances and just
-always want the latest action.  For example, if you have a coordinator running 
every 10 minutes and take Oozie down for 1 hour, when
-Oozie comes back, there would normally be 6 actions `WAITING` or `READY` to 
run.  However, with `LAST_ONLY`, only the current one
-will go to `SUBMITTED` and then `RUNNING`; the others will go to `SKIPPED`.
-
-**NONE:** Similar to `LAST_ONLY` except instead of looking at the next 
action's nominal time, it looks
-at `oozie.coord.execution.none.tolerance` in oozie-site.xml (default is 1 
minute). When `NONE` is set, an action that is `WAITING`
-or `READY` will be `SKIPPED` when the current time is more than the configured 
number of minutes (tolerance) past that action's
-nominal time. For example, suppose action 1 and 2 are both `READY`, the 
current time is 5:20pm, and both actions' nominal times are
-before 5:19pm. Both actions will become `SKIPPED`, assuming they don't 
transition to `SUBMITTED` (or a terminal state) before then.
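
The tolerance used by `NONE` is a server-wide setting. As an illustrative 
(non-normative) fragment, raising it to 5 minutes in oozie-site.xml would look 
like this:

```
<!-- oozie-site.xml: tolerance, in minutes, used by the NONE execution strategy -->
<property>
  <name>oozie.coord.execution.none.tolerance</name>
  <value>5</value>
</property>
```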
-
-**<font color="#800080">Syntax: </font>**
-
-
-```
-   <coordinator-app name="[NAME]" frequency="[FREQUENCY]"
-                    start="[DATETIME]" end="[DATETIME]" timezone="[TIMEZONE]"
-                    xmlns="uri:oozie:coordinator:0.1">
-      <controls>
-        <timeout>[TIME_PERIOD]</timeout>
-        <concurrency>[CONCURRENCY]</concurrency>
-        <execution>[EXECUTION_STRATEGY]</execution>
-      </controls>
-.
-      <datasets>
-        <include>[SHARED_DATASETS]</include>
-        ...
-.
-        <!-- Synchronous datasets -->
-           <dataset name="[NAME]" frequency="[FREQUENCY]"
-                    initial-instance="[DATETIME]" timezone="[TIMEZONE]">
-             <uri-template>[URI_TEMPLATE]</uri-template>
-        </dataset>
-        ...
-.
-      </datasets>
-.
-      <input-events>
-        <data-in name="[NAME]" dataset="[DATASET]">
-          <instance>[INSTANCE]</instance>
-          ...
-        </data-in>
-        ...
-        <data-in name="[NAME]" dataset="[DATASET]">
-          <start-instance>[INSTANCE]</start-instance>
-          <end-instance>[INSTANCE]</end-instance>
-        </data-in>
-        ...
-      </input-events>
-      <output-events>
-         <data-out name="[NAME]" dataset="[DATASET]">
-           <instance>[INSTANCE]</instance>
-         </data-out>
-         ...
-      </output-events>
-      <action>
-        <workflow>
-          <app-path>[WF-APPLICATION-PATH]</app-path>
-          <configuration>
-            <property>
-              <name>[PROPERTY-NAME]</name>
-              <value>[PROPERTY-VALUE]</value>
-            </property>
-            ...
-         </configuration>
-       </workflow>
-      </action>
-   </coordinator-app>
-```
-
-**<font color="#008000"> Examples: </font>**
-
-**1. A Coordinator Job that creates and executes a single coordinator action:**
-
-The following example describes a synchronous coordinator application that 
runs once a day for 1 day at the end of the day. It consumes an instance of a 
daily 'logs' dataset and produces an instance of a daily 'siteAccessStats' 
dataset.
-
-**Coordinator application definition:**
-
-
-```
-   <coordinator-app name="hello-coord" frequency="${coord:days(1)}"
-                    start="2009-01-02T08:00Z" end="2009-01-02T08:00Z"
-                    timezone="America/Los_Angeles"
-                    xmlns="uri:oozie:coordinator:0.1">
-      <datasets>
-        <dataset name="logs" frequency="${coord:days(1)}"
-                 initial-instance="2009-01-02T08:00Z" 
timezone="America/Los_Angeles">
-          
<uri-template>hdfs://bar:8020/app/logs/${YEAR}${MONTH}/${DAY}/data</uri-template>
-        </dataset>
-        <dataset name="siteAccessStats" frequency="${coord:days(1)}"
-                 initial-instance="2009-01-02T08:00Z" 
timezone="America/Los_Angeles">
-          
<uri-template>hdfs://bar:8020/app/stats/${YEAR}/${MONTH}/${DAY}/data</uri-template>
-        </dataset>
-      </datasets>
-      <input-events>
-        <data-in name="input" dataset="logs">
-          <instance>2009-01-02T08:00Z</instance>
-        </data-in>
-      </input-events>
-      <output-events>
-         <data-out name="output" dataset="siteAccessStats">
-           <instance>2009-01-02T08:00Z</instance>
-         </data-out>
-      </output-events>
-      <action>
-        <workflow>
-          <app-path>hdfs://bar:8020/usr/joe/logsprocessor-wf</app-path>
-          <configuration>
-            <property>
-              <name>wfInput</name>
-              <value>${coord:dataIn('input')}</value>
-            </property>
-            <property>
-              <name>wfOutput</name>
-              <value>${coord:dataOut('output')}</value>
-            </property>
-         </configuration>
-       </workflow>
-      </action>
-   </coordinator-app>
-```
-
-There are 2 synchronous datasets with a daily frequency and they are expected 
at the end of each PST8PDT day.
-
-This coordinator job runs for 1 day on January 1st 2009 at 24:00 PST8PDT.
-
-The workflow job invocation for the single coordinator action would resolve to:
-
-
-```
-  <workflow>
-    <app-path>hdfs://bar:8020/usr/joe/logsprocessor-wf</app-path>
-    <configuration>
-      <property>
-        <name>wfInput</name>
-        <value>hdfs://bar:8020/app/logs/200901/02/data</value>
-      </property>
-      <property>
-        <name>wfOutput</name>
-        <value>hdfs://bar:8020/app/stats/2009/01/02/data</value>
-      </property>
-    </configuration>
-  </workflow>
-```
-
-IMPORTANT: Note that Oozie works with UTC datetimes; all URI templates resolve 
to UTC datetime values. Because of the timezone difference between UTC and 
PST8PDT, the URIs resolve to `2009-01-02T08:00Z` (UTC), which is equivalent to 
`2009-01-01T24:00` PST8PDT.
-
-There is a single input event, which resolves to the January 1st PST8PDT 
instance of the 'logs' dataset. There is a single output event, which resolves 
to the January 1st PST8PDT instance of the 'siteAccessStats' dataset.
-
-The `${coord:dataIn(String name)}` and `${coord:dataOut(String name)}` EL 
functions resolve to the dataset instance URIs of the corresponding dataset 
instances. These EL functions are properly defined in a subsequent section.
-
-Because the `${coord:dataIn(String name)}` and `${coord:dataOut(String name)}` 
EL functions resolve to URIs, which are HDFS URIs, the workflow job itself does 
not deal with dataset instances, just HDFS URIs.
-
-**2. A Coordinator Job that executes its coordinator action multiple times:**
-
-A more realistic version of the previous example would be a coordinator job 
that runs for a year creating a daily action, consuming the daily 'logs' 
dataset instance and producing the daily 'siteAccessStats' dataset instance.
-
-The coordinator application is identical, except for the frequency, 'end' date 
and parameterization in the input and output events sections:
-
-
-```
-   <coordinator-app name="hello-coord" frequency="${coord:days(1)}"
-                    start="2009-01-02T08:00Z" end="2010-01-02T08:00Z"
-                    timezone="America/Los_Angeles"
-                    xmlns="uri:oozie:coordinator:0.1">
-      <datasets>
-        <dataset name="logs" frequency="${coord:days(1)}"
-                 initial-instance="2009-01-02T08:00Z" 
timezone="America/Los_Angeles">
-          
<uri-template>hdfs://bar:8020/app/logs/${YEAR}${MONTH}/${DAY}/data</uri-template>
-        </dataset>
-        <dataset name="siteAccessStats" frequency="${coord:days(1)}"
-                 initial-instance="2009-01-02T08:00Z" 
timezone="America/Los_Angeles">
-          
<uri-template>hdfs://bar:8020/app/stats/${YEAR}/${MONTH}/${DAY}/data</uri-template>
-        </dataset>
-      </datasets>
-      <input-events>
-        <data-in name="input" dataset="logs">
-          <instance>${coord:current(0)}</instance>
-        </data-in>
-      </input-events>
-      <output-events>
-         <data-out name="output" dataset="siteAccessStats">
-           <instance>${coord:current(0)}</instance>
-         </data-out>
-      </output-events>
-      <action>
-        <workflow>
-          <app-path>hdfs://bar:8020/usr/joe/logsprocessor-wf</app-path>
-          <configuration>
-            <property>
-              <name>wfInput</name>
-              <value>${coord:dataIn('input')}</value>
-            </property>
-            <property>
-              <name>wfOutput</name>
-              <value>${coord:dataOut('output')}</value>
-            </property>
-         </configuration>
-       </workflow>
-      </action>
-   </coordinator-app>
-```
-
-The `${coord:current(int offset)}` EL function resolves to the coordinator 
action creation time; here that is the current day at the time the coordinator 
action is created: `2009-01-02T08:00 ... 2010-01-01T08:00`. This EL function is 
properly defined in a subsequent section.
-
-There is a single input event, which resolves to the current day instance of 
the 'logs' dataset.
-
-There is a single output event, which resolves to the current day instance of 
the 'siteAccessStats' dataset.
-
-The workflow job invocation for the first coordinator action would resolve to:
-
-
-```
-  <workflow>
-    <app-path>hdfs://bar:8020/usr/joe/logsprocessor-wf</app-path>
-    <configuration>
-      <property>
-        <name>wfInput</name>
-        <value>hdfs://bar:8020/app/logs/200901/02/data</value>
-      </property>
-      <property>
-        <name>wfOutput</name>
-        <value>hdfs://bar:8020/app/stats/2009/01/02/data</value>
-      </property>
-    </configuration>
-  </workflow>
-```
-
-For the second coordinator action it would resolve to:
-
-
-```
-  <workflow>
-    <app-path>hdfs://bar:8020/usr/joe/logsprocessor-wf</app-path>
-    <configuration>
-      <property>
-        <name>wfInput</name>
-        <value>hdfs://bar:8020/app/logs/200901/03/data</value>
-      </property>
-      <property>
-        <name>wfOutput</name>
-        <value>hdfs://bar:8020/app/stats/2009/01/03/data</value>
-      </property>
-    </configuration>
-  </workflow>
-```
-
-And so on.
-
-**3. A Coordinator Job that executes its coordinator action multiple times and 
as input takes multiple dataset instances:**
-
-The following example is a variation of example #2 where the synchronous 
coordinator application runs weekly. It consumes the last 7 instances of 
a daily 'logs' dataset and produces an instance of a weekly 
'weeklySiteAccessStats' dataset.
-
-'logs' is a synchronous dataset with a daily frequency and it is expected at 
the end of each day (24:00).
-
-'weeklySiteAccessStats' is a synchronous dataset with a weekly frequency and 
it is expected at the end (24:00) of every 7th day.
-
-The coordinator application frequency is weekly and it starts on the 7th day 
of the year:
-
-
-```
-   <coordinator-app name="hello2-coord" frequency="${coord:days(7)}"
-                    start="2009-01-07T24:00Z" end="2009-12-12T24:00Z"
-                    timezone="UTC"
-                    xmlns="uri:oozie:coordinator:0.1">
-      <datasets>
-        <dataset name="logs" frequency="${coord:days(1)}"
-                 initial-instance="2009-01-01T24:00Z" timezone="UTC">
-          
<uri-template>hdfs://bar:8020/app/logs/${YEAR}${MONTH}/${DAY}</uri-template>
-        </dataset>
-        <dataset name="weeklySiteAccessStats" frequency="${coord:days(7)}"
-                 initial-instance="2009-01-07T24:00Z" timezone="UTC">
-          
<uri-template>hdfs://bar:8020/app/weeklystats/${YEAR}/${MONTH}/${DAY}</uri-template>
-        </dataset>
-      </datasets>
-      <input-events>
-        <data-in name="input" dataset="logs">
-          <start-instance>${coord:current(-6)}</start-instance>
-          <end-instance>${coord:current(0)}</end-instance>
-        </data-in>
-      </input-events>
-      <output-events>
-         <data-out name="output" dataset="weeklySiteAccessStats">
-           <instance>${coord:current(0)}</instance>
-         </data-out>
-      </output-events>
-      <action>
-        <workflow>
-          <app-path>hdfs://bar:8020/usr/joe/logsprocessor-wf</app-path>
-          <configuration>
-            <property>
-              <name>wfInput</name>
-              <value>${coord:dataIn('input')}</value>
-            </property>
-            <property>
-              <name>wfOutput</name>
-              <value>${coord:dataOut('output')}</value>
-            </property>
-         </configuration>
-       </workflow>
-      </action>
-   </coordinator-app>
-```
-
-The `${coord:current(int offset)}` EL function resolves to the coordinator 
action creation time plus the specified (positive or negative) offset 
multiplied by the dataset frequency. This EL function is properly defined in a 
subsequent section.
-
-The input event, instead of resolving to a single 'logs' dataset instance, 
refers to a range of 7 dataset instances - the instance from 6 days ago, 5 days 
ago, ... and today's instance.
-
-The output event resolves to the current day instance of the 
'weeklySiteAccessStats' dataset. As the coordinator job will create a 
coordinator action every 7 days, dataset instances for the 
'weeklySiteAccessStats' dataset will be created every 7 days.
-
-The workflow job invocation for the first coordinator action would resolve to:
-
-
-```
-  <workflow>
-    <app-path>hdfs://bar:8020/usr/joe/logsprocessor-wf</app-path>
-    <configuration>
-      <property>
-        <name>wfInput</name>
-        <value>
-               hdfs://bar:8020/app/logs/200901/01,hdfs://bar:8020/app/logs/200901/02,
-               hdfs://bar:8020/app/logs/200901/03,hdfs://bar:8020/app/logs/200901/04,
-               hdfs://bar:8020/app/logs/200901/05,hdfs://bar:8020/app/logs/200901/06,
-               hdfs://bar:8020/app/logs/200901/07
-        </value>
-      </property>
-      <property>
-        <name>wfOutput</name>
-        <value>hdfs://bar:8020/app/weeklystats/2009/01/07</value>
-      </property>
-    </configuration>
-  </workflow>
-```
-
-For the second coordinator action it would resolve to:
-
-
-```
-  <workflow>
-    <app-path>hdfs://bar:8020/usr/joe/logsprocessor-wf</app-path>
-    <configuration>
-      <property>
-        <name>wfInput</name>
-        <value>
-               hdfs://bar:8020/app/logs/200901/08,hdfs://bar:8020/app/logs/200901/09,
-               hdfs://bar:8020/app/logs/200901/10,hdfs://bar:8020/app/logs/200901/11,
-               hdfs://bar:8020/app/logs/200901/12,hdfs://bar:8020/app/logs/200901/13,
-               hdfs://bar:8020/app/logs/200901/14
-        </value>
-      </property>
-      <property>
-        <name>wfOutput</name>
-        <value>hdfs://bar:8020/app/weeklystats/2009/01/14</value>
-      </property>
-    </configuration>
-  </workflow>
-```
-
-And so on.
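
The range resolution above can be sketched in Python. This is illustrative 
only: `resolve_range` is a hypothetical helper, and instances at 24:00 are 
treated as labeling the day they close, matching the URIs shown above.

```python
from datetime import datetime, timedelta

def resolve_range(template, initial, freq_days, nominal, start_off, end_off):
    """Sketch of how a <data-in> start/end instance range expands into the
    comma-separated URI list that coord:dataIn returns (non-normative)."""
    # index of the current (offset 0) dataset instance for this action
    k = (nominal - initial).days // freq_days
    uris = []
    for n in range(start_off, end_off + 1):
        t = initial + timedelta(days=(k + n) * freq_days)
        uris.append(template.format(YEAR=t.year, MONTH=f"{t.month:02d}", DAY=f"{t.day:02d}"))
    return ",".join(uris)

# daily 'logs' dataset from the example above; first weekly action (day labels 01..07)
print(resolve_range("hdfs://bar:8020/app/logs/{YEAR}{MONTH}/{DAY}",
                    datetime(2009, 1, 1), 1, datetime(2009, 1, 7), -6, 0))
```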
-
-### 6.4. Asynchronous Coordinator Application Definition
-   * TBD
-
-### 6.5. Parameterization of Coordinator Applications
-
-When a coordinator job is submitted to Oozie, the submitter may specify as 
many coordinator job configuration properties as required (similar to Hadoop 
JobConf properties).
-
-Configuration properties that are a valid Java identifier, 
[A-Za-z_][0-9A-Za-z_]*, are available as `${NAME}` variables within the 
coordinator application definition.
-
-Configuration properties that are not a valid Java identifier, for example 
`job.tracker`, are available via the `${coord:conf(String name)}` function. 
Valid Java identifier properties are available via this function as well.
-
-Using properties that are valid Java identifiers results in a more readable 
and compact definition.
-
-Dataset definitions can also be parameterized; the parameters are resolved 
using the configuration properties of the job configuration used to submit the 
coordinator job.
-
-If a configuration property used in the definitions is not provided with the 
job configuration used to submit a coordinator job, the value of the parameter 
will be undefined and the job submission will fail.
-
-**<font color="#008000"> Example: </font>**
-
-Coordinator application definition:
-
-
-```
-   <coordinator-app name="app-coord" frequency="${coord:days(1)}"
-                    start="${jobStart}" end="${jobEnd}" timezone="${timezone}"
-                    xmlns="uri:oozie:coordinator:0.1">
-      <datasets>
-        <dataset name="logs" frequency="${coord:hours(1)}"
-                 initial-instance="${logsInitialInstance}" 
timezone="${timezone}">
-          <uri-template>
-            
hdfs://bar:8020/app/logs/${market}/${language}/${YEAR}${MONTH}/${DAY}/${HOUR}
-          </uri-template>
-        </dataset>
-      </datasets>
-      <input-events>
-        <data-in name="input" dataset="logs">
-          <start-instance>${coord:current(-23)}</start-instance>
-          <end-instance>${coord:current(0)}</end-instance>
-        </data-in>
-      </input-events>
-      <action>
-        <workflow>
-        ...
-       </workflow>
-      </action>
-   </coordinator-app>
-```
-
-In the above example there are 6 configuration parameters (variables) that 
have to be provided when submitting a job:
-
-   * `jobStart` : start datetime for the job, in UTC
-   * `jobEnd` : end datetime for the job, in UTC
-   * `logsInitialInstance` : expected time of the first logs instance, in UTC
-   * `timezone` : timezone for the job and the dataset
-   * `market` : market to compute by this job, used in the uri-template
-   * `language` : language to compute by this job, used in the uri-template
-
-IMPORTANT: Note that this example is not completely correct, as it always 
consumes the last 24 instances of the 'logs' dataset, assuming that all days 
have 24 hours. For timezones that observe daylight saving, this application 
will not work as expected, as it will consume the wrong number of dataset 
instances on DST switch days. To handle these scenarios, the 
`${coord:hoursInDays(int n)}` and `${coord:daysInMonths(int n)}` EL functions 
must be used (refer to sections #6.6.2 and #6.6.3).
-
-If the above 6 properties are not specified, the job will fail.
-
-As of schema 0.4, a list of formal parameters can be provided which will allow 
Oozie to verify, at submission time, that said
-properties are actually specified (i.e. before the job is executed and fails). 
Default values can also be provided.
-
-**Example:**
-
-The previous parameterized coordinator application definition with formal 
parameters:
-
-
-```
-   <coordinator-app name="app-coord" frequency="${coord:days(1)}"
-                    start="${jobStart}" end="${jobEnd}" timezone="${timezone}"
-                    xmlns="uri:oozie:coordinator:0.1">
-      <parameters>
-          <property>
-              <name>jobStart</name>
-          </property>
-          <property>
-              <name>jobEnd</name>
-              <value>2012-12-01T22:00Z</value>
-          </property>
-      </parameters>
-      <datasets>
-        <dataset name="logs" frequency="${coord:hours(1)}"
-                 initial-instance="${logsInitialInstance}" 
timezone="${timezone}">
-          <uri-template>
-            
hdfs://bar:8020/app/logs/${market}/${language}/${YEAR}${MONTH}/${DAY}/${HOUR}
-          </uri-template>
-        </dataset>
-      </datasets>
-      <input-events>
-        <data-in name="input" dataset="logs">
-          <start-instance>${coord:current(-23)}</start-instance>
-          <end-instance>${coord:current(0)}</end-instance>
-        </data-in>
-      </input-events>
-      <action>
-        <workflow>
-        ...
-       </workflow>
-      </action>
-   </coordinator-app>
-```
-
-In the above example, if `jobStart` is not specified, Oozie will print an 
error message instead of submitting the job. If
-`jobEnd` is not specified, Oozie will use the default value, 
`2012-12-01T22:00Z`.
-
-### 6.6. Parameterization of Dataset Instances in Input and Output Events
-
-A coordinator job typically launches several coordinator actions during its 
lifetime. A coordinator action typically uses its creation (materialization) 
time to resolve the specific dataset instances required for its input and 
output events.
-
-The following EL functions are the means for binding the coordinator action 
creation time to the dataset instances of its input and output events.
-
-#### 6.6.1. coord:current(int n) EL Function for Synchronous Datasets
-
-`${coord:current(int n)}` represents the n<sup>th</sup> dataset instance for a 
**synchronous** dataset, relative to the coordinator action creation 
(materialization) time. The coordinator action creation (materialization) time 
is computed based on the coordinator job start time and its frequency. The 
n<sup>th</sup> dataset instance is computed based on the dataset's 
initial-instance datetime, its frequency and the (current) coordinator action 
creation (materialization) time.
-
-`n` can be a negative integer, zero or a positive integer.
-
-`${coord:current(int n)}` returns the nominal datetime for the n<sup>th</sup> 
dataset instance relative to the coordinator action creation (materialization) 
time.
-
-`${coord:current(int n)}` performs the following calculation:
-
-
-```
-DS_II : dataset initial-instance (datetime)
-DS_FREQ: dataset frequency (minutes)
-CA_NT: coordinator action creation (materialization) nominal time
-
-coord:current(int n) = DS_II + DS_FREQ * ( (CA_NT - DS_II) div DS_FREQ + n)
-```
-
-NOTE: The formula above is not 100% correct: because of DST changes, the 
calculation has to account for hour shifts. Oozie Coordinator makes the 
correct calculation, accounting for DST hour shifts.
-
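As a non-normative sketch (ignoring the DST adjustment just mentioned, and 
with `coord_current` as a hypothetical helper name), the calculation can be 
written as:

```python
from datetime import datetime, timedelta

def coord_current(ds_ii, ds_freq_minutes, ca_nt, n):
    """Sketch of coord:current(n): the n-th dataset instance relative to the
    action's nominal time. Assumes a fixed-offset timeline (no DST shifts)."""
    elapsed = int((ca_nt - ds_ii).total_seconds()) // 60   # minutes since initial-instance
    k = elapsed // ds_freq_minutes + n                     # instance index, shifted by n
    return ds_ii + timedelta(minutes=k * ds_freq_minutes)

# daily 'logs' dataset from example #2: second action, nominal 2009-01-03T08:00Z
ds_ii = datetime(2009, 1, 2, 8)                 # initial-instance 2009-01-02T08:00Z
print(coord_current(ds_ii, 24 * 60, datetime(2009, 1, 3, 8), 0))   # 2009-01-03 08:00:00
print(coord_current(ds_ii, 24 * 60, datetime(2009, 1, 3, 8), -1))  # 2009-01-02 08:00:00
```
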
-When a positive integer is used with the `${coord:current(int n)}`, it refers 
to

<TRUNCATED>
