I am having trouble understanding how dates work in Oozie. The nominal time of my
coordinator does not always match the date in my coordinator's output
directory.
Here are some values taken from the runtime properties of my workflow. runDate
is the nominal time of the coordinator action that launched the workflow.
outputDir comes from the output event, which uses ${coord:current(0)}.
<property>
  <name>runDate</name>
  <value>2012-03-04</value>
</property>
<property>
  <name>outputDir</name>
  <value>hdfs://prodhpmaster01n:56310/user/hive/stamps/stamp_in_question/ds=2012-03-03</value>
</property>
Here is the dataset definition.
<dataset name="example" frequency="${coord:days(1)}"
         initial-instance="2011-05-01T05:00Z" timezone="America/New_York">
  <uri-template>${nameNode}/user/hive/stamps/stamp_in_question/ds=${YEAR}-${MONTH}-${DAY}</uri-template>
  <done-flag></done-flag>
</dataset>
The coordinator starts at 07:00Z and has this frequency:
frequency="${coord:days(1)}"
I want the date in outputDir to match runDate. What is the best way to
achieve that? In particular, I want to know how Oozie chooses the date to use
for an output event. The coordinator start time (07:00Z) is well past the
dataset's initial-instance time (05:00Z), so it seems like the two should match.
I suspect I am thinking about this all wrong, though.
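To make my reasoning concrete, here is a minimal Python sketch of how I understand coord:current(n) to be resolved: dsII + dsF * ((caNT - dsII) div dsF + n), where dsII is the dataset's initial-instance, dsF its frequency and caNT the action's nominal time. This is my reading of the Oozie coordinator spec, not Oozie's actual code. It treats coord:days(1) as a fixed 1440 minutes, so it ignores any daylight-saving adjustment Oozie may apply to a dataset declared with timezone="America/New_York". It also assumes a fixed UTC-5 offset for New York, which holds on 2012-03-04 because DST did not start until March 11 that year.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the question.
DATASET_INITIAL_INSTANCE = datetime(2011, 5, 1, 5, 0, tzinfo=timezone.utc)
# Simplification: coord:days(1) treated as a fixed 1440 minutes (no DST handling).
DATASET_FREQUENCY = timedelta(days=1)
NOMINAL_TIME = datetime(2012, 3, 4, 7, 0, tzinfo=timezone.utc)

# Assumption: New York is on EST (UTC-5) on 2012-03-04, before the March 11 DST switch.
NEW_YORK_MARCH_2012 = timezone(timedelta(hours=-5), "EST")


def current(offset: int, nominal: datetime) -> datetime:
    """Resolve coord:current(offset) as dsII + dsF * ((caNT - dsII) div dsF + offset)."""
    elapsed = nominal - DATASET_INITIAL_INSTANCE
    # timedelta // timedelta yields an int: whole dataset periods since dsII.
    instances = elapsed // DATASET_FREQUENCY
    return DATASET_INITIAL_INSTANCE + DATASET_FREQUENCY * (instances + offset)


def ds_value(instance: datetime, tz: timezone) -> str:
    """Format an instance as ${YEAR}-${MONTH}-${DAY} would, in the given zone."""
    return instance.astimezone(tz).strftime("%Y-%m-%d")


if __name__ == "__main__":
    inst = current(0, NOMINAL_TIME)
    print("current(0):", inst.isoformat())
    print("ds (UTC):", ds_value(inst, timezone.utc))
    print("ds (New York):", ds_value(inst, NEW_YORK_MARCH_2012))
```

Under these assumptions current(0) lands on 2012-03-04T05:00Z, and the ds value comes out as 2012-03-04 whether it is formatted in UTC or in New York time, which is why ds=2012-03-03 surprises me.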
Max