Hello,

Event trigger Oozie datasets

1) Does Oozie support event triggers?

   I want to trigger a workflow based on a file arriving on AWS S3.

   As I understand it, a coordinator can poll for a file on S3 from the
start date configured on it, and once the dependency is met it can execute
an action/SparkAction. My requirement is different: trigger the workflow
when a file arrives, compare the current date with the start time (if a
start time is configured; otherwise execute the action directly on the
event), and execute the action/SparkAction only if it is time to do so.



2) Also, I see that for datasets we need to specify initial-instance, and
the dataset location is derived from the initial-instance value.

For example:

      <datasets>
        <dataset name="logs" frequency="${coord:hours(1)}"
                 initial-instance="2018-01-01T01:00Z" timezone="UTC">
          <uri-template>
            s3a://app/logs/${YEAR}_${MONTH}_${DAY}_${HOUR}
          </uri-template>
        </dataset>
      </datasets>
      <input-events>
        <data-in name="input" dataset="logs">
          <instance>${coord:latest(0)}</instance>
        </data-in>
      </input-events>



    Then, the dataset instances for the input events for the coordinator
action will be:

     s3a://app/logs/2009_01_10



But in my case I do not know the timestamp at which the dataset is
generated, and I also do not know how frequently it is generated.

My requirement is:

The dataset location could be s3a://app/logs/2018_02_10 (i.e. it may be
generated every day), and when I run my job on 2018/02/11 I should be able
to specify which dataset to use as the dependency for the action/SparkAction
I am trying to execute: either the latest one, the one from 24 hours ago, or
the one from n days ago (counted from the day I run the workflow).
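
To make the requirement concrete, below is a minimal, untested sketch of the kind of coordinator I have in mind. It assumes the data really is written once per day to a directory named by date. The coordinator name daily-logs-coord, the wfPath parameter and the inputDir property are hypothetical, and the start and end dates are only placeholders. Because no done-flag is given, my understanding is that Oozie would wait for a _SUCCESS file inside the dataset directory.

```xml
<!-- Sketch only: daily-logs-coord, wfPath and inputDir are hypothetical names -->
<coordinator-app name="daily-logs-coord" frequency="${coord:days(1)}"
                 start="2018-02-11T00:00Z" end="2018-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- Assumes one directory per day, e.g. s3a://app/logs/2018_02_10 -->
    <dataset name="logs" frequency="${coord:days(1)}"
             initial-instance="2018-01-01T00:00Z" timezone="UTC">
      <uri-template>s3a://app/logs/${YEAR}_${MONTH}_${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="logs">
      <!-- current(-1): the instance one day before the nominal time.
           current(-n) would select the instance n days back;
           latest(0) would select the newest instance that exists. -->
      <instance>${coord:current(-1)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${wfPath}</app-path>
      <configuration>
        <!-- Passes the resolved dataset URI to the workflow -->
        <property>
          <name>inputDir</name>
          <value>${coord:dataIn('input')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

With a nominal time of 2018-02-11T00:00Z, current(-1) in this sketch would resolve to s3a://app/logs/2018_02_10. It still depends on knowing the dataset frequency up front, which is exactly the part I am unsure about.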

Please suggest!
