[ https://issues.apache.org/jira/browse/OOZIE-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585173#comment-14585173 ]

Ryan Brush commented on OOZIE-1431:
-----------------------------------

+1 to Arvind's comment, for what it's worth. We'd love to be able to use the
same scheduling mechanism for datasets as we do for coordinator frequencies;
keeping the two frequency syntaxes consistent makes things easier for users. I
think there's also a case to be made that Oozie should be able to consume
datasets on any schedule it can produce them on. As Arvind points out, that's
not the case today.

We're working around this in some of our code by converting Cron-based schedules
to the current dataset model, but they don't map cleanly, and it would be nice
to eliminate that conversion.
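
To illustrate why the mapping isn't clean, here is a minimal Python sketch (not
Oozie code; the helper names are hypothetical) that hard-codes the example
expression from the issue description, {{00 09-18 * * 1-5}}, instead of parsing
general cron syntax. It lists the distinct gaps between consecutive firings over
two weeks. Because the gaps are uneven (1 hour during the day, 15 hours
overnight, 63 hours over the weekend), no single fixed-interval frequency such
as coord:hours(N) reproduces the same instances.

```python
from datetime import datetime, timedelta


def matches_example_cron(t: datetime) -> bool:
    """Hard-coded check for "00 09-18 * * 1-5": minute 0, hours 9-18, Mon-Fri.

    Cron numbers days 0-6 starting from Sunday, so 1-5 means Monday-Friday.
    Python's weekday() numbers Monday as 0, so Mon-Fri is weekday() < 5.
    """
    return t.minute == 0 and 9 <= t.hour <= 18 and t.weekday() < 5


def firings(start: datetime, end: datetime):
    """Yield every minute in [start, end) selected by the expression."""
    t = start.replace(second=0, microsecond=0)
    while t < end:
        if matches_example_cron(t):
            yield t
        t += timedelta(minutes=1)


def distinct_gaps(start: datetime, end: datetime) -> list:
    """Return the sorted set of intervals between consecutive firings."""
    times = list(firings(start, end))
    return sorted({b - a for a, b in zip(times, times[1:])})


# 2010-01-04 was a Monday; scan two full weeks.
gaps = distinct_gaps(datetime(2010, 1, 4), datetime(2010, 1, 18))
print([g.total_seconds() / 3600 for g in gaps])  # [1.0, 15.0, 63.0]
```

A conversion to the fixed-frequency model therefore has to either over-sample
(e.g. hourly around the clock) or drop the weekday/working-hours constraint.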



> Dataset frequencies should accept cron syntax
> ---------------------------------------------
>
>                 Key: OOZIE-1431
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1431
>             Project: Oozie
>          Issue Type: Sub-task
>            Reporter: Robert Kanter
>            Assignee: Bowen Zhang
>
> For example, instead of
> {code:xml}
> <datasets>
>     <dataset name="raw-logs" frequency="${coord:minutes(20)}"
>              initial-instance="2010-01-01T00:00Z" timezone="UTC">
>         <uri-template>${nameNode}/user/${coord:user()}/${examplesRoot}/input-data/rawLogs/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}</uri-template>
>     </dataset>
>     <dataset name="aggregated-logs" frequency="${coord:hours(1)}"
>              initial-instance="2010-01-01T01:00Z" timezone="UTC">
>         <uri-template>${nameNode}/user/${coord:user()}/${examplesRoot}/output-data/aggregator/aggregatedLogs/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
>     </dataset>
> </datasets>
> {code}
> we should be able to specify something like
> {code:xml}
> <datasets>
>     <dataset name="raw-logs" frequency="00 09-18 * * 1-5"
>              initial-instance="2010-01-01T00:00Z" timezone="UTC">
>         <uri-template>${nameNode}/user/${coord:user()}/${examplesRoot}/input-data/rawLogs/${YEAR}/${MONTH}/${DAY}/${HOUR}/${MINUTE}</uri-template>
>     </dataset>
>     <dataset name="aggregated-logs" frequency="${coord:hours(1)}"
>              initial-instance="2010-01-01T01:00Z" timezone="UTC">
>         <uri-template>${nameNode}/user/${coord:user()}/${examplesRoot}/output-data/aggregator/aggregatedLogs/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
>     </dataset>
> </datasets>
> {code}
> In the second version, the frequency is {{00 09-18 * * 1-5}} instead of
> {{$\{coord:minutes(20)}}}, which means one instance at the top of every hour
> from 9am to 6pm, Monday through Friday, instead of one every 20 minutes.
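
The semantics of that expression can be checked with a small Python sketch. It
is not Oozie's cron implementation; parse_field and instances_on are
hypothetical helpers that support only "*", single numbers, "A-B" ranges and
comma lists, and they require day-of-month and day-of-week to both match (real
cron ORs them when both are restricted, which doesn't matter here because
day-of-month is "*"). For the example it yields ten instances per weekday,
09:00 through 18:00 on the hour, and none on weekends; for comparison, a
coord:minutes(20) dataset has 1440 / 20 = 72 instances per day.

```python
from datetime import datetime, timedelta


def parse_field(field: str, lo: int, hi: int) -> set:
    """Expand one cron field into its set of allowed values.

    Supports '*', single numbers, 'A-B' ranges and comma lists only;
    step syntax ('*/5') and names ('MON') are deliberately omitted.
    """
    if field == "*":
        return set(range(lo, hi + 1))
    values = set()
    for part in field.split(","):
        if "-" in part:
            a, b = part.split("-")
            values.update(range(int(a), int(b) + 1))
        else:
            values.add(int(part))
    return values


def instances_on(expr: str, day: datetime) -> list:
    """List the times on `day` selected by a 5-field cron expression."""
    minute, hour, dom, month, dow = expr.split()
    minutes = parse_field(minute, 0, 59)
    hours = parse_field(hour, 0, 23)
    doms = parse_field(dom, 1, 31)
    months = parse_field(month, 1, 12)
    dows = parse_field(dow, 0, 6)         # cron: 0 = Sunday
    cron_dow = (day.weekday() + 1) % 7    # Python: 0 = Monday
    if day.day not in doms or day.month not in months or cron_dow not in dows:
        return []
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return [start + timedelta(hours=h, minutes=m)
            for h in sorted(hours) for m in sorted(minutes)]


monday = datetime(2010, 1, 4)
times = instances_on("00 09-18 * * 1-5", monday)
print(len(times), times[0].strftime("%H:%M"), times[-1].strftime("%H:%M"))  # 10 09:00 18:00
print(len(instances_on("00 09-18 * * 1-5", datetime(2010, 1, 9))))  # Saturday: 0
```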



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
