After running a few tests I reached the following conclusions:
Cron-like jobs are supported by Airflow, with one downside: on the very
first deployment of a job (a completely new DAG), an extra DAG run is
created for the most recent interval that has already passed.
When the DAG is redeployed (the dag_id stays the same), the metadata DB
already contains the latest run, so the scheduler behaves like a genuine
cron scheduler.
To illustrate what I mean, here is an example:
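from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator  # Airflow 1.10 path

with DAG(
    dag_id="dag",
    start_date=datetime(2019, 4, 1),
    schedule_interval="0 2 * * *",  # every day at 2 AM
    default_view="graph",
    orientation="TB",
    concurrency=1,
    max_active_runs=1,
    catchup=False) as dag:
    # Placeholder task (assumed); the DAG body was omitted in the example.
    DummyOperator(task_id="noop")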
Say I deploy 'dag' for the first time when the system time is *2019-04-03 3 PM*.
Airflow creates a DAG run with execution date 2019-04-02 2 AM straight
after the deployment.
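This first-deployment behaviour fits a simplified model of the scheduler (my reading, not an official API): a run's execution date is the *start* of a schedule interval, and the run fires only once that interval has ended, so a brand-new DAG with catchup=False gets a run for the newest interval that has already finished. A minimal sketch, assuming a daily "0 2 * * *" schedule; `latest_completed_interval` is a hypothetical helper:

```python
from datetime import datetime, timedelta
from typing import Tuple


def latest_completed_interval(now: datetime, hour: int = 2) -> Tuple[datetime, datetime]:
    """Return (start, end) of the newest daily interval that ended at or before `now`.

    Hypothetical helper modelling a "0 <hour> * * *" schedule; the start of the
    returned interval is what Airflow reports as the run's execution date.
    """
    # Most recent daily tick at or before `now`: this is the end of the interval.
    end = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if end > now:
        end -= timedelta(days=1)
    # One schedule period earlier is the start of that interval.
    return end - timedelta(days=1), end


start, end = latest_completed_interval(datetime(2019, 4, 3, 15, 0))
print(start, end)  # 2019-04-02 02:00:00 2019-04-03 02:00:00
```

With a deployment at 2019-04-03 3 PM, the interval 2019-04-02 2 AM to 2019-04-03 2 AM has already ended, which is why the extra run carries execution date 2019-04-02 2 AM.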
However, when a new version of 'dag' is redeployed, the next run is
triggered according to the cron expression, i.e. with the redeployment done
at 2019-04-03 6 PM, the next DAG run is at 2019-04-04 2 AM.
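In the same simplified model, once the metadata DB holds a run, the next execution date is one period after the latest one, and that run fires when its own interval ends. A minimal sketch, assuming the run with execution date 2019-04-02 2 AM is the latest one in the DB; `next_scheduled_run` and `PERIOD` are hypothetical names:

```python
from datetime import datetime, timedelta
from typing import Tuple

# Assumed daily period, matching schedule_interval="0 2 * * *".
PERIOD = timedelta(days=1)


def next_scheduled_run(last_execution_date: datetime) -> Tuple[datetime, datetime]:
    """Return (execution_date, trigger_time) of the run after `last_execution_date`.

    Hypothetical helper: the next execution date is one period later, and the
    scheduler only triggers it once that interval has ended.
    """
    execution_date = last_execution_date + PERIOD
    trigger_time = execution_date + PERIOD
    return execution_date, trigger_time


execution_date, trigger_time = next_scheduled_run(datetime(2019, 4, 2, 2, 0))
print(execution_date, trigger_time)  # 2019-04-03 02:00:00 2019-04-04 02:00:00
```

So the run that fires at 2019-04-04 2 AM carries execution date 2019-04-03 2 AM. The sketch ignores the case where the redeployment happens after that trigger time has already passed.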
Regards,
Pawel
On Thu, Apr 18, 2019 at 3:27 PM Chen Tong wrote:
> Do not set it to datetime.now(). You could set it to 2019-04-18 and it will
> start scheduling at 2019-04-18 2 AM.
>
> Chen
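A rough sketch of what a fixed start_date of 2019-04-18 (midnight) means for a "0 2 * * *" schedule, under the same assumption that a run fires at the end of its interval: the first execution date is the first 2 AM tick at or after start_date, and that run is triggered one day later. `first_run` is a hypothetical helper:

```python
from datetime import datetime, timedelta
from typing import Tuple


def first_run(start_date: datetime, hour: int = 2) -> Tuple[datetime, datetime]:
    """Return (execution_date, trigger_time) of the first run for a daily schedule.

    Hypothetical helper for a "0 <hour> * * *" schedule: the execution date is
    the first tick at or after `start_date`; the run fires one period later.
    """
    tick = start_date.replace(hour=hour, minute=0, second=0, microsecond=0)
    if tick < start_date:
        tick += timedelta(days=1)
    return tick, tick + timedelta(days=1)


execution_date, trigger_time = first_run(datetime(2019, 4, 18))
print(execution_date, trigger_time)  # 2019-04-18 02:00:00 2019-04-19 02:00:00
```

This assumes the DAG is deployed before that first interval ends; a later deployment with catchup=False moves the first run forward.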
>
> On Thu, Apr 18, 2019, 08:55 Pawel Bartoszek wrote:
>
> > Ash, if I omit start_date I get the error:
> > Task is missing the start_date parameter
> >
> > What should I set it to then?
> >
> > On Thu, Apr 18, 2019 at 1:03 PM Ash Berlin-Taylor wrote:
> >
> > > Do not set start_date to now. That will _always_ be wrong.
> > > https://airflow.apache.org/faq.html#what-s-the-deal-with-start-date
> > >
> > > > On 18 Apr 2019, at 12:13, Pawel Bartoszek wrote:
> > > >
> > > > Hi,
> > > >
> > > > When I set start_date to datetime.now(), i.e.
> > > >
> > > > DAG(
> > > >     dag_id="dag",
> > > >     start_date=datetime.now(),
> > > >     schedule_interval="0 2 * * *",
> > > >     default_view="graph",
> > > >     orientation="TB",
> > > >     concurrency=1,
> > > >     max_active_runs=1,
> > > >     catchup=False
> > > > )
> > > >
> > > > I get the following info in the task instance details:
> > > >
> > > > Dependency       Reason
> > > > Execution Date   The execution date is 2019-04-18T11:09:16.193396+00:00
> > > >                  but this is before the task's start date
> > > >                  2019-04-18T11:10:42.607861+00:00.
> > > > Execution Date   The execution date is 2019-04-18T11:09:16.193396+00:00
> > > >                  but this is before the task's DAG's start date
> > > >                  2019-04-18T11:10:42.607861+00:00.
> > > > Dagrun Running   Task instance's dagrun did not exist: Unknown reason.
> > > >
> > > > I thought the execution date should be 2019-04-19 02:00?
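The two "Execution Date" rows boil down to an ordering comparison. Because start_date=datetime.now() is re-evaluated every time the scheduler parses the DAG file, start_date keeps moving forward, so an execution date recorded earlier ends up before it and the check cannot pass. A simplified sketch using the two timestamps from the task instance details; `start_date_dep_met` is a hypothetical helper, not Airflow's actual dependency class:

```python
from datetime import datetime, timezone


def start_date_dep_met(execution_date: datetime, start_date: datetime) -> bool:
    """Simplified model: a task instance may run only if its execution date is
    not earlier than the start date (hypothetical helper, not Airflow's API)."""
    return execution_date >= start_date


# Timestamps copied from the task instance details above (UTC).
execution_date = datetime(2019, 4, 18, 11, 9, 16, 193396, tzinfo=timezone.utc)
start_date = datetime(2019, 4, 18, 11, 10, 42, 607861, tzinfo=timezone.utc)

print(start_date_dep_met(execution_date, start_date))  # False
```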
> > > >
> > > >
> > > > On Wed, Apr 17, 2019 at 8:37 PM Chao-Han Tsai wrote:
> > > >
> > > >> Hi Pawel,
> > > >>
> > > >> I think you can change start_date to a later date to avoid the DagRun
> > > >> for 2019-04-16 02:00 being scheduled.
> > > >>
> > > >> Chao-Han
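A simplified sketch of why a later start_date helps, based on my reading of the catchup=False handling in the Airflow 1.10 scheduler (treat it as an approximation): the first execution date is the later of (a) the start of the newest interval that has already ended and (b) the first schedule tick at or after start_date, and the run fires once its interval ends. `first_run_no_catchup` is a hypothetical helper covering only a daily "0 2 * * *" schedule:

```python
from datetime import datetime, timedelta
from typing import Tuple

DAY = timedelta(days=1)


def first_run_no_catchup(
    now: datetime, start_date: datetime, hour: int = 2
) -> Tuple[datetime, datetime]:
    """Return (execution_date, trigger_time) of the first run of a new DAG, catchup=False."""
    # (a) Start of the newest daily interval that has already ended at `now`.
    end = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if end > now:
        end -= DAY
    latest_completed_start = end - DAY

    # (b) First schedule tick at or after start_date.
    first_tick = start_date.replace(hour=hour, minute=0, second=0, microsecond=0)
    if first_tick < start_date:
        first_tick += DAY

    execution_date = max(latest_completed_start, first_tick)
    # The run fires when its interval ends; if that is already past, it fires at deployment.
    trigger_time = max(execution_date + DAY, now)
    return execution_date, trigger_time


now = datetime(2019, 4, 17, 17, 0)  # deployment time from the original question
for start_date in (datetime(2018, 1, 1, 2, 0), datetime(2019, 4, 17)):
    execution_date, trigger_time = first_run_no_catchup(now, start_date)
    print(start_date, "->", execution_date, "fires at", trigger_time)
# 2018-01-01 02:00:00 -> 2019-04-16 02:00:00 fires at 2019-04-17 17:00:00
# 2019-04-17 00:00:00 -> 2019-04-17 02:00:00 fires at 2019-04-18 02:00:00
```

With the old start_date the 2019-04-16 run fires immediately; with start_date 2019-04-17 the first run waits until 2019-04-18 2 AM.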
> > > >>
> > > >> On Wed, Apr 17, 2019 at 10:13 AM Pawel Bartoszek wrote:
> > > >>
> > > >>> Hi,
> > > >>>
> > > >>> Let's say I deploy the following DAG at 2019-04-17 5 PM:
> > > >>>
> > > >>> DAG(
> > > >>>     dag_id="dag",
> > > >>>     start_date=datetime(year=2018, month=1, day=1, hour=2, minute=0),
> > > >>>     schedule_interval="0 2 * * *",
> > > >>>     default_view="graph",
> > > >>>     orientation="TB",
> > > >>>     concurrency=1,
> > > >>>     max_active_runs=1,
> > > >>>     catchup=False)
> > > >>>
> > > >>>
> > > >>> I noticed that the DAG is first scheduled for yesterday, i.e.
> > > >>> 2019-04-16 2 AM. How can I avoid this? I want the DAG to be scheduled
> > > >>> in the future according to the cron expression, i.e. 2019-04-18 2 AM.
> > > >>>
> > > >>> With schedule_interval set to
> > > >>>
> > > >>> schedule_interval=timedelta(hours=24),
> > > >>>
> > > >>> Airflow seems (correct me if I am wrong) to schedule the DAG 24 hours
> > > >>> in the past from the time the DAG was deployed.
> > > >>>
> > > >>> Thanks,
> > > >>> Pawel
> > > >>>
> > > >>
> > > >>
> > > >> --
> > > >>
> > > >> Chao-Han Tsai
> > > >>
> > >
> > >
> >
>