+1

On Thu, May 14, 2020 at 1:18 PM Dan Davydov <[email protected]>
wrote:

> +1, but in the future I think
> /dags/{dag_id}/dagRuns/{execution_date}/{run_number} would be better.
> That would give an automatic ordering between two runs, is a lot simpler
> than "backfill_2020-03-16T00:00:00+00:00", and helps enable the multiple
> dagruns per execution date that you mention.
>
> On Thu, May 14, 2020 at 4:09 PM Jarek Potiuk <[email protected]>
> wrote:
>
> > +1
> >
> > On Thu, May 14, 2020 at 4:45 PM Kamil Breguła <[email protected]>
> > wrote:
> >
> > > +1
> > >
> > >
> > > On Wed, May 13, 2020 at 1:10 PM Kaxil Naik <[email protected]>
> > > > wrote:
> > > >
> > > > +1
> > > >
> > > > On Wed, May 13, 2020 at 11:54 AM Ash Berlin-Taylor <[email protected]>
> > > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > The discussion about API spec made me think of something.
> > > > >
> > > > > > Right now the primary key on TI is (dag_id, task_id,
> > > > > > execution_date). This is because when Airflow was originally
> > > > > > written, DagRun didn't exist as a concept, so this was the
> > > > > > natural PK.
> > > > >
> > > > > > Then DagRun was added with a couple of unique constraints
> > > > > > (`(dag_id, run_id)`, `(dag_id, execution_date)`), but the PK on
> > > > > > TI was never changed.
> > > > >
> > > > >
> > > > > Why does this matter?
> > > > >
> > > > > > Well, people have often asked to be able to have multiple runs
> > > > > > for the same execution_date. For example, we might want to have
> > > > > > a machine learning pipeline run over the same set of data (so
> > > > > > having the same execution date) but use different
> > > > > > hyperparameters (passed in via dag_run.conf).
> > > > >
> > > > > > This isn't supported _yet_, but with some adjusting of PKs,
> > > > > > constraints and relations in the TI this would be possible.
> > > > >
> > > > >
> > > > > Why am I bringing this up now?
> > > > >
> > > > > > Because as we are designing the API, it would be nice to change
> > > > > > how we refer to TIs beforehand. (It would be possible to do it
> > > > > > in a compatible way, i.e. supporting both execution_date and
> > > > > > run_id, but it'd be cleaner to only have to support one.)
> > > > >
> > > > > So the proposal I am making right now is to change the API from:
> > > > >
> > > > > /dags/{dag_id}/dagRuns/{execution_date}/...
> > > > >
> > > > > to
> > > > >
> > > > > /dags/{dag_id}/dagRuns/{run_id}/...
> > > > >
> > > > > > This alone doesn't give us the ability to have multiple dag runs
> > > > > > for the same date, but it makes it easier to do so in the future
> > > > > > without having to change/redesign the API.
> > > > >
> > > > > What do people think?
> > > > >
> > > > > -ash
> > > > >
> > >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> >
>
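
For anyone skimming the thread, here is a rough sketch of the two points in the quoted proposal: the existing unique constraints on DagRun are what reject a second run for the same execution_date, and a run_id-based path only needs the run_id to be URL-escaped. This is an illustration only, not Airflow's actual models or API code. The table layouts are cut down to the key columns named in the proposal, and `make_db`, `try_add_run` and `dag_run_path` are hypothetical helpers made up for this example.

```python
import sqlite3
from urllib.parse import quote

# Simplified, hypothetical tables: only the key columns named in the thread.
SCHEMA = """
CREATE TABLE dag_run (
    dag_id TEXT NOT NULL,
    run_id TEXT NOT NULL,
    execution_date TEXT NOT NULL,
    UNIQUE (dag_id, run_id),
    UNIQUE (dag_id, execution_date)
);
CREATE TABLE task_instance (
    dag_id TEXT NOT NULL,
    task_id TEXT NOT NULL,
    execution_date TEXT NOT NULL,
    PRIMARY KEY (dag_id, task_id, execution_date)
);
"""


def make_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def try_add_run(conn, dag_id, run_id, execution_date):
    """Insert a dag_run row; return False if a unique constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO dag_run (dag_id, run_id, execution_date) VALUES (?, ?, ?)",
            (dag_id, run_id, execution_date),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def dag_run_path(dag_id, run_id):
    """Build the proposed /dags/{dag_id}/dagRuns/{run_id} path, URL-escaped."""
    return f"/dags/{quote(dag_id, safe='')}/dagRuns/{quote(run_id, safe='')}"


if __name__ == "__main__":
    conn = make_db()
    date = "2020-03-16T00:00:00+00:00"
    print(try_add_run(conn, "example_dag", "run_a", date))  # first run is accepted
    print(try_add_run(conn, "example_dag", "run_b", date))  # same date is rejected
    print(dag_run_path("example_dag", "backfill_" + date))
```

In this sketch, dropping the `UNIQUE (dag_id, execution_date)` line would let the second insert through, but task_instance would still have no way to tell the two runs apart, which is the PK concern raised in the proposal.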
