+1
On Wed, May 13, 2020 at 11:54 AM Ash Berlin-Taylor <[email protected]> wrote:
> Hi all,
>
> The discussion about API spec made me think of something.
>
> Right now the primary key on TI is (dag_id, task_id, execution_date).
> This is because when Airflow was originally written, DagRun didn't exist
> as a concept, so this was the natural PK.
>
> Then DagRun was added with a couple of unique constraints (`(dag_id,
> run_id)` and `(dag_id, execution_date)`), but the PK on TI was never changed.
>
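To make the current constraint layout concrete, here is a minimal sketch (my own illustration, not Airflow's real schema) using sqlite3 from the stdlib. Table and column names are simplified; the point is that the TI primary key itself forbids a second run for the same date:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE task_instance (
    dag_id TEXT,
    task_id TEXT,
    execution_date TEXT,
    PRIMARY KEY (dag_id, task_id, execution_date)  -- the current TI PK
);
CREATE TABLE dag_run (
    dag_id TEXT,
    run_id TEXT,
    execution_date TEXT,
    UNIQUE (dag_id, run_id),          -- DagRun constraint 1
    UNIQUE (dag_id, execution_date)   -- DagRun constraint 2
);
""")

conn.execute(
    "INSERT INTO task_instance VALUES ('ml_dag', 'train', '2020-05-13')"
)
try:
    # A second TI for the same (dag_id, task_id, execution_date)
    # is rejected outright by the primary key.
    conn.execute(
        "INSERT INTO task_instance VALUES ('ml_dag', 'train', '2020-05-13')"
    )
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
```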
>
> Why does this matter?
>
> Well, people have often asked to be able to have
> multiple runs for the same execution_date. For example we might want to
> have a machine learning pipeline run over the same set of data (so
> having the same execution date) but use different hyperparameters
> (passed in via dag_run.conf).
>
> This isn't supported _yet_ but with some adjusting of PKs, constraints
> and relations in the TI this would be possible.
>
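For illustration, this is roughly what the adjusted PK would enable: key TI on (dag_id, task_id, run_id) and two runs over the same execution_date can coexist. Again a hypothetical sqlite3 sketch, not the actual migration (the `(dag_id, execution_date)` unique constraint on dag_run would also have to go):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE task_instance (
    dag_id TEXT,
    task_id TEXT,
    run_id TEXT,
    execution_date TEXT,
    PRIMARY KEY (dag_id, task_id, run_id)  -- proposed: run_id, not date
)
""")

# Two runs over the same data/date, differing only in run_id (and, in a
# real pipeline, in the hyperparameters passed via dag_run.conf):
conn.execute(
    "INSERT INTO task_instance VALUES ('ml_dag', 'train', 'run_a', '2020-05-13')"
)
conn.execute(
    "INSERT INTO task_instance VALUES ('ml_dag', 'train', 'run_b', '2020-05-13')"
)

rows = conn.execute(
    "SELECT COUNT(*) FROM task_instance WHERE execution_date = '2020-05-13'"
).fetchone()[0]
```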
>
> Why am I bringing this up now?
>
> Because as we are designing the API, it would be nice to change how we
> refer to TIs beforehand. (It would be possible to do it in a
> compatible way, i.e. supporting both execution_date and run_id, but it'd
> be cleaner to only have to support one.)
>
> So the proposal I am making right now is to change the API from:
>
> /dags/{dag_id}/dagRuns/{execution_date}/...
>
> to
>
> /dags/{dag_id}/dagRuns/{run_id}/...
>
> This alone doesn't give us the ability to have multiple dag runs for the
> same date, but it makes it easier to do so in the future without having
> to change/redesign the API.
>
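One incidental benefit of run_id as the path segment: execution_date timestamps carry characters (`:`, `+`) that must be percent-encoded in URLs. A quick sketch, where the concrete dag_id and run_id values are made up for illustration:

```python
from urllib.parse import quote

execution_date = "2020-05-13T11:54:00+00:00"
run_id = "scheduled__2020-05-13"  # hypothetical run_id

# Current proposal in the spec: key the path on execution_date,
# which needs percent-encoding to be a safe path segment.
old_path = f"/dags/my_dag/dagRuns/{quote(execution_date, safe='')}/..."

# Proposed: key the path on run_id instead.
new_path = f"/dags/my_dag/dagRuns/{run_id}/..."
```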
> What do people think?
>
> -ash
>