Hi,

that's what primary keys are usually for (in this case an ID, which is not 
really suitable for a CLI).

The uniqueness constraint I thought was the intended one (and it is still 
there) is the one on (dag_id, run_id).
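
For reference, here is a minimal SQLAlchemy sketch of the two constraints I 
mean (column names follow the dag_run table; the actual definitions in 
Airflow's models may differ in detail):

    from sqlalchemy import Column, DateTime, Integer, String, UniqueConstraint
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class DagRunSketch(Base):
        """Illustrative stand-in for Airflow's DagRun model, not the real class."""
        __tablename__ = 'dag_run_sketch'

        id = Column(Integer, primary_key=True)  # surrogate PK, not CLI-friendly
        dag_id = Column(String(250))
        run_id = Column(String(250))
        execution_date = Column(DateTime)

        __table_args__ = (
            UniqueConstraint('dag_id', 'run_id'),          # the one I expected
            UniqueConstraint('dag_id', 'execution_date'),  # the one AIRFLOW-2319 is about
        )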

Our use case: we receive a really bad data export (which we have no influence 
on) where we had to "transpose execution dates", meaning that the export is 
organized along one date column, but we need the data organized along a 
different date column. We implemented this using TriggerDagRunOperator. The 
second DAG (the one being triggered) was triggered multiple times for the same 
execution date, and we would have liked to keep the state of those different 
'runs'.
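
Roughly how we wired it up (a sketch only - DAG/task names, dates and the 
payload are placeholders, and it assumes the 1.10-style python_callable 
signature of TriggerDagRunOperator):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.dagrun_operator import TriggerDagRunOperator

    def set_payload(context, dag_run_obj):
        # Hand the "transposed" target date over to the triggered DAG.
        dag_run_obj.payload = {'target_date': context['ds']}
        return dag_run_obj

    with DAG('export_controller',
             start_date=datetime(2019, 1, 1),
             schedule_interval='@daily') as dag:
        # Each controller run fires the same downstream DAG, so several of
        # the triggered runs can end up wanting the same execution date -
        # which the (dag_id, execution_date) unique constraint forbids.
        trigger = TriggerDagRunOperator(
            task_id='trigger_transpose',
            trigger_dag_id='transpose_by_target_date',
            python_callable=set_payload,
        )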


On Saturday, January 19, 2019 12:37:40 AM CET Deng Xiaodong wrote:
> Without this unique constraint, how is the scheduler supposed to
> find/update the state of each specific TaskInstance or DagRun? How should
> the log be named & stored?
> 
> It should be there.
> 
> 
> XD
> 
> On Sat, Jan 19, 2019 at 03:20 Ash Berlin-Taylor <[email protected]> wrote:
> > So that it's here on a searchable record:
> > 
> > Simply put, this is how airflow identifies tasks - `airflow run my_dag_id
> > 2018-01-17 my task`.
> > 
> > If you were to have two dag runs for the exact same millisecond, which one
> > would it run? Much of the system treats this combo as unique, so lots of
> > code would need to change to make it not.
> > 
> > But: what is your use case here? The same dag running twice for the same
> > period doesn't make sense to me conceptually (but then I admit I now think
> > in terms of what airflow does internally, so my thinking is coloured)
> > 
> > -ash
> > 
> > On 18 January 2019 19:14:01 GMT, "Andreas Költringer"
> > <[email protected]> wrote:
> > >Hi,
> > >
> > >almost a year ago I reported an issue to Airflow's Jira:
> > >https://issues.apache.org/jira/browse/AIRFLOW-2319
> > >
> > >Recently, someone pointed out that I should ask here on the mailing list
> > >(I thought I did, but apparently, I did not).
> > >
> > >So here's the thing: the DagRun table has a unique constraint on
> > >(dag_id, exec_date) that was not reflected in models.py (now
> > >models/__init__.py). This got "fixed" in June 2018 - apparently this was
> > >not the only place where model declarations and schema migration scripts
> > >were out of sync.
> > >
> > >Ash has now pointed out on AIRFLOW-2319 that the unique constraint is
> > >there "by design", meaning there is a reason for it. Can somebody please
> > >explain the reasoning behind this to me? We have a use case for having
> > >multiple dag runs per (dag_id, exec_date).
> > >
> > >thx in advance!
> > >
> > >--
> > >Andreas Koeltringer
> > >Mail:   [email protected]


-- 
Andreas Koeltringer
Mail:   [email protected]
 


