[ 
https://issues.apache.org/jira/browse/AIRFLOW-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603302#comment-16603302
 ] 

Trevor Edwards edited comment on AIRFLOW-2319 at 9/4/18 9:25 PM:
-----------------------------------------------------------------

+1 to this issue. There is an id column, but aside from this, it seems like 
only the pair (dag_id, 
[run_id|https://github.com/apache/incubator-airflow/blob/1.9.0/airflow/models.py#L4384]) 
should be enforced as unique. The current behavior feels like a bug.

This issue becomes problematic if you have event-driven DAGs (e.g. 
[https://cloud.google.com/composer/docs/how-to/using/triggering-with-gcf]) 
where runs with different parameters may be triggered simultaneously, causing 
an execution_date collision.
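
The collision can be reproduced outside Airflow. A minimal sketch, assuming the table definition from the .schema dag_run output in the issue description; the DAG name, the run_ids, and the insert_run helper are made up for illustration and are not Airflow API:

```python
import sqlite3

# Table definition copied from the `.schema dag_run` output in the issue.
SCHEMA = """
CREATE TABLE dag_run (
    id INTEGER NOT NULL,
    dag_id VARCHAR(250),
    execution_date DATETIME,
    state VARCHAR(50),
    run_id VARCHAR(250),
    external_trigger BOOLEAN,
    conf BLOB,
    end_date DATETIME,
    start_date DATETIME,
    PRIMARY KEY (id),
    UNIQUE (dag_id, execution_date),
    UNIQUE (dag_id, run_id),
    CHECK (external_trigger IN (0, 1))
)
"""


def insert_run(conn, dag_id, execution_date, run_id):
    """Hypothetical helper: insert one row, return None or the error text."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO dag_run "
                "(dag_id, execution_date, state, run_id, external_trigger) "
                "VALUES (?, ?, 'running', ?, 1)",
                (dag_id, execution_date, run_id),
            )
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Two externally triggered runs that land on the same timestamp.
same_second = "2018-09-04 21:25:00"
first = insert_run(conn, "example_dag", same_second, "event_a")
second = insert_run(conn, "example_dag", same_second, "event_b")
row_count = conn.execute("SELECT COUNT(*) FROM dag_run").fetchone()[0]
```

Here first is None, while second holds SQLite's unique-constraint error on (dag_id, execution_date), even though the two run_ids differ. Only the first row is stored.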

 

Andreas, are you working on a fix for this?



> Table "dag_run" has (bad) second index on (dag_id, execution_date)
> ------------------------------------------------------------------
>
>                 Key: AIRFLOW-2319
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2319
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun
>    Affects Versions: 1.9.0
>            Reporter: Andreas Költringer
>            Priority: Major
>
> Inserting DagRuns via {{airflow.api.common.experimental.trigger_dag}} 
> (multiple rows with the same {{(dag_id, execution_date)}}) raised the 
> following error:
> {code:java}
> {models.py:1644} ERROR - No row was found for one(){code}
> This is weird as the {{session.add()}} and {{session.commit()}} calls are 
> right before {{run.refresh_from_db()}} in {{models.DAG.create_dagrun()}}.
> Manually inspecting the database revealed that there is an extra index with 
> {{unique}} constraint on the columns {{(dag_id, execution_date)}}:
> {code:java}
> sqlite> .schema dag_run
> CREATE TABLE dag_run (
>         id INTEGER NOT NULL, 
>         dag_id VARCHAR(250), 
>         execution_date DATETIME, 
>         state VARCHAR(50), 
>         run_id VARCHAR(250), 
>         external_trigger BOOLEAN, conf BLOB, end_date DATETIME, start_date DATETIME, 
>         PRIMARY KEY (id), 
>         UNIQUE (dag_id, execution_date), 
>         UNIQUE (dag_id, run_id), 
>         CHECK (external_trigger IN (0, 1))
> );
> CREATE INDEX dag_id_state ON dag_run (dag_id, state);{code}
> (On SQLite it's a unique constraint; on MariaDB it's also an index.)
> The {{DagRun}} class in {{models.py}} does not reflect this; however, it is in 
> [migrations/versions/1b38cef5b76e_add_dagrun.py|https://github.com/apache/incubator-airflow/blob/master/airflow/migrations/versions/1b38cef5b76e_add_dagrun.py#L42]
> I looked for other migrations correcting this, but could not find any. As this 
> is not reflected in the model, I guess this is a bug?
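
For anyone wanting to check their own metadata database, the unique indexes can also be listed programmatically instead of reading the schema by eye. A sketch for SQLite only, run here against a trimmed copy of the table above; the unique_index_columns helper is hypothetical and the table name is interpolated directly, so it must come from a trusted source:

```python
import sqlite3


def unique_index_columns(conn, table):
    """Hypothetical helper: column tuples covered by unique indexes on `table`."""
    result = set()
    # PRAGMA index_list rows start with (seq, name, unique, ...).
    for row in conn.execute(f"PRAGMA index_list('{table}')").fetchall():
        name, is_unique = row[1], row[2]
        if not is_unique:
            continue
        # PRAGMA index_info rows are (seqno, cid, column_name).
        cols = conn.execute(f"PRAGMA index_info('{name}')").fetchall()
        result.add(tuple(c[2] for c in sorted(cols)))
    return result


conn = sqlite3.connect(":memory:")
# Trimmed version of the dag_run table from the issue (constraints kept).
conn.execute(
    """CREATE TABLE dag_run (
        id INTEGER NOT NULL,
        dag_id VARCHAR(250),
        execution_date DATETIME,
        state VARCHAR(50),
        run_id VARCHAR(250),
        PRIMARY KEY (id),
        UNIQUE (dag_id, execution_date),
        UNIQUE (dag_id, run_id)
    )"""
)
conn.execute("CREATE INDEX dag_id_state ON dag_run (dag_id, state)")

found = unique_index_columns(conn, "dag_run")
```

With this table, found should include both (dag_id, execution_date) and (dag_id, run_id), but not the non-unique dag_id_state index.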



