Daniel Huang created AIRFLOW-1296:
-------------------------------------

             Summary: DAGs using operators involving cascading skipped tasks 
fail prematurely
                 Key: AIRFLOW-1296
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1296
             Project: Apache Airflow
          Issue Type: Bug
          Components: scheduler
            Reporter: Daniel Huang


This is basically the same issue as AIRFLOW-872 and AIRFLOW-719. A 
workaround fixed it 
(https://github.com/apache/incubator-airflow/pull/2125), but it was later reverted 
(https://github.com/apache/incubator-airflow/pull/2195). I totally agree with 
the reason for reverting, but I still think this is an issue. 

The issue affects any operator that relies on cascading skipped tasks, 
such as ShortCircuitOperator or LatestOnlyOperator. These operators mark only 
their *direct* downstream tasks as SKIPPED; cascading the SKIPPED state to tasks 
further downstream is left to the scheduler (see the LatestOnlyOperator docs on 
this expected behavior: 
https://airflow.incubator.apache.org/concepts.html#latest-run-only). However, 
the scheduler marks the DAG run as FAILED prematurely, before it has had a 
chance to skip all the downstream tasks.
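
As a rough illustration of the intended cascade, here is a toy model in plain 
Python. It is not Airflow code: the function names and the dictionary-based 
state tracking are made up for this sketch, and only the task names come from 
this report. It assumes the scheduler cascades SKIPPED one level per pass.

```python
# Toy model of the behaviour described above; not Airflow's implementation.
SUCCESS, SKIPPED = "success", "skipped"

# Task -> direct downstream tasks, using the task names from this report.
DOWNSTREAM = {
    "latest_only": ["dummy1"],
    "dummy1": ["dummy2"],
    "dummy2": ["dummy3"],
    "dummy3": [],
}


def run_skipping_operator(states, task):
    """Hypothetical operator: succeeds and skips only its direct downstream tasks."""
    states[task] = SUCCESS
    for child in DOWNSTREAM[task]:
        states[child] = SKIPPED


def scheduler_pass(states):
    """Cascade SKIPPED one level; return the tasks skipped in this pass."""
    # Collect candidates before mutating, so a pass never cascades two levels.
    newly_skipped = [
        child
        for task, children in DOWNSTREAM.items()
        if states[task] == SKIPPED
        for child in children
        if states[child] is None
    ]
    for child in newly_skipped:
        states[child] = SKIPPED
    return newly_skipped


states = dict.fromkeys(DOWNSTREAM)  # every task starts with state None
run_skipping_operator(states, "latest_only")
after_operator = dict(states)  # snapshot: only dummy1 has been skipped so far

passes = []
while True:
    skipped = scheduler_pass(states)
    if not skipped:
        break
    passes.append(skipped)

print(after_operator)
print(passes)
```

In this model the operator leaves dummy2 and dummy3 without a state, and it 
takes two further scheduler passes to skip them. The DAG run has to stay alive 
for those passes for the cascade to complete.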

This example DAG should reproduce the issue: 
https://gist.github.com/dhuang/61d38fb001c3a917edf4817bb0c915f9. 

Expected result: DAG succeeds with tasks - latest_only (success) -> dummy1 
(skipped) -> dummy2 (skipped) -> dummy3 (skipped)
Actual result: DAG fails with tasks - latest_only (success) -> dummy1 (skipped) 
-> dummy2 (none) -> dummy3 (none)

I believe the results I'm seeing are caused by this deadlock prevention logic: 
https://github.com/apache/incubator-airflow/blob/1.8.1/airflow/models.py#L4182. 
While the actual result shown above _could_ indicate a deadlock, in this case it 
does not. Since this {{update_state}} logic is reached first in each 
scheduler run, dummy2/dummy3 never get a chance to have the SKIPPED state 
cascaded to them. Commenting out that block gives me the results I expect.
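
The ordering problem can be sketched with a small simulation. This is a 
simplified, hypothetical model and not the actual {{update_state}} code: the 
deadlock rule here ("unfinished tasks exist and none has a successful 
upstream") is an assumption standing in for the real dependency checks, and 
all function names are invented for illustration.

```python
# Toy model only: names and logic are illustrative, not Airflow's implementation.
SUCCESS, SKIPPED, FAILED, RUNNING = "success", "skipped", "failed", "running"

# Linear chain from the report: each task maps to its single upstream task.
UPSTREAM = {"dummy1": "latest_only", "dummy2": "dummy1", "dummy3": "dummy2"}


def initial_states():
    # State right after latest_only ran: it succeeded and skipped dummy1 only.
    return {"latest_only": SUCCESS, "dummy1": SKIPPED, "dummy2": None, "dummy3": None}


def looks_deadlocked(states):
    """Assumed rule: tasks remain unfinished but none has a successful upstream."""
    unfinished = [t for t in UPSTREAM if states[t] is None]
    runnable = [t for t in unfinished if states[UPSTREAM[t]] == SUCCESS]
    return bool(unfinished) and not runnable


def cascade_one_level(states):
    """Skip each unfinished task whose upstream was already skipped before this pass."""
    snapshot = dict(states)
    for task, upstream in UPSTREAM.items():
        if snapshot[task] is None and snapshot[upstream] == SKIPPED:
            states[task] = SKIPPED


def run_scheduler(deadlock_check, max_passes=10):
    """Return (dag_run_state, task_states) after simulating scheduler passes."""
    states = initial_states()
    for _ in range(max_passes):
        if all(s is not None for s in states.values()):
            return SUCCESS, states
        # The check runs before the cascade, mirroring the order described above.
        if deadlock_check and looks_deadlocked(states):
            return FAILED, states
        cascade_one_level(states)
    return RUNNING, states


print(run_scheduler(deadlock_check=True))
print(run_scheduler(deadlock_check=False))
```

With the check enabled, the model fails the run on the first pass with 
dummy2/dummy3 still unset, which matches the actual result above. With the 
check disabled (the analogue of commenting out that block), the cascade 
completes and the run succeeds.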

[~bolke] I know you spent a while trying to reproduce my issue and weren't able 
to, but I'm still hitting this on a fresh environment with default configs, 
across sqlite/mysql dbs, local/sequential/celery executors, and 1.8.1/master.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
