[ 
https://issues.apache.org/jira/browse/AIRFLOW-5391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Yu updated AIRFLOW-5391:
-----------------------------
    Description: 
I tried this on 1.10.3 and 1.10.4; both versions have this issue.

In this example from the docs, branch_a executed and branch_false was skipped 
because of the branching condition. However, if someone clears branch_false, 
branch_false will execute. 
!https://airflow.apache.org/_images/branch_good.png!

This behaviour is understandable given how BranchPythonOperator is implemented. 
BranchPythonOperator does not store its decision anywhere. It skips its own 
downstream tasks in the branch at runtime. So there's currently no way for 
branch_false to know it should be skipped without rerunning the branching task.
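
The mechanism can be reproduced with a toy model. This is a minimal sketch in plain Python, not Airflow code: the states dict stands in for the task instance table, and the function names run_branching, clear and schedule are made up for illustration. The point it shows is that the branching decision only survives as the "skipped" state of branch_false, so clearing that state loses the decision.

```python
# Toy model of the behaviour described above; this is NOT Airflow code.
# The states dict is a stand-in for the task instance states in the db.

def run_branching(states, chosen, candidates):
    """Run the branching task: mark every non-chosen branch as skipped.

    Note that the decision itself (chosen) is not saved anywhere.
    """
    states["branching"] = "success"
    for task_id in candidates:
        if task_id != chosen:
            states[task_id] = "skipped"


def clear(states, task_id):
    """Clear a task: forget its state so the scheduler picks it up again."""
    states[task_id] = None


def schedule(states, order):
    """Run every task that has no state yet and return what was executed."""
    executed = []
    for task_id in order:
        if states.get(task_id) is None:
            states[task_id] = "success"
            executed.append(task_id)
    return executed


order = ["branching", "branch_a", "branch_false"]
states = {}
run_branching(states, chosen="branch_a", candidates=["branch_a", "branch_false"])
first_run = schedule(states, order)    # only branch_a executes

clear(states, "branch_false")          # the user clears the skipped task
second_run = schedule(states, order)   # branch_false executes this time
```

After the clear, nothing in states records that branching chose branch_a, so the toy scheduler has no reason to skip branch_false.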

This is obviously counter-intuitive from the user's perspective. In this 
example, users would not expect branch_false to execute when they clear it 
because the branching task should have skipped it.

There are a few ways to improve this:

Option 1): Make downstream tasks skipped by BranchPythonOperator not clearable 
without also clearing the upstream BranchPythonOperator. In this example, if 
someone clears branch_false without clearing branching, the Clear action should 
fail with an error telling the user that the branching task must be cleared 
as well.

Option 2): Make BranchPythonOperator store the result of its skip condition 
somewhere. Make downstream tasks check for this stored decision and skip 
themselves if they should have been skipped by the condition. This probably 
means the decision of BranchPythonOperator needs to be stored in the db.
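
Both options can be sketched against the same kind of toy model. This is plain Python under stated assumptions, not a proposed patch: ClearError, clear_option_1, run_task_option_2, the branch_parents mapping and the decisions dict (standing in for wherever the decision would be stored) are all hypothetical names.

```python
class ClearError(Exception):
    """Raised when a clear request would bypass a branching decision."""


def clear_option_1(states, to_clear, branch_parents):
    """Option 1: refuse to clear a skipped branch task on its own.

    branch_parents maps a task id to its upstream branching task id.
    Nothing is cleared if any requested task fails the check.
    """
    for task_id in to_clear:
        parent = branch_parents.get(task_id)
        skipped = states.get(task_id) == "skipped"
        if skipped and parent is not None and parent not in to_clear:
            raise ClearError(
                f"{task_id} was skipped by {parent}; clear {parent} as well"
            )
    for task_id in to_clear:
        states[task_id] = None


def run_task_option_2(states, decisions, task_id, branch_parents):
    """Option 2: consult the stored branching decision before running."""
    parent = branch_parents.get(task_id)
    if parent is not None and parent in decisions and task_id not in decisions[parent]:
        states[task_id] = "skipped"   # the stored decision excludes this task
    else:
        states[task_id] = "success"   # no decision applies, run as usual
    return states[task_id]


branch_parents = {"branch_a": "branching", "branch_false": "branching"}
states = {"branching": "success", "branch_a": "success", "branch_false": "skipped"}

# Option 1: clearing branch_false alone is rejected.
try:
    clear_option_1(states, {"branch_false"}, branch_parents)
    option_1_error = None
except ClearError as exc:
    option_1_error = str(exc)

# Option 2: the decision is stored, so the cleared task skips itself.
decisions = {"branching": {"branch_a"}}
states["branch_false"] = None
option_2_state = run_task_option_2(states, decisions, "branch_false", branch_parents)
```

Option 1 needs no new storage but changes what the Clear action accepts. Option 2 leaves Clear unchanged but requires the decision to be persisted and read back by downstream tasks.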

 

[kevcampb|https://blog.diffractive.io/author/kevcampb/] attempted a workaround 
and described it on his blog. He acknowledged that his workaround is not 
perfect and that a better permanent fix is needed:

[https://blog.diffractive.io/2018/08/07/replacement-shortcircuitoperator-for-airflow/]

 

  was:
I tried this on 1.10.3 and 1.10.4; both versions have this issue.

In this example from the docs, branch_a executed and branch_false was skipped 
because of the branching condition. However, if someone clears branch_false, 
branch_false will execute. 
!https://airflow.apache.org/_images/branch_good.png!

This behaviour is understandable given how BranchPythonOperator is implemented. 
BranchPythonOperator does not store its decision anywhere. It skips its own 
downstream tasks in the branch at runtime. So there's currently no way for 
branch_false to know it should be skipped without rerunning the branching task.

This is obviously counter-intuitive from the user's perspective. In this 
example, users would not expect branch_false to execute when they clear it 
because the branching task should have skipped it.

There are a few ways to improve this:

Option 1): Make downstream tasks skipped by BranchPythonOperator not clearable 
without also clearing the upstream BranchPythonOperator. In this example, if 
someone clears branch_false without clearing branching, branch_false should not 
execute.

Option 2): Make BranchPythonOperator store the result of its skip condition 
somewhere. Make downstream tasks check for this stored decision and skip 
themselves if they should have been skipped by the condition. This probably 
means the decision of BranchPythonOperator needs to be stored in the db.

 

[kevcampb|https://blog.diffractive.io/author/kevcampb/] attempted a workaround 
and described it on his blog. He acknowledged that his workaround is not 
perfect and that a better permanent fix is needed:

[https://blog.diffractive.io/2018/08/07/replacement-shortcircuitoperator-for-airflow/]

 


> Clearing a task skipped by BranchPythonOperator will cause the task to execute
> ------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5391
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5391
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: operators
>    Affects Versions: 1.10.4
>            Reporter: Qian Yu
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
