[ 
https://issues.apache.org/jira/browse/AIRFLOW-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rolf Schroeder updated AIRFLOW-2683:
------------------------------------
    Attachment: airflow_priorities_purge_in_the_middle.png

> priority_weights honored by LocalExecutor but not by CeleryExecutor
> -------------------------------------------------------------------
>
>                 Key: AIRFLOW-2683
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2683
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: celery, scheduler
>    Affects Versions: Airflow 1.7.1.3
>         Environment: Linux
>            Reporter: Rolf Schroeder
>            Priority: Major
>         Attachments: airflow_priorities_celery.png, 
> airflow_priorities_dag.png, airflow_priorities_localexecutor.png, 
> airflow_priorities_purge_in_the_middle.png, test_priority.py
>
>
> Hi,
> I am running Airflow 1.7.1.3 with Celery 4.0.2 (and rabbitmq). I have come 
> across the following issue: It seems to me that task priorities are not 
> honored as expected when scheduling tasks via Celery. However, when using the 
> LocalExecutor, the priorities seem to work fine. This issue arises when low 
> prio tasks get scheduled/queued before high prio tasks. Celery will finish 
> all previously scheduled/queued low prio tasks before tackling the high prio 
> ones. In contrast, the LocalExecutor does not further process remaining low 
> prio tasks and starts the high prio tasks. I fee like once Airflow has sent 
> the tasks to Celery, it "looses" control of the execution order. Details 
> below. I searched in Jira and possibly the following issues are linked 
> (although I am not convinced)
> (AIRFLOW-1510)https://issues.apache.org/jira/browse/AIRFLOW-1510
> (AIRFLOW-584)https://issues.apache.org/jira/browse/AIRFLOW-584
> I have the following test setup: Two independent initial tasks, each with 3 
> downstream tasks. One of the initial tasks takes longer but has high prio 
> downstream tasks. The other initial task has a short duration and low prio 
> downstream tasks. Both initial tasks start at the same time. My expectation 
> is that some of the low prio tasks get executed first (since their initial 
> upstream task is faster) but the moment the slower initial taks is done, its 
> high prio downstream tasks should get executed before the low prio downstream 
> tasks.
> Here is a picture of the setup (forget about init0, this is just to make the 
> DAG look correct):
>  !airflow_priorities_dag.png!
> I've been trying this with a LocalExecutor (AIRFLOW__CORE__PARALLELISM=2) and 
> two celery workers. When running the LocalExecutor, the high prio tasks get 
> indeed executed once their init task is done (as expected, the 'prio100' 
> tasks finish before the remaining 'prio10' tasks):
> !airflow_priorities_localexecutor.png!
> However, when using Celery, it seems that that the priorities are not honored 
> anymore. Once can see clearly that the low prio tasks (prio10) are all 
> executed before the high prio ones (prio100)
> !airflow_priorities_celery.png!
> Here is the corresponding code
> [^test_priority.py]
> I do not know whether this expected behavior or a bug? Is there any way to 
> "fix" this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to