[ 
https://issues.apache.org/jira/browse/AIRFLOW-2683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rolf Schroeder updated AIRFLOW-2683:
------------------------------------
    Description: 
Hi,

I am running Airflow 1.7.1.3 with Celery 4.0.2 (and rabbitmq). I have come 
across the following issue: It seems to me that task priorities are not honored 
as expected when scheduling tasks via Celery. However, when using the 
LocalExecutor, the priorities seem to work fine. This issue arises when low 
prio tasks get scheduled/queued before high prio tasks. Celery will finish all 
previously scheduled/queued low prio tasks before tackling the high prio ones. 
In contrast, the LocalExecutor "correctly" does not further process remaining 
low prio tasks and starts the high prio tasks. I fee like once Airflow has sent 
the tasks to Celery, it "looses" control of the execution order. Details below. 
I search in Jira and possibly the following issues are linked (although I am 
not convinced)

(AIRFLOW-1510)https://issues.apache.org/jira/browse/AIRFLOW-1510

(AIRFLOW-584)https://issues.apache.org/jira/browse/AIRFLOW-584

I have the following test setup: Two independent initial tasks, each with 3 
downstream tasks. One of the initial tasks takes longer but has high prio 
downstream tasks. The other initial task has a short duration and low prio 
downstream tasks. Both initial tasks start at the same time. My expectation is 
that some of the low prio tasks get executed first (since their initial 
upstream task is faster) but the moment the slower initial taks is done, its 
high prio downstream tasks should get executed before the low prio downstream 
tasks.

Here is a picture of the setup (forget about init0, this is just to make the 
DAG look correct):
 !airflow_priorities_dag.png!

I've been trying this with a LocalExecutor (AIRFLOW__CORE__PARALLELISM=2) and 
two celery workers. When running the LocalExecutor, the high prio tasks get 
indeed executed once their init task is done (as expected, the 'prio100' tasks 
finish before the remaining 'prio10' tasks):

!airflow_priorities_localexecutor.png!

However, when using Celery, it seems that that the priorities are not honored 
anymore. Once can see clearly that the low prio tasks (prio10) are all executed 
before the high prio ones (prio100)

!airflow_priorities_celery.png!

Here is the corresponding code

[^test_priority.py]

I do not know whether this expected behavior or a bug? Is there any way to 
"fix" this?

  was:
Hi,

I am running Airflow 1.7.1.3 with Celery 4.0.2 (and rabbitmq). I have come 
across the following issue: It seems to me that task priorities are not honored 
as expected when scheduling tasks via Celery. However, when using the 
LocalExecutor, the priorities seem to work fine. This issue arises when low 
prio tasks get scheduled/queued before high prio tasks. Celery will finish all 
previously scheduled/queued low prio tasks before tackling the high prio ones. 
In contrast, the LocalExecutor "correctly" interrupts processing of the low 
prio tasks in favor of the high prio task. I fee like once Airflow has sent the 
tasks to Celery, it "looses" control of the execution order. Details below. I 
search in Jira and possibly the following issues are linked (although I am not 
convinced)

(AIRFLOW-1510)[https://issues.apache.org/jira/browse/AIRFLOW-1510]

(AIRFLOW-584)[https://issues.apache.org/jira/browse/AIRFLOW-584]


I have the following test setup: Two independent initial tasks, each with 3 
downstream tasks. One of the initial tasks takes longer but has high prio 
downstream tasks. The other initial task has a short duration and low prio 
downstream tasks. Both initial tasks start at the same time. My expectation is 
that some of the low prio tasks get executed first (since their initial 
upstream task is faster) but the moment the slower initial taks is done, its 
high prio downstream tasks should get executed before the low prio downstream 
tasks.

Here is a picture of the setup (forget about init0, this is just to make the 
DAG look correct):
!airflow_priorities_dag.png!


I've been trying this with a LocalExecutor (AIRFLOW__CORE__PARALLELISM=2) and 
two celery workers. When running the LocalExecutor, the high prio tasks get 
indeed executed once their init task is done (as expected, the 'prio100' tasks 
finish before the remaining 'prio10' tasks):

!airflow_priorities_localexecutor.png!

However, when using Celery, it seems that that the priorities are not honored 
anymore. Once can see clearly that the low prio tasks (prio10) are all executed 
before the high prio ones (prio100)

!airflow_priorities_celery.png!


Here is the corresponding code

[^test_priority.py]

I do not know whether this expected behavior or a bug? Is there any way to 
"fix" this?


> priority_weights honored by LocalExecutor but not by CeleryExecutor
> -------------------------------------------------------------------
>
>                 Key: AIRFLOW-2683
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2683
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: celery, scheduler
>    Affects Versions: Airflow 1.7.1.3
>         Environment: Linux
>            Reporter: Rolf Schroeder
>            Priority: Major
>         Attachments: airflow_priorities_celery.png, 
> airflow_priorities_dag.png, airflow_priorities_localexecutor.png, 
> test_priority.py
>
>
> Hi,
> I am running Airflow 1.7.1.3 with Celery 4.0.2 (and rabbitmq). I have come 
> across the following issue: It seems to me that task priorities are not 
> honored as expected when scheduling tasks via Celery. However, when using the 
> LocalExecutor, the priorities seem to work fine. This issue arises when low 
> prio tasks get scheduled/queued before high prio tasks. Celery will finish 
> all previously scheduled/queued low prio tasks before tackling the high prio 
> ones. In contrast, the LocalExecutor "correctly" does not further process 
> remaining low prio tasks and starts the high prio tasks. I fee like once 
> Airflow has sent the tasks to Celery, it "looses" control of the execution 
> order. Details below. I search in Jira and possibly the following issues are 
> linked (although I am not convinced)
> (AIRFLOW-1510)https://issues.apache.org/jira/browse/AIRFLOW-1510
> (AIRFLOW-584)https://issues.apache.org/jira/browse/AIRFLOW-584
> I have the following test setup: Two independent initial tasks, each with 3 
> downstream tasks. One of the initial tasks takes longer but has high prio 
> downstream tasks. The other initial task has a short duration and low prio 
> downstream tasks. Both initial tasks start at the same time. My expectation 
> is that some of the low prio tasks get executed first (since their initial 
> upstream task is faster) but the moment the slower initial taks is done, its 
> high prio downstream tasks should get executed before the low prio downstream 
> tasks.
> Here is a picture of the setup (forget about init0, this is just to make the 
> DAG look correct):
>  !airflow_priorities_dag.png!
> I've been trying this with a LocalExecutor (AIRFLOW__CORE__PARALLELISM=2) and 
> two celery workers. When running the LocalExecutor, the high prio tasks get 
> indeed executed once their init task is done (as expected, the 'prio100' 
> tasks finish before the remaining 'prio10' tasks):
> !airflow_priorities_localexecutor.png!
> However, when using Celery, it seems that that the priorities are not honored 
> anymore. Once can see clearly that the low prio tasks (prio10) are all 
> executed before the high prio ones (prio100)
> !airflow_priorities_celery.png!
> Here is the corresponding code
> [^test_priority.py]
> I do not know whether this expected behavior or a bug? Is there any way to 
> "fix" this?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to