I think in a perfect world we'd only have the completely vendor-neutral
executors pre-installed (Local, Sequential, Debug), and anything else would
need to be specifically installed by admins/users. If we were starting from
scratch, I think this would make the most sense. Clearly, the Kubernetes and
Celery executors are so ubiquitous that not installing them would cause too
much wreckage, but I'd like to push for Dask to _not_ be installed by default.
If this causes too much wreckage, then perhaps we should deprecate it first
(though I'm not sure exactly what that would look like in this context). That
said, it's difficult to measure how many folks are using the Dask executor.
Perhaps we have data from the yearly questionnaire/survey we send?

________________________________
From: Jarek Potiuk <ja...@potiuk.com>
Sent: Wednesday, July 12, 2023 8:05:54 AM
To: dev@airflow.apache.org
Subject: [EXTERNAL] [DISCUSS] Moving Dask Executor to a separate (optional?) 
dask provider




Hello Everyone,

A small follow-up after the K8S/Celery executors were moved:
https://lists.apache.org/thread/7gyw7ty9vm0pokjxq7y3b1zw6mrlxfm8

We are in the process of moving the Celery/Kubernetes executors (Celery is
almost complete, and I am working on K8S next, plus moving some common
discovery and config code).

But there is one more "questionable" executor, the Dask executor, still
living in Airflow core.

When it comes to Celery/Kubernetes, we decided to make the two providers
preinstalled because that makes the most sense, also given their vendor
neutrality. We are also going to keep their basic documentation in the
"core" Airflow documentation so that it is easier to discover and
prominently visible.

However, when it comes to Dask, I am not sure about its status and whether
we should make it preinstalled.

I guess there is no doubt that we should move it to a provider; this brings
only benefits, the same as the Celery/K8S move. But I am not sure whether it
should be preinstalled with Airflow. I do not know how frequently the Dask
executor (and Dask itself) is used by Airflow users, but I personally do not
think it should be as "closely" connected with Airflow as the
Celery/Kubernetes ones.

If we do not make it preinstalled, it is a somewhat (but not really too
much) breaking change. We might still choose to install the dask provider in
the PROD reference image, so it will continue to work if you use the image,
and when you install Airflow in a venv you will only have to specify
`pip install apache-airflow[dask]` or manually install
`apache-airflow-providers-daskexecutor` (for now at least, this is the name
I could reserve in PyPI). So this is not really breaking; it just requires
another dependency to be installed. But some Airflow installation pipelines
might break because the provider won't be preinstalled, so this is
borderline breaking.

WDYT? Should we make the Dask executor preinstalled or not?

J.
