In a perfect world, I think we'd only have the completely vendor-neutral executors pre-installed (Local, Sequential, Debug), and anything else would need to be installed explicitly by admins/users. If we were starting from scratch, this would make the most sense. Clearly, though, the Kubernetes and Celery executors are so ubiquitous that not pre-installing them would cause too much wreckage. Dask is a different case, and I'd like to push for it to _not_ be installed by default. If that causes too much wreckage, then perhaps we should go through a deprecation first (though I'm not sure exactly what that would look like in this context). It's difficult to measure how many folks are using the Dask executor, though. Perhaps we have data from the yearly questionnaire/survey we send out?
________________________________
From: Jarek Potiuk <ja...@potiuk.com>
Sent: Wednesday, July 12, 2023 8:05:54 AM
To: dev@airflow.apache.org
Subject: [DISCUSS] Moving Dask Executor to a separate (optional?) dask provider

Hello Everyone,

A small follow-up to the move of the K8S/Celery executors: https://lists.apache.org/thread/7gyw7ty9vm0pokjxq7y3b1zw6mrlxfm8

We are in the process of moving the Celery and Kubernetes executors. Celery is almost complete, and I am working on K8S next, plus moving some common discovery and configuration code. But there is one more "questionable" executor still living in Airflow core: the Dask executor.

For Celery/Kubernetes, we decided to make the two providers pre-installed because that makes the most sense, partly because of their vendor neutrality. We are also going to keep their basic documentation in the "core" Airflow documentation so that it is easier to discover and prominently visible.

When it comes to Dask, however, I am not sure about its status or whether we should make it pre-installed. I guess there is no doubt that it should move to a provider, since that brings the same benefits as the Celery/K8S move. But I am not sure whether it should be pre-installed with Airflow. I do not know how frequently the Dask executor (and Dask itself) is used by Airflow users, but I personally do not think it should be as "closely" connected with Airflow as the Celery/Kubernetes ones.

If we do not make it pre-installed, it is a somewhat breaking change (though not very breaking, really).
We might still choose to install the Dask provider in the PROD reference image, so it will continue to work if you use the image. When installing Airflow in a venv, you will only have to run `pip install apache-airflow[dask]` or manually install `apache-airflow-providers-daskexecutor` (for now at least, this is the name I was able to reserve on PyPI). So this is not really breaking; it just requires installing one more dependency. But some Airflow installation pipelines might break because the provider won't be pre-installed, so this is a borderline breaking change.

WDYT? Should we make the Dask executor pre-installed or not?

J.