Very good comments, Ash! Food for thought indeed. LocalExecutor for multi-tenancy is indeed a no-go (I thought about it too :) ). I agree there are different cases, and I totally agree that Celery will stay with us for a looong time (maybe forever).
Maybe "phasing out" is too strong a statement. (I deliberately did not use "deprecated", because it was really not my intention to "remove it".) I was thinking more of changing the "thinking" we have in Airflow. Currently the thinking is (at least in my head): "if you want an auto-scaling solution with support for long and short tasks, by default go to CeleryKubernetesExecutor". However, I think we **might** have another target in the future: "if you want an auto-scaling solution with support for long and short tasks, by default go to Local<N>Executor, or, even if you care about multi-tenancy, <N>Executor **might** be enough" (where <N> is "Kubernetes" today but might be "Fargate/CloudRun/ContainerInstances" etc.).

J.

On Thu, Nov 25, 2021 at 12:51 PM Ash Berlin-Taylor wrote:
>
> Hi Jarek,
>
> The triggerer already supports multiple instances. Deferrable tasks still need a normal task slot on a worker to start off, and then they defer to a trigger, right now as well.
>
> While I have no love for Celery (or, more accurately, for how we misuse it in Airflow), and I agree that we aren't using many of its capabilities, deprecating/removing the Celery executor doesn't feel right to me. Yet. And not for a long while either.
>
> First there is the multi-tenancy issue (discussion happening tomorrow, of course). If the scheduler is multi-tenant, then I wouldn't feel safe running _any_ user/DAG code on the scheduler node at all, so for that to be possible we wouldn't be able to use LocalExecutor at all! For instance, all SLA misses and DAG-level callbacks would need to go via an executor to run on a worker.
>
> Then there is my goal for Airflow: I want us to be better at running many smaller tasks (which largely rules out Kubernetes due to pod start-up time). While LocalExecutor would work with that model, I think a multi-node deployment that doesn't involve running multiple schedulers should be possible. Being able to scale worker slots (for actual data processing in Airflow, not just kicking off external jobs) independently of scheduling throughput is desirable to me. After all, running a scheduler is not free in terms of load on the database.
>
> Essentially, by running multiple schedulers with LocalExecutor, I worry that we have built a poor imitation of a distributed job queue (i.e. Celery) without all the years of experience Celery has in making it robust. Also, let's not forget that building any kind of distributed queue is a Difficult Problem, and there always have to be tradeoffs.
>
> -ash
>
> On Thu, Nov 25 2021 at 11:40:10 +0100, Jarek Potiuk wrote:
>
> > Hello Everyone,
> >
> > I recently had some discussions and thought about some new features that are already implemented, as well as planned and in-progress work, and I had a thought that may be worth discussing here. It's very likely that many of the people involved had similar discussions and thoughts, but maybe it's worth spelling it out now and having a common "direction" we are heading in for the future of Airflow when it comes to executors.
> >
> > TL;DR; I think the recent changes, and possibly some future improvements and optimisations, can lead us to a situation where we will not need the Celery Executor (nor CeleryKubernetes) and can eventually phase it out. That would leave only the Local and Kubernetes executors, plus the soon-coming LocalKubernetes one. We might still "support" CeleryExecutor for backwards compatibility and for people who do not want to run Kubernetes, but in a way the main reasons why Celery would be preferred over Kubernetes should be gone soon IMHO.
> >
> > Why do I think so?
> >
> > I think so because I believe the main problems that made CeleryExecutor necessary in the first place are largely gone. The main reason why the Celery executor was better than the Kubernetes one was that you could run more short tasks with far less overhead and latency. However, we now have ways of significantly decreasing the need to run small tasks via "remote" executors. Some are already implemented, and others would be easy to optimise.
> >
> > The following things have already happened:
> >
> > 1) We have Deferrable Operators support. Most of the code there (mostly small tasks, or the parts of operators that wait for something) is already executed in the triggerer.
> > 2) We have an HA scheduler, so you can run multiple schedulers with LocalExecutor. This gives you scalability with LocalExecutor for small tasks.
> > 3) We had some optimisations for DummyOperator, where triggering is done in the scheduler.
> >
> > What still can be done (or is already being done):
> >
> > * While the triggerer does not (I believe) support multiple instances for now, it has been designed from the ground up to support HA/scalability.
> > * We can rewrite a lot of the operators we have to be Deferrable, especially those that reach out to external services.
> > * We can make more "built-in" operators that have some declarative behaviour rather than an imperative "execute", and have them evaluated directly in the scheduler. We had a discussion about it in https://github.com/apache/airflow/pull/19361. It looks like it should be possible to implement, for example, a "DayOfWeek" operator that would be evaluated in the scheduler, so that triggering decisions could be made there. We could probably add quite a number of such "optimized" operators that are declarative and evaluated in the scheduler with virtually zero overhead.
> > * With the LocalKubernetes executor coming (https://github.com/apache/airflow/pull/19729), combined with the HA/scalability of the scheduler (and thus the scalability of LocalExecutors), it seems that any reasonable installation will have enough scalability and capacity to execute all the remaining "small tasks" locally in LocalExecutors. Eventually, we could even try to find a good pattern for figuring out which tasks are "small" and automatically use LocalExecutor for them.
> >
> > It seems to me that, with those upcoming changes, LocalKubernetes should be the default executor in the future rather than Celery (which is now kind of the de facto "default"). We could likely even think about adding more options of a similar kind for GCP/AWS/Azure, using the native capabilities of those platforms rather than generic "Kubernetes" for remote execution. I can imagine using Fargate (the AWS team could contribute it), Cloud Run (the Google team) and Azure Container Instances (maybe Microsoft will finally also embrace Airflow :) ). That would make the Airflow architecture more "Multiple Cloud Native".
> >
> > Why do I think the Celery Executor should be "gone" (possibly not immediately, but possibly with lower priority)? The problem with Celery is that even with KEDA autoscaling, the Celery Executor has big problems with scaling in. I have also had discussions about this recently, with the AWS team among others. Celery is complex, and we are using maybe 5% of its capabilities. That said, I recently had a discussion at PyWaw, where I gave a talk about Airflow dependencies. The people there use Celery heavily in their product and rely on many more of its capabilities, and they are rather unhappy with the problems they have to deal with and with the stability of Celery's more complex features.
> >
> > I'd love to hear what others think on the subject.
> >
> > It would be great to agree on a common "direction" and a "vision" for Airflow's future when it comes to executors. I have a feeling that we are at a pivotal point where we can all consciously change how we think about Airflow executors and prioritise things differently.
> >
> > J.
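This thread describes a deferral model. A task starts in a normal worker slot and then hands its waiting off to the triggerer. The asyncio sketch below shows why that is cheap. This is not Airflow's API: `fake_trigger` and `run_triggerer` are hypothetical names, and the external waits are simulated with `asyncio.sleep`. It only shows that one event loop can hold many waiting tasks without each one occupying a worker slot.

```python
import asyncio
import time


async def fake_trigger(task_id: str, wait_seconds: float) -> str:
    """Hypothetical stand-in for a trigger: wait asynchronously, then report an event."""
    # A real trigger would poll an external system; here we only sleep.
    await asyncio.sleep(wait_seconds)
    return f"{task_id}:done"


async def run_triggerer(waits: dict[str, float]) -> list[str]:
    """Run all deferred waits concurrently in a single event loop (one process)."""
    return await asyncio.gather(
        *(fake_trigger(task_id, seconds) for task_id, seconds in waits.items())
    )


if __name__ == "__main__":
    # 100 deferred tasks, each "waiting" 0.1 s on some external condition.
    waits = {f"task_{i}": 0.1 for i in range(100)}
    start = time.perf_counter()
    events = asyncio.run(run_triggerer(waits))
    elapsed = time.perf_counter() - start
    # The waits overlap, so wall time stays near 0.1 s instead of ~10 s sequentially.
    print(len(events), f"{elapsed:.2f}s")
```

This models only the waiting phase. The initial worker-slot start that Ash points out is deliberately left out.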
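The scheduler-side "DayOfWeek" operator idea from the PR #19361 discussion could be thought of as a pure, declarative condition. The scheduler would evaluate it against a run's logical date, with no worker involved. The sketch below is hypothetical. `DayOfWeekCondition`, `Weekday` and the `follow_task_ids_if_*` fields are illustrative names chosen for this example, not existing Airflow classes or parameters.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import IntEnum


class Weekday(IntEnum):
    # Matches datetime.weekday(): Monday == 0 ... Sunday == 6.
    MONDAY = 0
    TUESDAY = 1
    WEDNESDAY = 2
    THURSDAY = 3
    FRIDAY = 4
    SATURDAY = 5
    SUNDAY = 6


@dataclass(frozen=True)
class DayOfWeekCondition:
    """Hypothetical declarative condition: no execute(), only data the scheduler can evaluate."""

    week_days: frozenset[Weekday]
    follow_task_ids_if_true: tuple[str, ...]
    follow_task_ids_if_false: tuple[str, ...]

    def evaluate(self, logical_date: datetime) -> tuple[str, ...]:
        """Return the downstream task ids to schedule; pure and side-effect free."""
        if Weekday(logical_date.weekday()) in self.week_days:
            return self.follow_task_ids_if_true
        return self.follow_task_ids_if_false


if __name__ == "__main__":
    weekend_only = DayOfWeekCondition(
        week_days=frozenset({Weekday.SATURDAY, Weekday.SUNDAY}),
        follow_task_ids_if_true=("weekend_report",),
        follow_task_ids_if_false=("weekday_report",),
    )
    # 2021-11-27 was a Saturday.
    print(weekend_only.evaluate(datetime(2021, 11, 27)))
```

The condition is plain data plus a pure function. A scheduler could in principle evaluate it inline while making triggering decisions, much like the DummyOperator short-circuit, without using an executor slot.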
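The LocalKubernetes idea, together with the "figure out which tasks are small" follow-up, comes down to a routing decision per task. CeleryKubernetesExecutor already routes by queue name: tasks on the configured Kubernetes queue go to KubernetesExecutor, and the rest go to Celery. The sketch below applies that idea to a local/Kubernetes split and adds a duration heuristic. Everything here is an assumption for illustration and does not reflect what PR 19729 implements: `TaskInfo`, `choose_executor`, the `"kubernetes"` queue constant and the 30-second threshold.

```python
from dataclasses import dataclass, field
from statistics import median

# Hypothetical settings, for illustration only.
KUBERNETES_QUEUE = "kubernetes"
SMALL_TASK_THRESHOLD_S = 30.0


@dataclass
class TaskInfo:
    """Hypothetical view of a task as the scheduler might see it."""

    task_id: str
    queue: str = "default"
    past_durations_s: list[float] = field(default_factory=list)


def choose_executor(task: TaskInfo) -> str:
    """Return "local" or "kubernetes" for a task.

    1. An explicit Kubernetes queue always wins (queue-based routing).
    2. Otherwise, tasks whose median past runtime is short run locally.
    3. Tasks with no history, or long-running ones, go to Kubernetes.
    """
    if task.queue == KUBERNETES_QUEUE:
        return "kubernetes"
    if task.past_durations_s and median(task.past_durations_s) <= SMALL_TASK_THRESHOLD_S:
        return "local"
    return "kubernetes"


if __name__ == "__main__":
    tasks = [
        TaskInfo("spark_job", queue="kubernetes"),
        TaskInfo("check_file", past_durations_s=[2.0, 3.5, 1.8]),
        TaskInfo("train_model", past_durations_s=[1800.0, 2100.0]),
        TaskInfo("new_task"),
    ]
    for t in tasks:
        print(t.task_id, choose_executor(t))
```

Sending tasks with no history to Kubernetes is one conservative design choice. An installation that cares more about latency than isolation might default them to local instead.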
