Re: [DISCUSS] Shaping the future of executors for Airflow (slowly phasing out Celery ?)

Jarek Potiuk Thu, 25 Nov 2021 04:23:07 -0800

Very good comments, Ash!
Food for thought indeed - LocalExecutor for multi-tenant is indeed a
no-go (thought about it too :). I agree there are different cases and
I totally agree that Celery will stay there for a looong time (maybe
forever).


Maybe "phasing out" is too strong a statement (I deliberately
did not use "deprecated" because it was really not my intention to
"remove it").

I thought more of changing the "thinking" we have in Airflow.

Currently the thinking is (at least in my head):

"if you want an auto-scaling solution with support for long and short
tasks - by default go to CeleryKubernetesExecutor"

However I think that we **might** have another target in the future:

"if you want an auto-scaling solution with support for long and short
tasks - by default go to Local<N>Executor, or even, if you care about
multi-tenancy, <N>Executor **might** be enough" (where N is
"Kubernetes" today but might be "Fargate/CloudRun/ContainerInstances"
etc.).
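
To make the Local<N>Executor idea a bit more concrete, here is a rough
sketch of the routing it implies, assuming a task picks the remote side
through a queue name. Everything in it (Task, RecordingExecutor,
LocalPlusRemoteExecutor, the "remote" queue name) is hypothetical and made
up for illustration; it is not Airflow code and it does not use any Airflow
API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Hypothetical task description; not an Airflow object."""
    task_id: str
    queue: str = "default"  # assumed routing key


@dataclass
class RecordingExecutor:
    """Stand-in for a real executor: it only records the task ids it receives."""
    name: str
    queued: List[str] = field(default_factory=list)

    def queue_task(self, task: Task) -> None:
        self.queued.append(task.task_id)


class LocalPlusRemoteExecutor:
    """Toy Local<N>Executor: tasks on the remote queue go to <N>, the rest run locally."""

    def __init__(self, local: RecordingExecutor, remote: RecordingExecutor,
                 remote_queue: str = "remote") -> None:
        self.local = local
        self.remote = remote
        self.remote_queue = remote_queue

    def queue_task(self, task: Task) -> str:
        # Route on the queue name only; a real implementation would also need
        # slots, heartbeats, state sync and so on.
        target = self.remote if task.queue == self.remote_queue else self.local
        target.queue_task(task)
        return target.name


if __name__ == "__main__":
    executor = LocalPlusRemoteExecutor(
        local=RecordingExecutor("local"),
        remote=RecordingExecutor("kubernetes"),
    )
    for task in (Task("short_check"), Task("long_training", queue="remote")):
        print(task.task_id, "->", executor.queue_task(task))
```

Swapping the remote side for "Fargate/CloudRun/ContainerInstances" would
only change which object is passed as `remote`, which is the point of the
<N> placeholder.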

J.



On Thu, Nov 25, 2021 at 12:51 PM Ash Berlin-Taylor <[email protected]> wrote:
>
> Hi Jarek,
>
> The triggerer does support multiple instances already.
> Deferrable tasks still need a normal task slot on a worker to start off and 
> then defer to a trigger right now as well.
>
> While I have no love for Celery (or how we mis-use it in Airflow more 
> accurately), and I agree that we aren't using many of its capabilities,
> deprecating/removing the Celery executor doesn't feel right to me. Yet. And 
> not for a long while either.
>
> First there is the multi-tenancy issue (discussion happening tomorrow of 
> course) - and if the scheduler is multi-tenant then I wouldn't feel safe 
> running _any_ user/DAG code on the scheduler node at all, so for that to be 
> possible we wouldn't be able to use Local Executor at all! For instance, all
> SLA misses and DAG-level callbacks would need to go via an executor to run
> on a worker.
>
> Then there is my goal for Airflow: I want us to be better at running many 
> smaller tasks (which largely rules out Kubernetes due to pod start up time), 
> and while LocalExecutor would work with that model, I think a multi-node 
> deployment that doesn't involve running multiple schedulers should be 
> possible -- being able to scale worker slots (for actual data processing in 
> Airflow, not just kicking off external jobs) independently of scheduling
> throughput is desirable to me. After all, running a scheduler is not free in
> terms of load on the database.
>
> Essentially by running multiple schedulers with LocalExecutor I worry that we 
> have built a poor imitation of a distributed job queue (i.e. Celery) without
> all the years of experience that Celery has of making it robust. Also let's
> not forget that building any kind of distributed queue is a Difficult Problem 
> and there always have to be tradeoffs.
>
> -ash
>
>
> On Thu, Nov 25 2021 at 11:40:10 +0100, Jarek Potiuk <[email protected]> wrote:
>
> Hello Everyone,
>
> I recently had some discussions and thought about some new features that
> are already implemented, planned, or in progress, and I had a thought
> that may be worth discussing here. It's very likely that many of the
> people involved have had similar discussions and thoughts, but maybe it's
> worth spelling it out now and having a common "direction" we are heading
> in for the future of Airflow when it comes to executors.
>
> TL;DR: I think the recent changes and possibly some future improvements
> and optimisations can lead us to a situation where we will not need
> Celery Executor (nor CeleryKubernetes) and can phase it out eventually -
> leaving only Local, Kubernetes and the soon-coming LocalKubernetes one.
> We might still "support" CeleryExecutor for backwards compatibility and
> for people who do not want to run Kubernetes, but in a way the main
> reasons why Celery would be preferred over Kubernetes should be gone
> soon IMHO.
>
> Why do I think so?
>
> I think so because I believe the main problems that made us need
> CeleryExecutor in the first place are largely gone. The main reason why
> the Celery executor was better than the Kubernetes one was that you could
> run more short tasks with far less overhead and latency. However, we now
> have ways, either already implemented or easy to optimise, of
> significantly decreasing the need to run small tasks via "remote"
> executors.
>
> The following things already happened:
>
> 1) We have Deferrable Operators support. For those, most of the code -
>    mostly small tasks, or the parts of operators that wait for
>    something - is already executed in the triggerer.
> 2) We have an HA scheduler where you can run multiple schedulers with
>    Local Executor - thus you can get scalability in LocalExecutor for
>    small tasks.
> 3) We had some optimisations in DummyOperator where triggering is done
>    in the Scheduler.
>
> What can still be done (or is already being done):
>
> * While the triggerer does not (I believe) support multiple instances
>   for now, it has been designed from the ground up to support
>   HA/scalability.
> * We can rewrite a lot of the operators we have to be Deferrable -
>   especially those that reach out to external services.
> * We can make more "built-in" operators that have some declarative
>   behaviour rather than an imperative "execute" and have them evaluated
>   directly in the Scheduler. We had a discussion about it in
>   https://github.com/apache/airflow/pull/19361 - but it looks like it
>   should be possible to implement - for example - a "DayOfWeek" operator
>   that would be evaluated in the Scheduler, and triggering decisions
>   could be made there. We could probably add quite a number of such
>   "optimized" operators that could be declarative and evaluated in the
>   scheduler with virtually 0 overhead.
> * With the LocalKubernetes executor coming
>   (https://github.com/apache/airflow/pull/19729), combined with
>   HA/scalability of the scheduler (thus scalability of Local Executors),
>   it seems that any reasonable installation will have enough scalability
>   and capacity to locally execute all the remaining "small tasks" in
>   Local Executors. We could even try to figure out some good pattern for
>   working out which tasks are "small" and automatically using
>   LocalExecutor for them - eventually.
>
> It seems to me that with those upcoming changes, LocalKubernetes should
> be the default executor in the future rather than Celery (which is now
> kind of the de facto "default"). We could even likely think about adding
> more options of a similar kind for GCP/AWS/Azure - using native
> capabilities of those platforms rather than using generic "Kubernetes"
> as remote execution. I can imagine using Fargate (the AWS team could
> contribute it), Cloud Run (Google team), Azure Container Instances
> (maybe Microsoft will finally also embrace Airflow :) ). That would make
> the Airflow architecture more "Multiple Cloud Native".
>
> Why do I think Celery Executor should be "gone" (possibly not
> immediately but possibly with less priority)?
>
> The problem with Celery is that even with KEDA autoscaling, Celery
> Executor has big problems with scaling-in (I also had discussions about
> it recently - with the AWS team among others). Celery is complex and we
> are using maybe 5% of its capabilities (however, I had a recent
> discussion (at PyWaw, where I gave a talk about Airflow dependencies)
> with people who are heavily using Celery with their product and utilise
> a lot more of those capabilities, and they are rather unhappy with the
> problems they have to deal with and the stability of the more complex
> features of Celery).
>
> I'd love to hear what others think on the subject. It would be great to
> have an agreed common "direction" we are heading in and a "vision" of
> Airflow in the future when it comes to Executors, and I have a feeling
> that we are just about at a pivotal point where we can all consciously
> change our paradigm of thinking about Airflow executors and prioritise
> things differently.
>
> J.
