I agree with Jarek.

The provider release cycle is faster than Airflow core's. We can also release
an ad-hoc release of a specific provider. There is no cherry-picking, and on
top of that, with providers we can backport a fix to older versions of Airflow
(should we want to), which is something we don't do for core releases.

Wearing my Airflow user hat, I can say that it's not always so simple and easy
to update the Airflow version (this has nothing to do with Airflow itself but
more with company policies). Airflow core releases might contain migration
scripts, so one might need to coordinate with DBAs, DevOps and many other
departments. I know of some companies that update the Airflow core version
only once a quarter. This means that if a company hits a bug which we already
fixed in a follow-up minor release, they will continue to suffer from it until
the next update cycle. Updating a provider, however, is very easy, and rolling
back a provider update is also super simple compared to upgrading Airflow core.

To me it looks like moving executors to providers brings only benefits.

On Wed, Feb 1, 2023 at 11:11 PM Jarek Potiuk <[email protected]> wrote:

> - "evolve K8S executor faster": so far I don't see the provider being
> released at a significantly faster pace than the core. And putting
> such a key component away from the core seems issue-prone IMHO
>
> They are. If you look here
>
> https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/index.html
> - there were those "major" or "minor" releases of cncf.kubernetes:
> 5.1, 5.0, 4.4, 4.3, 4.2, 4.1, 3.1, 3.0, 2.2, 2.1, 2.0, 1.2, 1.1, 1.0
> - in total 14 releases that brought new features to the K8S provider (I am
> not counting bug-fix-only releases) since Airflow 2.0, where we had 2.5, 2.4,
> 2.3, 2.2, 2.1, 2.0 = 6 Airflow releases bringing new features (not
> counting bug-fix releases) - which means we are 2x faster at releasing
> new features in providers than at releasing Airflow.
> I am only counting "new features" - because that's what I consider
> evolution.
>
> But you are right, faster is not the only thing - it's also the
> flexibility that comes with it and consistency. The thing is - if we
> release executors in providers, it means that users will be able to
> take advantage of it even without upgrading Airflow. I've heard this
> concern from many of our users: "yes, we want to upgrade Airflow, but
> this is a big effort for us - it involves database migrations,
> upgrading dependencies and more". The "single" provider upgrade, on the
> other hand - even if it is 'cncf.kubernetes' - is far easier to apply -
> and far easier to roll back as well if things go wrong. The whole idea
> of providers is that you can - if you like - upgrade to a new version of
> the provider and downgrade it much more easily than upgrading the whole
> of Airflow.
>
> Another reason is consistency. If all the code that is needed to run
> the k8s executor is in one package - you can be far more certain it will
> work. Not everyone knows that the current cncf.kubernetes provider
> heavily depends on the "airflow.kubernetes" package. These are the imports
> that I copied from
> "airflow.providers.cncf.kubernetes.operators.kubernetes_pod":
>
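> # note: all three imports below resolve to Airflow *core* modules,
> # not to modules in the provider package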
> from airflow.kubernetes import pod_generator
> from airflow.kubernetes.pod_generator import PodGenerator
> from airflow.kubernetes.secret import Secret
>
> This means that if there is any change in PodGenerator or Secret - we
> have to somehow make sure that the existing, released providers will
> still work with it. Because if someone decides that - due to backwards
> compatibility - they want to use an older provider version, but they
> want to migrate to a newer airflow version - this dependency should
> still work. And we do not test it. We **hope** it will work, but there
> is no way to test a number of past versions of providers with new
> versions of airflow. In this case the cncf.kubernetes provider
> implicitly depends on  "airflow.kubernetes" keeping backwards
> compatibility. And the only reason the "airflow.kubernetes" code is
> directly under the "airflow" package is the fact that
> KubernetesExecutor is also in "airflow.executors" - so it is a part of
> Airflow core.
>
> So I think there are more reasons than "faster" (though faster
> remains one of them) to move the executors to providers:
>
> * faster to release new features in smaller increments (important for
> maintainers)
> * more consistent in behaviours without risking
> backwards-compatibility problems (important for both - users and
> maintainers)
> * more flexible in handling upgrades, either applying new versions
> or staying with the old implementation for longer (important for users)
>
> All of those are, I think, important - and everyone (IMHO) benefits.
>
> I'd also like to ask you, XD - what are you afraid of? What problem
> do you think it might cause?
>
> I believe the "feel natural" part is already addressed by keeping the
> documentation where it is and making the cncf.kubernetes and celery
> providers "preinstalled" (as discussed before) - I think the docs
> are fine to stay as they are, and keeping all the common executors as
> part of the "regular core" executors is fine. For those who install
> the Airflow image, or upgrade Airflow following constraints - there would
> be absolutely no change. Things will work as previously. The only
> difference for those users will be that the implementation of those
> classes will be in a different package. The users won't even notice
> the difference IMHO.
>
> This change will only matter to people who decide to downgrade
> the provider or upgrade it (for good reasons, where they
> deliberately decide it should be done).
>
> J.
>
>
>
> On Wed, Feb 1, 2023 at 4:56 AM Xiaodong Deng <[email protected]> wrote:
> >
> > I would suggest thinking twice about this idea, and I'm not fully
> convinced by it, at least for now.
> >
> > The main question for me would be: what's the advantage of doing it?
> > I understand you have already mentioned it in your very first email. But
> they don't seem to break even:
> > - "evolve K8S executor faster": so far I don't see the provider being
> released at a significantly faster pace than the core. And putting such a
> key component away from the core seems issue-prone IMHO
> > - "give celery provider some 'real' meaning": we don't have to make a
> big refactoring just in order to give a provider package some "real"
> meaning.
> >
> > These two executors are among the most adopted executors, so keeping them
> in the "core" seems sensible & natural to me. It also means we should be
> more careful with them. I would like to be educated with more
> advantages/benefits of making such a change before I can share a "+1"
> confidently here.
> >
> > I definitely have missed something. Please let me know if I missed some
> important pieces here. Thanks
> >
> >
> > Regards,
> > XD
> >
> > On Tue, Jan 31, 2023 at 12:43 AM Jarek Potiuk <[email protected]> wrote:
> >>
> >>  > 1. How does this impact the work of release managers? I see them
> >> already working pretty hard to release providers and Airflow core. I
> >> wonder if this makes their task even harder, especially as these
> >> require more rigorous testing as compared to providers.
> >>
> >> Good point. I think it is less of a problem for release managers and
> >> more for those who contributed related changes. Contrary to what you
> >> might know from elsewhere, release managers DO NOT test Airflow when
> >> releasing. Release managers are merely executing the mechanics of the
> >> release preparation and can make decisions in case someone finds issues
> >> that are potentially blocking:
> >> https://infra.apache.org/release-publishing.html#releasemanager
> >>
> >> > Most projects designate a committer to be the release manager who
> takes responsibility for the mechanics of a release. It is a good idea to
> let several committers take this role on different releases so that more
> than one person is comfortable doing a release. Release managers shepherd a
> release from an initial community consensus to getting the compiled code
> package to final distribution, and may be involved in publicizing the
> release to the project's community and the ASF in general.
> >>
> >> Our current process for providers already involves asking the
> >> community to test the release candidate, including the Github Issue
> >> automated pinging and asking those who contributed the change to test
> >> it (see https://github.com/apache/airflow/issues/29103). That part of
> >> the process does not change at all. If anything, this actually has the
> >> potential to be MUCH better tested and safer to use. The usual saying
> >> that I keep on repeating - if something is painful when it comes to
> >> releases, do it more often. If we move the K8S executor to the providers,
> >> it has two side effects - the changes to the k8s executor will come in
> >> smaller increments, and they will never be split between the
> >> cncf.kubernetes provider and airflow (which on its own is a recipe for
> >> much more stable behaviour) and - more importantly - users will be
> >> able to go BACK to a working k8s executor if upgrading
> >> cncf.kubernetes breaks it. Let this sink in for a while - if you (as
> >> deployment manager) find a problem in the k8s executor after upgrading
> >> the provider, you will be able to fix the problem by selectively
> >> downgrading the provider to the previous version, without having to
> >> downgrade airflow. You will even be able to use the executor that you
> >> used and loved in Airflow 2.6 (say) even if you upgrade to Airflow 2.7
> >> (provided of course that our BaseExecutor API is stable). But we can
> >> also add explicit version limitations in Airflow if we want to make
> >> backwards-incompatible changes - and we can do that by adding a
> >> min-requirement for the cncf.kubernetes provider. There is a lot of
> >> flexibility and control on both sides - maintainers can impose
> >> certain requirements, while users remain free to move within the
> >> limitations set by maintainers.
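> >>
> >> For illustration, a rough sketch of how such a min-requirement could be
> >> expressed on the Airflow side (the name of the list, the exact mechanism
> >> and the version numbers below are placeholders, not a decision):
> >>
> >> # sketch: providers shipped with / required by a given Airflow release,
> >> # each with the minimum version that release is known to work with
> >> PREINSTALLED_PROVIDER_REQUIREMENTS = [
> >>     "apache-airflow-providers-cncf-kubernetes>=6.0.0",  # hypothetical minimum
> >>     "apache-airflow-providers-celery>=3.2.0",  # hypothetical minimum
> >> ]
> >>
> >> A user could then still pick any provider version within that range (for
> >> example pin cncf.kubernetes back to the previous minor version) without
> >> touching the Airflow installation itself.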
> >>
> >> I'd say all those effects are only positive.
> >>
> >> > 2. As part of AIP-44, should we create an internal API for executors?
> I recall a discussion in Slack (related to ECS) where we aspire to remove
> all database access from providers.
> >>
> >> This is an excellent point that has some interesting consequences.
> >> Glad that you raised it.
> >>
> >> We do not need an Internal API for executors. In the case of AIP-44 it
> >> does not matter where the code comes from, it matters in which context
> >> the code is executed (i.e. scheduler/task/triggerer etc.).
> >> Executors are always executed in the scheduler context and that means
> >> that they should use direct DB access - same as all other
> >> scheduler-run code.
> >>
> >> But that indeed raises another question for the future "multi-tenant"
> >> setup. Will tenants be able to upgrade their cncf.kubernetes provider
> >> and impact the code run by the scheduler? That would be a security
> >> problem if the answer were yes. But this is not how it will work in a
> >> multi-tenant setup.
> >>
> >> By definition, those who run the multi-tenant future Airflow should not
> >> allow tenants to install anything in the scheduler. It's already a given
> >> that the scheduler has a different set of dependencies (so possibly a
> >> different set of providers) installed than the "workers" of the tenant
> >> (and managed by different people - Airflow admins, not tenant admins).
> >> Each tenant should have (however the deployment is managed) their own
> >> set of dependencies for "workers" + "dag file processors", but the
> >> scheduler should have its own set, managed independently. Such a
> >> scheduler should even have no "regular" providers installed - just
> >> plugins that extend its capabilities with timetables and the like. So
> >> we are already in a situation where the scheduler has a different set of
> >> dependencies than the processors and workers.
> >>
> >> And it is perfectly fine to have the scheduler installed with a different
> >> kubernetes provider - acting as a plugin extending scheduler
> >> capabilities (managed by the central deployment team) - while each tenant
> >> might have the same or a different provider, as long as it is
> >> compatible with the version of Airflow used. The scheduler will only use
> >> the "executor" part of the provider, while workers and dag file
> >> processors will only use the KPO part of their provider. No overlap or
> >> dependency there.
> >>
> >> I think, however (and this is something that we might consider),
> >> executors could only be used if exposed by providers via the plugin
> >> mechanism/entrypoint, similarly to Timetables. And for security, the
> >> scheduler might only accept executors that are exposed via the plugin
> >> mechanism. This would be slightly backwards-incompatible and is not
> >> strictly necessary (because someone would have to have access to the
> >> configuration to use an executor which is not coming via a plugin) - but
> >> we might consider it to enhance security a bit and add another layer
> >> of it. We already have a precedent where the hive provider exposes both
> >> a provider entrypoint and a plugin entrypoint, and we could do the same
> >> for cncf.kubernetes and celery. Worth considering for sure, not
> >> strictly necessary.
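> >>
> >> For illustration, a very rough sketch of what exposing an executor via the
> >> plugin mechanism could look like. This is purely hypothetical: whether
> >> AirflowPlugin would get an "executors" field the way it has "timetables"
> >> today is exactly the open question here, and the import path below assumes
> >> the executor has already moved into the provider:
> >>
> >> from airflow.plugins_manager import AirflowPlugin
> >> # assumed post-move location of the executor inside the provider package
> >> from airflow.providers.cncf.kubernetes.executors.kubernetes_executor import (
> >>     KubernetesExecutor,
> >> )
> >>
> >> class CncfKubernetesExecutorsPlugin(AirflowPlugin):
> >>     name = "cncf_kubernetes_executors"
> >>     # hypothetical field, analogous to the existing `timetables = [...]`
> >>     executors = [KubernetesExecutor]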
> >>
> >> >
> >> > Best,
> >> > Shubham
> >> >
> >> >
> >> >
> >> > On 2023-01-29, 1:21 AM, "Jarek Potiuk" <[email protected]> wrote:
> >> >
> >> >     Hello Everyone,
> >> >
> >> >     As a follow-up to AIP-51 - when it is completed (with a few more
> >> >     quirks like the one described by Andrey in the "Rendered Task
> >> >     Instance Fields" discussion) - it should now be entirely possible to
> >> >     move the Kubernetes Executor, Celery Executor and related "joined"
> >> >     executors to their respective providers.
> >> >
> >> >     There is of course the question of where to move the
> >> >     CeleryKubernetesExecutor, but we have prior art with Transfer
> >> >     Operators and we might choose to add it to "cncf.kubernetes" and
> >> >     make the celery provider an optional dependency of the K8S provider.
> >> >
> >> >     This has multiple advantages, and one of the biggest ones I can
> >> >     think of is that we could evolve the K8S executor faster than
> >> >     Airflow itself and we would avoid quite a lot of the complexity
> >> >     involved in parallel modifications of K8S code in the provider and
> >> >     the executor (it happened in the past that we had to add a very high
> >> >     min-airflow version for the "cncf.kubernetes" provider in order to
> >> >     make this work). By making sure the K8S executor is used as "yet
> >> >     another executor" - compliant with the AIP-51 interface - we could
> >> >     likely avoid similar problems and evolve it much faster.
> >> >
> >> >     That would also give the celery provider some "real" meaning - so
> >> >     far the celery provider has merely exposed the celery queue sensor,
> >> >     but once we move the CeleryExecutor to the provider, the provider
> >> >     would finally be a useful one.
> >> >
> >> >     Doing this requires a few small things:
> >> >     * we likely need to add a dynamic import in the old "executors"
> >> >     package following PEP 562 (the way we've done it for operators) and
> >> >     a deprecation notice in case someone uses them from there - a rough
> >> >     sketch of such a shim follows below the list.
> >> >     * we likely need to add the "cncf.kubernetes" and "celery" packages
> >> >     as pre-installed providers - alongside http, ftp, common.sql,
> >> >     sqlite, imap.
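> >> >
> >> >     For illustration, a rough sketch of what such a PEP 562 shim could
> >> >     look like (the target module paths inside the providers and the
> >> >     exact deprecation wording are just assumptions at this point):
> >> >
> >> >     # airflow/executors/__init__.py - sketch of a PEP 562 module-level __getattr__
> >> >     import importlib
> >> >     import warnings
> >> >
> >> >     # old attribute name -> assumed new provider module path (assumptions)
> >> >     _MOVED = {
> >> >         "KubernetesExecutor":
> >> >             "airflow.providers.cncf.kubernetes.executors.kubernetes_executor",
> >> >         "CeleryExecutor":
> >> >             "airflow.providers.celery.executors.celery_executor",
> >> >     }
> >> >
> >> >     def __getattr__(name):
> >> >         if name in _MOVED:
> >> >             warnings.warn(
> >> >                 f"Importing {name} from airflow.executors is deprecated; "
> >> >                 "import it from the provider package instead.",
> >> >                 DeprecationWarning,
> >> >                 stacklevel=2,
> >> >             )
> >> >             return getattr(importlib.import_module(_MOVED[name]), name)
> >> >         raise AttributeError(f"module {__name__!r} has no attribute {name!r}")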
> >> >
> >> >     I **think**, after AIP-51 gets implemented, this would be a fully
> >> >     backwards-compatible change - it's just a matter of proper
> management
> >> >     of the dependencies. We could add minimum cncf.kubernetes/celery
> >> >     versions in Airflow 2.6, so that we are sure 2.6+ Airflow uses only
> >> >     the newer providers and kubernetes/celery code, and this would be
> >> >     fully transparent for the users, I believe.
> >> >
> >> >     J.
> >> >
>
