Does anyone else have any more comments? I am inclined to start a vote on this one, as it is a bit more than just a PR.
On Thu, Feb 2, 2023 at 9:12 AM Elad Kalif <[email protected]> wrote:
>
> I agree with Jarek.
>
> The provider release cycle is faster than Airflow core. We can also release an ad-hoc release of a specific provider. There is no cherry-picking, and on top of that, with providers we can backport a fix to older versions of Airflow (should we want to), which is something we don't do for core releases.
>
> Wearing my Airflow user hat, I can say that it's not always so simple and easy to update the Airflow version (it has nothing to do with Airflow itself but more with company policies). Airflow core releases might contain migration scripts, so one might need to coordinate with DBAs, devops and many other departments. I know of some companies that update the Airflow core version only once a quarter. This means that if a company hits a bug which we have already fixed in a follow-up minor release, they will continue to suffer from it until the next update cycle. However, updating a provider is very easy, and rolling back a provider update is also super simple compared to Airflow core.
>
> To me it looks like moving executors to providers brings only benefits.
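For illustration, the provider upgrade and rollback that Elad describes is a single pip operation per environment; the version numbers below are only examples:

    # try a newer release of the provider that ships the executor
    pip install "apache-airflow-providers-cncf-kubernetes==5.1.1"

    # if the new release misbehaves, roll back to the previously working one
    # without touching the Airflow core installation
    pip install "apache-airflow-providers-cncf-kubernetes==5.0.0"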
> On Wed, Feb 1, 2023 at 11:11 PM Jarek Potiuk <[email protected]> wrote:
>>
>> - "evolve K8S executor faster": so far I don't see the provider being released at a significantly faster pace than the core. And putting such a key component away from the core seems issue-prone IMHO
>>
>> They are. If you look here https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/index.html - there have been these "major" or "minor" releases of cncf.kubernetes: 5.1, 5.0, 4.4, 4.3, 4.2, 4.1, 3.1, 3.0, 2.2, 2.1, 2.0, 1.2, 1.1, 1.0 - in total 14 releases that brought new features to the K8S provider (I am not counting bug-fix-only releases). Since Airflow 2.0 we had 2.5, 2.4, 2.3, 2.2, 2.1, 2.0 = 6 Airflow releases bringing new features (not counting bug-fix releases) - which means we are 2x faster at releasing new features in providers than in Airflow. I am only counting "new features" - because that's what I consider evolution.
>>
>> But you are right, faster is not the only thing - it's also the flexibility that comes with it, and consistency. The thing is - if we release executors in providers, it means that users will be able to take advantage of them even without upgrading Airflow. I've heard concerns from many of our users: "yes, we want to upgrade Airflow, but this is a big effort for us - it involves database migrations, upgrading dependencies and more". The "single" provider upgrade, on the other hand - even if it is cncf.kubernetes - is far easier to apply, and far easier to roll back as well if things go wrong. The whole idea of providers is that you can - if you like - upgrade to a new version of the provider and downgrade it much more easily than upgrading the whole of Airflow.
>>
>> Another reason is consistency. If all the code that is needed to run the K8S executor is in one package, you can be far more certain it will work. Not everyone knows that the current cncf.kubernetes provider heavily depends on the "airflow.kubernetes" package. These are the imports that I copied from "airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py":
>>
>> from airflow.kubernetes import pod_generator
>> from airflow.kubernetes.pod_generator import PodGenerator
>> from airflow.kubernetes.secret import Secret
>>
>> This means that if there is any change in PodGenerator or Secret, we have to somehow make sure that the existing, released providers will still work with it. Because if someone decides that - due to backwards compatibility - they want to use an older provider version, but they want to migrate to a newer Airflow version, this dependency should still work. And we do not test it. We **hope** it will work, but there is no way to test a number of past versions of providers with new versions of Airflow. In this case the cncf.kubernetes provider implicitly depends on "airflow.kubernetes" keeping backwards compatibility. And the only reason the "airflow.kubernetes" code is directly under the "airflow" package is the fact that KubernetesExecutor is also in "airflow.executors" - so it is a part of Airflow core.
>>
>> So I think there are more reasons than "faster" (though faster remains) to move the executors to providers:
>>
>> * faster to release new features in smaller increments (important for maintainers)
>> * more consistent in behaviour, without risking backwards-compatibility problems (important for both users and maintainers)
>> * more flexible in handling upgrades - either applying new versions or staying with the old implementation for longer (important for users)
>>
>> All those are, I think, important - and everyone (IMHO) benefits.
>>
>> I'd also like to ask you, XD - what are you afraid of? What problem do you think it might cause?
>>
>> I believe the "feel natural" part is already addressed by keeping the documentation where it is and making the cncf.kubernetes and celery providers "preinstalled" (as discussed before) - I think the docs are fine to stay as they are, and keeping all the common executors as part of the "regular core" executors is fine. For those who install the Airflow image, or upgrade Airflow following constraints, there would be absolutely no change. Things will work as previously. The only difference for those users will be that the implementation of those classes will be in a different package. The users won't even notice the difference IMHO.
>>
>> This change will only matter for people who decide to downgrade the provider or upgrade it (for some good reason where they deliberately decide it should be done).
>>
>> J.
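As a minimal sketch of the consistency point above: today the provider reaches back into Airflow core, whereas after the move the same classes would ship inside the provider itself. The in-provider module path below is hypothetical and only illustrates the idea:

    # current situation: the provider imports helpers from Airflow core,
    # so it silently relies on "airflow.kubernetes" staying backwards compatible
    from airflow.kubernetes.pod_generator import PodGenerator

    # after the move (hypothetical path): provider code and executor code are
    # released, versioned and tested together in one package
    from airflow.providers.cncf.kubernetes.pod_generator import PodGenerator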
>> On Wed, Feb 1, 2023 at 4:56 AM Xiaodong Deng <[email protected]> wrote:
>> >
>> > I would suggest double-thinking about this idea, and I'm not fully convinced by it, at least for now.
>> >
>> > The main question for me would be: what's the advantage of doing it? I understand you have already mentioned it in your very first email, but the advantages don't seem to break even:
>> > - "evolve K8S executor faster": so far I don't see the provider being released at a significantly faster pace than the core. And putting such a key component away from the core seems issue-prone IMHO
>> > - "give celery provider some 'real' meaning": we don't have to make a big refactoring just in order to give a provider package some "real" meaning.
>> >
>> > These two executors are among the most adopted executors; keeping them in the "core" seems sensible and natural to me. It also means we should be more careful with them. I would like to be educated with more advantages/benefits of making such a change before I can share a "+1" confidently here.
>> >
>> > I may have missed something. Please let me know if I missed some important pieces here. Thanks
>> >
>> > Regards,
>> > XD
>> >
>> > On Tue, Jan 31, 2023 at 12:43 AM Jarek Potiuk <[email protected]> wrote:
>> >>
>> >> > 1. How does this impact the work of release managers? I see them already working pretty hard to release providers and Airflow core. I wonder if this makes their task even harder, especially as these require more rigorous testing as compared to providers.
>> >>
>> >> Good point. I think it is less of a problem for release managers and more for those who contributed related changes. Contrary to what you might know from elsewhere, release managers DO NOT test Airflow when releasing. Release managers merely execute the mechanics of the release preparation and can make decisions in case someone finds issues that are potentially blocking: https://infra.apache.org/release-publishing.html#releasemanager
>> >>
>> >> > Most projects designate a committer to be the release manager who takes responsibility for the mechanics of a release. It is a good idea to let several committers take this role on different releases so that more than one person is comfortable doing a release. Release managers shepherd a release from an initial community consensus to getting the compiled code package to final distribution, and may be involved in publicizing the release to the project's community and the ASF in general.
>> >>
>> >> Our current process for providers already involves asking the community to test the release candidate, including the automated GitHub issue pinging and asking those who contributed the changes to test them: https://github.com/apache/airflow/issues/29103. This absolutely does not change. If anything, this actually has the potential to be MUCH better tested and safer to use. The usual saying that I keep repeating: if something is painful when it comes to releases, do it more often. If we move the K8S executor to the providers it has two side effects - the changes to the K8S executor will come in smaller increments and will never be split between the cncf.kubernetes provider and Airflow (which on its own is a recipe for much more stable behaviour), and - more importantly - users will be able to go BACK to a working K8S executor by downgrading cncf.kubernetes. Let this sink in for a while - if you (as a deployment manager) find a problem in the K8S executor after upgrading the provider, you will be able to fix the problem by selectively downgrading the provider to the previous version, without having to downgrade Airflow. You will even be able to use the executor that you used and loved in Airflow 2.6 (say) even if you upgrade to Airflow 2.7 (provided, of course, that our BaseExecutor API is stable). But we can also add explicit version limitations in Airflow if we want to make backwards-incompatible changes - and we can do that by adding a minimum-version requirement for the cncf.kubernetes provider. There is a lot of flexibility and control for both sides - maintainers can impose certain requirements, while users should be free to move within the limitations set by maintainers.
>> >>
>> >> I'd say all those effects are only positive.
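For illustration, the minimum-version guard mentioned above could be expressed as an ordinary dependency pin on the provider distribution; where exactly it would live and the version number are hypothetical:

    # hypothetical pin that Airflow 2.6 could declare for the cncf.kubernetes extra,
    # ensuring only provider releases that ship the executor can be installed with it
    apache-airflow-providers-cncf-kubernetes>=6.0.0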
>> >> > 2. As part of AIP-44, should we create an internal API for executors? I recall a discussion in Slack (related to ECS) where we aspire to remove all database access from providers.
>> >>
>> >> This is an excellent point that has some interesting consequences. Glad that you raised it.
>> >>
>> >> We do not need an Internal API for executors. In the case of AIP-44 it does not matter where the code comes from; it matters in which context the code is executed (i.e. scheduler/task/triggerer etc.). Executors are always executed in the scheduler context, and that means that they should use direct DB access - same as all other scheduler-run code.
>> >>
>> >> But that indeed raises another question for the future "multi-tenant" setup. Will tenants be able to upgrade their cncf.kubernetes provider and impact the code run by the scheduler? That would be a security problem if the answer were yes. But this is not how it will work in a multi-tenant setup.
>> >>
>> >> By definition, those who run the multi-tenant future Airflow should not allow tenants to install anything in the scheduler. It's already a given that the scheduler has a different set of dependencies (so possibly a different set of providers) installed than the "workers" of the tenant (and managed by different people - Airflow admins, not tenant admins). Each tenant should have (however the deployment is managed) their own set of dependencies for "workers" + "dag file processors", but the scheduler should have its own set, managed independently. Such a scheduler should even have no "regular" providers installed - just plugins that extend its capabilities with timetables and the like. So we are already in a situation where the scheduler has a different set of dependencies than processors and workers.
>> >>
>> >> And it is perfectly fine to have the scheduler installed with a different kubernetes provider - acting as a plugin extending scheduler capabilities (managed by the central deployment team) - while each tenant might have the same or a different provider, as long as it is compatible with the version of Airflow used. The scheduler will only use the "executor" part of the provider, while workers and dag file processors will only use the KPO part of their provider. No overlap or dependency there.
>> >>
>> >> I think, however (and this is something that we might consider), executors could only be used if exposed in providers via the plugin mechanism/entrypoint, similarly to Timetables. And for security, the scheduler might only accept executors that are configured via the plugin mechanism. This would be slightly backwards-incompatible and is not strictly necessary (because someone would have to have access to the configuration to use an executor which is not coming via a plugin) - but we might consider it to enhance security a bit and add another layer of it. We already have a precedent where the hive provider exposes both a provider entrypoint and a plugin entrypoint, and we could do the same for cncf.kubernetes and celery. Worth considering for sure, not strictly necessary.
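A rough sketch of the plugin-entrypoint idea above, modeled on how Timetables are registered through plugins today; the executors attribute and the in-provider import path are hypothetical and only illustrate the proposal:

    from airflow.plugins_manager import AirflowPlugin

    # hypothetical location of the executor once it lives in the provider package
    from airflow.providers.cncf.kubernetes.executors.kubernetes_executor import KubernetesExecutor


    class CncfKubernetesExecutorPlugin(AirflowPlugin):
        name = "cncf_kubernetes_executor"
        # hypothetical attribute, analogous to the existing `timetables = [...]` field,
        # which would let the scheduler accept only executors registered via plugins
        executors = [KubernetesExecutor]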
>> >> >
>> >> > Best,
>> >> > Shubham
>> >> >
>> >> > On 2023-01-29, 1:21 AM, "Jarek Potiuk" <[email protected]> wrote:
>> >> >
>> >> > Hello Everyone,
>> >> >
>> >> > As a follow-up to AIP-51 - when it is completed (with a few more quirks like the one described by Andrey in the "Rendered Task Instance Fields" discussion) - it should now be entirely possible to move the Kubernetes Executor, Celery Executor and the related "joined" executors to their respective providers.
>> >> >
>> >> > There is of course the question of where to move the CeleryKubernetesExecutor, but we have prior art with Transfer Operators, and we might choose to add it to "cncf.kubernetes" and make the celery provider an optional dependency of the K8S provider.
>> >> >
>> >> > This has multiple advantages, and one of the biggest ones I can think of is that we could evolve the K8S executor faster than Airflow itself, and we would avoid quite a lot of the complexity involved in parallel modifications of K8S code in the provider and the executor (it happened in the past that we had to add a very high min-airflow version for the "cncf.kubernetes" provider in order to make this happen). We could likely avoid similar problems by making sure the K8S executor is used as "yet another executor" compliant with the AIP-51 interface, and we could evolve it much faster.
>> >> >
>> >> > That would also give the celery provider some "real" meaning - so far the celery provider has merely exposed a celery queue sensor, but when we move the CeleryExecutor to the provider, the provider would finally be a useful one.
>> >> >
>> >> > Doing this requires a few small things:
>> >> > * we likely need to add dynamic imports in the old "executors" package following PEP 562 (the way we've done it for operators), with a deprecation notice in case someone imports the executors from there.
>> >> > * we likely need to add the "cncf.kubernetes" and "celery" packages as pre-installed providers - alongside http, ftp, common.sql, sqlite, imap.
>> >> >
>> >> > I **think** that, after AIP-51 gets implemented, this would be a fully backwards-compatible change - it's just a matter of proper management of the dependencies. We could add minimum cncf.kubernetes/celery versions in Airflow 2.6, so that we are sure 2.6+ Airflow uses only the newer providers and kubernetes/celery code, and this would be fully transparent for the users, I believe.
>> >> >
>> >> > J.
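A minimal sketch of the PEP 562 dynamic-import shim mentioned in the first bullet above, assuming the executors end up under in-provider paths such as airflow.providers.cncf.kubernetes.executors (the target module names are illustrative, and in practice the same kind of shim would also go into the old submodules such as airflow/executors/kubernetes_executor.py):

    # airflow/executors/__init__.py (sketch)
    from __future__ import annotations

    import importlib
    import warnings

    # old name -> hypothetical new location inside the respective provider package
    _MOVED_EXECUTORS = {
        "KubernetesExecutor": "airflow.providers.cncf.kubernetes.executors.kubernetes_executor",
        "CeleryExecutor": "airflow.providers.celery.executors.celery_executor",
    }


    def __getattr__(name: str):
        # PEP 562: called only when `name` is not found in this module
        if name in _MOVED_EXECUTORS:
            target = _MOVED_EXECUTORS[name]
            warnings.warn(
                f"Importing {name} from 'airflow.executors' is deprecated; "
                f"import it from '{target}' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            return getattr(importlib.import_module(target), name)
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")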
