Thank you for the detailed reply! It takes me at least 2 cups of coffee to absorb all the information you provide in your emails :)
> Release managers are merely executing the mechanics of the release
> preparation and can make decisions in case someone finds issues that are
> potentially blocking.

Fair point. I was concerned about the work required to cut releases and ensure that contributors have tested all the changes, but it looks like RMs have got that under control. It makes sense that releasing more frequently could reduce overall friction and pain.

> if you (as deployment manager) find a problem in the k8s executor after upgrading
> the provider, you will be able to fix the problem by selectively downgrading
> the provider to the previous version, without having to downgrade Airflow. You
> will even be able to use the executor that you used and loved in Airflow 2.6
> (say) even if you upgrade to Airflow 2.7 (providing of course that our
> BaseExecutor API is stable).

This is a brilliant point that I hadn't considered before. The added flexibility will certainly solve a lot of problems. I am more convinced that separating executors is the right way to go.

> Executors are always executed in the scheduler context and that means that
> they should use direct DB access - same as all other scheduler-run code.

This makes sense and clarifies my question. The Internal API work is driven by the separation of untrusted and trusted components. Irrespective of how executor releases and contributions are managed, Airflow will always consider executors as trusted.

> Each tenant should have (whatever way deployment is managed) their own set of
> dependencies for "workers" + "dag file processors", but the scheduler should have
> its own set, managed independently.

This is a very important point that I realized while working on the Multi-tenancy doc, thanks to Mateusz and you. Once I get some time, I will emphasize this more in the current document.

> the scheduler might only accept executors that are configured via the plugin
> mechanism.

This seems like a reasonable trade-off to improve security, but, as you said, it doesn't seem necessary for now. If any concerns arise, we can always revisit it.

Shubham

On 2023-01-31, 12:43 AM, "Jarek Potiuk" <[email protected]> wrote:

> 1. How does this impact the work of release managers? I see them already
> working pretty hard to release providers and Airflow core. I wonder if this
> makes their task even harder, especially as these require more rigorous
> testing as compared to providers.

Good point. I think it is less of a problem for release managers and more for those who contributed the related changes. Contrary to what you might know from elsewhere, release managers DO NOT test Airflow when releasing. Release managers are merely executing the mechanics of the release preparation and can make decisions in case someone finds issues that are potentially blocking: https://infra.apache.org/release-publishing.html#releasemanager

> Most projects designate a committer to be the release manager who takes
> responsibility for the mechanics of a release. It is a good idea to let
> several committers take this role on different releases so that more than one
> person is comfortable doing a release. Release managers shepherd a release
> from an initial community consensus to getting the compiled code package to
> final distribution, and may be involved in publicizing the release to the
> project's community and the ASF in general.
Our current process for providers already involves asking the community to test the release candidate, including the automated GitHub issue pinging and asking those who contributed the changes to test them: https://github.com/apache/airflow/issues/29103. That process absolutely does not change. If anything, this actually has the potential to be MUCH better tested and safer to use. The usual saying that I keep on repeating - if something is painful when it comes to releases, do it more often.

If we move the K8S executor to the providers, it has two side effects - the changes to the k8s executor will come in smaller increments, and they will never be split between the cncf.kubernetes provider and Airflow (which on its own is a recipe for much more stable behaviour) and - more importantly - users will be able to go BACK to a working k8s executor when they upgrade cncf.kubernetes. Let this sink in for a while - if you (as deployment manager) find a problem in the k8s executor after upgrading the provider, you will be able to fix the problem by selectively downgrading the provider to the previous version, without having to downgrade Airflow. You will even be able to use the executor that you used and loved in Airflow 2.6 (say) even if you upgrade to Airflow 2.7 (providing of course that our BaseExecutor API is stable). But we can also add some explicit version limitations in Airflow if we would like to make backwards-incompatible changes - and we can do that by adding a minimum version requirement for the cncf.kubernetes provider. Both sides have a lot of flexibility and control - maintainers can impose certain requirements, while users should be free to move within the limitations set by maintainers. I'd say all those effects are only positive.

> 2. As part of AIP-44, should we create an internal API for executors? I recall
> a discussion in Slack (related to ECS) where we aspire to remove all database
> access from providers.

This is an excellent point that has some interesting consequences. Glad that you raised it. We do not need the Internal API for executors. In the case of AIP-44 it does not matter where the code comes from, it matters in which context the code is executed (i.e. scheduler/task/triggerer etc.). Executors are always executed in the scheduler context and that means that they should use direct DB access - same as all other scheduler-run code.
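To illustrate the point - a rough, hypothetical sketch (the class, the query and the names are purely illustrative, not a real executor) of why executor code needs no Internal API: it runs inside the scheduler process, so it may talk to the metadata DB directly, like any other scheduler-run code:

    # A minimal, illustrative sketch only - not a real executor. It shows why
    # executors need no Internal API: they run inside the scheduler process and
    # may use the metadata database directly, like other scheduler-run code.
    from airflow.executors.base_executor import BaseExecutor
    from airflow.models.taskinstance import TaskInstance
    from airflow.utils.session import provide_session


    class IllustrativeExecutor(BaseExecutor):
        """Hypothetical executor used only to illustrate the point above."""

        def execute_async(self, key, command, queue=None, executor_config=None):
            # Hand the task over to some external system (details omitted).
            ...

        @provide_session
        def sync(self, session=None):
            # Called periodically from the scheduler loop. Direct DB access is
            # fine here because this code never runs on the worker side.
            ti_count = session.query(TaskInstance).count()  # illustrative query
            self.log.debug("Task instances in the DB: %s", ti_count)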
But that indeed raises another question for the future "multi-tenant" setup. Will tenants be able to upgrade their cncf.kubernetes provider and impact the code run by the scheduler? That would be a security problem if the second part of the question is true. But this is not how it will work for the multi-tenant setup. By definition, those who run the multi-tenant future Airflow should not allow tenants to install anything in the scheduler. It's already given that the scheduler has a different set of dependencies (so possibly a different set of providers) installed than the "workers" of the tenant (and managed by different people - Airflow admins, not Tenant admins). Each tenant should have (whatever way deployment is managed) their own set of dependencies for "workers" + "dag file processors", but the scheduler should have its own set, managed independently. Such a scheduler should even have no "regular" providers installed - just plugins that extend its capabilities with timetables and the like. So we are already in a situation where the scheduler has a different set of dependencies than the processors and workers.

And it is perfectly fine to have the scheduler installed with a different kubernetes provider - acting as a plugin extending scheduler capabilities (managed by the central deployment team) - and each tenant might have the same or a different provider as long as it is compatible with the version of Airflow used. The scheduler will only use the "executor" part of the provider, while workers and dag file processors will only use the KPO part of their provider. No overlap or dependency there.

I think however (and this is something that we might consider) - executors could only be used if exposed in providers via the plugin mechanism/entrypoint, similarly to Timetables. And for security, the scheduler might only accept executors that are configured via the plugin mechanism. This would be slightly backwards incompatible and is not strictly necessary (because someone would have to have access to the configuration to make the scheduler use an executor which is not coming via a plugin) - but we might consider that to enhance security a bit and add another layer of it. We already have a precedent where the hive provider exposes both - a provider entrypoint and a plugin entrypoint - and we could do the same for cncf.kubernetes and celery. Worth considering for sure, not strictly necessary.
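For illustration only, a rough sketch of what such plugin-based exposure could look like, by analogy with how Timetables are exposed via plugins today. The "executors" attribute and the import path below are hypothetical - nothing like this exists yet and it would have to be added as part of this work:

    # Hypothetical sketch - AirflowPlugin has no "executors" field today; it is
    # shown here only by analogy with the existing "timetables" plugin field.
    from airflow.plugins_manager import AirflowPlugin

    # Illustrative import path - the final module layout inside the
    # cncf.kubernetes provider is not decided here.
    from airflow.providers.cncf.kubernetes.executors.kubernetes_executor import (
        KubernetesExecutor,
    )


    class KubernetesExecutorPlugin(AirflowPlugin):
        name = "kubernetes_executor_plugin"

        # Hypothetical attribute mirroring the existing "timetables" list, so the
        # scheduler could discover (and restrict itself to) plugin-provided executors.
        executors = [KubernetesExecutor]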
> Best,
> Shubham
>
> On 2023-01-29, 1:21 AM, "Jarek Potiuk" <[email protected]> wrote:
>
> Hello Everyone,
>
> As a follow-up to AIP-51 - when it is completed (with a few more quirks
> like the one described by Andrey in the "Rendered Task Instance
> Fields" discussion) - it should now be entirely possible to move
> Kubernetes Executor, Celery Executor and related "joined" executors to
> respective providers.
>
> There is of course a question of where to move CeleryKubernetesExecutor,
> but we have prior art with Transfer Operators and we might choose to
> add it to "cncf.kubernetes" and make the celery provider an optional
> dependency of the K8S provider.
>
> This has multiple advantages, and one of the biggest ones I can think
> of is that we could evolve the K8S executor faster than Airflow itself and
> we would avoid quite a lot of the complexities involved in parallel
> modifications of K8S code in the provider and the executor (it happened in
> the past that we had to add a very high min-airflow version for the
> "cncf.kubernetes" provider in order to make this happen), and we could
> likely avoid similar problems by making sure the K8S executor is used as
> "yet another executor" - compliant with the AIP-51 interface - and we
> could evolve it much faster.
>
> That would also give the celery provider some "real" meaning - so far the
> celery provider was merely exposing a celery queue sensor, but when we
> move CeleryExecutor to the provider, the provider would finally be a
> useful one.
>
> Doing this requires a few small things:
> * we likely need to add a dynamic import in the old "executors" package
> following PEP 562 (the same way we've done it for operators) and a
> deprecation notice in case someone uses them from there.
> * we likely need to add the "cncf.kubernetes" and "celery" packages as
> pre-installed providers - alongside http, ftp, common.sql, sqlite,
> imap.
>
> I **think**, after AIP-51 gets implemented, this would be a fully
> backwards-compatible change - it's just a matter of proper management
> of the dependencies. We could add min-cncf.kubernetes/celery versions
> in Airflow 2.6, so that we are sure 2.6+ Airflow uses only newer
> providers and kubernetes/celery code and this would be fully
> transparent for the users, I believe.
>
> J.
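For the "dynamic import following PEP 562" bullet in the quoted message above, a minimal sketch of what such a compatibility shim might look like (the module paths and the mapping are illustrative, mirroring the approach already used for operators):

    # A minimal sketch (module paths and names are illustrative) of a PEP 562
    # compatibility shim that could live in airflow/executors/__init__.py so
    # that old import paths keep working while emitting a deprecation warning.
    from __future__ import annotations

    import importlib
    import warnings

    # Illustrative mapping of legacy names to their possible new provider homes.
    _MOVED_EXECUTORS = {
        "KubernetesExecutor": "airflow.providers.cncf.kubernetes.executors.kubernetes_executor",
        "CeleryExecutor": "airflow.providers.celery.executors.celery_executor",
    }


    def __getattr__(name: str):
        # PEP 562: called only when "name" is not found in this module.
        try:
            target_module = _MOVED_EXECUTORS[name]
        except KeyError:
            raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
        warnings.warn(
            f"Importing {name} from 'airflow.executors' is deprecated; "
            f"use '{target_module}.{name}' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(importlib.import_module(target_module), name)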

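And for the "pre-installed providers" bullet, a hypothetical sketch of how Airflow's own packaging metadata could list them - the variable name, the mechanism, and the idea of pinning minimum provider versions alongside it are illustrative only:

    # Hypothetical sketch only - a list of pre-installed providers as it could
    # be declared in Airflow's packaging metadata, extended with celery and
    # cncf.kubernetes. Minimum provider versions would be pinned separately in
    # Airflow's install requirements (exact versions not decided here).
    PREINSTALLED_PROVIDERS = [
        "celery",           # new: would ship CeleryExecutor
        "cncf.kubernetes",  # new: would ship KubernetesExecutor and related executors
        "common.sql",
        "ftp",
        "http",
        "imap",
        "sqlite",
    ]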