Thank you for the detailed reply! It takes me at least 2 cups of coffee to absorb all the information you provide in your emails :)
> Release managers are merely executing the mechanics of the release
> preparation and can make decisions in case someone finds issues that are
> potentially blocking.

Fair point. I was concerned about the work required to cut releases and ensure that contributors have tested all the changes, but it looks like RMs have got that under control. It makes sense that releasing more frequently could reduce overall friction and pain.

> if you (as deployment manager) find a problem in the k8s executor after upgrading
> the provider, you will be able to fix the problem by selectively downgrading
> the provider to the previous version, without having to downgrade Airflow. You
> will even be able to use the executor that you used and loved in Airflow 2.6
> (say) even if you upgrade to Airflow 2.7 (providing of course that our
> BaseExecutor API is stable).

This is a brilliant point that I hadn't considered before. The added flexibility will certainly solve a lot of problems. I am more convinced that separating executors is the right way to go.

> Executors are always executed in the scheduler context and that means that
> they should use direct DB access - same as all other scheduler-run code.

This makes sense and clarifies my question. The Internal API work is driven by the separation of untrusted and trusted components. Irrespective of how executor releases and contributions are managed, Airflow will always consider executors as trusted.

> Each tenant should have (whatever way deployment is managed) their own set of
> dependencies for "workers" + "dag file processors", but the scheduler should have
> its own set, managed independently.

This is a very important point that I realized while working on the Multi-tenancy doc, thanks to Mateusz and you. Once I get some time, I will emphasize this more in the current document.

> the scheduler might only accept executors that are configured via the plugin
> mechanism.

This seems like a reasonable trade-off to improve security, but, as you said, it doesn't seem necessary for now. If any concerns arise, we can always revisit it.

Shubham

On 2023-01-31, 12:43 AM, "Jarek Potiuk" <[email protected]> wrote:

> 1. How does this impact the work of release managers? I see them already
> working pretty hard to release providers and Airflow core. I wonder if this
> makes their task even harder, especially as these require more rigorous
> testing as compared to providers.

Good point. I think it is less of a problem for release managers and more for those who contributed the related changes. Contrary to what you might know from elsewhere, release managers DO NOT test Airflow when releasing. Release managers are merely executing the mechanics of the release preparation and can make decisions in case someone finds issues that are potentially blocking: https://infra.apache.org/release-publishing.html#releasemanager

> Most projects designate a committer to be the release manager who takes
> responsibility for the mechanics of a release. It is a good idea to let
> several committers take this role on different releases so that more than one
> person is comfortable doing a release. Release managers shepherd a release
> from an initial community consensus to getting the compiled code package to
> final distribution, and may be involved in publicizing the release to the
> project's community and the ASF in general.
Our current process for providers already involves asking the community to test the release candidate, including the automated GitHub issue pinging and asking those who contributed the changes to test them: https://github.com/apache/airflow/issues/29103. That process absolutely does not change. If anything, this actually has the potential to be MUCH better tested and safer to use. The usual saying that I keep on repeating - if something is painful when it comes to releases, do it more often.

If we move the K8S executor to the providers, it has two side effects - the changes to the k8s executor will come in smaller increments, and they will never be split between the cncf.kubernetes provider and Airflow (which on its own is a recipe for much more stable behaviour) and - more importantly - users will be able to go BACK to a working k8s executor when they upgrade cncf.kubernetes. Let this sink in for a while - if you (as deployment manager) find a problem in the k8s executor after upgrading the provider, you will be able to fix the problem by selectively downgrading the provider to the previous version, without having to downgrade Airflow. You will even be able to use the executor that you used and loved in Airflow 2.6 (say) even if you upgrade to Airflow 2.7 (providing of course that our BaseExecutor API is stable). But we can also add some explicit version limitations in Airflow if we would like to make backwards-incompatible changes - and we can do that by adding a minimum version requirement for the cncf.kubernetes provider. Both sides have a lot of flexibility and control - maintainers can impose certain requirements, while users should be free to move within the limitations set by maintainers. I'd say all those effects are only positive.

> 2. As part of AIP-44, should we create an internal API for executors? I recall
> a discussion in Slack (related to ECS) where we aspire to remove all database
> access from providers.

This is an excellent point that has some interesting consequences. Glad that you raised it. We do not need the Internal API for executors. In the case of AIP-44 it does not matter where the code comes from, it matters in which context the code is executed (i.e. scheduler/task/triggerer etc.). Executors are always executed in the scheduler context and that means that they should use direct DB access - same as all other scheduler-run code.
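To illustrate the point - a rough, hypothetical sketch (the class, the query and the names are purely illustrative, not a real executor) of why executor code needs no Internal API: it runs inside the scheduler process, so it may talk to the metadata DB directly, like any other scheduler-run code:

    # A minimal, illustrative sketch only - not a real executor. It shows why
    # executors need no Internal API: they run inside the scheduler process and
    # may use the metadata database directly, like other scheduler-run code.
    from airflow.executors.base_executor import BaseExecutor
    from airflow.models.taskinstance import TaskInstance
    from airflow.utils.session import provide_session


    class IllustrativeExecutor(BaseExecutor):
        """Hypothetical executor used only to illustrate the point above."""

        def execute_async(self, key, command, queue=None, executor_config=None):
            # Hand the task over to some external system (details omitted).
            ...

        @provide_session
        def sync(self, session=None):
            # Called periodically from the scheduler loop. Direct DB access is
            # fine here because this code never runs on the worker side.
            ti_count = session.query(TaskInstance).count()  # illustrative query
            self.log.debug("Task instances in the DB: %s", ti_count)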
But that indeed raises another question for the future "multi-tenant" setup. Will tenants be able to upgrade their cncf.kubernetes provider and impact the code run by the scheduler? That would be a security problem if the second part of the question is true. But this is not how it will work for the multi-tenant setup. By definition, those who run the multi-tenant future Airflow should not allow tenants to install anything in the scheduler. It's already given that the scheduler has a different set of dependencies (so possibly a different set of providers) installed than the "workers" of the tenant (and managed by different people - Airflow admins, not Tenant admins). Each tenant should have (whatever way deployment is managed) their own set of dependencies for "workers" + "dag file processors", but the scheduler should have its own set, managed independently. Such a scheduler should even have no "regular" providers installed - just plugins that extend its capabilities with timetables and the like. So we are already in a situation where the scheduler has a different set of dependencies than the processors and workers.

And it is perfectly fine to have the scheduler installed with a different kubernetes provider - acting as a plugin extending scheduler capabilities (managed by the central deployment team) - and each tenant might have the same or a different provider as long as it is compatible with the version of Airflow used. The scheduler will only use the "executor" part of the provider, while workers and dag file processors will only use the KPO part of their provider. No overlap or dependency there.

I think however (and this is something that we might consider) - executors could only be used if exposed in providers via the plugin mechanism/entrypoint, similarly to Timetables. And for security, the scheduler might only accept executors that are configured via the plugin mechanism. This would be slightly backwards incompatible and is not strictly necessary (because someone would have to have access to the configuration to make the scheduler use an executor which is not coming via a plugin) - but we might consider that to enhance security a bit and add another layer of it. We already have a precedent where the hive provider exposes both - a provider entrypoint and a plugin entrypoint - and we could do the same for cncf.kubernetes and celery. Worth considering for sure, not strictly necessary.
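For illustration only, a rough sketch of what such plugin-based exposure could look like, by analogy with how Timetables are exposed via plugins today. The "executors" attribute and the import path below are hypothetical - nothing like this exists yet and it would have to be added as part of this work:

    # Hypothetical sketch - AirflowPlugin has no "executors" field today; it is
    # shown here only by analogy with the existing "timetables" plugin field.
    from airflow.plugins_manager import AirflowPlugin

    # Illustrative import path - the final module layout inside the
    # cncf.kubernetes provider is not decided here.
    from airflow.providers.cncf.kubernetes.executors.kubernetes_executor import (
        KubernetesExecutor,
    )


    class KubernetesExecutorPlugin(AirflowPlugin):
        name = "kubernetes_executor_plugin"

        # Hypothetical attribute mirroring the existing "timetables" list, so the
        # scheduler could discover (and restrict itself to) plugin-provided executors.
        executors = [KubernetesExecutor]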
> Best,
> Shubham
>
> On 2023-01-29, 1:21 AM, "Jarek Potiuk" <[email protected]> wrote:
>
> Hello Everyone,
>
> As a follow-up to AIP-51 - when it is completed (with a few more quirks
> like the one described by Andrey in the "Rendered Task Instance
> Fields" discussion) - it should now be entirely possible to move
> Kubernetes Executor, Celery Executor and related "joined" executors to
> respective providers.
>
> There is of course a question of where to move CeleryKubernetesExecutor,
> but we have prior art with Transfer Operators and we might choose to
> add it to "cncf.kubernetes" and make the celery provider an optional
> dependency of the K8S provider.
>
> This has multiple advantages, and one of the biggest ones I can think
> of is that we could evolve the K8S executor faster than Airflow itself and
> we would avoid quite a lot of the complexities involved in parallel
> modifications of K8S code in the provider and the executor (it happened in
> the past that we had to add a very high min-airflow version for the
> "cncf.kubernetes" provider in order to make this happen), and we could
> likely avoid similar problems by making sure the K8S executor is used as
> "yet another executor" - compliant with the AIP-51 interface - and we
> could evolve it much faster.
>
> That would also give the celery provider some "real" meaning - so far the
> celery provider was merely exposing a celery queue sensor, but when we
> move CeleryExecutor to the provider, the provider would finally be a
> useful one.
>
> Doing this requires a few small things:
> * we likely need to add a dynamic import in the old "executors" package
> following PEP 562 (the same way we've done it for operators) and a
> deprecation notice in case someone uses them from there.
> * we likely need to add the "cncf.kubernetes" and "celery" packages as
> pre-installed providers - alongside http, ftp, common.sql, sqlite,
> imap.
>
> I **think**, after AIP-51 gets implemented, this would be a fully
> backwards-compatible change - it's just a matter of proper management
> of the dependencies. We could add min-cncf.kubernetes/celery versions
> in Airflow 2.6, so that we are sure 2.6+ Airflow uses only newer
> providers and kubernetes/celery code and this would be fully
> transparent for the users, I believe.
>
> J.
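For the "dynamic import following PEP 562" bullet in the quoted message above, a minimal sketch of what such a compatibility shim might look like (the module paths and the mapping are illustrative, mirroring the approach already used for operators):

    # A minimal sketch (module paths and names are illustrative) of a PEP 562
    # compatibility shim that could live in airflow/executors/__init__.py so
    # that old import paths keep working while emitting a deprecation warning.
    from __future__ import annotations

    import importlib
    import warnings

    # Illustrative mapping of legacy names to their possible new provider homes.
    _MOVED_EXECUTORS = {
        "KubernetesExecutor": "airflow.providers.cncf.kubernetes.executors.kubernetes_executor",
        "CeleryExecutor": "airflow.providers.celery.executors.celery_executor",
    }


    def __getattr__(name: str):
        # PEP 562: called only when "name" is not found in this module.
        try:
            target_module = _MOVED_EXECUTORS[name]
        except KeyError:
            raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
        warnings.warn(
            f"Importing {name} from 'airflow.executors' is deprecated; "
            f"use '{target_module}.{name}' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(importlib.import_module(target_module), name)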

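And for the "pre-installed providers" bullet, a hypothetical sketch of how Airflow's own packaging metadata could list them - the variable name, the mechanism, and the idea of pinning minimum provider versions alongside it are illustrative only:

    # Hypothetical sketch only - a list of pre-installed providers as it could
    # be declared in Airflow's packaging metadata, extended with celery and
    # cncf.kubernetes. Minimum provider versions would be pinned separately in
    # Airflow's install requirements (exact versions not decided here).
    PREINSTALLED_PROVIDERS = [
        "celery",           # new: would ship CeleryExecutor
        "cncf.kubernetes",  # new: would ship KubernetesExecutor and related executors
        "common.sql",
        "ftp",
        "http",
        "imap",
        "sqlite",
    ]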