Anticipating result of the discussion - I updated the PR https://github.com/apache/airflow/pull/32604 to forbid contributing sections that are already there. This means that:
1) core sections (or their values) cannot be overwritten by any provider 2) only one provider can contribute a section. If two of them try to add options in the same section - only one (random) will succeed to add their options. Loading provider information will still succeed but we will have warning describing where the conflict originates from That should avoid accidental cases and enforce the "gentleman's agreement". We can always change it in the future. So if you think it is still good idea to allow overriding in existing section - feel free to speak up :) J. On Fri, Jul 21, 2023 at 12:02 PM Jarek Potiuk <[email protected]> wrote: > > Comment: > > I started to think that indeed option 2) is best. And as a way out > from the celery logging case, we could also deprecate the > celery/logging and move the logging option for celery to "[celery]" - > that would solve the need to "merge" sections entirely for the current > move of celery executor to provider. > > J. > > On Fri, Jul 21, 2023 at 11:51 AM Michał Modras > <[email protected]> wrote: > > > > In general I support this suggestion for more clarity and separation, but > > on the assumption we at least protect the core sections as in point 2). > > With a scale large enough, someone, consciously or not, will break the > > gentleman's agreement. > > > > On Fri, Jul 21, 2023 at 7:13 AM Jarek Potiuk <[email protected]> wrote: > > > > > Hello everyone, > > > > > > TL;DR; I wanted to discuss one aspect of the "configuration" separation > > > between core and providers that I was working on while moving executors to > > > providers - namely : can providers contribute configuration options in > > > "core" sections of configuration. > > > > > > It is almost complete and we are gearing up to release it in 2.7.0 > > > together with executor separation - we have one final PR that moves > > > configuration for celery to the celery provider and it is at the last > > > stages of review [1]. > > > > > > The effect it is going to have: > > > > > > * it will keep 100% compatibility for the user when it comes to > > > configuring Airflow > > > * documentation of configuration for providers is moved to the provider's > > > docs > > > > > > In the PR (also attached) you can see screenshots on how the docs will > > > look like after separation (which is the only "user" visible change. The > > > documentation of configuration for providers will land in provider's docs > > > and we will only have links from the main configuration page of Airflow to > > > community provider's configuration (for those providers that will have its > > > own configuration). > > > > > > This is all nice and cool. > > > > > > You will see that "celery", "celery_broker_transport_options" and > > > "celery_kubernetes_executor" sections land in the Celery docs, disappear > > > from airflow config but there is a separate page that lists "celery" > > > provider as having configuration and linking between the core and provider > > > docs. > > > > > > But - we have a possibility that a provider could contribute a new option > > > or change default for an existing option into some of the "core" sections. > > > > > > And Jed asked a great and important question - do we want to do it or not? > > > > > > We don't need it. But I feel it can be useful and can remove the last > > > point where we have some configuration coupling between core and > > > providers. > > > > > > We even have such a case even now and it's good to explain it: > > > > > > > > > https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#celery-logging-level > > > > > > * AIRFLOW__LOGGING__CELERY_LOGGING_LEVEL > > > > > > It is currently in the "logging" section. And I think it is quite right > > > that it is in logging. But technically - this one belongs to both logging > > > and celery. > > > > > > I did not not want to move it just yet to celery provider (I wanted to > > > make sure we want to keep it simple (and I think we do not have to decide > > > in order to merge this PR) and I did not want to complicate the whole > > > setup, but it feels like it **could be** contributed by celery: > > > > > > * it is only used by code in celery provider > > > * it is only useful when you use Celery executor > > > > > > So - purely theoretically, we could move it entirely to celery - and get > > > it described there, rather than in Airflow core. The only (small) thing - > > > it is also part of a "deprecated" array so we would have to add a > > > possibility of a provider contributing also deprecated entries to the core > > > configuration (and that would be a rather easy, small change). > > > > > > We do not "really" need it - Celery is anyhow going to be referred from > > > the core in the documentation and will come pre-installed, so this would > > > also be ok to treat this as part of "core" of airflow. > > > > > > Benefits: > > > > > > * core would have no knowledge of celery (except documentation for Celery > > > Executor > > > * we could easier separate the code if we decide providers go out to > > > separate repo > > > * documentation for the logging/celery_logging_level would automatically > > > go to celery provider > > > > > > I can think of three drawbacks there: > > > > > > * potential confusion for docs - where the configuration comes from > > > (though it is already largely addressed - in the PR you can run `airflow > > > config list --include-sources` and you will be able to see for each option > > > whether it was contributed from "defaults" or "defaults celery" > > > > > > * providers could also semi-randomly override each other's defaults - we > > > do not have a mechanism to make providers have separate "namespaces" for > > > sections - it's mostly a convention of section name. Maybe we can change > > > it > > > in the future, but for now it's more of a "gentleman's agreement" to > > > decide > > > what is the section name that each provider contributes. > > > > > > * potential security issue where 3rd-party provider could change "any" > > > default for core > > > > > > I am not sure if the last one is really a threat. We already assume that > > > providers are behaving "nicely". We already use entry points to discover > > > providers and plugins and this means that any package installed in the > > > system that provides the entrypoints, can execute arbitrary code while > > > airflow is starting (and it could do all kinds of nasty things). And this > > > is basically no different than installing and running any package in the > > > system. > > > > > > However, for this case, we could potentially easily add a protection and > > > refuse any provider (including celery) to contribute the options in > > > existing sections and "fail". > > > > > > We might want to add a section about it to our security model [2], but I > > > think providers should be treated as "trusted". > > > > > > I think - to summarise it, we have three options: > > > > > > 1) to allow providers to contribute "core" sections - total freedom, > > > trust, gentleman's agreement that providers do not do nasty things > > > (current > > > situation) > > > > > > 2) to have a list of "core" sections that can be only contributed by the > > > core - (easy to add for 2.7 if we decide to as a small follow up after > > > 32064 ) > > > > > > 3) we could also have a way to "reserve" certain sections for certain > > > providers (but this is rather long term and we do not really need that > > > yet). > > > > > > WDYT? > > > > > > [1] https://github.com/apache/airflow/pull/32604 > > > [2] > > > https://airflow.apache.org/docs/apache-airflow/stable/security/index.html > > > > > > J. > > > > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: [email protected] > > > For additional commands, e-mail: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
