Hello everyone,

TL;DR; I wanted to discuss one aspect of the "configuration" separation
between core and providers that I was working on while moving executors to
providers - namely : can providers contribute configuration options in
"core" sections of configuration.

It is almost complete and we are gearing up to release it in 2.7.0 together
with executor separation - we have one final PR that moves configuration
for celery to the celery provider and it is at the last stages of review
[1].

The effect it is going to have:

* it will keep 100% compatibility for the user when it comes to configuring
Airflow
* documentation of configuration for providers is moved to the provider's
docs

In the PR (also attached) you can see screenshots on how the docs will look
like after separation (which is the only "user" visible change. The
documentation of configuration for providers will land in provider's docs
and we will only have links from the main configuration page of Airflow to
community provider's configuration (for those providers that will have its
own configuration).

This is all nice and cool.

You will see that "celery", "celery_broker_transport_options" and
"celery_kubernetes_executor" sections land in the Celery docs, disappear
from airflow config but there is a separate page that lists "celery"
provider as having configuration and linking between the core and provider
docs.

But - we have a possibility that a provider could contribute a new  option
or change default for an existing option into some of the "core" sections.

And Jed asked a great and important question - do we want to do it or not?

We don't need it. But I feel it can be useful and can remove the last point
where we have some configuration coupling between core and providers.

We even have such a case even now and it's good to explain it:

https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#celery-logging-level

* AIRFLOW__LOGGING__CELERY_LOGGING_LEVEL

It is currently in the "logging" section. And I think it is quite right
that it is in logging. But technically - this one belongs to both logging
and celery.

I did not not want to move it just yet to celery provider (I wanted to make
sure we want to keep it simple (and I think we do not have to decide in
order to merge this PR) and I did not want to complicate the whole setup,
but it feels like it **could be** contributed by celery:

* it is only used by code in celery provider
* it is only useful when you use Celery executor

So - purely theoretically, we could move it entirely to celery - and get it
described there, rather than in Airflow core.  The only (small) thing - it
is also part of a "deprecated" array so we would have to add a possibility
of a provider contributing also deprecated entries to the core
configuration (and that would be a rather easy, small change).

We do not "really" need it - Celery is anyhow going to be referred from the
core in the documentation and will come pre-installed, so this would also
be ok to treat this as part of "core" of airflow.

Benefits:

* core would have no knowledge of celery (except documentation for Celery
Executor
* we could easier separate the code if we decide providers go out to
separate repo
* documentation for the logging/celery_logging_level would automatically go
to celery provider

I can think of three drawbacks there:

* potential confusion for docs - where the configuration comes from (though
it is already largely addressed - in the PR you can run `airflow config
list --include-sources` and you will be able to see for each option whether
it was contributed from "defaults" or "defaults celery"

* providers could also semi-randomly override each other's defaults - we do
not have a mechanism to make providers have separate "namespaces" for
sections - it's mostly a convention of section name. Maybe we can change it
in the future, but for now it's more of a "gentleman's agreement" to decide
what is the section name that each provider contributes.

* potential security issue where 3rd-party provider could change "any"
default for core

I am not sure if the last one is really a threat. We already assume that
providers are behaving "nicely". We already use entry points to discover
providers and plugins and this means that any package installed in the
system that provides the entrypoints, can execute arbitrary code while
airflow is starting (and it could do all kinds of nasty things). And this
is basically no different than installing and running any package in the
system.

However, for this case, we could potentially easily add a protection and
refuse any provider (including celery) to contribute the options in
existing sections and "fail".

We might want to add a section about it to our security model [2], but I
think providers should be treated as "trusted".

I think - to summarise it, we have three options:

1) to allow providers to contribute "core" sections - total freedom, trust,
gentleman's agreement that providers do not do nasty things (current
situation)

2) to have a list of "core" sections that can be only contributed by the
core - (easy to add for 2.7 if we decide to as a small follow up after
32064 )

3) we could also have a way to "reserve" certain sections for certain
providers (but this is rather long term and we do not really need that yet).

WDYT?

[1] https://github.com/apache/airflow/pull/32604
[2]
https://airflow.apache.org/docs/apache-airflow/stable/security/index.html

J.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to