Jarek -

Re backcompat: yes, the fallback is already in place in my POC. The discovery
code first tries to load the metadata from YAML and, if that fails, falls back
to the *python method* flow to discover the metadata.
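
To make the fallback concrete, here is a rough sketch of the idea, not the
actual POC code. It assumes the provider.yaml entry is already parsed into a
dict, and it treats "the YAML has no UI sections" as the failure case that
triggers the fallback. The function name, the `load_hook_class` callable and
the keys of the returned dict are all made up for illustration.

```python
from typing import Any, Callable


def discover_connection_metadata(
    entry: dict[str, Any],
    load_hook_class: Callable[[str], type],
) -> dict[str, Any]:
    """Return UI metadata for one connection type, preferring declarative YAML.

    `entry` is one parsed item from `connection-types` in provider.yaml.
    `load_hook_class` is a hypothetical importer, called only on the fallback path.
    """
    # Declarative path: the metadata is present in the YAML entry, no hook import.
    if "ui-field-behaviour" in entry or "conn-fields" in entry:
        return {
            "source": "yaml",
            "ui_field_behaviour": entry.get("ui-field-behaviour", {}),
            "conn_fields": entry.get("conn-fields", {}),
        }

    # Fallback path: import the hook class and ask its classmethods.
    hook_cls = load_hook_class(entry["hook-class-name"])
    behaviour = getattr(hook_cls, "get_ui_field_behaviour", lambda: {})()
    widgets = getattr(hook_cls, "get_connection_form_widgets", lambda: {})()
    return {"source": "python", "ui_field_behaviour": behaviour, "conn_fields": widgets}


class LegacyHook:
    """Stand-in for a provider hook that has not been migrated to YAML."""

    @classmethod
    def get_ui_field_behaviour(cls) -> dict[str, Any]:
        return {"hidden_fields": [], "relabeling": {"schema": "Database"}}


def fake_loader(class_name: str) -> type:
    # A real loader would import `class_name`; here we always return the stand-in.
    return LegacyHook


migrated = {
    "connection-type": "postgres",
    "hook-class-name": "airflow.providers.postgres.hooks.postgres.PostgresHook",
    "ui-field-behaviour": {"hidden-fields": [], "relabeling": {"schema": "Database"}},
}
legacy = {"connection-type": "legacy", "hook-class-name": "example.LegacyHook"}

from_yaml = discover_connection_metadata(migrated, fake_loader)
from_python = discover_connection_metadata(legacy, fake_loader)
print(from_yaml["source"], from_python["source"])
```

The point of the split is that a migrated provider never triggers a hook
import, while an unmigrated one behaves exactly as it does today.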

Re the bigger vision of API servers without providers: I love where you are
going with this, but I think we need to split up the tasks because we aren't
there yet. Let me explain -

Your idea of discovering providers via the triggerer, storing them in the DB,
and having the API server read from the DB might work for connection forms,
but there are a few more reasons why the API server will continue to need the
ProvidersManager:

1. Auth Managers
2. Secrets Backends
3. Providers List Endpoint: maybe we should get rid of this? I don't know who
   the consumer of this endpoint is

So an API server without providers is harder than just connection forms, and
we won't get there until we figure out the three points above.

I suggest we do this instead:

Phase 1: Connection forms from YAML, to establish the foundation for the future
Phase 2: DB storage - decide whether the triggerer or some other component
populates the DB (maybe not the triggerer, because we do not want it to have
DB access eventually)

Does that sound reasonable? What do you think?


Thanks & Regards,
Amogh Desai


On Fri, Jan 16, 2026 at 4:46 AM Jarek Potiuk <[email protected]> wrote:

> > One main thing was assuming that all providers need to be available on
> > Scheduler (? I think that changed?) that there the connection form
> > definitions are persisted to DB such that the API server directly can
> > read from there - no need to install providers on API Server!
>
> I think Triggerer is better than Scheduler to persist connection definition
> to the DB. Essentially Triggerer is the only component that needs DB access
> and also needs to have providers installed. Any of the providers might
> implement Triggers and they are very tightly coupled with "Hooks" and
> "Operators".  Scheduler only really needs **scheduler plugins** (Timetables
> and such) and **executors** (which we eventually want to split-off from
> current "worker" providers). It does not need "worker providers".
>
> IMHO in many discussions of ours this long term plan / vision is most
> appealing:
>
> * api-server: only needs distributions that are "ui plugins" (no providers)
> * scheduler only needs distributions that are "scheduler plugins" (e.g.
> timetables) and "executors"
> * worker only needs "worker/triggerer providers" (i.e. hooks and operators
> essentially) and "worker plugins" (e.g. macros)
> * triggerer only needs "worker/triggerer providers" (as in workers) -
> possibly "triggerer plugins" if we ever have a need to have them
>
> Eventually, optionally, each of those ("api-server", "scheduler",
> "worker", "triggerer") should be a separate distribution, each with its own
> dependencies. But this only makes sense if we find that those
> dependencies could be very different between those - it's likely this will
> not happen, because the dependency set for each of those "components" will be
> very close when we finalize the current task-sdk isolation work.
>
> Of course we cannot do it all at once and it will take quite some time to
> get there.
>
> But I think we should have it as a "North Star" that we should look at when
> we make any "architecture" decisions.  And every decision we make should
> bring us closer to this "North Star".
>
> Also - just to note - with the "shared" libraries concept we already have,
> and with "uv workspace" in our monorepo - we have ALL the mechanisms needed
> to make it happen. And to do it in a very maintainable way with very little
> overhead and virtually no change in regular development workflow. For
> example the shared libraries concept might be used to share common code for
> both: apache-airflow-providers-cncf-kubernetes (worker provider - KPO
> essentially - installable for worker and triggerer) and (future)
> apache-airflow-executors-cncf-kubernetes (executor installable for
> scheduler). Same for amazon worker provider/executor split and edge worker
> provider/executor split. All that is doable.
>
> J.
>
>
>
> On Thu, Jan 15, 2026 at 10:23 PM Jens Scheffler <[email protected]>
> wrote:
>
> > Also +100 from my side.
> >
> > We discussed exactly this in an Airflow 3 dev call, I was looking for the
> > notes... that was when we discussed about the component split in the
> > future. Found a reference in
> >
> >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-22August2024
> >
> > ```
> >
> > **Plan for Decoupling Providers's Connections metadata from FAB (Jens
> > Scheffler <https://cwiki.apache.org/confluence/display/~jscheffl>)**
> >
> >   * Jens created this draft PR
> >     <https://github.com/apache/airflow/pull/41656> with the POC for it
> >     and presented it on the call.
> >   * Jarek <https://cwiki.apache.org/confluence/display/~potiuk> proposed
> >     the idea of dumping the JSON/YAML with connection fields in the
> >     Database or loading it via package metadata so we don't load all the
> >     dependencies on the webserver.
> >   * We will need some plan for external providers on how they can define
> >     connections or register them.
> >   * The POC successfully proved that we can separate the connection
> >     metadata from FAB
> >   * /*Action Item*/: Jens
> >     <https://cwiki.apache.org/confluence/display/~jscheffl> to create a
> >     GitHub issue for decoupling the Connection metadata from FAB
> >
> > ```
> >
> > Also on Sep 19th 2024 we had an overview which pieces of the providers
> > are needed where:
> >
> >
> >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-19September2024
> >
> > The follow-up notes are in the GitHub issue:
> > https://github.com/apache/airflow/issues/42016
> >
> >
> > One main thing was assuming that all providers need to be available on
> > Scheduler (? I think that changed?) that there the connection form
> > definitions are persisted to DB such that the API server directly can
> > read from there - no need to install providers on API Server!
> >
> > Looking forward to the contribution... I assume no VOTE needed :-D
> >
> > Jens
> >
> > On 1/15/26 15:52, Ash Berlin-Taylor wrote:
> > > As an idea/structure I think it's certainly the right way to go - needing
> > > the code, and the instantiated widget classes, only to (I suspect) throw
> > > them away in the new React UI certainly seems like a silly idea now.
> > >
> > > In your POC I don't think you have the ability yet to define the extra
> > > fields that, for instance, the Google Cloud connection has.
> > >
> > > As for the schema we need to express: I'd say we should look at what the
> > > React UI currently supports?
> > >
> > > -ash
> > >
> > >> On 15 Jan 2026, at 14:07, Amogh Desai<[email protected]> wrote:
> > >>
> > >> Hi All,
> > >>
> > >> I wanted to get feedback on something I have been twiddling with. For
> > >> context, the API server has to import every single hook class from all
> > >> providers just to render connection forms in the UI. This is because the
> > >> UI metadata (what fields to show, labels, validators, etc.) lives in
> > >> python functions like `get_connection_form_widgets()` and
> > >> `get_ui_field_behaviour()`, which are defined on the hook classes.
> > >>
> > >> This means:
> > >> - API server startup imports 100+ hook classes it might not actually need
> > >> - Slower startup and a heavier memory footprint
> > >> - Poor client-server separation (why does the API server need to know
> > >>   about pyodbc just to show a UI form?)
> > >>
> > >> My proposal
> > >>
> > >> Move the UI metadata from python code to something static / declarative
> > >> like YAML. I want to add this information to the provider.yaml file that
> > >> every provider already has. For example -
> > >>
> > >> class PostgresHook(BaseHook):
> > >>     @classmethod
> > >>     def get_ui_field_behaviour(cls) -> dict[str, Any]:
> > >>         return {
> > >>             "hidden_fields": [],
> > >>             "relabeling": {
> > >>                 "schema": "Database",
> > >>             },
> > >>         }
> > >>
> > >> Will become:
> > >>
> > >> connection-types:
> > >>   - connection-type: postgres
> > >>     hook-class-name: airflow.providers.postgres.hooks.postgres.PostgresHook
> > >>
> > >>     ui-field-behaviour:
> > >>       hidden-fields: []
> > >>       relabeling:
> > >>         schema: "Database"
> > >>
> > >>     conn-fields:
> > >>       sslmode:
> > >>         type: string
> > >>         label: SSL Mode
> > >>         enum: ["disable", "prefer", "require"]
> > >>         default: "prefer"
> > >>
> > >>       timeout:
> > >>         type: integer
> > >>         label: Timeout
> > >>         range: [1, 300]
> > >>         default: 30
> > >>
> > >> The schema will now consist of two new sections:
> > >>
> > >> 1. ui-field-behaviour
> > >> - Used to customize the standard connection fields (host, port, login, etc.)
> > >> - hidden-fields: Hide some fields
> > >> - relabeling: Change labels for some fields (like schema -> Database above)
> > >> - placeholders: Show hints in the form (port 5432 for example)
> > >>
> > >> 2. conn-fields
> > >> - Can be used to define custom fields stored in Connection.extra
> > >> - You can define inline validators like enum, range, pattern, min-length,
> > >>   max-length
> > >> - Will support the standard wtforms string, integer, boolean, number types
> > >>
> > >> As for why this schema was chosen, check the comparison with alternatives
> > >> in the PR description: https://github.com/apache/airflow/pull/60410
> > >>
> > >>
> > >> Current Status
> > >>
> > >> I have a POC in https://github.com/apache/airflow/pull/60410 where I chose
> > >> two pilot providers of varying difficulty: HTTP and SMTP (HTTP is easy,
> > >> just a vanilla form, but SMTP has some hidden fields).
> > >>
> > >>
> > >> Benefits this will offer
> > >>
> > >> - Once complete, the API server won't import any hook classes for UI
> > >> rendering leading to faster startup
> > >> - Provider dependencies don't affect API server
> > >> - YAML is easier to read/write than python functions for form metadata
> > >>
> > >> Would love feedback on:
> > >> 1. Schema design - does it cover your use cases?
> > >> 2. Any missing field types or validators?
> > >>
> > >> The goal is to get the pilot providers in so we can start migrating
> > >> providers incrementally. Old way still
> > >> works, so no rush for everyone to migrate at once.
> > >>
> > >> Thoughts?
> > >>
> > >> Thanks & Regards,
> > >> Amogh Desai
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail:[email protected]
> > > For additional commands, e-mail:[email protected]
> > >
>
