Hello Folks,

I was not able to get to the comments on this one sooner, but I have now addressed the reviewers' comments and also added a tool here that can ease the migration effort for *all providers* and their *hooks*.
Hoping to get some more eyes on this.

Thanks & Regards,
Amogh Desai

On Mon, Jan 19, 2026 at 11:58 AM Amogh Desai <[email protected]> wrote:

> Thanks for your inputs Jarek and Jens.
>
> Yes, all of what you said makes sense in terms of auth managers and secrets backends.
> The providers list API just provides basic information like this for each provider:
>
> {
>   "package_name": "apache-airflow-providers-airbyte",
>   "description": "Airbyte https://airbyte.com/",
>   "version": "5.3.1",
>   "documentation_url": "https://airflow.apache.org/docs/apache-airflow-providers-airbyte/5.3.1"
> },
>
> This information can probably be retrieved pretty easily, either from the yaml itself or by adding it to the DB if needed. I think we can get this done without having to write to the DB.
>
> I have the north star in my mind as I think about the refactors :)
>
> Jens - yes, jsonschema turned out to be a better way to do this and I updated my PR for that very reason.
>
> I would love to get a few more reviews on that PR :)
>
> Thanks & Regards,
> Amogh Desai
>
> On Sat, Jan 17, 2026 at 1:22 AM Jens Scheffler <[email protected]> wrote:
>
>> +100 still - especially on the JSON schema thing.
>>
>> JSON schema was once decided to be the base of Params, and the very first AIP-50 trigger form was built on it and evolved into today's trigger UI as well. Everything is internally transferred as JSON Schema. So great that you caught up on this, a custom schema would have been bad. This also allows for future extension and added validation which we might not support today in the Trigger Form - it can be plugged with more features in the future.
>>
>> On 16.01.26 15:23, Jarek Potiuk wrote:
>> >> There are a few more reasons why API server will continue to need the ProvidersManager:
>> >
>> > Yeah, I was aware we likely have a few more things I forgot, but this idea extends to those nicely:
>> >
>> > 1.
Auth Managers -> I consider this as an api-server plugin :), or possibly a separate (apache-airflow-auth-manager) type of distribution (again this will work nicely with the "shared" library)
>> > 2. Secrets Backends -> not sure if that is needed for api-server (maybe just for configuration retrieval?) - this again can be a plugin, or separate (apache-airflow-secrets-backend)
>> > 3. Providers List Endpoint: maybe we should get rid of this? -> Eventually this should be part of the same Triggerer DB storage -> triggerer should store in the DB the list of providers installed. What we currently have in api-server is already kinda wrong, because even now we can potentially have different providers installed on api-server and different ones in workers/triggerers, and only those installed in api-server will show up. Switching it to reading from a DB that is updated by the Triggerer (also including team_id, as there might be different sets of providers for different teams) will make it "correct" (eventually).
>> >
>> > But yeah. We definitely can defer any of that to be done later, if we do not find it "easier" to do it together - absolutely no pressure there, just wanted to make sure the "North star" is quite commonly agreed, so that we know where we are going :). We can definitely proceed with the current POC "as is".
>> >
>> > J.
>> >
>> > On Fri, Jan 16, 2026 at 11:11 AM Amogh Desai <[email protected]> wrote:
>> >
>> >> Thanks for the suggestion for using jsonschema!
>> >>
>> >> I updated the implementation to use jsonschema instead of the custom format.
>> >> Now the structure looks like this, for example:
>> >>
>> >> conn-fields:
>> >>   timeout:
>> >>     label: "Connection Timeout"
>> >>     description: "Timeout in seconds"
>> >>     schema:
>> >>       type: integer
>> >>       minimum: 1
>> >>       maximum: 300
>> >>       default: 30
>> >>
>> >> As for the concerns regarding GCP (14 fields including string, int, boolean, and password), I tested it and it works well (updated on PR). The code now uses the schema object for all jsonschema validation properties like min, max, pattern, enum, etc. while keeping UI metadata like label, description, and sensitive-or-not at the top level. This aligns better with the react UI, which already expects this format.
>> >>
>> >> Thanks & Regards,
>> >> Amogh Desai
>> >>
>> >> On Fri, Jan 16, 2026 at 12:50 PM Amogh Desai <[email protected]> wrote:
>> >>
>> >>> Ash -
>> >>>
>> >>> Good catch on the GCP concern. I checked it and this is what it uses:
>> >>>
>> >>>     @classmethod
>> >>>     def get_connection_form_widgets(cls) -> dict[str, Any]:
>> >>>         """Return connection widgets to add to connection form."""
>> >>>         from flask_appbuilder.fieldwidgets import BS3PasswordFieldWidget, BS3TextFieldWidget
>> >>>         from flask_babel import lazy_gettext
>> >>>         from wtforms import BooleanField, IntegerField, PasswordField, StringField
>> >>>         from wtforms.validators import NumberRange
>> >>>
>> >>>         return {
>> >>>             "project": StringField(lazy_gettext("Project Id"), widget=BS3TextFieldWidget()),
>> >>>             "key_path": StringField(lazy_gettext("Keyfile Path"), widget=BS3TextFieldWidget()),
>> >>>             "keyfile_dict": PasswordField(lazy_gettext("Keyfile JSON"), widget=BS3PasswordFieldWidget()),
>> >>>             "credential_config_file": StringField(
>> >>>                 lazy_gettext("Credential Configuration File"), widget=BS3TextFieldWidget()
>> >>>             ),
>> >>>             "scope": StringField(lazy_gettext("Scopes (comma separated)"), widget=BS3TextFieldWidget()),
"key_secret_name": StringField( >> >>> lazy_gettext("Keyfile Secret Name (in GCP Secret >> >>> Manager)"), widget=BS3TextFieldWidget() >> >>> ), >> >>> "key_secret_project_id": StringField( >> >>> lazy_gettext("Keyfile Secret Project Id (in GCP >> Secret >> >>> Manager)"), widget=BS3TextFieldWidget() >> >>> ), >> >>> "num_retries": IntegerField( >> >>> lazy_gettext("Number of Retries"), >> >>> validators=[NumberRange(min=0)], >> >>> widget=BS3TextFieldWidget(), >> >>> default=5, >> >>> ), >> >>> "impersonation_chain": StringField( >> >>> lazy_gettext("Impersonation Chain"), >> >>> widget=BS3TextFieldWidget() >> >>> ), >> >>> "idp_issuer_url": StringField( >> >>> lazy_gettext("IdP Token Issue URL (Client Credentials >> >>> Grant Flow)"), >> >>> widget=BS3TextFieldWidget(), >> >>> ), >> >>> "client_id": StringField( >> >>> lazy_gettext("Client ID (Client Credentials Grant >> >> Flow)"), >> >>> widget=BS3TextFieldWidget() >> >>> ), >> >>> "client_secret": StringField( >> >>> lazy_gettext("Client Secret (Client Credentials Grant >> >>> Flow)"), >> >>> widget=BS3PasswordFieldWidget(), >> >>> ), >> >>> "idp_extra_parameters": StringField( >> >>> lazy_gettext("IdP Extra Request Parameters"), >> >>> widget=BS3TextFieldWidget() >> >>> ), >> >>> "is_anonymous": BooleanField( >> >>> lazy_gettext("Anonymous credentials (ignores all >> other >> >>> settings)"), default=False >> >>> ), >> >>> } >> >>> >> >>> @classmethod >> >>> def get_ui_field_behaviour(cls) -> dict[str, Any]: >> >>> """Return custom field behaviour.""" >> >>> return { >> >>> "hidden_fields": ["host", "schema", "login", "password", >> >>> "port", "extra"], >> >>> "relabeling": {}, >> >>> } >> >>> >> >>> All of these are covered by my schema. 
>> >>>
>> >>> I also checked what the react UI supports as of now, and this is what I found:
>> >>>
>> >>> string - Text input
>> >>> integer - Number input
>> >>> number - Number input
>> >>> boolean - Checkbox
>> >>> object - JSON object editor
>> >>> array - Array input
>> >>>
>> >>> String Formats:
>> >>> format: "password" - Masked password field
>> >>> format: "multiline" - Textarea
>> >>> format: "date" - Date picker
>> >>> format: "date-time" - DateTime picker
>> >>> format: "time" - Time picker
>> >>>
>> >>> Array Types
>> >>>
>> >>> This all comes from a field selector logic:
>> >>> https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/src/components/FlexibleForm/FieldSelector.tsx#L58-L92
>> >>>
>> >>> Fields are selected based on
>> >>> - `schema.type` (string, integer, boolean, array, object)
>> >>> - `schema.format` (password, multiline, date, date-time, time, email, url)
>> >>> - `schema.enum` (if present, dropdown select)
>> >>>
>> >>> So essentially anything with a type, format, and enum defined can be handled by the react UI. That said, maybe I should try to adopt the jsonschema format here.
>> >>>
>> >>> Thanks & Regards,
>> >>> Amogh Desai
>> >>>
>> >>> On Fri, Jan 16, 2026 at 12:36 PM Amogh Desai <[email protected]> wrote:
>> >>>
>> >>>> Jarek -
>> >>>>
>> >>>> Re backcompat, yeah, I already have the fallback in place in my POC. The discovery code will first try to load the metadata from yaml, and if it fails to do so, it will use the *python method* flow to discover the metadata.
>> >>>>
>> >>>> Re the bigger vision about API servers without providers, I love where you are going with this, but I think we need to split up the tasks because we aren't there yet.
>> >>>> Let me explain -
>> >>>>
>> >>>> Your idea to discover providers via triggerer, store them in the DB, and have the API server read from the DB might work for connection forms, but there are a few more reasons why API server will continue to need the ProvidersManager:
>> >>>>
>> >>>> 1. Auth Managers
>> >>>> 2. Secrets Backends
>> >>>> 3. Providers List Endpoint: maybe we should get rid of this? IDK who the consumer of this endpoint is
>> >>>>
>> >>>> So the API server without Providers thing is harder than just connection forms and we aren't there yet until we figure out the 3 points from above.
>> >>>>
>> >>>> I suggest we do this instead:
>> >>>>
>> >>>> Phase 1: Connection forms from YAML to establish foundation for the future
>> >>>> Phase 2: The DB storage phase - decide if Triggerer / who populates in DB (Maybe not triggerer because we do not want it to have DB access eventually)
>> >>>>
>> >>>> Does that sound reasonable? What do you think?
>> >>>>
>> >>>> Thanks & Regards,
>> >>>> Amogh Desai
>> >>>>
>> >>>> On Fri, Jan 16, 2026 at 4:46 AM Jarek Potiuk <[email protected]> wrote:
>> >>>>
>> >>>>>> One main thing was assuming that all providers need to be available on Scheduler (? I think that changed?) that there the connection form definitions are persisted to DB such that the API server directly can read from there - no need to install providers on API Server!
>> >>>>>
>> >>>>> I think Triggerer is better than Scheduler to persist connection definition to the DB. Essentially Triggerer is the only component that needs DB access and also needs to have providers installed. Any of the providers might implement Triggers and they are very tightly coupled with "Hooks" and "Operators".
>> >>>>> Scheduler only really needs **scheduler plugins** (Timetables and such) and **executors** (which we eventually want to split off from current "worker" providers). It does not need "worker providers".
>> >>>>>
>> >>>>> IMHO in many discussions of ours this long term plan / vision is most appealing:
>> >>>>>
>> >>>>> * api-server: only needs distributions that are "ui plugins" (no providers)
>> >>>>> * scheduler only needs distributions that are "scheduler plugins" (e.g. timetables) and "executors"
>> >>>>> * worker only needs "worker/triggerer providers" (i.e. hooks and operators essentially) and "worker plugins" (e.g. macros)
>> >>>>> * triggerer only needs "worker/triggerer providers" (as in workers) - possibly "triggerer plugins" if we ever have a need to have them
>> >>>>>
>> >>>>> Eventually, optionally, each of those ("api-server", "scheduler", "worker", "triggerer") should be a separate distribution, each with its own dependencies. But this one only makes sense if we find that those dependencies could be very different between those - it's likely this will not happen, because the dependency set for each of those "components" will be very close when we finalize the current task-sdk isolation work.
>> >>>>>
>> >>>>> Of course we cannot do it all at once and it will take quite some time to get there.
>> >>>>>
>> >>>>> But I think we should have it as a "North Star" that we should look at when we make any "architecture" decisions. And every decision we make should bring us closer to this "North Star".
>> >>>>>
>> >>>>> Also - just to note - with the "shared" libraries concept we already have, and with "uv workspace" in our monorepo - we have ALL the mechanisms needed to make it happen.
>> >>>>> And to do it in a very maintainable way with very little overhead and virtually no change in regular development workflow. For example the shared libraries concept might be used to share common code for both: apache-airflow-providers-cncf-kubernetes (worker provider - KPO essentially - installable for worker and triggerer) and (future) apache-airflow-executors-cncf-kubernetes (executor installable for scheduler). Same for amazon worker provider/executor split and edge worker provider/executor split. All that is doable.
>> >>>>>
>> >>>>> J.
>> >>>>>
>> >>>>> On Thu, Jan 15, 2026 at 10:23 PM Jens Scheffler <[email protected]> wrote:
>> >>>>>
>> >>>>>> Also +100 from my side.
>> >>>>>>
>> >>>>>> We discussed exactly this in an Airflow 3 dev call, I was looking for the notes... that was when we discussed the component split in the future. Found a reference in
>> >>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-22August2024
>> >>>>>>
>> >>>>>> ```
>> >>>>>>
>> >>>>>> **Plan for Decoupling Providers's Connections metadata from FAB (Jens Scheffler <https://cwiki.apache.org/confluence/display/~jscheffl>)**
>> >>>>>>
>> >>>>>>   * Jens created this draft PR <https://github.com/apache/airflow/pull/41656> with the POC for it and presented it on the call.
>> >>>>>>   * Jarek <https://cwiki.apache.org/confluence/display/~potiuk> proposed the idea of dumping the JSON/YAML with connection fields in the Database or loading it via package metadata so we don't load all the dependencies on the webserver.
>> >>>>>>   * We will need some plan for external providers on how they can define connections or register them.
>> >>>>>> * The POC successfully proved that we can separate the >> connection >> >>>>>> metadata from FAB >> >>>>>> * /*Action Item*/: Jens >> >>>>>> <https://cwiki.apache.org/confluence/display/~jscheffl> to >> >> create >> >>>>> a >> >>>>>> GitHub issue for decoupling the Connection metadata from FAB >> >>>>>> >> >>>>>> ``` >> >>>>>> >> >>>>>> Also on Sep 19th 2024 we had an overview which pieces of the >> >> providers >> >>>>>> are needed where: >> >>>>>> >> >>>>>> >> >>>>>> >> >> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-19September2024 >> >>>>>> Follow-up was notes in Github ticket: >> >>>>>> https://github.com/apache/airflow/issues/42016 >> >>>>>> >> >>>>>> >> >>>>>> One main thing was assuming that all providers need to be available >> >> on >> >>>>>> Scheduler (? I think that changed?) that there the connection form >> >>>>>> definitons are persisted to DB such that the API server directly >> can >> >>>>>> read from there - no need to install providers on API Server! >> >>>>>> >> >>>>>> Looking forward for the contribution... I assume no VOTE needed :-D >> >>>>>> >> >>>>>> Jens >> >>>>>> >> >>>>>> On 1/15/26 15:52, Ash Berlin-Taylor wrote: >> >>>>>>> As an idea/structure I think its certainly the right way to go — >> >> not >> >>>>>> needing the code, not the instantiated widget classes, to (I >> suspect) >> >>>>> throw >> >>>>>> them away in the new React UI certainly seems like a silly idea >> now. >> >>>>>>> In your POC I don’t think you have got the ability to have the >> >> extra >> >>>>>> fields that, for instance, Google Cloud connection has yet though. >> >>>>>>> As for the schema we need to express: I’d say we should look at >> >> what >> >>>>> the >> >>>>>> react UI currently supports? 
>> >>>>>>>
>> >>>>>>> -ash
>> >>>>>>>
>> >>>>>>>> On 15 Jan 2026, at 14:07, Amogh Desai <[email protected]> wrote:
>> >>>>>>>>
>> >>>>>>>> Hi All,
>> >>>>>>>>
>> >>>>>>>> I wanted to get feedback on something I have been twiddling with. For context, the API server has to import every single hook class from all providers just to render connection forms in the UI. This is because the UI metadata (what fields to show, labels, validators, etc.) are living in python functions like `get_connection_form_widgets()` and `get_ui_field_behaviour()` which are defined on the hook classes.
>> >>>>>>>>
>> >>>>>>>> This means:
>> >>>>>>>> - API server startup imports 100+ hook classes it might not actually need
>> >>>>>>>> - Slower startup due to heavier memory footprint
>> >>>>>>>> - Poor client-server separation (why does the API server need to know about pyodbc just to show a UI form?)
>> >>>>>>>>
>> >>>>>>>> My proposal
>> >>>>>>>>
>> >>>>>>>> Moving the UI metadata from python code to something static / declarative like yaml. I want to add this information in the provider.yaml file that every provider already has.
>> >>>>>>>> For example -
>> >>>>>>>>
>> >>>>>>>> class PostgresHook(BaseHook):
>> >>>>>>>>     @classmethod
>> >>>>>>>>     def get_ui_field_behaviour(cls) -> dict[str, Any]:
>> >>>>>>>>         return {
>> >>>>>>>>             "hidden_fields": [],
>> >>>>>>>>             "relabeling": {
>> >>>>>>>>                 "schema": "Database",
>> >>>>>>>>             },
>> >>>>>>>>         }
>> >>>>>>>>
>> >>>>>>>> Will become:
>> >>>>>>>>
>> >>>>>>>> connection-types:
>> >>>>>>>>   - connection-type: postgres
>> >>>>>>>>     hook-class-name: airflow.providers.postgres.hooks.postgres.PostgresHook
>> >>>>>>>>     ui-field-behaviour:
>> >>>>>>>>       hidden-fields: []
>> >>>>>>>>       relabeling:
>> >>>>>>>>         schema: "Database"
>> >>>>>>>>
>> >>>>>>>>     conn-fields:
>> >>>>>>>>       sslmode:
>> >>>>>>>>         type: string
>> >>>>>>>>         label: SSL Mode
>> >>>>>>>>         enum: ["disable", "prefer", "require"]
>> >>>>>>>>         default: "prefer"
>> >>>>>>>>
>> >>>>>>>>       timeout:
>> >>>>>>>>         type: integer
>> >>>>>>>>         label: Timeout
>> >>>>>>>>         range: [1, 300]
>> >>>>>>>>         default: 30
>> >>>>>>>>
>> >>>>>>>> The schema will now consist of two new sections:
>> >>>>>>>>
>> >>>>>>>> 1. ui-field-behaviour
>> >>>>>>>> - Used to customize the standard connection fields (host, port, login, etc.)
>> >>>>>>>> - hidden-fields: Hide some fields
>> >>>>>>>> - relabeling: Change labels for some fields (like schema -> Database above)
>> >>>>>>>> - placeholders: Show hints in the form (port 5432 for example)
>> >>>>>>>>
>> >>>>>>>> 2.
conn-fields
>> >>>>>>>> - Can be used to define custom fields stored in Connection.extra
>> >>>>>>>> - You can define inline validators like enum, range, pattern, min-length, max-length
>> >>>>>>>> - Will support the standard wtforms string, integer, boolean, number types
>> >>>>>>>>
>> >>>>>>>> As for why this schema was chosen, check the comparison with alternatives in the PR desc: https://github.com/apache/airflow/pull/60410
>> >>>>>>>>
>> >>>>>>>> Current Status
>> >>>>>>>>
>> >>>>>>>> I have a POC in: https://github.com/apache/airflow/pull/60410 where I chose two pilot providers of varying difficulty: HTTP and SMTP (HTTP is easy, just a vanilla form, but SMTP has some hidden fields).
>> >>>>>>>>
>> >>>>>>>> Benefits this will offer
>> >>>>>>>>
>> >>>>>>>> - Once complete, the API server won't import any hook classes for UI rendering, leading to faster startup
>> >>>>>>>> - Provider dependencies don't affect API server
>> >>>>>>>> - YAML is easier to read/write than python functions for form metadata
>> >>>>>>>>
>> >>>>>>>> Would love feedback on:
>> >>>>>>>> 1. Schema design - does it cover your use cases?
>> >>>>>>>> 2. Any missing field types or validators?
>> >>>>>>>>
>> >>>>>>>> The goal is to get the pilot providers in so we can start migrating providers incrementally. Old way still works, so no rush for everyone to migrate at once.
>> >>>>>>>>
>> >>>>>>>> Thoughts?
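[Inline note] Since the inline validators in this first draft (range, min-length, max-length) were later replaced by jsonschema keywords in this thread, here is a minimal sketch of how a first-draft entry could be translated to the label + schema layout. It is only an illustration: the function name `to_jsonschema_field` and the `UI_KEYS` tuple are hypothetical and not taken from the PR.

```python
from typing import Any

# Keys treated as UI metadata that stay at the top level (assumption based on this thread).
UI_KEYS = ("label", "description", "sensitive")


def to_jsonschema_field(old: dict) -> dict:
    """Convert a first-draft conn-fields entry into the label + schema layout."""
    new: dict = {key: old[key] for key in UI_KEYS if key in old}
    schema: dict = {}
    for key, value in old.items():
        if key in UI_KEYS:
            continue
        if key == "range":
            # range: [low, high] maps to the jsonschema minimum/maximum keywords
            schema["minimum"], schema["maximum"] = value
        elif key == "min-length":
            schema["minLength"] = value
        elif key == "max-length":
            schema["maxLength"] = value
        else:
            # type, enum, pattern, and default already use jsonschema keyword names
            schema[key] = value
    new["schema"] = schema
    return new


old_timeout: dict = {"type": "integer", "label": "Timeout", "range": [1, 300], "default": 30}
new_timeout: Any = to_jsonschema_field(old_timeout)
```

For the `timeout` field above this yields a `label` at the top level and `type`, `minimum`, `maximum`, and `default` under `schema`, matching the structure shown later in the thread.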
>> >>>>>>>>
>> >>>>>>>> Thanks & Regards,
>> >>>>>>>> Amogh Desai
>> >>>>>>>
>> >>>>>>> ---------------------------------------------------------------------
>> >>>>>>> To unsubscribe, e-mail: [email protected]
>> >>>>>>> For additional commands, e-mail: [email protected]
