Hi All,
I wanted to get feedback on something I have been twiddling with. For context, the API server has to import every single hook class from all providers just to render connection forms in the UI. This is because the UI metadata (which fields to show, labels, validators, etc.) lives in Python functions such as `get_connection_form_widgets()` and `get_ui_field_behaviour()`, which are defined on the hook classes.
This means:
- API server startup imports 100+ hook classes it might not actually need
- Slower startup and a heavier memory footprint
- Poor client-server separation (why does the API server need to know about pyodbc just to show a UI form?)
My proposal
Move the UI metadata out of Python code into something static and declarative, like YAML. I want to add this information to the provider.yaml file that every provider already has. For example, this:
class PostgresHook(BaseHook):
    @classmethod
    def get_ui_field_behaviour(cls) -> dict[str, Any]:
        return {
            "hidden_fields": [],
            "relabeling": {
                "schema": "Database",
            },
        }
Will become:
connection-types:
  - connection-type: postgres
    hook-class-name: airflow.providers.postgres.hooks.postgres.PostgresHook
    ui-field-behaviour:
      hidden-fields: []
      relabeling:
        schema: "Database"
    conn-fields:
      sslmode:
        type: string
        label: SSL Mode
        enum: ["disable", "prefer", "require"]
        default: "prefer"
      timeout:
        type: integer
        label: Timeout
        range: [1, 300]
        default: 30
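To make the idea concrete, here is a minimal sketch of how the API server could turn the parsed YAML above into the same dict shape that `get_ui_field_behaviour()` returns today, without importing `PostgresHook` at all. It assumes the file has already been parsed into a plain dict (for example with PyYAML's `yaml.safe_load`); the function name `ui_behaviour_from_provider_info` and the kebab-case to snake_case key mapping are illustrative assumptions, not code from the PR.

```python
from typing import Any

# Assumed mapping from the kebab-case YAML keys to the snake_case keys
# that get_ui_field_behaviour() returns today.
_BEHAVIOUR_KEYS = {
    "hidden-fields": "hidden_fields",
    "relabeling": "relabeling",
    "placeholders": "placeholders",
}


def ui_behaviour_from_provider_info(
    provider_info: dict[str, Any],
) -> dict[str, dict[str, Any]]:
    """Return {connection_type: ui_field_behaviour} from static metadata only.

    Hypothetical helper: nothing here imports a hook class, so provider
    dependencies (pyodbc, etc.) never need to be installed on the API server.
    """
    result: dict[str, dict[str, Any]] = {}
    for conn in provider_info.get("connection-types", []):
        raw = conn.get("ui-field-behaviour") or {}
        # Keep only the keys that were actually declared in provider.yaml.
        result[conn["connection-type"]] = {
            snake: raw[kebab] for kebab, snake in _BEHAVIOUR_KEYS.items() if kebab in raw
        }
    return result
```

For the Postgres entry above this yields `{"postgres": {"hidden_fields": [], "relabeling": {"schema": "Database"}}}`, which matches what the Python classmethod returns.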
The schema adds two new sections:
1. ui-field-behaviour
- Used to customize the standard connection fields (host, port, login, etc.)
- hidden-fields: Hide some fields
- relabeling: Change labels for some fields (like schema -> Database above)
- placeholders: Show hints in the form (port 5432 for example)
2. conn-fields
- Used to define custom fields stored in Connection.extra
- Supports inline validators such as enum, range, pattern, min-length and max-length
- Will support the standard WTForms string, integer, boolean and number types
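Because the validators are declarative, they can be checked without any hook code. The sketch below validates `Connection.extra` values against `conn-fields` specs. It is an assumption-laden illustration, not the PR's implementation: the function name `validate_conn_fields` is hypothetical, `range` is treated as inclusive `[min, max]`, `pattern` must match the whole value, and missing values fall back to `default`.

```python
import re
from typing import Any

# Assumed mapping from declared field types to accepted Python value types.
_ALLOWED_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
}


def validate_conn_fields(
    conn_fields: dict[str, dict[str, Any]], extra: dict[str, Any]
) -> list[str]:
    """Check Connection.extra values against declarative conn-fields specs."""
    errors: list[str] = []
    for name, spec in conn_fields.items():
        # Fall back to the declared default when the field was left empty.
        value = extra.get(name, spec.get("default"))
        if value is None:
            continue
        field_type = spec["type"]
        # bool is a subclass of int, so reject it explicitly for non-boolean fields.
        if not isinstance(value, _ALLOWED_TYPES[field_type]) or (
            isinstance(value, bool) and field_type != "boolean"
        ):
            errors.append(f"{name}: expected {field_type}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: must be one of {spec['enum']}")
        if "range" in spec:
            low, high = spec["range"]  # assumed inclusive bounds
            if not low <= value <= high:
                errors.append(f"{name}: must be between {low} and {high}")
        if isinstance(value, str):
            if "min-length" in spec and len(value) < spec["min-length"]:
                errors.append(f"{name}: shorter than {spec['min-length']} characters")
            if "max-length" in spec and len(value) > spec["max-length"]:
                errors.append(f"{name}: longer than {spec['max-length']} characters")
            # Assumed: the pattern must match the entire value, not a substring.
            if "pattern" in spec and re.fullmatch(spec["pattern"], value) is None:
                errors.append(f"{name}: does not match {spec['pattern']!r}")
    return errors
```

With the Postgres `conn-fields` above, `{"sslmode": "verify-full", "timeout": 500}` produces one error per field, while an empty extra passes because both defaults are valid. The same specs could drive client-side validation in the UI, so the rules live in one place.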
For why this schema was chosen, see the comparison with alternatives in the PR description: https://github.com/apache/airflow/pull/60410
Current Status
I have a POC in https://github.com/apache/airflow/pull/60410, where I chose two pilot providers of varying difficulty: HTTP and SMTP (HTTP is easy, just a vanilla form, while SMTP has some hidden fields).
Benefits this will offer
- Once complete, the API server won't import any hook classes for UI rendering, leading to faster startup
- Provider dependencies don't affect the API server
- YAML is easier to read and write than Python functions for form metadata
Would love feedback on:
1. Schema design - does it cover your use cases?
2. Any missing field types or validators?
The goal is to get the pilot providers in so we can start migrating providers incrementally. The old way still works, so there is no rush for everyone to migrate at once.
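Incremental migration implies a lookup that prefers the static metadata and only falls back to importing the hook for providers that haven't migrated yet. A minimal sketch of that decision, assuming a hypothetical `resolve_ui_behaviour` helper (not the PR's code) and an injectable `importer` so the fallback path is easy to test:

```python
import importlib
from types import ModuleType
from typing import Any, Callable


def resolve_ui_behaviour(
    conn_info: dict[str, Any],
    importer: Callable[[str], ModuleType] = importlib.import_module,
) -> dict[str, Any]:
    """Prefer provider.yaml metadata; fall back to the hook's classmethod."""
    static = conn_info.get("ui-field-behaviour")
    if static is not None:
        # Migrated provider: build the result from YAML alone, no hook import.
        return {
            "hidden_fields": static.get("hidden-fields", []),
            "relabeling": static.get("relabeling", {}),
            "placeholders": static.get("placeholders", {}),
        }
    # Not yet migrated: import the hook module (and its dependencies) as today.
    module_path, _, class_name = conn_info["hook-class-name"].rpartition(".")
    hook_cls = getattr(importer(module_path), class_name)
    return hook_cls.get_ui_field_behaviour()
```

Only the second branch ever touches provider code, so the import cost shrinks as each provider moves its metadata into provider.yaml.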
Thoughts?
Thanks & Regards,
Amogh Desai