+1 in my view. That would be a proper resolution - with the trade-off that processing is async and response to user might take a few seconds for "some" worker to pick-up.

We would need to assume that testing is using "default" queue, if different workers are configured then there might be differences - but adding further complexity would not be reasonable in my view.

On 22.02.26 15:38, Jarek Potiuk wrote:
I am all for it :)

1. The connection test needs to store an encrypted URI, conn_type, and
some timestamps. Is the Callback.data JSON column the right place
for that, or does it warrant its own small table?

They don't have to be stored. It's enough to send connection_id (after
saving it to the DB).  The worker can retrieve all the credentials the
usual way workers do.
I think it's reasonable to only run test connection when it has been
saved (not during editing) - and even if during editing, we could save
it automatically for tests.


2. Stale requests: if a worker crashes mid-test, the record stays
in a non-terminal state. Should there be a scheduler-side reaper
similar to zombie task detection, or is client-side timeout (60s
in the UI) enough?

A good idea would be to generate a random/unique ID for the test
request. This ID should be random enough to prevent easy guessing,
ensuring only the client who initiated the request can poll for its
status—which also serves as a security feature. We can simply store
such test connection requests (and eventually responses) in a
database, including a timestamp, and use our standard `db clean` to
clear them.

J.


On Sun, Feb 22, 2026 at 4:52 AM Anish Giri <[email protected]> wrote:
Hi all,

I'd like to discuss moving connection testing off the API server and
onto workers. Jarek suggested this direction in a comment on #59643
[1], and I think the Callback infrastructure being built for running
callbacks on executors is the right foundation for it.

Since 2.7.0, test_connection has been disabled by default (#32052).
Running it on the API server has two problems: the API server
shouldn't be executing user-supplied driver code (Jarek described the
ODBC/JDBC risks in detail on #59643), and workers typically have
network access to external systems that API servers don't, so test
results from the API server can be misleading.

Ramit's generic Callback model (#54796 [2]) and Ferruzzi's
in-progress executor dispatch (#61153 [3]) together give us most of
what's needed. The flow would be:

1. UI calls POST /connections/test
2. API server Fernet-encrypts the connection URI, creates an
ExecutorCallback pointing to the test function, returns an ID
3. Scheduler dispatch loop (from #61153) picks it up, sends it
to the executor
4. Worker decrypts the URI, builds a transient Connection, calls
test_connection(), reports result through the callback path
5. UI polls GET /connections/test/{id} until it gets a terminal
state

The connection-testing-specific code would be small: a POST endpoint
to queue the test, a GET endpoint to poll for results, and the worker
function that decrypts and runs test_connection().

One thing I noticed: #61153's _enqueue_executor_callbacks currently
requires dag_run_id in the callback data dict, and ExecuteCallback.make
needs a DagRun for bundle info. Connection tests don't have a DagRun.
It would be a small change to make that optional. The dispatch query
itself is already generic (selects all PENDING ExecutorCallbacks). I
can take a look at decoupling that if it would be useful.

A couple of other open questions:

1. The connection test needs to store an encrypted URI, conn_type, and
some timestamps. Is the Callback.data JSON column the right place
for that, or does it warrant its own small table?

2. Stale requests: if a worker crashes mid-test, the record stays
in a non-terminal state. Should there be a scheduler-side reaper
similar to zombie task detection, or is client-side timeout (60s
in the UI) enough?

I explored this earlier in #60618 [4] with a self-contained
implementation. Now that the ExecutorCallback dispatch is taking shape
in #61153, building on top of will be in right direction.

Thoughts?

Anish

[1] https://github.com/apache/airflow/pull/59643
[2] https://github.com/apache/airflow/pull/54796
[3] https://github.com/apache/airflow/pull/61153
[4] https://github.com/apache/airflow/pull/60618

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to