Re: [DISCUSS] Move connection testing to workers

Amogh Desai Wed, 25 Feb 2026 23:59:24 -0800

Good discussion, Anish. +1 on moving connection testing to workers --
running user-supplied driver code on the API server is the right thing to
move away from.


Strongly agree with Ephraim and Jarek here. If we require the connection to
be saved before testing (so the worker can fetch it via connection_id through
the standard secrets path), the UI button should make that explicit. "Save &
Test" does exactly that, and it avoids the surprise of overwriting an existing
connection while *just testing :)*
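
To make the ordering concrete, here is a rough sketch of what "Save & Test"
implies: the save happens first, and the test request that goes to the worker
carries only the connection_id. Everything here (ConnectionStore,
ConnectionTestRequest, save_and_test) is a hypothetical stand-in for
illustration, not existing Airflow code.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionStore:
    """Hypothetical stand-in for wherever connections are persisted."""

    saved: dict = field(default_factory=dict)

    def save(self, conn_id: str, uri: str) -> bool:
        """Persist the connection; return True if an existing one was replaced."""
        replaced = conn_id in self.saved
        self.saved[conn_id] = uri
        return replaced


@dataclass(frozen=True)
class ConnectionTestRequest:
    """What would be queued for the worker: the id only, never the credentials."""

    connection_id: str


def save_and_test(store: ConnectionStore, conn_id: str, uri: str):
    """Save first, then build the test request that references the saved id."""
    replaced = store.save(conn_id, uri)
    return ConnectionTestRequest(connection_id=conn_id), replaced


store = ConnectionStore()
request, replaced = save_and_test(store, "my_postgres", "postgres://host/db")
```

The `replaced` flag is the piece the UI could use to warn before an existing
connection gets overwritten.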

Re queue routing: I think an optional queue parameter on the test connection
API should be part of the initial implementation rather than a follow-up. The
default can be *default*, which covers the common case. But there are other
scenarios too:

* Workers on different queues deployed in different network zones, meaning
the test would pass or fail depending on which queue runs it.

* Setups where different queues use different secrets backends, so the worker
might not be able to resolve the connection at all.

The cost of an optional queue parameter is fairly low, and I think it avoids
shipping something that gives wrong answers in these setups. Given that, can
we consider doing it?
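
As a rough illustration of how small the change could be, assuming a request
body with an optional queue field (the names TestConnectionBody, resolve_queue
and DEFAULT_QUEUE are made up for this sketch, and the real default would
presumably come from configuration):

```python
from dataclasses import dataclass
from typing import Optional

# Assumption for this sketch: the fallback queue is literally named "default".
DEFAULT_QUEUE = "default"


@dataclass(frozen=True)
class TestConnectionBody:
    """Hypothetical request body for the test connection API."""

    connection_id: str
    queue: Optional[str] = None  # omitted by most callers


def resolve_queue(body: TestConnectionBody) -> str:
    """Pick the queue the test should be dispatched to."""
    return body.queue or DEFAULT_QUEUE


common_case = resolve_queue(TestConnectionBody("my_postgres"))
zoned_case = resolve_queue(TestConnectionBody("my_postgres", queue="dmz"))
```

Callers that never set the field keep the current behaviour, while
deployments with zoned workers can point the test at the queue that will
actually run the tasks.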



Thanks & Regards,
Amogh Desai


On Tue, Feb 24, 2026 at 10:29 PM Jarek Potiuk <[email protected]> wrote:

> > Maybe changing the button to 'save & test' would suffice.
>
> +10
>
> On Tue, Feb 24, 2026 at 1:17 PM Ephraim Anierobi <[email protected]> wrote:
>
> > Cool idea.
> >
> > On saving the connection automatically, I think we should make it explicit
> > in the UI that testing the connection will save the connection. This will
> > help users know that they are not just testing but also creating the
> > connection with the entered credentials. Without this being explicit, I
> > think users may unknowingly replace a connection while just checking
> > whether a new connection would work.
> >
> > Maybe changing the button to 'save & test' would suffice.
> >
> > On 2026/02/22 03:50:11 Anish Giri wrote:
> > > Hi all,
> > >
> > > I'd like to discuss moving connection testing off the API server and
> > > onto workers. Jarek suggested this direction in a comment on #59643
> > > [1], and I think the Callback infrastructure being built for running
> > > callbacks on executors is the right foundation for it.
> > >
> > > Since 2.7.0, test_connection has been disabled by default (#32052).
> > > Running it on the API server has two problems: the API server
> > > shouldn't be executing user-supplied driver code (Jarek described the
> > > ODBC/JDBC risks in detail on #59643), and workers typically have
> > > network access to external systems that API servers don't, so test
> > > results from the API server can be misleading.
> > >
> > > Ramit's generic Callback model (#54796 [2]) and Ferruzzi's
> > > in-progress executor dispatch (#61153 [3]) together give us most of
> > > what's needed. The flow would be:
> > >
> > > 1. UI calls POST /connections/test
> > > 2. API server Fernet-encrypts the connection URI, creates an
> > > ExecutorCallback pointing to the test function, returns an ID
> > > 3. Scheduler dispatch loop (from #61153) picks it up, sends it
> > > to the executor
> > > 4. Worker decrypts the URI, builds a transient Connection, calls
> > > test_connection(), reports result through the callback path
> > > 5. UI polls GET /connections/test/{id} until it gets a terminal
> > > state
> > >
> > > The connection-testing-specific code would be small: a POST endpoint
> > > to queue the test, a GET endpoint to poll for results, and the worker
> > > function that decrypts and runs test_connection().
> > >
> > > One thing I noticed: #61153's _enqueue_executor_callbacks currently
> > > requires dag_run_id in the callback data dict, and ExecuteCallback.make
> > > needs a DagRun for bundle info. Connection tests don't have a DagRun.
> > > It would be a small change to make that optional. The dispatch query
> > > itself is already generic (selects all PENDING ExecutorCallbacks). I
> > > can take a look at decoupling that if it would be useful.
> > >
> > > A couple of other open questions:
> > >
> > > 1. The connection test needs to store an encrypted URI, conn_type, and
> > > some timestamps. Is the Callback.data JSON column the right place
> > > for that, or does it warrant its own small table?
> > >
> > > 2. Stale requests: if a worker crashes mid-test, the record stays
> > > in a non-terminal state. Should there be a scheduler-side reaper
> > > similar to zombie task detection, or is client-side timeout (60s
> > > in the UI) enough?
> > >
> > > I explored this earlier in #60618 [4] with a self-contained
> > > implementation. Now that the ExecutorCallback dispatch is taking shape
> > > > in #61153, building on top of it seems like the right direction.
> > >
> > > Thoughts?
> > >
> > > Anish
> > >
> > > [1] https://github.com/apache/airflow/pull/59643
> > > [2] https://github.com/apache/airflow/pull/54796
> > > [3] https://github.com/apache/airflow/pull/61153
> > > [4] https://github.com/apache/airflow/pull/60618
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > >
> > >
> >
>
