Sounds good, thanks. Feel free to handle them in separate PRs if needed.
On Thu, 26 Feb 2026 at 22:27, Anish Giri <[email protected]> wrote: > Thanks Kaxil, Amogh. This is really helpful feedback. > My earlier POC in #60618 was along these lines; a dedicated > TestConnection workload type with its own scheduler dispatch, worker > side timeout enforcement, and a DB backed state machine (PENDING → > RUNNING → SUCCESS/FAILED). I moved to the ExecutorCallback-based > approach in #62343 to reduce the scope, but it sounds like the > dedicated workload was heading towards the right direction. That said, > #60618 didn't address several of the concerns raised here and it had > no queue routing (hardcoded first executor), no concurrency budget, > and no reaper for stuck requests. So this isn't just a revert; it's > building on that foundation with the operational controls you're > describing. And based on the discussion, here is the scope I plan to > implement in the updated PR: > > 1. Dedicated ConnectionTest workload: not transported as ExecutorCallback > 2. Save & Test flow: connection must be saved; worker fetches via > connection_id through secrets > 3. Explicit routing: optional queue parameter on the API, > configuration driven default. Team based routing for multi team setups > as a natural follow up once that infrastructure lands (per Jarek's > point that team is a connection attribute and routing should happen > automatically) > 4. Dispatch budget: configurable concurrency cap so connection tests > can't starve task execution > 5. Strict timeout + stale-work reaper: worker side timeout > enforcement, scheduler side reaper for stuck RUNNING requests > 6. Fail fast on unsupported executors: upfront validation rather than > runtime failure > > If there are no objections or anything I've missed, I'll update #62343 > accordingly. > > Anish > > On Thu, Feb 26, 2026 at 1:52 PM Kaxil Naik <[email protected]> wrote: > > > > +1 on moving connection testing off the API server and onto workers for > the > > security/isolation reasons already discussed. > > > > I also think the queue concern is critical. > > > > Queues are a core routing abstraction in Airflow for directing work to > > specific worker populations (runtime/network/location). For some of our > > enterprise customer's deployments, some workers can reach certain > > third-party systems while others cannot, so queue placement directly > > affects the correctness of a connection test result. > > > > My main concern is architectural: connection testing is currently being > > transported as an ExecutorCallback. Conceptually, connection testing is > not > > a callback; it is an ad-hoc workload. That mismatch creates two concrete > > risks in the current model: > > > > 1) Routing correctness: > > Default/fallback queue behavior can run tests in the wrong worker > > environment > > > > 2) Fairness and isolation: > > Callbacks are correctness-critical for state progression. Connection > > tests are user-initiated and can hang; they should not compete in the > same > > priority lane. > > > > I agree callbacks are critical for correctness (they drive state/failure > > progression), so we should preserve their ability to run promptly. At the > > same time, connection testing is different in nature (user-initiated > ad-hoc > > workload, potentially long-running/hanging) and should not compete in the > > same unbounded priority lane. A hanging connection test can occupy a > worker > > slot for a long time, so this needs strict timeout/reaper behavior in > > addition to dispatch controls. > > > > So my suggestion is to require a dedicated ConnectionTest workload before > > merge, with: > > - explicit routing semantics (queue/executor/team) -- and the defaults > can > > be config driven, > > - dedicated dispatch budget/concurrency control, > > - strict timeout + stale-work reaper, > > - fail-fast behavior on unsupported executors. > > > > This keeps the security goal while avoiding routing regressions and > > scheduler starvation risk, and keeps callback lanes reserved for callback > > semantics. > > > > Regards, > > Kaxil > > > > On Thu, 26 Feb 2026 at 07:59, Amogh Desai <[email protected]> wrote: > > > > > Good discussion, Anish. +1 on moving connection testing to workers -- > > > running user supplied > > > driver code on API server is the right thing to move away from. > > > > > > Strongly agree with Ephraim and Jarek here. If we require the > connection to > > > be saved before testing > > > (so the worker can fetch it via connection_id through the standard > secrets > > > path), the UI button should > > > make that explicit. "Save & Test" is exactly that and it will avoid any > > > surprise of overwriting an existing > > > connection while *just testing :)* > > > > > > Re queue routing: I think an optional queue parameter on the test > > > connection API should be part of the > > > initial implementation itself rather than follow up. The default can be > > > *default*, which covers the common case. > > > But there are some other scenarios too: > > > > > > * Workers on different queues deployed in different network zones > meaning > > > that the test would pass or fail > > > depending on which queue makes it run. > > > > > > * Setups where different queues have different secrets backend, the > worker > > > might not even be able to resolve the > > > connection at all. > > > > > > The cost of having an optional queue parameter is kinda low and I > think it > > > avoids shipping something that > > > gives wrong answers in these setups, so if the cost of it is relatively > > > low, can we consider doing it? > > > > > > > > > > > > Thanks & Regards, > > > Amogh Desai > > > > > > > > > On Tue, Feb 24, 2026 at 10:29 PM Jarek Potiuk <[email protected]> > wrote: > > > > > > > > Maybe changing the button to 'save & test' would suffix. > > > > > > > > +10 > > > > > > > > On Tue, Feb 24, 2026 at 1:17 PM Ephraim Anierobi < > > > > [email protected]> > > > > wrote: > > > > > > > > > Cool idea. > > > > > > > > > > On saving the connection automatically, I think we should make it > > > > explicit > > > > > in the UI that testing the connection will save the connection. > This > > > will > > > > > help the users to know that they are not just testing but also > creating > > > > the > > > > > connection with the entered credentials. Without this being > explicit, I > > > > > think users may unknowingly replace a connection while just trying > if a > > > > new > > > > > connection would work. > > > > > > > > > > Maybe changing the button to 'save & test' would suffix. > > > > > > > > > > On 2026/02/22 03:50:11 Anish Giri wrote: > > > > > > Hi all, > > > > > > > > > > > > I'd like to discuss moving connection testing off the API server > and > > > > > > onto workers. Jarek suggested this direction in a comment on > #59643 > > > > > > [1], and I think the Callback infrastructure being built for > running > > > > > > callbacks on executors is the right foundation for it. > > > > > > > > > > > > Since 2.7.0, test_connection has been disabled by default > (#32052). > > > > > > Running it on the API server has two problems: the API server > > > > > > shouldn't be executing user-supplied driver code (Jarek > described the > > > > > > ODBC/JDBC risks in detail on #59643), and workers typically have > > > > > > network access to external systems that API servers don't, so > test > > > > > > results from the API server can be misleading. > > > > > > > > > > > > Ramit's generic Callback model (#54796 [2]) and Ferruzzi's > > > > > > in-progress executor dispatch (#61153 [3]) together give us most > of > > > > > > what's needed. The flow would be: > > > > > > > > > > > > 1. UI calls POST /connections/test > > > > > > 2. API server Fernet-encrypts the connection URI, creates an > > > > > > ExecutorCallback pointing to the test function, returns an ID > > > > > > 3. Scheduler dispatch loop (from #61153) picks it up, sends it > > > > > > to the executor > > > > > > 4. Worker decrypts the URI, builds a transient Connection, calls > > > > > > test_connection(), reports result through the callback path > > > > > > 5. UI polls GET /connections/test/{id} until it gets a terminal > > > > > > state > > > > > > > > > > > > The connection-testing-specific code would be small: a POST > endpoint > > > > > > to queue the test, a GET endpoint to poll for results, and the > worker > > > > > > function that decrypts and runs test_connection(). > > > > > > > > > > > > One thing I noticed: #61153's _enqueue_executor_callbacks > currently > > > > > > requires dag_run_id in the callback data dict, and > > > ExecuteCallback.make > > > > > > needs a DagRun for bundle info. Connection tests don't have a > DagRun. > > > > > > It would be a small change to make that optional. The dispatch > query > > > > > > itself is already generic (selects all PENDING > ExecutorCallbacks). I > > > > > > can take a look at decoupling that if it would be useful. > > > > > > > > > > > > A couple of other open questions: > > > > > > > > > > > > 1. The connection test needs to store an encrypted URI, > conn_type, > > > and > > > > > > some timestamps. Is the Callback.data JSON column the right place > > > > > > for that, or does it warrant its own small table? > > > > > > > > > > > > 2. Stale requests: if a worker crashes mid-test, the record stays > > > > > > in a non-terminal state. Should there be a scheduler-side reaper > > > > > > similar to zombie task detection, or is client-side timeout (60s > > > > > > in the UI) enough? > > > > > > > > > > > > I explored this earlier in #60618 [4] with a self-contained > > > > > > implementation. Now that the ExecutorCallback dispatch is taking > > > shape > > > > > > in #61153, building on top of will be in right direction. > > > > > > > > > > > > Thoughts? > > > > > > > > > > > > Anish > > > > > > > > > > > > [1] https://github.com/apache/airflow/pull/59643 > > > > > > [2] https://github.com/apache/airflow/pull/54796 > > > > > > [3] https://github.com/apache/airflow/pull/61153 > > > > > > [4] https://github.com/apache/airflow/pull/60618 > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > > > To unsubscribe, e-mail: [email protected] > > > > > > For additional commands, e-mail: [email protected] > > > > > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > > > To unsubscribe, e-mail: [email protected] > > > > > For additional commands, e-mail: [email protected] > > > > > > > > > > > > > > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > >
