ramitkataria commented on PR #66608:
URL: https://github.com/apache/airflow/pull/66608#issuecomment-4755580502
Overall I'm good with the direction. Clear improvement over today's
workaround. A couple of things before merge: a scope preference and one
regression. Everything else I'm fine merging as-is, provided we open quick
follow-ups.
### Scope: I prefer smaller PRs
These aren't "fetch callback context at runtime", and splitting them out
would make each easier to review:
1. **Dynamic `VariableInterval` support**: a self-contained feature
spanning `dag.py` resolution, the `0117` migration, the UI datamodel + route
(`interval` -> `float | None` + validator, dropping `interval` as a sort key),
the regenerated `openapi-gen` TS + 4 `.tsx` + `i18n/dag.json`, and its
`test_deadlines.py` coverage. The UI code is fine but it's all downstream of
that one change, which is why it reads as a separate feature: context-fetching
has nothing to do with how an interval renders. I'd make it its own PR (and fix
the regression below there).
2. **`base_executor.py` `team_name` metric tagging**: multi-team
observability, seems unrelated?
3. **Scheduler/executor resilience hardening** (`begin_nested()`
isolation, the `FOR UPDATE SKIP LOCKED` rewrite, the `prune_deadlines`
`~missed` guard): more coupled to the callback machinery, so I won't push to
split it, just flagging it as a distinct concern.
### Regression of existing behavior
- **`VariableInterval` resolution now reads only the metadata `variable`
table** (`dag.py` -> `session.scalar(select(VariableModel))` instead of
`Variable.get()`), bypassing secrets backends and `AIRFLOW_VAR_*` env vars.
Those now return `None` -> `ValueError` -> swallowed by the new `try/except`,
so the deadline is **silently dropped**. The session-scoped read is the right
fix for `prohibit_commit`, but it must still go through secrets/env resolution
(or fail loudly). Untested too: the success test mocks `Variable.get`, which
the new code no longer calls, so it passes while the deadline is skipped.
Please add a test that inserts a real `Variable` row and asserts a `Deadline`
is created. (Belongs in the `VariableInterval` PR if you split it; else fix
here.)
### Fine to merge now (fast follow-ups please)
- **Celery still terminally FAILs a transient context-fetch blip**: the
retry is wired only in `LocalExecutor`, so the alert is silently lost on the
most common production executor. Close the gap or scope+document it.
- **PENDING-requeue has no cap/backoff**: a DagRun that exists but keeps
failing to fetch re-enqueues every scheduler loop forever. Needs a bounded
retry / give-up.
- Add the index backing the new `FOR UPDATE SKIP LOCKED` callback query
(unindexed `type`/`state`/`priority_weight`/`created_at`).
- Add a newsfragment.
### Direction I'm reviewing toward (ties into #67919)
End state: a callback gets the **same `Context` type** as a task, i.e.
every field not tied to a task instance (`dag_run`, `run_id`, …, plus lazy
`var`/`conn`/`macros`), with task-only fields as documented omissions, fetched
through the Execution API on both paths. As follow-ups:
- Type `build_context_from_dag_run`'s return as `Context`, add `deadline`
as a `NotRequired` member of `Context` (+ `KNOWN_CONTEXT_KEYS`), and either
wire `var`/`conn`/`macros` in (the read-only comms channels already exist from
#65269) or document that callback templates are limited to Dag-run-level fields.
- **The triggerer path fetches the DagRun with a direct
`session.scalar(select(DagRun))`, which collides with #67919** (cc @kaxil).
That PR moves workload-building server-side into `triggerer_workloads.py`,
removes the triggerer's DB connection, and its new builder has no callback
branch. Suggestion unless @kaxil prefers otherwise: **land this PR first** and
re-home the triggerer callback path onto the server-side (API-based) builder
during the #67919 rebase. It's being rewritten there anyway, so that's the
right home rather than building on `_create_workload` here only to delete it.
Keep the triggerer-side changes here minimal so that rebase stays cheap.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]