diogosilva30 commented on issue #68077:
URL: https://github.com/apache/airflow/issues/68077#issuecomment-4635221726
## UPDATE
**This issue is not reproducible on the `main` branch. Here's why:**
I reproduced this end-to-end on a real stack (`breeze start-airflow
--executor EdgeExecutor --integration statsd` — real API server + real `airflow
edge worker` + statsd-exporter) and ran an A/B test with the proposed
`lifespan` init toggled on and off. With the init **disabled**, `edge_worker_*`
series were still present on `main` and `edge_worker.heartbeat_count` kept
incrementing. So the bug as written reproduces on **3.2.x**, but **not** on
current `main`.
### Root cause of the difference
`serde/__init__.py` calls `Stats.initialize(...)` as an **import-time side
effect** (via `_register()`). On both versions the relevant import chain is the
same:
```
airflow/__init__.py settings.initialize()
→ configuration.py mask_secrets() → mask_secret_sdk()
→ sdk/log.py mask_secret() → from airflow.sdk.execution_time
import task_runner
```
The difference is entirely in **how `task_runner.py` imports `serde`**:
- **`main`** — `task_runner` has a **module-level** import of `serde`:
[`task_runner.py#L140`](https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/execution_time/task_runner.py#L140)
```python
from airflow.sdk.serde import allow_class, iter_pydantic_models
```
So importing `task_runner` imports `serde`, whose `_register()` runs
`Stats.initialize(...)`. That fires on **any `import airflow`**, in **every**
process — including the API server. Verified: a bare `import
airflow.api_fastapi.app` leaves the SDK `Stats` `_factory` set to a real
backend, not `NoStatsLogger`.
- **3.2.x** — `task_runner` imports `serde` **only lazily, inside
functions** (`serialize`/`deserialize`), and `Stats.initialize` is likewise
only invoked inside a function (which runs in the task subprocess, not the API
server). So `import airflow` in the API server **never** triggers
`serde._register()`, the SDK `Stats` singleton stays `NoStatsLogger`, and Edge
metrics are silently dropped — exactly the reported symptom.
So on `main` the API server's SDK `Stats` ends up initialized **only
incidentally**, because it happens to inherit a `Stats.initialize` from
`serde`'s import side effect via an unrelated mask-secrets → `task_runner` →
`serde` chain.
### Proposal: frame the fix as hardening, not a bug fix
Rather than relying on that implicit coupling, I'd suggest framing the
`lifespan` init as **explicit hardening**:
- The API server does not actually *want* `serde` at startup — the `Stats`
initialization it currently gets is an **accidental side effect** of an
unrelated import chain. That's fragile: it would silently regress again if
`task_runner`'s module-level `serde` import were ever made lazy, or if the
mask-secrets import path changed.
- Adding an explicit `Stats.initialize(...)` in the API server's FastAPI
`lifespan` makes the dependency **intentional and self-contained**. It's a
**no-op on `main`** (already initialized) and a **real fix on 3.2.x**, while
removing reliance on an import side effect that no one is guaranteeing.
So I'd propose keeping the PR, but reframing it from "API server never
initializes Stats (bug)" to "**explicitly initialize Task SDK Stats in the API
server lifespan instead of depending on an import side effect (hardening)**",
plus optionally a 3.2.x backport where the bug actually bites.
Happy to adjust the PR wording/scope to match whichever framing maintainers
prefer.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]