Dev-iL commented on PR #36504:
URL: https://github.com/apache/airflow/pull/36504#issuecomment-4450027580

   I asked Opus to consider this proposal in light of the changes introduced in 
Airflow 3 (internals as well as dependencies). Below is the result of 
applying the principles raised herein to Airflow 3.2/3.3, together with 
suggested next steps.
   
   All code that was generated as part of this effort can be found in: 
https://github.com/Dev-iL/airflow/commit/90070e1920bd7d5e38a999387ee63c2d1f7e5dd3
   
   <details><summary>Testing spec</summary>
   <p>
   
   ~~~md
   # Definition: Async-SQLA Execution-API POC + Benchmark
   
   ## 1. Intent & Context
   
   - **Goal:** Demonstrate, with reproducible numbers on Airflow 3.2 + 
Postgres, that converting an Execution-API read endpoint from sync `SessionDep` 
to async `AsyncSessionDep` removes the Starlette threadpool ceiling under 
concurrent DB-bound load. This is the modern reframing of PR #36504; the 
original Triggerer-direct-DB motivation is architecturally obsolete in 3.x, but 
FastAPI route conversion is the underlying win that remains.
   - **Mental Model:**
     - Async SQLA infrastructure already exists in `airflow-core` 
(`create_async_engine`, `async_sessionmaker`, `AsyncSessionDep`, 
`create_session_async`, `paginated_select_async`, `sql_alchemy_conn_async` 
config with automatic `AIO_LIBS_MAPPING` derivation in `settings.py:240`). 
Adoption inside FastAPI routes is near-zero (0 routes use `AsyncSessionDep`, 49 
use `SessionDep` in core_api alone). The work is *adoption*, not infrastructure.
     - Conversion target must use `SessionDep` directly with plain SQLA 
queries. `get_variable` and `get_connection` are NOT eligible — they call 
`Variable.get(...)` / `Connection.get_connection_from_secrets(...)`, which 
traverse the SecretsBackend chain. Converting those would require building 
async secrets backends (PR #36504-scale work), out of POC scope. Eligible 
target: `get_variable_keys` in 
`airflow-core/src/airflow/api_fastapi/execution_api/routes/variables.py` — 
already uses `SessionDep` + `session.scalar(select(...))` patterns, 
mechanically convertible to `AsyncSessionDep` + `paginated_select_async`.
     - The benchmark must isolate the variable of interest. Use **debug-only 
synthetic route pairs** as the controlled A/B (identical SQL; only the session 
type differs), plus the real `get_variable_keys` conversion as a 
realistic-conversion proof. The pedagogical exhibit is a `SELECT pg_sleep(...)` 
route: sync mode hits the FastAPI/Starlette threadpool ceiling 
(`anyio.to_thread`, default 40) and starves; async mode does not.
     - Default async PG driver is **asyncpg** (per `AIO_LIBS_MAPPING`), not 
psycopg v3 async. The benchmark therefore compares "async SQLA via asyncpg vs 
sync SQLA via psycopg2/3" — this framing must appear verbatim in the report so 
the win is not misread as a psycopg-v3-async claim.
     - This is a POC under `dev/`. Not destined to ship in this manifest's 
scope: no newsfragment, no PR, no provider changes, no production rollout. A 
follow-up PR migrating more routes is downstream of the benchmark numbers.
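
The threadpool ceiling named above is easy to reproduce in miniature with the stdlib alone. In this sketch a pool of 4 workers stands in for the 40-token AnyIO limiter, and 8 sleeping tasks stand in for c > 40 concurrent sync requests (all sizes illustrative):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_query(seconds: float) -> None:
    # Stand-in for a sync route blocked on SELECT pg_sleep(...).
    time.sleep(seconds)

# 4 worker threads stand in for the 40-token AnyIO limiter; 8 concurrent
# "requests" are 2x the ceiling, so the second wave queues behind the first.
start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as pool:
    for future in [pool.submit(slow_query, 0.1) for _ in range(8)]:
        future.result()
elapsed = time.monotonic() - start
print(f"{elapsed:.2f}s")  # two serialized waves (~0.2 s), not one (~0.1 s)
```

An async route awaiting the same query never occupies a pool token, so its latency stays flat as concurrency grows past the limiter size.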
   - **Mode:** balanced
   - **Interview:** autonomous
   - **Medium:** local
   - **Cache:** manifest
   
   ## 2. Approach
   
   - **Architecture:**
     - **Synthetic bench router** (POC instrumentation, gated by env var): a 
new FastAPI router file at 
`airflow-core/src/airflow/api_fastapi/execution_api/routes/_benchmark.py`, 
mounted into the Execution API only when 
`AIRFLOW__BENCHMARK__ENABLE_ASYNC_SESSION_ROUTES=true`. Provides three matched 
pairs of routes under `/execution/__bench/`:
       - `/select1/sync` vs `/select1/async` — `SELECT 1`, isolates per-request 
session overhead.
       - `/sleep/sync?ms=N` vs `/sleep/async?ms=N` — `SELECT 
pg_sleep(N/1000.0)`, demonstrates threadpool starvation under slow queries.
       - `/var_count/sync` vs `/var_count/async` — `SELECT COUNT(*) FROM 
variable`, realistic single-row aggregate.
       - All six routes bypass `CurrentTIToken`/`has_*_access` deps 
(debug-only, gated) so request auth is not in the hot path.
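
The strict gate that AC-2.2 later pins down can be sketched as a one-line predicate (the function name is illustrative; stdlib only):

```python
import os

GATE_VAR = "AIRFLOW__BENCHMARK__ENABLE_ASYNC_SESSION_ROUTES"

def bench_routes_enabled() -> bool:
    # Exact, case-sensitive match: "True", "1", "yes", "" all leave the
    # debug-only routes unmounted (see AC-2.2 / R-5).
    return os.environ.get(GATE_VAR) == "true"

for value in ("True", "1", "yes", "", "false", "true"):
    os.environ[GATE_VAR] = value
    print(f"{value!r}: {bench_routes_enabled()}")  # only 'true' yields True
```

Deliberately no `.lower()`, no boolean-conf parser: ambiguity around what counts as "enabled" is exactly the risk R-5 guards against.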
     - **Real-route conversion**: convert `get_variable_keys` in `variables.py` 
from sync `def` + `SessionDep` + `session.scalar/scalars` to `async def` + 
`AsyncSessionDep` + `await session.scalar(...)` / `await session.scalars(...)` 
/ `paginated_select_async` for the count. No other endpoint converted in this 
manifest's scope.
     - **Benchmark harness** at `dev/async_session_poc/run_bench.py`: 
pure-Python (`httpx.AsyncClient`), no external tools 
(`hey`/`wrk`/`oha`/`locust` are not installed locally — verified). Spawns N 
concurrent worker coroutines for a fixed duration against a target URL, records 
per-request wall-clock latency, emits RPS + p50/p95/p99 + error rate. Sweeps 
concurrency `c={50, 100, 200, 400}`, duration 30s per sweep.
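
The aggregation core of such a harness fits in a few dozen lines of stdlib asyncio. The sketch below substitutes a stub coroutine where the real harness would await an `httpx.AsyncClient` GET, and nearest-rank is one reasonable percentile choice (all parameters illustrative):

```python
import asyncio
import time

def percentile(samples: list[float], p: float) -> float:
    # Nearest-rank percentile over the recorded latencies.
    s = sorted(samples)
    return s[min(len(s) - 1, max(0, round(p / 100 * len(s)) - 1))]

async def worker(stop_at: float, latencies: list[float]) -> None:
    # The real harness awaits an httpx.AsyncClient GET here; a short
    # asyncio.sleep keeps this sketch self-contained.
    while time.monotonic() < stop_at:
        t0 = time.monotonic()
        await asyncio.sleep(0.001)  # simulated request
        latencies.append(time.monotonic() - t0)

async def sweep(concurrency: int, duration: float) -> dict[str, float]:
    latencies: list[float] = []
    stop_at = time.monotonic() + duration
    await asyncio.gather(*(worker(stop_at, latencies) for _ in range(concurrency)))
    return {
        "rps": len(latencies) / duration,
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
    }

stats = asyncio.run(sweep(concurrency=20, duration=0.2))
print({k: round(v, 4) for k, v in stats.items()})
```

Time-bounded sampling (fixed duration, whatever request count it yields) matches ASM-9's "time-bounded is more honest than count-bounded under saturation".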
     - **Seed script** at `dev/async_session_poc/seed.py`: inserts 1000 
Variables (idempotent — `INSERT ... ON CONFLICT DO NOTHING`) so the 
realistic-query path returns non-trivial work.
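
The idempotent-seed shape can be sketched against stdlib `sqlite3` for self-containment — the real script targets Postgres, but SQLite has supported the same `ON CONFLICT ... DO NOTHING` upsert clause since 3.24:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variable (key TEXT PRIMARY KEY, val TEXT)")

def seed(n: int = 1000) -> None:
    # Idempotent: a re-run inserts nothing new thanks to ON CONFLICT DO NOTHING.
    conn.executemany(
        "INSERT INTO variable (key, val) VALUES (?, ?) ON CONFLICT (key) DO NOTHING",
        [(f"bench_var_{i:04d}", f"v_{i:04d}") for i in range(n)],
    )
    conn.commit()

seed()
seed()  # second run is a no-op
count = conn.execute(
    "SELECT COUNT(*) FROM variable WHERE key LIKE 'bench_var_%'"
).fetchone()[0]
print(count)  # → 1000
```

AC-1.2 checks exactly this property: two consecutive runs still leave exactly 1000 `bench_var_*` keys.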
     - **Driver script** at `dev/async_session_poc/run_all.sh`: orchestrates 
`breeze start-airflow --backend postgres --integration ...`, seeds the DB, runs 
the harness against the synthetic bench routes at all four concurrency levels, 
writes results CSV and a results markdown. **Calibration**: the script exports 
`AIRFLOW__DATABASE__SQL_ALCHEMY_POOL_SIZE=40`, 
`AIRFLOW__DATABASE__SQL_ALCHEMY_MAX_OVERFLOW=560` (combined pool = 600 ≥ 500 > 
max concurrency 400, matching PG-11) before launching the API server, leaves the AnyIO threadpool at 
its Starlette default of 40 (read and recorded at startup), and targets 
`http://localhost:8080/execution/__bench/*` (breeze API-server default port; 
override via `BENCH_TARGET_BASE_URL` env var).
     - **Results report** at `dev/async_session_poc/RESULTS.md`: tables (RPS, 
p50/p95/p99, error rate) per route × concurrency, plus framing paragraph 
stating exactly what driver pair was compared and what the result does/doesn't 
prove (does prove: async-route DB-bound throughput under contention; does NOT 
prove: Triggerer/Worker wins, since they don't hit DB directly in 3.x).
   
   - **Execution Order:**
     - D1 (seed script) → D2 (synthetic bench router) → D3 (harness) → D4 
(real-route conversion) → D5 (run benchmarks, write report)
     - Rationale: seed + bench router + harness are independent and can be 
validated against `SELECT 1` before D4 lands; D4 is the lowest-blast-radius 
production-code touch and only needs to land before the final benchmark run; D5 
depends on all four.
   
   - **Risk Areas:**
     - [R-1] Async ORM lazy-loading trap on the realistic conversion | Detect: 
existing `test_variables.py` tests must pass after conversion; any 
`MissingGreenlet` exception or `await` on a non-async attribute will surface 
there.
     - [R-2] Benchmark client itself is the bottleneck (Python GIL, 
single-process `httpx.AsyncClient`) instead of the server | Detect: at low 
concurrency (c=50), both sync and async should be CPU-light on the client; 
client CPU >80% during a run means the harness is saturated. Mitigate by 
running the harness against a no-op static endpoint baseline and confirming RPS 
≫ what we see in the DB-bound tests.
     - [R-3] Threadpool ceiling not actually hit at c=400 in sync mode | 
Detect: if sync `/sleep` at c=400 doesn't show p99 cliff, the threadpool may be 
larger than expected (e.g., uvicorn overrode the default). Mitigation: 
explicitly read `anyio.to_thread.current_default_thread_limiter().total_tokens` 
at server startup and record it in the report header; INV-G16 fails the run if 
it isn't 40.
     - [R-4] Behavior change leaks into the `get_variable_keys` conversion | 
Detect: the behavior-contract AC requires the existing test suite to pass with 
byte-identical responses; add a characterization test if coverage is thin on 
prefix/limit/offset/team_name combos.
     - [R-5] Bench routes accidentally registered in non-debug mode | Detect: 
registration is gated on 
`os.environ.get("AIRFLOW__BENCHMARK__ENABLE_ASYNC_SESSION_ROUTES") == "true"`; 
integration test asserts the routes return 404 when the env var is unset.
     - [R-6] Connection-pool size masks the real effect | Detect: report header 
records both sync `SQL_ALCHEMY_POOL_SIZE` + `SQL_ALCHEMY_MAX_OVERFLOW` and the 
async equivalents (per `settings.py:418` both engines read the same config 
keys). Mitigation: calibration sets combined pool ≥ 500 (PG-11); INV-G17 fails 
the run if either pool is configured below this.
     - [R-7] `asyncpg` not installed in the breeze image | Detect: `python -c 
'import asyncpg'` inside breeze returns non-zero. Mitigation: precheck step in 
`run_all.sh` runs the import and aborts with an actionable install hint before 
launching the bench.
   
   - **Trade-offs:**
     - [T-1] External benchmark tool (`hey`/`wrk`) vs in-repo pure-Python 
harness → Prefer pure-Python because no external tooling is pre-installed in 
the breeze container and reproducibility is higher when the client is checked 
into the repo. Cost: GIL ceiling on the client side — mitigated by R-2's 
detection.
     - [T-2] Convert a real production endpoint vs synthetic-only → Prefer 
both. Synthetic isolates the mechanism; real conversion proves the migration is 
mechanical, not theoretical. Skipping real means the POC is a microbenchmark 
with no transferable claim.
     - [T-3] `get_variable_keys` (clean conversion) vs `ti_heartbeat` (hot 
path) → Prefer `get_variable_keys`. Heartbeat is the production hot path but 
does state-machine mutation + RTIF logic + tracing emission. Conversion risk is 
high and would dilute the benchmark signal with unrelated complexity. Heartbeat 
conversion is the obvious follow-up if these numbers warrant it.
     - [T-4] Seed scale 1000 vs 10000 vs 100000 → Prefer 1000. The route 
returns up to `limit=1000` keys; larger seed exercises pagination but doesn't 
change the threadpool-ceiling shape. Bigger seeds inflate setup time without 
sharpening the signal.
     - [T-5] Postgres-only first pass vs include SQLite/MySQL → Prefer 
Postgres-only. SQLite over aiosqlite serializes through a single-writer lock 
and would muddy the result; MySQL adds async-driver maintenance load (aiomysql 
vs asyncmy) without changing the qualitative answer.
   
   ## 3. Global Invariants
   
   - [INV-G1] All Python files touched in this manifest pass `uv run ruff 
format` (no diff) and `uv run ruff check --fix` (no remaining errors).
     ```yaml
     verify:
       method: bash
       command: "uv run ruff format --check 
airflow-core/src/airflow/api_fastapi/execution_api/routes/variables.py 
airflow-core/src/airflow/api_fastapi/execution_api/routes/_benchmark.py 
dev/async_session_poc/ && uv run ruff check 
airflow-core/src/airflow/api_fastapi/execution_api/routes/variables.py 
airflow-core/src/airflow/api_fastapi/execution_api/routes/_benchmark.py 
dev/async_session_poc/"
     ```
   
   - [INV-G2] `airflow-core` mypy hook passes after the conversion (no new type 
errors introduced in the touched modules).
     ```yaml
     verify:
       method: bash
       command: "prek run mypy-airflow-core --files 
airflow-core/src/airflow/api_fastapi/execution_api/routes/variables.py 
airflow-core/src/airflow/api_fastapi/execution_api/routes/_benchmark.py"
     ```
   
   - [INV-G3] Existing execution_api variable tests pass against the converted 
route — behavior preserved.
     ```yaml
     verify:
       method: bash
        command: "uv run --project airflow-core pytest airflow-core/tests/unit/api_fastapi/execution_api/versions/head/test_variables.py airflow-core/tests/unit/api_fastapi/execution_api/versions/v2026_06_30/test_variables.py -xvs"
     ```
   
   - [INV-G4] Benchmark routes are gated and inert by default. With 
`AIRFLOW__BENCHMARK__ENABLE_ASYNC_SESSION_ROUTES` unset/false, all six 
`/execution/__bench/*` paths return 404.
     ```yaml
     verify:
       method: bash
        command: "uv run --project airflow-core pytest airflow-core/tests/unit/api_fastapi/execution_api/test_benchmark_routes.py::test_routes_gated_off -xvs"
     ```
   
   - [INV-G5] No newsfragment, no `provider.yaml` edit, no `chart/` edit. Scope 
ring-fence: only 
`airflow-core/src/airflow/api_fastapi/execution_api/routes/{variables,_benchmark}.py`,
 `airflow-core/src/airflow/api_fastapi/execution_api/app.py` (if mount needs 
gating), 
`airflow-core/tests/unit/api_fastapi/execution_api/test_benchmark_routes.py`, 
and `dev/async_session_poc/**` may be added or modified.
     ```yaml
     verify:
       method: bash
       # Self-amended: the original `main...HEAD` form picked up the branch's
       # pre-existing committed work (40+ files modified before this manifest's
       # changeset began). Working-tree scope (unstaged + untracked) cleanly
       # isolates manifest-introduced changes — which is what the ring-fence is
       # actually trying to gate.
       command: "{ git diff --name-only HEAD; git ls-files --others 
--exclude-standard; } | grep -vE 
'^(airflow-core/src/airflow/api_fastapi/execution_api/(routes/(variables|_benchmark)\\.py|app\\.py)|airflow-core/tests/unit/api_fastapi/execution_api/test_benchmark_routes\\.py|dev/async_session_poc/)'
 && echo 'OUT-OF-SCOPE FILES' && exit 1 || exit 0"
     ```
   
   - [INV-G6] The results report contains the literal framing string `async 
SQLA via asyncpg vs sync SQLA via psycopg` and the literal disclaimer string 
`Triggerer and Workers do not hit the metadata DB directly in 3.x`, so the win 
is not misread.
     ```yaml
     verify:
       method: bash
       command: "grep -q 'async SQLA via asyncpg vs sync SQLA via psycopg' 
dev/async_session_poc/RESULTS.md && grep -q 'Triggerer and Workers do not hit 
the metadata DB directly in 3.x' dev/async_session_poc/RESULTS.md"
     ```
   
   - [INV-G7] Defect-finding review (code-bugs) finds no LOW+ findings on 
touched production code.
     ```yaml
     verify:
       method: subagent
       agent: manifest-dev:code-bugs-reviewer
       prompt: "Audit code changes in 
airflow-core/src/airflow/api_fastapi/execution_api/routes/variables.py, 
airflow-core/src/airflow/api_fastapi/execution_api/routes/_benchmark.py, and 
any modifications to airflow-core/src/airflow/api_fastapi/execution_api/app.py 
for mechanical defects: race conditions, resource leaks (especially around 
async session/connection lifecycle), edge cases, missing-await, dangerous 
defaults, env-var gate logic bypasses. Threshold: no LOW+ findings."
     ```
   
   - [INV-G8] Change-intent review finds no LOW+ findings — the conversion 
actually delivers async behavior, not accidental sync-blocking inside an async 
wrapper.
     ```yaml
     verify:
       method: subagent
       agent: manifest-dev:change-intent-reviewer
       prompt: "Audit the get_variable_keys conversion in 
airflow-core/src/airflow/api_fastapi/execution_api/routes/variables.py, the 
synthetic bench routes in _benchmark.py, and the mount-gate logic in 
airflow-core/src/airflow/api_fastapi/execution_api/app.py. Stated intent: an 
async route that does not block the event loop on DB I/O, and a gate that only 
mounts bench routes when the env var is set. Look for behavioral divergences: 
sync calls inside async functions, await of a sync method, threadpool offload 
still happening invisibly, blocking attribute access on lazy-loaded ORM 
relationships, gate logic that admits unintended truthy values (e.g., '1', 
'yes') if scope says only 'true'. Threshold: no LOW+ findings."
     ```
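
The failure mode this review targets — a sync call hiding inside an `async def` — reproduces in miniature with the stdlib (route names illustrative):

```python
import asyncio
import time

async def blocking_route() -> None:
    time.sleep(0.1)  # sync sleep inside async def: blocks the event loop

async def nonblocking_route() -> None:
    await asyncio.sleep(0.1)  # yields; other coroutines make progress

async def timed(route) -> float:
    # Run 5 "requests" concurrently and measure wall-clock time.
    t0 = time.monotonic()
    await asyncio.gather(*(route() for _ in range(5)))
    return time.monotonic() - t0

async def main() -> tuple[float, float]:
    return await timed(nonblocking_route), await timed(blocking_route)

good, bad = asyncio.run(main())
print(f"non-blocking: {good:.2f}s, blocking: {bad:.2f}s")  # ~0.1 s vs ~0.5 s
```

The same serialization happens invisibly when an `async def` route touches a lazy-loaded ORM attribute or calls a sync session method without `await` — which is why the review threshold is set at LOW+.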
   
   - [INV-G9] Contract review finds no LOW+ findings — the public response 
shape and status codes of `GET /execution/variables/keys` are unchanged.
     ```yaml
     verify:
       method: subagent
       agent: manifest-dev:contracts-reviewer
       prompt: "Compare the converted GET /execution/variables/keys endpoint in 
airflow-core/src/airflow/api_fastapi/execution_api/routes/variables.py against 
the pre-conversion contract: response model VariableKeysResponse with fields 
{keys: list[str], total_entries: int}, query params prefix, limit (1-10000), 
offset (>=0), responses {200, 401}. Confirm no breaking change for clients. 
Threshold: no LOW+ findings."
     ```
   
   - [INV-G10] Type-safety review finds no LOW+ findings on touched modules.
     ```yaml
     verify:
       method: subagent
       agent: manifest-dev:type-safety-reviewer
       prompt: "Audit type safety on 
airflow-core/src/airflow/api_fastapi/execution_api/routes/variables.py, 
_benchmark.py, app.py (gate-mount edits), and dev/async_session_poc/. Verify 
return types, awaitable vs coroutine vs result confusion, AsyncSession vs 
Session annotations, missing async generics. Threshold: no LOW+ findings."
     ```
   
   - [INV-G11] CLAUDE.md adherence review finds no MEDIUM+ findings (no 
newsfragment for non-applicable distributions, ruff invoked after each Python 
edit, no direct pytest/airflow CLI calls outside of breeze in instructions, no 
commit-author-as-agent footers added to commits, etc.).
     ```yaml
     verify:
       method: subagent
       agent: manifest-dev:context-file-adherence-reviewer
       prompt: "Audit this changeset against 
/home/iliya/repositories/airflow/CLAUDE.md. Specifically: was ruff format+check 
run on every Python file edit? Any newsfragment added for providers/airflow-ctl 
(forbidden)? Any newsfragment added for airflow-core (not wanted — this is a 
POC, not a user-visible change)? Any direct pytest/airflow CLI calls in scripts 
that should go through breeze? Any Co-Authored-By: <agent> in commits? 
Threshold: no MEDIUM+ findings."
     ```
   
   - [INV-G12] Maintainability review on the harness finds no MEDIUM+ findings 
— the harness is reproducible by another engineer.
     ```yaml
     verify:
       method: subagent
       agent: manifest-dev:code-maintainability-reviewer
       prompt: "Audit dev/async_session_poc/ for maintainability: are sweep 
parameters configurable from CLI rather than hardcoded? Are seeds and target 
URLs not buried? Is the README clear about how to re-run? Threshold: no MEDIUM+ 
findings."
     ```
   
   - [INV-G13] Simplicity review on the harness finds no MEDIUM+ findings.
     ```yaml
     verify:
       method: subagent
       agent: manifest-dev:code-simplicity-reviewer
       prompt: "Audit dev/async_session_poc/ for unnecessary complexity. A 
benchmark harness should be small (~100-200 LOC). Reject if there's 
abstraction-for-its-own-sake (frameworks, plugins, registries, classes wrapping 
single functions). Threshold: no MEDIUM+ findings."
     ```
   
   - [INV-G14] Test-quality review on any new tests added for D2 (gate test) 
and D4 (characterization tests if added).
     ```yaml
     verify:
       method: subagent
       agent: manifest-dev:test-quality-reviewer
       prompt: "Audit new tests at 
airflow-core/tests/unit/api_fastapi/execution_api/test_benchmark_routes.py and 
any characterization tests added near test_variables.py. Verify tests are not 
tautological, don't mock the SUT, assert meaningful properties (response 
status, body, gate behavior). Threshold: no MEDIUM+ findings."
     ```
   
   - [INV-G15] Requirements traceability — every Deliverable AC below maps to a 
concrete artifact in the diff.
     ```yaml
     verify:
       method: subagent
       agent: general-purpose
       prompt: "For each Deliverable D1..D5 in 
/tmp/manifest-20260514-073808.md, locate the artifact in the working tree that 
satisfies it. Report any AC whose artifact is missing or whose claim cannot be 
verified by inspecting the repo. Threshold: no MEDIUM+ findings — every 
specified AC must have an identifiable on-disk implementation."
     ```
   
   - [INV-G16] AnyIO/Starlette threadpool size at bench-server startup equals 
exactly 40 (Starlette default). The bench's recorded `threadpool_total_tokens` 
must match. The report header must include a fenced metadata block (```` ```env 
```` … ```` ``` ````) containing `threadpool_total_tokens=40` on its own line, 
so the check is format-stable across prose vs table layouts.
     ```yaml
     verify:
       method: bash
       phase: 2
       command: "python3 -c \"import re,sys; 
m=open('dev/async_session_poc/RESULTS.md').read(); 
v=re.search(r'(?m)^threadpool_total_tokens\\s*=\\s*(\\d+)\\s*$', m); sys.exit(0 
if v and v.group(1)=='40' else 1)\""
     ```
   
   - [INV-G17] DB pool calibration. The bench environment must export 
`AIRFLOW__DATABASE__SQL_ALCHEMY_POOL_SIZE` and 
`AIRFLOW__DATABASE__SQL_ALCHEMY_MAX_OVERFLOW` whose sum is ≥ 500, and the 
RESULTS.md header must record both **sync** and **async** pool sizes (per R-6 
either pool can saturate). The fenced metadata block must contain 
`sync_pool_size`, `sync_max_overflow`, `async_pool_size`, `async_max_overflow` 
on their own lines.
     ```yaml
     verify:
       method: bash
       phase: 2
       command: "python3 -c \"import re,sys; 
m=open('dev/async_session_poc/RESULTS.md').read(); g=lambda k: 
int(re.search(r'(?m)^'+k+r'\\s*=\\s*(\\d+)\\s*$', m).group(1)); 
sps,smo,aps,amo=g('sync_pool_size'),g('sync_max_overflow'),g('async_pool_size'),g('async_max_overflow');
 sys.exit(0 if (sps+smo>=500 and aps+amo>=500) else 1)\""
     ```
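
Taken together, INV-G16 and INV-G17 expect a header metadata block of roughly this shape in `RESULTS.md` (pool values illustrative; the threadpool token count must be exactly 40):

```env
threadpool_total_tokens=40
sync_pool_size=40
sync_max_overflow=560
async_pool_size=40
async_max_overflow=560
```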
   
   - [INV-G18] `asyncpg` is importable inside breeze (precheck for the bench 
run).
     ```yaml
     verify:
       method: bash
       phase: 2
       command: "breeze run python -c 'import asyncpg; 
print(asyncpg.__version__)'"
     ```
   
   ## 4. Process Guidance
   
   - [PG-1] Run existing variable-endpoint tests on `main` BEFORE starting D4. 
If they fail on `main` already (pre-existing breakage in the working 
environment), record the failures and treat the conversion's pass criterion as 
"no new failures vs baseline", not "all green".
   - [PG-2] Read each touched file immediately before editing, even if just 
read minutes ago — per global CLAUDE.md "Edit Integrity" rule.
   - [PG-3] After every Python file edit, run `uv run ruff format <file>` and 
`uv run ruff check --fix <file>` before moving on (CLAUDE.md project rule).
   - [PG-4] Benchmark runs go through `breeze start-airflow --backend postgres 
...`, never against host Python directly. The pure-Python harness client may 
run from host (it only needs `httpx`) — but the API server under test must run 
inside breeze so the env is reproducible.
   - [PG-5] The synthetic bench router file is named with a leading underscore 
(`_benchmark.py`) and the gate env var lives under an `AIRFLOW__BENCHMARK__` 
namespace (not `AIRFLOW__CORE__`) — both signals that this is debug/POC 
instrumentation, not production-facing config.
   - [PG-6] The results report must record: Airflow version (`airflow 
version`), Postgres version (`SELECT version()`), sync DB URL scheme, async DB 
URL scheme, `SQL_ALCHEMY_POOL_SIZE`, `SQL_ALCHEMY_MAX_OVERFLOW`, 
Starlette/AnyIO threadpool size, host CPU/memory. Otherwise the numbers are not 
reproducible.
   - [PG-7] If the harness shows a confusing or weak result (e.g., async is not 
faster), do NOT massage parameters until it looks favorable. Record the actual 
numbers and propose investigation — the POC's value is in honest measurement, 
not advocacy.
   - [PG-8] Commits go through `git commit` normally; no `Co-Authored-By: 
Claude` (CLAUDE.md forbids agent self-as-co-author).
   - [PG-9] Treat `dev/async_session_poc/` as scratch-quality code by 
convention — but the touched `variables.py` and any test files are production 
code and must meet production standards.
   - [PG-10] Benchmark target base URL: `http://localhost:8080` by default 
(breeze API-server default). Override via `BENCH_TARGET_BASE_URL` env var. 
Encode as a constant in `run_all.sh` (`: 
"${BENCH_TARGET_BASE_URL:=http://localhost:8080}"`); the Python harness accepts 
it via `--url`.
   - [PG-11] Calibration is a precondition for D5 (not a fix-up after the 
fact). `run_all.sh` MUST: (a) verify `asyncpg` import, (b) export 
`AIRFLOW__DATABASE__SQL_ALCHEMY_POOL_SIZE=40` and 
`AIRFLOW__DATABASE__SQL_ALCHEMY_MAX_OVERFLOW=560` BEFORE launching the API 
server (sum = 600, providing ~50% headroom over max concurrency 400 — 
minimum-viable would be 500; the extra margin absorbs housekeeping/background 
slots so transient saturation doesn't masquerade as the threadpool effect), (c) 
on first request, log the live 
`anyio.to_thread.current_default_thread_limiter().total_tokens` so it appears 
in API server logs and can be copied into `RESULTS.md`. If any precondition 
fails, abort the bench with a non-zero exit and a clear message.
   - [PG-12] Warmup default = 5s (not 2s). Async pool + asyncpg + DNS+TLS ramp 
may not stabilize in 2s at high concurrency. The harness CLI keeps `--warmup` 
adjustable; the canonical run uses 5s.
   
   ## 5. Known Assumptions
   
   - [ASM-1] (auto) Conversion target = `get_variable_keys`. Default: chosen 
over `get_variable`/`get_connection`. Impact if wrong: if the user actually 
wanted the user-Connection lookup endpoint converted, they'd be unhappy that we 
converted a list endpoint instead — but `get_variable`/`get_connection` require 
building async secrets backends (PR #36504-scope work, beyond a single-endpoint 
POC). Recovery: amend to add the async-secrets-backend work as additional 
deliverables.
   - [ASM-2] (auto) Benchmark client = pure-Python `httpx.AsyncClient`. 
Default: chosen because `hey`/`wrk`/`oha`/`locust` are NOT installed on host 
(verified). Impact if wrong: if the GIL ceiling on the client distorts results, 
we'd need to switch to `hey`/`wrk`. R-2 detects this.
   - [ASM-3] (auto) Concurrency sweep `c={50, 100, 200, 400}`, duration 30s per 
cell. Default: covers below and above the default Starlette threadpool size 
(40) and gives stable percentiles in 30s for these endpoints. Impact if wrong: 
if the threadpool-ceiling effect doesn't show by c=400, we'd extend to c=800; 
cheap to redo.
   - [ASM-4] (auto) Seed = 1000 Variables, idempotent insert. Default: matches 
the `limit<=10000` ceiling on the keys endpoint without trivializing the count 
query. Impact if wrong: trivial — re-seed with different count.
   - [ASM-5] (auto) Default async PG driver = asyncpg via 
`_get_async_conn_uri_from_sync` derivation. Default: matches Airflow's built-in 
behavior — does NOT explicitly set `sql_alchemy_conn_async` to 
`postgresql+psycopg://...`. Impact if wrong: report would need to claim 
"psycopg-v3 async vs sync", which is not what we're measuring. INV-G6 enforces 
honest framing.
   - [ASM-6] (auto) Synthetic bench router gated by env var 
`AIRFLOW__BENCHMARK__ENABLE_ASYNC_SESSION_ROUTES=true`. Default: env-var gate 
is the simplest and lowest-blast-radius mechanism; doesn't require a 
config-file change. Impact if wrong: if reviewer prefers a config-key in 
`airflow.cfg` under a `[benchmark]` section, that's a trivial swap.
   - [ASM-7] (auto) Postgres-only scope, no SQLite/MySQL coverage in this 
manifest. Default: keeps the comparison clean — SQLite's single-writer lock and 
MySQL's async-driver picks (aiomysql vs asyncmy) are separate concerns. Impact 
if wrong: adds D6/D7 in a follow-up manifest.
   - [ASM-8] (auto) No PR opened, no upstream push, no fork-branch push at the 
end of this manifest. Default: framing is "POC measurements", and the user's 
next decision (after seeing numbers) determines whether to migrate further 
routes in a real PR. Impact if wrong: trivial — open the PR after manifest 
completes if results warrant.
   - [ASM-9] (auto) Sample size per cell = whatever 30s yields after a 5s 
warmup discard (per PG-12). Default: time-bounded is more honest than 
count-bounded under saturation. Impact if wrong: if percentiles are noisy, 
extend duration to 60s.
   - [ASM-10] (auto) `asyncpg` is available inside breeze's standard 
postgres-backend image. Default: it's a transitive dep via the async-SQLA path 
already wired in `settings.py`, and a precheck (PG-11/INV-G18) confirms before 
the bench runs. Impact if wrong: bench aborts cleanly; install step (`uv add 
asyncpg --project airflow-core` or breeze image rebuild) is then required.
   - [ASM-11] (auto) The threadpool-starvation pedagogical effect requires 
**server-side** sleep — `pg_sleep(N)` in the database, not `asyncio.sleep(N)` 
or `time.sleep(N)` in Python. Default: implemented as `await 
session.execute(text('SELECT pg_sleep(:s)'), {'s': ms/1000.0})` (async) / 
`session.execute(text('SELECT pg_sleep(:s)'), {'s': ms/1000.0})` (sync). Impact 
if wrong: a Python-side sleep would not exercise DB-pool/threadpool contention 
and would invalidate the demonstration.
   - [ASM-12] (auto) The benchmark sweep covers only the six synthetic bench 
routes (24 cells: 6 routes × 4 concurrencies). The real-route conversion D4 is 
verified by tests (AC-4.2), not by harness throughput, because driving 
`get_variable_keys` requires minting per-request `CurrentTIToken` JWTs — auth 
complexity that would dilute the A/B signal. Default: drop `real_var_keys` from 
CSV/report; D4 stands on test pass alone. Impact if wrong: if a real-route 
throughput number is required, follow-up adds a JWT-minting helper and re-runs.
   
   ## 6. Deliverables
   
   ### Deliverable 1: Seed script
   
   A script that idempotently seeds the metadata DB with 1000 Variables 
(key=`bench_var_NNNN`, value=`v_NNNN`) for the realistic-query benchmark path. 
Runs against the Postgres backend in breeze.
   
   **Acceptance Criteria:**
   
   - [AC-1.1] `dev/async_session_poc/seed.py` exists and is executable as 
`breeze run python dev/async_session_poc/seed.py`.
     ```yaml
     verify:
       method: bash
       command: "test -f dev/async_session_poc/seed.py"
     ```
   - [AC-1.2] After running the seed script twice against a fresh DB, exactly 
1000 `bench_var_*` keys exist (idempotent).
     ```yaml
     verify:
       method: bash
       phase: 2
        command: "breeze run python dev/async_session_poc/seed.py && breeze run python dev/async_session_poc/seed.py && breeze run python -c \"from airflow import settings; from airflow.models.variable import Variable; from sqlalchemy import select, func; s = settings.Session(); print(s.scalar(select(func.count()).select_from(Variable).where(Variable.key.like('bench_var_%')))); s.close()\" | tail -1 | grep -x 1000"
     ```
   
   ### Deliverable 2: Synthetic bench router
   
   A gated debug router exposing six routes under `/execution/__bench/`: three 
pairs of (sync, async) routes that issue identical SQL — `SELECT 1`, `SELECT 
pg_sleep(:s)`, `SELECT COUNT(*) FROM variable` — using only the session-type 
difference between each pair.
   
   **Acceptance Criteria:**
   
   - [AC-2.1] File 
`airflow-core/src/airflow/api_fastapi/execution_api/routes/_benchmark.py` 
exists; defines a router with exactly six routes: `/select1/sync`, 
`/select1/async`, `/sleep/sync`, `/sleep/async`, `/var_count/sync`, 
`/var_count/async`.
     ```yaml
     verify:
       method: codebase
       prompt: "Confirm 
airflow-core/src/airflow/api_fastapi/execution_api/routes/_benchmark.py defines 
exactly the six routes listed; confirm sync routes use SessionDep and async 
routes use AsyncSessionDep; confirm /sleep routes accept a `ms` query param 
(int, >=0, <=10000) and the SQL is parameterized (no string interpolation)."
     ```
   - [AC-2.2] Router is mounted into the Execution API only when env var 
`AIRFLOW__BENCHMARK__ENABLE_ASYNC_SESSION_ROUTES` is set to the **exact 
case-sensitive string `"true"`**. Any other value (unset, `"True"`, `"1"`, 
`"yes"`, empty string, anything else) leaves the routes unmounted. Strictness 
is intentional: this is debug instrumentation and ambiguity around what counts 
as "enabled" is a security/scope risk.
     ```yaml
     verify:
       method: codebase
       prompt: "Confirm the mount logic in 
airflow-core/src/airflow/api_fastapi/execution_api/app.py (or equivalent 
router-registration site) is gated by 
`os.environ.get('AIRFLOW__BENCHMARK__ENABLE_ASYNC_SESSION_ROUTES') == 'true'` — 
exact case-sensitive string equality, no .lower(), no boolean-conf parser, no 
membership check against a truthy set."
     ```
   - [AC-2.3] Gate test passes: 
`airflow-core/tests/unit/api_fastapi/execution_api/test_benchmark_routes.py` 
contains at least three tests — `test_routes_gated_off_unset` (asserts 404 for 
all 6 routes when env var is unset), `test_routes_gated_off_truthy_variants` 
(asserts 404 for all 6 routes when env var is set to each of `"True"`, `"1"`, 
`"yes"`, `""`, `"false"`), and `test_routes_gated_on` (asserts 
200/expected-body for all 6 routes when env var is exactly `"true"`).
     ```yaml
     verify:
       method: bash
       command: "uv run --project airflow-core pytest 
airflow-core/tests/unit/api_fastapi/execution_api/test_benchmark_routes.py -xvs"
     ```
   - [AC-2.4] The sync routes use plain `Session` with `session.execute(...)`; 
the async routes use `AsyncSession` with `await session.execute(...)`. No code 
path uses `asyncio.to_thread`, no sync route is `async def`, no async route is 
plain `def`.
     ```yaml
     verify:
       method: subagent
       agent: manifest-dev:code-bugs-reviewer
       prompt: "Audit 
airflow-core/src/airflow/api_fastapi/execution_api/routes/_benchmark.py: 
confirm sync routes are `def` + SessionDep + sync session calls, and async 
routes are `async def` + AsyncSessionDep + awaited calls. Flag any threadpool 
offload or mixed mode. Threshold: no LOW+."
     ```
   
   ### Deliverable 3: Benchmark harness
   
   A pure-Python concurrency-sweep harness driven by `httpx.AsyncClient`. Takes 
URL, concurrency, duration; emits per-request latencies and aggregated 
RPS/p50/p95/p99/error-rate.
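
   The closed-loop worker model can be sketched with the stdlib alone. This is an illustration, not the harness's actual API: `fake_request` is a hypothetical stand-in for the real `await client.get(url)` call, and all names are assumed.

```python
import asyncio
import time


async def fake_request() -> int:
    # Hypothetical stand-in for `await client.get(url)`; returns an HTTP status.
    await asyncio.sleep(0.01)
    return 200


async def sweep(concurrency: int, duration_s: float) -> list[tuple[float, float, int]]:
    """Closed-loop worker pool: each of `concurrency` workers issues
    back-to-back requests until the deadline, recording one
    (completion_time_s, latency_s, status) tuple per request."""
    start = time.monotonic()  # monotonic clock per the project rule
    samples: list[tuple[float, float, int]] = []

    async def worker() -> None:
        while time.monotonic() - start < duration_s:
            t0 = time.monotonic()
            status = await fake_request()
            t1 = time.monotonic()
            samples.append((t1 - start, t1 - t0, status))

    await asyncio.gather(*(worker() for _ in range(concurrency)))
    return samples
```

   The real harness swaps `fake_request` for an `httpx.AsyncClient` call and feeds the sample list into the AC-3.2 aggregation.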
   
   **Acceptance Criteria:**
   
   - [AC-3.1] `dev/async_session_poc/run_bench.py` exists. CLI accepts `--url 
URL --concurrency N --duration SECONDS --warmup SECONDS --output PATH.csv`. 
Defaults: `--warmup 5 --output -` (stdout) — aligned with PG-12 canonical 
warmup.
     ```yaml
     verify:
       method: bash
       command: "test -f dev/async_session_poc/run_bench.py && uv run --project 
airflow-core python dev/async_session_poc/run_bench.py --help | grep -qE 
'(--url|--concurrency|--duration|--warmup|--output)'"
     ```
   - [AC-3.2] Harness records: per-request wall-clock latency (monotonic 
clock), HTTP status, completion time. Aggregates RPS (excluding warmup window), 
p50/p95/p99 (HDR or numpy.percentile is acceptable), error rate (non-200 
responses / total).
     ```yaml
     verify:
       method: codebase
       prompt: "Confirm dev/async_session_poc/run_bench.py uses 
time.monotonic() for latencies (CLAUDE.md project rule), excludes the warmup 
window from aggregates, and emits the listed metrics."
     ```
   - [AC-3.3] Harness self-test: running against `httpbin.org/anything` or a 
local placeholder endpoint at c=10 d=5s produces a non-empty CSV with 
reasonable values (RPS > 0, p50 > 0).
     ```yaml
     verify:
       method: bash
       phase: 2
       command: "cd dev/async_session_poc && uv run --project airflow-core 
python run_bench.py --url https://httpbin.org/anything --concurrency 10 
--duration 5 --warmup 1 --output /tmp/bench_selftest.csv && test -s 
/tmp/bench_selftest.csv && head -2 /tmp/bench_selftest.csv | grep -qE 
'^(metric|p50|p95|p99|rps),'"
     ```
   - [AC-3.4] No `assert` in production code (the harness counts as POC 
tooling, but ruff/style rules still apply); no `time.time()` for durations.
     ```yaml
     verify:
       method: bash
       command: "! grep -nE '\\btime\\.time\\(\\)' 
dev/async_session_poc/run_bench.py && ! grep -nE '^\\s*assert ' 
dev/async_session_poc/run_bench.py"
     ```
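
   A minimal sketch of the aggregation AC-3.2 asks for, assuming nearest-rank percentiles (`numpy.percentile` would serve equally; function names are illustrative, not the harness's actual API):

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of an unsorted sample."""
    ordered = sorted(values)
    rank = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[rank]


def aggregate(samples: list[tuple[float, float, int]], warmup_s: float, total_s: float) -> dict:
    """samples: (completion_time_s, latency_s, status). The warmup window is
    excluded from all aggregates, per AC-3.2."""
    measured = [s for s in samples if s[0] >= warmup_s]
    latencies_ms = [lat * 1000.0 for _, lat, _ in measured]
    errors = sum(1 for _, _, status in measured if status != 200)
    return {
        "rps": len(measured) / (total_s - warmup_s),
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "error_rate": errors / len(measured),
    }
```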
   
   ### Deliverable 4: Convert `get_variable_keys` to async
   
   Mechanical conversion of the existing sync endpoint to `async def` using 
`AsyncSessionDep` and `paginated_select_async`. No behavior change: same 
response model, same query params, same status codes, same response shape.
   
   **Acceptance Criteria:**
   
   - [AC-4.1] `get_variable_keys` in 
`airflow-core/src/airflow/api_fastapi/execution_api/routes/variables.py` is 
`async def` and uses `AsyncSessionDep` (not `SessionDep`). All session calls 
are awaited.
     ```yaml
     verify:
       method: codebase
       prompt: "Read 
airflow-core/src/airflow/api_fastapi/execution_api/routes/variables.py and 
confirm get_variable_keys is declared `async def`, its session parameter is 
annotated AsyncSessionDep, and every session.* call is awaited. The count query 
should use paginated_select_async or an explicit `await session.scalar(...)`. 
The response model VariableKeysResponse and parameter signatures (prefix, limit 
ge=1 le=10000, offset ge=0, team_name dep) are unchanged."
     ```
   - [AC-4.2] Existing test suite for the Variables execution-api route passes.
     ```yaml
     verify:
       method: bash
       command: "uv run --project airflow-core pytest 
airflow-core/tests/unit/api_fastapi/execution_api/versions/head/test_variables.py
 
airflow-core/tests/unit/api_fastapi/execution_api/versions/v2026_06_30/test_variables.py
 -xvs"
     ```
   - [AC-4.3] No other endpoint in `variables.py` is converted. `get_variable`, 
`put_variable`, `delete_variable` retain their pre-existing sync signatures 
(out-of-scope for this POC).
     ```yaml
     verify:
       method: codebase
       prompt: "Confirm get_variable, put_variable, delete_variable in 
airflow-core/src/airflow/api_fastapi/execution_api/routes/variables.py are 
still plain `def` (not async def) and have not been touched beyond imports."
     ```
   
   ### Deliverable 5: Run benchmarks + write results report
   
   Execute the harness across the six synthetic bench routes over the concurrency sweep (per ASM-12, the real `get_variable_keys` is verified by tests, not by throughput). Write CSV + markdown report with the framing required by INV-G6.
   
   **Acceptance Criteria:**
   
   - [AC-5.1] `dev/async_session_poc/results.csv` exists, contains rows for at least: `{select1, sleep_500ms, var_count} × {sync, async} × {50, 100, 200, 400}` = 24 cells (the synthetic-route sweep; per ASM-12, the real-route `get_variable_keys` is verified by tests, not by the harness). Columns at minimum: `route, mode, concurrency, rps, p50_ms, p95_ms, p99_ms, error_rate, sample_count`.
     ```yaml
     verify:
       method: bash
       phase: 2
       command: "test -s dev/async_session_poc/results.csv && head -1 
dev/async_session_poc/results.csv | grep -qE 
'route.*mode.*concurrency.*rps.*p50_ms.*p95_ms.*p99_ms.*error_rate.*sample_count'
 && wc -l dev/async_session_poc/results.csv | awk '{ if ($1 < 25) { print \"too 
few rows: \"$1; exit 1 } }'"
     ```
   - [AC-5.2] `dev/async_session_poc/RESULTS.md` exists; contains a markdown 
table per route, environment header (Airflow version, Postgres version, sync DB 
URL scheme, async DB URL scheme, sync pool size, async pool size, threadpool 
size, host CPU/RAM per PG-6), and analysis paragraphs.
     ```yaml
     verify:
       method: codebase
       prompt: "Read dev/async_session_poc/RESULTS.md. Confirm presence of: (1) 
an environment-header section listing Airflow version, Postgres version, both 
sync+async DB URL schemes, both sync+async pool sizes, the Starlette/AnyIO 
threadpool size, and host CPU/RAM; (2) a results table per route (or one 
combined table) covering the 24 cells from AC-5.1; (3) analysis paragraphs 
distinguishing what the data does and doesn't prove."
     ```
   - [AC-5.3] Report's framing strings present (cross-checked by INV-G6 but 
listed here as the deliverable's own AC).
     ```yaml
     verify:
       method: bash
       command: "grep -q 'async SQLA via asyncpg vs sync SQLA via psycopg' 
dev/async_session_poc/RESULTS.md && grep -q 'Triggerer and Workers do not hit 
the metadata DB directly in 3.x' dev/async_session_poc/RESULTS.md"
     ```
   - [AC-5.4] For the `/sleep` pair at concurrency 200, async RPS is strictly 
higher than sync RPS (the pedagogical exhibit must demonstrate the effect). If 
it does not, the run is treated as **diagnostic-required** and PASSes only when 
`RESULTS.md` contains an `## Anomalies` section that names the cause AND 
includes concrete diagnostic evidence — not just a written explanation. 
Required diagnostic artifacts when claiming pool saturation: Postgres 
`pg_stat_activity` count snapshots showing active-connection peak; when 
claiming threadpool exhaustion: live 
`anyio.to_thread.current_default_thread_limiter()` token state during the run; 
when claiming client-side GIL ceiling: harness CPU% during the sync vs async 
runs side-by-side.
     ```yaml
     verify:
       method: subagent
       phase: 2
       agent: manifest-dev:criteria-checker
       prompt: "Read dev/async_session_poc/results.csv. Locate the rows for 
route=sleep_500ms (or equivalent slow-query route name), concurrency=200, modes 
sync and async. PASS-PATH-1: async rps > sync rps. PASS-PATH-2: an '## 
Anomalies' section in dev/async_session_poc/RESULTS.md (a) names the specific 
cause from the manifest's R-2/R-3/R-6 risks and (b) cites concrete diagnostic 
evidence for that cause — pg_stat_activity counts (pool), recorded threadpool 
token state (threadpool), or harness CPU% (client). A bare textual explanation 
without diagnostic artifacts FAILS."
     ```
   ~~~
   
   </p>
   </details> 
   
   ---
   
   ## TL;DR
   
   The **motivation** behind PR #36504 — *the Triggerer talks to the metadata 
DB directly, so we should make those DB calls async to free the event loop* — 
is **architecturally obsolete on Airflow 3.x**. Triggerer (and Workers) no 
longer hit the metadata DB directly; they go through the Execution API over 
HTTP.
   
   What **remains valid and is now the actual win** is the lower-level idea 
underneath that PR: **converting Airflow's FastAPI routes from sync 
`SessionDep` to async `AsyncSessionDep` removes a real bottleneck — but the 
bottleneck is the Starlette/AnyIO threadpool inside the API server, not the 
Triggerer's loop.**
   
   This POC builds a minimal, reproducible A/B benchmark of that conversion 
against a single Airflow 3.3.0 + PostgreSQL 14 deployment and produces honest 
measured numbers — both where async helps and where it surprisingly doesn't.
   
   ## What was kept from PR #36504
   
   1. **The core conversion pattern.** Sync `SessionDep` → async 
`AsyncSessionDep`, `def` → `async def`, `session.scalar(...)` → `await 
session.scalar(...)`, `session.scalars(...).all()` → `(await 
session.scalars(...)).all()`. PR #36504 already demonstrated this mechanically 
for several call sites; the pattern is unchanged.
   2. **The async-SQLAlchemy plumbing it presupposed.** Airflow 3.x already 
ships `create_async_engine`, `async_sessionmaker`, `AsyncSessionDep`, 
`create_session_async`, `paginated_select_async`, the `sql_alchemy_conn_async` 
config key, and automatic driver derivation via `AIO_LIBS_MAPPING` 
(`airflow-core/src/airflow/settings.py:240, 387, 415-421`). *None of this had 
to be re-built* — adoption is what's missing, not infrastructure. The current 
adoption rate inside `core_api` is ~0/49 routes using `AsyncSessionDep`.
   3. **The route-conversion shape.** The POC converts exactly one real route — 
`GET /execution/variables/keys` in 
`airflow-core/src/airflow/api_fastapi/execution_api/routes/variables.py` — as a 
low-blast-radius proof that the migration is mechanical: response model 
unchanged, query-param signatures unchanged, status codes unchanged, all 30 
existing variable-execution-API tests pass byte-identical.
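
   The mechanical shape of item 1, in miniature. The stub classes below are hypothetical stand-ins for SQLAlchemy's `Session`/`AsyncSession` (the real routes use `SessionDep`/`AsyncSessionDep` annotations); the point is that the conversion only changes the declaration and adds `await`:

```python
import asyncio


class StubSession:
    """Illustrative stand-in for a sync SQLAlchemy Session."""

    def scalar(self, stmt: str) -> str:
        return f"result of {stmt}"


class StubAsyncSession:
    """Illustrative stand-in for an async SQLAlchemy AsyncSession."""

    async def scalar(self, stmt: str) -> str:
        return f"result of {stmt}"


# Before: plain `def` + sync session; Starlette dispatches this handler
# to the AnyIO threadpool.
def get_keys(session: StubSession) -> str:
    return session.scalar("SELECT variable.key FROM variable")


# After: `async def` + async session; the handler stays on the event loop,
# and every session call gains an `await`. Nothing else changes.
async def get_keys_async(session: StubAsyncSession) -> str:
    return await session.scalar("SELECT variable.key FROM variable")
```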
   
   ## What was deliberately set aside
   
   1. **The Triggerer-direct-DB framing.** In 3.x, the Triggerer evaluates 
deferred tasks but communicates via the Execution API, not direct DB access. PR 
#36504's headline benefit ("Triggerer event loop unblocked") no longer applies 
— there is no Triggerer code path in the hot zone the original change targeted. 
The README and `RESULTS.md` include an explicit disclaimer paragraph: 
*"Triggerer and Workers do not hit the metadata DB directly in 3.x."*
   2. **`get_variable` / `get_connection` conversions.** Both call into the 
SecretsBackend chain (`Variable.get(...)`, 
`Connection.get_connection_from_secrets(...)`). Converting them requires 
building **async secrets backends**, which is PR-#36504-scale work and well 
beyond a single-endpoint POC. They are explicitly out of scope here.
   3. **`ti_heartbeat` and other production hot paths.** Heartbeat is what 
matters most for throughput, but it does state-machine mutation + RTIF logic + 
tracing emission, all of which would dilute the A/B signal with unrelated 
complexity. The POC picks the cleanest-conversion candidate 
(`get_variable_keys`) instead and reserves heartbeat for a follow-up if the 
numbers warrant it.
   4. **SQLite and MySQL.** Single-writer locks on SQLite (via aiosqlite) and 
the aiomysql-vs-asyncmy driver picking on MySQL would muddy the measurement. 
Postgres-only (asyncpg) for this pass.
   5. **Newsfragment, PR, provider changes, prod rollout.** This is a POC under 
`dev/`. No newsfragment, no `provider.yaml` edit, no `chart/` edit. A follow-up 
migration PR is downstream of these numbers.
   
   ## What this POC actually does
   
   Three pieces of instrumentation, all under 
`airflow-core/src/airflow/api_fastapi/execution_api/` and 
`dev/async_session_poc/`:
   
   - **A real-route conversion** — `get_variable_keys` → `async def` + 
`AsyncSessionDep` + awaited `session.scalar/scalars`.
   - **A synthetic A/B router** at `/__bench/...`, mounted only when 
`AIRFLOW__BENCHMARK__ENABLE_ASYNC_SESSION_ROUTES=true` (exact-string gate; no 
`.lower()`, no truthy-set membership). Six routes — three pairs of identical 
SQL where the only difference is sync `SessionDep` vs async `AsyncSessionDep`:
     - `/select1/{sync,async}` — `SELECT 1` (per-request session overhead)
     - `/sleep/{sync,async}?ms=N` — `SELECT pg_sleep(N/1000.0)` (the 
threadpool-ceiling pedagogy)
     - `/var_count/{sync,async}` — `SELECT COUNT(*) FROM variable` (realistic 
aggregate; seeded with 1000 `bench_var_*` rows)
   - **A pure-Python `httpx.AsyncClient`-based harness** with a CSV/markdown 
reporting pipeline (`dev/async_session_poc/`). 24-cell sweep: 6 routes × {50, 
100, 200, 400} concurrency, 30s measured + 5s warmup per cell.
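
   The exact-string gate from the second bullet is deliberately tiny. A sketch (the helper name is illustrative; the real check lives inline at the router-mount site):

```python
import os


def benchmark_routes_enabled() -> bool:
    # Only the literal, case-sensitive string "true" mounts the bench routes;
    # "True", "1", "yes", "" and unset all leave them unmounted.
    return os.environ.get("AIRFLOW__BENCHMARK__ENABLE_ASYNC_SESSION_ROUTES") == "true"
```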
   
   ## Key outcomes
   
   Full numbers and per-route tables live in 
[`dev/async_session_poc/RESULTS.md`](RESULTS.md). The headline:
   
   ### Where async wins cleanly — c=50
   
   At concurrency 50 (near the 40-token AnyIO threadpool ceiling), async is consistently faster than sync on every route, with **zero error rate** on both sides:
   
   | route | sync RPS | async RPS | sync p99 | async p99 |
   |---|---:|---:|---:|---:|
   | `/select1` | 121.3 | 131.7 | 1749 ms | 1747 ms |
   | `/sleep` 500ms | 78.5 | **96.4** | 893 ms | **558 ms** |
   | `/var_count` | 156.6 | 175.2 | 1403 ms | 1378 ms |
   
   The `/sleep` row is the cleanest signal: identical 500 ms server-side work, 
~23% higher async throughput, ~38% better tail latency. **This is the "PR 
#36504 effect"** — at modest concurrency the async path avoids the per-request 
threadpool-dispatch overhead that the sync path pays.
   
   ### Where it inverts surprisingly — c≥100
   
   Above the threadpool ceiling, async becomes *slower* than sync on this 
single-uvicorn-worker setup, and both modes accumulate large queueing latency 
and httpx client-side timeouts:
   
   | route | concurrency | sync RPS | async RPS |
   |---|---:|---:|---:|
   | `/sleep` 500ms | 100 | 46.5 | 39.5 |
   | `/sleep` 500ms | **200** | **43.9** | **37.3** ⚠ |
   | `/sleep` 500ms | 400 | 44.5 | 29.2 |
   
   This is the **AC-5.4 inversion** the POC's acceptance criterion specifically 
calls out as diagnostic-required. We ran the inversion to ground and reached 
the following evidence-backed conclusion (full diagnostics in `RESULTS.md` § 
Anomalies, raw log at `dev/async_session_poc/diagnostics/pg_stat_activity.log`):
   
   - **R-6 (DB-pool saturation) — ruled out.** 1 Hz `pg_stat_activity` sampling 
during the full 24-cell sweep recorded a **peak active connection count of 51** 
— out of a calibrated pool of **600** (`sync_pool_size=40 + 
sync_max_overflow=560`; async engine reads the same cfg keys). The active count 
plateaus at ~40-41 (the sync threadpool's 40 workers each holding one 
connection) plus a small ~10-connection async overlay. The DB pool was never 
the gate.
   - **R-3 (sync threadpool exhaustion) — firing for sync as designed.** The 
active-conn mode at 40-41 is exactly the Starlette threadpool's 40-worker 
ceiling holding 40 simultaneous sync queries. This is the *intended* 
pedagogical effect for the sync side.
   - **The async path is bottlenecked above the DB pool.** With c=200 async 
awaiting `pg_sleep(0.5)`, ~200 simultaneous active connections were expected; 
we saw ~10 incremental async connections above the sync baseline. The 
bottleneck is upstream of the DB.
   - **Operative causes — R-2 (client GIL) and R-3-async (single uvicorn worker 
event loop), in combination.** The harness is a single Python process driving 
200+ concurrent `httpx.AsyncClient` requests; the API server is a single 
uvicorn worker whose event loop handles all async-route work + response framing 
on one Python interpreter. Sync routes get *effective* parallelism via the 
threadpool's 40 workers; async routes get only the serialization of one event 
loop.
   
   In short: **the async-route conversion does what it advertises — the route 
itself no longer blocks the event loop on DB I/O — but a single-worker uvicorn 
ceiling and a single-process client harness combine to hide that gain at high 
concurrency.**
   
   ## Why a single uvicorn worker matters
   
   This is the subtle part and the most important thing for the original PR 
thread's audience to internalize:
   
   - A sync route inside an async framework is dispatched to the AnyIO 
threadpool (default size **40**). 40 sync queries can run concurrently.
   - An async route stays on the event loop. With one uvicorn worker, **N** 
async coroutines can each `await` simultaneously — but every CPU-bound step 
(request parsing, response serialization, JSON encoding, middleware) is 
serialized on one Python interpreter.
   
   The threadpool gives sync routes a fixed-but-real form of parallelism even 
without process workers. Async routes need *either* (a) more uvicorn workers, 
*or* (b) lower-CPU-overhead handling per request, to fully exploit the 
unblocked event loop. We didn't run with multiple workers in this POC 
(deliberately, to isolate the route-level mechanism); that's the single-biggest 
follow-up.
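
   That asymmetry can be demonstrated in miniature with the stdlib: a 4-thread pool stands in for AnyIO's 40-token limiter, forcing eight 50 ms blocking calls into two waves, while eight awaited sleeps overlap in a single wave on the event loop. A toy sketch of the mental model, not the POC's measured setup:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(seconds: float) -> None:
    time.sleep(seconds)  # stand-in for a sync DB driver call


async def demo() -> tuple[float, float]:
    loop = asyncio.get_running_loop()
    pool = ThreadPoolExecutor(max_workers=4)  # miniature threadpool ceiling

    # "Sync route" shape: 8 blocking calls through 4 threads = two 50 ms waves.
    t0 = time.monotonic()
    await asyncio.gather(*(loop.run_in_executor(pool, blocking_query, 0.05) for _ in range(8)))
    sync_elapsed = time.monotonic() - t0

    # "Async route" shape: all 8 awaits overlap on the one event loop.
    t0 = time.monotonic()
    await asyncio.gather(*(asyncio.sleep(0.05) for _ in range(8)))
    async_elapsed = time.monotonic() - t0
    return sync_elapsed, async_elapsed
```

   At POC scale the same toy also hints at the inversion: everything the async side does outside the `await` still runs on one interpreter, which is exactly the CPU serialization the multi-worker follow-up is meant to remove.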
   
   ## Next steps
   
   These are out of scope for this manifest and tracked here for the PR thread:
   
   1. **Re-run with `--workers 4` / `--workers 8`.** Repeat the same 24-cell 
sweep with a multi-worker uvicorn. Predicted: sync's ceiling tracks `workers × 
40` threadpool; async's ceiling rises with `workers`, eventually crossing sync 
at high concurrency. If that prediction holds, the original PR's win is real 
but contingent on running with reasonable worker counts — which is already 
standard production guidance.
   2. **Multi-process harness.** Run one harness process per logical core, sum 
the CSVs. Removes R-2 (client GIL) as a confound. Required to make c=200+ 
measurements trustworthy regardless of server-worker count.
   3. **Profile the async path with `py-spy`** during a c=200 `/sleep` run. 
Confirm where the time actually goes: connection acquisition, query dispatch, 
or response serialization.
   4. **Migrate `ti_heartbeat`.** Heartbeat is the production hot path (one 
call per task per heartbeat interval, per worker). Conversion touches 
state-machine + RTIF + tracing logic; needs a careful PR.
   5. **Async secrets backends.** Prerequisite to converting `get_variable` / 
`get_connection`. PR-#36504-scale work; tracked separately.
   6. **Pick a small batch of low-risk Execution-API read endpoints to 
migrate** in a real PR — list endpoints with simple `select` patterns, no 
SecretsBackend dependency, comprehensive existing test coverage. Candidates: 
variable-keys (already migrated here), connection-list (without value lookup), 
task-instance read endpoints.
   
   ## Reproducing
   
   Everything needed to re-run is in [`README.md`](README.md). Quick start:
   
   ```bash
   source dev/async_session_poc/env_for_server.sh   # emitted by run_all.sh
   breeze start-airflow --backend postgres            # in one terminal
   bash dev/async_session_poc/run_all.sh              # in another
   ```
   
   Override sweep parameters via `BENCH_*` env vars (`BENCH_TARGET_BASE_URL`, 
`BENCH_DURATION`, `BENCH_WARMUP`, `BENCH_SLEEP_MS`, `BENCH_CONCURRENCIES`). 
Pool calibration knobs (`AIRFLOW__DATABASE__SQL_ALCHEMY_POOL_SIZE`, 
`AIRFLOW__DATABASE__SQL_ALCHEMY_MAX_OVERFLOW`) must reach the api-server 
process; `run_all.sh` writes `env_for_server.sh` for that purpose.
   
   Raw 24-cell results: [`results.csv`](results.csv). Analysis and anomaly 
diagnostics: [`RESULTS.md`](RESULTS.md). Captured `pg_stat_activity` log: 
[`diagnostics/pg_stat_activity.log`](diagnostics/pg_stat_activity.log).
   

