kaxil opened a new pull request, #67878:
URL: https://github.com/apache/airflow/pull/67878

   Once this PR is merged, the standalone DAG processor (`airflow dag-processor`) no longer
   connects to the metadata database directly. It persists parse results and reads all metadata
   through the API server, the same way workers already operate. This takes the direct database
   connection away from one of the last components that run user-adjacent code.
   
   Persistence (serialized DAGs, import errors, warnings), stale-DAG and orphaned-import-error
   reconciliation, bundle sync and state, priority-parse-request and callback claiming, and the
   processor's own `Job` liveness record all go through a new `/dag-processing` API app.
   Parse-time and bundle-initialization `Connection`/`Variable` reads resolve through the
   Execution API.
   
   ## What changed
   
   - New `/dag-processing` FastAPI sub-app mounted on the API server
     (`airflow.api_fastapi.dag_processing`), split into `app.py` (routes), `datamodels.py`, and
     `security.py`.
   - New `DagProcessingApiClient` (httpx) used by the processor: pooled, with bounded
     retry/backoff and a startup readiness wait.
   - `DagFileProcessorManager` routes all persistence and metadata reads through the client.
     Bundle-initialization credentials resolve through the Execution API (the same path workers
     and triggerers use), so a git connection stored in the metadata database keeps working
     without direct DB access.
   - New config `[core] dag_processing_api_server_url` (defaults to the `/dag-processing` mount
     of the configured API server) and `[dag_processor] jwt_audience`.
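
The bounded retry/backoff and startup readiness wait mentioned for the client can be pictured
with a minimal standard-library sketch. It is not the `DagProcessingApiClient` code from this
PR: `TransientApiError`, `call_with_retry`, `wait_until_ready`, and every default value below
are hypothetical names and numbers chosen only for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientApiError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx responses)."""


def call_with_retry(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying transient failures with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return request()
        except TransientApiError:
            if attempt >= max_attempts:
                raise  # bounded: give up after the last allowed attempt
        # Delay doubles per attempt (0.5, 1.0, 2.0, ...) and is capped at max_delay.
        sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
        attempt += 1


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `is_ready` until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the sketch testable without real waiting; a
real client would presumably probe the unauthenticated `/health` route inside `is_ready`.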
   
   ## Breaking change
   
   The direct-database path is removed: the DAG processor now requires a reachable API server
   that mounts the `dag-processing` app (`airflow api-server --apps all`, or an `--apps` list
   that includes `dag-processing`). A deployment that previously ran `airflow dag-processor`
   with only a database connection must now also run the API server. See the newsfragment.
   
   ## Design notes
   
   
   - **Auth.** The processor self-signs a token for `[dag_processor] jwt_audience` with the
     deployment signing key, and the endpoints validate it via `JWTBearer`. Validation goes
     through the same `get_sig_validation_args` path as the Execution API, so a deployment that
     configures `[api_auth] trusted_jwks_url` validates externally-issued tokens for
     `/dag-processing` exactly as it does for `/execution`. `/health` stays unauthenticated
     for readiness probes.
   - **Resilience.** Per-loop API calls are guarded so a transient API outage skips a cycle
     instead of crashing the processor, the heartbeat is throttled, and startup waits for API
     readiness.
   
   ## Config
   
   ```ini
   [core]
   # optional; defaults to the /dag-processing sibling mount of execution_api_server_url
   dag_processing_api_server_url = http://api-server:8080/dag-processing
   
   [dag_processor]
   # optional; mirrors [execution_api] jwt_audience
   jwt_audience = urn:airflow.apache.org:dag-processing
   ```
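
To make the "sibling mount" default concrete, here is a small sketch of how such a URL could
be derived from the Execution API URL. It assumes the Execution API URL ends in an
`execution` path segment; `sibling_mount` and `resolve_dag_processing_url` are hypothetical
helpers, not functions from Airflow.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def sibling_mount(execution_api_url: str, mount: str = "dag-processing") -> str:
    """Swap the last path segment of the Execution API URL for `mount`."""
    parts = urlsplit(execution_api_url)
    segments = [s for s in parts.path.split("/") if s]
    parent = segments[:-1]  # drop the trailing segment (assumed to be "execution")
    path = "/" + "/".join([*parent, mount])
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def resolve_dag_processing_url(configured: Optional[str], execution_api_url: str) -> str:
    """An explicitly configured URL wins; otherwise fall back to the sibling mount."""
    return configured or sibling_mount(execution_api_url)


print(resolve_dag_processing_url(None, "http://api-server:8080/execution/"))
# prints: http://api-server:8080/dag-processing
```

Any path prefix in front of the last segment is preserved, so a deployment served under
`/team-a/execution` would resolve to `/team-a/dag-processing` in this sketch.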
   

