1fanwang opened a new issue, #66779:
URL: https://github.com/apache/airflow/issues/66779
### Description
When a Dag is triggered with an oversized `conf` dict (via REST API,
`airflow dags trigger`, or `TriggerDagRunOperator`), the payload size is not
validated up-front. The DagRun row is created in memory, and the size error
surfaces only at flush / commit time, deep in the SQLAlchemy stack:
```
sqlalchemy.exc.DataError: (pymysql.err.DataError)
(1406, "Data too long for column 'conf' at row 1")
```
On MySQL with the `JSON` column type, the documented hard limit is
`max_allowed_packet` (default 64 MiB in MySQL 8.0), but in practice the limit
hit first depends on the storage engine and row-format settings, and some
deployments use InnoDB row formats that cap individual values much lower.
The result for users:
- A `POST /api/v2/dags/{dag_id}/dagRuns` request returns 500 with an
internal DB error.
- The Dag run is sometimes half-created (depending on whether the failure
happens before or after the parent transaction commit).
- The user has no clear signal that the cause is conf size — they see a
generic 500 and have to escalate.
Issue #14159 (closed 2021, against the deprecated experimental API) covered
the same crash class. The bug never reached a validation-layer fix and the same
failure mode is reproducible today against the FastAPI public API.
### Use case / motivation
- API clients passing large dict payloads (model configs, feature flags,
embedded JSON) hit this without a clear error message.
- `TriggerDagRunOperator` instances composing data between Dag runs hit this
when the upstream task's XCom-ish output gets passed as conf.
### Proposal
Add a `[core] max_dagrun_conf_size_bytes` validation at the trigger boundary
(both `DAG.create_dagrun()` and the FastAPI route handler for
`POST /api/v2/dags/{dag_id}/dagRuns`). Serialize the conf once via the standard
JSON encoder, measure the encoded length in bytes, and raise a typed exception
(`DagRunConfTooLargeError`) if it exceeds the configured threshold. The proposed
default is 65,535 bytes, which fits in the smallest MySQL JSON column variant;
deployments can raise it.
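
A minimal sketch of the check described above, using only the standard library. `DagRunConfTooLargeError` and the 65,535-byte default come from this proposal; the helper name `validate_conf_size`, the constant name, and the use of plain `json.dumps` (rather than whatever encoder Airflow applies when persisting conf) are assumptions for illustration, not existing Airflow APIs.

```python
import json

# Default proposed in this issue; the constant name is hypothetical.
DEFAULT_MAX_DAGRUN_CONF_SIZE_BYTES = 65_535


class DagRunConfTooLargeError(ValueError):
    """Raised when a serialized DagRun conf exceeds the configured limit."""

    def __init__(self, size: int, limit: int) -> None:
        self.size = size
        self.limit = limit
        super().__init__(
            f"DagRun conf is {size} bytes when serialized, "
            f"exceeding the limit of {limit} bytes. Store large payloads "
            "externally and pass a reference in conf."
        )


def validate_conf_size(conf, limit=DEFAULT_MAX_DAGRUN_CONF_SIZE_BYTES) -> int:
    """Serialize conf once and return its size in bytes, or raise if too large."""
    if conf is None:
        return 0
    # Measure the UTF-8 encoded length, since the limit is expressed in bytes.
    size = len(json.dumps(conf).encode("utf-8"))
    if size > limit:
        raise DagRunConfTooLargeError(size, limit)
    return size
```

Calling a helper like this before the DagRun object is constructed would let the failure surface as a typed error at the trigger boundary instead of a `DataError` at flush time.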
On the REST API, the error maps to 413 Payload Too Large, with a message guiding
the user to store large payloads externally (XCom, Variables, file storage) and
pass references in conf.
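
As a rough illustration of the "pass references in conf" guidance, the sketch below writes a large payload to a file and puts only its path in conf. The helper names `externalize_payload` and `load_payload` and the `payload_path` key are hypothetical, and a local directory stands in for whatever shared storage the workers can actually read.

```python
import json
import tempfile
from pathlib import Path


def externalize_payload(payload: dict, storage_dir: str) -> dict:
    """Write payload to a JSON file and return a small conf that references it."""
    directory = Path(storage_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # delete=False keeps the file on disk after the handle is closed.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", dir=directory, delete=False, encoding="utf-8"
    ) as fh:
        json.dump(payload, fh)
    return {"payload_path": fh.name}


def load_payload(conf: dict) -> dict:
    """Inside the triggered Dag run, resolve the reference back to the payload."""
    with open(conf["payload_path"], encoding="utf-8") as fh:
        return json.load(fh)
```

The conf passed to the trigger then stays a few dozen bytes regardless of payload size; cleanup of the stored file is left out of this sketch.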
### Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!