romanzdk opened a new issue, #68317:
URL: https://github.com/apache/airflow/issues/68317
### Under which category would you file this issue?
Airflow Core
### Apache Airflow version
3.2.1
### What happened and how to reproduce it?
We upgraded production from **Airflow 3.1.8** to **3.2.1**, hit scheduling
issues, then attempted to roll back by:
1. Deploying Airflow **3.1.8** image
2. Running `airflow db downgrade -n 3.1.8`

Alembic migrations completed successfully. However, **scheduler and
triggerer immediately crashed** on startup with:
```
KeyError: <Encoding.VAR: '__var'>
```
in `BaseSerialization.deserialize`, while loading rows from the metadata
database (e.g. during `_schedule_all_dag_runs` or trigger deserialization).
### Environment
- **Upgrade path:** 3.1.8 → 3.2.1 → attempted rollback to 3.1.8
- **Database:** PostgreSQL
- **Executor:** KubernetesExecutor

Example traceback (scheduler):
```
  File ".../scheduler_job_runner.py", line 1968, in _schedule_all_dag_runs
    callback_tuples = [(run, self._schedule_dag_run(run, session=session)) for run in dag_runs]
  ...
  File ".../airflow/utils/sqlalchemy.py", line 137, in process_result_value
    return BaseSerialization.deserialize(value)
  File ".../airflow/serialization/serialized_objects.py", line 897, in deserialize
    var = encoded_var[Encoding.VAR]
KeyError: <Encoding.VAR: '__var'>
```
Similar failures occur in the triggerer when deserializing
`trigger.encrypted_kwargs`.
### Root cause (our analysis)
There are **two separate layers** in the metadata DB:
| Layer | What downgrade migrations handle | What they do NOT handle |
|-------|-----------------------------------|-------------------------|
| Schema | Tables, columns, alembic revision | — |
| Row content | — | Serialized JSON blobs written while 3.2 was running |

While Airflow 3.2.x was running, it wrote serialized row payloads using **SDK serde** (see
[#59711](https://github.com/apache/airflow/pull/59711) and the 3.2.1 release notes;
serde moved to `airflow.sdk.serde`). Examples:
- `trigger.encrypted_kwargs`
- `dag_run.conf` and related serialized columns
- Deferred task / trigger payloads

Airflow **3.1.8** reads these through the legacy `BaseSerialization.deserialize()`,
which expects the `{__type, __var}` wrapper format. SDK-serde blobs have no
`__var` key at the top level, so the lookup `encoded_var[Encoding.VAR]` raises `KeyError`.
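
The Python sketch below is a simplified stand-in for the legacy lookup that shows the failure mode. It is not Airflow's code. `legacy_deserialize` is a hypothetical name, it models only the wrapper check, and the `Encoding` enum copies the `__type`/`__var` keys visible in the traceback. A wrapped value deserializes cleanly. An unwrapped dict fails on `encoded_var[Encoding.VAR]`, which mirrors the statement that fails in the reported traceback.

```python
from enum import Enum
from typing import Any


class Encoding(str, Enum):
    # Illustrative copy of the wrapper keys seen in the traceback.
    TYPE = "__type"
    VAR = "__var"


def legacy_deserialize(encoded_var: Any) -> Any:
    """Hypothetical, simplified stand-in for the legacy deserialize path."""
    # Primitives and None are stored as-is and need no wrapper.
    if encoded_var is None or isinstance(encoded_var, (str, int, float, bool)):
        return encoded_var
    # Lists are deserialized element by element.
    if isinstance(encoded_var, list):
        return [legacy_deserialize(v) for v in encoded_var]
    # Any other value must be a {__type, __var} dict; a payload without
    # the wrapper raises KeyError here, as in the reported traceback.
    var = encoded_var[Encoding.VAR]
    type_ = encoded_var[Encoding.TYPE]
    if type_ == "dict":
        return {k: legacy_deserialize(v) for k, v in var.items()}
    return var  # other __type values are out of scope for this sketch


wrapped = {"__type": "dict", "__var": {"x": 1}}
print(legacy_deserialize(wrapped))  # {'x': 1}

unwrapped = {"x": 1}  # hypothetical payload with no top-level __var
try:
    legacy_deserialize(unwrapped)
except KeyError as exc:
    print("KeyError:", repr(exc.args[0]))
```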

`airflow db downgrade` reverts the **schema** to a 3.1.8-compatible structure
but does **not** rewrite existing row payloads back to the 3.1 serialization format.
### What we tried
- `airflow db downgrade -n 3.1.8` — succeeds, but runtime still crashes
- Deploying the old 3.1.8 application image — correct for code, insufficient
for DB content
- Manual cleanup (risky): `DELETE FROM trigger;` plus marking stuck `dag_run`
rows as failed; this partially unblocks startup but is not a safe general solution

**Only clean rollback path:** restore PostgreSQL from a backup taken
**before** the 3.2 upgrade.
### Expected behavior
The upgrade/downgrade documentation should clearly state:
1. **Downgrading Airflow major/minor versions is not fully supported**
without a metadata DB backup/restore.
2. **`airflow db downgrade` only reverts schema** (alembic migrations). It
does not migrate serialized row content.
3. After running 3.2.x against a database, rolling back to 3.1.x requires
   either:
   - Restoring a pre-3.2 DB backup, or
   - Manual cleanup of incompatible rows (triggers, active dag runs with
     3.2-format conf, etc.), with a risk of data loss
4. The 3.2 serde migration
([#59711](https://github.com/apache/airflow/pull/59711)) affects trigger kwargs
and related fields; this is not reversed on downgrade.

Suggested doc locations:
- Upgrade guide / release notes for 3.2.0 / 3.2.1
- `docs/howto/upgrading.rst` or equivalent
- `airflow db downgrade` CLI help text
### Actual behavior
- Downgrade migrations report success
- Users reasonably assume DB is compatible with 3.1.8
- Scheduler/triggerer crashloop with opaque `KeyError: __var`
- No guidance on which tables/rows are affected or how to recover
### Suggested diagnostic queries
```sql
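-- Triggers that may have been written under 3.2 SDK serde.
-- encrypted_kwargs holds Fernet-encrypted text, so the __var wrapper is not
-- visible in SQL; this only lists candidate rows. The ORM attribute may map
-- to a physical column named kwargs; adjust the name if this query errors.
SELECT id, classpath, LEFT(encrypted_kwargs::text, 120)
FROM trigger
LIMIT 20;
-- Active dag runs that may carry 3.2-format conf.
-- strpos() is used instead of LIKE because "_" in a LIKE pattern matches
-- any single character.
SELECT dag_id, run_id, state, LEFT(conf::text, 120)
FROM dag_run
WHERE state IN ('running', 'queued')
  AND conf IS NOT NULL
  AND strpos(conf::text, '__var') = 0;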
```
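
A substring match on the raw text cannot tell a real top-level wrapper from an incidental `__var` string, so the `dag_run` check can also be done client-side. The following is a minimal Python sketch. It assumes the rows were fetched as `(dag_id, run_id, conf_text)` tuples, for example by running the `dag_run` query above without its final filter and selecting the full `conf::text` instead of the 120-character prefix. The helpers `has_legacy_wrapper` and `find_suspect_runs` are hypothetical and not part of Airflow. The sketch applies the same heuristic as the query: a dict conf with no top-level `{__type, __var}` wrapper counts as a candidate. It will therefore also flag confs that were never wrapped, so treat its output as rows to inspect rather than proof of 3.2 serde.

```python
import json
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class SuspectRun(NamedTuple):
    dag_id: str
    run_id: str
    reason: str


def has_legacy_wrapper(payload: object) -> bool:
    """Return True if payload is a dict with top-level __type and __var keys."""
    return isinstance(payload, dict) and "__type" in payload and "__var" in payload


def find_suspect_runs(
    rows: Iterable[Tuple[str, str, Optional[str]]],
) -> Iterator[SuspectRun]:
    """Yield dag runs whose conf is a dict lacking the legacy wrapper.

    Heuristic only (mirrors the SQL check): plain confs that were never
    wrapped are flagged too, so inspect results before acting on them.
    """
    for dag_id, run_id, conf_text in rows:
        if conf_text is None:
            continue  # NULL conf: nothing to deserialize
        try:
            payload = json.loads(conf_text)
        except json.JSONDecodeError:
            yield SuspectRun(dag_id, run_id, "conf is not valid JSON")
            continue
        if isinstance(payload, dict) and not has_legacy_wrapper(payload):
            yield SuspectRun(dag_id, run_id, "no top-level __type/__var wrapper")


if __name__ == "__main__":
    # Hypothetical sample rows; in practice these come from the dag_run query.
    sample = [
        ("example_dag", "manual__1", '{"__type": "dict", "__var": {"x": 1}}'),
        ("example_dag", "manual__2", '{"x": 1}'),
        ("example_dag", "manual__3", None),
    ]
    for run in find_suspect_runs(sample):
        print(run)
```

Parsing the JSON checks only the top-level keys. A `__var` string nested deeper in a user-supplied conf therefore cannot hide a missing wrapper.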
### Related issues / PRs
- [#59711](https://github.com/apache/airflow/pull/59711) — SDK serde for
trigger/next kwargs
- [#64613](https://github.com/apache/airflow/issues/64613) — trigger
deserialization errors with external-event DAGs
- [#65973](https://github.com/apache/airflow/issues/65973) — asset trigger
kwargs format change 3.1.8 → 3.2.1
- [#65688](https://github.com/apache/airflow/pull/65688) — scheduler
UniqueViolation on downgrade 3.2.0 → 3.1.x (schema-level fix, not serde data)
- [#63434](https://github.com/apache/airflow/issues/63434),
[#63444](https://github.com/apache/airflow/issues/63444),
[#63535](https://github.com/apache/airflow/issues/63535) — other 3.2 → 3.1
downgrade migration failures
### Why this matters
Teams hitting problems on 3.2 may attempt a downgrade as their first recovery step.
A downgrade that succeeds at the schema level and then fails at runtime is worse than a
clear "unsupported: restore from backup" message. We lost time debugging this as a
dependency/version mismatch before identifying the serialized-data layer as the cause.
### What you think should happen instead?
_No response_
### Operating System
_No response_
### Deployment
None
### Apache Airflow Provider(s)
_No response_
### Versions of Apache Airflow Providers
_No response_
### Official Helm Chart version
Not Applicable
### Kubernetes Version
_No response_
### Helm Chart configuration
_No response_
### Docker Image customizations
_No response_
### Anything else?
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.