irakl1s opened a new issue, #66524:
URL: https://github.com/apache/airflow/issues/66524
### Apache Airflow version
3.2.1
### What happened
Upgrading an existing deployment from Airflow 3.1.0 +
`apache-airflow-providers-edge3==1.3.0` to Airflow 3.2.1 +
`apache-airflow-providers-edge3==3.4.0`, the `airflow db migrate` command
**reports success** but leaves the edge3 schema stale: the
`alembic_version_edge3` table is stamped to head `a09c3ee8e1d3`, but the
columns that should have been added by migrations `0002`
(`edge_worker.concurrency`) and `0004` (`edge_job.team_name`,
`edge_worker.team_name`) are **missing from the database**.
Any Airflow process that subsequently queries the edge tables (e.g. the
scheduler) fails with:
```
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn)
column edge_job.team_name does not exist
```
`airflow db migrate` itself reports `Database migration done!` and exits 0.
There is no log line, warning, or error that signals the schema is broken —
operators have no way to detect this from the migrate output alone.
The smoking-gun lines in the `airflow db migrate` output are:
```
Upgrading the EdgeDBManager database
Creating EdgeDBManager tables from the ORM ← fallback path fires
Running stamp_revision -> a09c3ee8e1d3 ← stamps head, no migration
bodies execute
EdgeDBManager tables have been created from the ORM
```
There is **no** `Running upgrade 9d34dfc2de06 -> b3c4d5e6f7a8 ...` line (or
any subsequent edge3 chain entry) — i.e. no `op.add_column` ever runs.
### Expected behavior
`airflow db migrate` should detect that the edge3 tables already exist (from
the pre-`EdgeDBManager` era — edge3 ≤ 3.0.x had its models registered against
`Base.metadata` and tables were created via `metadata.create_all`), stamp
`alembic_version_edge3` to the **base** revision `9d34dfc2de06`, and then walk
the Alembic chain to head, applying every `op.add_column`/`op.alter_column`
along the way.
This logic is **already implemented** in `EdgeDBManager.initdb`. The bug is
that this code path is unreachable on the actual upgrade scenario — see "Root
cause" below.
### How to reproduce
Self-contained reproducer using only public images and public PyPI packages.
`docker-compose.yml`:
```yaml
services:
postgres:
image: postgres:15
environment:
POSTGRES_USER: airflow
POSTGRES_PASSWORD: airflow
POSTGRES_DB: airflow
healthcheck:
test: ["CMD", "pg_isready", "-U", "airflow"]
interval: 5s
timeout: 5s
retries: 10
airflow-source:
image: apache/airflow:3.1.0-python3.10
profiles: ["source"]
depends_on:
postgres: { condition: service_healthy }
environment:
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN:
postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
AIRFLOW__CORE__EXECUTOR: airflow.providers.edge3.executors.EdgeExecutor
AIRFLOW__CORE__LOAD_EXAMPLES: "false"
AIRFLOW__CORE__AUTH_MANAGER:
airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager
entrypoint: ["bash", "-c"]
command:
- |
set -e
pip install --quiet apache-airflow-providers-edge3==1.3.0
airflow db migrate
airflow-target:
image: apache/airflow:3.2.1-python3.10
profiles: ["target"]
depends_on:
postgres: { condition: service_healthy }
environment:
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN:
postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
AIRFLOW__CORE__EXECUTOR: airflow.providers.edge3.executors.EdgeExecutor
AIRFLOW__CORE__LOAD_EXAMPLES: "false"
AIRFLOW__CORE__AUTH_MANAGER:
airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager
entrypoint: ["bash", "-c"]
command:
- |
set -e
pip install --quiet apache-airflow-providers-edge3==3.4.0
airflow db migrate
```
Steps:
```bash
# 1. Start Postgres
docker compose up -d postgres
# 2. Bootstrap source state (Airflow 3.1.0 + edge3 1.3.0)
# This creates edge_job/edge_worker/edge_logs via Base.metadata.create_all
# and stamps alembic_version + alembic_version_fab. NO
alembic_version_edge3.
docker compose --profile source run --rm airflow-source
# 3. Verify source state
docker exec -i $(docker compose ps -q postgres) psql -U airflow -d airflow
-c "\dt"
# Expect: edge_job, edge_worker, edge_logs present
# Expect: alembic_version, alembic_version_fab present, NO
alembic_version_edge3
# 4. Run the upgrade (Airflow 3.2.1 + edge3 3.4.0)
docker compose --profile target run --rm airflow-target
# 5. Observe broken state
docker exec -i $(docker compose ps -q postgres) psql -U airflow -d airflow
<<'SQL'
SELECT version_num FROM alembic_version_edge3;
-- expected fix: 9d34dfc2de06 base, walked up to a09c3ee8e1d3 with
op.add_column applied
-- actual bug: a09c3ee8e1d3 stamped, no op.add_column ever ran
SELECT EXISTS(SELECT 1 FROM information_schema.columns WHERE
table_name='edge_job' AND column_name='team_name') AS edge_job_team_name,
EXISTS(SELECT 1 FROM information_schema.columns WHERE
table_name='edge_worker' AND column_name='team_name') AS
edge_worker_team_name,
EXISTS(SELECT 1 FROM information_schema.columns WHERE
table_name='edge_worker' AND column_name='concurrency') AS
edge_worker_concurrency;
-- actual: f | f | f
SELECT team_name FROM edge_job LIMIT 1;
-- ERROR: column "team_name" does not exist
SQL
```
`airflow db migrate` in step 4 exits 0 with `Database migration done!`. This
is the operator-visible failure mode.
### Root cause
Verified at tags `apache-airflow/3.2.1` (core) and `providers-edge3/3.4.0`
(provider).
When `airflow db migrate` runs against a database where core
`alembic_version` already exists (i.e. an upgrade, not a fresh install), the
call routes through `airflow.utils.db.upgradedb()` → `_run_upgradedb()`. After
the core upgrade completes, [`_run_upgradedb` calls
`external_db_manager.upgradedb(...)` — not
`.initdb(...)`](https://github.com/apache/airflow/blob/3.2.1/airflow-core/src/airflow/utils/db.py).
`RunDBManager.upgradedb` iterates registered managers and invokes each
manager's `upgradedb`. `EdgeDBManager` does **not** override `upgradedb`, so it
falls through to
[`BaseDBManager.upgradedb`](https://github.com/apache/airflow/blob/3.2.1/airflow-core/src/airflow/utils/db_manager.py):
```python
def upgradedb(self, to_revision=None, from_revision=None,
show_sql_only=False, use_migration_files=False):
self.log.info("Upgrading the %s database", self.__class__.__name__)
self._release_metadata_locks_if_needed()
current_revision = self.get_current_revision()
if not current_revision and not to_revision and not use_migration_files
and not show_sql_only:
self.create_db_from_orm() # ← FALLBACK fires here (no
alembic_version_edge3 row exists)
return
config = self.get_alembic_config()
command.upgrade(config, revision=to_revision or "heads",
sql=show_sql_only)
```
`create_db_from_orm` does:
```python
def create_db_from_orm(self):
self.metadata.create_all(engine) # SQLAlchemy: only creates *missing*
tables (no-op here)
command.stamp(config, "head") # ← stamps directly to ORM head, no
migrations run
```
End state on a database that came from edge3 1.x (tables exist, no
`alembic_version_edge3`):
- `metadata.create_all` is a no-op because the tables already exist.
- `command.stamp(config, "head")` writes the current ORM head
(`a09c3ee8e1d3` for edge3 3.4.0) into `alembic_version_edge3`.
- Migrations `0002` (`b3c4d5e6f7a8` — adds `edge_worker.concurrency`),
`0003` (`8c275b6fbaa8` — `DateTime` → `TIMESTAMP` type fixes), and `0004`
(`a09c3ee8e1d3` — adds `team_name` to both tables) **never execute**, even
though Alembic now believes they have.
Notably,
[`EdgeDBManager.initdb`](https://github.com/apache/airflow/blob/providers-edge3/3.4.0/providers/edge3/src/airflow/providers/edge3/models/db.py)
**does** handle this scenario correctly: if edge3 tables exist but
`alembic_version_edge3` doesn't, it stamps to the base revision `9d34dfc2de06`
and runs an incremental upgrade. But `initdb` is only called on the
fresh-install path (when core `alembic_version` is also empty), not on an
upgrade. The careful detection logic is dead code on this scenario.
This bug is **not specific to edge3** — it will fire for any provider DB
manager whose tables predated the manager itself, on any DB-already-initialized
upgrade path.
### Operating System
macOS (Docker Desktop) for the reproducer.
### Versions of Apache Airflow Providers
- `apache-airflow-providers-edge3==1.3.0` (source)
- `apache-airflow-providers-edge3==3.4.0` (target)
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
Originally observed on:
- Official Apache Airflow Helm chart **1.18 → 1.21 upgrade**
- Apache Airflow **3.1.0 → 3.2.1**
- `apache-airflow-providers-edge3` **1.3.0 → 3.4.0**
Reproduced locally on `apache/airflow:3.1.0-python3.10` →
`apache/airflow:3.2.1-python3.10` (compose stanzas above) with edge3 1.3.0 /
3.4.0 from PyPI — the bug fires without any chart involved.
### Anything else
#### Workaround for affected users
Until an upstream fix lands, affected users can run the migrations manually
by invoking `EdgeDBManager.initdb()` directly from any Python environment that
can already reach the Airflow metadata DB (i.e. anywhere `airflow db migrate`
would work). This bypasses the broken `upgradedb` routing and runs the
official, correct upgrade-from-existing-tables logic — stamps to base
`9d34dfc2de06`, then walks the chain applying every migration body:
```python
from airflow.utils.session import create_session
from airflow.providers.edge3.models.db import EdgeDBManager
with create_session() as session:
EdgeDBManager(session).initdb()
```
Empirically verified on `apache/airflow:3.2.1-python3.10` + `edge3==3.4.0`.
Log output:
```
Running stamp_revision -> 9d34dfc2de06
Upgrading the EdgeDBManager database
Running upgrade 9d34dfc2de06 -> b3c4d5e6f7a8, Add concurrency column to
edge_worker table.
Running upgrade b3c4d5e6f7a8 -> 8c275b6fbaa8, Fix migration file/ORM
inconsistencies.
Running upgrade 8c275b6fbaa8 -> a09c3ee8e1d3, Add team_name column to
edge_job and edge_worker tables.
Migrated the EdgeDBManager database
```
End state: schema correct, `alembic_version_edge3 = a09c3ee8e1d3`, all
migration bodies (including `0003`'s `DateTime → TIMESTAMP` fixes) applied.
#### Suggested fix paths
Any one of these would resolve the bug. Listed in order of decreasing
preference:
1. **Route `_run_upgradedb` to the manager's `initdb` when its version table
is absent.** `EdgeDBManager.initdb` already contains the correct
upgrade-from-existing-tables logic (verified by invoking it directly on the
broken state — it stamps to base and walks the chain cleanly, see the
workaround section above). The fix is essentially a routing change, no new
logic required.
2. **Fix `BaseDBManager.upgradedb` to mirror `initdb`'s pre-Alembic
detection.** Before falling into `create_db_from_orm`, check if any of the
manager's tables already exist; if so, stamp to base and run an incremental
upgrade. Fixes the bug for every provider DB manager, not just edge3.
3. **Override `upgradedb` on `EdgeDBManager`** to handle the
upgrade-from-existing-tables case the same way `initdb` does. Local fix;
doesn't help other providers in the same situation.
#### Repro frequency
100% deterministic on the reproducer above.
#### Related
- [#61155](https://github.com/apache/airflow/pull/61155) — Introduce
`EdgeDBManager`. Author handled the upgrade-from-1.x case in `initdb` but
`initdb` is unreachable on this path.
- [#62234](https://github.com/apache/airflow/pull/62234) — Re-introducing
`--use-migration-files` and ORM/migration consistency. Refined `initdb`.
- [#62308](https://github.com/apache/airflow/pull/62308) — Auto-discover DB
managers from `provider.yaml`. Auto-discovery works correctly; the bug is
downstream of that.
- [#61646](https://github.com/apache/airflow/pull/61646) — AIP-67 multi-team
Edge Executor. Added migration `0004` whose absent execution is the
operator-visible symptom.
### Are you willing to submit PR?
Open to it for option (1) or (2) above with maintainer guidance on direction.
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]