irakl1s opened a new issue, #66524:
URL: https://github.com/apache/airflow/issues/66524

   
   ### Apache Airflow version
   
   3.2.1
   
   ### What happened
   
   Upgrading an existing deployment from Airflow 3.1.0 + 
`apache-airflow-providers-edge3==1.3.0` to Airflow 3.2.1 + 
`apache-airflow-providers-edge3==3.4.0`, the `airflow db migrate` command 
**reports success** but leaves the edge3 schema stale: the 
`alembic_version_edge3` table is stamped to head `a09c3ee8e1d3`, but the 
columns that should have been added by migrations `0002` 
(`edge_worker.concurrency`) and `0004` (`edge_job.team_name`, 
`edge_worker.team_name`) are **missing from the database**.
   
   Any Airflow process that subsequently queries the edge tables (e.g. the 
scheduler) fails with:
   
   ```
   sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn)
   column edge_job.team_name does not exist
   ```
   
   `airflow db migrate` itself reports `Database migration done!` and exits 0. 
There is no log line, warning, or error that signals the schema is broken — 
operators have no way to detect this from the migrate output alone.
   
   The smoking-gun lines in the `airflow db migrate` output are:
   
   ```
   Upgrading the EdgeDBManager database
   Creating EdgeDBManager tables from the ORM     ← fallback path fires
   Running stamp_revision  -> a09c3ee8e1d3        ← stamps head, no migration 
bodies execute
   EdgeDBManager tables have been created from the ORM
   ```
   
   There is **no** `Running upgrade 9d34dfc2de06 -> b3c4d5e6f7a8 ...` line (or 
any subsequent edge3 chain entry) — i.e. no `op.add_column` ever runs.
   
   ### Expected behavior
   
   `airflow db migrate` should detect that the edge3 tables already exist (from 
the pre-`EdgeDBManager` era — edge3 ≤ 3.0.x had its models registered against 
`Base.metadata` and tables were created via `metadata.create_all`), stamp 
`alembic_version_edge3` to the **base** revision `9d34dfc2de06`, and then walk 
the Alembic chain to head, applying every `op.add_column`/`op.alter_column` 
along the way.
   
   This logic is **already implemented** in `EdgeDBManager.initdb`. The bug is 
that this code path is unreachable on the actual upgrade scenario — see "Root 
cause" below.
   
   ### How to reproduce
   
   Self-contained reproducer using only public images and public PyPI packages.
   
   `docker-compose.yml`:
   
   ```yaml
   services:
     postgres:
       image: postgres:15
       environment:
         POSTGRES_USER: airflow
         POSTGRES_PASSWORD: airflow
         POSTGRES_DB: airflow
       healthcheck:
         test: ["CMD", "pg_isready", "-U", "airflow"]
         interval: 5s
         timeout: 5s
         retries: 10
   
     airflow-source:
       image: apache/airflow:3.1.0-python3.10
       profiles: ["source"]
       depends_on:
         postgres: { condition: service_healthy }
       environment:
         AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: 
postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
         AIRFLOW__CORE__EXECUTOR: airflow.providers.edge3.executors.EdgeExecutor
         AIRFLOW__CORE__LOAD_EXAMPLES: "false"
         AIRFLOW__CORE__AUTH_MANAGER: 
airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager
       entrypoint: ["bash", "-c"]
       command:
         - |
           set -e
           pip install --quiet apache-airflow-providers-edge3==1.3.0
           airflow db migrate
   
     airflow-target:
       image: apache/airflow:3.2.1-python3.10
       profiles: ["target"]
       depends_on:
         postgres: { condition: service_healthy }
       environment:
         AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: 
postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
         AIRFLOW__CORE__EXECUTOR: airflow.providers.edge3.executors.EdgeExecutor
         AIRFLOW__CORE__LOAD_EXAMPLES: "false"
         AIRFLOW__CORE__AUTH_MANAGER: 
airflow.providers.fab.auth_manager.fab_auth_manager.FabAuthManager
       entrypoint: ["bash", "-c"]
       command:
         - |
           set -e
           pip install --quiet apache-airflow-providers-edge3==3.4.0
           airflow db migrate
   ```
   
   Steps:
   
   ```bash
   # 1. Start Postgres
   docker compose up -d postgres
   
   # 2. Bootstrap source state (Airflow 3.1.0 + edge3 1.3.0)
   #    This creates edge_job/edge_worker/edge_logs via Base.metadata.create_all
   #    and stamps alembic_version + alembic_version_fab. NO 
alembic_version_edge3.
   docker compose --profile source run --rm airflow-source
   
   # 3. Verify source state
   docker exec -i $(docker compose ps -q postgres) psql -U airflow -d airflow 
-c "\dt"
   # Expect: edge_job, edge_worker, edge_logs present
   # Expect: alembic_version, alembic_version_fab present, NO 
alembic_version_edge3
   
   # 4. Run the upgrade (Airflow 3.2.1 + edge3 3.4.0)
   docker compose --profile target run --rm airflow-target
   
   # 5. Observe broken state
   docker exec -i $(docker compose ps -q postgres) psql -U airflow -d airflow 
<<'SQL'
   SELECT version_num FROM alembic_version_edge3;
   -- expected fix:    9d34dfc2de06 base, walked up to a09c3ee8e1d3 with 
op.add_column applied
   -- actual bug:      a09c3ee8e1d3 stamped, no op.add_column ever ran
   
   SELECT EXISTS(SELECT 1 FROM information_schema.columns WHERE 
table_name='edge_job'    AND column_name='team_name')   AS edge_job_team_name,
          EXISTS(SELECT 1 FROM information_schema.columns WHERE 
table_name='edge_worker' AND column_name='team_name')   AS 
edge_worker_team_name,
          EXISTS(SELECT 1 FROM information_schema.columns WHERE 
table_name='edge_worker' AND column_name='concurrency') AS 
edge_worker_concurrency;
   -- actual: f | f | f
   
   SELECT team_name FROM edge_job LIMIT 1;
   -- ERROR: column "team_name" does not exist
   SQL
   ```
   
   `airflow db migrate` in step 4 exits 0 with `Database migration done!`. This 
is the operator-visible failure mode.
   
   ### Root cause
   
   Verified at tags `apache-airflow/3.2.1` (core) and `providers-edge3/3.4.0` 
(provider).
   
   When `airflow db migrate` runs against a database where core 
`alembic_version` already exists (i.e. an upgrade, not a fresh install), the 
call routes through `airflow.utils.db.upgradedb()` → `_run_upgradedb()`. After 
the core upgrade completes, [`_run_upgradedb` calls 
`external_db_manager.upgradedb(...)` — not 
`.initdb(...)`](https://github.com/apache/airflow/blob/3.2.1/airflow-core/src/airflow/utils/db.py).
   
   `RunDBManager.upgradedb` iterates registered managers and invokes each 
manager's `upgradedb`. `EdgeDBManager` does **not** override `upgradedb`, so it 
falls through to 
[`BaseDBManager.upgradedb`](https://github.com/apache/airflow/blob/3.2.1/airflow-core/src/airflow/utils/db_manager.py):
   
   ```python
   def upgradedb(self, to_revision=None, from_revision=None, 
show_sql_only=False, use_migration_files=False):
       self.log.info("Upgrading the %s database", self.__class__.__name__)
       self._release_metadata_locks_if_needed()
       current_revision = self.get_current_revision()
   
       if not current_revision and not to_revision and not use_migration_files 
and not show_sql_only:
           self.create_db_from_orm()      # ← FALLBACK fires here (no 
alembic_version_edge3 row exists)
           return
   
       config = self.get_alembic_config()
       command.upgrade(config, revision=to_revision or "heads", 
sql=show_sql_only)
   ```
   
   `create_db_from_orm` does:
   
   ```python
   def create_db_from_orm(self):
       self.metadata.create_all(engine)   # SQLAlchemy: only creates *missing* 
tables (no-op here)
       command.stamp(config, "head")      # ← stamps directly to ORM head, no 
migrations run
   ```
   
   End state on a database that came from edge3 1.x (tables exist, no 
`alembic_version_edge3`):
   
   - `metadata.create_all` is a no-op because the tables already exist.
   - `command.stamp(config, "head")` writes the current ORM head 
(`a09c3ee8e1d3` for edge3 3.4.0) into `alembic_version_edge3`.
   - Migrations `0002` (`b3c4d5e6f7a8` — adds `edge_worker.concurrency`), 
`0003` (`8c275b6fbaa8` — `DateTime` → `TIMESTAMP` type fixes), and `0004` 
(`a09c3ee8e1d3` — adds `team_name` to both tables) **never execute**, even 
though Alembic now believes they have.
   
   Notably, 
[`EdgeDBManager.initdb`](https://github.com/apache/airflow/blob/providers-edge3/3.4.0/providers/edge3/src/airflow/providers/edge3/models/db.py)
 **does** handle this scenario correctly: if edge3 tables exist but 
`alembic_version_edge3` doesn't, it stamps to the base revision `9d34dfc2de06` 
and runs an incremental upgrade. But `initdb` is only called on the 
fresh-install path (when core `alembic_version` is also empty), not on an 
upgrade. The careful detection logic is dead code on this scenario.
   
   This bug is **not specific to edge3** — it will fire for any provider DB 
manager whose tables predated the manager itself, on any DB-already-initialized 
upgrade path.
   
   ### Operating System
   
   macOS (Docker Desktop) for the reproducer.
   
   ### Versions of Apache Airflow Providers
   
   - `apache-airflow-providers-edge3==1.3.0` (source)
   - `apache-airflow-providers-edge3==3.4.0` (target)
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Originally observed on:
   - Official Apache Airflow Helm chart **1.18 → 1.21 upgrade**
   - Apache Airflow **3.1.0 → 3.2.1**
   - `apache-airflow-providers-edge3` **1.3.0 → 3.4.0**
   
   Reproduced locally on `apache/airflow:3.1.0-python3.10` → 
`apache/airflow:3.2.1-python3.10` (compose stanzas above) with edge3 1.3.0 / 
3.4.0 from PyPI — the bug fires without any chart involved.
   
   ### Anything else
   
   #### Workaround for affected users
   
   Until an upstream fix lands, affected users can run the migrations manually 
by invoking `EdgeDBManager.initdb()` directly from any Python environment that 
can already reach the Airflow metadata DB (i.e. anywhere `airflow db migrate` 
would work). This bypasses the broken `upgradedb` routing and runs the 
official, correct upgrade-from-existing-tables logic — stamps to base 
`9d34dfc2de06`, then walks the chain applying every migration body:
   
   ```python
   from airflow.utils.session import create_session
   from airflow.providers.edge3.models.db import EdgeDBManager
   with create_session() as session:
       EdgeDBManager(session).initdb()
   ```
   
   Empirically verified on `apache/airflow:3.2.1-python3.10` + `edge3==3.4.0`. 
Log output:
   
   ```
   Running stamp_revision  -> 9d34dfc2de06
   Upgrading the EdgeDBManager database
   Running upgrade 9d34dfc2de06 -> b3c4d5e6f7a8, Add concurrency column to 
edge_worker table.
   Running upgrade b3c4d5e6f7a8 -> 8c275b6fbaa8, Fix migration file/ORM 
inconsistencies.
   Running upgrade 8c275b6fbaa8 -> a09c3ee8e1d3, Add team_name column to 
edge_job and edge_worker tables.
   Migrated the EdgeDBManager database
   ```
   
   End state: schema correct, `alembic_version_edge3 = a09c3ee8e1d3`, all 
migration bodies (including `0003`'s `DateTime → TIMESTAMP` fixes) applied.
   
   #### Suggested fix paths
   
   Any one of these would resolve the bug. Listed in order of decreasing 
preference:
   
   1. **Route `_run_upgradedb` to the manager's `initdb` when its version table 
is absent.** `EdgeDBManager.initdb` already contains the correct 
upgrade-from-existing-tables logic (verified by invoking it directly on the 
broken state — it stamps to base and walks the chain cleanly, see the 
workaround section above). The fix is essentially a routing change, no new 
logic required.
   
   2. **Fix `BaseDBManager.upgradedb` to mirror `initdb`'s pre-Alembic 
detection.** Before falling into `create_db_from_orm`, check if any of the 
manager's tables already exist; if so, stamp to base and run an incremental 
upgrade. Fixes the bug for every provider DB manager, not just edge3.
   
   3. **Override `upgradedb` on `EdgeDBManager`** to handle the 
upgrade-from-existing-tables case the same way `initdb` does. Local fix; 
doesn't help other providers in the same situation.
   
   #### Repro frequency
   
   100% deterministic on the reproducer above.
   
   #### Related
   
   - [#61155](https://github.com/apache/airflow/pull/61155) — Introduce 
`EdgeDBManager`. Author handled the upgrade-from-1.x case in `initdb` but 
`initdb` is unreachable on this path.
   - [#62234](https://github.com/apache/airflow/pull/62234) — Re-introducing 
`--use-migration-files` and ORM/migration consistency. Refined `initdb`.
   - [#62308](https://github.com/apache/airflow/pull/62308) — Auto-discover DB 
managers from `provider.yaml`. Auto-discovery works correctly; the bug is 
downstream of that.
   - [#61646](https://github.com/apache/airflow/pull/61646) — AIP-67 multi-team 
Edge Executor. Added migration `0004` whose absent execution is the 
operator-visible symptom.
   
   ### Are you willing to submit PR?
   
   Open to it for option (1) or (2) above with maintainer guidance on direction.
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to