JoseBueno-plytix opened a new issue, #68483:
URL: https://github.com/apache/airflow/issues/68483

   ### Under which category would you file this issue?
   
   Airflow Core
   
   ### Apache Airflow version
   
   3.1.8
   
   ### What happened and how to reproduce it?
   
   Hi everyone! I already mentioned this in the upgrades channel, but it is 
also Kubernetes related. I ran into a problem when upgrading Airflow from 3.1.8 
to 3.2.2.
   
   Setup
   
   - Upgrading Apache Airflow 3.1.8 → 3.2.2, Python 3.13 → 3.14.
   - Helm chart 1.18.0, kept as-is (image-only bump)
   - Metadata DB is PostgreSQL, accessed through in-house pgbouncer (not 
Airflow's)
   - Previous 3.0 → 3.1.8 upgrade completed with no downtime, through the 
Airflow chart's own pgbouncer.
   
   Migrations involved
   - Upgrading to 3.2.2 applies the 3.2.0 set: ~22 core revisions, 509b94a1042d 
→ head 1d6611b6ab7c.
   - A separate FAB provider migration chain also runs automatically as part of 
airflow db migrate: base 6709f7a774b9 (FAB 1.4.0) → head 02ca36b0235b, "Fix fab 
db inconsistencies" (FAB 3.5.0).
   - The FAB floor for the 3.2 line was raised to 
apache-airflow-providers-fab>=3.6.0 (PR #65524), for Python 3.13/3.14 
dependency-resolution reasons. The 3.1.8 image had an older FAB provider, so 
its FAB head was already at the 1.4.0 base and no FAB migration was pending 
last time.
   - Known bug at these exact versions: issue #65402 (3.1.8 → 3.2.0, FAB 
3.6.0), where the "Fix fab db inconsistencies" migration drops the 
ab_permission_view_role_role_id_fkey foreign key; it fails on MySQL, was 
confirmed as a bug, and was closed by PR #65831.
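
Because the core and FAB chains are tracked separately, one way to see which chain is still pending is to read each Alembic version table directly. The following is a minimal sketch, not Airflow's own tooling. It assumes the core chain uses Alembic's default `alembic_version` table and that the FAB chain uses a table named `alembic_version_fab` (an assumption to verify against the actual database). The function names are hypothetical, and an in-memory SQLite database stands in for the PostgreSQL metadata DB.

```python
import sqlite3

# Expected heads, taken from the revision IDs quoted in this report.
# "alembic_version" is Alembic's default version table; the FAB table name
# "alembic_version_fab" is an assumption and should be verified in the DB.
EXPECTED_HEADS = {
    "alembic_version": "1d6611b6ab7c",
    "alembic_version_fab": "02ca36b0235b",
}


def read_heads(conn, table):
    """Return the set of revision IDs stored in one Alembic version table."""
    cur = conn.cursor()
    # Table names come only from the fixed mapping above, never from user input.
    cur.execute(f"SELECT version_num FROM {table}")
    return {row[0] for row in cur.fetchall()}


def pending_chains(conn, expected=EXPECTED_HEADS):
    """Map each table whose stored heads differ from the expected head to those heads."""
    pending = {}
    for table, head in expected.items():
        stored = read_heads(conn, table)
        if stored != {head}:
            pending[table] = stored
    return pending


def demo_connection():
    """Build an in-memory stand-in: core at head, FAB still at its 1.4.0 base."""
    conn = sqlite3.connect(":memory:")
    for table, revision in [
        ("alembic_version", "1d6611b6ab7c"),
        ("alembic_version_fab", "6709f7a774b9"),
    ]:
        conn.execute(f"CREATE TABLE {table} (version_num VARCHAR(32) NOT NULL)")
        conn.execute(f"INSERT INTO {table} VALUES (?)", (revision,))
    return conn


if __name__ == "__main__":
    print(pending_chains(demo_connection()))
```

Against a real deployment, `conn` would be any DB-API connection to the metadata database; the demo state above (core at head, FAB at base) is illustrative only and is not a confirmed snapshot of the reported incident.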
   
   What happened:
   - The migration log shows all 22 core revisions applying in ~3 seconds, then 
the run moved on to the FAB chain (6709f7a774b9 → 02ca36b0235b).
   - The migration job pod then kept running for ~50 minutes. During that time 
the migration was not progressing (it appeared to hang, and there were no 
errors in the migration pod).
   - New 3.2.2 pods were in CrashLoopBackOff: their 
wait-for-airflow-migrations init container runs airflow db check-migrations, 
which times out after 60 seconds. The init log shows it counting up and then 
raising TimeoutError: There are still unapplied migrations after 60 seconds, 
with Migration Head(s) in DB: {'1d6611b6ab7c'} equal to Migration Head(s) in 
Source Code: {'1d6611b6ab7c'} (i.e. the core head matched). These pods only 
reached the init container; their main containers never started.
   - Old 3.1.8 pods were crashing, reporting that the DB needed migration ("you 
need to run airflow db migrate in airflow 3.1.8").
   - After I manually deleted one of the crash-looping new pods, the migration 
pod appeared to finish (I think it somehow restarted) and the other pods began 
recovering.
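
The init-container behaviour described above (count up once per second, then raise after the timeout) can be sketched as a polling loop. This is a hypothetical illustration, not Airflow's implementation: `wait_for_heads`, the chain labels `core` and `fab`, and the idea that more than one chain is checked are all assumptions. It only shows how a wait can time out while the core heads already match, if a second chain is still behind.

```python
import time
from typing import Callable


def wait_for_heads(
    read_db_heads: Callable[[], dict[str, set[str]]],
    source_heads: dict[str, set[str]],
    timeout: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll once per second until every chain's DB heads match the source heads."""
    db_heads: dict[str, set[str]] = {}
    for _ in range(timeout):
        db_heads = read_db_heads()
        if all(db_heads.get(chain) == heads for chain, heads in source_heads.items()):
            return
        sleep(1)
    # Name the chains that never caught up so the log is unambiguous.
    lagging = sorted(
        chain for chain, heads in source_heads.items() if db_heads.get(chain) != heads
    )
    raise TimeoutError(
        f"There are still unapplied migrations after {timeout} seconds; "
        f"lagging chains: {lagging}"
    )


if __name__ == "__main__":
    source = {"core": {"1d6611b6ab7c"}, "fab": {"02ca36b0235b"}}
    # Hypothetical DB state: core at head, FAB still at its base revision.
    stuck = {"core": {"1d6611b6ab7c"}, "fab": {"6709f7a774b9"}}
    try:
        wait_for_heads(lambda: stuck, source, timeout=3, sleep=lambda _s: None)
    except TimeoutError as exc:
        print(exc)
```

The `sleep` parameter is injectable only so the loop can be exercised without real delays.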
   
   ### What you think should happen instead?
   
   Migrations are applied correctly and the upgrade completes without issues.
   
   ### Operating System
   
   Ubuntu
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Apache Airflow Provider(s)
   
   _No response_
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Official Helm Chart version
   
   1.18.0
   
   ### Kubernetes Version
   
   Not Applicable
   
   ### Helm Chart configuration
   
   Not Applicable
   
   ### Docker Image customizations
   
   Not applicable
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

