vatsrahul1001 opened a new issue, #63535:
URL: https://github.com/apache/airflow/issues/63535

   ### Apache Airflow version
   
   main (development)
   
   ### If "Other Airflow 3 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   Migration `0101_3_2_0_ui_improvements_for_deadlines` creates a savepoint for every `serialized_dag` row in its `downgrade()` via `conn.begin_nested()`, to isolate per-DAG errors while restoring deadline alert data:
   
   ```python
   for serialized_dag_id, dag_id, data, data_compressed in batch_results:
       savepoint = conn.begin_nested()
       try:
           # ... process DAG ...
           savepoint.commit()
       except Exception as e:
           savepoint.rollback()
   ```
   
   On PostgreSQL, locks acquired within a savepoint are not released when the savepoint is committed (released); they persist in the parent transaction until the outer COMMIT. Since the entire migration runs in a single Alembic transaction, lock entries accumulate across all DAGs.

   PostgreSQL's shared lock table holds roughly max_locks_per_transaction × max_connections entries (defaults: 64 × 100 = 6,400). Each savepoint pins multiple entries (relation-level locks on serialized_dag and deadline_alert, plus the savepoint's virtual transaction ID), so with thousands of DAGs the lock table overflows.
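The arithmetic above can be sketched directly (the defaults are the ones cited; the ~3 lock entries per savepoint is an assumption, and the result is only a rough lower bound — the observed run got further, to ~20K DAGs, before overflowing):

```python
# Capacity of PostgreSQL's shared lock table under default settings.
max_locks_per_transaction = 64
max_connections = 100
lock_table_entries = max_locks_per_transaction * max_connections

# Assumption: each per-row savepoint pins ~3 entries until the outer
# COMMIT (relation locks on serialized_dag and deadline_alert, plus
# the savepoint's virtual transaction ID).
locks_per_savepoint = 3
approx_dag_limit = lock_table_entries // locks_per_savepoint

print(lock_table_entries)  # 6400
print(approx_dag_limit)    # 2133
```

This lines up with the ~1K–2K DAG threshold estimated below for default settings.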
   
   Observed behavior:
   With 100K serialized_dag entries the migration failed after processing ~20K 
DAGs (batch 2 of 11):
   
   ```
   psycopg2.errors.OutOfMemory: out of shared memory
   HINT: You might need to increase max_locks_per_transaction.
   CONTEXT: parallel worker
   ```
   
   This was followed by a secondary RecursionError as SQLAlchemy tried to 
unwind 970+ nested savepoint objects via recursive _cancel() calls during 
rollback:
   `RecursionError: maximum recursion depth exceeded`
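The secondary failure is easy to reproduce in isolation. A toy sketch (the `NestedTransaction` class and its `_cancel()` chain are illustrative stand-ins, not SQLAlchemy's actual implementation) shows that unwinding ~970 objects with one stack frame per level blows through a lowered recursion limit, just as the real unwind approached CPython's default limit of 1000:

```python
import sys

class NestedTransaction:
    """Illustrative stand-in for a nested-savepoint object."""
    def __init__(self, parent=None):
        self.parent = parent

    def _cancel(self):
        # One stack frame per nesting level, as in a recursive unwind.
        if self.parent is not None:
            self.parent._cancel()

# Build 970 nested transactions iteratively (no recursion here).
tx = None
for _ in range(970):
    tx = NestedTransaction(tx)

sys.setrecursionlimit(500)  # lowered so the toy fails deterministically
hit_recursion_error = False
try:
    tx._cancel()
except RecursionError:
    hit_recursion_error = True

print(hit_recursion_error)  # True
```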
   
   Deployments with more than ~1K–2K DAGs will likely hit this on default 
PostgreSQL settings. The exact threshold depends on how many lock entries each 
savepoint consumes and the configured max_locks_per_transaction and 
max_connections.
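One possible shape for a fix is to keep the per-row error isolation but commit the outer transaction after each batch, so PostgreSQL can discard that batch's accumulated lock entries. A minimal sketch using the stdlib sqlite3 module (the savepoint SQL is the same, but the lock-release behavior described above is PostgreSQL-specific, and the table, column, and batch-size names here are placeholders):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions explicitly
conn.execute("CREATE TABLE serialized_dag (id INTEGER PRIMARY KEY, data TEXT)")

# One bad row (duplicate primary key) to show per-row isolation still works.
rows = [(1, "dag_a"), (2, "dag_b"), (2, "duplicate"), (3, "dag_c"), (4, "dag_d")]
BATCH_SIZE = 3

for start in range(0, len(rows), BATCH_SIZE):
    conn.execute("BEGIN")
    for row_id, data in rows[start:start + BATCH_SIZE]:
        conn.execute("SAVEPOINT per_row")  # isolate one row's failure
        try:
            conn.execute("INSERT INTO serialized_dag VALUES (?, ?)", (row_id, data))
            conn.execute("RELEASE SAVEPOINT per_row")
        except sqlite3.IntegrityError:
            conn.execute("ROLLBACK TO SAVEPOINT per_row")
            conn.execute("RELEASE SAVEPOINT per_row")
    # Committing per batch lets the server discard this batch's lock
    # entries instead of carrying them to the end of the migration.
    conn.execute("COMMIT")

count = conn.execute("SELECT count(*) FROM serialized_dag").fetchone()[0]
print(count)  # 4 — the bad row was skipped, the rest committed
```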
   
   ### What you think should happen instead?
   
   _No response_
   
   ### How to reproduce
   
   Steps to reproduce:
   
   - Start Airflow on 3.1.8 with PostgreSQL backend (default 
max_locks_per_transaction = 64)
   - Create 2K+ DAGs so that `serialized_dag` contains rows with deadline alert data
   - Upgrade to 3.2.0 (`airflow db migrate`)
   - Run `airflow db downgrade -n 3.1.8`
   - Migration 0101 fails with out of shared memory error
   
   ### Operating System
   
   macOS
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

