ephraimbuddy opened a new pull request, #25532:
URL: https://github.com/apache/airflow/pull/25532

   I'm not sure what's happening here but moving the session.rollback closer to 
the error stopped scheduler from crashing when there is an integrity error. I 
couldn't reproduce this in tests but can reproduce it consistently in airflow
   
   Here's how to reproduce the scheduler crash:
   Run this dag in main:
   ```python
   from datetime import datetime
   
   from airflow import DAG
   from airflow.decorators import task
   
   with DAG(dag_id='mvp_map_task_bug', start_date=datetime(2022, 1, 1), 
schedule_interval='@daily', catchup=False) as dag:
       @task
       def get_files():
           return [1,2,3,4,5,6]
   
       @task
       def download_files(file: str):
           print(f"{file}")
   
       files = download_files.expand(file=get_files())
   ```
   Stop the scheduler
   Reduce the list in the `get_files` to something like [1,2,3]
   Change this `difference` to `symmetric_difference`: 
https://github.com/apache/airflow/blob/171aaf017aee068d8e1b76121c8c75310c854d9e/airflow/models/dagrun.py#L1189
   
   Start the scheduler and clear the above task so it can run again. 
   Now, it'll try to create new TIs and the Scheduler will crash. Switch to 
this PR and try the same again. You will notice that the scheduler survived the 
crash and rollback was successful.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to