ephraimbuddy opened a new pull request, #25532:
URL: https://github.com/apache/airflow/pull/25532
I'm not sure what's happening here but moving the session.rollback closer to
the error stopped scheduler from crashing when there is an integrity error. I
couldn't reproduce this in tests but can reproduce it consistently in airflow
Here's how to reproduce the scheduler crash:
Run this dag in main:
```python
from datetime import datetime
from airflow import DAG
from airflow.decorators import task
with DAG(dag_id='mvp_map_task_bug', start_date=datetime(2022, 1, 1),
schedule_interval='@daily', catchup=False) as dag:
@task
def get_files():
return [1,2,3,4,5,6]
@task
def download_files(file: str):
print(f"{file}")
files = download_files.expand(file=get_files())
```
Stop the scheduler
Reduce the list in the `get_files` to something like [1,2,3]
Change this `difference` to `symmetric_difference`:
https://github.com/apache/airflow/blob/171aaf017aee068d8e1b76121c8c75310c854d9e/airflow/models/dagrun.py#L1189
Start the scheduler and clear the above task so it can run again.
Now, it'll try to create new TIs and the Scheduler will crash. Switch to
this PR and try the same again. You will notice that the scheduler survived the
crash and rollback was successful.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]