ramziyassine opened a new issue, #27622: URL: https://github.com/apache/airflow/issues/27622
### Apache Airflow version Other Airflow 2 version (please specify below) ### What happened ### Deployment * Airflow Version: 2.4.1 * Infrastructure: AWS ECS * Number of DAG: 162 ``` Version: [v2.4.1](https://pypi.python.org/pypi/apache-airflow/2.4.1) Git Version: .release:2.4.1+7b979def75923ba28dd64e31e613043d29f34fce ``` ### The issue We have seen this issue when the Scheduler is trying to schedule **too many DAG (140+)** around the same time ``` [2022-11-11T00:15:00.311+0000] {{dagbag.py:196}} WARNING - Serialized DAG mongodb-assistedbreakdown-jobs-processes no longer exists [2022-11-11T00:15:00.312+0000] {{scheduler_job.py:763}} ERROR - Exception when executing SchedulerJob._run_scheduler_loop Traceback (most recent call last): File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 746, in _execute self._run_scheduler_loop() File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 866, in _run_scheduler_loop num_queued_tis = self._do_scheduling(session) File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 948, in _do_scheduling callback_to_run = self._schedule_dag_run(dag_run, session) File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1292, in _schedule_dag_run self._verify_integrity_if_dag_changed(dag_run=dag_run, session=session) File "/home/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1321, in _verify_integrity_if_dag_changed dag_run.verify_integrity(session=session) File "/home/airflow/.local/lib/python3.7/site-packages/airflow/utils/session.py", line 72, in wrapper return func(*args, **kwargs) File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/dagrun.py", line 874, in verify_integrity dag = self.get_dag() File "/home/airflow/.local/lib/python3.7/site-packages/airflow/models/dagrun.py", line 484, in get_dag raise AirflowException(f"The DAG (.dag) for {self} needs to be set") airflow.exceptions.AirflowException: The DAG (.dag) for <DagRun mongodb-assistedbreakdown-jobs-processes @ 2022-11-10 00:10:00+00:00: scheduled__2022-11-10T00:10:00+00:00, state:running, queued_at: 2022-11-11 00:10:09.363852+00:00. externally triggered: False> needs to be set ``` Main Cause ``` raise AirflowException(f"The DAG (.dag) for {self} needs to be set") ``` [We believe this is happening here, airflow github](https://github.com/apache/airflow/blob/7b979def75923ba28dd64e31e613043d29f34fce/airflow/jobs/scheduler_job.py#L1318) We saw a large amount of Connection hitting our airflow Database, but CPU was around 60%. Is there any workaround or configuration that can help the scheduler not crash when this happen? ### What you think should happen instead Can the scheduler be safe, or when it come back to reschedule the dags that got stuck ### How to reproduce _No response_ ### Operating System Amazon Linux 2, Fargate deployment using the airflow Image ### Versions of Apache Airflow Providers _No response_ ### Deployment Other ### Deployment details AWS ECS Fargate ### Anything else _No response_ ### Are you willing to submit PR? - [ ] Yes I am willing to submit a PR! ### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
