shivanshs9 commented on issue #11899:
URL: https://github.com/apache/airflow/issues/11899#issuecomment-735769434
@ashb Sorry for the delayed response. Unfortunately, the issue is still
occurring.
<details>
<summary>Scheduler logs</summary>
```
[2020-11-30 12:11:01,752] {{scheduler_job.py:1301}} ERROR - Exception when
executing SchedulerJob._run_scheduler_loop
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py",
line 1277, in _execute_context
self.dialect.do_execute(
File
"/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py",
line 593, in do_execute
cursor.execute(statement, parameters)
File
"/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line
255, in execute
self.errorhandler(self, exc, value)
File
"/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line
50, in defaulterrorhandler
raise errorvalue
File
"/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line
252, in execute
res = self._query(query)
File
"/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line
378, in _query
db.query(q)
File
"/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line
280, in query
_mysql.connection.query(self, query)
_mysql_exceptions.OperationalError: (1213, 'Deadlock found when trying to
get lock; try restarting transaction')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py",
line 1283, in _execute
self._run_scheduler_loop()
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py",
line 1385, in _run_scheduler_loop
num_queued_tis = self._do_scheduling(session)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py",
line 1543, in _do_scheduling
num_queued_tis =
self._critical_section_execute_task_instances(session=session)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py",
line 1140, in _critical_section_execute_task_instances
queued_tis = self._executable_task_instances_to_queued(max_tis,
session=session)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/utils/session.py",
line 59, in wrapper
return func(*args, **kwargs)
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/jobs/scheduler_job.py",
line 932, in _executable_task_instances_to_queued
task_instances_to_examine: List[TI] = with_row_locks(
File
"/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py",
line 3341, in all
return list(self)
File
"/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py",
line 3503, in __iter__
return self._execute_and_instances(context)
File
"/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/orm/query.py",
line 3528, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File
"/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py",
line 1014, in execute
return meth(self, multiparams, params)
File
"/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py",
line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File
"/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py",
line 1127, in _execute_clauseelement
ret = self._execute_context(
File
"/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py",
line 1317, in _execute_context
self._handle_dbapi_exception(
File
"/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py",
line 1511, in _handle_dbapi_exception
util.raise_(
File
"/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/util/compat.py",
line 178, in raise_
raise exception
File
"/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/base.py",
line 1277, in _execute_context
self.dialect.do_execute(
File
"/home/airflow/.local/lib/python3.8/site-packages/sqlalchemy/engine/default.py",
line 593, in do_execute
cursor.execute(statement, parameters)
File
"/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line
255, in execute
self.errorhandler(self, exc, value)
File
"/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line
50, in defaulterrorhandler
raise errorvalue
File
"/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line
252, in execute
res = self._query(query)
File
"/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line
378, in _query
db.query(q)
File
"/home/airflow/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line
280, in query
_mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1213,
'Deadlock found when trying to get lock; try restarting transaction')
[SQL: SELECT task_instance.try_number AS task_instance_try_number,
task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS
task_instance_dag_id, task_instance.execution_date AS
task_instance_execution_date, task_instance.start_date AS
task_instance_start_date, task_instance.end_date AS task_instance_end_date,
task_instance.duration AS task_instance_duration, task_instance.state AS
task_instance_state, task_instance.max_tries AS task_instance_max_tries,
task_instance.hostname AS task_instance_hostname, task_instance.unixname AS
task_instance_unixname, task_instance.job_id AS task_instance_job_id,
task_instance.pool AS task_instance_pool, task_instance.pool_slots AS
task_instance_pool_slots, task_instance.queue AS task_instance_queue,
task_instance.priority_weight AS task_instance_priority_weight,
task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS
task_instance_queued_dttm, task_instance.queued_by_job_id AS
task_instance_queued_by_job_id,
task_instance.pid AS task_instance_pid, task_instance.executor_config AS
task_instance_executor_config, task_instance.external_executor_id AS
task_instance_external_executor_id
FROM task_instance LEFT OUTER JOIN dag_run ON task_instance.dag_id =
dag_run.dag_id AND task_instance.execution_date = dag_run.execution_date INNER
JOIN dag ON task_instance.dag_id = dag.dag_id
WHERE (dag_run.run_id IS NULL OR dag_run.run_type != %s) AND dag.is_paused =
0 AND task_instance.state = %s
LIMIT %s FOR UPDATE]
[parameters: (<DagRunType.BACKFILL_JOB: 'backfill'>, 'scheduled', 29)]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2020-11-30 12:11:02,774] {{process_utils.py:95}} INFO - Sending
Signals.SIGTERM to GPID 50
[2020-11-30 12:11:12,964] {{process_utils.py:198}} INFO - Terminating child
PID: 335
[2020-11-30 12:11:12,964] {{process_utils.py:198}} INFO - Terminating child
PID: 336
[2020-11-30 12:11:12,964] {{process_utils.py:201}} INFO - Waiting up to 5
seconds for processes to exit...
[2020-11-30 12:11:17,974] {{process_utils.py:214}} INFO - SIGKILL processes
that did not terminate gracefully
[2020-11-30 12:11:17,975] {{process_utils.py:216}} INFO - Killing child PID:
335
[2020-11-30 12:11:17,979] {{process_utils.py:216}} INFO - Killing child PID:
336
[2020-11-30 12:11:18,015] {{process_utils.py:61}} INFO - Process
psutil.Process(pid=335, status='terminated', started='12:10:59') (335)
terminated with exit code None
[2020-11-30 12:11:18,440] {{process_utils.py:61}} INFO - Process
psutil.Process(pid=336, status='terminated', started='12:11:00') (336)
terminated with exit code None
[2020-11-30 12:12:02,785] {{process_utils.py:108}} WARNING - process
psutil.Process(pid=334, name='airflow schedul', status='sleeping',
started='12:10:59') did not respond to SIGTERM. Trying SIGKILL
[2020-11-30 12:12:02,786] {{process_utils.py:108}} WARNING - process
psutil.Process(pid=50, name='airflow scheduler -- DagFileProcessorManager',
status='sleeping', started='12:09:58') did not respond to SIGTERM. Trying
SIGKILL
[2020-11-30 12:12:02,787] {{process_utils.py:108}} WARNING - process
psutil.Process(pid=331, name='airflow schedul', status='sleeping',
started='12:10:58') did not respond to SIGTERM. Trying SIGKILL
[2020-11-30 12:12:02,801] {{process_utils.py:61}} INFO - Process
psutil.Process(pid=334, name='airflow schedul', status='terminated',
started='12:10:59') (334) terminated with exit code None
[2020-11-30 12:12:02,801] {{process_utils.py:61}} INFO - Process
psutil.Process(pid=50, name='airflow scheduler -- DagFileProcessorManager',
status='terminated', exitcode=<Negsignal.SIGKILL: -9>, started='12:09:58') (50)
terminated with exit code Negsignal.SIGKILL
[2020-11-30 12:12:02,802] {{process_utils.py:61}} INFO - Process
psutil.Process(pid=331, name='airflow schedul', status='terminated',
started='12:10:58') (331) terminated with exit code None
[2020-11-30 12:12:02,802] {{scheduler_job.py:1304}} INFO - Exited execute
loop
```
</details>
Airflow version:
```
airflow@ergo-chronos-scheduler-695d46c8d6-qgnvv:/opt/airflow$ airflow version
[2020-11-30 12:52:49,519] {{plugins_manager.py:283}} INFO - Loading 2
plugin(s) took 0.86 seconds
2.0.0b3
```
Weirdly, I think the scheduler process is being terminated (as shown in the
logs), but that doesn't crash the enclosing pod. So the container is never
restarted, and the scheduler stays down indefinitely.
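For what it's worth, a liveness check along these lines might catch this state, since the container would be reported unhealthy once the scheduler stops making progress. This is only a rough sketch: `scheduler_is_healthy`, `probe_exit_code` and the 60-second threshold are hypothetical names and values, and where the heartbeat timestamp comes from is left open. The timestamps in the example are taken from the logs above (the loop exited at 12:12:02, and the container was still up at 12:52:49).

```python
from datetime import datetime, timedelta
from typing import Optional


def scheduler_is_healthy(
    latest_heartbeat: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(seconds=60),  # assumed threshold
) -> bool:
    """Return True only if a heartbeat exists and is no older than max_age."""
    if latest_heartbeat is None:
        return False
    return now - latest_heartbeat <= max_age


def probe_exit_code(
    latest_heartbeat: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(seconds=60),
) -> int:
    """Map health to a process exit code: 0 means healthy, 1 means unhealthy."""
    return 0 if scheduler_is_healthy(latest_heartbeat, now, max_age) else 1


if __name__ == "__main__":
    # Last sign of life from the scheduler loop vs. the time of the later check.
    last_seen = datetime(2020, 11, 30, 12, 12, 2)
    checked_at = datetime(2020, 11, 30, 12, 52, 49)
    print(probe_exit_code(last_seen, checked_at))  # prints 1 (unhealthy)
```

A probe that exits non-zero like this would let the orchestrator restart the container instead of leaving it idle.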