Hi all, I use Airflow v1.7.1.3 with the local scheduler and I have run into a problem with the scheduler: for some reason the Airflow database is no longer accessible, so the scheduler displays the OperationalError below. My problem is that the scheduler does not exit after this error; it keeps running but no longer runs any DAGs. I cannot restart it automatically with Supervisor because its process is always reported as running. Each time I have a network error, Airflow displays this error and enters this "zombie" mode, and my DAGs are not processed.
Have you heard about this problem? Any suggestions?

29/09/2016 21:09:53 Traceback (most recent call last):
29/09/2016 21:09:53   File "/usr/bin/airflow", line 15, in <module>
29/09/2016 21:09:53     args.func(args)
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/airflow/bin/cli.py", line 455, in scheduler
29/09/2016 21:09:53     job.run()
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 173, in run
29/09/2016 21:09:53     self._execute()
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 712, in _execute
29/09/2016 21:09:53     paused_dag_ids = dagbag.paused_dags()
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/airflow/models.py", line 429, in paused_dags
29/09/2016 21:09:53     DagModel.is_paused == True)]
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2761, in __iter__
29/09/2016 21:09:53     return self._execute_and_instances(context)
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2774, in _execute_and_instances
29/09/2016 21:09:53     close_with_result=True)
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2765, in _connection_from_session
29/09/2016 21:09:53     **kw)
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 893, in connection
29/09/2016 21:09:53     execution_options=execution_options)
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 898, in _connection_for_bind
29/09/2016 21:09:53     engine, execution_options)
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 334, in _connection_for_bind
29/09/2016 21:09:53     conn = bind.contextual_connect()
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2039, in contextual_connect
29/09/2016 21:09:53     self._wrap_pool_connect(self.pool.connect, None),
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2078, in _wrap_pool_connect
29/09/2016 21:09:53     e, dialect, self)
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1405, in _handle_dbapi_exception_noconnection
29/09/2016 21:09:53     exc_info
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 202, in raise_from_cause
29/09/2016 21:09:53     reraise(type(exception), exception, tb=exc_tb, cause=cause)
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2074, in _wrap_pool_connect
29/09/2016 21:09:53     return fn()
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/pool.py", line 376, in connect
29/09/2016 21:09:53     return _ConnectionFairy._checkout(self)
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/pool.py", line 713, in _checkout
29/09/2016 21:09:53     fairy = _ConnectionRecord.checkout(pool)
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/pool.py", line 485, in checkout
29/09/2016 21:09:53     rec.checkin()
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
29/09/2016 21:09:53     compat.reraise(exc_type, exc_value, exc_tb)
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/pool.py", line 482, in checkout
29/09/2016 21:09:53     dbapi_connection = rec.get_connection()
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/pool.py", line 563, in get_connection
29/09/2016 21:09:53     self.connection = self.__connect()
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/pool.py", line 607, in __connect
29/09/2016 21:09:53     connection = self.__pool._invoke_creator(self)
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 97, in connect
29/09/2016 21:09:53     return dialect.connect(*cargs, **cparams)
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 385, in connect
29/09/2016 21:09:53     return self.dbapi.connect(*cargs, **cparams)
29/09/2016 21:09:53   File "/usr/lib/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
29/09/2016 21:09:53     conn = _connect(dsn, connection_factory=connection_factory, async=async)
29/09/2016 21:09:53 sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not translate host name "db-airflow" to address: Name does not resolve
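
To make the failure condition concrete, here is a minimal sketch of an external check for the last line of the traceback, i.e. whether the database host name resolves at all. It is not part of Airflow; the function name `db_host_resolves`, the default port 5432, and the injectable `resolver` parameter are my own assumptions for illustration. Only the host name "db-airflow" comes from the error message above.

```python
import socket
import sys


def db_host_resolves(host, port=5432, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to an address, False otherwise.

    `resolver` is injectable only so the check can be exercised
    without a real DNS lookup; by default it is socket.getaddrinfo.
    """
    try:
        resolver(host, port)
        return True
    except socket.gaierror:
        # Same class of failure as "could not translate host name ... to address".
        return False


def main(argv):
    # Hypothetical command-line use: python check_db_host.py db-airflow
    host = argv[1] if len(argv) > 1 else "db-airflow"
    ok = db_host_resolves(host)
    print("%s resolves: %s" % (host, ok))
    # A non-zero exit code lets an outside watchdog notice the failure.
    return 0 if ok else 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

A script like this only detects the network condition from outside; it does not by itself tell whether the scheduler is still processing DAGs, so it would be one input to a watchdog rather than a fix.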