Friends don't let friends use LocalExecutor in production. LocalExecutor is essentially a subprocess pool running in-process. When I wrote it originally I never thought it would be used in production. Celery / CeleryExecutor is more reasonable, as Celery is a proper process/thread pool that's configurable and does what you want it to do. It's clearly lacking orchestration features like containerization and resource containment, but it's a decent solution if you don't have a k8s cluster lying around.
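For anyone wanting to make that switch, the executor is a single setting in airflow.cfg. A minimal sketch, assuming Airflow 1.10.x with Redis as the Celery broker and Postgres as the result backend (the connection URLs are placeholders, not the poster's actual setup):

```ini
[core]
# Swap LocalExecutor for CeleryExecutor; tasks then run on separate
# worker machines instead of as subprocesses of the scheduler.
executor = CeleryExecutor

[celery]
# Placeholder connection strings -- point these at your own broker/backend.
broker_url = redis://redis:6379/0
result_backend = db+postgresql://airflow:airflow@postgres/airflow
worker_concurrency = 16
```

Workers are then started on each machine with the 1.10.x CLI: `airflow worker`.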
Max

On Thu, Dec 12, 2019 at 10:27 PM Maulik Soneji <maulik.son...@gojek.com> wrote:

> Hello,
>
> I realize that the scheduler is waiting for the tasks to be completed
> before shutting down.
>
> The problem is that the scheduler stops sending heartbeats and just
> waits for the task queue to be joined.
>
> Is there a way we can horizontally scale the number of scheduler
> instances, so that if one scheduler instance is waiting for the task
> queue to complete, another scheduler instance is running the jobs and
> sending heartbeat messages?
>
> Please share some context on how we can better scale the scheduler to
> handle failure cases.
>
> Thanks and regards,
> Maulik
>
> On Fri, Dec 13, 2019 at 9:18 AM Maulik Soneji <maulik.son...@gojek.com> wrote:
>
> > Hello all,
> >
> > *TLDR*: We are using the local executor with KubernetesPodOperator for
> > our airflow dags. From the scheduler's stack trace we see that it is
> > waiting on the queue to join:
> >
> >     File: "/usr/local/lib/python3.7/site-packages/airflow/executors/local_executor.py", line 212, in end
> >       self.queue.join()
> >
> > Airflow version: 1.10.6
> >
> > Use case: We are using airflow for an ETL pipeline that transforms
> > data from one bigquery table to another bigquery table.
> >
> > We have noticed that the scheduler frequently hangs without any
> > detailed logs. The last lines are:
> >
> >     [2019-12-12 13:28:53,370] {settings.py:277} DEBUG - Disposing DB connection pool (PID 912099)
> >     [2019-12-12 13:28:53,860] {dag_processing.py:693} INFO - Sending termination message to manager.
> >     [2019-12-12 13:28:53,860] {scheduler_job.py:1492} INFO - Deactivating DAGs that haven't been touched since 2019-12-12T01:28:52.682686+00:00
> >
> > On checking the thread dumps of the running process we found that the
> > scheduler is stuck waiting to join the tasks in the queue:
> >
> >     File: "/usr/local/lib/python3.7/site-packages/airflow/executors/local_executor.py", line 212, in end
> >       self.queue.join()
> >
> > A detailed thread dump for all processes running in the local
> > scheduler can be found here: https://pastebin.com/i6ChafWH
> >
> > Please help in checking why the scheduler is getting stuck waiting
> > for the queue to join.
> >
> > Thanks and Regards,
> > Maulik
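The hang reported above comes down to `Queue.join()` semantics: it blocks until every item that was `put()` has been matched by a `task_done()` call, so if any task in the executor's queue never reaches that point, `end()` waits forever. A minimal sketch with the plain stdlib `queue` module (not Airflow code) showing the contract that makes `join()` return:

```python
import queue
import threading

q = queue.Queue()
for i in range(3):
    q.put(i)

results = []

def worker():
    while True:
        item = q.get()
        results.append(item * 2)
        # Without this call, the unfinished-task counter never reaches
        # zero and q.join() below would block forever -- the same shape
        # as the scheduler stuck in LocalExecutor.end().
        q.task_done()

threading.Thread(target=worker, daemon=True).start()

q.join()  # returns only once every put() has a matching task_done()
print(results)  # -> [0, 2, 4]
```

If a worker subprocess dies between `get()` and `task_done()`, the counter stays positive and the joiner hangs with no log output, which matches the symptom in the thread dump.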