Curious to know why you want to run the KubernetesPodOperator when you already have the KubernetesExecutor in prod? Does it solve a different use case that cannot be solved by the KubernetesExecutor?
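For context, here is a minimal sketch of the kind of KubernetesPodOperator task under discussion in the thread below (the dag_id, image, and namespace are illustrative placeholders, not taken from the thread; the import path matches the Airflow 1.10.x contrib layout):

from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

# Illustrative only: a bare-bones DAG that launches one pod per task run.
# The dag_id, image, and namespace are made-up placeholders.
with DAG(
    dag_id="bq_transform_example",
    start_date=datetime(2019, 12, 1),
    schedule_interval="@daily",
) as dag:
    transform = KubernetesPodOperator(
        task_id="transform",
        name="bq-transform",
        namespace="airflow",
        image="example/bq-transform:latest",  # placeholder image
        cmds=["python", "transform.py"],
        is_delete_operator_pod=True,  # clean up the pod once the task finishes
    )

Under the KubernetesExecutor, each such task first gets its own worker pod, which in turn launches the operator's pod (two pods per task run), which is the "too many pods" concern raised below.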
On Thu, Dec 12, 2019 at 11:13 PM Maulik Soneji <maulik.son...@gojek.com> wrote:

> Hi Maxime,
>
> We have been using the KubernetesExecutor for our ETL use cases.
>
> This is a separate instance of Airflow where we have to run an image using
> the KubernetesPodOperator. Having the KubernetesExecutor along with the
> KubernetesPodOperator would create too many pods, and we experienced
> scheduling delays while running tasks with this approach. That is the
> reason we shifted to using the LocalExecutor along with the
> KubernetesPodOperator.
>
> I realize that the LocalExecutor is not the right choice for production
> use cases; we will move to the CeleryExecutor.
>
> Thanks and regards,
> Maulik
>
> On Fri, Dec 13, 2019 at 12:10 PM Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> > Friends don't let friends use the LocalExecutor in production.
> >
> > The LocalExecutor is essentially a subprocess pool running in-process.
> > When I wrote it originally, I never thought it would ever be used in
> > production. Celery / the CeleryExecutor is more reasonable, as Celery is
> > a proper process/thread pool that's configurable and does what you want
> > it to do. It's clearly lacking orchestration features like
> > containerization and resource containment, but it's a decent solution if
> > you don't have a k8s cluster lying around.
> >
> > Max
> >
> > On Thu, Dec 12, 2019 at 10:27 PM Maulik Soneji <maulik.son...@gojek.com>
> > wrote:
> >
> >> Hello,
> >>
> >> I realize that the scheduler is waiting for the tasks to be completed
> >> before shutting down.
> >>
> >> The problem is that the scheduler stops sending heartbeats and just
> >> waits for the task queue to be joined.
> >>
> >> Is there a way we can horizontally scale the number of scheduler
> >> instances, so that if one scheduler instance is waiting for the task
> >> queue to complete, another scheduler instance is running the jobs and
> >> sending heartbeat messages?
> >>
> >> Please share some context on how we can better scale the scheduler to
> >> handle failure cases.
> >>
> >> Thanks and regards,
> >> Maulik
> >>
> >> On Fri, Dec 13, 2019 at 9:18 AM Maulik Soneji <maulik.son...@gojek.com>
> >> wrote:
> >>
> >> > Hello all,
> >> >
> >> > *TLDR*: We are using the LocalExecutor with the KubernetesPodOperator
> >> > for our Airflow DAGs.
> >> > From the scheduler's stack trace we see that it is waiting on the
> >> > queue to join:
> >> >
> >> > File:
> >> > "/usr/local/lib/python3.7/site-packages/airflow/executors/local_executor.py",
> >> > line 212, in end
> >> >     self.queue.join()
> >> >
> >> > Airflow version: 1.10.6
> >> >
> >> > Use case: We are using Airflow for an ETL pipeline that transforms
> >> > data from one BigQuery table to another.
> >> >
> >> > We have noticed that the scheduler frequently hangs without any
> >> > detailed logs.
> >> >
> >> > The last lines are below:
> >> >
> >> > [2019-12-12 13:28:53,370] {settings.py:277} DEBUG - Disposing DB connection pool (PID 912099)
> >> > [2019-12-12 13:28:53,860] {dag_processing.py:693} INFO - Sending termination message to manager.
> >> > [2019-12-12 13:28:53,860] {scheduler_job.py:1492} INFO - Deactivating DAGs that haven't been touched since 2019-12-12T01:28:52.682686+00:00
> >> >
> >> > On checking the thread dumps of the running process, we found that the
> >> > scheduler is stuck waiting to join the tasks in the queue.
> >> >
> >> > File:
> >> > "/usr/local/lib/python3.7/site-packages/airflow/executors/local_executor.py",
> >> > line 212, in end
> >> >     self.queue.join()
> >> >
> >> > A detailed thread dump for all processes running in the local
> >> > scheduler can be found here: https://pastebin.com/i6ChafWH
> >> >
> >> > Please help in checking why the scheduler is getting stuck waiting
> >> > for the queue to join.
> >> >
> >> > Thanks and regards,
> >> > Maulik
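For readers wondering why end() can block forever: the LocalExecutor hands tasks to its workers through a joinable queue, and join() only returns once every dequeued item has been matched by a task_done() call. Below is a self-contained sketch of that semantics (plain queue.Queue and a thread, not Airflow code; the worker and sentinel here are illustrative):

# Not Airflow code: a self-contained illustration of why queue.join() can
# hang. join() returns only after every get() has been matched by a
# task_done() call; if a worker dies or never calls task_done(), the
# joiner blocks indefinitely, which matches the scheduler stack trace.
import queue
import threading

q = queue.Queue()

def well_behaved_worker():
    while True:
        item = q.get()
        if item is None:  # sentinel: stop the worker
            q.task_done()
            return
        print(f"processed {item}")
        q.task_done()  # without this line, q.join() below never returns

t = threading.Thread(target=well_behaved_worker)
t.start()

for i in range(3):
    q.put(i)
q.put(None)

q.join()  # blocks until all four items are marked done
t.join()
print("clean shutdown")

One plausible reading of the pastebin trace above is therefore that a worker process exited without marking its task done, leaving self.queue.join() blocked and the scheduler heartbeat stopped.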
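And since the fix suggested earlier in the thread is moving to the CeleryExecutor, here is an illustrative airflow.cfg excerpt for a 1.10.x deployment (the broker and result-backend URLs are placeholders for your own Redis/database endpoints, not values from the thread):

# airflow.cfg -- illustrative excerpt; the URLs below are placeholders.
[core]
executor = CeleryExecutor

[celery]
# Message broker the workers pull task commands from (e.g. Redis or RabbitMQ).
broker_url = redis://redis:6379/0
# Where task state/results are stored.
result_backend = db+postgresql://airflow:airflow@postgres/airflow
# Size of the process pool on each worker, analogous to LocalExecutor parallelism.
worker_concurrency = 16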