Hi Maxime,

We have been using KubernetesExecutor for our ETL use cases. This is a
separate instance of Airflow where we have to run an image using
KubernetesPodOperator; having KubernetesExecutor along with
KubernetesPodOperator would create too many pods, and we experienced
scheduling delays while running tasks with this approach. That is the
reason why we shifted to using LocalExecutor along with
KubernetesPodOperator. I realize that LocalExecutor is not the right
choice for production use cases, so we will move to CeleryExecutor.

Thanks and regards,
Maulik

On Fri, Dec 13, 2019 at 12:10 PM Maxime Beauchemin <
maximebeauche...@gmail.com> wrote:

> Friends don't let friends use LocalExecutor in production.
>
> LocalExecutor is essentially a subprocess pool running in-process. When I
> wrote it originally I never thought it would ever be used in production.
> Celery / CeleryExecutor is more reasonable as Celery is a proper
> process/thread pool that's configurable and does what you want it to do.
> It's clearly lacking orchestration features like containerization and
> resource containment, but it's a decent solution if you don't have a k8s
> cluster lying around.
>
> Max
>
> On Thu, Dec 12, 2019 at 10:27 PM Maulik Soneji <maulik.son...@gojek.com>
> wrote:
>
>> Hello,
>>
>> I realize that the scheduler is waiting for the tasks to be completed
>> before shutting down.
>>
>> The problem is that the scheduler stops sending heartbeats and just
>> waits for the task queue to be joined.
>>
>> Is there a way we can horizontally scale the number of scheduler
>> instances, so that if one scheduler instance is waiting for the task
>> queue to complete, another instance keeps running the jobs and sending
>> heartbeat messages?
>>
>> Please share some context on how we can better scale the scheduler to
>> handle such failure cases.
>>
>> Thanks and regards,
>> Maulik
>>
>> On Fri, Dec 13, 2019 at 9:18 AM Maulik Soneji <maulik.son...@gojek.com>
>> wrote:
>>
>> > Hello all,
>> >
>> > *TLDR*: We are using LocalExecutor with KubernetesPodOperator for our
>> > Airflow DAGs. From a stack trace of the scheduler we see that it is
>> > waiting on the queue to join:
>> >
>> >   File: "/usr/local/lib/python3.7/site-packages/airflow/executors/local_executor.py", line 212, in end
>> >     self.queue.join()
>> >
>> > Airflow version: 1.10.6
>> >
>> > Use case: We are using Airflow for an ETL pipeline that transforms
>> > data from one BigQuery table to another BigQuery table.
>> >
>> > We have noticed that the scheduler frequently hangs without any
>> > detailed logs. The last lines are:
>> >
>> > [2019-12-12 13:28:53,370] {settings.py:277} DEBUG - Disposing DB connection pool (PID 912099)
>> > [2019-12-12 13:28:53,860] {dag_processing.py:693} INFO - Sending termination message to manager.
>> > [2019-12-12 13:28:53,860] {scheduler_job.py:1492} INFO - Deactivating DAGs that haven't been touched since 2019-12-12T01:28:52.682686+00:00
>> >
>> > On checking the thread dumps of the running process we found that the
>> > scheduler is stuck waiting to join the tasks in the queue:
>> >
>> >   File: "/usr/local/lib/python3.7/site-packages/airflow/executors/local_executor.py", line 212, in end
>> >     self.queue.join()
>> >
>> > A detailed thread dump for all processes running in the local
>> > scheduler can be found here: https://pastebin.com/i6ChafWH
>> >
>> > Please help in checking why the scheduler is getting stuck waiting for
>> > the queue to join.
>> >
>> > Thanks and Regards,
>> > Maulik
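For reference, a minimal sketch of the setup discussed above: the scheduler
runs under LocalExecutor (or, later, CeleryExecutor), and each task simply
launches a pod through KubernetesPodOperator. The DAG id, namespace, image,
and entrypoint below are placeholders, not the actual pipeline; the import
path is the Airflow 1.10.x contrib location.

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

    # Placeholder DAG: one pod per run performing a BigQuery-to-BigQuery transform.
    dag = DAG(
        dag_id="bq_transform",                 # hypothetical name
        start_date=datetime(2019, 12, 1),
        schedule_interval="@daily",
    )

    transform = KubernetesPodOperator(
        task_id="transform",
        name="bq-transform",
        namespace="default",                   # placeholder namespace
        image="example/bq-transform:latest",   # placeholder image
        cmds=["python", "transform.py"],       # placeholder entrypoint
        get_logs=True,
        is_delete_operator_pod=True,           # clean up finished pods
        dag=dag,
    )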
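As for why the scheduler appears to hang at local_executor.py line 212: in
Airflow 1.10.x the LocalExecutor's end() joins its task queue, and that join
only returns after task_done() has been called for every item that was put on
the queue. A long-running or stuck task therefore keeps the scheduler blocked
in end(), which matches the stack trace above. A rough standalone illustration
of that blocking behaviour (not Airflow's actual code):

    from multiprocessing import JoinableQueue, Process

    def worker(q):
        q.get()
        # If the work here never finishes, task_done() is never called
        # and the join() below blocks forever.
        q.task_done()

    if __name__ == "__main__":
        q = JoinableQueue()
        q.put("some_long_running_task")
        Process(target=worker, args=(q,)).start()
        q.join()   # returns only once every queued item has been marked done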