Curious to know why you want to run the KubernetesPodOperator when you already have the KubernetesExecutor in prod? Does it solve a different use case that cannot be solved by the KubernetesExecutor?
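For context, here is a minimal sketch of the kind of KubernetesPodOperator task under discussion in the thread below (the dag_id, image, and namespace are illustrative placeholders, not taken from the thread; the import path matches the Airflow 1.10.x contrib layout):

from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

# Illustrative only: a bare-bones DAG that launches one pod per task run.
# The dag_id, image, and namespace are made-up placeholders.
with DAG(
    dag_id="bq_transform_example",
    start_date=datetime(2019, 12, 1),
    schedule_interval="@daily",
) as dag:
    transform = KubernetesPodOperator(
        task_id="transform",
        name="bq-transform",
        namespace="airflow",
        image="example/bq-transform:latest",  # placeholder image
        cmds=["python", "transform.py"],
        is_delete_operator_pod=True,  # clean up the pod once the task finishes
    )

Under the KubernetesExecutor, each such task first gets its own worker pod, which in turn launches the operator's pod (two pods per task run), which is the "too many pods" concern raised below.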
On Thu, Dec 12, 2019 at 11:13 PM Maulik Soneji <maulik.son...@gojek.com> wrote:

> Hi Maxime,
>
> We have been using the KubernetesExecutor for our ETL use cases.
>
> This is a separate instance of Airflow where we have to run an image using
> the KubernetesPodOperator. Having the KubernetesExecutor along with the
> KubernetesPodOperator would create too many pods, and we experienced
> scheduling delays while running tasks with this approach. That is the
> reason we shifted to using the LocalExecutor along with the
> KubernetesPodOperator.
>
> I realize that the LocalExecutor is not the right choice for production
> use cases; we will move to the CeleryExecutor.
>
> Thanks and regards,
> Maulik
>
> On Fri, Dec 13, 2019 at 12:10 PM Maxime Beauchemin <
> maximebeauche...@gmail.com> wrote:
>
> > Friends don't let friends use the LocalExecutor in production.
> >
> > The LocalExecutor is essentially a subprocess pool running in-process.
> > When I wrote it originally, I never thought it would ever be used in
> > production. Celery / the CeleryExecutor is more reasonable, as Celery is
> > a proper process/thread pool that's configurable and does what you want
> > it to do. It's clearly lacking orchestration features like
> > containerization and resource containment, but it's a decent solution if
> > you don't have a k8s cluster lying around.
> >
> > Max
> >
> > On Thu, Dec 12, 2019 at 10:27 PM Maulik Soneji <maulik.son...@gojek.com>
> > wrote:
> >
> >> Hello,
> >>
> >> I realize that the scheduler is waiting for the tasks to be completed
> >> before shutting down.
> >>
> >> The problem is that the scheduler stops sending heartbeats and just
> >> waits for the task queue to be joined.
> >>
> >> Is there a way we can horizontally scale the number of scheduler
> >> instances, so that if one scheduler instance is waiting for the task
> >> queue to complete, another scheduler instance is running the jobs and
> >> sending heartbeat messages?
> >>
> >> Please share some context on how we can better scale the scheduler to
> >> handle failure cases.
> >>
> >> Thanks and regards,
> >> Maulik
> >>
> >> On Fri, Dec 13, 2019 at 9:18 AM Maulik Soneji <maulik.son...@gojek.com>
> >> wrote:
> >>
> >> > Hello all,
> >> >
> >> > *TLDR*: We are using the LocalExecutor with the KubernetesPodOperator
> >> > for our Airflow DAGs.
> >> > From the scheduler's stack trace we see that it is waiting on the
> >> > queue to join:
> >> >
> >> > File:
> >> > "/usr/local/lib/python3.7/site-packages/airflow/executors/local_executor.py",
> >> > line 212, in end
> >> >     self.queue.join()
> >> >
> >> > Airflow version: 1.10.6
> >> >
> >> > Use case: We are using Airflow for an ETL pipeline that transforms
> >> > data from one BigQuery table to another.
> >> >
> >> > We have noticed that the scheduler frequently hangs without any
> >> > detailed logs.
> >> >
> >> > The last lines are below:
> >> >
> >> > [2019-12-12 13:28:53,370] {settings.py:277} DEBUG - Disposing DB connection pool (PID 912099)
> >> > [2019-12-12 13:28:53,860] {dag_processing.py:693} INFO - Sending termination message to manager.
> >> > [2019-12-12 13:28:53,860] {scheduler_job.py:1492} INFO - Deactivating DAGs that haven't been touched since 2019-12-12T01:28:52.682686+00:00
> >> >
> >> > On checking the thread dumps of the running process, we found that the
> >> > scheduler is stuck waiting to join the tasks in the queue.
> >> >
> >> > File:
> >> > "/usr/local/lib/python3.7/site-packages/airflow/executors/local_executor.py",
> >> > line 212, in end
> >> >     self.queue.join()
> >> >
> >> > A detailed thread dump for all processes running in the local
> >> > scheduler can be found here: https://pastebin.com/i6ChafWH
> >> >
> >> > Please help in checking why the scheduler is getting stuck waiting
> >> > for the queue to join.
> >> >
> >> > Thanks and regards,
> >> > Maulik
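For readers wondering why end() can block forever: the LocalExecutor hands tasks to its workers through a joinable queue, and join() only returns once every dequeued item has been matched by a task_done() call. Below is a self-contained sketch of that semantics (plain queue.Queue and a thread, not Airflow code; the worker and sentinel here are illustrative):

# Not Airflow code: a self-contained illustration of why queue.join() can
# hang. join() returns only after every get() has been matched by a
# task_done() call; if a worker dies or never calls task_done(), the
# joiner blocks indefinitely, which matches the scheduler stack trace.
import queue
import threading

q = queue.Queue()

def well_behaved_worker():
    while True:
        item = q.get()
        if item is None:  # sentinel: stop the worker
            q.task_done()
            return
        print(f"processed {item}")
        q.task_done()  # without this line, q.join() below never returns

t = threading.Thread(target=well_behaved_worker)
t.start()

for i in range(3):
    q.put(i)
q.put(None)

q.join()  # blocks until all four items are marked done
t.join()
print("clean shutdown")

One plausible reading of the pastebin trace above is therefore that a worker process exited without marking its task done, leaving self.queue.join() blocked and the scheduler heartbeat stopped.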
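And since the fix suggested earlier in the thread is moving to the CeleryExecutor, here is an illustrative airflow.cfg excerpt for a 1.10.x deployment (the broker and result-backend URLs are placeholders for your own Redis/database endpoints, not values from the thread):

# airflow.cfg -- illustrative excerpt; the URLs below are placeholders.
[core]
executor = CeleryExecutor

[celery]
# Message broker the workers pull task commands from (e.g. Redis or RabbitMQ).
broker_url = redis://redis:6379/0
# Where task state/results are stored.
result_backend = db+postgresql://airflow:airflow@postgres/airflow
# Size of the process pool on each worker, analogous to LocalExecutor parallelism.
worker_concurrency = 16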