The reason we didn't do it this way is that we would have needed to copy all of the logic in the k8sPodOperator into the executor. This would have added a lot more code and complexity to the executor.
While it does look silly from an external POV, it makes sense from a separation-of-concerns POV. The executor shouldn't care about what it's executing. It passes the task off to a worker, and the worker does the task it's been given.

On Wed, Apr 17, 2019 at 2:50 AM Emmanuel Brard <emmanuel.br...@getyourguide.com> wrote:

> Hey,
>
> That's an interesting example: you use the KubernetesExecutor along with
> the PodOperator, so basically a pod will be started and will poke the
> execution of another pod... it looks silly to me; monitoring the pod from
> the PodOperator should be done by the scheduler directly when using the
> KubernetesExecutor.
>
> Don't you think?
>
> E
>
> On Tue, Apr 16, 2019 at 10:13 PM Kamil Gałuszka <ka...@flyrlabs.com> wrote:
>
> > Hi Daniel,
> >
> > we will definitely try this out. We are still on 1.10.2, so we will do
> > the upgrade and see how it goes.
> >
> > Thanks
> > Kamil
> >
> > On Tue, Apr 16, 2019 at 9:17 PM Daniel Imberman <dimberman.opensou...@gmail.com> wrote:
> >
> > > Hi Kamil,
> > >
> > > So if it's airflow that's the issue, then the PR I posted should have
> > > solved it. Have you upgraded to 1.10.3 and added the
> > > worker_pods_creation_batch_size variable to your airflow.cfg? This
> > > should allow multiple pods to be launched in parallel.
> > >
> > > Also, unfortunately the screenshot appears to be broken. Could you
> > > please upload it as an imgur link?
> > >
> > > On Tue, Apr 16, 2019 at 10:50 AM Kamil Gałuszka <ka...@flyrlabs.com> wrote:
> > >
> > > > Hi Daniel,
> > > >
> > > > It's airflow.
> > > >
> > > > This is a DAG that we could show. Of course, we can change this to
> > > > KubernetesPodOperator, and this gets even worse.
> > > >
> > > > ```
> > > > from airflow import DAG
> > > > from datetime import datetime, timedelta
> > > > from airflow.operators.dummy_operator import DummyOperator
> > > > from airflow.operators.bash_operator import BashOperator
> > > >
> > > > default_args = {
> > > >     'owner': 'airflow',
> > > >     'depends_on_past': False,
> > > >     'start_date': datetime(2019, 4, 10, 23),
> > > >     'email': ['airf...@example.com'],
> > > >     'email_on_failure': False,
> > > >     'email_on_retry': False,
> > > >     'retries': 5,
> > > >     'retry_delay': timedelta(minutes=2)
> > > > }
> > > >
> > > > SLEEP_DURATION = 300
> > > >
> > > > dag = DAG(
> > > >     'wide_bash_test_100_300', default_args=default_args,
> > > >     schedule_interval=timedelta(minutes=60), max_active_runs=1,
> > > >     concurrency=100)
> > > >
> > > > start = DummyOperator(task_id='dummy_start_point', dag=dag)
> > > > end = DummyOperator(task_id='dummy_end_point', dag=dag)
> > > >
> > > > for i in range(300):
> > > >     sleeper_agent = BashOperator(task_id=f'sleep_{i}',
> > > >                                  bash_command=f'sleep {SLEEP_DURATION}',
> > > >                                  dag=dag)
> > > >     start >> sleeper_agent >> end
> > > > ```
> > > >
> > > > And here is some of what happens afterwards: some tasks failed, some
> > > > are retrying, and we definitely don't get 100 concurrency on it. We
> > > > have autoscaling of nodes in GKE, so after some time every pod should
> > > > move from Pending to Running. With 300 concurrency this gets a little
> > > > worse.
> > > >
> > > > [image: Screen Shot 2019-04-16 at 10.30.18 AM (1).png]
> > > >
> > > > Thanks
> > > > Kamil
> > > >
> > > > On Tue, Apr 16, 2019 at 6:40 PM Daniel Imberman <dimberman.opensou...@gmail.com> wrote:
> > > >
> > > > > Hi Kamil,
> > > > >
> > > > > Could you explain your use case a little further? Is it that your
> > > > > k8s cluster runs into issues launching 250 tasks at the same time,
> > > > > or that airflow runs into issues launching 250 tasks at the same
> > > > > time?
> > > > > I'd love to know more so I could try to address it in a future
> > > > > airflow release.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Daniel
> > > > >
> > > > > On Tue, Apr 16, 2019 at 3:32 AM Kamil Gałuszka <ka...@flyrlabs.com> wrote:
> > > > >
> > > > > > Hey,
> > > > > >
> > > > > > > We are quite interested in that Executor too, but my main
> > > > > > > concern is: isn't it a waste of resources to start a whole pod
> > > > > > > to run a thing like DummyOperator, for example? We have a cap
> > > > > > > of 200 tasks at any given time and we regularly hit this cap;
> > > > > > > we cope with that with 20 Celery workers, but with the
> > > > > > > KubernetesExecutor that would mean 200 pods. Does it really
> > > > > > > scale that easily?
> > > > > >
> > > > > > Unfortunately no.
> > > > > >
> > > > > > We now have the problem of a DAG with 300 tasks that should all
> > > > > > start in parallel at once, and only about 140 task instances get
> > > > > > started. Setting parallelism to 256 didn't help, and the system
> > > > > > struggles to get the number of running tasks that high.
> > > > > >
> > > > > > The biggest problem we have now is finding the bottleneck in the
> > > > > > scheduler, but it's taking time to debug it.
> > > > > >
> > > > > > We will definitely investigate this further and share our
> > > > > > findings, but as of now I wouldn't say it's "non-problematic", as
> > > > > > some other people have stated.
> > > > > >
> > > > > > Thanks
> > > > > > Kamil
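For anyone following along: the knobs discussed in this thread live in airflow.cfg. Below is a minimal sketch of where they go; the numeric values are purely illustrative examples, not tuning recommendations, and `worker_pods_creation_batch_size` only exists as of Airflow 1.10.3.

```
# airflow.cfg (sketch only; values are illustrative)

[core]
# Global cap on concurrently running task instances across all DAGs.
parallelism = 256
# Default per-DAG task concurrency (the example DAG above overrides
# this explicitly with concurrency=100).
dag_concurrency = 100

[kubernetes]
# Number of worker pods the KubernetesExecutor may create per
# scheduler loop (new in 1.10.3); raising it lets pods launch in
# parallel rather than one at a time.
worker_pods_creation_batch_size = 16
```

Note the interaction: even with a large `worker_pods_creation_batch_size`, the number of simultaneously running tasks is still bounded by `parallelism` and the DAG-level `concurrency` setting.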