Re: Benchmarking of Airflow Scheduler with Celery Executor

2018-04-18 Thread yrqls21


On 2018/04/13 17:00:36, Maxime Beauchemin  wrote: 
> If you're concerned about scheduler scalability I'd go with a bigger box.
> The scheduler uses multiprocessing so more CPU power means more throughput.
> 
> Also you may want to provision a beefy MySQL box to make sure that doesn't
> become the bottleneck. 10k tasks heartbeating to the DB every 30 seconds is
> significant load.
> 
> Perhaps the Airbnb folks can chime in about their scale and hardware setup?
> 
> Max
> 
> On Fri, Apr 13, 2018 at 9:14 AM, ramandu...@gmail.com 
> wrote:
> 
> > Thanks Ry,
> > Just wondering if there is an approximate number for how many concurrent
> > tasks a scheduler can run on, say, a 16 GB RAM, 8-core machine.
> > If it's already been done, that would be useful.
> > We did some benchmarking with the Local Executor and observed that each
> > TaskInstance was taking ~100 MB of memory, so we could only run ~130
> > concurrent tasks on a 16 GB RAM, 8-core machine.
> >
> > -Raman Gupta
> >
> >
> >
> > On 2018/04/12 16:32:37, Ry Walker  wrote:
> > > Hi Raman -
> > >
> > > First, we’d be happy to help you test this out with Airflow. Or you could
> > > do it yourself by using http://open.astronomer.io/airflow/ (w/ Docker
> > > Engine + Docker Compose) to quickly spin up a test environment. Everything
> > > is hooked to Prometheus/Grafana to monitor how the system reacts to your
> > > workload.
> > >
> > > -Ry
> > > CEO, Astronomer
> > >
> > > On April 12, 2018 at 12:23:46 PM, ramandu...@gmail.com (ramandu...@gmail.com)
> > > wrote:
> > >
> > > Hi All,
> > > We have a requirement to run 10k+ concurrent tasks. We are exploring
> > > Airflow's Celery Executor for this. Horizontally scaling worker nodes
> > > seems possible, but there can only be one active scheduler.
> > > Will the Airflow scheduler be able to handle this many concurrent tasks?
> > > Are there any benchmark numbers for the Airflow scheduler's scalability?
> > > Thanks,
> > > Raman
> > >
> >
With an AWS EC2 i2.8xlarge box, we run ~14k tasks at peak, though the
scheduling delay also spikes to ~30 mins when we are at peak load. Here are
some of the scheduler settings we use:

JOB_HEARTBEAT_SEC = 60
MAX_THREADS = 64
MAX_TIS_PER_QUERY = 512

We also use the largest Amazon RDS MySQL instance.
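For readers mapping these settings onto a config file, here is a minimal sketch. It assumes the values live in the `[scheduler]` section of `airflow.cfg` under the lowercase keys `job_heartbeat_sec`, `max_threads` and `max_tis_per_query` (to our understanding, the names Airflow's default config uses). The `chunked` helper is hypothetical and only illustrates the idea behind `max_tis_per_query`: capping how many task instances go into one database query, so ~14k task instances become a few dozen queries of at most 512 rows each rather than one huge one.

```python
import configparser
from typing import Iterator, List, Sequence

# Assumed airflow.cfg fragment mirroring the values quoted above.
CFG_TEXT = """
[scheduler]
job_heartbeat_sec = 60
max_threads = 64
max_tis_per_query = 512
"""


def load_scheduler_settings(text: str) -> dict:
    """Parse the [scheduler] section and return the three settings as ints."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["scheduler"]
    return {
        "job_heartbeat_sec": section.getint("job_heartbeat_sec"),
        "max_threads": section.getint("max_threads"),
        "max_tis_per_query": section.getint("max_tis_per_query"),
    }


def chunked(items: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` items (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


if __name__ == "__main__":
    settings = load_scheduler_settings(CFG_TEXT)
    ti_ids = list(range(14_000))  # stand-in for ~14k task instance ids
    batches = list(chunked(ti_ids, settings["max_tis_per_query"]))
    print(len(batches), len(batches[-1]))  # → 28 176
```

A smaller batch size means more round trips but lighter individual queries; which side of that trade-off wins depends on the database, which is why a large RDS instance matters here.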

Cheers,
Kevin Yang


Re: Benchmarking of Airflow Scheduler with Celery Executor

2018-04-13 Thread Maxime Beauchemin
If you're concerned about scheduler scalability I'd go with a bigger box.
The scheduler uses multiprocessing so more CPU power means more throughput.

Also you may want to provision a beefy MySQL box to make sure that doesn't
become the bottleneck. 10k tasks heartbeating to the DB every 30 seconds is
significant load.
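A back-of-the-envelope version of that estimate, as a minimal sketch: it assumes each running task issues exactly one database write per heartbeat interval, which simplifies what a real heartbeat does. The function name is hypothetical.

```python
def heartbeat_writes_per_sec(running_tasks: int, interval_sec: float) -> float:
    """Approximate DB heartbeat writes per second, assuming one write per task per interval."""
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    return running_tasks / interval_sec


if __name__ == "__main__":
    for interval in (30, 60):
        rate = heartbeat_writes_per_sec(10_000, interval)
        print(f"{interval}s interval: ~{rate:.0f} writes/sec")
    # → 30s interval: ~333 writes/sec
    # → 60s interval: ~167 writes/sec
```

Under this assumption, the `JOB_HEARTBEAT_SEC = 60` setting Kevin quotes in this thread halves the steady-state write rate compared with a 30-second interval.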

Perhaps the Airbnb folks can chime in about their scale and hardware setup?

Max



Re: Benchmarking of Airflow Scheduler with Celery Executor

2018-04-13 Thread ramandumcs
Thanks Ry,
Just wondering if there is an approximate number for how many concurrent
tasks a scheduler can run on, say, a 16 GB RAM, 8-core machine.
If it's already been done, that would be useful.
We did some benchmarking with the Local Executor and observed that each
TaskInstance was taking ~100 MB of memory, so we could only run ~130
concurrent tasks on a 16 GB RAM, 8-core machine.
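The ~130 figure is roughly what a per-task memory budget predicts. A minimal sketch, assuming a fixed amount of RAM is held back for the OS, scheduler and database client; the 3 GB reserve below is a hypothetical number chosen for illustration, not something measured in this thread.

```python
def max_concurrent_tasks(total_ram_mb: int, per_task_mb: int, reserved_mb: int = 0) -> int:
    """Estimate how many task processes fit in RAM (floor division)."""
    if per_task_mb <= 0:
        raise ValueError("per_task_mb must be positive")
    usable = max(total_ram_mb - reserved_mb, 0)
    return usable // per_task_mb


if __name__ == "__main__":
    total = 16 * 1024  # 16 GB expressed in MB
    print(max_concurrent_tasks(total, 100))                    # → 163 (no headroom)
    print(max_concurrent_tasks(total, 100, reserved_mb=3072))  # → 133 (hypothetical 3 GB reserve)
```

At ~100 MB per task, 10k concurrent tasks would need on the order of 1 TB of RAM in aggregate, so they would have to be spread across many worker machines.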

-Raman Gupta  

 



Re: Benchmarking of Airflow Scheduler with Celery Executor

2018-04-12 Thread Ry Walker
Hi Raman -

First, we’d be happy to help you test this out with Airflow. Or you could
do it yourself by using http://open.astronomer.io/airflow/ (w/ Docker
Engine + Docker Compose) to quickly spin up a test environment. Everything
is hooked to Prometheus/Grafana to monitor how the system reacts to your
workload.

-Ry
CEO, Astronomer

On April 12, 2018 at 12:23:46 PM, ramandu...@gmail.com (ramandu...@gmail.com)
wrote:

Hi All,
We have a requirement to run 10k+ concurrent tasks. We are exploring
Airflow's Celery Executor for this. Horizontally scaling worker nodes
seems possible, but there can only be one active scheduler.
Will the Airflow scheduler be able to handle this many concurrent tasks?
Are there any benchmark numbers for the Airflow scheduler's scalability?
Thanks,
Raman