Re: Benchmarking of Airflow Scheduler with Celery Executor
On 2018/04/13 17:00:36, Maxime Beauchemin wrote:
> If you're concerned about scheduler scalability I'd go with a bigger box.
> The scheduler uses multiprocessing so more CPU power means more throughput.
>
> Also you may want to provision a beefy MySQL box to make sure that doesn't
> become the bottleneck. 10k tasks heartbeating to the DB every 30 seconds is
> significant load.
>
> Perhaps Airbnb folks can chime in about their scale and hardware setup?
>
> Max

With an AWS EC2 i2.8xlarge box, we run ~14k tasks at peak, though the
scheduling delay also spikes to ~30 mins when we are at peak load. Here's
some of the scheduler config we have:

    JOB_HEARTBEAT_SEC = 60
    MAX_THREADS = 64
    MAX_TIS_PER_QUERY = 512

Also, we have the biggest Amazon RDS MySQL instance.

Cheers,
Kevin Yang
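For readers who want to try the three scheduler values quoted above, here is a minimal sketch of one way to express them as environment variables. It assumes the settings correspond to the lowercase keys `job_heartbeat_sec`, `max_threads` and `max_tis_per_query` in the `[scheduler]` section of `airflow.cfg`, and that your Airflow version honours the `AIRFLOW__{SECTION}__{KEY}` override convention; neither is stated in the thread, so check the default config of your release. The helper `to_env_vars` is hypothetical and not part of Airflow.

```python
# Values quoted in the message; the lowercase key names and the
# "scheduler" section are assumptions, not confirmed by the thread.
SCHEDULER_SETTINGS = {
    "job_heartbeat_sec": 60,
    "max_threads": 64,
    "max_tis_per_query": 512,
}


def to_env_vars(section, settings):
    """Hypothetical helper: build AIRFLOW__{SECTION}__{KEY} variable names."""
    return {
        "AIRFLOW__{}__{}".format(section.upper(), key.upper()): str(value)
        for key, value in settings.items()
    }


env = to_env_vars("scheduler", SCHEDULER_SETTINGS)
for name in sorted(env):
    # Prints lines suitable for an env file or a shell "export".
    print("{}={}".format(name, env[name]))
```

The same values could equally be written directly into `airflow.cfg`; the environment-variable form is shown only because it is easy to apply to a containerised test setup.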
Re: Benchmarking of Airflow Scheduler with Celery Executor
If you're concerned about scheduler scalability I'd go with a bigger box.
The scheduler uses multiprocessing so more CPU power means more throughput.

Also you may want to provision a beefy MySQL box to make sure that doesn't
become the bottleneck. 10k tasks heartbeating to the DB every 30 seconds is
significant load.

Perhaps Airbnb folks can chime in about their scale and hardware setup?

Max

On Fri, Apr 13, 2018 at 9:14 AM, ramandu...@gmail.com wrote:
> Thanks Ry,
> Just wondering if there is any approximate number for the concurrent tasks
> a scheduler can run on, say, a machine with 16 GB RAM and 8 cores.
> If it's already been done, that would be useful.
> We did some benchmarking with the local executor and observed that each
> TaskInstance was taking ~100MB of memory, so we could only run ~130
> concurrent tasks on a machine with 16 GB RAM and 8 cores.
>
> -Raman Gupta
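The heartbeat remark above can be turned into a rough number. The sketch below assumes each running task issues exactly one database update per heartbeat interval and that the updates are spread evenly over the interval; real load is likely to be burstier and to include other queries. The function name is illustrative only.

```python
def heartbeat_writes_per_second(concurrent_tasks, heartbeat_interval_sec):
    """Average heartbeat updates per second, assuming one update per task
    per interval and an even spread across the interval."""
    return concurrent_tasks / heartbeat_interval_sec


# Figures from the message: 10k tasks, one heartbeat every 30 seconds.
rate = heartbeat_writes_per_second(10_000, 30)
print("{:.0f} heartbeat updates per second".format(rate))  # 333
```

Under these assumptions, lengthening the heartbeat interval lowers the steady-state write rate proportionally, which is one reason the interval is worth tuning alongside database sizing.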
Re: Benchmarking of Airflow Scheduler with Celery Executor
Thanks Ry,
Just wondering if there is any approximate number for the concurrent tasks
a scheduler can run on, say, a machine with 16 GB RAM and 8 cores.
If it's already been done, that would be useful.
We did some benchmarking with the local executor and observed that each
TaskInstance was taking ~100MB of memory, so we could only run ~130
concurrent tasks on a machine with 16 GB RAM and 8 cores.

-Raman Gupta

On 2018/04/12 16:32:37, Ry Walker wrote:
> Hi Raman -
>
> First, we’d be happy to help you test this out with Airflow. Or you could
> do it yourself by using http://open.astronomer.io/airflow/ (w/ Docker
> Engine + Docker Compose) to quickly spin up a test environment. Everything
> is hooked to Prometheus/Grafana to monitor how the system reacts to your
> workload.
>
> -Ry
> CEO, Astronomer
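The memory figures in this message can be checked with simple arithmetic. The sketch below takes 16 GB as 16 * 1024 MB and uses the ~100 MB per TaskInstance and ~130 concurrent tasks reported above. Reading the leftover memory as headroom for the OS, scheduler and other processes is an interpretation, not something the message states.

```python
RAM_MB = 16 * 1024      # 16 GB machine, taken as 16384 MB (assumption)
PER_TASK_MB = 100       # ~100 MB per TaskInstance, as reported
OBSERVED_TASKS = 130    # ~130 concurrent tasks, as reported

# Upper bound if every megabyte went to task processes.
theoretical_ceiling = RAM_MB // PER_TASK_MB

# Memory accounted for by the observed tasks, and what is left over.
used_by_tasks_mb = OBSERVED_TASKS * PER_TASK_MB
remaining_mb = RAM_MB - used_by_tasks_mb

print("theoretical ceiling:", theoretical_ceiling, "tasks")   # 163
print("used by observed tasks:", used_by_tasks_mb, "MB")      # 13000
print("left for everything else:", remaining_mb, "MB")        # 3384
```

So the observed ~130 tasks is about 80% of the naive ceiling, which is consistent with memory being the limiting resource in that local executor test.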
Re: Benchmarking of Airflow Scheduler with Celery Executor
Hi Raman -

First, we’d be happy to help you test this out with Airflow. Or you could
do it yourself by using http://open.astronomer.io/airflow/ (w/ Docker
Engine + Docker Compose) to quickly spin up a test environment. Everything
is hooked to Prometheus/Grafana to monitor how the system reacts to your
workload.

-Ry
CEO, Astronomer

On April 12, 2018 at 12:23:46 PM, ramandu...@gmail.com (ramandu...@gmail.com)
wrote:

> Hi All,
> We have a requirement to run 10k(s) of concurrent tasks. We are exploring
> Airflow's Celery Executor for the same. Horizontal scaling of worker nodes
> seems possible, but Airflow can only have one active scheduler.
> So will the Airflow scheduler be able to handle this many concurrent tasks?
> Are there any benchmark numbers for the Airflow scheduler's scalability?
> Thanks,
> Raman