On 2018/04/13 17:00:36, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
> If you're concerned about scheduler scalability I'd go with a bigger box.
> The scheduler uses multiprocessing so more CPU power means more throughput.
>
> Also you may want to provision a beefy MySQL box to make sure that doesn't
> become the bottleneck. 10k tasks heartbeating to the DB every 30 seconds is
> significant load.
>
> Perhaps Airbnb folks chime in about their scale and hardware setup?
>
> Max
>
> On Fri, Apr 13, 2018 at 9:14 AM, ramandu...@gmail.com <ramandu...@gmail.com>
> wrote:
>
> > Thanks Ry,
> > Just wondering if there is any approximate number on concurrent tasks a
> > scheduler can run on say 16 GB RAM and 8 core machine.
> > If its already been done that would be useful.
> > We did some benchmarking with local executor and observed that each
> > TaskInstance was taking ~100MB of memory so we could only run ~130
> > concurrent tasks on 16 GB RAM and 8 core machine.
> >
> > -Raman Gupta
> >
> > On 2018/04/12 16:32:37, Ry Walker <r...@astronomer.io> wrote:
> > > Hi Raman -
> > >
> > > First, we’d be happy to help you test this out with Airflow. Or you could
> > > do it yourself by using http://open.astronomer.io/airflow/ (w/ Docker
> > > Engine + Docker Compose) to quickly spin up a test environment. Everything
> > > is hooked to Prometheus/Grafana to monitor how the system reacts to your
> > > workload.
> > >
> > > -Ry
> > > CEO, Astronomer
> > >
> > > On April 12, 2018 at 12:23:46 PM, ramandu...@gmail.com (ramandu...@gmail.com)
> > > wrote:
> > >
> > > Hi All,
> > > We have requirement to run 10k(s) of concurrent tasks. We are exploring
> > > Airflow's Celery Executor for same. Horizontally Scaling of worker nodes
> > > seem possible but it can only have one active scheduler.
> > > So will Airflow scheduler be able to handle these many concurrent tasks.
> > > Is there any benchmarking number around airflow scheduler's scalability.
> > > Thanks,
> > > Raman
> > >
> >
>

With an AWS EC2 i2.8xlarge box, we run ~14k tasks at peak, though the
scheduling delay also spikes to ~30 mins when we are at peak load. Here's
some scheduler config we have:

JOB_HEARTBEAT_SEC = 60
MAX_THREADS = 64
MAX_TIS_PER_QUERY = 512

Also we have the biggest Amazon RDS MySQL instance.

Cheers,
Kevin Yang
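
For anyone sizing a similar setup, here is a rough back-of-envelope sketch of the numbers quoted in this thread. It is only arithmetic, not a benchmark. The function names are made up for illustration, and it assumes each running task writes one heartbeat to the DB per heartbeat interval and that memory is the only limit on local concurrency.

```python
"""Back-of-envelope estimates for the figures quoted in this thread."""


def heartbeat_writes_per_second(running_tasks: int, heartbeat_sec: float) -> float:
    """Average DB heartbeat writes per second.

    Assumption: every running task issues exactly one heartbeat write
    per interval, spread evenly over time.
    """
    if heartbeat_sec <= 0:
        raise ValueError("heartbeat_sec must be positive")
    return running_tasks / heartbeat_sec


def memory_bound_concurrency(ram_gb: float, mb_per_task: float) -> int:
    """Upper bound on concurrent tasks if RAM were the only constraint.

    Assumption: ignores memory used by the OS, the scheduler, and
    anything else on the box, so real numbers will be lower.
    """
    if mb_per_task <= 0:
        raise ValueError("mb_per_task must be positive")
    return int(ram_gb * 1024 // mb_per_task)


if __name__ == "__main__":
    # 10k tasks heartbeating every 30 seconds (Max's example).
    print(heartbeat_writes_per_second(10_000, 30))
    # ~14k tasks with a 60 second heartbeat (the config above).
    print(heartbeat_writes_per_second(14_000, 60))
    # 16 GB RAM at ~100 MB per TaskInstance (Raman's benchmark).
    print(memory_bound_concurrency(16, 100))
```

The memory ceiling comes out somewhat above the ~130 concurrent tasks Raman observed, which is plausible given that the estimate leaves no headroom for anything else running on the machine.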