Not sure about 1.9, but parallelism = 0 (i.e. unlimited) seems to be supported on master <https://github.com/apache/incubator-airflow/blob/272952a9dce932cb2c648f82c9f9f2cafd572ff1/airflow/executors/base_executor.py#L113>. We are using 1.8 with some bug-fix cherry-picks. The machines are just out-of-the-box AWS EC2 instances. We've been using I3 for the scheduler and R3 for the workers, but I urge you to check out the newer generations, which are more powerful and cheaper. As always, you can pick the best series by profiling your machine usage (I/O, RAM, CPU, etc.). I don't think we've tuned the default Airflow settings much, and the best settings for you will likely differ from the ones that work best for us (that being said, I can provide more details when I'm back in the office if you are curious about particular settings).
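For readers wondering what "parallelism = 0 means infinity" amounts to in practice, here is a minimal standalone Python sketch of how an executor heartbeat might turn the `[core] parallelism` value into a number of queued task instances to send. It is modeled loosely on the linked `base_executor.py`, which, as far as I can tell, treats a falsy `parallelism` as "no cap". It is not Airflow's actual code: the function name `tasks_to_send` and the `zero_means_unlimited` flag are hypothetical, the flag standing in for the difference between master and what the 1.9 report quoted below suggests.

```python
# Sketch only: a simplified, standalone model of an executor heartbeat,
# loosely based on the linked base_executor.py. Not Airflow's actual code;
# the function and flag names are hypothetical.


def tasks_to_send(parallelism, running, queued, zero_means_unlimited=True):
    """Return how many queued task instances a single heartbeat would send.

    parallelism: value of [core] parallelism in airflow.cfg.
    running, queued: current counts of running and queued task instances.
    zero_means_unlimited: True models the linked master code, where a
        falsy parallelism means "no cap"; False models a version without
        that special case (as the 1.9 report in this thread suggests).
    """
    if zero_means_unlimited and not parallelism:
        # 0 is treated as infinity: every queued task instance gets a slot.
        open_slots = queued
    else:
        # Otherwise the cap is parallelism minus what is already running.
        open_slots = parallelism - running
    # Never send more than is queued, and never a negative number.
    return max(0, min(open_slots, queued))


if __name__ == "__main__":
    print(tasks_to_send(0, 500, 40))                              # 40
    print(tasks_to_send(0, 500, 40, zero_means_unlimited=False))  # 0
    print(tasks_to_send(32, 30, 10))                              # 2
```

With the special case in place, the effective cap moves to the workers: throughput is bounded by the number of Celery workers times each worker's concurrency, which is why "scale up the number of workers" is the lever described in this thread.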
Cheers,
Kevin Y

On Thu, Jun 7, 2018 at 9:02 PM ramandu...@gmail.com <ramandu...@gmail.com> wrote:

> We have a similar use case where we need to support multiple teams, and the
> expected load is 1000s of active TIs. We are exploring setting up a separate
> Airflow cluster for each team and scaling each cluster horizontally through
> the Celery executor.
> @Ruiqin could you please share some details on your Airflow setup, such as
> Airflow version, machine configuration, airflow.cfg settings, etc.?
> How can we configure infinity (0) for the cluster-wide setting? (We are
> using Airflow v1.9, and it seems that parallelism = 0 in airflow.cfg is not
> supported in v1.9.)
>
> On 2018/06/07 22:27:20, Ruiqin Yang <yrql...@gmail.com> wrote:
> > Here to provide a data point from Airbnb: all users share the same cluster
> > (~8k active DAGs and ~15k running tasks at peak).
> >
> > For the cluster-wide concurrency setting, we put infinity (0) there and
> > scale up the number of workers if we need more worker slots.
> >
> > As for the scheduler & Airflow UI coupling, I believe the Airflow UI is
> > not coupled with the scheduler. Actually, at Airbnb we co-locate the
> > Airflow worker and the Airflow webserver on the same EC2 instance, but
> > you can always have a set of instances that only host webservers.
> >
> > If you have some critical users who don't want their DAGs affected by
> > changes from other users (ad hoc new DAGs/tasks), you can probably set up
> > a dedicated Celery queue for those users (assuming you are using the
> > Celery executor; the local executor is in theory not meant for
> > production), or you can enforce DAG-level concurrency (maybe via CI or
> > through the policy hook
> > <https://github.com/apache/incubator-airflow/blob/master/airflow/settings.py#L109>,
> > though I'm not sure that is good practice, since the policy is meant more
> > for task-level attributes).
> >
> > With the awesome RBAC change in place, I think it makes sense to share
> > the same cluster: easier maintenance, less user confusion, etc.
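The policy route mentioned above could look roughly like the following `airflow_local_settings.py` sketch, which combines both suggestions: routing a critical team's tasks to a dedicated Celery queue and capping DAG-level concurrency. It assumes, as in Airflow 1.x, that `settings.py` picks up an `airflow_local_settings` module if one is importable and calls `policy(task)` for each task as DAGs are loaded into the DagBag. The DAG-id prefixes, queue names, `DEDICATED_QUEUES`, `MAX_DAG_CONCURRENCY` and the helper functions are hypothetical placeholders. Whether lowering `dag.concurrency` from inside the policy is honored by the scheduler is an assumption worth verifying on your version, echoing the doubt above about using the policy for DAG-level attributes.

```python
# airflow_local_settings.py: a sketch of a cluster policy for Airflow 1.x.
# Assumption: Airflow imports this module from the PYTHONPATH and calls
# policy(task) for every task while loading DAGs. The prefixes, queue names
# and cap below are hypothetical placeholders.

# Hypothetical mapping from a DAG-id prefix to a team's dedicated Celery queue.
DEDICATED_QUEUES = {
    "payments_": "payments",
    "search_": "search",
}

# Hypothetical ceiling on concurrently running task instances per DAG.
MAX_DAG_CONCURRENCY = 32


def queue_for(dag_id, current_queue):
    """Return the dedicated queue for dag_id, or keep current_queue."""
    for prefix, queue in DEDICATED_QUEUES.items():
        if dag_id.startswith(prefix):
            return queue
    return current_queue


def capped_concurrency(requested):
    """Clamp a DAG's requested concurrency to the cluster-wide ceiling."""
    return min(requested, MAX_DAG_CONCURRENCY)


def policy(task):
    """Mutate each task (a BaseOperator) as its DAG is loaded."""
    # Route critical teams' tasks to their own queue; others keep theirs.
    task.queue = queue_for(task.dag_id, task.queue)
    # Assumption: lowering dag.concurrency here is honored by the scheduler.
    dag = getattr(task, "dag", None)
    if dag is not None:
        dag.concurrency = capped_concurrency(dag.concurrency)
```

Workers reserved for a team would then be started listening only on that team's queue, e.g. `airflow worker -q payments` with the 1.x CLI, so ad hoc DAGs from other teams never land on them.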
> >
> > Cheers,
> > Kevin Y
> >
> > On Thu, Jun 7, 2018 at 1:59 PM Ananth Durai <vanant...@gmail.com> wrote:
> >
> > > At Slack, we follow a similar pattern of deploying multiple Airflow
> > > instances. Since the Airflow UI and the scheduler are coupled, this
> > > introduces friction, as users need to know the underlying deployment
> > > strategy (e.g. which Airflow URL to visit to see my DAGs, multiple teams
> > > collaborating on the same DAG, pipeline operations, etc.).
> > >
> > > In one of the forum questions, Max mentioned renaming the scheduler to
> > > "supervisor", since the scheduler does more than just scheduling.
> > > It would be super cool if multiple supervisors could share the same
> > > Airflow metadata storage and the Airflow UI (maybe by introducing a
> > > unique config param `supervisor.id` for each instance).
> > >
> > > This approach would help us scale the Airflow scheduler horizontally
> > > while keeping things simple from the user's perspective.
> > >
> > > Regards,
> > > Ananth.P,
> > >
> > > On 7 June 2018 at 04:08, Arturo Michel <arturo.mic...@starlizard.com>
> > > wrote:
> > >
> > > > We have had up to 50 DAGs with multiple tasks each, many of them
> > > > running in parallel. We've had some compute issues, since this was
> > > > meant to be a temporary deployment, but somehow it's now the permanent
> > > > production one and resources are not great.
> > > > Organisationally, it is very similar to what Gerard described: more
> > > > than one group working with different engineering practices and
> > > > standards, which is probably one of the sources of problems.
> > > >
> > > > -----Original Message-----
> > > > From: Gerard Toonstra <gtoons...@gmail.com>
> > > > Sent: Wednesday, June 6, 2018 5:02 PM
> > > > To: dev@airflow.incubator.apache.org
> > > > Subject: Re: Single Airflow Instance Vs Multiple Airflow Instance
> > > >
> > > > We are using two cluster instances.
> > > > One cluster is for the engineering teams in the "tech" wing, which
> > > > rigorously follow tech principles; the other instance is for business
> > > > analysts and more ad hoc, experimental work, who do not necessarily
> > > > follow those principles. We have a nomad engineer helping out the ad hoc
> > > > cluster: setting it up, connecting it to all systems and resolving
> > > > programming questions. All clusters are fully puppetized, so we reuse
> > > > configs and the way things are configured, plus we have a common
> > > > "platform code" package that is reused across both clusters.
> > > >
> > > > G>
> > > >
> > > > On Wed, Jun 6, 2018 at 5:50 PM, James Meickle <jmeic...@quantopian.com>
> > > > wrote:
> > > >
> > > > > An important consideration here is that there are several settings
> > > > > that are cluster-wide. In particular, cluster-wide concurrency
> > > > > settings could result in Team B's DAG refusing to schedule because
> > > > > of an error in Team A's DAG.
> > > > >
> > > > > Do your teams follow similar practices in how eagerly they ship
> > > > > code, or have similar SLAs for resolving issues? If so, you are
> > > > > probably fine using co-tenancy. If not, you should probably talk
> > > > > about it first to make sure the teams are okay with co-tenancy.
> > > > >
> > > > > On Wed, Jun 6, 2018 at 11:24 AM, gauthiermarti...@gmail.com <
> > > > > gauthiermarti...@gmail.com> wrote:
> > > > >
> > > > > > Hi Everyone,
> > > > > >
> > > > > > We have been experimenting with Airflow for about 6 months now.
> > > > > > We are planning to have multiple departments use it. Since we
> > > > > > don't have any internal experience with Airflow, we are wondering
> > > > > > whether a single instance per department is better suited than a
> > > > > > single instance with multi-tenancy.
> > > > > > We are aware of the upcoming release of Airflow 1.10 and the
> > > > > > changes being made to RBAC, which will make it better suited for
> > > > > > multi-tenancy.
> > > > > >
> > > > > > Any advice on this? Any tips would be helpful to us.