Hello, lately in my organisation we have started to run into problems
because we have too many Airflow environments to manage, which pushes us
towards using Airflow as a multi-tenant environment.

I have seen the same issue in other organisations as well, where one team
had to deploy, monitor and upgrade dozens of Airflow instances, which
adds a lot of operational complexity.

After some thought, I came to the idea of supporting multi-tenant Airflow
clusters. I know that this is currently neither supported nor recommended;
however, in my opinion and from what I have seen, it would benefit many
Airflow clusters by improving usability and easing maintenance
operations.

We, in our organisation, have a couple of possible proposals for enabling
multi-tenancy in an Airflow cluster:

*1) In the airflow chart & code:*

   - Define the pools in the chart instead of the DB.
   - In the chart, add a new YAML array that defines tenants, where each
   tenant consists of a list of pools.
   - For each tenant, deploy a scheduler and a triggerer (if needed).
   - Each deployed component only processes the pools of its tenant.
   - Each connection or variable is changed so that it is accessed by a
   specific tenant (derived from the owner of the DAG, or in any other way).
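
As a rough illustration of the pool-based filtering in option 1, a minimal
Python sketch could look like the following. The names `Tenant`,
`QueuedTask`, `build_pool_index` and `tasks_for_tenant` are hypothetical
and are not part of Airflow; the sketch only assumes that each tenant owns
a disjoint set of pools and that a tenant's scheduler looks at nothing
else.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tenant:
    """Hypothetical tenant definition, e.g. as parsed from the chart values."""
    name: str
    pools: frozenset[str]


@dataclass(frozen=True)
class QueuedTask:
    """Simplified stand-in for a task instance waiting to be scheduled."""
    dag_id: str
    task_id: str
    pool: str


def build_pool_index(tenants):
    """Map each pool to its owning tenant, rejecting pools claimed twice."""
    index = {}
    for tenant in tenants:
        for pool in tenant.pools:
            if pool in index:
                raise ValueError(
                    f"pool {pool!r} is claimed by both "
                    f"{index[pool]!r} and {tenant.name!r}"
                )
            index[pool] = tenant.name
    return index


def tasks_for_tenant(tenant, queued):
    """Return only the queued tasks whose pool belongs to this tenant."""
    return [task for task in queued if task.pool in tenant.pools]
```

A scheduler deployed for one tenant would then only ever consider the
result of `tasks_for_tenant(...)` for its own tenant, so a flood of tasks
in another tenant's pools could not starve it.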


*2) In the code only:*

   - Create a tenant table in the database.
   - Create the relation between tenant and pool.
   - Make connections and variables accessible through the tenant table,
   thus achieving isolation.
   - For each tenant, create at least #schedulers / 2 instances of the
   scheduler and triggerer job (on the same pod).
   - Change the code of the scheduler and triggerer so that every job only
   queries the pools of its related tenant.
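
To make option 2 a bit more concrete, here is a small self-contained
sketch using Python's standard `sqlite3` module. The table and column
names (`tenant`, `tenant_pool`, `tenant_variable`, `queued_task`) and the
helper functions are assumptions made up for illustration; they do not
reflect Airflow's actual metadata schema.

```python
import sqlite3

# Hypothetical schema: a tenant table, a tenant/pool relation, and
# variables that are only reachable through their tenant.
SCHEMA = """
CREATE TABLE tenant (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE tenant_pool (
    tenant_id INTEGER NOT NULL REFERENCES tenant(id),
    pool      TEXT NOT NULL UNIQUE  -- a pool belongs to exactly one tenant
);
CREATE TABLE tenant_variable (
    tenant_id INTEGER NOT NULL REFERENCES tenant(id),
    var_key   TEXT NOT NULL,
    var_val   TEXT,
    PRIMARY KEY (tenant_id, var_key)
);
CREATE TABLE queued_task (dag_id TEXT, task_id TEXT, pool TEXT);
"""


def make_demo_db():
    """Create an in-memory database with two tenants and some sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO tenant VALUES (?, ?)",
                     [(1, "analytics"), (2, "ml")])
    conn.executemany("INSERT INTO tenant_pool VALUES (?, ?)",
                     [(1, "analytics_default"), (2, "ml_gpu")])
    conn.execute("INSERT INTO tenant_variable VALUES (1, 's3_bucket', 'analytics-bucket')")
    conn.executemany("INSERT INTO queued_task VALUES (?, ?, ?)",
                     [("report", "load", "analytics_default"),
                      ("train", "fit", "ml_gpu")])
    return conn


def get_variable(conn, tenant, key):
    """Look up a variable, visible only to the tenant that owns it."""
    row = conn.execute(
        "SELECT v.var_val FROM tenant_variable v "
        "JOIN tenant t ON t.id = v.tenant_id "
        "WHERE t.name = ? AND v.var_key = ?",
        (tenant, key),
    ).fetchone()
    return row[0] if row else None


def queued_tasks_for(conn, tenant):
    """Return (dag_id, task_id) pairs queued in the tenant's own pools."""
    return conn.execute(
        "SELECT q.dag_id, q.task_id FROM queued_task q "
        "JOIN tenant_pool p ON p.pool = q.pool "
        "JOIN tenant t ON t.id = p.tenant_id "
        "WHERE t.name = ? ORDER BY q.dag_id, q.task_id",
        (tenant,),
    ).fetchall()
```

In this sketch the isolation comes purely from always joining through the
tenant table, so a scheduler or triggerer job bound to one tenant never
sees another tenant's pools or variables.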


These 2 proposals would solve most of the issues we have, such as
starvation and noisy neighbours. Keep in mind that both are very rough
drafts and are not the full spec of the idea; I would first like to keep
this mail as a discussion rather than a proposal, in the hope that it will
help me understand whether an AIP can be opened based on this thread.

I would love to hear what the Airflow community thinks about the topic, as
well as any suggestions or ideas on what can or should be done, and
whether it may solve any issues that the community is experiencing.
