Hey everyone,

Great to see others chiming in :). It's really great that we have various stakeholders and users taking part in the discussion, and I hope that what we come up with will be driven and supported by the whole community. John from Amazon has also promised to bring his team in, so I really see that we are heading in the right direction and that what we come up with will be well thought out and well discussed.
TL;DR: I am genuinely excited. However, I also know we have to hold our horses and cannot let my (and others' :) excitement go too wild. So I propose that we complete the discussions and approvals of AIP-43 and AIP-44 before we go further.

I think there are multiple "layers" of isolation we can talk about. AIP-43 and AIP-44 build on the foundations laid in Airflow 2, but I am well aware that this is a "long haul" and that multiple AIPs will follow to make it really robust. So far we have discussed the "fine-grained" access level, and the areas you talk about, Ping Zhang, are definitely important to look at as well. I think "docker"-level isolation (as an option) is quite possible and a really interesting direction.

The separation of DAG processing into a different component is a slightly "higher" level of isolation, because it allows, for example (this is part of https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation ), running DAG processors for separate tenants not only in a separate process or docker container, but also on a separate virtual machine and even in a separate "security zone". This is the basic assumption behind AIP-43: to be able (if there is a need) to physically separate DAG processing for different teams. In fact, Docker separation as an optional (future) isolation layer for DagProcessor has been mentioned in AIP-44: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API (see the "Securing access to authorization token in DagProcessor and Triggerer" chapter).

Similarly, the "DAG Serialization" discussion you started is cool and I think we should take a close look at it. Of course, we will have to think about how it fits into Airflow 2 and the current features that are missing (or are in an early state) in Airflow 1.10.

However, I have an important request here. We can very easily get lost and distracted by the number of different threads being opened in this direction.
If we open too many threads and start discussing all of them at once, I think we will simply spread ourselves too thin. While a "grand vision" should be worked out, I think we should not actively discuss more than 2 AIPs at a time, and should only move on to the "next steps" once we have discussed, agreed on (and approved) the work on the "foundations", so that we keep focus.

I think AIP-43 and AIP-44 specifically are designed in a way that "opens" further discussions rather than closing them. The changes proposed in AIP-43 and AIP-44 mostly reshuffle and separate some of the existing logic, without impacting the internals of Airflow (such as running tasks), and they lay the foundation for further steps.

My proposal is to discuss those two first (Ping Zhang, specifically also how AIP-43 and AIP-44 would interact with your future proposals), see if we can get consensus and approval, and then move on to discussing the follow-up steps, especially if we all agree that AIP-43 and AIP-44 are more of an "opener" and that they are not "clashing".

I am really looking forward to taking the next steps and discussing more AIPs. I am just afraid that if we try to discuss and clarify too many things too quickly in the area of multi-tenancy, we will quickly get lost in details.

What I propose is that some time in mid-January (when we hopefully have consensus and approval for AIP-43/44) I will ask you for a public demo and a deeper explanation of how you've implemented the features you described, so that people (especially those involved in Airflow 2) can comment on how it maps onto the current features. Airflow 2.2 is really far from the 1.10.* version that your implementation is based on, and there are some interesting new features (like the @task.docker decorator) that might influence the way we think about docker isolation.
In the meantime, I would really appreciate it if you could get involved in the AIP-43 and AIP-44 discussions. I think it would be super useful for all of us to be on the same page here and to have a foundation for discussing further steps.

J.

On Fri, Dec 17, 2021 at 7:18 AM Ping Zhang <[email protected]> wrote:
>
> Ah, I guess I am a little bit late now.
>
> In terms of the
>>
>> Isolating code execution and parsing of DAG files.
>
>
> In Airbnb, we use `Docker runtime isolation for airflow tasks` plus `Parsing
> Service` to totally isolate the dag parsing, task execution from the airflow
> infra runtime.
>
> `Docker runtime isolation for airflow tasks` (see email thread: [DISCUSS]
> Docker runtime isolation for airflow tasks) introduces a docker layer which
> wraps the dag parsing and task execution so each dag file can have its own
> docker runtime.
> `Parsing Service` totally removes the dag file parsing from the scheduler.
>
> These two features have been running in Airbnb's production for close to 1
> year. I am working on open-sourcing them.
>
> Ping
>
>
> Best wishes
>
> Ping Zhang
>
>
> On Wed, Dec 1, 2021 at 11:43 AM Jarek Potiuk <[email protected]> wrote:
>>
>> Very good/important thoughts.
>>
>> From the discussions and looking at the (upcoming) proposals from
>> Mateusz we are going to have this all optional:
>>
>> We plan to have two config options:
>>
>> * DB Isolation mode for separating out DB access
>> * Standalone DAG processor
>>
>> I totally agree that standalone/quick/dirty access mode for Airflow
>> should be the default (so business as usual). Moreover - that will
>> allow the introduction of the multi-tenant mode as "optional" in
>> otherwise backwards-compatible Airflow - i.e. it could start to be
>> available in 2.x line.
>>
>> Actually (and this is something up for discussion in the AIP) we could
>> introduce "soft" multi-tenancy mode, where DB access will be still
>> possible but flagged as a warning.
>> This could give the user an option to switch gradually their DAGs to
>> the multi-tenancy mode, if they are already using some direct DB
>> access (for example in their callbacks or custom operator).
>>
>> Also I think part of the AIP and proof of concept while discussing it
>> should be initially rough, and later more comprehensive performance
>> testing of some "real-life" scenarios.
>>
>> J.
>>
>> On Wed, Dec 1, 2021 at 6:12 PM Ash Berlin-Taylor <[email protected]> wrote:
>> >
>> > I look forward to seeing these proposals etc.
>> >
>> > One thought I've just had is that we should be careful about two things
>> > when taking on this work:
>> >
>> > 1. That performance is not impacted (specifically of the scheduler
>> > "throughput") -- at least when only a single "tenant" is in use if not for
>> > all.
>> > 2. That we don't make the deployment story more complex for the small
>> > deployments, nor for the "getting started on a laptop" initial user
>> > experience.
>> >
>> > -ash
>> >
>> > On Fri, Nov 26 2021 at 18:23:32 +0100, Jarek Potiuk <[email protected]>
>> > wrote:
>> >
>> > Recording available here:
>> > https://drive.google.com/file/d/1Irw7qxxeTOHZTfdvT5lAbGowIfm9DHzi/view On
>> > Fri, Nov 26, 2021 at 6:17 PM Jarek Potiuk <[email protected]> wrote:
>> >
>> > Thanks for the meeting this morning/afternoon :) ! It was very productive,
>> > I believe: The notes are available here:
>> > https://docs.google.com/document/d/19d0jQeARnWm8VTVc_qSIEDldQ81acRk0KuhVwAd96wo/edit
>> > The most important take is that it looks like if the use cases are
>> > slightly different, we are all aligned of what needs to be done and how
>> > Action points: * Composer team (Mateusz) will soon submit AIP's (they are
>> > close to be ready for proposing) for * DB access isolation * Separating
>> > out DAG processor * Cloudera team (Ian) will work on follow-up
>> > Fine-grained resource access AIP - it can be implemented as next steps.
>> > The two AIPs above will implement "coarse" access level but in the way
>> > that the "fine-grained" access will be possible to be plugged-in I
>> > recorded the meeting and I am waiting for the video to be processed - I
>> > will send/add it to notes when I get it. J. J. On Fri, Nov 26, 2021 at
>> > 2:29 PM Jarek Potiuk <[email protected]> wrote: > > Reminder: the SIG
>> > meeting is today in ~2.5 hrs. > > Calendar link here: >
>> > https://calendar.google.com/event?action=TEMPLATE&tmeid=N3ZmbGFxNGF1OXBtajc2ODU3bWduMWVvc2YgcG90aXVrLmFwYWNoZS5vcmdAbQ&tmsrc=potiuk.apache.org%40gmail.com
>> > > Notes/material links will be added here >
>> > https://docs.google.com/document/d/19d0jQeARnWm8VTVc_qSIEDldQ81acRk0KuhVwAd96wo/edit?usp=sharing
>> > > > I will record the meeting and post the link together with the notes.
>> > > > On Thu, Nov 25, 2021 at 3:31 PM Jarek Potiuk <[email protected]> wrote:
>> > > > > > Just a reminder -> multi-tenancy meeting tomorrow. Few people
>> > worked > > on what will be presented tomorrow, and I am super excited we
>> > will be > > able to kick that one off - it has been a long time on my
>> > waiting list > > :) > > > > J.
> > > > On Sat, Nov 20, 2021 at 10:14 AM
>> > Jarek Potiuk <[email protected]> wrote: > > > > > > The meeting is set for
>> > Friday 26th Nov 5 PM CET (4 PM UTC) > > > > > > This is the calendar link
>> > (google meet link there): > > >
>> > https://calendar.google.com/event?action=TEMPLATE&tmeid=N3ZmbGFxNGF1OXBtajc2ODU3bWduMWVvc2YgcG90aXVrLmFwYWNoZS5vcmdAbQ&tmsrc=potiuk.apache.org%40gmail.com
>> > > > > > > > The initial agenda: > > > > > > 1) The goal of the group,
>> > intro about the "isolation" and various "scopes" > > > of the
>> > multi-tenancy - Jarek Potiuk > > > > > > 2) The review of the example
>> > architecture that > > > needs the "multitenancy" - this is from the Google
>> > Composer team - > > > Mateusz Henc > > > > > > 3) Maybe others would like
>> > to get their case explained similarly > > > > > > 4) Discuss proposals on the
>> > scope of the AIP(s) we want to write > > > and rough approach we can take
>> > for implementation and who will do > > > what Google Meet call:
>> > meet.google.com/rxu-tvdz-vpv (edited) > > > > > > We will send more
>> > info/slides then. Anyone who would like to show/add > > > something,
>> > please respond here :). > > > > > > J.
