Two more things:

1) There is one "caveat" I had to handle - timeout handling in DagFileProcessor (signals do not play well with GRPC's threads - but that should be easy to fix; a sketch of a signal-free approach is below).

2) The way I proposed it (with very localized changes needed) will make it very easy to split the job among multiple people and turn completing it into a true community effort - no need for major refactorings or changes across the whole Airflow code base.
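For the record, the kind of signal-free timeout I have in mind is roughly this (a minimal sketch, not code from the PR - "parse_file" stands in for the real processing callable):

    import concurrent.futures

    # One worker thread reused across calls. We time out the *wait*, not the
    # thread itself - a timed-out parse keeps running until it finishes, which
    # is a known limitation of this sketch.
    _executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)

    def process_with_timeout(parse_file, file_path: str, timeout_seconds: float):
        # signal.alarm-based timeouts only work in the main thread, which is
        # not where GRPC servicer methods run - so we wait on a Future instead.
        future = _executor.submit(parse_file, file_path)
        try:
            return future.result(timeout=timeout_seconds)
        except concurrent.futures.TimeoutError:
            raise TimeoutError(f"Processing {file_path} timed out after {timeout_seconds}s")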
J.

On Fri, Jul 15, 2022 at 4:32 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> Hello Everyone,
>
> First of all - apologies to those who waited for it. I've been dragged in multiple directions, but I finally got some quality time to take a look at the open questions and implement a POC for AIP-44, as promised before.
>
> TL;DR: I finally came back to AIP-44 and multi-tenancy and have made good progress that hopefully will lead to voting next week. I think I have enough of the missing evidence of the impact of the internal API on performance; I have also implemented a working POC with one of the Internal API calls (processFile) that we will have to implement, and run a series of performance tests with it.
>
> # Current state
> --------------------
>
> The state we left it in a few months ago was (https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API):
> * I had prepared an inventory of methods and the general approach we are going to take (and we got consensus there).
> * I had left the decision on the final choice (Thrift vs. GRPC) to a later POC. I got some feedback from Beam about gRPC vs. Thrift; based on that I started with GRPC and I am actually very happy with it, so I think we can leave Thrift out of the picture.
> * We had struggled a bit with the decision - should we keep two paths (GRPC/direct DB) or one (local GRPC vs. remote GRPC)?
>
> The POC is implemented here: https://github.com/apache/airflow/pull/25094 and I have the following findings:
>
> # Performance impact
> ------------------------------
>
> The performance impact is visible (as expected). It's quite acceptable for a distributed environment (in exchange for security), but it's likely significant enough to stay with the original idea of having Airflow work in two modes: a) secure - with RPC APIs, b) standard - with direct DB calls.
>
> On "localhost" with a local DB, serializing and transporting the messages between two different processes introduced up to 10% overhead. I saw the time to run 500 scans of all our example dag folders go from ~290s to ~320s pretty consistently - enough of a difference to exclude volatility. I tested it in Docker on both ARM and AMD. It seems that in some cases this can be partially offset by slightly increased parallelism on multi-processing machines running on "bare" metal: I got a consistent 2-3% increase on my Linux machine without Docker with the same test harness. But we cannot rely on such configurations - I think a Docker-based installation, without any special assumptions about how your networking/localhost is implemented, is what we should treat as the baseline.
>
> The same experiment repeated with 50 messages of much bigger size (300 Callbacks passed per message) showed a 30% performance drop. This is significant, but I believe most of our messages will be much smaller. Also, the messages I chose were very special: in the case of Callbacks we do rather sophisticated serialization, because we have to account for Kubernetes objects that are potentially not serializable. So the test involved 300 messages that had to be not only GRPC-serialized, but parts of them also had to be Airflow-JSON-serialized, and the whole "executor config" had to be pickled/unpickled (for 100 of the 300 messages). We also have some conditional processing there (there is a union of three types of callbacks that has to be properly deserialized), and those messages could be repeated. This is not happening for most other cases, and we can likely optimize it away in the future, but I wanted to see the "worst" scenario.
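> To give a feel for that conditional part: the union dispatch boils down to something like this (a simplified sketch - the proto field names here are hypothetical, the exact shape is in the PR):
>
>     def callback_request_from_protobuf(msg):
>         # The proto message is a "oneof" over the three callback types,
>         # so we dispatch on whichever variant is actually set.
>         which = msg.WhichOneof("request")
>         if which == "task_callback":
>             return TaskCallbackRequest.from_protobuf(msg.task_callback)
>         if which == "dag_callback":
>             return DagCallbackRequest.from_protobuf(msg.dag_callback)
>         if which == "sla_callback":
>             return SlaCallbackRequest.from_protobuf(msg.sla_callback)
>         raise ValueError(f"Unexpected callback variant: {which}")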
> For the remote client (still with a local DB for the internal API server) I got between a 30% and 80% slow-down, but this was on my WiFi network. I am sure it will be much better on a "wired" network; also, once a remote DB gets into the picture, the overall percentage overhead will be much smaller. We knew this would be a "slower" solution, but this is the price for someone who wants security and isolation.
>
> I have not yet performed the tests with SSL, but I think this will be a modest increase. And anyhow, the "impact" of 10% is IMHO enough to decide that we cannot go "GRPC-only" - the path where we continue using direct DB access should stay (and I already know how to do it relatively easily).
>
> # Implementation and maintainability
> -------------------------------------------------
>
> In the PR you will see how I see the implementation details and how they will impact our code in general. I think what I proposed is actually rather elegant and easy to maintain, and we can likely improve it further (happy to brainstorm on some creative ways - for example, decorators) to make it "friendlier", but I focused more on explicitness and showing the mechanisms involved than on "fanciness". I found it rather easy to implement and it does seem to have good "easy maintainability" properties. The way I propose it boils down to a few "rules":
>
> * For all the data and "messages" we send, we have a nice "Internal API" defined in .proto, with protobuf objects generated out of it. The GRPC proto nicely defines the structures we are going to send over the network. It's very standard and has really good modern support. For one, we automatically generate MyPy type stubs for the generated classes (I added a pre-commit for that), which makes it super easy to both implement the mapping and automatically verify its correctness. MyPy nicely catches all kinds of mistakes you can make! Autocomplete for the proto-generated classes works like a charm. I struggled initially without typing, but once I configured MyPy stub generation I got really nice detection of the mistakes I had made and was able to progress with the implementation far faster than without it. Usually everything magically worked as soon as I fixed all the MyPy errors.
>
> * All Airflow objects that we need to send over GRPC get from_protobuf/to_protobuf methods - a sketch follows below. They are usually very simple (just passing string/int/boolean fields to the constructor), and the structures we pass are rather small (see the PR). Also (as mentioned above) I implemented a few more complex cases (with more complex serialization) and it is easy, readable and pretty well integrated into our code IMHO. This introduces a little duplication here and there, but those objects change rarely (only when we implement big features like Dynamic Task Mapping) and MyPy guards us against any mishaps there (now that our code is mostly typed).
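> A sketch for a simple object (a simplified stand-in with hypothetical proto/field names, not the exact code from the PR):
>
>     class TaskCallbackRequest:
>         def __init__(self, full_filepath: str, msg: str, is_failure_callback: bool = True):
>             self.full_filepath = full_filepath
>             self.msg = msg
>             self.is_failure_callback = is_failure_callback
>
>         def to_protobuf(self) -> internal_api_pb2.TaskCallback:
>             # Plain field-by-field copy into the generated proto class.
>             return internal_api_pb2.TaskCallback(
>                 full_filepath=self.full_filepath,
>                 msg=self.msg,
>                 is_failure_callback=self.is_failure_callback,
>             )
>
>         @classmethod
>         def from_protobuf(cls, proto: internal_api_pb2.TaskCallback) -> "TaskCallbackRequest":
>             # The reverse mapping - the generated MyPy stubs check both directions.
>             return cls(
>                 full_filepath=proto.full_filepath,
>                 msg=proto.msg,
>                 is_failure_callback=proto.is_failure_callback,
>             )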
> * We make all "DB-aware" objects also "GRPC-aware". The number of changes in the actual code/logic is rather small, which makes it super easy to maintain IMHO. There are two changes:
>
> 1) Any object that is used for database operations (in my PR this is DagFileProcessor) needs to be initialized with a "use_grpc" flag and passed a channel that will be used for communication.
>
> 2) The DB methods we have (inventory in the AIP) have to be refactored slightly. This is all that was really added: the original "process_file" method was renamed to "process_file_db", and the "callers" of the method are left completely intact.
>
>     def process_file_grpc(
>         self,
>         file_path: str,
>         callback_requests: List[CallbackRequest],
>         pickle_dags: bool = False,
>     ) -> Tuple[int, int]:
>         request = internal_api_pb2.FileProcessorRequest(path=file_path, pickle_dags=pickle_dags)
>         for callback_request in callback_requests:
>             request.callbacks.append(callback_request.to_protobuf())
>         res = self.stub.processFile(request)
>         return res.dagsFound, res.errorsFound
>
>     def process_file(
>         self,
>         file_path: str,
>         callback_requests: List[CallbackRequest],
>         pickle_dags: bool = False,
>     ) -> Tuple[int, int]:
>         if self.use_grpc:
>             return self.process_file_grpc(
>                 file_path=file_path, callback_requests=callback_requests, pickle_dags=pickle_dags
>             )
>         return self.process_file_db(
>             file_path=file_path, callback_requests=callback_requests, pickle_dags=pickle_dags
>         )
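> (If we go the decorator route mentioned above, the routing boilerplate could plausibly shrink to something like the following - just a sketch of the idea with a hypothetical "grpc_or_db" helper, not something that is in the PR:)
>
>     import functools
>
>     def grpc_or_db(func):
>         # Route a call to its "_grpc" twin when the object was initialized
>         # with use_grpc=True, otherwise to its "_db" twin.
>         @functools.wraps(func)
>         def wrapper(self, *args, **kwargs):
>             suffix = "_grpc" if self.use_grpc else "_db"
>             return getattr(self, func.__name__ + suffix)(*args, **kwargs)
>         return wrapper
>
>     @grpc_or_db
>     def process_file(self, file_path, callback_requests, pickle_dags=False):
>         """Dispatched to process_file_grpc or process_file_db."""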
> You can see all the details in the PR: https://github.com/apache/airflow/pull/25094
>
> # POC testing harness (self-service :) )
> ----------------------------------------------------
>
> You can also very easily test it yourself:
>
> * airflow internal-api server -> runs the internal API server.
> * airflow internal-api test-client --num-repeats 10 --num-callbacks 10 --use-grpc -> runs file-processing of all example dags 10 times with 10x3 callbacks sent. The same command without --use-grpc will run using direct DB access. The server listens on port "50051" and the client connects to "localhost:50051" - you can modify the code to use a remote IP/different ports (easy to find by searching for localhost:50051).
>
> # Discussion and voting
> --------------------------------
>
> Of course some decisions in the PR can be improved (I focused on the explicitness of the POC more than anything else). We can discuss changes and improvements to some of the decisions I made once the PR is in a "reviewable state", and I am happy to improve it, but for now I would like to focus on answering two questions:
>
> * Does it look plausible?
> * Does it look like it's almost ready to vote on? (I will update the AIP before starting the vote, of course.)
>
> Let me know what you think.
>
> J.
>
> On Tue, Feb 1, 2022 at 3:11 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>> Since we have AIP-43 already approved, I would love to have more questions and discussion about AIP-44 - "Airflow Internal API" (as the new name is): https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>
>> For those who would like more context: a recording of the meeting where the scope and proposals of both AIP-43 and AIP-44 were discussed can be found here: https://drive.google.com/file/d/1SMFzazuY1kg4B4r11wNt8EQ_PmTDRKq6/view
>>
>> In the AIP I just made a small clarification regarding some "future" changes - specifically, the token security might be nicely handled together with AIP-46 "Add support for docker runtime isolation" proposed by Ping.
>>
>> My goal is to gather comments till the end of the week, and if there are no big concerns, I would love to start voting next week.
>>
>> J.
>>
>> On Mon, Jan 3, 2022 at 2:48 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>> Also, AIP-44 - the DB isolation mode - is now much more detailed and ready for deeper discussion if needed:
>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Database+API
>>>
>>> On Tue, Dec 14, 2021 at 12:07 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>> >
>>> > And just to add to that.
>>> >
>>> > Thanks again for the initial comments and for pushing us to provide more details. That allowed us to discuss and focus on many of the aspects that were raised, and we have many more answers now:
>>> >
>>> > * First of all - we focused on making sure the impact on the existing code and "behavior" of Airflow is minimal. In fact, there should be virtually no change vs. the current behavior when DB isolation is disabled.
>>> > * Secondly - we've done a full inventory of what the needed API should look like, and (not surprisingly) it turned out that the current REST-style API is good for part of it, but most of the "logic" of Airflow can only be done efficiently if we go to an RPC-style API. However, we propose that the authorization/exposure of the API is the same as we currently use in the REST API; this will allow us to reuse a big part of the infrastructure we already have.
>>> > * Thirdly - we took to heart the comments about having to maintain pretty much the same logic in a few different places. That's an obvious maintenance problem. The proposal we came up with addresses it: we are going to keep the logic of Airflow internals in one place only, and simply route smartly where the logic will be executed.
>>> > * Regarding the performance impact - we described the deployment options that our proposal makes available. We do not want to favor one deployment option over another, but we made sure the architecture is done in such a way that you can choose which deployment is good for you: "no isolation", "partial isolation", "full isolation" - each with different performance/resource characteristics, all of them fully horizontally scalable and nicely manageable.
>>> >
>>> > We look forward to comments, also for AIP-43 - https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation - AIP-43 is a prerequisite to AIP-44 and they both work together.
>>> >
>>> > We will also think about and discuss more follow-up AIPs once we get those approved (hopefully ;) ). Multi-tenancy is a "long haul", and while those two AIPs are foundational building blocks, there are at least a few more follow-up AIPs needed before we can say "we're done with multi-tenancy" :).
>>> >
>>> > J.
>>> >
>>> > On Tue, Dec 14, 2021 at 9:58 AM Mateusz Henc <mh...@google.com.invalid> wrote:
>>> > >
>>> > > Hi,
>>> > >
>>> > > As promised, we (credits to Jarek) updated the AIPs, added more details, did the inventory and changed the way API endpoints are generated.
>>> > >
>>> > > We also renamed it to Airflow Internal API - so the URL has changed:
>>> > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>> > >
>>> > > Please take a look, any comments are highly appreciated.
>>> > >
>>> > > Best regards,
>>> > > Mateusz Henc
>>> > >
>>> > > On Mon, Dec 6, 2021 at 3:09 PM Mateusz Henc <mh...@google.com> wrote:
>>> > >>
>>> > >> Hi,
>>> > >> Thank you Ash for your feedback (in both AIPs).
>>> > >>
>>> > >> We are working to address your concerns. We will update the AIPs in a few days.
>>> > >> I will let you know when it's done.
>>> > >>
>>> > >> Best regards,
>>> > >> Mateusz Henc
>>> > >>
>>> > >> On Fri, Dec 3, 2021 at 7:27 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>> > >>>
>>> > >>> Cool. Thanks for the guidance.
>>> > >>>
>>> > >>> On Fri, Dec 3, 2021 at 6:37 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>>> > >>> >
>>> > >>> > - Make an inventory: it doesn't need to be exhaustive, but a representative sample.
>>> > >>> > - More clearly define _what_ the API calls return -- object type, methods on them, etc.
>>> > >>> >
>>> > >>> > From the AIP you have this example:
>>> > >>> >
>>> > >>> > def get_dag_run(self, dag_id, execution_date):
>>> > >>> >     return self.db_client.get_dag_run(dag_id, execution_date)
>>> > >>> >
>>> > >>> > What does that _actually_ return? What capabilities does it have?
>>> > >>> >
>>> > >>> > (I have other thoughts but those are less fundamental and can be discussed later)
>>> > >>> >
>>> > >>> > -ash
>>> > >>> >
>>> > >>> > On Fri, Dec 3 2021 at 18:20:21 +0100, Jarek Potiuk <ja...@potiuk.com> wrote:
>>> > >>> >
>>> > >>> > Surely - if you think that we need to do some more work to get confidence, that's fine. I am sure we can improve it to the level that we will not have to do full performance tests, and you will be confident in the direction. Just to clarify your concerns and make sure we are on the same page - as I understand it, we should:
>>> > >>> > * make an inventory of the "actual changes" the proposal will involve in the database-low-level code of Airflow;
>>> > >>> > * based on that, assess whether those changes are likely (or unlikely) to have a performance impact;
>>> > >>> > * if we assess that an impact is likely, run at least rudimentary performance tests to prove that it is manageable.
>>> > >>> > I think that might be a good exercise to do. Does it sound about right? Or do you have any concerns about certain architectural decisions taken? No problem with Friday, but if we get answers today, I think it will give us time to think about it over the weekend and address it next week. J.
>>> > >>> >
>>> > >>> > On Fri, Dec 3, 2021 at 5:56 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>>> > >>> >
>>> > >>> > This is a fundamental change to the architecture, with significant possible impacts on performance, and it likely requires touching a large portion of the code base. Sorry, you're going to have to expand on the details first and work out what would actually be involved and what the impacts would be. Right now I have serious reservations about this approach, so I can't agree on the high-level proposal without an actual proposal. (The current document is, at best, an outline, not an actual proposal.) Sorry to be a grinch right before the weekend. Ash
>>>
>>> On Thu, Dec 2 2021 at 22:47:34 +0100, Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>> Oh yeah - good point, and we spoke about performance testing/implications.
>>> Performance is something we were discussing as the next step, once we get a general "OK" on the direction - we just want to make sure that there are no "huge" blockers in the way this is proposed, and to explain any doubts first, so that the investment in the performance part makes sense. We do not want to spend a lot of time on getting the tests done and on a detailed inventory of the methods/API calls, only to find out that this is generally a "bad direction".
>>>
>>> Just to clarify again - we also considered (as an alternative option) automatically mapping all the DB methods to remote calls. But we dropped that idea - precisely for reasons of performance and transaction integrity. So we are NOT mapping DB calls onto API calls; those will be "logical operations" on the database. Generally speaking, most of the API calls for the "airflow system-level but executed in worker" cases will be rather "coarse" than fine-grained. For example, the aforementioned "mini scheduler" - where we want to make a single API call and run the whole of it on the DB-API side. So there the performance impact is very limited IMHO. And if we see any other "logic" like that in other parts of the code (zombie detection, as an example), we plan to make a detailed inventory of those once we get a general "looks good" for the direction. For now we did some "rough" checking, and the approach seems plausible and quite doable.
>>>
>>> One more note - the "fine-grained" operations ("variable" update/retrieval, "connection" update/retrieval) via the REST API will still be used by the user's code, though (parsing DAGs, operators, workers and callbacks). We also plan to make sure that none of the "community" operators use "non-blessed" DB calls (we can check that in our CI). So at the end of the exercise, all operators, hooks, etc. from the community will be guaranteed to only use the DB APIs that are available in the "DB API" module. But there I do not expect much of any performance penalty, as those are very fast and rare operations (and a good thing there is that we can cache their results in workers/DAG processing). J.
>>>
>>> On Thu, Dec 2, 2021 at 7:16 PM Andrew Godwin <andrew.god...@astronomer.io.invalid> wrote:
>>>
>>> Ah, my bad, I missed that. I'd still like to see discussion of the performance impacts, though.
>>>
>>> On Thu, Dec 2, 2021 at 11:14 AM Ash Berlin-Taylor <a...@apache.org> wrote:
>>>
>>> The scheduler was excluded from the components that would use the dbapi - the mini scheduler is the odd one out here, as it (currently) runs on the worker but shares much of the code from the scheduling path. -a
>>>
>>> On 2 December 2021 17:56:40 GMT, Andrew Godwin <andrew.god...@astronomer.io.INVALID> wrote:
>>>
>>> I would also like to see some discussion in this AIP about how the data is going to be serialised to and from the database instances (obviously Connexion is involved, but I presume more transformation code is needed than that) and the potential slowdown this would cause. In my experience, a somewhat direct ORM mapping like this is going to result in considerably slower times for any complex operation that's touching a few hundred rows. Is there a reason this is being proposed for the scheduler code, too?
>>> In my mind, the best approach to multitenancy would be to remove all user-supplied code from the scheduler and leave it with direct DB access, rather than trying to indirect all scheduler access through another API layer.
>>>
>>> Andrew
>>>
>>> On Thu, Dec 2, 2021 at 10:29 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>> Yeah - I think, Ash, you are completely right that we need some more "detailed" clarification. I believe I know what you are - rightfully - afraid of (re impact on the code), and maybe we have not done a good job of explaining some of the assumptions we had when we worked on it with Mateusz. Simply, it was not clear that our aim is to absolutely minimise the impact on the "internal DB transactions" done in schedulers and workers. The idea is that the change will at most move the execution of a transaction to another process, but not change what the DB transaction does internally. Actually, this was one of the reasons for the "alternative" approach we discussed (you can see it in the document) - hijacking the "sqlalchemy session" - but that is far too low-level, and the aim of the "DB-API" is NOT to replace direct DB calls (hence we need to figure out a better name). The API is there to provide a "scheduler logic" API and "REST access to Airflow primitives like dags/tasks/variables/connections", etc.
>>>
>>> As an example (which we briefly talked about on Slack), the "_run_mini_scheduler_on_child_tasks" case (https://github.com/apache/airflow/blob/main/airflow/jobs/local_task_job.py#L225-L274) is an example (that we would put in the doc). As we thought of it, this is a "single DB-API operation" - see the sketch below. Those are not pure REST calls, of course; they are more RPC-like calls. That is why even initially I thought of separating the API completely. But since there are a lot of common "primitive" calls that we can re-use, I think having a separate DB-API component which re-uses the connexion implementation, replacing authentication with the custom worker <-> DB-API authentication, is the way to go. And yes, if we agree on the general idea, we need to choose the best way to "connect" the REST API we have with the RPC-kind of API we need for some cases in workers. But we wanted to make sure we are on the same page with the direction first. And yes, it means that the DB-API will potentially have to handle quite a number of DB operations (and that it has to be replicable and scalable as well) - but the DB-API will be "stateless" similarly to the webserver, so it will be scalable by definition. And yes, performance tests will be part of the POC - likely even before we finally ask for votes.
>>>
>>> So in short:
>>> * no modification or impact on current scheduler behaviour when DB isolation is disabled;
>>> * only higher-level methods will be moved out to the DB-API, and we will reuse existing "REST" APIs where it makes sense;
>>> * we aim to have "0" changes to the logic of processing - both in the DAG processing logic and in the DB-API. We think this is perfectly doable with the architecture we proposed.
>>>
>>> I hope this clarifies things a bit. Once we agree on the general direction, we will definitely work on adding more details and clarification (we actually already have a lot of that, but we just wanted to start with explaining the idea and go into more details later, when we are sure there are no "high-level" blockers from the community).
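>>> To make "coarse" concrete, such a single logical call could look roughly like this (a sketch only - the names are invented for illustration, and the real shape will come out of the inventory):
>>>
>>>     def run_mini_scheduler_on_child_tasks(self, dag_id, run_id, task_id):
>>>         # One RPC executes the whole mini-scheduler logic on the DB-API
>>>         # side, instead of replaying its many individual queries over the
>>>         # network one by one.
>>>         if self.db_isolation:
>>>             return self.db_client.run_mini_scheduler_on_child_tasks(dag_id, run_id, task_id)
>>>         return self._run_mini_scheduler_on_child_tasks_db(dag_id, run_id, task_id)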
>>> J.
>>>
>>> On Thu, Dec 2, 2021 at 4:46 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>>> >
>>> > > I just provided a general idea for the approach - but if you want me to put more examples then I am happy to do that
>>> >
>>> > Yes please.
>>> >
>>> > It is too general for me, and I can't work out what effect it would actually have on the code base, especially how it would look with the config option to enable/disable direct DB access.
>>> >
>>> > -ash
>>> >
>>> > On Thu, Dec 2 2021 at 16:36:57 +0100, Mateusz Henc <mh...@google.com.INVALID> wrote:
>>> >
>>> > Hi,
>>> > I am sorry if it is not clear enough; let me try to explain it here, so maybe it sheds more light on the idea. See my comments below.
>>> >
>>> > On Thu, Dec 2, 2021 at 3:39 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>>> >>
>>> >> I'm sorry to say it, but this proposal just doesn't contain enough detail to say what the actual changes to the code would be, and what the impact would be.
>>> >>
>>> >> To take the one example you have so far:
>>> >>
>>> >> def get_dag_run(self, dag_id, execution_date):
>>> >>     return self.db_client.get_dag_run(dag_id, execution_date)
>>> >>
>>> >> So from this snippet I'm guessing it would be used like this:
>>> >>
>>> >> dag_run = db_client.get_dag_run(dag_id, execution_date)
>>> >>
>>> >> What type of object is returned?
>>> >
>>> > As it replaces:
>>> >
>>> >     dag_run = (
>>> >         session.query(DagRun)
>>> >         .filter(DagRun.dag_id == dag_id, DagRun.execution_date == execution_date)
>>> >         .first()
>>> >     )
>>> >
>>> > the type of the object will be exactly the same (DagRun).
>>> >
>>> >> Do we need one API method per individual query we have in the source?
>>> >
>>> > No, as explained by the sentence: "The method may be extended, accepting more optional parameters to avoid having too many similar implementations."
>>> >
>>> >> Which components would use this new mode when it's enabled?
>>> >
>>> > You may read: "Airflow Database API is a new independent component of Airflow. It allows isolating some components (Worker, DagProcessor and Triggerer) from direct access to the DB."
>>> >
>>> >> But what you haven't said the first thing about is what _other_ changes would be needed in the code. To take a fairly simple example:
>>> >>
>>> >> dag_run = db_client.get_dag_run(dag_id, execution_date)
>>> >> dag_run.queued_at = timezone.now()
>>> >> # How do I save this?
>>> >>
>>> >> In short, you need to put a lot more detail into this before we can even have an idea of the full scope of the change this proposal would involve, and what code changes would be needed for components to work with and without this setting enabled.
>>> >
>>> > For this particular example - it depends on the intention of the code author:
>>> > - If this should be one transaction, then I would actually introduce a new method like enqueue_dag_run(...) that would run these two steps on the Airflow DB API side.
>>> > - If not, then maybe just an "update_dag_run" method accepting the whole "dag_run" object and saving it to the DB.
>>> >
>>> > In general, we could take a naive approach, e.g. replace code:
>>> >
>>> >     dag_run = (
>>> >         session.query(DagRun)
>>> >         .filter(DagRun.dag_id == dag_id, DagRun.execution_date == execution_date)
>>> >         .first()
>>> >     )
>>> >
>>> > with:
>>> >
>>> >     if self.db_isolation:
>>> >         dag_run = db_client.get_dag_run(dag_id, execution_date)
>>> >     else:
>>> >         dag_run = (
>>> >             session.query(DagRun)
>>> >             .filter(DagRun.dag_id == dag_id, DagRun.execution_date == execution_date)
>>> >             .first()
>>> >         )
>>> >
>>> > The problem is that the Airflow DB API would then need to have the same implementation for the query - so duplicated code. That's why we propose moving this code to the DBClient, which is also used by the Airflow DB API (in DB-direct mode).
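>>> > In other words, the query lives in one place and both modes call it - a rough sketch of the idea (hypothetical class/method names):
>>> >
>>> >     class DBClient:
>>> >         # The single implementation of the query. The Airflow DB API server
>>> >         # calls it to serve get_dag_run requests, and in DB-direct mode the
>>> >         # components call it themselves - so the query code is not duplicated.
>>> >         def get_dag_run(self, session, dag_id, execution_date):
>>> >             return (
>>> >                 session.query(DagRun)
>>> >                 .filter(DagRun.dag_id == dag_id, DagRun.execution_date == execution_date)
>>> >                 .first()
>>> >             )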
>>> > I know there are many places where the code is much more complicated than a single query, but they must be handled one by one during the implementation, otherwise this AIP would be way too big.
>>> >
>>> > I just provided a general idea for the approach - but if you want me to put more examples, I am happy to do that.
>>> >
>>> > Best regards,
>>> > Mateusz Henc
>>> >
>>> >> On Thu, Dec 2 2021 at 14:23:56 +0100, Mateusz Henc <mh...@google.com.INVALID> wrote:
>>> >>
>>> >> Hi,
>>> >> I just added a new AIP for running some Airflow components in DB-isolation mode, without direct access to the Airflow Database - they will use a new API for this purpose.
>>> >>
>>> >> PTAL:
>>> >> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Database+API
>>> >>
>>> >> Open question:
>>> >> I called it "Airflow Database API" - however, I feel it could be more than just an access layer for the database. So if you have a better name, please let me know; I am happy to change it.
>>> >>
>>> >> Best regards,
>>> >> Mateusz Henc