IMHO, if we decide to keep only Postgres support, we need really strong arguments for providing an interface that helps integrate with other DBs.
In that case, we must clearly understand what the community is responsible for and how it can be sure that nothing is broken, especially given that Airflow has very tight integrations with specific databases, requires a lot of effort to support additional ones (the MS SQL case), and that the DB part is not part of the Public Interface of Airflow [1].

So I would consider that this should be two separate decisions:
1. Keep only Postgres (vanilla, not forks) as the supported/tested backend in production. SQLite remains the development DB.
2. Provide a public interface for DB integrations between Airflow and the DB for third parties.

[1]: https://airflow.apache.org/docs/apache-airflow/stable/public-airflow-interface.html

On Tue, 14 May 2024 at 12:15, Kaxil Naik <kaxiln...@gmail.com> wrote:

> Yeah, that works for me
>
> > Can we have it possible to have two (or maybe three - like a sub-committee) co-owners of topics?
>
> On Tue, 14 May 2024 at 06:15, Vikram Koka <vik...@astronomer.io.invalid> wrote:
>
> > Definitely a fast moving thread on the mailing list. I haven’t been able to respond for a few days and feel very far behind already.
> >
> > A few comments on topics discussed the last few days:
> > - Jarek, in response to your comments around being more aggressive than in Airflow 2 about deprecation and drops of functionality, I am very supportive of that stance. I completely agree that we could have been more aggressive as part of Airflow 2.
> > However, I would like to ask that as we go forward, we make sure that we have clean interfaces to be able to add support, even if we choose a single implementation. For example, with respect to dropping MySQL support. I can understand the perspective of the project that this should be deprecated from an Airflow OSS perspective.
> > However, even if the only OSS supported DB is Postgres, I would like to ensure that a clean interface exists for interaction with the DB, so that other databases such as MySQL or others CAN be supported by a third party or at a later date.
> > I realize that this may seem onerous, but I believe that it enables us to be more flexible in the long run, rather than locking us into a single DB implementation.
> >
> > - Bolke, Daniel Standish, Ash, et al on the task execution contract, definitely looking forward to this.
> >
> > - To those that I proposed a couple of more detailed write ups, I still plan to do that, at the latest by early next week.
> >
> > Vikram
> >
> > On Mon, May 13, 2024 at 9:30 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > > Super-excited about that.
> > >
> > > Question/Proposal: Can we have it possible to have two (or maybe three - like a sub-committee) co-owners of topics? I think it's a lot to put on one's head to "own" a topic and given circumstances/ volunteer time of people, interruptions (and life intervening), it might be a bit risky to put it on one's shoulders only.
> > >
> > > I know it's against the rule ("if it is owned by many, it's not owned by anyone") - but I think in our case there are at least some topics that could benefit from having more than one owner. Especially when we know and trust that we can work together on some topics that we are passionate about. It might also encourage getting out of people's comfort zones.
> > >
> > > For example - I'd absolutely love to volunteer to co-own the "streamline the development" with Andrey if he would be willing to of course :D (sorry Andrey for "volunteering you" on that one :D) - and maybe we could get someone else to join us.
> > >
> > > That might have the added benefit of being able to break with the way we've been doing things.
> > > If I am owning it for one - I'd likely gravitate towards past choices, but with others joining me and taking decisions (and responsibility in making sure we implement them) together, we could make better decisions and reduce bus factor for dev tooling/ CI in the future.
> > >
> > > BTW. Shameless promotion: tomorrow I am giving a talk about that very topic (in the context of last few years not yet Airflow 3.0) at the NY meetup hosted at Astronomer NY headquarters https://www.meetup.com/nyc-apache-airflow-meetup/events/300017228/ - so if you are in NY or around - I think you can still sign up :D. I am also getting to PyCon US in Pittsburgh next week so don't expect too much from me. I will be gearing up for streamlining the development by talking to the right people and listening to the latest things and best practices of the larger Python community :).
> > >
> > > J.
> > >
> > > On Tue, May 14, 2024 at 12:03 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > >
> > > > Thank you all, I am very happy about the discussions.
> > > >
> > > > The mailing list moves fast :). The main reason I recommended starting the dev calls in early June was to have some of these discussions on the mailing list.
> > > >
> > > > Since Michal already scheduled a call, let's start there to discuss various ideas.
> > > > For the week after that, I have created Airflow 2-style recurring open dev calls for anyone to join, info below:
> > > >
> > > > *Date & Time:* Recurring every 2 weeks on Thursday at *4pm BST* (3 PM GMT/UTC | 11 AM EST | 8 AM PST); starting *May 30, 2024 04:00 PM BST* and then
> > > > *One-time registration Link*:
> > > > https://astronomer.zoom.us/meeting/register/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG
> > > > *Add to your calendar*:
> > > > https://astronomer.zoom.us/meeting/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG/calendar/google/add
> > > >
> > > > I will post the meeting notes on the dev mailing list as well as Confluence for archival purposes (example <https://cwiki.apache.org/confluence/display/AIRFLOW/Meeting+Notes>).
> > > >
> > > > Once we discuss various proposals next week, I recommend that for each "workstream", we have an owner who would want to lead that workstream. For items that do not have an owner, we can put those into the Airflow 3 Meta issue <https://github.com/apache/airflow/issues/39593> or cross-link over there so someone in the community can take it on. If we don't have an owner who will commit to working on it, we park that item until we find the owner.
> > > >
> > > > At the end of each call, I would solicit ideas for the agenda for the next call and propose it to the broader group on the mailing list.
> > > >
> > > > Some of the items that should be discussed in the upcoming calls IMO:
> > > >
> > > > - Agreeing on Principles
> > > >
> > > >   Based on the discussions, some potential items (all up for debate)
> > > >    - Considering Airflow 3.0 for early adopters and *breaking (and removing) things for AF 3.0*.
> > > >      Things can be re-added as needed in upcoming minor releases
> > > >    - Optimize to get *foundational pieces in* and not "let perfect be the enemy of good"
> > > >    - Working on features that solidify Airflow as the *modern Orchestrator* that also has state of the art *support for Data, AI & ML workloads*. This includes scalability & performance discussion
> > > >    - Set up the codebase for the next 5 years. This encompasses all the things we are discussing e.g. removing MySQL to reduce the test matrix, simplifying things architecturally, consolidating serialization methods, etc.
> > > >
> > > > - Workstream & Stream Owners
> > > > - Airflow 2 support policy including scope (feature vs bug fixes + security only) & support period
> > > > - Separate discussions for each big workstream including one for items to remove & refactor (e.g. dropping MySQL)
> > > > - Discussion to streamline the development of Airflow 3
> > > >    - Separating dev for Providers & Airflow (something Jarek already kick-started), and
> > > >    - Separate branch for Airflow 2
> > > >    - CI changes for the above
> > > > - Finalize Scope + Timelines
> > > > - Migration Utilities
> > > > - Progress check-ins
> > > >
> > > > Looking forward to the exciting months ahead.
> > > >
> > > > Regards,
> > > > Kaxil
> > > >
> > > > On Mon, 13 May 2024 at 21:40, Bolke de Bruin <bdbr...@gmail.com> wrote:
> > > >
> > > > > Declaring connections prior to task execution was already proposed in AIP-1 :-). At that time, I had in mind to communicate over IPC to the task the required settings. Registration could then happen with a manifest. Maybe during DAG serialization this could be obtained unobtrusively?
> > > > > The benefit is that tasks become truly atomic or independent from Airflow as long as they communicate their exit codes (success, failed, and I think Ash had a couple of others in mind - the fewer the better).
> > > > >
> > > > > If you want two-way communication, maybe for variables as they can change during scheduling, this can happen with AIP-44. Although, I'd prefer it to happen with the *executor* rather than some centralized service. If the executor is used, IPC is the logical choice. The benefit of this is that you have better resiliency and you can start to think about no downtime upgrades.
> > > > >
> > > > > So I hope Ash takes this to 2024 :-).
> > > > >
> > > > > B.
> > > > >
> > > > > On Mon, 13 May 2024 at 19:27, Ash Berlin-Taylor <a...@apache.org> wrote:
> > > > >
> > > > > > > That would require some mechanism of declaring prior to task execution what connections would be used
> > > > > >
> > > > > > That’s exactly what I’m proposing in the proposal doc I’m working on (It’s part of also overhauling and re-designing the “Task Execution interface” that also gives us the ability to nicely have support for running tasks in other languages - much more than just BashOperator)
> > > > > >
> > > > > > This is a bit of a fundamental shift in thinking about task execution in Airflow, but I think it gives us some really nice properties that the project is currently missing.
> > > > > >
> > > > > > Tl;dr: let's discuss this in my doc when it comes out (next week most likely) please :)
> > > > > >
> > > > > > -ash
> > > > > >
> > > > > > > On 13 May 2024, at 18:15, Daniel Standish <daniel.stand...@astronomer.io.INVALID> wrote:
> > > > > > >
> > > > > > > re
> > > > > > >
> > > > > > >> As tasks require connection access, I assume connection data will somehow be passed as part of the metadata to task execution - whether it's part of the executor protocol or in some other way (I'm not an expert on that part of Airflow). Then, provided it's accessible as part of some execution context, and not only passed to the task's execute method, OpenLineage could utilize it.
> > > > > > >
> > > > > > > It's not strictly necessary that connection info be passed "as part of task metadata". That would require some mechanism of declaring prior to task execution what connections would be used. This is a thought that has come up when thinking about execution of non-python tasks. But it's not required from a technical perspective by AIP-44 because the `get_connection` function can be made to be an RPC call so a task could continue to retrieve connections at runtime.
> > > > >
> > > > > --
> > > > > Bolke de Bruin
> > > > > bdbr...@gmail.com