I agree with Andrey too on this. Thanks & Regards, Amogh Desai
On Fri, May 17, 2024 at 7:42 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Agreed on your points @andrey.ans...@taragol.is <andrey.ans...@taragol.is>
>
> On Fri, 17 May 2024 at 15:01, Andrey Anshin <andrey.ans...@taragol.is> wrote:
>
> > IMHO, if we decide to keep only Postgres support, we need to have really powerful arguments to provide an interface which helps integrate with other DBs.
> >
> > In this case, we must clearly understand what the community is responsible for and how it can be sure that nothing is broken.
> >
> > Especially if we take into account that Airflow has very tight integrations with specific databases, requires a lot of effort to support additional ones (MS SQL case), and the DB part is not a part of the Public Interface of Airflow [1].
> >
> > So I would consider that this should be two separate decisions:
> > 1. Keep only Postgres (vanilla, not forks) as supported/tested backend in Production. SQLite remains as development DB.
> > 2. Provide public interface to DB integrations between Airflow and DB for third parties
> >
> > [1]: https://airflow.apache.org/docs/apache-airflow/stable/public-airflow-interface.html
> >
> > On Tue, 14 May 2024 at 12:15, Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > > Yeah, that works for me
> > >
> > > > Can we have it possible to have two (or maybe three - like a sub-committee) co-owners of topics?
> > >
> > > On Tue, 14 May 2024 at 06:15, Vikram Koka <vik...@astronomer.io.invalid> wrote:
> > >
> > > > Definitely a fast moving thread on the mailing list. I haven’t been able to respond for a few days and feel very far behind already.
> > > >
> > > > A few comments on topics discussed the last few days:
> > > > - Jarek, in response to your comments around being more aggressive than in Airflow 2 about deprecation and drops of functionality, I am very supportive of that stance.
> > > > I completely agree that we could have been more aggressive as part of Airflow 2.
> > > > However, I would like to ask that as we go forward, we make sure that we have clean interfaces to be able to add support, even if we choose a single implementation. For example, with respect to dropping MySQL support. I can understand the perspective of the project that this should be deprecated from an Airflow OSS perspective. However, even if the only OSS supported DB is Postgres, I would like to ensure that a clean interface exists for interaction with the DB, so that other databases such as MySQL or others CAN be supported by a third party or at a later date.
> > > > I realize that this may seem onerous, but I believe that it enables us to be more flexible in the long run, rather than locking us into a single DB implementation.
> > > >
> > > > - Bolke, Daniel Standish, Ash, et al on the task execution contract, definitely looking forward to this.
> > > >
> > > > - To those that I proposed a couple of more detailed write ups, I still plan to do that, at the latest by early next week.
> > > >
> > > > Vikram
> > > >
> > > > On Mon, May 13, 2024 at 9:30 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> > > >
> > > > > Super-excited about that.
> > > > >
> > > > > Question/Proposal: Can we have it possible to have two (or maybe three - like a sub-committee) co-owners of topics? I think it's a lot to put on one's head to "own" a topic and given circumstances/ volunteer time of people, interruptions (and life intervening), it might be a bit risky to put it on one's shoulders only.
> > > > >
> > > > > I know it's against the rule ("if it is owned by many, it's not owned by anyone") - but I think in our case there are at least some topics that could benefit from having more than one owner. Especially when we know and trust that we can work together on some topics that we are passionate about. It might also encourage getting out of people's comfort zones.
> > > > >
> > > > > For example - I'd absolutely love to volunteer to co-own the "streamline the development" with Andrey if he would be willing to of course :D (sorry Andrey for "volunteering you" on that one :D) - and maybe we could get someone else to join us.
> > > > >
> > > > > That might have the added benefit of being able to break with the way we've been doing things. If I am owning it for one - I'd likely gravitate towards past choices, but with others joining me and taking decisions (and responsibility in making sure we implement them) together, we could make better decisions and reduce bus factor for dev tooling/ CI in the future.
> > > > >
> > > > > BTW. Shameless promotion: tomorrow I am giving a talk about that very topic (in the context of last few years not yet Airflow 3.0) at the NY meetup hosted at Astronomer NY headquarters https://www.meetup.com/nyc-apache-airflow-meetup/events/300017228/ - so if you are in NY or around - I think you can still sign up :D. I am also getting to PyCon US in Pittsburgh next week so don't expect too much from me. I will be gearing up for streamlining the development by talking to the right people and listening to the latest things and best practices of the larger Python community :).
> > > > >
> > > > > J.
> > > > >
> > > > > On Tue, May 14, 2024 at 12:03 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > > >
> > > > > > Thank you all, I am very happy about the discussions.
> > > > > >
> > > > > > The mailing list moves fast :). The main reason I recommended starting the dev calls in early June was to have some of these discussions on the mailing list.
> > > > > >
> > > > > > Since Michal already scheduled a call, let's start there to discuss various ideas. For the week after that, I have created an Airflow 2-style recurring open dev calls for anyone to join, info below:
> > > > > >
> > > > > > *Date & Time:* Recurring every 2 weeks on Thursday at *4pm BST* (3 PM GMT/UTC | 11 AM EST | 8 AM PST); starting *May 30, 2024 04:00 PM BST* and then
> > > > > > *One-time registration Link*: https://astronomer.zoom.us/meeting/register/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG
> > > > > > *Add to your calendar*: https://astronomer.zoom.us/meeting/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG/calendar/google/add
> > > > > >
> > > > > > I will post the meeting notes on the dev mailing list as well as Confluence for archival purposes (example <https://cwiki.apache.org/confluence/display/AIRFLOW/Meeting+Notes>).
> > > > > >
> > > > > > Once we discuss various proposals next week, I recommend that for each "workstream", we have an owner who would want to lead that workstream. For items that do not have an owner, we can put those into the Airflow 3 Meta issue <https://github.com/apache/airflow/issues/39593> or cross-link over there so someone in the community can take it on.
> > > > > > If we don't have an owner who will commit to working on it, we park that item until we find the owner.
> > > > > >
> > > > > > At the end of each call, I would solicit ideas for the agenda for the next call and propose it to the broader group on the mailing list.
> > > > > >
> > > > > > Some of the items that should be discussed in the upcoming calls IMO:
> > > > > >
> > > > > > - Agreeing on Principles
> > > > > >
> > > > > >   Based on the discussions, some potential items (all up for debate)
> > > > > >    - Considering Airflow 3.0 for early adopters and *breaking (and removing) things for AF 3.0*. Things can be re-added as needed in upcoming minor releases
> > > > > >    - Optimize to get *foundational pieces in* and not "let perfect be the enemy of good"
> > > > > >    - Working on features that solidify Airflow as the *modern Orchestrator* that also has state of the art *support for Data, AI & ML workloads*. This includes scalability & performance discussion
> > > > > >    - Set up the codebase for the next 5 years.
> > > > > >      This encompasses all the things we are discussing e.g removing MySQL to reduce the test matrix, simplifying things architecturally, consolidating serialization methods, etc
> > > > > >
> > > > > > - Workstream & Stream Owners
> > > > > > - Airflow 2 support policy including scope (feature vs bug fixes + security only) & support period
> > > > > > - Separate discussions for each big workstream including one for items to remove & refactor (e.g dropping MySQL)
> > > > > > - Discussion to streamline the development of Airflow 3
> > > > > >    - Separating dev for Providers & Airflow (something Jarek already kick-started), and
> > > > > >    - Separate branch for Airflow 2
> > > > > >    - CI changes for the above
> > > > > > - Finalize Scope + Timelines
> > > > > > - Migration Utilities
> > > > > > - Progress check-ins
> > > > > >
> > > > > > Looking forward to the exciting months ahead.
> > > > > >
> > > > > > Regards,
> > > > > > Kaxil
> > > > > >
> > > > > > On Mon, 13 May 2024 at 21:40, Bolke de Bruin <bdbr...@gmail.com> wrote:
> > > > > >
> > > > > > > Declaring connections prior to task execution was already proposed in AIP-1 :-). At that time, I had in mind to communicate over IPC to the task the required settings. Registration could then happen with a manifest. Maybe during DAG serialization this could be obtained unobtrusively? The benefit is that tasks become truly atomic or independent from Airflow as long as they communicate their exit codes (success, failed, and I think Ash had a couple of others in mind - the fewer the better).
> > > > > > >
> > > > > > > If you want two-way communication, maybe for variables as they can change during scheduling, this can happen with AIP-44. Although, I'd prefer it to happen with the *executor* rather than some centralized service. If the executor is used, IPC is the logical choice. The benefit of this is that you have better resiliency and you can start to think about no downtime upgrades
> > > > > > >
> > > > > > > So I hope Ash takes this to 2024 :-).
> > > > > > >
> > > > > > > B.
> > > > > > >
> > > > > > > On Mon, 13 May 2024 at 19:27, Ash Berlin-Taylor <a...@apache.org> wrote:
> > > > > > >
> > > > > > > > > That would require some mechanism of declaring prior to task execution what connections would be used
> > > > > > > >
> > > > > > > > That’s exactly what I’m proposing in the proposal doc I’m working on (It’s part of also overhauling and re-designing the “Task Execution interface” that also gives us the ability to nicely have support for running tasks in other languages - much more than just BashOperator)
> > > > > > > >
> > > > > > > > This is a bit of a fundamental shift in thinking about task execution in Airflow, but I think it gives us some really nice properties that the project is currently missing.
> > > > > > > >
> > > > > > > > Tl;dr; let's discuss this in my doc when it comes out (next week most likely) please :)
> > > > > > > >
> > > > > > > > -ash
> > > > > > > >
> > > > > > > > > On 13 May 2024, at 18:15, Daniel Standish <daniel.stand...@astronomer.io.INVALID> wrote:
> > > > > > > > >
> > > > > > > > > re
> > > > > > > > >
> > > > > > > > >> As tasks require connection access, I assume connection data will somehow be passed as part of the metadata to task execution - whether it's part of the executor protocol or in some other way (I'm not an expert on that part of Airflow). Then, provided it's accessible as part of some execution context, and not only passed to the task's execute method, OpenLineage could utilize it.
> > > > > > > > >
> > > > > > > > > It's not strictly necessary that connection info be passed "as part of task metadata". That would require some mechanism of declaring prior to task execution what connections would be used. This is a thought that has come up when thinking about execution of non-python tasks. But it's not required from a technical perspective by AIP-44 because the `get_connection` function can be made to be an RPC call so a task could continue to retrieve connections at runtime.
> > > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > --
> > > > > > > Bolke de Bruin
> > > > > > > bdbr...@gmail.com
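
For readers following the connection discussion at the bottom of the thread, the two options being weighed (Daniel's runtime `get_connection` over RPC, versus Bolke's and Ash's idea of declaring connections before the task runs) can be contrasted in a small sketch. This is only an illustration under stated assumptions: none of the class or function names below are Airflow APIs, the "RPC" is a plain in-process function standing in for a network call, and the manifest is assumed to be a simple list of connection ids.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Connection:
    """Hypothetical, minimal connection record (not Airflow's Connection)."""
    conn_id: str
    uri: str


# Stand-in for the metadata database that only the server side can reach.
_SERVER_SIDE_STORE: Dict[str, Connection] = {
    "warehouse": Connection("warehouse", "postgresql://host/db"),
    "bucket": Connection("bucket", "s3://example-bucket"),
}


def fake_rpc_get_connection(conn_id: str) -> Connection:
    """Pretend RPC endpoint; a real one would be a network call."""
    return _SERVER_SIDE_STORE[conn_id]


class RuntimeLookupContext:
    """Option 1: the task fetches connections lazily, at runtime, via RPC."""

    def __init__(self, rpc: Callable[[str], Connection]) -> None:
        self._rpc = rpc
        # Only known after the task has run, e.g. for lineage reporting.
        self.requested: List[str] = []

    def get_connection(self, conn_id: str) -> Connection:
        self.requested.append(conn_id)
        return self._rpc(conn_id)


class DeclaredContext:
    """Option 2: connections are declared in a manifest and resolved up front."""

    def __init__(self, manifest: Iterable[str],
                 resolver: Callable[[str], Connection]) -> None:
        # Everything the task may use is known before execution starts.
        self._connections = {cid: resolver(cid) for cid in manifest}

    def get_connection(self, conn_id: str) -> Connection:
        if conn_id not in self._connections:
            raise KeyError(f"connection {conn_id!r} was not declared")
        return self._connections[conn_id]


def my_task(ctx) -> str:
    """Task code is identical under both options."""
    return ctx.get_connection("warehouse").uri


runtime_ctx = RuntimeLookupContext(fake_rpc_get_connection)
declared_ctx = DeclaredContext(["warehouse"], fake_rpc_get_connection)
print(my_task(runtime_ctx), my_task(declared_ctx))
```

The trade-off the sketch tries to show: with runtime lookup the task stays flexible but the set of connections is only observable after the fact, while with a declared manifest the set is known before execution and the task needs no channel back to the server, at the cost of failing on anything undeclared.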