Thank you all, I am very happy to see these discussions. The mailing list moves fast :). The main reason I recommended starting the dev calls in early June was to have some of these discussions on the mailing list first.
Since Michal already scheduled a call, let's start there to discuss various ideas. For the week after that, I have created an Airflow 2-style recurring open dev call for anyone to join, info below:

*Date & Time*: Recurring every 2 weeks on Thursday at *4 PM BST* (3 PM GMT/UTC | 11 AM EDT | 8 AM PDT), starting *May 30, 2024, 4:00 PM BST*
*One-time registration link*: https://astronomer.zoom.us/meeting/register/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG
*Add to your calendar*: https://astronomer.zoom.us/meeting/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG/calendar/google/add

I will post the meeting notes on the dev mailing list as well as on Confluence for archival purposes (example: <https://cwiki.apache.org/confluence/display/AIRFLOW/Meeting+Notes>).

Once we discuss the various proposals next week, I recommend that each "workstream" have an owner who wants to lead it. Items that do not have an owner can go into the Airflow 3 Meta issue <https://github.com/apache/airflow/issues/39593>, or be cross-linked there, so someone in the community can take them on. If we don't have an owner who will commit to working on an item, we park it until we find one.

At the end of each call, I will solicit ideas for the agenda of the next call and propose it to the broader group on the mailing list.

Some of the items that should be discussed in the upcoming calls, IMO:

- Agreeing on Principles. Based on the discussions, some potential ones (all up for debate):
  - Considering Airflow 3.0 to be for early adopters, and *breaking (and removing) things for AF 3.0*. Things can be re-added as needed in upcoming minor releases
  - Optimizing to get *foundational pieces in* and not "letting perfect be the enemy of good"
  - Working on features that solidify Airflow as the *modern orchestrator*, with state-of-the-art *support for Data, AI & ML workloads*. This includes the scalability & performance discussion
  - Setting up the codebase for the next 5 years.
  This encompasses all the things we are discussing, e.g. removing MySQL to reduce the test matrix, simplifying things architecturally, consolidating serialization methods, etc.
- Workstream & stream owners
- Airflow 2 support policy, including scope (features vs. bug fixes + security only) & support period
- Separate discussions for each big workstream, including one for items to remove & refactor (e.g. dropping MySQL)
- Discussion to streamline the development of Airflow 3:
  - Separating dev for Providers & Airflow (something Jarek already kick-started), and
  - A separate branch for Airflow 2
  - CI changes for the above
- Finalizing scope + timelines
- Migration utilities
- Progress check-ins

Looking forward to the exciting months ahead.

Regards,
Kaxil

On Mon, 13 May 2024 at 21:40, Bolke de Bruin <bdbr...@gmail.com> wrote:

> Declaring connections prior to task execution was already proposed in AIP-1
> :-). At that time, I had in mind communicating the required settings to the
> task over IPC. Registration could then happen with a manifest; maybe this
> could be obtained unobtrusively during DAG serialization? The benefit is
> that tasks become truly atomic, or independent from Airflow, as long as
> they communicate their exit codes (success, failed, and I think Ash had a
> couple of others in mind; the fewer the better).
>
> If you want two-way communication, maybe for variables as they can change
> during scheduling, this can happen with AIP-44. Although, I'd prefer it to
> happen with the *executor* rather than some centralized service. If the
> executor is used, IPC is the logical choice. The benefit of this is that
> you have better resiliency and you can start to think about no-downtime
> upgrades.
>
> So I hope Ash takes this to 2024 :-).
>
> B.
>
> On Mon, 13 May 2024 at 19:27, Ash Berlin-Taylor <a...@apache.org> wrote:
>
> > > That would require some mechanism of declaring prior to task execution
> > > what connections would be used
> >
> > That’s exactly what I’m proposing in the proposal doc I’m working on (it’s
> > part of also overhauling and re-designing the “Task Execution interface”,
> > which also gives us the ability to nicely support running tasks in other
> > languages, much more than just BashOperator).
> >
> > This is a bit of a fundamental shift in thinking about task execution in
> > Airflow, but I think it gives us some really nice properties that the
> > project is currently missing.
> >
> > TL;DR: let’s discuss this in my doc when it comes out (next week most
> > likely) please :)
> >
> > -ash
> >
> > > On 13 May 2024, at 18:15, Daniel Standish
> > > <daniel.stand...@astronomer.io.INVALID> wrote:
> > >
> > > re
> > >
> > >> As tasks require connection access, I assume connection data will
> > >> somehow be passed as part of the metadata to task execution - whether
> > >> it's part of the executor protocol or in some other way (I'm not an
> > >> expert on that part of Airflow). Then, provided it's accessible as
> > >> part of some execution context, and not only passed to the task's
> > >> execute method, OpenLineage could utilize it.
> > >
> > > It's not strictly necessary that connection info be passed "as part of
> > > task metadata". That would require some mechanism of declaring, prior
> > > to task execution, what connections would be used. This is a thought
> > > that has come up when thinking about execution of non-python tasks.
> > > But it's not required from a technical perspective by AIP-44, because
> > > the `get_connection` function can be made an RPC call, so a task could
> > > continue to retrieve connections at runtime.
>
> --
> Bolke de Bruin
> bdbr...@gmail.com
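[Editor's note] Bolke's idea above, that a task becomes truly atomic as long as it only reports a small fixed set of exit states back over IPC, can be sketched in a few lines. This is a hypothetical illustration, not Airflow's real protocol: the names (`TaskState`, `run_task`, `read_result`) and the bare OS pipe are assumptions, and a real setup would fork/exec the task into its own process.

```python
import os
from enum import Enum

class TaskState(Enum):
    # The smaller this set, the better (per the discussion above).
    SUCCESS = "success"
    FAILED = "failed"

def run_task(write_fd, fn):
    """Task side: do the work, then report only the final state over IPC."""
    try:
        fn()
        state = TaskState.SUCCESS
    except Exception:
        state = TaskState.FAILED
    os.write(write_fd, state.value.encode())
    os.close(write_fd)

def read_result(read_fd):
    """Supervisor side: all the orchestrator learns is the exit state."""
    data = os.read(read_fd, 64).decode()
    os.close(read_fd)
    return TaskState(data)

# Usage: same-process demo over an OS pipe.
r, w = os.pipe()
run_task(w, lambda: None)
print(read_result(r))  # TaskState.SUCCESS
```

The point of the design is that the task needs no knowledge of Airflow internals; any process that can write one of the agreed state strings to the channel is a valid task.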
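[Editor's note] Daniel's point, that `get_connection` can be made an RPC call so tasks keep resolving connections at runtime without direct metadata-DB access, could look roughly like the sketch below. All names here (`RpcChannel`, the `call` method, the wire shape) are invented for illustration and are not Airflow's actual AIP-44 internal API.

```python
from dataclasses import dataclass

@dataclass
class Connection:
    conn_id: str
    conn_type: str
    host: str

class RpcChannel:
    """Stand-in for the task <-> internal-API channel that AIP-44 envisions."""
    def __init__(self, store):
        self._store = store  # pretend server-side metadata store

    def call(self, method, **kwargs):
        # A real implementation would serialize this over the wire.
        if method == "get_connection":
            return self._store[kwargs["conn_id"]]
        raise NotImplementedError(method)

def get_connection(channel, conn_id):
    # The in-process DB lookup becomes a remote call, so the task never
    # needs direct access to the metadata database or its credentials.
    return channel.call("get_connection", conn_id=conn_id)

# Usage: task code still resolves connections lazily, at execute() time.
channel = RpcChannel({"my_db": Connection("my_db", "postgres", "db.internal")})
conn = get_connection(channel, "my_db")
print(conn.host)  # db.internal
```

This is why no up-front declaration of connections is strictly required: the lookup signature is unchanged from the task's point of view, only the transport behind it moves.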