Super-excited about that.

Question/Proposal: Could we allow two (or maybe three - like a
sub-committee) co-owners of a topic? I think it's a lot to put on one
person's head to "own" a topic, and given people's circumstances, volunteer
time, and interruptions (life intervening), it might be a bit risky to put
it on one person's shoulders only.

I know it's against the rule ("if it is owned by many, it's not owned by
anyone") - but I think in our case there are at least some topics that
could benefit from having more than one owner, especially when we know and
trust that we can work together on topics we are passionate about. It might
also encourage people to get out of their comfort zones.

For example - I'd absolutely love to volunteer to co-own the "streamline
the development" topic with Andrey, if he is willing of course :D (sorry
Andrey for "volunteering you" on that one :D) - and maybe we could get
someone else to join us.

That might have the added benefit of letting us break with the way we've
been doing things. If I owned it alone, I'd likely gravitate towards past
choices, but with others joining me and making decisions (and taking
responsibility for implementing them) together, we could make better
decisions and reduce the bus factor for dev tooling/CI in the future.

BTW, shameless promotion: tomorrow I am giving a talk about that very
topic (in the context of the last few years, not yet Airflow 3.0) at the NY
meetup hosted at Astronomer's NY headquarters
https://www.meetup.com/nyc-apache-airflow-meetup/events/300017228/ - so if
you are in NY or around, I think you can still sign up :D. I am also
heading to PyCon US in Pittsburgh next week, so don't expect too much from
me. I will be gearing up for streamlining the development by talking to the
right people and listening to the latest ideas and best practices from the
larger Python community :).

J.

On Tue, May 14, 2024 at 12:03 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Thank you all, I am very happy about the discussions.
>
> The mailing list moves fast :). The main reason I recommended starting the
> dev calls in early June was to have some of these discussions on the
> mailing list.
>
> Since Michal already scheduled a call, let's start there to discuss
> various ideas. For the week after that, I have created an Airflow 2-style
> recurring open dev call for anyone to join; info below:
>
> *Date & Time*: Recurring every 2 weeks on Thursday at *4 PM BST* (3 PM
> GMT/UTC | 11 AM EDT | 8 AM PDT), starting *May 30, 2024, 04:00 PM BST*
> *One-time registration Link*:
> https://astronomer.zoom.us/meeting/register/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG
> *Add to your calendar*:
> https://astronomer.zoom.us/meeting/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG/calendar/google/add
>
> I will post the meeting notes on the dev mailing list as well as Confluence
> for archival purposes (example
> <https://cwiki.apache.org/confluence/display/AIRFLOW/Meeting+Notes>).
>
> Once we discuss the various proposals next week, I recommend that for
> each "workstream" we have an owner who wants to lead it. For items that do
> not have an owner, we can put them into the Airflow 3 Meta issue
> <https://github.com/apache/airflow/issues/39593> or cross-link them over
> there so someone in the community can take them on. If we don't have an
> owner who will commit to working on an item, we park it until we find one.
>
> At the end of each call, I would solicit ideas for the agenda for the next
> call and propose it to the broader group on the mailing list.
>
> Some of the items that should be discussed in the upcoming calls IMO:
>
>    - Agreeing on Principles
>
>    Based on the discussions, some potential items (all up for debate):
>       - Considering Airflow 3.0 for early adopters and *breaking (and
>       removing) things for AF 3.0*. Things can be re-added as needed in
>       upcoming minor releases
>       - Optimize to get *foundational pieces in* and not "let perfect be
>       the enemy of good"
>       - Working on features that solidify Airflow as the *modern
>       Orchestrator* that also has state-of-the-art *support for Data, AI &
>       ML workloads*. This includes the scalability & performance discussion
>       - Set up the codebase for the next 5 years. This encompasses all the
>       things we are discussing, e.g. removing MySQL to reduce the test
>       matrix, simplifying things architecturally, consolidating
>       serialization methods, etc.
>
>    - Workstream & Stream Owners
>    - Airflow 2 support policy including scope (feature vs bug fixes +
>    security only) & support period
>    - Separate discussions for each big workstream, including one for
>    items to remove & refactor (e.g. dropping MySQL)
>    - Discussion to streamline the development of Airflow 3
>       - Separating dev for Providers & Airflow (something Jarek already
>       kick-started), and
>       - Separate branch for Airflow 2
>       - CI changes for the above
>    - Finalize Scope + Timelines
>    - Migration Utilities
>    - Progress check-ins
>
> Looking forward to the exciting months ahead.
>
> Regards,
> Kaxil
>
> On Mon, 13 May 2024 at 21:40, Bolke de Bruin <bdbr...@gmail.com> wrote:
>
> > Declaring connections prior to task execution was already proposed in
> > AIP-1 :-). At that time, I had in mind communicating the required
> > settings to the task over IPC. Registration could then happen with a
> > manifest. Maybe during DAG serialization this could be obtained
> > unobtrusively? The benefit is that tasks become truly atomic or
> > independent from Airflow as long as they communicate their exit codes
> > (success, failed, and I think Ash had a couple of others in mind - the
> > fewer the better).
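> >
> > Purely to make that concrete, here is a rough sketch of the shape a
> > per-task manifest and a minimal exit-code set could take (all names below
> > are hypothetical - nothing here is from an actual AIP or proposal doc):
> >
> >     # Hypothetical: a manifest collected once (e.g. at DAG serialization
> >     # time) so the executor knows what to hand the task over IPC.
> >     from enum import Enum
> >
> >     class TaskExit(Enum):
> >         SUCCESS = 0
> >         FAILED = 1
> >         # the fewer extra states beyond these, the better
> >
> >     def build_task_manifest(task):
> >         """Collect what a task declares it needs, up front."""
> >         return {
> >             "task_id": task.task_id,
> >             # connections/variables the task declares it will use
> >             "connections": sorted(getattr(task, "required_connections", [])),
> >             "variables": sorted(getattr(task, "required_variables", [])),
> >         }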
> >
> > If you want two-way communication, maybe for variables as they can
> > change during scheduling, this can happen with AIP-44. Although, I'd
> > prefer it to happen with the *executor* rather than some centralized
> > service. If the executor is used, IPC is the logical choice. The benefit
> > of this is that you have better resiliency and you can start to think
> > about no-downtime upgrades.
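> >
> > For the two-way case, the task side could be as small as asking its
> > supervising executor process over a local pipe - again purely
> > illustrative, the names and the transport are made up, not AIP-44's
> > actual API:
> >
> >     import json
> >
> >     def get_variable_via_executor(pipe, key):
> >         """Ask the supervising executor process for a variable over IPC.
> >
> >         `pipe` is e.g. a multiprocessing.connection.Connection handed to
> >         the task process at startup.
> >         """
> >         pipe.send(json.dumps({"op": "get_variable", "key": key}))
> >         reply = json.loads(pipe.recv())
> >         return reply.get("value")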
> >
> > So I hope Ash takes this to 2024 :-).
> >
> > B.
> >
> >
> > On Mon, 13 May 2024 at 19:27, Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> > > > That would require some mechanism of declaring prior to task
> > > > execution what connections would be used
> > >
> > > That’s exactly what I’m proposing in the proposal doc I’m working on
> > > (it’s part of also overhauling and re-designing the “Task Execution
> > > interface” that also gives us the ability to nicely support running
> > > tasks in other languages — much more than just BashOperator)
> > >
> > > This is a bit of a fundamental shift in thinking about task execution
> > > in Airflow, but I think it gives us some really nice properties that
> > > the project is currently missing.
> > >
> > > TL;DR: let's discuss this in my doc when it comes out (next week most
> > > likely) please :)
> > >
> > > -ash
> > >
> > > > On 13 May 2024, at 18:15, Daniel Standish
> > > > <daniel.stand...@astronomer.io.INVALID> wrote:
> > > >
> > > > re
> > > >
> > > >> As tasks require connection access, I assume connection data will
> > > >> somehow be passed as part of the metadata to task execution -
> > > >> whether it's part of the executor protocol or in some other way (I'm
> > > >> not an expert on that part of Airflow). Then, provided it's
> > > >> accessible as part of some execution context, and not only passed to
> > > >> the task's execute method, OpenLineage could utilize it.
> > > >>
> > > >
> > > > It's not strictly necessary that connection info be passed "as part
> > > > of task metadata". That would require some mechanism of declaring
> > > > prior to task execution what connections would be used. This is a
> > > > thought that has come up when thinking about execution of non-python
> > > > tasks. But it's not required from a technical perspective by AIP-44,
> > > > because the `get_connection` function can be made to be an RPC call,
> > > > so a task could continue to retrieve connections at runtime.
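> > > >
> > > > To illustrate that last point with a sketch (hypothetical wiring, not
> > > > the actual AIP-44 code - the endpoint and flag below are assumptions):
> > > >
> > > >     import json
> > > >     import urllib.request
> > > >
> > > >     # made-up address of an internal API server fronting the metadata DB
> > > >     INTERNAL_API_URL = "http://localhost:9080/internal_api/v1/rpcapi"
> > > >
> > > >     def get_connection(conn_id, use_internal_api=True):
> > > >         """Resolve a connection over RPC, or fall back to the DB path."""
> > > >         if use_internal_api:
> > > >             body = json.dumps({"method": "get_connection",
> > > >                                "params": {"conn_id": conn_id}}).encode()
> > > >             req = urllib.request.Request(
> > > >                 INTERNAL_API_URL, data=body,
> > > >                 headers={"Content-Type": "application/json"})
> > > >             with urllib.request.urlopen(req) as resp:
> > > >                 return json.loads(resp.read())  # task never touches the DB
> > > >         from airflow.models import Connection  # direct metadata-DB path
> > > >         return Connection.get_connection_from_secrets(conn_id)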
> > >
> > >
> >
> > --
> >
> > --
> > Bolke de Bruin
> > bdbr...@gmail.com
> >
>
