Thank you all. I am very happy with the discussions.

The mailing list moves fast :). The main reason I recommended starting the
dev calls in early June was to have some of these discussions on the
mailing list.

Since Michal already scheduled a call, let's start there to discuss
various ideas. For the week after that, I have created a recurring,
Airflow 2-style open dev call that anyone can join; info below:

*Date & Time*: Recurring every 2 weeks on Thursday at *4 PM BST* (3 PM
GMT/UTC | 11 AM EDT | 8 AM PDT), starting *May 30, 2024, 4:00 PM BST*
*One-time registration Link*:
https://astronomer.zoom.us/meeting/register/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG
*Add to your calendar*:
https://astronomer.zoom.us/meeting/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG/calendar/google/add

I will post the meeting notes on the dev mailing list as well as Confluence
for archival purposes (example
<https://cwiki.apache.org/confluence/display/AIRFLOW/Meeting+Notes>).

Once we discuss the various proposals next week, I recommend that each
"workstream" have an owner who is willing to lead it. Items that do not
have an owner can go into the Airflow 3 Meta issue
<https://github.com/apache/airflow/issues/39593> or be cross-linked over
there so someone in the community can take them on. If we don't have an
owner who will commit to working on an item, we park it until we find one.

At the end of each call, I will solicit ideas for the next call's agenda
and propose it to the broader group on the mailing list.

Some of the items that, IMO, should be discussed in the upcoming calls:

   - Agreeing on Principles

   Based on the discussions, some potential items (all up for debate):
      - Considering Airflow 3.0 for early adopters and *breaking (and
      removing) things for AF 3.0*. Things can be re-added as needed in
      upcoming minor releases
      - Optimizing to get *foundational pieces in* and not "letting perfect
      be the enemy of good"
      - Working on features that solidify Airflow as the *modern
      Orchestrator* with state-of-the-art *support for Data, AI & ML
      workloads*. This includes the scalability & performance discussion
      - Setting up the codebase for the next 5 years. This encompasses all
      the things we are discussing, e.g. removing MySQL to reduce the test
      matrix, simplifying things architecturally, consolidating
      serialization methods, etc.
   - Workstreams & Workstream Owners
   - Airflow 2 support policy, including scope (features vs. bug fixes +
   security only) & support period
   - Separate discussions for each big workstream, including one for items
   to remove & refactor (e.g. dropping MySQL)
   - Discussion to streamline the development of Airflow 3
      - Separating dev for Providers & Airflow (something Jarek already
      kick-started)
      - Separate branch for Airflow 2
      - CI changes for the above
   - Finalize Scope + Timelines
   - Migration Utilities
   - Progress check-ins

Looking forward to the exciting months ahead.

Regards,
Kaxil

On Mon, 13 May 2024 at 21:40, Bolke de Bruin <bdbr...@gmail.com> wrote:

> Declaring connections prior to task execution was already proposed in AIP-1
> :-). At that time, I had in mind to communicate over IPC to the task the
> required settings. Registration could then happen with a manifest. Maybe
> during DAG serialization this could be obtained unobtrusively? The benefit
> is that tasks become truly atomic or independent from Airflow as long as
> they communicate their exit codes (success, failed, and I think Ash had a
> couple of others in mind - the fewer the better).
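For illustration only, a rough sketch of what a pre-declared connection
manifest along these lines could look like, collected at DAG serialization
time (TaskManifest, required_connections, and the attribute scan are
hypothetical names, not an agreed design):

from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskManifest:
    dag_id: str
    task_id: str
    required_connections: List[str] = field(default_factory=list)


def build_manifests(dag) -> List[TaskManifest]:
    """Collect each task's connection IDs before anything is executed."""
    manifests = []
    for task in dag.tasks:
        # Many operators expose their connection via *_conn_id attributes;
        # scanning those is one unobtrusive way to gather the declarations.
        conn_ids = sorted(
            {
                getattr(task, attr)
                for attr in vars(task)
                if attr.endswith("conn_id") and getattr(task, attr)
            }
        )
        manifests.append(
            TaskManifest(
                dag_id=dag.dag_id,
                task_id=task.task_id,
                required_connections=conn_ids,
            )
        )
    return manifests

A manifest like this could then be handed to whatever runs the task, so
connections are resolved up front rather than mid-execution.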
>
> If you want two-way communication, maybe for variables as they can change
> during scheduling, this can happen with AIP-44. Although, I'd prefer it to
> happen with the *executor* rather than some centralized service. If the
> executor is used, IPC is the logical choice. The benefit of this is that
> you have better resiliency and you can start to think about no downtime
> upgrades
>
> So I hope Ash takes this to 2024 :-).
>
> B.
>
>
> On Mon, 13 May 2024 at 19:27, Ash Berlin-Taylor <a...@apache.org> wrote:
>
> > > That would require some mechanism of declaring prior to task execution
> > > what connections would be used
> >
> > That’s exactly what I’m proposing in the proposal doc I’m working on
> > (It’s part of also overhauling and re-designing the “Task Execution
> > interface” that also gives us the ability to nicely have support for
> > running tasks in other languages — much more than just BashOperator)
> >
> > This is a bit of a fundamental shift in thinking about task execution in
> > Airflow, but I think it gives us some really nice properties that the
> > project is currently missing.
> >
> > TL;DR: let's discuss this in my doc when it comes out (next week most
> > likely) please :)
> >
> > -ash
> >
> > > On 13 May 2024, at 18:15, Daniel Standish
> > > <daniel.stand...@astronomer.io.INVALID> wrote:
> > >
> > > re
> > >
> > >> As tasks require connection access, I assume connection data will
> > >> somehow be passed as part of the metadata to task execution - whether
> > >> it's part of the executor protocol or in some other way (I'm not an
> > >> expert on that part of Airflow). Then, provided it's accessible as
> > >> part of some execution context, and not only passed to the task's
> > >> execute method, OpenLineage could utilize it.
> > >>
> > >
> > > It's not strictly necessary that connection info be passed "as part
> > > of task metadata". That would require some mechanism of declaring
> > > prior to task execution what connections would be used. This is a
> > > thought that has come up when thinking about execution of non-python
> > > tasks. But it's not required from a technical perspective by AIP-44,
> > > because the `get_connection` function can be made to be an RPC call so
> > > a task could continue to retrieve connections at runtime.
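(Again, purely a sketch of that idea: a task-side get_connection that goes
through an internal API / RPC client instead of reading the metadata DB
directly. The endpoint, env vars, and payload shape below are hypothetical,
not AIP-44's actual interface.)

import json
import os
import urllib.request


def get_connection(conn_id: str) -> dict:
    """Fetch connection details over an internal API instead of the DB."""
    base_url = os.environ["AIRFLOW_INTERNAL_API_URL"]  # assumed env var
    request = urllib.request.Request(
        f"{base_url}/connections/{conn_id}",
        headers={"Authorization": f"Bearer {os.environ['TASK_TOKEN']}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)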
> >
> >
>
> --
> Bolke de Bruin
> bdbr...@gmail.com
>
