Hi everyone,
Great discussion.

I talked with several community members, and I think we should have a
non-technical discussion thread before we proceed.
To give an example:
We have several actors: Cluster Admin, Dag Author, UI user. From the
Airflow 2 perspective, all these actors are the same user.
For example: if a user wants advanced scheduling, we tell them that
Airflow has a solution for this: Timetables. However, to deploy a timetable,
the Dag Author must work with the Cluster Admin to deploy the timetable's
plugin. These two actors can be two different people with different
agendas, roadmaps, priorities, etc.
This is an example of something that we consider as solved - but is it
really solved?

I think before we dive deep into technical stuff we should first have a
vision of who the Airflow users are, and what their needs/requirements are.
That should give us better direction, clarity and confidence that we are
developing the right features.
If others think this is a good idea, we could form a small team that
drafts this as a doc and shares it with the community for review (happy to
participate in this).

On Mon, May 20, 2024 at 7:02 AM Amogh Desai <amoghdesai....@gmail.com>
wrote:

> I agree with Andrey too on this.
> Thanks & Regards,
> Amogh Desai
>
>
> On Fri, May 17, 2024 at 7:42 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>
> > Agreed on your points @andrey.ans...@taragol.is <andrey.ans...@taragol.is>
> >
> > On Fri, 17 May 2024 at 15:01, Andrey Anshin <andrey.ans...@taragol.is>
> > wrote:
> >
> > > > IMHO, if we decide to keep only Postgres support, we need to have
> > > > really powerful arguments to provide an interface which helps
> > > > integrate with other DBs.
> > > >
> > > > In this case, we must clearly understand what the community is
> > > > responsible for and how it can be sure that nothing is broken.
> > > >
> > > > Especially if we take into account that Airflow has very tight
> > > > integrations with specific databases, requires a lot of effort to
> > > > support additional ones (the MS SQL case), and the DB part is not a
> > > > part of the Public Interface of Airflow [1].
> > >
> > > > So I would consider that this should be two separate decisions:
> > > > 1. Keep only Postgres (vanilla, not forks) as supported/tested backend
> > > > in Production. SQLite remains as development DB.
> > > > 2. Provide public interface to DB integrations between Airflow and DB
> > > > for third parties
> > >
> > > [1]:
> > >
> > >
> >
> https://airflow.apache.org/docs/apache-airflow/stable/public-airflow-interface.html
> > >
> > >
> > > On Tue, 14 May 2024 at 12:15, Kaxil Naik <kaxiln...@gmail.com> wrote:
> > >
> > > > Yeah, that works for me
> > > >
> > > > > > Can we have it possible to have two (or maybe three -
> > > > > > like a sub-committee) co-owners of topics?
> > > >
> > > >
> > > > > On Tue, 14 May 2024 at 06:15, Vikram Koka
> > > > > <vik...@astronomer.io.invalid> wrote:
> > > >
> > > > > > Definitely a fast moving thread on the mailing list. I haven’t been
> > > > > > able to respond for a few days and feel very far behind already.
> > > > >
> > > > > A few comments on topics discussed the last few days:
> > > > > > - Jarek, in response to your comments around being more aggressive
> > > > > > than in Airflow 2 about deprecation and drops of functionality, I am
> > > > > > very supportive of that stance. I completely agree that we could
> > > > > > have been more aggressive as part of Airflow 2.
> > > > > > However, I would like to ask that as we go forward, we make sure
> > > > > > that we have clean interfaces to be able to add support, even if we
> > > > > > choose a single implementation. For example, with respect to
> > > > > > dropping MySQL support, I can understand the perspective of the
> > > > > > project that this should be deprecated from an Airflow OSS
> > > > > > perspective. However, even if the only OSS supported DB is Postgres,
> > > > > > I would like to ensure that a clean interface exists for interaction
> > > > > > with the DB, so that other databases such as MySQL CAN be supported
> > > > > > by a third party or at a later date.
> > > > > > I realize that this may seem onerous, but I believe that it enables
> > > > > > us to be more flexible in the long run, rather than locking us into
> > > > > > a single DB implementation.
> > > > >
> > > > > > - Bolke, Daniel Standish, Ash, et al on the task execution
> > > > > > contract, definitely looking forward to this.
> > > > > >
> > > > > > - To those to whom I proposed a couple of more detailed write-ups: I
> > > > > > still plan to do that, by early next week at the latest.
> > > > >
> > > > > Vikram
> > > > >
> > > > >
> > > > >
> > > > > > On Mon, May 13, 2024 at 9:30 PM Jarek Potiuk <ja...@potiuk.com>
> > > > > > wrote:
> > > > >
> > > > > > Super-excited about that.
> > > > > >
> > > > > > > Question/Proposal: Can we make it possible to have two (or maybe
> > > > > > > three - like a sub-committee) co-owners of topics? I think it's a
> > > > > > > lot to put on one's head to "own" a topic and given circumstances/
> > > > > > > volunteer time of people, interruptions (and life intervening), it
> > > > > > > might be a bit risky to put it on one's shoulders only.
> > > > > >
> > > > > > > I know it's against the rule ("if it is owned by many, it's not
> > > > > > > owned by anyone") - but I think in our case there are at least
> > > > > > > some topics that could benefit from having more than one owner.
> > > > > > > Especially when we know and trust that we can work together on
> > > > > > > some topics that we are passionate about. It might also encourage
> > > > > > > getting out of people's comfort zones.
> > > > > >
> > > > > > > For example - I'd absolutely love to volunteer to co-own the
> > > > > > > "streamline the development" with Andrey if he would be willing to
> > > > > > > of course :D (sorry Andrey for "volunteering you" on that one :D) -
> > > > > > > and maybe we could get someone else to join us.
> > > > > >
> > > > > > > That might have the added benefit of being able to break with the
> > > > > > > way we've been doing things. If I am owning it for one - I'd
> > > > > > > likely gravitate towards past choices, but with others joining me
> > > > > > > and taking decisions (and responsibility in making sure we
> > > > > > > implement them) together, we could make better decisions and
> > > > > > > reduce bus factor for dev tooling/ CI in the future.
> > > > > >
> > > > > > > BTW. Shameless promotion: tomorrow I am giving a talk about that
> > > > > > > very topic (in the context of the last few years, not yet Airflow
> > > > > > > 3.0) at the NY meetup hosted at Astronomer NY headquarters
> > > > > > > https://www.meetup.com/nyc-apache-airflow-meetup/events/300017228/
> > > > > > > - so if you are in NY or around - I think you can still sign up
> > > > > > > :D. I am also getting to PyCon US in Pittsburgh next week so don't
> > > > > > > expect too much from me. I will be gearing up for streamlining the
> > > > > > > development by talking to the right people and listening to the
> > > > > > > latest things and best practices of the larger Python community
> > > > > > > :).
> > > > > >
> > > > > > J.
> > > > > >
> > > > > > > On Tue, May 14, 2024 at 12:03 AM Kaxil Naik <kaxiln...@gmail.com>
> > > > > > > wrote:
> > > > > >
> > > > > > > Thank you all, I am very happy about the discussions.
> > > > > > >
> > > > > > > > The mailing list moves fast :). The main reason I recommended
> > > > > > > > starting the dev calls in early June was to have some of these
> > > > > > > > discussions on the mailing list.
> > > > > > >
> > > > > > > > Since Michal already scheduled a call, let's start there to
> > > > > > > > discuss various ideas. For the week after that, I have created
> > > > > > > > Airflow 2-style recurring open dev calls for anyone to join;
> > > > > > > > info below:
> > > > > > >
> > > > > > > > *Date & Time:* Recurring every 2 weeks on Thursday at *4pm BST*
> > > > > > > > (3 PM GMT/UTC | 11 AM EST | 8 AM PST); starting *May 30, 2024
> > > > > > > > 04:00 PM BST* and then
> > > > > > > > *One-time registration Link*:
> > > > > > > > https://astronomer.zoom.us/meeting/register/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG
> > > > > > > > *Add to your calendar*:
> > > > > > > > https://astronomer.zoom.us/meeting/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG/calendar/google/add
> > > > > > >
> > > > > > > > I will post the meeting notes on the dev mailing list as well as
> > > > > > > > Confluence for archival purposes (example
> > > > > > > > <https://cwiki.apache.org/confluence/display/AIRFLOW/Meeting+Notes>).
> > > > > > >
> > > > > > > > Once we discuss various proposals next week, I recommend that
> > > > > > > > for each "workstream", we have an owner who would want to lead
> > > > > > > > that workstream. Items that do not have an owner we can put into
> > > > > > > > the Airflow 3 Meta issue
> > > > > > > > <https://github.com/apache/airflow/issues/39593> or cross-link
> > > > > > > > over there so someone in the community can take them on. If we
> > > > > > > > don't have an owner who will commit to working on an item, we
> > > > > > > > park it until we find an owner.
> > > > > > >
> > > > > > > > At the end of each call, I would solicit ideas for the agenda
> > > > > > > > for the next call and propose it to the broader group on the
> > > > > > > > mailing list.
> > > > > > >
> > > > > > > > Some of the items that should be discussed in the upcoming calls
> > > > > > > > IMO:
> > > > > > > >
> > > > > > > >    - Agreeing on Principles
> > > > > > > >
> > > > > > > >    Based on the discussions, some potential items (all up for
> > > > > > > >    debate)
> > > > > > > >       - Considering Airflow 3.0 for early adopters and *breaking
> > > > > > > >       (and removing) things for AF 3.0*. Things can be re-added
> > > > > > > >       as needed in upcoming minor releases
> > > > > > > >       - Optimize to get *foundational pieces in* and not "let
> > > > > > > >       perfect be the enemy of good"
> > > > > > > >       - Working on features that solidify Airflow as the *modern
> > > > > > > >       Orchestrator* that also has state of the art *support for
> > > > > > > >       Data, AI & ML workloads*. This includes scalability &
> > > > > > > >       performance discussion
> > > > > > > >       - Set up the codebase for the next 5 years. This
> > > > > > > >       encompasses all the things we are discussing, e.g. removing
> > > > > > > >       MySQL to reduce the test matrix, simplifying things
> > > > > > > >       architecturally, consolidating serialization methods, etc.
> > > > > > > >
> > > > > > > >    - Workstream & Stream Owners
> > > > > > > >    - Airflow 2 support policy including scope (feature vs bug
> > > > > > > >    fixes + security only) & support period
> > > > > > > >    - Separate discussions for each big workstream including one
> > > > > > > >    for items to remove & refactor (e.g. dropping MySQL)
> > > > > > > >    - Discussion to streamline the development of Airflow 3
> > > > > > > >       - Separating dev for Providers & Airflow (something Jarek
> > > > > > > >       already kick-started), and
> > > > > > > >       - Separate branch for Airflow 2
> > > > > > > >       - CI changes for the above
> > > > > > > >    - Finalize Scope + Timelines
> > > > > > > >    - Migration Utilities
> > > > > > > >    - Progress check-ins
> > > > > > >
> > > > > > > Looking forward to the exciting months ahead.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Kaxil
> > > > > > >
> > > > > > > > On Mon, 13 May 2024 at 21:40, Bolke de Bruin <bdbr...@gmail.com>
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > > Declaring connections prior to task execution was already
> > > > > > > > > proposed in AIP-1 :-). At that time, I had in mind to
> > > > > > > > > communicate over IPC to the task the required settings.
> > > > > > > > > Registration could then happen with a manifest. Maybe during
> > > > > > > > > DAG serialization this could be obtained unobtrusively? The
> > > > > > > > > benefit is that tasks become truly atomic or independent from
> > > > > > > > > Airflow as long as they communicate their exit codes (success,
> > > > > > > > > failed, and I think Ash had a couple of others in mind - the
> > > > > > > > > fewer the better).
> > > > > > > >
> > > > > > > > > If you want two-way communication, maybe for variables as they
> > > > > > > > > can change during scheduling, this can happen with AIP-44.
> > > > > > > > > Although, I'd prefer it to happen with the *executor* rather
> > > > > > > > > than some centralized service. If the executor is used, IPC is
> > > > > > > > > the logical choice. The benefit of this is that you have
> > > > > > > > > better resiliency and you can start to think about
> > > > > > > > > no-downtime upgrades.
> > > > > > > >
> > > > > > > > So I hope Ash takes this to 2024 :-).
> > > > > > > >
> > > > > > > > B.
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Mon, 13 May 2024 at 19:27, Ash Berlin-Taylor
> > > > > > > > > <a...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > > > That would require some mechanism of declaring prior to
> > > > > > > > > > > task execution what connections would be used
> > > > > > > > >
> > > > > > > > > > That’s exactly what I’m proposing in the proposal doc I’m
> > > > > > > > > > working on (It’s part of also overhauling and re-designing
> > > > > > > > > > the “Task Execution interface” that also gives us the
> > > > > > > > > > ability to nicely have support for running tasks in other
> > > > > > > > > > languages - much more than just BashOperator)
> > > > > > > > >
> > > > > > > > > > This is a bit of a fundamental shift in thinking about task
> > > > > > > > > > execution in Airflow, but I think it gives us some really
> > > > > > > > > > nice properties that the project is currently missing.
> > > > > > > > >
> > > > > > > > > > Tl;dr: let's discuss this in my doc when it comes out (next
> > > > > > > > > > week most likely) please :)
> > > > > > > > >
> > > > > > > > > -ash
> > > > > > > > >
> > > > > > > > > > > On 13 May 2024, at 18:15, Daniel Standish
> > > > > > > > > > > <daniel.stand...@astronomer.io.INVALID> wrote:
> > > > > > > > > >
> > > > > > > > > > re
> > > > > > > > > >
> > > > > > > > > > >> As tasks require connection access, I assume connection
> > > > > > > > > > >> data will somehow be passed as part of the metadata to
> > > > > > > > > > >> task execution - whether it's part of the executor
> > > > > > > > > > >> protocol or in some other way (I'm not an expert on that
> > > > > > > > > > >> part of Airflow). Then, provided it's accessible as part
> > > > > > > > > > >> of some execution context, and not only passed to the
> > > > > > > > > > >> task's execute method, OpenLineage could utilize it.
> > > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > > > It's not strictly necessary that connection info be passed
> > > > > > > > > > > "as part of task metadata".  That would require some
> > > > > > > > > > > mechanism of declaring prior to task execution what
> > > > > > > > > > > connections would be used.  This is a thought that has
> > > > > > > > > > > come up when thinking about execution of non-python tasks.
> > > > > > > > > > > But it's not required from a technical perspective by
> > > > > > > > > > > AIP-44 because the `get_connection` function can be made
> > > > > > > > > > > to be an RPC call so a task could continue to retrieve
> > > > > > > > > > > connections at runtime.
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > --
> > > > > > > > Bolke de Bruin
> > > > > > > > bdbr...@gmail.com
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
