I agree with Andrey too on this.
Thanks & Regards,
Amogh Desai

On Fri, May 17, 2024 at 7:42 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Agreed on your points @andrey.ans...@taragol.is <andrey.ans...@taragol.is>
>
> On Fri, 17 May 2024 at 15:01, Andrey Anshin <andrey.ans...@taragol.is>
> wrote:
>
> > IMHO, if we decide to keep only Postgres support, we need really strong
> > arguments for providing an interface that helps integrate with other
> > DBs.
> >
> > In that case, we must clearly understand what the community is
> > responsible for and how it can be sure that nothing is broken.
> >
> > Especially if we take into account that Airflow has very tight
> > integrations with specific databases, that supporting additional ones
> > requires a lot of effort (the MS SQL case), and that the DB part is not
> > part of the Public Interface of Airflow [1].
> >
> > So I would consider that this should be two separate decisions:
> > 1. Keep only Postgres (vanilla, not forks) as the supported/tested backend
> > in Production. SQLite remains as the development DB.
> > 2. Provide a public interface for DB integrations between Airflow and DBs
> > for third parties
> >
> > [1]:
> > https://airflow.apache.org/docs/apache-airflow/stable/public-airflow-interface.html
> >
> >
> > On Tue, 14 May 2024 at 12:15, Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > > Yeah, that works for me
> > >
> > > > Can we have it possible to have two (or maybe three -
> > > > like a sub-committee) co-owners of topics?
> > >
> > >
> > > On Tue, 14 May 2024 at 06:15, Vikram Koka <vik...@astronomer.io.invalid>
> > > wrote:
> > >
> > > > Definitely a fast-moving thread on the mailing list. I haven’t been able
> > > > to respond for a few days and feel very far behind already.
> > > >
> > > > A few comments on topics discussed the last few days:
> > > > - Jarek, in response to your comments around being more aggressive than
> > > > in Airflow 2 about deprecation and drops of functionality, I am very
> > > > supportive of that stance. I completely agree that we could have been
> > > > more aggressive as part of Airflow 2.
> > > > However, I would like to ask that as we go forward, we make sure that
> > > > we have clean interfaces to be able to add support, even if we choose a
> > > > single implementation. Take dropping MySQL support, for example: I can
> > > > understand the perspective of the project that this should be deprecated
> > > > from an Airflow OSS perspective. However, even if the only OSS-supported
> > > > DB is Postgres, I would like to ensure that a clean interface exists for
> > > > interaction with the DB, so that other databases such as MySQL or
> > > > others CAN be supported by a third party or at a later date.
> > > > I realize that this may seem onerous, but I believe that it enables us
> > > > to be more flexible in the long run, rather than locking us into a
> > > > single DB implementation.
> > > >
> > > > - Bolke, Daniel Standish, Ash, et al on the task execution contract,
> > > > definitely looking forward to this.
> > > >
> > > > - To those to whom I proposed a couple of more detailed write-ups: I
> > > > still plan to do that, by early next week at the latest.
> > > >
> > > > Vikram
> > > >
> > > >
> > > >
> > > > On Mon, May 13, 2024 at 9:30 PM Jarek Potiuk <ja...@potiuk.com>
> > > > wrote:
> > > >
> > > > > Super-excited about that.
> > > > >
> > > > > Question/Proposal: Can we make it possible to have two (or maybe three -
> > > > > like a sub-committee) co-owners of topics? I think it's a lot to put on
> > > > > one's head to "own" a topic, and given the circumstances/volunteer time
> > > > > of people, interruptions (and life intervening), it might be a bit risky
> > > > > to put it on one's shoulders only.
> > > > >
> > > > > I know it's against the rule ("if it is owned by many, it's not owned by
> > > > > anyone") - but I think in our case there are at least some topics that
> > > > > could benefit from having more than one owner. Especially when we know
> > > > > and trust that we can work together on some topics that we are
> > > > > passionate about. It might also encourage people to get out of their
> > > > > comfort zones.
> > > > >
> > > > > For example - I'd absolutely love to volunteer to co-own the "streamline
> > > > > the development" topic with Andrey, if he would be willing to of course
> > > > > :D (sorry Andrey for "volunteering you" on that one :D) - and maybe we
> > > > > could get someone else to join us.
> > > > >
> > > > > That might have the added benefit of being able to break with the way
> > > > > we've been doing things. If I own it alone, I'd likely gravitate towards
> > > > > past choices, but with others joining me and taking decisions (and
> > > > > responsibility for making sure we implement them) together, we could
> > > > > make better decisions and reduce the bus-factor risk for dev tooling/CI
> > > > > in the future.
> > > > >
> > > > > BTW. Shameless promotion: tomorrow I am giving a talk about that very
> > > > > topic (in the context of the last few years, not yet Airflow 3.0) at the
> > > > > NY meetup hosted at Astronomer NY headquarters
> > > > > https://www.meetup.com/nyc-apache-airflow-meetup/events/300017228/ -
> > > > > so if you are in NY or around - I think you can still sign up :D. I am
> > > > > also going to PyCon US in Pittsburgh next week, so don't expect too much
> > > > > from me. I will be gearing up for streamlining the development by
> > > > > talking to the right people and listening to the latest things and best
> > > > > practices of the larger Python community :).
> > > > >
> > > > > J.
> > > > >
> > > > > On Tue, May 14, 2024 at 12:03 AM Kaxil Naik <kaxiln...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thank you all, I am very happy about the discussions.
> > > > > >
> > > > > > The mailing list moves fast :). The main reason I recommended starting
> > > > > > the dev calls in early June was to have some of these discussions on
> > > > > > the mailing list.
> > > > > >
> > > > > > Since Michal already scheduled a call, let's start there to discuss
> > > > > > various ideas. For the week after that, I have created Airflow 2-style
> > > > > > recurring open dev calls for anyone to join, info below:
> > > > > >
> > > > > > *Date & Time:* Recurring every 2 weeks on Thursday at *4 PM BST*
> > > > > > (3 PM GMT/UTC | 11 AM EST | 8 AM PST); starting *May 30, 2024 04:00 PM
> > > > > > BST* and then
> > > > > > *One-time registration link*:
> > > > > > https://astronomer.zoom.us/meeting/register/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG
> > > > > > *Add to your calendar*:
> > > > > > https://astronomer.zoom.us/meeting/tZAsde2vqDwpE9XrBAbCeIFHA_l7OLywrWkG/calendar/google/add
> > > > > >
> > > > > > I will post the meeting notes on the dev mailing list as well as
> > > > > > Confluence for archival purposes (example
> > > > > > <https://cwiki.apache.org/confluence/display/AIRFLOW/Meeting+Notes>).
> > > > > >
> > > > > > Once we discuss various proposals next week, I recommend that for each
> > > > > > "workstream", we have an owner who would want to lead that workstream.
> > > > > > For items that do not have an owner, we can put those into the Airflow 3
> > > > > > Meta issue <https://github.com/apache/airflow/issues/39593> or
> > > > > > cross-link over there so someone in the community can take it on. If we
> > > > > > don't have an owner who will commit to working on it, we park that item
> > > > > > until we find an owner.
> > > > > >
> > > > > > At the end of each call, I would solicit ideas for the agenda for the
> > > > > > next call and propose it to the broader group on the mailing list.
> > > > > >
> > > > > > Some of the items that should be discussed in the upcoming calls IMO:
> > > > > >
> > > > > >    - Agreeing on Principles
> > > > > >
> > > > > >    Based on the discussions, some potential items (all up for debate):
> > > > > >       - Considering Airflow 3.0 for early adopters and *breaking (and
> > > > > >       removing) things for AF 3.0*. Things can be re-added as needed in
> > > > > >       upcoming minor releases
> > > > > >       - Optimize to get *foundational pieces in* and not "let perfect be
> > > > > >       the enemy of good"
> > > > > >       - Working on features that solidify Airflow as the *modern
> > > > > >       Orchestrator* that also has state-of-the-art *support for Data, AI
> > > > > >       & ML workloads*. This includes scalability & performance discussion
> > > > > >       - Set up the codebase for the next 5 years. This encompasses all
> > > > > >       the things we are discussing, e.g. removing MySQL to reduce the
> > > > > >       test matrix, simplifying things architecturally, consolidating
> > > > > >       serialization methods, etc.
> > > > > >
> > > > > >       - Workstream & Stream Owners
> > > > > >    - Airflow 2 support policy including scope (feature vs bug fixes +
> > > > > >    security only) & support period
> > > > > >    - Separate discussions for each big workstream including one for
> > > > > >    items to remove & refactor (e.g. dropping MySQL)
> > > > > >    - Discussion to streamline the development of Airflow 3
> > > > > >       - Separating dev for Providers & Airflow (something Jarek already
> > > > > >       kick-started), and
> > > > > >       - Separate branch for Airflow 2
> > > > > >       - CI changes for the above
> > > > > >    - Finalize Scope + Timelines
> > > > > >    - Migration Utilities
> > > > > >    - Progress check-ins
> > > > > >
> > > > > > Looking forward to the exciting months ahead.
> > > > > >
> > > > > > Regards,
> > > > > > Kaxil
> > > > > >
> > > > > > On Mon, 13 May 2024 at 21:40, Bolke de Bruin <bdbr...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Declaring connections prior to task execution was already proposed in
> > > > > > > AIP-1 :-). At that time, I had in mind to communicate the required
> > > > > > > settings to the task over IPC. Registration could then happen with a
> > > > > > > manifest. Maybe this could be obtained unobtrusively during DAG
> > > > > > > serialization? The benefit is that tasks become truly atomic, or
> > > > > > > independent from Airflow, as long as they communicate their exit codes
> > > > > > > (success, failed, and I think Ash had a couple of others in mind - the
> > > > > > > fewer the better).
> > > > > > >
> > > > > > > If you want two-way communication, maybe for variables as they can
> > > > > > > change during scheduling, this can happen with AIP-44. Although, I'd
> > > > > > > prefer it to happen with the *executor* rather than some centralized
> > > > > > > service. If the executor is used, IPC is the logical choice. The
> > > > > > > benefit of this is that you have better resiliency and you can start
> > > > > > > to think about no-downtime upgrades.
> > > > > > >
> > > > > > > So I hope Ash takes this to 2024 :-).
> > > > > > >
> > > > > > > B.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, 13 May 2024 at 19:27, Ash Berlin-Taylor <a...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > > That would require some mechanism of declaring prior to task
> > > > > > > > > execution what connections would be used
> > > > > > > >
> > > > > > > > That’s exactly what I’m proposing in the proposal doc I’m working on
> > > > > > > > (it’s part of also overhauling and re-designing the “Task Execution
> > > > > > > > interface”, which also gives us the ability to nicely support running
> > > > > > > > tasks in other languages - much more than just BashOperator).
> > > > > > > >
> > > > > > > > This is a bit of a fundamental shift in thinking about task execution
> > > > > > > > in Airflow, but I think it gives us some really nice properties that
> > > > > > > > the project is currently missing.
> > > > > > > >
> > > > > > > > Tl;dr: let's discuss this in my doc when it comes out (next week most
> > > > > > > > likely) please :)
> > > > > > > >
> > > > > > > > -ash
> > > > > > > >
> > > > > > > > > On 13 May 2024, at 18:15, Daniel Standish
> > > > > > > > > <daniel.stand...@astronomer.io.INVALID> wrote:
> > > > > > > > >
> > > > > > > > > re
> > > > > > > > >
> > > > > > > > > >> As tasks require connection access, I assume connection data will
> > > > > > > > > >> somehow be passed as part of the metadata to task execution -
> > > > > > > > > >> whether it's part of the executor protocol or in some other way
> > > > > > > > > >> (I'm not an expert on that part of Airflow). Then, provided it's
> > > > > > > > > >> accessible as part of some execution context, and not only passed
> > > > > > > > > >> to the task's execute method, OpenLineage could utilize it.
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > > It's not strictly necessary that connection info be passed "as part
> > > > > > > > > of task metadata". That would require some mechanism of declaring,
> > > > > > > > > prior to task execution, what connections would be used. This is a
> > > > > > > > > thought that has come up when thinking about execution of non-python
> > > > > > > > > tasks. But it's not required from a technical perspective by AIP-44,
> > > > > > > > > because the `get_connection` function can be made to be an RPC call,
> > > > > > > > > so a task could continue to retrieve connections at runtime.
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > --
> > > > > > > Bolke de Bruin
> > > > > > > bdbr...@gmail.com
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
