> we definitely need a way to opt-out for the ones who aren't interested

I disagree, what I propose is to infer the async connection from the sync
configuration using a translation method, with the possibility of providing
the async connection configuration explicitly. This will help to completely
migrate the REST API and web server to the async version without
duplicating the code.

> We should have a seamless fallback to sync if async doesn't work for
whatever reasons

For the async version of connections/variables, we will use the sync method
wrapped by sync_to_async in the base class, in this case, the async methods
will work in the custom secrets backends without any issues and users can
override the async methods for better implementation.

> are we limiting the scope to lets say connections + variables and
expanding based on the results in the long term?

This needs to be implemented step by step, the first step is to add
integration to the different providers and DB, then implement an async
version for the secrets backends, then migrate the REST API and web server,
and later migrate our official executors, which will need also integrating
other tools like kubernetes-asyncio, and async integration for celery.

> I think this needs to be an all or nothing thing

Here are some of the available drivers
https://github.com/apache/airflow/pull/36504#issuecomment-1872653755, I
have already tested one for each database, so we will have async support
for all supported databases.

> having to maintain sync and async versions of functions/features is a
non-starter in my mind;

During the migration, we will have both sync and async endpoints in the API
and the webserver (they will be migrated one by one and not at the same
time), but without any code duplication, in the worst case, instead of
duplicating a method, we can use sync_to_async, and optimize it later
after migrating all endpoints that use it.
But for Secrets Backends, we may have some duplicated code when it is not
possible to export it to a common method shared between sync and async
versions.

> how can we keep one codenase bit cooe with sqlite?

For my PoC, I used https://github.com/omnilib/aiosqlite and it worked
without any issues.



On Mon, Apr 8, 2024 at 10:08 PM Daniel Standish
<daniel.stand...@astronomer.io.invalid> wrote:

> If nothing else, write an ugly adapter using sync_to_async?
>
> On Mon, Apr 8, 2024 at 1:06 PM Daniel Standish <
> daniel.stand...@astronomer.io> wrote:
>
> > https://github.com/omnilib/aiosqlite maybe?
> >
> > On Mon, Apr 8, 2024 at 1:03 PM Scheffler Jens (XC-AS/EAE-ADA-T)
> > <jens.scheff...@de.bosch.com.invalid> wrote:
> >
> >> I understand the „all-in“ approach as we were dropping MSSQL… how can we
> >> keep one codenase bit cooe with sqlite? I assume we must support this at
> >> least for dev setups.
> >>
> >> Sent from Outlook for iOS<https://aka.ms/o0ukef>
> >> ________________________________
> >> From: Jarek Potiuk <ja...@potiuk.com>
> >> Sent: Monday, April 8, 2024 8:30:18 PM
> >> To: dev@airflow.apache.org <dev@airflow.apache.org>
> >> Subject: Re: [DISCUSS] Asynchronous SQLAlchemy
> >>
> >> Yep. If we can make both Postgres and MySQL work with Async - I am also
> >> all
> >> for the "All" approach. If it means that we need to support only certain
> >> drivers and certain versions of the DBs - so be it. As mentioned in my
> >> original comments (long time ago when we had MSSQL support) - this was
> not
> >> really possible back then - but now, by getting rid of Mssql and if we
> >> have
> >> the right drivers for mysql, it should be possible - I guess.
> >>
> >> On Mon, Apr 8, 2024 at 8:17 PM Daniel Standish
> >> <daniel.stand...@astronomer.io.invalid> wrote:
> >>
> >> > I wholeheartedly agree with Ash that it should be all or nothing.  And
> >> > *all* sounds
> >> > better to me :)
> >> >
> >> >
> >> >
> >> > On Mon, Apr 8, 2024 at 10:54 AM Ash Berlin-Taylor <a...@apache.org>
> >> wrote:
> >> >
> >> > > I’m all in favour of async SQLAlchemy. We’ve built two products
> >> > > exclusively at @ Astronomer that use sqlalchemy+psycopg3+async and
> >> love
> >> > it.
> >> > > Async does take a bit of a learning curve, but SQLA has done it
> nicely
> >> > and
> >> > > it works really well.
> >> > >
> >> > > I think this needs to be an all or nothing thing — having to
> maintain
> >> > sync
> >> > > and async versions of functions/features is a non-starter in my
> mind;
> >> > it’d
> >> > > just be a worryingly large amount of duplicated work. Given the only
> >> DBs
> >> > we
> >> > > support now is postgres and mysql then I can’t think of any reason
> >> users
> >> > > should even care — they give it a DSN and that’s the end of their
> >> > > involvement.
> >> > >
> >> > > Amogh: I don’t understand what you mean by point 3 below.
> >> > >
> >> > > -ash
> >> > >
> >> > > > On 8 Apr 2024, at 05:31, Amogh Desai <amoghdesai....@gmail.com>
> >> wrote:
> >> > > >
> >> > > > I checked the content and the PR that you attached.
> >> > > >
> >> > > > The results do seem promising and I like the general idea of this
> >> > > approach.
> >> > > > But as Jarek
> >> > > > also mentioned on the PR:
> >> > > >
> >> > > > 1. Not everyone might be on the board to go all async due to
> certain
> >> > > > limitations around
> >> > > > access to the drivers, or corporate limitations. So, we definitely
> >> > need a
> >> > > > way to opt-out
> >> > > > for the ones who aren't interested.
> >> > > >
> >> > > > 2. We should have a seamless fallback to sync if async doesn't
> work
> >> for
> >> > > > whatever reasons.
> >> > > >
> >> > > > 3. Are we going all in or are we limiting the scope to lets say
> >> > > > connections + variables and expanding
> >> > > > based on the results in the long term?
> >> > > >
> >> > > > Looking forward to improvements async can bring in!
> >> > > >
> >> > > > Thanks & Regards,
> >> > > > Amogh Desai
> >> > > >
> >> > > >
> >> > > > On Sun, Apr 7, 2024 at 3:13 AM Hussein Awala <huss...@awala.fr>
> >> wrote:
> >> > > >
> >> > > >> The Metadata Database is the brain of Airflow, where all
> scheduling
> >> > > >> decisions, cross-communication, synchronization between
> components,
> >> > and
> >> > > >> management via the web server, are made using this database.
> >> > > >>
> >> > > >> One option to optimize the DB queries is to merge many into a
> >> single
> >> > > query
> >> > > >> to reduce latency and overall time, but this is not always
> possible
> >> > > because
> >> > > >> the queries are sometimes completely independent, and it is
> >> > > impossible/too
> >> > > >> complicated to merge them. But in this case, we have another
> option
> >> > > which
> >> > > >> is running them concurrently since they are independent. The only
> >> way
> >> > > to do
> >> > > >> this currently is to use multithreading (the sync_to_async
> >> decorator
> >> > > >> creates a thread and waits for it using an asyncio coroutine),
> >> which
> >> > is
> >> > > >> already a good start, but by using the asyncio extension for
> >> > sqlalchemy
> >> > > we
> >> > > >> will be able to create thousands of lightweight coroutines with
> the
> >> > same
> >> > > >> amount of resources as a few threads, which will also help to
> >> reduce
> >> > > >> resources consumption.
> >> > > >>
> >> > > >> A few months ago I started a PoC to add support for this
> extension
> >> and
> >> > > >> implement an asynchronous version of connections and variables to
> >> be
> >> > > able
> >> > > >> to get/set them from triggers without blocking the event loop and
> >> > > affecting
> >> > > >> the performance of the triggerer, and the result was impressive (
> >> > > >>
> >>
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fairflow%2Fpull%2F36504&data=05%7C02%7CJens.Scheffler%40de.bosch.com%7C453e379d0e284252391708dc57f9fd87%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0%7C638481978411968167%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=LfaqvpAcffa830qqp1fLdsbKuVkpgqsGOSt%2FrnQL2Wk%3D&reserved=0
> >> )<https://github.com/apache/airflow/pull/36504>.
> >> > > >>
> >> > > >> I see a good opportunity to improve the performance of our REST
> API
> >> > and
> >> > > web
> >> > > >> server (for example
> >>
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fairflow%2Fissues%2F38776&data=05%7C02%7CJens.Scheffler%40de.bosch.com%7C453e379d0e284252391708dc57f9fd87%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0%7C638481978411976153%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=FDHZsKZ6d5%2BTrfI43pVY%2BSHJ2RsMW93MpxIqidhlSoE%3D&reserved=0
> >> )<https://github.com/apache/airflow/issues/38776>,
> >> > > >> knowing that we can mix sync and async endpoints, which will help
> >> for
> >> > a
> >> > > >> smooth migration.
> >> > > >>
> >> > > >> I also think that it will be possible (and very useful) to
> migrate
> >> > some
> >> > > of
> >> > > >> our executors to a full asynchronous version to improve their
> >> > > performance
> >> > > >> (kubernetes and celery)
> >> > > >>
> >> > > >> I use the sqlalchemy asyncio extension in many personal and
> company
> >> > > >> projects, and I'm very happy with it, but I would like to hear
> from
> >> > > others
> >> > > >> if they have any positive or negative feedback about it.
> >> > > >>
> >> > > >> I will create a new AIP for integrating the asyncio extension of
> >> > > >> sqlaclhemy, and other following AIPs to migrate/support each
> >> component
> >> > > once
> >> > > >> the first one is implemented, but first, I prefer to check what
> the
> >> > > >> community and other committers think about this integration.
> >> > > >>
> >> > >
> >> > >
> >> > >
> ---------------------------------------------------------------------
> >> > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >> > > For additional commands, e-mail: dev-h...@airflow.apache.org
> >> > >
> >> > >
> >> >
> >>
> >
>

Reply via email to