mike bayer <mike...@zzzcomputing.com> added the comment:

hey there,

I seem to have two cents to offer, so here it is.  An obscure issue in the 
Python bug tracker is probably not the right place for this, so consider this 
an early draft of something that maybe I'll talk about more elsewhere.

> This basically divides code into two islands - async and non-async

yes, this is the problem, and at the bottom of this apparently somewhat ranty 
comment is a solution.  The good news is that it does not require that Python 
or asyncio be modified.  My concern is how everyone has been OK with the 
current state of affairs for so long, and why "asyncio is fundamentally 
incompatible with library X" is considered acceptable.  I'm also surprised at 
how easy it was to find a workaround; this is not something I would have 
expected to come up with.  Kind of like you don't expect to invent Velcro or 
windshield wipers.

asyncio's approach is what those of us in the library/framework community call 
"explicit async": you have to mark functions that will be doing IO, and the 
points at which IO occurs must also be marked.  Long ago it was via callback 
functions, then asyncio turned it into decorators and yields, and finally 
PEP 492 turned it into async/await, and it is very nicely done.  It is of course 
a feature of asyncio that writing out async/await means your code can in theory 
be clearer as to where IO occurs and all that, and while I don't totally buy 
that myself, I'm of course in favor of that style of coding being available, it 
definitely has its own kind of self-satisfaction built in when you do it.  
That's all great.

But as those of us in the library/framework community also know, asyncio's 
approach essentially means that libraries like Flask, Django, my own 
SQLAlchemy, etc. are all automatically "non-workable" with asyncio.  While 
these libraries can certainly have asyncio endpoints added to them, the task as 
designed is not that simple: to go from an asyncio endpoint all the way through 
library code that doesn't care about async and then down into a networking 
library that again has asyncio endpoints, the "async" declaration and the 
"await" or yield approach must be wired all the way through every function and 
method.  This is all despite the fact that when you're not at the endpoints, 
the points at which IO occurs are fully predictable, such that libraries like 
gevent don't need you to write them out.  So we are told that libraries have to 
have full end-to-end rewrites of all their code to work this way, or otherwise 
maintain two codebases, or something like that.
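
To make the "wired all the way through" point concrete, here is a minimal 
sketch in plain asyncio.  All of the names (driver_read, load_record, 
build_report, handler) are made up for illustration.  Only the bottom function 
does anything resembling IO, yet every function above it has to be declared 
"async" and has to "await" its callee.

```python
import asyncio
import inspect


async def driver_read(key):
    # Hypothetical networking endpoint: the only place IO would really occur.
    await asyncio.sleep(0)
    return f"value-for-{key}"


async def load_record(key):
    # Middle-tier library code: no IO of its own, but it must still be
    # "async" and must "await" because its callee is a coroutine function.
    raw = await driver_read(key)
    return raw.upper()


async def build_report(keys):
    # One more layer up, same story.
    return [await load_record(key) for key in keys]


async def handler():
    # Top-level asyncio endpoint.
    return await build_report(["a", "b"])


report = asyncio.run(handler())

# A plain blocking caller cannot get at the value: it receives a coroutine.
leftover = load_record("a")
got_coroutine = inspect.iscoroutine(leftover)
leftover.close()  # close it so Python does not warn that it was never awaited
```

Calling load_record() from ordinary blocking code hands back a coroutine 
object rather than the value, which is why the keywords can't simply be left 
out of the middle tiers.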

The side effect of this is that a whole bunch of library and framework authors 
now get to create all new libraries and frameworks, which do exactly the same 
thing as all the existing libraries and frameworks, except they sprinkle the 
"async/await" keywords throughout middle tiers as required.  Vague claims of 
"framework X is faster because it's async" appear, impossible to confirm as it 
is unknown how much of their performance gains come from the "async" aspect and 
how much of it is that they happened to rewrite a new framework from scratch in 
a completely different way (hint: it's the latter).

Or in other cases, as if to make it obvious how much the "async/await" keywords 
come down to being more or less boilerplate for the "middle" parts of 
libraries, the urllib3 project wrote the "unasync" project [1] so that they can 
simply maintain two separate codebases, one that has "async/await" and the 
other with those keywords just search-and-replaced out.
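
As a toy illustration of how mechanical that transformation is, the sketch 
below strips the keywords with two regular expressions.  This is NOT the 
unasync project's code or API; strip_async() and the sample source are 
hypothetical, and a regex like this would happily mangle strings and comments 
in real code.

```python
import re

# Hypothetical "async" middle-tier source, held as text.
ASYNC_SOURCE = '''
async def load_record(conn, key):
    async with conn.transaction():
        row = await conn.fetch(key)
    return row
'''


def strip_async(source):
    # Drop "async" where it prefixes def / with / for.
    source = re.sub(r"\basync\s+(?=def\b|with\b|for\b)", "", source)
    # Drop "await" in front of an expression.
    source = re.sub(r"\bawait\s+", "", source)
    return source


SYNC_SOURCE = strip_async(ASYNC_SOURCE)

# The result still compiles as ordinary blocking-style Python.
compile(SYNC_SOURCE, "sync_source", "exec")
```

The logic of the function body is untouched; only the markers are gone, 
which is the sense in which they are boilerplate for the "middle" parts.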

SQLAlchemy has not been "replaced" by this trend as asyncio database libraries 
have not really taken off in Python, and there are very few actual async 
drivers.   Some folks have written SQLAlchemy-async libraries that use 
SQLAlchemy's expression system while they have done the tedious, redundant and 
impossible-to-maintain work of replicating enough of SQLAlchemy's execution 
internals such that a modest "sqlalchemy-like" experience with asyncio can be 
reproduced.  But these libraries are shut out of all of the fixes and 
improvements made to SQLAlchemy itself, and they likely target a smaller subset 
of SQLAlchemy's behaviors and features in any case.  They certainly can't get 
the ORM working, as the ORM runs lots of SQL 
executions internally, all of which would have to propagate their "asyncness" 
outwards throughout hundreds of functions.

The asyncpg project, one of the few asyncio database drivers that exists, notes 
in its FAQ "asyncpg uses asynchronous execution model and API, which is 
fundamentally incompatible with SQLAlchemy" [2], yet we know this is not true  
because SQLAlchemy works just fine with gevent and eventlet, with no 
architectural changes at all.  Using libraries like SQLAlchemy or Django with a 
non-blocking IO, event-based model is commonplace.   It's the "explicit" part 
of it that is hard, which is because of how asyncio is designed, without any 
mediation for code that doesn't publish "async / await" keywords in the middle.

So I finally just sat down to figure out how to use the underlying greenlet 
library (which we all know as the portable version of "Stackless Python") to 
bridge the gap between asyncio and blocking-style code.  It's about 30 lines, 
and I have SQLAlchemy working with an async front-end to asyncpg DBAPI, as can 
be seen at [3], based on the proof of concept at [4].  I'm actually running the 
full py.test suite all inside the asyncio event loop and running asyncpg 
through SQLAlchemy's whole battery of thousands of tests, all of them written 
in purely blocking style.  There's no need to add "async / await / yield / etc" 
anywhere except the very endpoints: where the top function is called, and then 
down where we call into asyncpg directly, using a function called await_() that 
works just like the "await" keyword.  Just no "async" function declaration.

A day later, someone took the same idea and got Flask to work in an asyncio 
event loop at [5].  The general idea of using greenlet in this way is also 
present at [6], so I won't be patenting this idea today as oremanj can claim 
prior art.
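
For readers who want the shape of the idea without clicking through, here is 
a minimal sketch of such a bridge, assuming the third-party greenlet package 
is installed.  It is a reconstruction for illustration, not the code at [3] 
or [4] and not greenback's API; only the name await_() comes from the 
description above, while greenlet_spawn(), fake_driver_query() and 
fetch_user() are hypothetical.

```python
import asyncio
import sys

from greenlet import getcurrent, greenlet


def await_(awaitable):
    # Blocking-style stand-in for the "await" keyword: hand the awaitable
    # to the parent greenlet (the coroutine driving us) and suspend until
    # it switches back in with the result.
    parent = getcurrent().parent
    if parent is None:
        raise RuntimeError("await_() called outside of greenlet_spawn()")
    return parent.switch(awaitable)


async def greenlet_spawn(fn, *args, **kwargs):
    # Run blocking-style fn in a child greenlet, awaiting on its behalf.
    child = greenlet(fn)
    result = child.switch(*args, **kwargs)
    while not child.dead:
        # The child is parked inside await_(); result is its awaitable.
        try:
            value = await result
        except BaseException:
            # Re-raise the failure inside the blocking-style code.
            result = child.throw(*sys.exc_info())
        else:
            result = child.switch(value)
    return result  # fn's return value


async def fake_driver_query(user_id):
    # Hypothetical stand-in for an async driver call.
    await asyncio.sleep(0)
    return (user_id, f"user{user_id}")


def fetch_user(user_id):
    # "Middle" library code: no async, no await keyword.
    row = await_(fake_driver_query(user_id))
    return {"id": row[0], "name": row[1]}


async def main():
    # Two blocking-style calls running concurrently in the same loop.
    return await asyncio.gather(
        greenlet_spawn(fetch_user, 1),
        greenlet_spawn(fetch_user, 2),
    )


results = asyncio.run(main())
```

The event loop never leaves the main greenlet; the child only runs between a 
switch() in and the next await_() out, so other tasks in the loop are 
scheduled exactly as they would be with ordinary coroutines.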

Using greenlet, there is no need to break out of the asyncio event loop at all, 
nor does it change the control flow of parallel coroutines within the loop. It 
uses greenlet's "switch", quite minimally, to bridge the gap between code that 
does not push out an "async/await" yield and code that does.   There are no 
threadpools, no alternate event loops, no monkeypatching, just a few 
greenlet.switch() calls in the right spots.  There is a slight performance 
decrease of about 15%, but in theory one would only be using asyncio if their 
application is expected to be IO bound in any case (which folks who know me 
know is another assertion I frequently doubt).

So to sum up, last week, libraries like Flask and SQLAlchemy were 
"fundamentally incompatible" with asyncio, and this week they are not.    
What's confusing me is that I'm not that smart and this is something all of the 
affected libraries should have been doing years ago, and really, while I know 
this is not going to happen, this should be *part of asyncio itself* or at 
least a very standard approach so that nobody has to assume asyncio means 
"rewrite all your library code".

To add an extra bonus, you can use this greenlet approach to have 
blocking-style functions right in the middle of your otherwise asyncio 
application.  Which means this also is a potential solution to the 
"lazy-loading" problem.  You have an asyncio app that does lots of asyncio to 
talk to microservices, but some functions are doing database work and they 
really would like to just work in a transaction, load some objects and access 
their attributes without worrying that a SQL statement can't be emitted.  This 
approach makes that possible as well; see ORM lazy loading with the asyncpg 
driver at [7].  Indeed, if you have a PostgreSQL SQLAlchemy application already 
written in blocking style, you can use this new extension and drop the entire 
application into the event loop and use the asyncpg driver, not too unlike 
using gevent except nothing is monkeypatched.
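
The lazy-loading case can be sketched the same way.  This is a toy under 
stated assumptions, not SQLAlchemy's ORM and not the example at [7]: it needs 
the third-party greenlet package, and Order, fake_fetch_customer(), 
describe() and the small greenlet_spawn()/await_() bridge are all 
hypothetical.  The point is that a plain attribute access can trigger an 
awaited load, something the "await" keyword cannot express inside a property.

```python
import asyncio
import sys

from greenlet import getcurrent, greenlet


def await_(awaitable):
    # Suspend this greenlet; the driving coroutine awaits for us.
    return getcurrent().parent.switch(awaitable)


async def greenlet_spawn(fn, *args):
    # Drive blocking-style fn from inside the event loop.
    child = greenlet(fn)
    result = child.switch(*args)
    while not child.dead:
        try:
            value = await result
        except BaseException:
            result = child.throw(*sys.exc_info())
        else:
            result = child.switch(value)
    return result


QUERIES = []  # records each simulated round trip


async def fake_fetch_customer(customer_id):
    # Hypothetical async driver call.
    QUERIES.append(customer_id)
    await asyncio.sleep(0)
    return {"id": customer_id, "name": "Ada"}


class Order:
    def __init__(self, customer_id):
        self.customer_id = customer_id
        self._customer = None

    @property
    def customer(self):
        # Lazy load on first access, written in blocking style.
        if self._customer is None:
            self._customer = await_(fake_fetch_customer(self.customer_id))
        return self._customer


def describe(order):
    # Business logic that just touches attributes.
    return f"order for {order.customer['name']} ({order.customer['id']})"


async def main():
    return await greenlet_spawn(describe, Order(7))


summary = asyncio.run(main())
```

The second access to order.customer is served from the cached value, so only 
one simulated query is recorded.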

The recipe is simple and so far appears to be very effective.   Using greenlet 
to manipulate the stack is of course "spooky" and I would assume Python devs 
may propose that this would lead to hard-to-debug conditions.  I've used 
gevent and eventlet for many years, and while they do produce some new issues, 
most of them relate to their monkeypatching of existing modules, particularly 
around low-level network drivers like pymysql.  The actual stack moving around 
within business logic doesn't seem to produce any difficult new issues.  Using 
plain asyncio has a lot of novel and confusing failure modes too.  Using the 
little bit of "spookiness" of greenlet is IMO a lot less work than rewriting 
SQLAlchemy, Django ORM, Flask, urllib3, etc. from scratch and maintaining two 
codebases, though.


[1] https://pypi.org/project/unasync/

[2] https://magicstack.github.io/asyncpg/current/faq.html#can-i-use-asyncpg-with-sqlalchemy-orm

[3] https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/2071

[4] https://gist.github.com/zzzeek/4e89ce6226826e7a8df13e1b573ad354

[5] https://twitter.com/miguelgrinberg/status/1279894131976921088

[6] https://github.com/oremanj/greenback

[7] https://gerrit.sqlalchemy.org/plugins/gitiles/sqlalchemy/sqlalchemy/+/refs/changes/71/2071/10/examples/asyncio/greenlet_orm.py

----------
nosy: +zzzeek

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue22239>
_______________________________________