[issue22239] asyncio: nested event loop

mike bayer Tue, 07 Jul 2020 20:44:54 -0700


mike bayer <mike...@zzzcomputing.com> added the comment:


> Oh, I thought the primary problem for SQLAlchemy supporting async is that the 
> ORM needs to do IO from inside __getattr__ methods. So I assumed that the 
> reason you were so excited about greenlets was that it would let you use 
> await_() from inside those __getattr__ calls, which would involve exposing 
> your use of greenlets as part of your public API.


The primary problem is that people want to execute() a SQL statement using
await, and they want a non-blocking database driver on the back end (basically
asyncpg; I'm not sure there are any others, though maybe there's one for MySQL
as well). Tools like aiopg have provided partial SQLAlchemy-like front ends to
accomplish this, but they can't do ORM support. That's not because the ORM has
lazy loading; it's because even explicit operations like query.all() or
session.flush() can require a lot of front-to-back database operations to
complete, and rewriting all of that code using async/await would be very
involved.

Then there's the secondary problem of ORMs doing lazy loading, which is what
you refer to as "IO inside __getattr__ methods". SQLAlchemy is not actually as
dependent on lazy loading as other ORMs, as we support a wide range of ways to
"eagerly" load data up front. With the SQLAlchemy 2.0-style ORM API, which has
a clear spot for "await" to occur, users can call "await
session.execute(select(SomeObject))" and get a whole traversable graph of
things loaded up front. We even have a loader called "raiseload" that is
specifically anti-lazy loading: it raises an error if you try to access
something that wasn't explicitly loaded already. So for a lot of cases we are
already there.

But then, as for your example of "something.b = x", or more commonly in ORMs a
get operation like "something.b" emitting SQL, the extension I'm building will
very likely include some kind of feature that lets users do this with an
explicit call. At the moment, with the preliminary code that's in there, this
might look like:

   await greenlet_spawn(getattr, something, "b")

It's not very pretty at the moment, but that's the general idea.

But the thing is, greenlet_spawn() can naturally apply to anything.  So it 
remains to be seen both how I would want to present this notion, as well as if 
people are going to be interested in it or not, but as a totally extra thing 
beyond the "await session.execute()" API that is the main thing, someone could 
do something like this:

   await greenlet_spawn(my_business_orm_method)

and then in "my_business_orm_method()", all the blocking-style ORM things that
async advocates warn against could be happening. I'm certainly not going to
tell people they have to do that, but I don't think I should discourage it
either, because if the above business method is written "reasonably" (see next
paragraph), there really is no problem introduced by implicit IO.
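
As a rough illustration of the idea above, here is a minimal sketch of how a
greenlet_spawn() function could bridge blocking-style code and an event loop,
assuming the third-party greenlet package is installed. It is not the
extension's actual code: the _AsyncGreenlet class, the fake_query() coroutine
and the Something class are hypothetical names made up for this example, and
await_() is borrowed from the quoted message above.

```python
import asyncio
import sys

import greenlet  # third-party package, assumed to be installed


class _AsyncGreenlet(greenlet.greenlet):
    """Hypothetical helper: a greenlet that remembers who is driving it."""

    def __init__(self, fn, driver):
        super().__init__(fn, driver)
        self.driver = driver


def await_(awaitable):
    """Called from blocking-style code to wait on an awaitable."""
    current = greenlet.getcurrent()
    if not isinstance(current, _AsyncGreenlet):
        raise RuntimeError("await_() must be called inside greenlet_spawn()")
    # Hand the awaitable to the coroutine side and pause this greenlet until
    # that side switches back in with the result (or throws an exception).
    return current.driver.switch(awaitable)


async def greenlet_spawn(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) in a greenlet, awaiting on its behalf."""
    context = _AsyncGreenlet(fn, greenlet.getcurrent())
    result = context.switch(*args, **kwargs)
    while not context.dead:
        # The greenlet paused inside await_(); result is the awaitable it
        # handed over.
        try:
            value = await result
        except BaseException:
            # Re-raise the failure inside the blocking-style code.
            result = context.throw(*sys.exc_info())
        else:
            result = context.switch(value)
    return result  # the return value of fn


async def fake_query(name):
    """Hypothetical stand-in for an async database driver call."""
    await asyncio.sleep(0)
    return f"value of {name}"


class Something:
    """Hypothetical ORM-like object whose attribute access performs IO."""

    @property
    def b(self):
        return await_(fake_query("b"))


async def main():
    something = Something()
    return await greenlet_spawn(getattr, something, "b")


result = asyncio.run(main())
print(result)
```

The only explicit "await" the caller writes is the one in front of
greenlet_spawn(); everything underneath, including the property access, is
ordinary blocking-style code that reaches the event loop through await_().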

By "written reasonably" I'm referring to the fact that in this whole situation,
90% of what people are doing is in the context of HTTP services. The problem
of "something.a now creates state that other tasks might see" is not a real
"problem" that is solved by using IO-only explicit context switching. This is
because in a CRUD-style application, "something" is not going to be a
process-local yet thread-global object that had to be created specifically for
the application (there are things like the database connection pool and some
registries that the ORM uses, but those problems are taken care of and aren't
specific to one particular application).

There is certainly going to be global mutable state in the CRUD/HTTP
application, which is the database itself. Event-based programming doesn't
save you from concurrency issues here, because any number of processes may be
accessing the database at the same time. There are well-established
concurrency patterns one uses with relational databases, which include first
and foremost transaction isolation, but also things like compare-and-swap,
"select for update", ensuring MVCC is turned on (SQL Server), table locks,
etc. These techniques are independent of the concurrency pattern used within
the application, and they are arguably better suited to blocking-style code,
because on the database side we must emit our commands within a transaction
serially in any case. The major convenience of "async", that we can fire off
a bunch of web service requests in parallel, does not apply to the CRUD-style
business methods within our web service request, because we can only do
things in our ACID transaction one at a time.

The problem of "something.a" emitting IO needs to be made sane against other
processes also viewing or altering "something.a", assuming "something" is a
database-bound object like a row in a table. That is done using traditional
database concurrency constructs, such as choosing an appropriate isolation
level and using atomically composed SQL statements. The problem of two
greenlets or coroutines seeing "something" before it's been fully altered would
happen across two processes in any case, but if "something" is a database row,
that second greenlet would not see "something.a / something.b" in mid-flight,
because the isolation level is going to be at least "read committed".
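
To make "atomically composed SQL statements" and compare-and-swap a little
more concrete, here is a small sketch using the standard library's sqlite3
module as a stand-in for a real database. The table layout, the "version"
column and the compare_and_swap() helper are assumptions made up for this
example, not anything SQLAlchemy prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE something (id INTEGER PRIMARY KEY, a INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO something (id, a, version) VALUES (1, 10, 1)")
conn.commit()


def compare_and_swap(conn, row_id, new_a, expected_version):
    """Update the row only if nobody else changed it since we read it."""
    cur = conn.execute(
        "UPDATE something SET a = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_a, row_id, expected_version),
    )
    conn.commit()
    # The check and the write happen in one statement, so the database
    # decides the winner, not the application's concurrency model.
    return cur.rowcount == 1


first = compare_and_swap(conn, 1, 11, expected_version=1)
second = compare_and_swap(conn, 1, 12, expected_version=1)  # stale version
final = conn.execute("SELECT a, version FROM something WHERE id = 1").fetchone()
print(first, second, final)
```

The second call loses because its expected version is stale, and that holds
whether the competing writer is another coroutine, another greenlet or another
process entirely.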

In the realm of Python HTTP/CRUD applications, async is actually very popular;
however, it takes the form of gevent and sometimes eventlet monkeypatching,
often because people are using async web servers like gunicorn. I don't see
much explicit async at all because, as mentioned before, there are very few
async database drivers and also very few async database abstraction layers.

I've sort of made a side business at work out of helping people with the
problems of gevent-enabled HTTP services. There are two problems that I see.
The main one is that they configure their workers for 1000 greenlets and set
their database connection pool to allow only 20 database connections, and then
their processes get totally hung as all the requests pile up in one process
that is advertising that it still has 980 more requests it can service. The
other one is that their application is completely CPU bound, sometimes so
badly that we see database timeouts because their greenlets can't respond to a
database ping or authentication challenge within 30 seconds.

I have never seen any issues related to the fact that IO is implicit or that
lazy loading confused someone. Maybe this would be a thing if they had some
kind of microservice-parallel HTTP request spawning monster, but we don't have
that kind of thing in CRUD applications.
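
The first of the two problems described above (1000 workers against 20
connections) can be simulated with nothing but asyncio. This is only a toy
model: an asyncio.Semaphore stands in for the connection pool, a short sleep
stands in for a query, and the handle_request() function and the stats
dictionary are made up for the illustration.

```python
import asyncio

POOL_SIZE = 20   # database connections allowed (figure from the text)
WORKERS = 1000   # concurrent requests the process accepts (figure from the text)


async def handle_request(pool, stats):
    stats["accepted"] += 1            # the process happily takes the request
    async with pool:                  # ...but it has to wait for a connection
        stats["in_db"] += 1
        stats["peak_in_db"] = max(stats["peak_in_db"], stats["in_db"])
        await asyncio.sleep(0.001)    # stand-in for a database round trip
        stats["in_db"] -= 1


async def main():
    pool = asyncio.Semaphore(POOL_SIZE)
    stats = {"accepted": 0, "in_db": 0, "peak_in_db": 0}
    await asyncio.gather(*(handle_request(pool, stats) for _ in range(WORKERS)))
    return stats


stats = asyncio.run(main())
print(stats["accepted"], stats["peak_in_db"])
```

All 1000 requests are accepted right away, yet at most 20 of them are ever
talking to the "database"; the rest are queued inside a process that still
looks like it has capacity.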

The two aforementioned problems, with too many greenlets or coroutines vs. what
the application can actually handle, would occur just as much with an explicit
async driver, and that's fine; I know how to debug these cases. But in any
case, people are already writing huge CRUD apps that run under gevent. As for
my secondary idea that someone can run their app using asyncio and then, on an
*as needed* basis, put some more CRUD-like methods into greenlets with
blocking-style code: this is an *improvement* over the current state of
affairs, where everything everywhere is implicit IO. Not only that, but they
can use this already common programming style and interact with a database
driver that is *designed for async*.

Right now everyone uses pymysql because it is pure Python and therefore can
have all the socket / IO related code monkeypatched by gevent. It's bad.
Whether or not one thinks writing HTTP services using greenlets is a good
idea, it is definitely better to do it using a database driver that is
designed for async, talking to the database without doing any monkeypatching.
My approach makes this possible where it has previously not been possible at
all, so I think this represents a big improvement to an already popular
programming pattern, while at the same time introducing the notion of a single
application using both explicit and implicit approaches simultaneously.

I think it's a good thing if someone who really wants to use async/await in
order to carefully schedule how they communicate with other web services and
resources, which often need to be loaded in parallel, can then write their
transactional CRUD code, which is necessarily serial in any case, in blocking
style. This style of code is already prevalent, and here we'd be giving an
application the ability to use both styles simultaneously. I had always hoped
that Python's move towards asyncio would allow this programming paradigm to
flourish, as it seems inherently useful.


> If you're just talking about using greenlets internally and then writing both
> sync and async shims to be your public API, then obviously that reduces the
> risks. Maybe greenlets will cause you problems, maybe not, but either way you
> know what you're getting into and the decision only affects you :-). But, if
> that's all you're using them for, then I'm not sure that they have a
> significant advantage over the edgedb-style synchronous wrapper or the
> unasync-style automatically generated sync code.

W.r.t. the issue of writing everything as async and then using the coroutine
primitives to convert to "sync" as a means of maintaining both facades, I
don't think that covers the fact that most DBAPI drivers are sync only (and
not monkeypatchable either, but I think we all agree here that monkeypatching
is terrible in any case). The much more common use case is sync front end ->
agnostic middle -> sync driver, and to go from an async event loop to a
blocking IO database driver you need to use a thread executor of some kind.
The other way around, where the library code is written in "sync" and you
attach "async" to both ends of it using greenlets in the middle, is a much
more lightweight transition than going from async internals out to a sync-only
driver.
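
For comparison, this is roughly what the "async event loop to blocking driver"
direction looks like with a thread executor. The sketch uses only the standard
library, with sqlite3 standing in for any sync-only DBAPI driver; the
blocking_query() helper is a made-up name for the example.

```python
import asyncio
import sqlite3


def blocking_query(sql):
    """Ordinary blocking DBAPI code; it knows nothing about asyncio."""
    # The connection is created and used inside the worker thread.
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


async def main():
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor, so the
    # event loop stays free while a worker thread blocks on the driver.
    return await loop.run_in_executor(None, blocking_query, "SELECT 1 + 1")


rows = asyncio.run(main())
print(rows)
```

Every database call in this arrangement costs a hop to another thread and
back, which is the overhead the greenlet-in-the-middle approach avoids when
the driver itself is async.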

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue22239>
_______________________________________