Re: [Async-sig] async/sync library reusage

Cory Benfield Fri, 09 Jun 2017 02:07:11 -0700

> On 9 Jun 2017, at 06:48, Nathaniel Smith <n...@pobox.com> wrote:
> 
> I would say that this is something that we as a community are still
> figuring out. I really like the Sans-IO approach, and it's a really
> valuable piece of the solution, but it doesn't solve the whole problem
> by itself - you still need to actually do I/O, and this means things
> like error handling and timeouts that aren't obviously a natural fit
> to the Sans-IO approach, and this means you may still have some tricky
> code that can end up duplicated. (Or maybe the Sans-IO approach can be
> extended to handle these things too?) There are active discussions
> happening in projects like urllib3 [1] and packaging [2] about what
> the best strategy to take is. And the options vary a lot depending on
> whether you need to support python 2 etc.



Let me take a moment to elaborate on some of the thinking that has gone on for 
urllib3/Requests. We have an unusual set of constraints that are worth 
understanding, and so I’ll throw out all the ideas we had and why they were 
rejected (and indeed, why you may not want to reject them).

1. Implement the core library in asyncio, add a synchronous shim on top of it 
in terms of asyncio’s loop.run_until_complete().

This works great in many ways: you get a nice async-based library 
implementation, you correctly prioritise people using the async case over those 
using the synchronous one, and you can expect wide support and interop thanks 
to asyncio’s role as the common event loop implementation. However, you don’t 
support more novel async paradigms like those used by curio and trio.

More damningly for urllib3/Requests, this also limits your supported Python 
versions to 3.5 and later. There are also some efficiency concerns. Finally, 
unless you’re willing to only support 3.7, you end up needing to pass loop 
arguments around, which is pretty gross.
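To make option 1 concrete, here is a minimal sketch of what such a shim could look like. The names fetch_length and fetch_length_sync are hypothetical stand-ins rather than anything in urllib3 or Requests, and asyncio.sleep() stands in for real network I/O.

```python
import asyncio


async def fetch_length(payload, delay=0.0):
    # Hypothetical async "core": asyncio.sleep() stands in for network I/O.
    await asyncio.sleep(delay)
    return len(payload)


def fetch_length_sync(payload):
    # Synchronous shim: spin up a private event loop, run the coroutine to
    # completion on it, and always close the loop afterwards.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(fetch_length(payload))
    finally:
        loop.close()


result = fetch_length_sync(b"hello")
```

Creating a fresh loop per call keeps the shim self-contained, but it is one plausible source of the efficiency concerns mentioned above.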

2. Have an abstract low-level I/O interface and “bleach” it (remove the 
keywords async/await) on Python 2.

This would require you write all your code in terms of a small number of 
abstract I/O operations with “async” in front of their name, e.g. “async def 
send”, “async def recv”, and so-on. You can then implement these across 
multiple I/O backends, and also provide a synchronous one that still has 
“async” in front of it and just doesn’t ever use the word “await”. You can then 
provide a code transformation at install time on Python 2 that transforms that 
codebase, removing all the words “async” and “await” and leaving behind a 
synchronous-only codebase.
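As a rough illustration of the “bleaching” step, the sketch below strips the two keywords with a regular expression and then runs the result against a blocking backend. Everything here (bleach, request, BlockingBackend) is hypothetical, and a real tool would need to work on tokens rather than raw text, since this naive regex would also rewrite the words inside strings and comments.

```python
import re

# Naive pattern: the keyword plus the whitespace that follows it.
_ASYNC_KEYWORDS = re.compile(r"\b(?:async|await)\s+")


def bleach(source):
    # Remove every "async " and "await " from the source text.
    return _ASYNC_KEYWORDS.sub("", source)


ASYNC_SOURCE = '''
async def request(backend, payload):
    await backend.send(payload)
    return await backend.recv()
'''


class BlockingBackend:
    # Hypothetical synchronous backend with plain (non-async) methods.
    def __init__(self):
        self.sent = []

    def send(self, payload):
        self.sent.append(payload)

    def recv(self):
        return b"reply to " + self.sent[-1]


SYNC_SOURCE = bleach(ASYNC_SOURCE)

# Execute the bleached code and call it like ordinary synchronous code.
namespace = {}
exec(SYNC_SOURCE, namespace)
reply = namespace["request"](BlockingBackend(), b"GET /")
```

The bleached request() is an ordinary function, which is the point: the code that runs is derived from, but not identical to, the code that was written.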

The advantages here are better support for novel async paradigms (e.g. curio 
and trio), the ability to write more native backends for non-asyncio I/O models 
(e.g. Twisted/Tornado), and having a single codebase that handles sync and 
async.

There are myriad disadvantages. The first is the most obvious: the code 
your users run is not the same as the code you shipped. While the 
transformation is small and pretty easy to understand, that doesn’t remove its 
risks. It also makes debugging harder and more painful. On top of that, your 
Python 3 synchronous code looks pretty ugly because you have to write the word 
“await” around it even though it is not in fact asynchronous (technically you 
*don’t* have to do that but I guarantee IDEs will get mad).
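For the Python 3 synchronous case, one way to run “async” code that never actually suspends is to drive the coroutine by hand. This is a sketch under that assumption; SyncBackend, echo and run_sync are hypothetical names, not an existing API.

```python
class SyncBackend:
    # Hypothetical blocking backend: the methods are declared "async" so they
    # share an interface with real async backends, but they never await.
    def __init__(self):
        self._buffer = []

    async def send(self, data):
        self._buffer.append(data)

    async def recv(self):
        return b"".join(self._buffer)


async def echo(backend, data):
    # Library code written once against the abstract async interface.
    await backend.send(data)
    return await backend.recv()


def run_sync(coro):
    # Step the coroutine once. If nothing in it ever suspends, it runs
    # straight to completion and hands back its result via StopIteration.
    try:
        coro.send(None)
    except StopIteration as exc:
        return exc.value
    # Reaching this point means the coroutine really did suspend.
    coro.close()
    raise RuntimeError("coroutine suspended; backend is not synchronous")


result = run_sync(echo(SyncBackend(), b"ping"))
```

Note that the library code still has to say “await” everywhere even though no event loop is involved, which is exactly the ugliness described above.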

More subtly, this causes problems for backpressure and task management on event 
loops. It turns out defining your low-level I/O primitives is not trivial. In 
urllib3’s case, one of the things we’d need is either the equivalent of ‘async 
def select()’ or ‘async def new_task’. In the first case, writing this would 
require careful management of futures/deferreds and various bits of state in 
order to correctly suspend execution on event loops. In the second case, the 
synchronous version of this is called “threading.Thread” and that has a number 
of issues. I’d say that if you’re going to use threads you may as well just 
always use threads, but more importantly it has substantially different 
semantics to all async task management which make it difficult to reason about 
and to ensure that the code is sensible.
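One example of that semantic gap is cancellation. In the sketch below (all function names are hypothetical), an asyncio task can be cancelled from outside at its next suspension point, while a thread has no equivalent and only stops if its own code cooperates, here by waiting on a threading.Event.

```python
import asyncio
import threading


async def cancel_a_task():
    # Start a long-running task, let it begin, then cancel it from outside.
    task = asyncio.ensure_future(asyncio.sleep(3600))
    await asyncio.sleep(0)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"
    return "finished"


def stop_a_thread():
    # threading.Thread has no cancel(): the worker must watch for a signal.
    stop = threading.Event()

    def worker():
        stop.wait()  # blocks until the main thread sets the event

    thread = threading.Thread(target=worker)
    thread.start()
    stop.set()
    thread.join()
    return not thread.is_alive()


def run(coro):
    # Run a coroutine to completion on a private event loop.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


task_outcome = run(cancel_a_task())
thread_stopped = stop_a_thread()
```

A “new_task” primitive that maps to threads in the synchronous backend would have to paper over differences like this one.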

This approach is also entirely untested, at any scale. It’s simply not clear 
that it works yet. All the tooling would need to be written.

3. Just use Twisted/Tornado.

This variation on number (1) turns out to get you surprisingly close to our 
actual goal. Twisted and Tornado support both Python 2 and Python 3; when 
async/await are present they integrate fairly nicely with them, and they give 
you the added advantage of allowing your Python 2 users to do asynchronous code 
so long as they buy into the relevant async ecosystem. It also means that you 
can use the run_until_complete model for your Python 2 synchronous code.

However, these also have some downsides. Twisted, the library I know better, 
doesn’t yet integrate as cleanly with async/await as we’d like: that’s coming 
sometime this year, probably with the landing of 3.7. Additionally, Twisted has 
no equivalent of asyncio’s loop.run_until_complete(), which would mean that someone 
would have to add the relevant Twisted support (either restartable or 
instantiable reactors, neither of which Twisted has yet).

This also adds a potentially sizeable external dependency, which isn’t 
necessarily all that fun.

4. ??? Who knows.

Right now there is no clarity about what we’re going to do. It’s possible that 
the answer will end up being “nothing at the moment” and that we’ll wait for 
the ecosystem to progress for a while before making the change. Either way, 
it’s clear that there is no easy answer to this problem.

Cory

_______________________________________________
Async-sig mailing list
Async-sig@python.org
https://mail.python.org/mailman/listinfo/async-sig
Code of Conduct: https://www.python.org/psf/codeofconduct/
