> for any IO-bound call with a variable time where async isn't an option (either because it's not available, standardized, widespread, etc.), I'd advise using loop.run_in_executor()/to_thread() preemptively.
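To make the quoted advice concrete, here's a minimal sketch of wrapping a blocking call so it runs in a worker thread. `blocking_lookup()` is a hypothetical stand-in that I made up for something like socket.gethostbyname() (it just sleeps, so the example doesn't depend on the network), and `lookup()` is likewise only an illustrative name:

```python
import asyncio
import sys
import time


def blocking_lookup(hostname: str) -> str:
    """Hypothetical stand-in for a blocking call like socket.gethostbyname()."""
    time.sleep(0.05)  # simulates IO of unpredictable duration
    return f"resolved:{hostname}"


async def lookup(hostname: str) -> str:
    """Run the blocking call in a worker thread so the event loop stays free."""
    if sys.version_info >= (3, 9):
        # Higher-level helper added in 3.9; no loop reference needed.
        return await asyncio.to_thread(blocking_lookup, hostname)
    # Older versions: submit to the loop's default ThreadPoolExecutor.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_lookup, hostname)


async def main() -> list:
    # The three calls overlap, since each one runs in its own worker thread.
    return await asyncio.gather(*(lookup(h) for h in ("a", "b", "c")))


results = asyncio.run(main())
print(results)
```

Keeping the wrapper in one small coroutine means callers only ever see an awaitable, so swapping in a native async implementation later shouldn't require touching them.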
Clarification: this pretty much applies to any non-async IO-bound call that can block the event loop. You can definitely get away with ignoring some that have a consistently negligible duration, but I would not *directly* call any of them that could vary significantly in time (or are consistently long running) within a coroutine. Otherwise, it's a complete gamble as to how long it stalls the rest of the program, which is generally not desirable to say the least.

On Sun, Jun 14, 2020 at 1:42 AM Kyle Stanley <aeros...@gmail.com> wrote:

> > IOW the solution to the problem is to use threads. You can see here why I said what I did: threads specifically avoid this problem and the only way for asyncio to avoid it is to use threads.
>
> In the case of the above example, I'd say it's more so "use coroutines by default and threads as needed" rather than just using threads, but fair enough. I'll concede that point.
>
> > For instance, maybe during testing (with debug=True), your DNS lookups are always reasonably fast, but then some time after deployment, you find that they're stalling you out. How much effort is it to change this over? How many other things are going to be slow, and can you find them all?
>
> That's very situationally dependent, but for any IO-bound call with a variable time where async isn't an option (either because it's not available, standardized, widespread, etc.), I'd advise using loop.run_in_executor()/to_thread() preemptively. This is easier said than done of course and it's very possible for some to be glossed over. If it's missed though, I don't think it's too much effort to change it over; IMO the main challenge is more so with locating all of them in production for a large, existing codebase.
>
> > 3) Steven D'Aprano is terrified of them and will rail on you for using threads.
>
> Haha, I've somehow completely missed that. I CC'd Steven in the response, since I'm curious as to what he has to say about that.
>
> > Take your pick. Figure out what your task needs. Both exist for good reasons.
>
> Completely agreed, threads and coroutines are two completely different approaches, with neither one being clearly superior for all situations. Even as someone who's invested a significant amount of time in helping to improve asyncio recently, I'll admit that I decently often encounter users that would be better off using threads. Particularly for code that isn't performance or resource critical, or when it involves a reasonably small number of concurrent operations that aren't expected to scale in volume significantly. The fine-grained control over context switching (which can be a pro or a con), shorter switch delay, and lower resource usage from coroutines isn't always worth the added code complexity.
>
> On Sun, Jun 14, 2020 at 12:43 AM Chris Angelico <ros...@gmail.com> wrote:
>
>> On Sun, Jun 14, 2020 at 2:16 PM Kyle Stanley <aeros...@gmail.com> wrote:
>> >
>> > > If you're fine with invisible context switches, you're probably better off with threads, because they're not vulnerable to unexpectedly blocking actions (a common culprit being name lookups before network transactions - you can connect sockets asynchronously, but gethostbyname will block the current thread).
>> >
>> > These "unexpectedly blocking actions" can be identified in asyncio's debug mode. Specifically, any callback or task step that has a duration greater than 100ms will be logged. Then, the user can take a closer look at the offending long running step. If it's like socket.gethostbyname() and is a blocking IO-bound function call, it can be executed in a thread pool using loop.run_in_executor(None, socket.gethostbyname, hostname) to avoid blocking the event loop. In 3.9, there's also a roughly equivalent higher-level function that doesn't require access to the event loop: asyncio.to_thread(socket.gethostbyname, hostname).
>> >
>> > With the default duration of 100ms, it likely wouldn't pick up on socket.gethostbyname(), but it can rather easily be adjusted via the modifiable loop.slow_callback_duration attribute.
>> >
>> > Here's a quick, trivial example:
>> > ```
>> > import asyncio
>> > import socket
>> >
>> > async def main():
>> >     loop = asyncio.get_running_loop()
>> >     loop.slow_callback_duration = .01 # 10ms
>> >     socket.gethostbyname("python.org")
>> >
>> > asyncio.run(main(), debug=True)
>> > # If asyncio.run() is not an option, it can also be enabled via:
>> > # loop.set_debug(True)
>> > # running with python -X dev
>> > # setting the PYTHONASYNCIODEBUG=1 env var
>> > ```
>> > Output (3.8.3):
>> > Executing <Task finished name='Task-1' coro=<main() done, defined at asyncio_debug_ex.py:5> result=None created at /usr/lib/python3.8/asyncio/base_events.py:595> took 0.039 seconds
>> >
>> > This is a bit more involved than it is for working with threads; I just wanted to demonstrate one method of addressing the problem, as it's a decently common issue. For more details about asyncio's debug mode, see https://docs.python.org/3/library/asyncio-dev.html#debug-mode.
>> >
>>
>> IOW the solution to the problem is to use threads. You can see here why I said what I did: threads specifically avoid this problem and the only way for asyncio to avoid it is to use threads. (Yes, you can asynchronously do a DNS lookup rather than using gethostbyname, but the semantics aren't identical, and you may seriously annoy someone who uses other forms of name resolution. So that doesn't count.) As an additional concern, you don't always know which operations are going to be slow. For instance, maybe during testing (with debug=True), your DNS lookups are always reasonably fast, but then some time after deployment, you find that they're stalling you out. How much effort is it to change this over? How many other things are going to be slow, and can you find them all?
>>
>> That's why threads are so convenient for these kinds of jobs.
>>
>> Disadvantages of threads:
>> 1) Overhead. If you make one thread for each task, your maximum simultaneous tasks can potentially be capped. Irrelevant if each task is doing things with far greater overhead anyway.
>> 2) Unexpected context switching. Unless you use locks, a context switch can occur at any time. The GIL ensures that this won't corrupt Python's internal data structures, but you have to be aware of it with any mutable globals or shared state.
>> 3) Steven D'Aprano is terrified of them and will rail on you for using threads.
>>
>> Disadvantages of asyncio:
>> 1) Code complexity. You have to explicitly show which things are waiting on which others.
>> 2) Unexpected LACK of context switching. Unless you use await, a context switch cannot occur.
>>
>> Take your pick. Figure out what your task needs. Both exist for good reasons.
>>
>> ChrisA
>> _______________________________________________
>> Python-ideas mailing list -- python-ideas@python.org
>> To unsubscribe send an email to python-ideas-le...@python.org
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/AJ2EOLSWSOAPSUG7BOM5MF3CHP3BHS3H/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
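Regarding point 2 under "Disadvantages of asyncio" in the quoted message, here's a rough sketch of what the lack of context switching looks like in practice. A background task tries to tick every 10ms while a blocking call runs either directly in a coroutine or via loop.run_in_executor(). `slow_io()`, `ticker()` and `count_ticks()` are hypothetical names for illustration, and time.sleep() is only standing in for real blocking IO:

```python
import asyncio
import time


def slow_io() -> None:
    """Hypothetical blocking call; time.sleep() stands in for real IO."""
    time.sleep(0.2)


async def ticker(ticks: list) -> None:
    """Record a timestamp every 10ms, whenever the event loop lets us run."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def count_ticks(offload: bool) -> int:
    """Return how many ticks were recorded while slow_io() was running."""
    ticks = []
    task = asyncio.create_task(ticker(ticks))
    await asyncio.sleep(0)  # give the ticker a chance to start
    before = len(ticks)
    if offload:
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, slow_io)  # loop keeps running
    else:
        slow_io()  # no await here, so no other task can be scheduled
    during = len(ticks) - before
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return during


direct = asyncio.run(count_ticks(offload=False))
offloaded = asyncio.run(count_ticks(offload=True))
print(f"ticks during direct call: {direct}")
print(f"ticks during offloaded call: {offloaded}")
```

The direct count is 0 since the event loop never regains control during the call; the offloaded count depends on timing, but should be well above that.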
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/7EY33HD56HN2WNP4AKG74PBELPFK3DKD/
Code of Conduct: http://python.org/psf/codeofconduct/