Maor Kleinberger <kma...@gmail.com> added the comment:

After digging into asyncio, I stumbled upon this particularly suspicious block 
in BaseEventLoop._run_once: 
https://github.com/python/cpython/blob/v3.9.0a3/Lib/asyncio/base_events.py#L1873

handle = self._ready.popleft()
if handle._cancelled:
    continue
if self._debug:
    ...
    handle._run()
    ...
else:
    handle._run()

As you can see, a callback is popped from the deque of ready callbacks, and 
only a couple of lines later is that callback actually called. The question 
arises: what happens if an exception is raised in between? Or, more 
specifically, what happens to that callback if a KeyboardInterrupt is raised 
before it is called?
Well, apparently it dies and becomes one with the universe. The chances of 
this happening are highest when the ioloop is running very short coroutines 
(like sleep(0)), and they increase when debug is on (because more code is 
executed in between).
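
To make that window concrete, here is a minimal sketch of the pop-then-run 
pattern outside of asyncio. The run_once helper and its 
interrupt_before_run flag are hypothetical stand-ins, not asyncio's actual 
code; the flag simply simulates a KeyboardInterrupt arriving between the 
popleft() and the call.

```python
from collections import deque


def run_once(ready, interrupt_before_run=False):
    """Hypothetical, simplified stand-in for the pop-then-run step."""
    callback = ready.popleft()
    if interrupt_before_run:
        # Simulates a KeyboardInterrupt landing after the pop
        # but before the callback is invoked.
        raise KeyboardInterrupt
    callback()


results = []
ready = deque([
    lambda: results.append("first"),
    lambda: results.append("second"),
])

try:
    run_once(ready, interrupt_before_run=True)
except KeyboardInterrupt:
    pass  # the caller survives, but "first" is already gone from the deque

run_once(ready)  # only "second" is left to run
```

After this runs, results contains only "second" and the deque is empty: 
the first callback was neither executed nor put back.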

This is how the bug we've been experiencing came to life:
When SIGINT is received it raises a KeyboardInterrupt in the running frame. If 
the running frame is a coroutine, it stops, the exception climbs up the stack, 
and the ioloop shuts down. Otherwise, the KeyboardInterrupt is probably raised 
inside asyncio's code, somewhere inside run_forever. In that case, the ioloop 
stops and proceeds to cancel all of the running tasks. After cancelling all the 
tasks, asyncio actually reruns the ioloop so all tasks receive the 
CancelledError and handle it or just die (see 
asyncio.runners._cancel_all_tasks).
Enter our bug; sometimes, randomly, the loop gets stuck waiting for all the 
cancelled tasks to finish. This behavior is caused by the flaw I described 
earlier - if the KeyboardInterrupt was raised after a callback was popped and 
before it was run, the callback is lost and the task that was waiting for it 
will wait forever.
Depending on the running tasks, the event loop might hang on the select call 
(until it is interrupted by a signal, like SIGINT). This is what happens in 
SleepTest.py. Another case might be that only a part of the ioloop gets stuck, 
and other parts that are not dependent on the lost call still run correctly 
(and run into a CancelledError). This behavior is demonstrated in the script I 
added to this thread, asyncio_bug_demo.py.
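
The effect on a cancelled task can be imitated deterministically with the 
sketch below. It is only an illustration under loud assumptions: it reaches 
into the private loop._ready attribute of the default event loop and drops 
the handle by hand instead of relying on a real, badly timed 
KeyboardInterrupt, so it may behave differently on other loop 
implementations. The worker coroutine is a hypothetical placeholder.

```python
import asyncio


async def worker():
    await asyncio.sleep(3600)


async def main():
    loop = asyncio.get_running_loop()
    task = loop.create_task(worker())
    await asyncio.sleep(0)  # let the worker start and block on its sleep

    # Cancelling schedules a wakeup callback for the task on the loop.
    task.cancel()
    # Assumption: loop._ready is a private deque of pending handles.
    # Removing the newest handle imitates a callback that was popped
    # and then lost before it could run.
    lost = loop._ready.pop()

    # The task was cancelled, yet it never gets to see the CancelledError.
    _, pending = await asyncio.wait([task], timeout=0.1)
    stuck = task in pending

    # Putting the handle back lets the cancellation finally complete.
    loop._ready.append(lost)
    done, _ = await asyncio.wait([task], timeout=1)
    return stuck, (task in done and task.cancelled())


stuck_while_lost, finished_after_restore = asyncio.run(main())
```

While the handle is missing, the task stays pending even though cancel() 
was called on it; anything that waits for it without a timeout would wait 
forever.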

I see two possible solutions:
1. Make all the code inside run_forever signal safe
2. Override the default SIGINT handler in asyncio.run with one more fitting the 
way asyncio works

I find the second solution much easier to implement well, and I think it makes 
more sense. I think Python's default SIGINT handler fits normal single-threaded 
applications very well, but not so much an event loop. When using an event loop 
it makes sense to handle a signal as an event and process it along with the 
other running tasks. This is fully supported by the ioloop with the help of 
signal.set_wakeup_fd.
I have implemented the second solution and opened a PR, please review it and 
tell me what you think!
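
As a rough illustration of treating SIGINT as an ordinary event (not the 
code from the PR), the sketch below uses the public loop.add_signal_handler 
API. It assumes a Unix platform and that it runs in the main thread; the 
stop event and the self-delivered signal are there only to make the example 
self-contained.

```python
import asyncio
import signal


async def main():
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()

    # The handler runs as a normal loop callback, so it cannot fire
    # in the middle of the loop's own bookkeeping.
    loop.add_signal_handler(signal.SIGINT, stop.set)
    try:
        # Stand-in for the user pressing Ctrl+C shortly after startup.
        loop.call_later(0.05, signal.raise_signal, signal.SIGINT)
        await asyncio.wait_for(stop.wait(), timeout=5)
        return "stopped cleanly"
    finally:
        loop.remove_signal_handler(signal.SIGINT)


outcome = asyncio.run(main())
```

Here no KeyboardInterrupt is raised at all: the signal only sets the event, 
and the coroutine decides how to shut down.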

----------
Added file: https://bugs.python.org/file48906/asyncio_bug_demo.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue39622>
_______________________________________