Jeff Robbins <je...@livedata.com> added the comment:

Steve, I did some more digging into why the native condition variable approach 
might be causing problems on Windows.  Testing my fix revealed that there is at 
least one place in Modules\overlapped.c that either 

a) waits too long to call GetLastError(), or 

b) reveals an expectation that Py_END_ALLOW_THREADS won't change the results of 
GetLastError().



    Py_BEGIN_ALLOW_THREADS
    ret = GetQueuedCompletionStatus(CompletionPort, &NumberOfBytes,
                                    &CompletionKey, &Overlapped, Milliseconds);
    save_err = GetLastError();
    Py_END_ALLOW_THREADS

    err = ret ? ERROR_SUCCESS : GetLastError();





The problem in this code is that it allows *other* Windows API calls between 
the original Windows API call (in this case GetQueuedCompletionStatus()) and 
the call to GetLastError().  If those other Windows API calls change the 
thread-specific GetLastError state,
the info we need is lost.


To test for this possibility, I added a diagnostic test right after the code 
above



    if (!ret && (err != save_err)) {
        printf("GetQueuedCompletionStatus returned 0 but we lost the error=%d 
lost=%d Overlapped=%d\n", save_err, err, (long)Overlapped);
    }




 and ran a test that eventually produced this on the console:



GetQueuedCompletionStatus returned 0 but we lost the error=258 lost=0 
Overlapped=0




error 258 is WAIT_TIMEOUT.  The next lines of code are looking for that "error" 
in order to decide if GetQueuedCompletionStatus failed, or merely timed out.



    if (Overlapped == NULL) {
        if (err == WAIT_TIMEOUT)
            Py_RETURN_NONE;
        else
            return SetFromWindowsErr(err);
    }



So the impact of this problem is severe.   Instead of returning None to the 
caller (in this case _poll in asyncio\windows_events.py), it will raise an 
error!



        while True:
            status = _overlapped.GetQueuedCompletionStatus(self._iocp, ms)
            if status is None:
                break




And, to make things extra confusing, the error raised via 
SetFromWindowsErr(err) (where err == 0) ends up looking like this:



OSError: [WinError 0] The operation completed successfully




A valid WAIT_TIMEOUT thus gets converted to a Python error, but also loses the 
original Windows Error Code of 258, so you are left scratching your head about 
how a WinError 0 (ERROR_SUCCESS) could have crashed your call to, say, 
asyncio.run()? (See traceback below.)


So either we need to make sure that all calls to GetLastError() are made 
immediately after the relevant Windows API call, without any intervening other 
Windows API calls, and thereby prevent case a) above, or as in case b), the GIL 
code (using either emulated or native condition variables from condvar.h) needs 
to preserve the Error state. 

Some code in Python\thread_nt.h in fact does this already, e.g.



void *
PyThread_get_key_value(int key)
{
    /* because TLS is used in the Py_END_ALLOW_THREAD macro,
     * it is necessary to preserve the windows error state, because
     * it is assumed to be preserved across the call to the macro.
     * Ideally, the macro should be fixed, but it is simpler to
     * do it here.
     */
    DWORD error = GetLastError();
    void *result = TlsGetValue(key);
    SetLastError(error);
    return result;
}



Of course there might be *other* problems associated with using native 
condition variables on Windows, but this is the only one 
I've experienced after some fairly heavy use of Python 3.7.2 asyncio on Windows.


traceback:
------------------------------------------------------------------------------------------------------------------------
    asyncio.run(self.main())
  File "C:\Users\jeffr\Documents\projects\Python-3.7.2\lib\asyncio\runners.py", 
line 43, in run
    return loop.run_until_complete(main)
  File 
"C:\Users\jeffr\Documents\projects\Python-3.7.2\lib\asyncio\base_events.py", 
line 571, in run_until_complete
    self.run_forever()
  File 
"C:\Users\jeffr\Documents\projects\Python-3.7.2\lib\asyncio\base_events.py", 
line 539, in run_forever
    self._run_once()
  File 
"C:\Users\jeffr\Documents\projects\Python-3.7.2\lib\asyncio\base_events.py", 
line 1739, in _run_once
    event_list = self._selector.select(timeout)
  File 
"C:\Users\jeffr\Documents\projects\Python-3.7.2\lib\asyncio\windows_events.py", 
line 405, in select
    self._poll(timeout)
  File 
"C:\Users\jeffr\Documents\projects\Python-3.7.2\lib\asyncio\windows_events.py", 
line 703, in _poll
    status = _overlapped.GetQueuedCompletionStatus(self._iocp, ms)
OSError: [WinError 0] The operation completed successfully

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35662>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to