Josiah Carlson added the comment:

I added the chunking for Windows because, in manual testing before finishing the 
patch, I found that large sends on Windows, when we don't actually wait for the 
result, can periodically report zero data sent even though a child process is 
waiting to read.

Looking a bit more, this zero result is caused by ov.cancel() followed by 
ov.getresult() raising an OSError, specifically:
[WinError 995] The I/O operation has been aborted because of either a thread 
exit or an application request
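To illustrate, here is a hedged sketch of that failure mode, assuming an Overlapped-style object with the cancel()/getresult() methods described above; the helper name and error-constant spelling are mine, not part of any real API:

```python
# Assumption: ov behaves like the Overlapped objects used by the patch,
# i.e. it has cancel() and getresult(wait).  Helper name is illustrative.

ERROR_OPERATION_ABORTED = 995  # "The I/O operation has been aborted..."

def bytes_reported_after_cancel(ov):
    """Ask getresult() how much a cancelled overlapped write sent.

    After ov.cancel(), getresult(True) may raise WinError 995 even
    though the kernel already copied part (or all) of the buffer into
    the pipe -- so the 0 returned here can understate what the child
    actually received.
    """
    try:
        return ov.getresult(True)
    except OSError as exc:
        if getattr(exc, "winerror", None) == ERROR_OPERATION_ABORTED:
            return 0  # unreliable: some data may still have been delivered
        raise
```

The point is that the 0 in the except branch is exactly the "zero data sent" we observe on the Python side, while the child may have received megabytes.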

On the Python side we therefore observe zero data sent for some writes, but 
inspecting what the child process actually received shows that some data did go 
through. How much, compared to what we thought we sent? That depends. In 
today's testing the child could receive ~3.5 megs when we thought we had sent 
~3 megs.

To make a long story short-ish: using overlapped IO with WriteFile() and 
Overlapped.cancel(), without pausing between attempts (via a sleep or anything 
else), produces a difference between what we think was sent and reality roughly 
87% of the time with 512 byte chunks (87 trials out of 100), and roughly 100% 
of the time with 4096 byte chunks (100 trials out of 100). Note that this is 
while constantly trying to write data to the pipe; each trial is as many 
Popen.write_nonblocking() calls as can complete in .25 seconds.

Inducing a 1 ms sleep between each overlapped WriteFile() attempt drops the 
error rate to 0/100 trials and 1/100 trials for 512 byte and 4096 byte writes, 
respectively. Testing larger block sizes suggests that 2048 bytes is the 
largest write we can push through and still get correct accounting.
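A minimal, platform-neutral sketch of that mitigation (the chunk size and delay come from the trials above; chunked_send and write_chunk are illustrative names, with the actual overlapped WriteFile injected as a callable so the chunking logic itself stays testable anywhere):

```python
import time

CHUNK_SIZE = 2048   # largest size that tested clean in my trials
DELAY = 0.001       # 1 ms pause between attempts

def chunked_send(data, write_chunk, chunk_size=CHUNK_SIZE, delay=DELAY):
    """Send *data* in bounded chunks, returning total bytes written.

    write_chunk(buf) must return the number of bytes it accepted; in
    the real patch it would be an overlapped WriteFile on the pipe
    handle.
    """
    sent = 0
    view = memoryview(data)
    while sent < len(data):
        n = write_chunk(view[sent:sent + chunk_size])
        sent += n
        time.sleep(delay)   # give the reader time to drain the pipe
    return sent
```

With a fake write function that accepts everything it is handed, a 5000 byte payload goes out as chunks of 2048, 2048, and 904 bytes.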


So, according to my tests, there isn't a method by which we can both cancel an 
overlapped IO and guarantee that we account exactly for the data that was 
actually sent, without adding an implicit or explicit delay. Which makes sense: 
we are basically trying to interrupt another process in its attempt to read 
data that we said it could read, and doing so via a kernel call that interrupts 
another kernel call mid-way through chunk-by-chunk copies from our write buffer 
(possibly via some kernel memory) to its read buffer.

Anyway, by cutting down how much we attempt to send at any one time, and 
forcing delays between attempted sends, we can come pretty close to 
guaranteeing that child processes don't receive any data we can't account for. 
I'll try to get a patch out this weekend that encompasses these ideas, with a 
new test that demonstrates the issue on Windows (for those who want to verify 
my results).

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue1191964>
_______________________________________