Jakub Stasiak <jakub+python....@stasiak.at> added the comment:

I thought I'd give it a shot and I believe i found the issue. Let's use the 
testCount test as an example.

The client side (or the data sending side) looks like this:

    def _testCount(self):
        address = self.serv.getsockname()
        file = open(os_helper.TESTFN, 'rb')
        sock = socket.create_connection(address,
                                        timeout=support.LOOPBACK_TIMEOUT)
        with sock, file:
            count = 5000007
            meth = self.meth_from_sock(sock)
            sent = meth(file, count=count)
            self.assertEqual(sent, count)
            self.assertEqual(file.tell(), count)

So we're sending 5000007 bytes of data at once to a socket that has a timeout 
of 5.0 seconds set (default in those tests).

Somewhere along the way socket._sendfile_use_sendfile() is called and in it 
there's a loop:

            try:
                while True:
                    (...)
                    try:
                        sent = os_sendfile(sockno, fileno, offset, blocksize)
                    except BlockingIOError:
                        if not timeout:
                            # Block until the socket is ready to send some
                            # data; avoids hogging CPU resources.
                            selector_select()
                        continue
                (...)
                return total_sent

On my test VM running openindiana 5.11 (I think that's the version number?) 
this is basically an infinite loop (I think it'll end at some point, but I 
didn't have the patience to verify this). That's because trying to send 5000007 
bytes to that socket with 5 seconds timeout will trigger BlockingIOError. Why?

This is the relevant part of os.sendfile() implementation from posixmodule.c:

    do {
        Py_BEGIN_ALLOW_THREADS
        ret = sendfile(out_fd, in_fd, &offset, count);
        Py_END_ALLOW_THREADS
    } while (ret < 0 && errno == EINTR && !(async_err = PyErr_CheckSignals()));
    if (ret < 0)
        return (!async_err) ? posix_error() : NULL;
    return Py_BuildValue("n", ret);


offset is 0 in our case, but its exact value doesn't matter. The trouble is 
this is what illumos sendfile man page[1] says:


RETURN VALUES

       (...) In some error cases sendfile() may still write some
       data before encountering an error and returning -1.  When that occurs,
       off is updated to point to the byte that follows the last byte copied and
       should be compared with its value before calling sendfile() to determine
       how much data was sent.

After some input from Jakub Kulik I believe this is a unique behavior (Linux, 
Oracle Solaris, Mac OS, FreeBSD and likely all or almost all other OSes don't 
do this) and it lacks handling on the Python side.

I tested this and indeed the sendfile(out_fd, in_fd, &offset, count) call 
*does* do partial writes in our case which gets reported as EAGAIN which gets 
converted to BlockingIOError which makes socket._sendfile_use_sendfile() retry 
again and again, each time resending the very beginning of the data and not 
going any further, therefore accumulating a lot of garbage on the receiving 
socket's side.

This patch works for me and I run the whole test_socket test suite with it, no 
more failures:

diff --git a/Modules/posixmodule.c b/Modules/posixmodule.c
index 203f98515dfd..eeff11b5b8ea 100644
--- a/Modules/posixmodule.c
+++ b/Modules/posixmodule.c
@@ -9475,13 +9475,20 @@ os_sendfile_impl(PyObject *module, int out_fd, int 
in_fd, PyObject *offobj,
    }
#endif

+    off_t original_offset = offset;
    do {
        Py_BEGIN_ALLOW_THREADS
        ret = sendfile(out_fd, in_fd, &offset, count);
        Py_END_ALLOW_THREADS
    } while (ret < 0 && errno == EINTR && !(async_err = PyErr_CheckSignals()));
-    if (ret < 0)
-        return (!async_err) ? posix_error() : NULL;
+    if (ret < 0) {
+       if (offset != original_offset) {
+           ret = offset - original_offset;
+       }
+       else {
+           return (!async_err) ? posix_error() : NULL;
+       }
+    }
    return Py_BuildValue("n", ret);
#endif
}

If it's verified to be a good change I'm happy to resubmit it in a PR with 
appropriate illumos-specific #ifndefs.


[1] https://illumos.org/man/3ext/sendfile

----------
nosy: +jstasiak

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42237>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to