Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?
On Thu, Jan 28, 2016 at 9:41 PM, Chris Angelicowrote: > However, as I was doing so (and I just discarded a draft message where > I'd been typing up notes), my entire system went kerblooie, and I've > spent the last day rebuilding stuff from scratch. When I get around to > it, I'll rebuild the buildbot VM - and it'll be Debian Jessie (current > stable), because there's no particular reason to use Wheezy > (oldstable). So the problem will almost certainly disappear. Sure enough, the problem is no longer reproducible. Sorry folks. Hopefully it wasn't actually a bug anywhere, but was some bizarre piece of VM setup weirdness. The "AMD64 Debian root" buildbot (angelico-debian-amd64) is now running Debian Jessie, rather than the Wheezy it was before. ChrisA ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?
> After digging through test_socket.py for over an hour (the MRO for > RecvmsgUDP6Test is enormous!!), I've boiled the issue down to this: > > import socket > MSG = b'asdf qwer zxcv' > serv = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) > serv.bind(("::1", 0)) > cli = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) > cli.bind(("::1", 0)) > cli.sendto(MSG, serv.getsockname()) > print(serv.recvmsg(len(MSG) - 3, 0, socket.MSG_PEEK)) > print(serv.recvmsg(len(MSG), 0, socket.MSG_PEEK)) > print(serv.recvmsg(len(MSG))) > > On my main system, this produces three lines of output: the first has > truncated text, the second has full text, and the third also has full > text. This proves that MSG_PEEK is working correctly. On the buildbot, > though, the first one stalls out. Commenting that line out produces > correct results - peek the full data, then read it, and all is well. > > Any idea why partial read on a datagram socket would sometimes stall? I think it would stall if there is no data to receive. Maybe check the return value of sendto(), to ensure it is sending the whole message. Attached is a C program which should do the equivalent of your boiled-down Python script, in case that helps: $ gcc -Wall peek-udp6.c -o peek-udp6 $ ./peek-udp6 Bytes sent: 14 Received [asdf qwer z] Received [asdf qwer zxcv] Received [asdf qwer zxcv] Other things that come to mind are to see if there is anything odd about the buildbot’s Linux kernel and glibc versions. Maybe run the Python script under “strace” to see if anything strange is going on. /* $ gcc -Wall peek-udp6.c -o peek-udp6 $ ./peek-udp6 Bytes sent: 14 Received [asdf qwer z] Received [asdf qwer zxcv] Received [asdf qwer zxcv] */ #include #include #include #include #include static const char MSG[] = "asdf qwer zxcv"; static const size_t MSG_LEN = sizeof MSG - 1; int main() { int serv = socket(AF_INET6, SOCK_DGRAM | SOCK_CLOEXEC, 0); if(serv < 0) { perror("serv socket()"); return 1; } struct sockaddr_in6 bind_addr; memset(_addr, 0, sizeof bind_addr); bind_addr.sin6_family = AF_INET6; bind_addr.sin6_port = htons(0); bind_addr.sin6_addr = (struct in6_addr)IN6ADDR_LOOPBACK_INIT; if(bind(serv, (struct sockaddr *)_addr, sizeof bind_addr) < 0) { perror("bind(serv)"); return 1; } int cli = socket(AF_INET6, SOCK_DGRAM | SOCK_CLOEXEC, 0); if(cli < 0) { perror("cli socket()"); return 1; } if(bind(cli, (struct sockaddr *)_addr, sizeof bind_addr) < 0) { perror("bind(cli)"); return 1; } struct sockaddr_in6 addr; socklen_t addr_len = sizeof addr; if(getsockname(serv, (struct sockaddr *), _len) < 0) { perror("getsockname()"); return 1; } ssize_t size; size = sendto(cli, MSG, MSG_LEN, 0, (struct sockaddr *), addr_len); if(size < 0) { perror("sendto()"); return 1; } printf("Bytes sent: %zi\n", size); char buffer[MSG_LEN]; struct msghdr message = { .msg_name = , .msg_namelen = sizeof addr, .msg_iov = (struct iovec [1]){ {.iov_base = buffer, .iov_len = MSG_LEN - 3}, }, .msg_iovlen = 1, }; size = recvmsg(serv, , MSG_PEEK); if(size < 0) { perror("short recvmsg(MSG_PEEK)"); return 1; } printf("Received [%.*s]\n", (int)size, buffer); message.msg_iov[0].iov_len = MSG_LEN; size = recvmsg(serv, , MSG_PEEK); if(size < 0) { perror("full recvmsg(MSG_PEEK)"); return 1; } printf("Received [%.*s]\n", (int)size, buffer); size = recvmsg(serv, , 0); if(size < 0) { perror("full recvmsg(0)"); return 1; } printf("Received [%.*s]\n", (int)size, buffer); return 0; } ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?
On Thu, Jan 28, 2016 at 7:35 PM, Martin Panterwrote: > Other things that come to mind are to see if there is anything odd > about the buildbot’s Linux kernel and glibc versions. Maybe run the > Python script under “strace” to see if anything strange is going on. > I did that, and a few other things. Most notably, commenting out the partial-read resulted in a flawless run, and strace showed it stalling in the partial read too. However, as I was doing so (and I just discarded a draft message where I'd been typing up notes), my entire system went kerblooie, and I've spent the last day rebuilding stuff from scratch. When I get around to it, I'll rebuild the buildbot VM - and it'll be Debian Jessie (current stable), because there's no particular reason to use Wheezy (oldstable). So the problem will almost certainly disappear. ChrisA ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?
2016-01-23 7:03 GMT+01:00 Chris Angelico: > I just had a major crash on the system that hosts the > angelico-debian-amd64 buildbot, and as usual, checked it carefully > after bringing everything up. It seems now to be timing out after an > hour of operation: > > http://buildbot.python.org/all/builders/AMD64%20Debian%20root%203.x/builds/3132/steps/test/logs/stdio I opened http://bugs.python.org/issue26206 to track this issue. > Running just that test file: > > $ ./python Lib/test/test_socket.py > ... chomp lots of lines ... > testRecvmsgPeek (__main__.RecvmsgUDP6Test) ... > > seems to indicate that the stall is due to IPv6 and UDP. The VM should > have full IPv6 support, although my ISPs don't carry IPv6 traffic, so > it won't be able to reach the internet proper; but it should be able > to do all manner of local tests. Try to apply attached patch and run: $ ./python -m test -v -m testRecvmsgPeek test_socket (...) testRecvmsgPeek (test.test_socket.RecvmsgUDP6Test) ... CLI SOCK SERV SOCK CLI SOCK ('::1', 44347, 0, 0) SERV SOCK ('::1', 40488, 0, 0) ok testRecvmsgPeek (test.test_socket.RecvmsgIntoUDP6Test) ... CLI SOCK SERV SOCK CLI SOCK ('::1', 52721, 0, 0) SERV SOCK ('::1', 43967, 0, 0) ok (...) As you can see: the test uses the local loopback interface. Inet6TestBase.host is "::1". You can try to run a UDP server using netcat: "nc -l -u ::1 12345". Keep the command running in a terminal, and then run in a different terminal: "echo abc | nc -u ::1 12345". You should receive abc in the server. Victor diff -r 58266f5101cc Lib/test/test_socket.py --- a/Lib/test/test_socket.py Tue Jan 26 21:46:03 2016 -0800 +++ b/Lib/test/test_socket.py Wed Jan 27 10:35:32 2016 +0100 @@ -2049,6 +2049,8 @@ class SendrecvmsgConnectionlessBase(Send return ([], [], 0, self.serv_addr) def sendToServer(self, msg): +print("CLI SOCK", self.cli_sock) +print("CLI SOCK", self.cli_sock.getsockname()) return self.cli_sock.sendto(msg, self.serv_addr) @@ -2371,6 +2373,8 @@ class RecvmsgGenericTests(SendrecvmsgBas # data without consuming it. # Receive part of data with MSG_PEEK. +print("SERV SOCK", self.serv_sock) +print("SERV SOCK", self.serv_sock.getsockname()) msg, ancdata, flags, addr = self.doRecvmsg(self.serv_sock, len(MSG) - 3, 0, socket.MSG_PEEK) ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?
On Wed, Jan 27, 2016 at 8:39 PM, Victor Stinnerwrote: > 2016-01-23 7:03 GMT+01:00 Chris Angelico : >> Running just that test file: >> >> $ ./python Lib/test/test_socket.py >> ... chomp lots of lines ... >> testRecvmsgPeek (__main__.RecvmsgUDP6Test) ... >> >> seems to indicate that the stall is due to IPv6 and UDP. The VM should >> have full IPv6 support, although my ISPs don't carry IPv6 traffic, so >> it won't be able to reach the internet proper; but it should be able >> to do all manner of local tests. > > Try to apply attached patch and run: > > $ ./python -m test -v -m testRecvmsgPeek test_socket > (...) > testRecvmsgPeek (test.test_socket.RecvmsgUDP6Test) ... CLI SOCK > type=SocketKind.SOCK_DGRAM, proto=0, laddr=('::1', 44347, 0, 0)> > SERV SOCK type=SocketKind.SOCK_DGRAM, proto=0, laddr=('::1', 40488, 0, 0)> > CLI SOCK ('::1', 44347, 0, 0) > SERV SOCK ('::1', 40488, 0, 0) > ok > testRecvmsgPeek (test.test_socket.RecvmsgIntoUDP6Test) ... CLI SOCK > type=SocketKind.SOCK_DGRAM, proto=0, laddr=('::1', 52721, 0, 0)> > SERV SOCK type=SocketKind.SOCK_DGRAM, proto=0, laddr=('::1', 43967, 0, 0)> > CLI SOCK ('::1', 52721, 0, 0) > SERV SOCK ('::1', 43967, 0, 0) > ok > (...) > > As you can see: the test uses the local loopback interface. > Inet6TestBase.host is "::1". Confirmed. It does two tests of IPv4 which run just fine, and then: testRecvmsgPeek (test.test_socket.RecvmsgUDP6Test) ... CLI SOCK CLI SOCK ('::1', 47518, 0, 0) SERV SOCK SERV SOCK ('::1', 42421, 0, 0) and hangs until I interrupt it. > You can try to run a UDP server using netcat: "nc -l -u ::1 12345". > Keep the command running in a terminal, and then run in a different > terminal: "echo abc | nc -u ::1 12345". You should receive abc in the > server. (Oddly, the default netcat-traditional doesn't seem to support this, but installing netcat-openbsd adds IPv6 support.) Yep, and that works flawlessly. It's nothing weird about that particular port, either - nc can use 42421 without a problem. After digging through test_socket.py for over an hour (the MRO for RecvmsgUDP6Test is enormous!!), I've boiled the issue down to this: import socket MSG = b'asdf qwer zxcv' serv = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) serv.bind(("::1", 0)) cli = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) cli.bind(("::1", 0)) cli.sendto(MSG, serv.getsockname()) print(serv.recvmsg(len(MSG) - 3, 0, socket.MSG_PEEK)) print(serv.recvmsg(len(MSG), 0, socket.MSG_PEEK)) print(serv.recvmsg(len(MSG))) On my main system, this produces three lines of output: the first has truncated text, the second has full text, and the third also has full text. This proves that MSG_PEEK is working correctly. On the buildbot, though, the first one stalls out. Commenting that line out produces correct results - peek the full data, then read it, and all is well. Any idea why partial read on a datagram socket would sometimes stall? ChrisA ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?
On Sun, Jan 24, 2016 at 1:37 AM, Victor Stinnerwrote: > 3600 seconds is the maximum duration of a single test file. We may reduce it > since a single test file should not take longer than 30 min. Maybe we can do > better and put the timeout on a single test function. I'd be inclined to put some strong timeouts on test_socket.py (at some level or other of granularity). Most of those tests should be finished in a fraction of a second; a few might take a few seconds, maybe. None should take a whole minute. But they might easily sit around that long. But I'd rather know what I messed up in my recreation of the VM's configs. ChrisA ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?
3600 seconds is the maximum duration of a single test file. We may reduce it since a single test file should not take longer than 30 min. Maybe we can do better and put the timeout on a single test function. Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?
On Sat, Jan 23, 2016 at 5:39 PM, Zachary Warewrote: > On Sat, Jan 23, 2016 at 12:03 AM, Chris Angelico wrote: >> By the way, this looks odd: >> >> make buildbottest TESTOPTS= TESTPYTHONOPTS= TESTTIMEOUT=3600 >> in dir /root/buildarea/3.x.angelico-debian-amd64/build (timeout 3900 secs) >> >> The parameter says 3600 (which corresponds to the error message at the >> end), but it echoes back that the timeout is 3900 seconds. > > I'm no help on the main issue, but to explain the timeout difference: > TESTTIMEOUT is a makefile variable that sets the faulthandler timeout > that tries to make Python bail out with a stack trace instead of > letting buildbot kill Python silently. The 3900 second timeout is > buildbot's "there's been no output in this long, assume it's hung and > kill it" timeout. The difference between the two is to give > faulthandler a chance to do its thing. Ah, cool. That's one mystery cleared up, at least. ChrisA ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?
On Sat, Jan 23, 2016 at 12:03 AM, Chris Angelicowrote: > By the way, this looks odd: > > make buildbottest TESTOPTS= TESTPYTHONOPTS= TESTTIMEOUT=3600 > in dir /root/buildarea/3.x.angelico-debian-amd64/build (timeout 3900 secs) > > The parameter says 3600 (which corresponds to the error message at the > end), but it echoes back that the timeout is 3900 seconds. I'm no help on the main issue, but to explain the timeout difference: TESTTIMEOUT is a makefile variable that sets the faulthandler timeout that tries to make Python bail out with a stack trace instead of letting buildbot kill Python silently. The 3900 second timeout is buildbot's "there's been no output in this long, assume it's hung and kill it" timeout. The difference between the two is to give faulthandler a chance to do its thing. -- Zach ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com