Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?

2016-01-28 Thread Chris Angelico
On Thu, Jan 28, 2016 at 9:41 PM, Chris Angelico  wrote:
> However, as I was doing so (and I just discarded a draft message where
> I'd been typing up notes), my entire system went kerblooie, and I've
> spent the last day rebuilding stuff from scratch. When I get around to
> it, I'll rebuild the buildbot VM - and it'll be Debian Jessie (current
> stable), because there's no particular reason to use Wheezy
> (oldstable). So the problem will almost certainly disappear.

Sure enough, the problem is no longer reproducible. Sorry folks.
Hopefully it wasn't actually a bug anywhere, but was some bizarre
piece of VM setup weirdness.

The "AMD64 Debian root" buildbot (angelico-debian-amd64) is now
running Debian Jessie, rather than the Wheezy it was before.

ChrisA
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?

2016-01-28 Thread Martin Panter
> After digging through test_socket.py for over an hour (the MRO for
> RecvmsgUDP6Test is enormous!!), I've boiled the issue down to this:
>
> import socket
> MSG = b'asdf qwer zxcv'
> serv = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
> serv.bind(("::1", 0))
> cli = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
> cli.bind(("::1", 0))
> cli.sendto(MSG, serv.getsockname())
> print(serv.recvmsg(len(MSG) - 3, 0, socket.MSG_PEEK))
> print(serv.recvmsg(len(MSG), 0, socket.MSG_PEEK))
> print(serv.recvmsg(len(MSG)))
>
> On my main system, this produces three lines of output: the first has
> truncated text, the second has full text, and the third also has full
> text. This proves that MSG_PEEK is working correctly. On the buildbot,
> though, the first one stalls out. Commenting that line out produces
> correct results - peek the full data, then read it, and all is well.
>
> Any idea why partial read on a datagram socket would sometimes stall?

I think it would stall if there is no data to receive. Maybe check the
return value of sendto(), to ensure it is sending the whole message.

Attached is a C program which should do the equivalent of your
boiled-down Python script, in case that helps:

$ gcc -Wall peek-udp6.c -o peek-udp6
$ ./peek-udp6
Bytes sent: 14
Received [asdf qwer z]
Received [asdf qwer zxcv]
Received [asdf qwer zxcv]

Other things that come to mind are to see if there is anything odd
about the buildbot’s Linux kernel and glibc versions. Maybe run the
Python script under “strace” to see if anything strange is going on.
/*
$ gcc -Wall peek-udp6.c -o peek-udp6
$ ./peek-udp6
Bytes sent: 14
Received [asdf qwer z]
Received [asdf qwer zxcv]
Received [asdf qwer zxcv]
*/

#include 
#include 
#include 
#include 
#include 

static const char MSG[] = "asdf qwer zxcv";
static const size_t MSG_LEN = sizeof MSG - 1;

int main() {
int serv = socket(AF_INET6, SOCK_DGRAM | SOCK_CLOEXEC, 0);
if(serv < 0) {
perror("serv socket()");
return 1;
}

struct sockaddr_in6 bind_addr;
memset(_addr, 0, sizeof bind_addr);
bind_addr.sin6_family = AF_INET6;
bind_addr.sin6_port = htons(0);
bind_addr.sin6_addr = (struct in6_addr)IN6ADDR_LOOPBACK_INIT;
if(bind(serv, (struct sockaddr *)_addr, sizeof bind_addr) < 0) {
perror("bind(serv)");
return 1;
}

int cli = socket(AF_INET6, SOCK_DGRAM | SOCK_CLOEXEC, 0);
if(cli < 0) {
perror("cli socket()");
return 1;
}

if(bind(cli, (struct sockaddr *)_addr, sizeof bind_addr) < 0) {
perror("bind(cli)");
return 1;
}

struct sockaddr_in6 addr;
socklen_t addr_len = sizeof addr;
if(getsockname(serv, (struct sockaddr *), _len) < 0) {
perror("getsockname()");
return 1;
}
ssize_t size;
size = sendto(cli, MSG, MSG_LEN, 0, (struct sockaddr *), addr_len);
if(size < 0) {
perror("sendto()");
return 1;
}
printf("Bytes sent: %zi\n", size);

char buffer[MSG_LEN];
struct msghdr message = {
.msg_name = ,
.msg_namelen = sizeof addr,
.msg_iov = (struct iovec [1]){
{.iov_base = buffer, .iov_len = MSG_LEN - 3},
},
.msg_iovlen = 1,
};
size = recvmsg(serv, , MSG_PEEK);
if(size < 0) {
perror("short recvmsg(MSG_PEEK)");
return 1;
}
printf("Received [%.*s]\n", (int)size, buffer);

message.msg_iov[0].iov_len = MSG_LEN;
size = recvmsg(serv, , MSG_PEEK);
if(size < 0) {
perror("full recvmsg(MSG_PEEK)");
return 1;
}
printf("Received [%.*s]\n", (int)size, buffer);

size = recvmsg(serv, , 0);
if(size < 0) {
perror("full recvmsg(0)");
return 1;
}
printf("Received [%.*s]\n", (int)size, buffer);

return 0;
}
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?

2016-01-28 Thread Chris Angelico
On Thu, Jan 28, 2016 at 7:35 PM, Martin Panter  wrote:
> Other things that come to mind are to see if there is anything odd
> about the buildbot’s Linux kernel and glibc versions. Maybe run the
> Python script under “strace” to see if anything strange is going on.
>

I did that, and a few other things. Most notably, commenting out the
partial-read resulted in a flawless run, and strace showed it stalling
in the partial read too.

However, as I was doing so (and I just discarded a draft message where
I'd been typing up notes), my entire system went kerblooie, and I've
spent the last day rebuilding stuff from scratch. When I get around to
it, I'll rebuild the buildbot VM - and it'll be Debian Jessie (current
stable), because there's no particular reason to use Wheezy
(oldstable). So the problem will almost certainly disappear.

ChrisA
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?

2016-01-27 Thread Victor Stinner
2016-01-23 7:03 GMT+01:00 Chris Angelico :
> I just had a major crash on the system that hosts the
> angelico-debian-amd64 buildbot, and as usual, checked it carefully
> after bringing everything up. It seems now to be timing out after an
> hour of operation:
>
> http://buildbot.python.org/all/builders/AMD64%20Debian%20root%203.x/builds/3132/steps/test/logs/stdio

I opened http://bugs.python.org/issue26206 to track this issue.

> Running just that test file:
>
> $ ./python Lib/test/test_socket.py
> ... chomp lots of lines ...
> testRecvmsgPeek (__main__.RecvmsgUDP6Test) ...
>
> seems to indicate that the stall is due to IPv6 and UDP. The VM should
> have full IPv6 support, although my ISPs don't carry IPv6 traffic, so
> it won't be able to reach the internet proper; but it should be able
> to do all manner of local tests.

Try to apply attached patch and run:

$ ./python -m test -v -m testRecvmsgPeek test_socket
(...)
testRecvmsgPeek (test.test_socket.RecvmsgUDP6Test) ... CLI SOCK

SERV SOCK 
CLI SOCK ('::1', 44347, 0, 0)
SERV SOCK ('::1', 40488, 0, 0)
ok
testRecvmsgPeek (test.test_socket.RecvmsgIntoUDP6Test) ... CLI SOCK

SERV SOCK 
CLI SOCK ('::1', 52721, 0, 0)
SERV SOCK ('::1', 43967, 0, 0)
ok
(...)

As you can see: the test uses the local loopback interface.
Inet6TestBase.host is "::1".

You can try to run a UDP server using netcat: "nc -l -u ::1 12345".
Keep the command running in a terminal, and then run in a different
terminal: "echo abc | nc -u ::1 12345". You should receive abc in the
server.

Victor
diff -r 58266f5101cc Lib/test/test_socket.py
--- a/Lib/test/test_socket.py	Tue Jan 26 21:46:03 2016 -0800
+++ b/Lib/test/test_socket.py	Wed Jan 27 10:35:32 2016 +0100
@@ -2049,6 +2049,8 @@ class SendrecvmsgConnectionlessBase(Send
 return ([], [], 0, self.serv_addr)
 
 def sendToServer(self, msg):
+print("CLI SOCK", self.cli_sock)
+print("CLI SOCK", self.cli_sock.getsockname())
 return self.cli_sock.sendto(msg, self.serv_addr)
 
 
@@ -2371,6 +2373,8 @@ class RecvmsgGenericTests(SendrecvmsgBas
 # data without consuming it.
 
 # Receive part of data with MSG_PEEK.
+print("SERV SOCK", self.serv_sock)
+print("SERV SOCK", self.serv_sock.getsockname())
 msg, ancdata, flags, addr = self.doRecvmsg(self.serv_sock,
len(MSG) - 3, 0,
socket.MSG_PEEK)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?

2016-01-27 Thread Chris Angelico
On Wed, Jan 27, 2016 at 8:39 PM, Victor Stinner
 wrote:
> 2016-01-23 7:03 GMT+01:00 Chris Angelico :
>> Running just that test file:
>>
>> $ ./python Lib/test/test_socket.py
>> ... chomp lots of lines ...
>> testRecvmsgPeek (__main__.RecvmsgUDP6Test) ...
>>
>> seems to indicate that the stall is due to IPv6 and UDP. The VM should
>> have full IPv6 support, although my ISPs don't carry IPv6 traffic, so
>> it won't be able to reach the internet proper; but it should be able
>> to do all manner of local tests.
>
> Try to apply attached patch and run:
>
> $ ./python -m test -v -m testRecvmsgPeek test_socket
> (...)
> testRecvmsgPeek (test.test_socket.RecvmsgUDP6Test) ... CLI SOCK
>  type=SocketKind.SOCK_DGRAM, proto=0, laddr=('::1', 44347, 0, 0)>
> SERV SOCK  type=SocketKind.SOCK_DGRAM, proto=0, laddr=('::1', 40488, 0, 0)>
> CLI SOCK ('::1', 44347, 0, 0)
> SERV SOCK ('::1', 40488, 0, 0)
> ok
> testRecvmsgPeek (test.test_socket.RecvmsgIntoUDP6Test) ... CLI SOCK
>  type=SocketKind.SOCK_DGRAM, proto=0, laddr=('::1', 52721, 0, 0)>
> SERV SOCK  type=SocketKind.SOCK_DGRAM, proto=0, laddr=('::1', 43967, 0, 0)>
> CLI SOCK ('::1', 52721, 0, 0)
> SERV SOCK ('::1', 43967, 0, 0)
> ok
> (...)
>
> As you can see: the test uses the local loopback interface.
> Inet6TestBase.host is "::1".

Confirmed. It does two tests of IPv4 which run just fine, and then:

testRecvmsgPeek (test.test_socket.RecvmsgUDP6Test) ... CLI SOCK

CLI SOCK ('::1', 47518, 0, 0)
SERV SOCK 
SERV SOCK ('::1', 42421, 0, 0)

and hangs until I interrupt it.

> You can try to run a UDP server using netcat: "nc -l -u ::1 12345".
> Keep the command running in a terminal, and then run in a different
> terminal: "echo abc | nc -u ::1 12345". You should receive abc in the
> server.

(Oddly, the default netcat-traditional doesn't seem to support this,
but installing netcat-openbsd adds IPv6 support.)

Yep, and that works flawlessly. It's nothing weird about that
particular port, either - nc can use 42421 without a problem.

After digging through test_socket.py for over an hour (the MRO for
RecvmsgUDP6Test is enormous!!), I've boiled the issue down to this:

import socket
MSG = b'asdf qwer zxcv'
serv = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
serv.bind(("::1", 0))
cli = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
cli.bind(("::1", 0))
cli.sendto(MSG, serv.getsockname())
print(serv.recvmsg(len(MSG) - 3, 0, socket.MSG_PEEK))
print(serv.recvmsg(len(MSG), 0, socket.MSG_PEEK))
print(serv.recvmsg(len(MSG)))

On my main system, this produces three lines of output: the first has
truncated text, the second has full text, and the third also has full
text. This proves that MSG_PEEK is working correctly. On the buildbot,
though, the first one stalls out. Commenting that line out produces
correct results - peek the full data, then read it, and all is well.

Any idea why partial read on a datagram socket would sometimes stall?

ChrisA
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?

2016-01-23 Thread Chris Angelico
On Sun, Jan 24, 2016 at 1:37 AM, Victor Stinner
 wrote:
> 3600 seconds is the maximum duration of a single test file. We may reduce it
> since a single test file should not take longer than 30 min. Maybe we can do
> better and put the timeout on a single test function.

I'd be inclined to put some strong timeouts on test_socket.py (at some
level or other of granularity). Most of those tests should be finished
in a fraction of a second; a few might take a few seconds, maybe. None
should take a whole minute. But they might easily sit around that
long.

But I'd rather know what I messed up in my recreation of the VM's configs.

ChrisA
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?

2016-01-23 Thread Victor Stinner
3600 seconds is the maximum duration of a single test file. We may reduce
it since a single test file should not take longer than 30 min. Maybe we
can do better and put the timeout on a single test function.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?

2016-01-22 Thread Chris Angelico
On Sat, Jan 23, 2016 at 5:39 PM, Zachary Ware
 wrote:
> On Sat, Jan 23, 2016 at 12:03 AM, Chris Angelico  wrote:
>> By the way, this looks odd:
>>
>> make buildbottest TESTOPTS= TESTPYTHONOPTS= TESTTIMEOUT=3600
>>  in dir /root/buildarea/3.x.angelico-debian-amd64/build (timeout 3900 secs)
>>
>> The parameter says 3600 (which corresponds to the error message at the
>> end), but it echoes back that the timeout is 3900 seconds.
>
> I'm no help on the main issue, but to explain the timeout difference:
> TESTTIMEOUT is a makefile variable that sets the faulthandler timeout
> that tries to make Python bail out with a stack trace instead of
> letting buildbot kill Python silently.  The 3900 second timeout is
> buildbot's "there's been no output in this long, assume it's hung and
> kill it" timeout.  The difference between the two is to give
> faulthandler a chance to do its thing.

Ah, cool. That's one mystery cleared up, at least.

ChrisA
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Buildbot timing out - test suite failure - test_socket issue with UDP6?

2016-01-22 Thread Zachary Ware
On Sat, Jan 23, 2016 at 12:03 AM, Chris Angelico  wrote:
> By the way, this looks odd:
>
> make buildbottest TESTOPTS= TESTPYTHONOPTS= TESTTIMEOUT=3600
>  in dir /root/buildarea/3.x.angelico-debian-amd64/build (timeout 3900 secs)
>
> The parameter says 3600 (which corresponds to the error message at the
> end), but it echoes back that the timeout is 3900 seconds.

I'm no help on the main issue, but to explain the timeout difference:
TESTTIMEOUT is a makefile variable that sets the faulthandler timeout
that tries to make Python bail out with a stack trace instead of
letting buildbot kill Python silently.  The 3900 second timeout is
buildbot's "there's been no output in this long, assume it's hung and
kill it" timeout.  The difference between the two is to give
faulthandler a chance to do its thing.

-- 
Zach
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com