Re: Problem with slow httplib connections on Windows (and maybe other platforms)
Steve Holden wrote:
> Search for the subject line "socket.create_connection slow" - this was
> discovered by Kristjan Valur Jonsson. It certainly seems like a
> Microsoft weirdness.

Thanks for the pointer, Steve. I hadn't seen that yet. I agree that's actually the real problem here. The solution suggested in that thread, using a dual-stacked socket for the TCPServer, seems a good one to me.

-- Christoph
--
http://mail.python.org/mailman/listinfo/python-list
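For readers wondering what a dual-stacked server socket could look like, here is a minimal sketch. It is not the code from the thread Steve mentions. It uses the Python 3 module name socketserver (SocketServer in the Python 2 releases discussed here), the class name DualStackTCPServer is made up for illustration, and it assumes the operating system supports dual-stack sockets and exposes socket.IPV6_V6ONLY.

```python
import socket
import socketserver


class DualStackTCPServer(socketserver.TCPServer):
    """Hypothetical TCPServer that serves IPv6 and IPv4 on one socket."""

    # Create an IPv6 socket instead of the default AF_INET one.
    address_family = socket.AF_INET6

    def server_bind(self):
        # Assumption: the platform allows clearing IPV6_V6ONLY, so that the
        # IPv6 socket also accepts IPv4 connections (as mapped addresses).
        self.socket.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
        super().server_bind()
```

Such a server would be bound to the IPv6 wildcard address, for example DualStackTCPServer(('::', 8000), handler), so that a client trying either address family of localhost finds a listener.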
Re: Problem with slow httplib connections on Windows (and maybe other platforms)
rdmur...@bitdance.com wrote:
> Quoth Christoph Zwerschke :
>> rdmur...@bitdance.com wrote:
>>> Quoth Christoph Zwerschke :
>>>> With Py 2.3 (without IPv6 support) this is only the IPv4 address,
>>>> but with Py 2.4-2.6 the order is (on my Win XP host) the IPv6 address
>>>> first, then the IPv4 address. Since the IPv6 address is checked first,
>>>> this gives a timeout and causes the slow connect() call. The order by
>>>> which getaddrinfo returns IPv4/v6 under Linux seems to vary depending
>>>> on the glibc version, so it may be a problem on other platforms, too.
>>> Based on something I read in another thread, this appears to be a problem
>>> only under Windows. Everybody else implemented the TCP/IP stack according
>>> to spec, and the IPV6 connect attempt times out immediately, producing
>>> no slowdown.
>>>
>>> Microsoft, however
>> The order in which getaddrinfo returns IPv4 and IPv6 is probably not
>> written in the specs (Posix 1003.1g and RFC 2553). The fact that Windows
>> returns IPv6 addresses first is not wrong in itself.
>>
>> For this discussion, see also
>> http://www.ops.ietf.org/lists/v6ops/v6ops.2002/msg00869.html
>> https://bugzilla.redhat.com/show_bug.cgi?id=190495
>>
>> But yes, I also wonder why the connect to the IPv6 loopback address does
>> not time out more quickly on Windows.
>
> Right, it's not the order of the returned items that's the Microsoft
> weirdness, it's the long timeout on an attempt to connect to something
> that doesn't exist. There was a long discussion about this, and it might
> even have been on python-dev, but I can't lay my hands on the thread.
> In short, Microsoft retries and waits a while when the far end says
> "no thanks" to a connection attempt, instead of immediately returning
> the connection failure the way Linux and etc and etc do. This applies
> to IPV4, too.

Search for the subject line "socket.create_connection slow" - this was discovered by Kristjan Valur Jonsson. It certainly seems like a Microsoft weirdness.
regards
 Steve
--
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC      http://www.holdenweb.com/
Re: Problem with slow httplib connections on Windows (and maybe other platforms)
Quoth Christoph Zwerschke :
> rdmur...@bitdance.com wrote:
> > Quoth Christoph Zwerschke :
> >> With Py 2.3 (without IPv6 support) this is only the IPv4 address,
> >> but with Py 2.4-2.6 the order is (on my Win XP host) the IPv6 address
> >> first, then the IPv4 address. Since the IPv6 address is checked first,
> >> this gives a timeout and causes the slow connect() call. The order by
> >> which getaddrinfo returns IPv4/v6 under Linux seems to vary depending
> >> on the glibc version, so it may be a problem on other platforms, too.
> >
> > Based on something I read in another thread, this appears to be a problem
> > only under Windows. Everybody else implemented the TCP/IP stack according
> > to spec, and the IPV6 connect attempt times out immediately, producing
> > no slowdown.
> >
> > Microsoft, however
>
> The order in which getaddrinfo returns IPv4 and IPv6 is probably not
> written in the specs (Posix 1003.1g and RFC 2553). The fact that Windows
> returns IPv6 addresses first is not wrong in itself.
>
> For this discussion, see also
> http://www.ops.ietf.org/lists/v6ops/v6ops.2002/msg00869.html
> https://bugzilla.redhat.com/show_bug.cgi?id=190495
>
> But yes, I also wonder why the connect to the IPv6 loopback address does
> not time out more quickly on Windows.

Right, it's not the order of the returned items that's the Microsoft weirdness, it's the long timeout on an attempt to connect to something that doesn't exist. There was a long discussion about this, and it might even have been on python-dev, but I can't lay my hands on the thread.

In short, Microsoft retries and waits a while when the far end says "no thanks" to a connection attempt, instead of immediately returning the connection failure the way Linux and etc and etc do. This applies to IPV4, too.

--RDM
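One way to check this claim on a given machine is to time a single connect attempt to a port nobody listens on. The following is only a sketch in Python 3 syntax; the helper name time_connect and the 5 second timeout are made up for illustration and do not come from this thread.

```python
import socket
import time


def time_connect(host, port, timeout=5.0):
    """Return (seconds elapsed, error or None) for one TCP connect attempt."""
    start = time.perf_counter()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        # Covers both a refused connection and a timeout.
        return time.perf_counter() - start, exc
    sock.close()
    return time.perf_counter() - start, None
```

Calling time_connect('127.0.0.1', port) for a closed port, and comparing the elapsed time between platforms, should show whether the refusal is reported immediately or only after the retries described above.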
Re: Problem with slow httplib connections on Windows (and maybe other platforms)
rdmur...@bitdance.com wrote:
> Quoth Christoph Zwerschke :
>> With Py 2.3 (without IPv6 support) this is only the IPv4 address,
>> but with Py 2.4-2.6 the order is (on my Win XP host) the IPv6 address
>> first, then the IPv4 address. Since the IPv6 address is checked first,
>> this gives a timeout and causes the slow connect() call. The order by
>> which getaddrinfo returns IPv4/v6 under Linux seems to vary depending
>> on the glibc version, so it may be a problem on other platforms, too.
>
> Based on something I read in another thread, this appears to be a problem
> only under Windows. Everybody else implemented the TCP/IP stack according
> to spec, and the IPV6 connect attempt times out immediately, producing
> no slowdown.
>
> Microsoft, however

The order in which getaddrinfo returns IPv4 and IPv6 is probably not written in the specs (Posix 1003.1g and RFC 2553). The fact that Windows returns IPv6 addresses first is not wrong in itself.

For this discussion, see also
http://www.ops.ietf.org/lists/v6ops/v6ops.2002/msg00869.html
https://bugzilla.redhat.com/show_bug.cgi?id=190495

But yes, I also wonder why the connect to the IPv6 loopback address does not time out more quickly on Windows.

-- Christoph
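To see which order a particular host actually returns, something like the following sketch may help. It is written in Python 3 syntax, and the helper name address_order is made up for illustration.

```python
import socket


def address_order(host, port):
    """Return (family name, IP address) pairs in getaddrinfo's order."""
    infos = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return [(family.name, sockaddr[0]) for family, _, _, _, sockaddr in infos]
```

Running address_order('localhost', 8000) lists the candidates in the order a client walking the getaddrinfo results would try them; the result depends on the platform and its resolver configuration.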
Problem with slow httplib connections on Windows (and maybe other platforms)
Quoth Christoph Zwerschke :
> What actually happens is the following:
>
> * BaseHTTPServer binds only to the IPv4 address of localhost, because
>   it's based on TCPServer which has address_family=AF_INET by default.
>
> * HTTPConnection.connect() however tries to connect to all IP addresses
>   of localhost, in the order determined socket.getaddrinfo('localhost').
>
>   With Py 2.3 (without IPv6 support) this is only the IPv4 address,
>   but with Py 2.4-2.6 the order is (on my Win XP host) the IPv6 address
>   first, then the IPv4 address. Since the IPv6 address is checked first,
>   this gives a timeout and causes the slow connect() call. The order by
>   which getaddrinfo returns IPv4/v6 under Linux seems to vary depending
>   on the glibc version, so it may be a problem on other platforms, too.

Based on something I read in another thread, this appears to be a problem only under Windows. Everybody else implemented the TCP/IP stack according to spec, and the IPV6 connect attempt times out immediately, producing no slowdown.

Microsoft, however

--RDM
Problem with slow httplib connections on Windows (and maybe other platforms)
It took me a while to track down the cause of the following problem.

The symptom was that testing a local web app with twill was fast on Python 2.3, but very slow on Python 2.4-2.6 on a Win XP box.

This boiled down to the following: if you run a SimpleHTTPServer for localhost like this,

    import BaseHTTPServer, SimpleHTTPServer
    BaseHTTPServer.HTTPServer(('localhost', 8000),
        SimpleHTTPServer.SimpleHTTPRequestHandler).serve_forever()

and access it using httplib.HTTPConnection on the same host like this

    import httplib
    httplib.HTTPConnection('localhost', 8000).connect()

then this call is fast using Py 2.3, but slow with Py 2.4-2.6.

I found that this was caused by a mismatch of the IP version used by SimpleHTTPServer and HTTPConnection for a "localhost" argument.

What actually happens is the following:

* BaseHTTPServer binds only to the IPv4 address of localhost, because
  it's based on TCPServer which has address_family=AF_INET by default.

* HTTPConnection.connect() however tries to connect to all IP addresses
  of localhost, in the order determined by socket.getaddrinfo('localhost').

  With Py 2.3 (without IPv6 support) this is only the IPv4 address,
  but with Py 2.4-2.6 the order is (on my Win XP host) the IPv6 address
  first, then the IPv4 address. Since the IPv6 address is checked first,
  this gives a timeout and causes the slow connect() call. The order in
  which getaddrinfo returns IPv4/v6 under Linux seems to vary depending
  on the glibc version, so it may be a problem on other platforms, too.

You can see the cause of the slow connect() like this:

    import httplib
    conn = httplib.HTTPConnection('localhost', 8000)
    conn.set_debuglevel(1)
    conn.connect()

This is what I get:

    connect: (localhost, 8000)
    connect fail: ('localhost', 8000)
    connect: (localhost, 8000)

The first (failing) connect is the attempt to connect to the IPv6 address which BaseHTTPServer doesn't listen to. (This is the debug output of Py 2.5, which really should be improved to show the IP address that is actually used. Unfortunately, in Py 2.6 the debug output when connecting has even fallen prey to a refactoring. I think it should be added again; otherwise set_debuglevel() is now pretty meaningless.)

Can we do something about the mismatch that SimpleHTTPServer only serves IPv4, but HTTPConnection tries to connect with IPv6 first? I guess other people also stumbled over this, maybe without even noticing and just wondering about the slow performance. E.g.:
http://schotime.net/blog/index.php/2008/05/27/slow-tcpclient-connection-sockets/

One possible solution would be to improve the TCPServer in the standard lib so that it determines the address_family and real server_address based on the first return value of socket.getaddrinfo, like this:

    class TCPServer(BaseServer):
        ...
        def __init__(self, server_address, RequestHandlerClass):
            if server_address and len(server_address) == 2:
                # take family and resolved address from the first result
                (self.address_family, dummy, dummy, dummy,
                    server_address) = socket.getaddrinfo(*server_address)[0]
            else:
                raise TypeError("server_address must be a 2-tuple")
            BaseServer.__init__(self, server_address, RequestHandlerClass)
            ...

That way, whether you serve as or connect to 'localhost', you will always consistently do this via IPv4 or IPv6, depending on what is preferred on your platform.

Does this sound reasonable? Any better ideas?

-- Christoph
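The proposal above can also be tried without touching the standard library, by putting the same idea into a subclass. The following is only a sketch under some assumptions: it uses the Python 3 module name socketserver (SocketServer in Python 2), the class name AddrInfoTCPServer is made up for illustration, and it restricts getaddrinfo to stream sockets, which the snippet above does not do.

```python
import socket
import socketserver


class AddrInfoTCPServer(socketserver.TCPServer):
    """Hypothetical subclass: take family and address from getaddrinfo."""

    def __init__(self, server_address, RequestHandlerClass,
                 bind_and_activate=True):
        if not server_address or len(server_address) != 2:
            raise TypeError("server_address must be a 2-tuple")
        host, port = server_address
        # Use the first getaddrinfo result, i.e. the entry that a client
        # walking the same list would try first.
        family, _, _, _, sockaddr = socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)[0]
        # Set on the instance before the base class creates the socket.
        self.address_family = family
        super().__init__(sockaddr, RequestHandlerClass, bind_and_activate)
```

With this, AddrInfoTCPServer(('localhost', 8000), handler) would bind to whichever loopback address getaddrinfo lists first on the host, so server and client agree on the address family as long as both resolve the name the same way.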