Re: Problem with slow httplib connections on Windows (and maybe other platforms)

2009-02-01 Thread Christoph Zwerschke

Steve Holden wrote:

> Search for the subject line "socket.create_connection slow" - this was
> discovered by Kristjan Valur Jonsson. It certainly seems like a
> Microsoft weirdness.


Thanks for the pointer, Steve. I hadn't seen that yet. I agree that this
is actually the real problem here. The solution suggested in that thread,
using a dual-stack socket for the TCPServer, seems like a good one to me.
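For reference, here is a minimal Python 3 sketch of the dual-stack idea. It is not code from the thread. It assumes Python 3.8+ (for socket.has_dualstack_ipv6()) and the Python 3 module name socketserver (SocketServer in Py 2). DualStackTCPServer, EchoHandler and echo_via are hypothetical names used only for illustration. The server creates an AF_INET6 socket and clears IPV6_V6ONLY before binding. One listening socket then serves both IPv6 and IPv4 clients, so a client connecting to 'localhost' succeeds on its first attempt, whichever family getaddrinfo lists first.

```python
import socket
import socketserver
import threading


class DualStackTCPServer(socketserver.TCPServer):
    """TCPServer listening on one IPv6 socket that also accepts IPv4."""

    # Create an AF_INET6 socket instead of TCPServer's AF_INET default.
    address_family = socket.AF_INET6

    def server_bind(self):
        # Clear IPV6_V6ONLY before binding so IPv4 clients are accepted too;
        # they appear as IPv4-mapped addresses such as ::ffff:127.0.0.1.
        self.socket.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
        super().server_bind()


class EchoHandler(socketserver.BaseRequestHandler):
    """Hypothetical demo handler: send back whatever the client sent."""

    def handle(self):
        self.request.sendall(self.request.recv(1024))


def echo_via(server_port, client_host):
    """Connect to the server through client_host; return the echoed bytes."""
    with socket.create_connection((client_host, server_port), timeout=5) as conn:
        conn.sendall(b"ping")
        return conn.recv(1024)


if __name__ == "__main__" and socket.has_dualstack_ipv6():
    # "::" is the IPv6 wildcard address; port 0 lets the OS pick a free port.
    with DualStackTCPServer(("::", 0), EchoHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        port = server.server_address[1]
        print(echo_via(port, "127.0.0.1"))  # explicit IPv4 client
        print(echo_via(port, "localhost"))  # whatever getaddrinfo prefers
        server.shutdown()
```

On platforms without dual-stack support, IPV6_V6ONLY cannot be cleared. That is why the demo checks socket.has_dualstack_ipv6() first.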


-- Christoph
--
http://mail.python.org/mailman/listinfo/python-list


Re: Problem with slow httplib connections on Windows (and maybe other platforms)

2009-02-01 Thread Steve Holden
rdmur...@bitdance.com wrote:
> Quoth Christoph Zwerschke :
>> rdmur...@bitdance.com wrote:
>>> Quoth Christoph Zwerschke :
>>>> With Py 2.3 (without IPv6 support) this is only the IPv4 address,
>>>> but with Py 2.4-2.6 the order is (on my Win XP host) the IPv6 address
>>>> first, then the IPv4 address. Since the IPv6 address is checked first,
>>>> this gives a timeout and causes the slow connect() call. The order in
>>>> which getaddrinfo returns IPv4/v6 under Linux seems to vary depending
>>>> on the glibc version, so it may be a problem on other platforms, too.
>>> Based on something I read in another thread, this appears to be a problem
>>> only under Windows.  Everybody else implemented the TCP/IP stack according
>>> to spec, and the IPv6 connect attempt times out immediately, producing
>>> no slowdown.
>>>
>>> Microsoft, however
>> The order in which getaddrinfo returns IPv4 and IPv6 is probably not 
>> written in the specs (Posix 1003.1g and RFC 2553). The fact that Windows 
>> returns IPv6 addresses first is not wrong in itself.
>>
>> For this discussion, see also
>> http://www.ops.ietf.org/lists/v6ops/v6ops.2002/msg00869.html
>> https://bugzilla.redhat.com/show_bug.cgi?id=190495
>>
>> But yes, I also wonder why the connect to the IPv6 loopback address does 
>> not time out more quickly on Windows.
> 
> Right, it's not the order of the returned items that's the Microsoft
> weirdness, it's the long timeout on an attempt to connect to something
> that doesn't exist.  There was a long discussion about this, and it might
> even have been on python-dev, but I can't lay my hands on the thread.
> In short, Microsoft retries and waits a while when the far end says
> "no thanks" to a connection attempt, instead of immediately returning
> the connection failure the way Linux and everyone else do.  This applies
> to IPv4, too.
> 
Search for the subject line "socket.create_connection slow" - this was
discovered by Kristjan Valur Jonsson. It certainly seems like a
Microsoft weirdness.

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC  http://www.holdenweb.com/



Re: Problem with slow httplib connections on Windows (and maybe other platforms)

2009-02-01 Thread rdmurray
Quoth Christoph Zwerschke :
> rdmur...@bitdance.com wrote:
> > Quoth Christoph Zwerschke :
> >>With Py 2.3 (without IPv6 support) this is only the IPv4 address,
> >>but with Py 2.4-2.6 the order is (on my Win XP host) the IPv6 address
> >>first, then the IPv4 address. Since the IPv6 address is checked first,
> >>this gives a timeout and causes the slow connect() call. The order in
> >>which getaddrinfo returns IPv4/v6 under Linux seems to vary depending
> >>on the glibc version, so it may be a problem on other platforms, too.
> > 
> > Based on something I read in another thread, this appears to be a problem
> > only under Windows.  Everybody else implemented the TCP/IP stack according
> > to spec, and the IPv6 connect attempt times out immediately, producing
> > no slowdown.
> > 
> > Microsoft, however
> 
> The order in which getaddrinfo returns IPv4 and IPv6 is probably not 
> written in the specs (Posix 1003.1g and RFC 2553). The fact that Windows 
> returns IPv6 addresses first is not wrong in itself.
> 
> For this discussion, see also
> http://www.ops.ietf.org/lists/v6ops/v6ops.2002/msg00869.html
> https://bugzilla.redhat.com/show_bug.cgi?id=190495
> 
> But yes, I also wonder why the connect to the IPv6 loopback address does 
> not time out more quickly on Windows.

Right, it's not the order of the returned items that's the Microsoft
weirdness, it's the long timeout on an attempt to connect to something
that doesn't exist.  There was a long discussion about this, and it might
even have been on python-dev, but I can't lay my hands on the thread.
In short, Microsoft retries and waits a while when the far end says
"no thanks" to a connection attempt, instead of immediately returning
the connection failure the way Linux and everyone else do.  This applies
to IPv4, too.
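To see how long a refused connection takes on a given machine, here is a small Python 3 timing sketch. It is not code from the thread, and time_refused_connect is a hypothetical name. It connects over IPv4 to a localhost port that nothing listens on. On Linux this typically fails with ConnectionRefusedError almost at once. Per the explanation above, Windows is expected to take noticeably longer.

```python
import socket
import time


def time_refused_connect():
    """Connect to a closed localhost port; return (error, seconds taken)."""
    # Let the OS choose a free port, then release it so nothing listens there.
    # (Another process could grab it in between, but that is unlikely.)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind(("127.0.0.1", 0))
        port = probe.getsockname()[1]

    start = time.perf_counter()
    try:
        # The timeout only bounds the demo; a refusal should come back sooner.
        socket.create_connection(("127.0.0.1", port), timeout=10).close()
        error = None
    except OSError as exc:
        error = exc
    return error, time.perf_counter() - start


if __name__ == "__main__":
    error, seconds = time_refused_connect()
    print("%s after %.3f s" % (type(error).__name__, seconds))
```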

--RDM



Re: Problem with slow httplib connections on Windows (and maybe other platforms)

2009-02-01 Thread Christoph Zwerschke

rdmur...@bitdance.com wrote:

> Quoth Christoph Zwerschke :
>
>> With Py 2.3 (without IPv6 support) this is only the IPv4 address,
>> but with Py 2.4-2.6 the order is (on my Win XP host) the IPv6 address
>> first, then the IPv4 address. Since the IPv6 address is checked first,
>> this gives a timeout and causes the slow connect() call. The order in
>> which getaddrinfo returns IPv4/v6 under Linux seems to vary depending
>> on the glibc version, so it may be a problem on other platforms, too.
>
> Based on something I read in another thread, this appears to be a problem
> only under Windows.  Everybody else implemented the TCP/IP stack according
> to spec, and the IPv6 connect attempt times out immediately, producing
> no slowdown.
>
> Microsoft, however

The order in which getaddrinfo returns IPv4 and IPv6 is probably not 
written in the specs (Posix 1003.1g and RFC 2553). The fact that Windows 
returns IPv6 addresses first is not wrong in itself.

For this discussion, see also
http://www.ops.ietf.org/lists/v6ops/v6ops.2002/msg00869.html
https://bugzilla.redhat.com/show_bug.cgi?id=190495

But yes, I also wonder why the connect to the IPv6 loopback address does 
not time out more quickly on Windows.


-- Christoph


Problem with slow httplib connections on Windows (and maybe other platforms)

2009-02-01 Thread rdmurray
Quoth Christoph Zwerschke :
> What actually happens is the following:
> 
> * BaseHTTPServer binds only to the IPv4 address of localhost, because
>   it's based on TCPServer which has address_family=AF_INET by default.
> 
> * HTTPConnection.connect() however tries to connect to all IP addresses
>   of localhost, in the order determined by socket.getaddrinfo('localhost').
> 
>   With Py 2.3 (without IPv6 support) this is only the IPv4 address,
>   but with Py 2.4-2.6 the order is (on my Win XP host) the IPv6 address
>   first, then the IPv4 address. Since the IPv6 address is checked first,
>   this gives a timeout and causes the slow connect() call. The order in
>   which getaddrinfo returns IPv4/v6 under Linux seems to vary depending
>   on the glibc version, so it may be a problem on other platforms, too.

Based on something I read in another thread, this appears to be a problem
only under Windows.  Everybody else implemented the TCP/IP stack according
to spec, and the IPv6 connect attempt times out immediately, producing
no slowdown.

Microsoft, however

--RDM



Problem with slow httplib connections on Windows (and maybe other platforms)

2009-02-01 Thread Christoph Zwerschke

It took me a while to analyze the cause of the following problem.

The symptom was that testing a local web app with twill was fast
on Python 2.3, but very slow on Python 2.4-2.6 on a Win XP box.

This boiled down to the problem that if you run a SimpleHTTPServer
for localhost like this,

  import BaseHTTPServer, SimpleHTTPServer
  BaseHTTPServer.HTTPServer(('localhost', 8000),
      SimpleHTTPServer.SimpleHTTPRequestHandler).serve_forever()

and access it using httplib.HTTPConnection on the same host like this:

  import httplib
  httplib.HTTPConnection('localhost', 8000).connect()

then this call is fast using Py 2.3, but slow with Py 2.4-2.6.

I found that this was caused by a mismatch between the IP versions used
by SimpleHTTPServer and HTTPConnection for a "localhost" argument.

What actually happens is the following:

* BaseHTTPServer binds only to the IPv4 address of localhost, because
  it's based on TCPServer which has address_family=AF_INET by default.

* HTTPConnection.connect() however tries to connect to all IP addresses
  of localhost, in the order determined by socket.getaddrinfo('localhost').

  With Py 2.3 (without IPv6 support) this is only the IPv4 address,
  but with Py 2.4-2.6 the order is (on my Win XP host) the IPv6 address
  first, then the IPv4 address. Since the IPv6 address is checked first,
  this gives a timeout and causes the slow connect() call. The order in
  which getaddrinfo returns IPv4/v6 under Linux seems to vary depending
  on the glibc version, so it may be a problem on other platforms, too.
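To check which case applies on a particular machine, here is a small Python 3 sketch. It is not code from the thread. In Python 3, BaseHTTPServer became http.server, whose HTTPServer still inherits TCPServer's AF_INET default. localhost_families and first_attempt_mismatches are hypothetical helper names. The sketch compares the family of the first address getaddrinfo returns for 'localhost' with the family a default TCPServer binds to.

```python
import socket
import socketserver


def localhost_families(port=8000):
    """Address families for 'localhost', in the order getaddrinfo returns them."""
    infos = socket.getaddrinfo("localhost", port, 0, socket.SOCK_STREAM)
    return [family for family, _type, _proto, _canon, _addr in infos]


def first_attempt_mismatches(port=8000):
    """True if a client's first attempt uses a different address family
    than a default TCPServer (and hence HTTPServer) binds to."""
    return localhost_families(port)[0] != socketserver.TCPServer.address_family


if __name__ == "__main__":
    print("getaddrinfo order:", [f.name for f in localhost_families()])
    print("TCPServer family: ", socketserver.TCPServer.address_family.name)
    print("first attempt mismatches:", first_attempt_mismatches())
```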

You can see the cause of the slow connect() like this:

  import httplib
  conn = httplib.HTTPConnection('localhost', 8000)
  conn.set_debuglevel(1)
  conn.connect()

This is what I get:

  connect: (localhost, 8000)
  connect fail: ('localhost', 8000)
  connect: (localhost, 8000)

The first (failing) connect is the attempt to connect to the IPv6
address, which BaseHTTPServer doesn't listen on. (This is the debug
output of Py 2.5, which really should be improved to show the IP address
that is actually used. Unfortunately, in Py 2.6 the debug output when
connecting has even fallen victim to a refactoring. I think it should
be added back; otherwise set_debuglevel() is now pretty meaningless.)
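Until the debug output shows the real address, the per-address loop can be reproduced by hand. This Python 3 sketch roughly mirrors what socket.create_connection() does: it tries each getaddrinfo result in order and re-raises the last error if all fail. It also logs each concrete socket address and how long the attempt took. connect_verbose is a hypothetical name and is not part of httplib/http.client.

```python
import socket
import time


def connect_verbose(host, port, timeout=5.0, log=print):
    """Connect like socket.create_connection(), logging every address tried."""
    last_error = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, 0, socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        start = time.perf_counter()
        try:
            sock.connect(sockaddr)
        except OSError as exc:
            # Report the concrete socket address, not just the host name.
            log("connect fail: %r after %.3fs (%s)"
                % (sockaddr, time.perf_counter() - start, exc))
            sock.close()
            last_error = exc
            continue
        log("connect: %r after %.3fs" % (sockaddr, time.perf_counter() - start))
        return sock
    # Mirror create_connection(): re-raise the last failure.
    if last_error is not None:
        raise last_error
    raise OSError("getaddrinfo returned no addresses for %r" % (host,))
```

Suppose you call connect_verbose('localhost', 8000) against a server bound only to IPv4. If getaddrinfo lists IPv6 first, the log should show a failed attempt on the ::1 address before the successful IPv4 one, together with the time that failure cost.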

Can we do something about the mismatch that SimpleHTTPServer only serves
IPv4, but HTTPConnection tries to connect with IPv6 first?

I guess other people have stumbled over this too, maybe without even
noticing the cause and just wondering about the slow performance. E.g.:
http://schotime.net/blog/index.php/2008/05/27/slow-tcpclient-connection-sockets/

One possible solution would be to improve the TCPServer in the standard
lib so that it determines the address_family and real server_address
from the first result returned by socket.getaddrinfo, like this:

  class TCPServer(BaseServer):
      ...

      def __init__(self, server_address, RequestHandlerClass):
          if server_address and len(server_address) == 2:
              (self.address_family, dummy, dummy, dummy,
               server_address) = socket.getaddrinfo(*server_address)[0]
          else:
              raise TypeError("server_address must be a 2-tuple")
          BaseServer.__init__(self, server_address, RequestHandlerClass)
          ...

That way, whether you serve on or connect to 'localhost', you will
consistently use either IPv4 or IPv6, depending on what is preferred
on your platform.
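The proposal above can be tried today as a subclass instead of a stdlib patch. Below is a minimal Python 3 sketch under that assumption. AutoFamilyTCPServer is a hypothetical name, and the sketch uses Python 3's socketserver with its bind_and_activate parameter. Two details differ from the snippet above. It passes SOCK_STREAM to getaddrinfo, so the first entry is a stream-socket result. It also sets address_family on the instance before TCPServer.__init__ creates the socket, because that constructor reads address_family when it builds self.socket.

```python
import socket
import socketserver


class AutoFamilyTCPServer(socketserver.TCPServer):
    """TCPServer whose address family follows getaddrinfo's first result."""

    def __init__(self, server_address, RequestHandlerClass,
                 bind_and_activate=True):
        if len(server_address) != 2:
            raise TypeError("server_address must be a (host, port) 2-tuple")
        host, port = server_address
        # Ask for stream sockets only and take the first (preferred) answer,
        # i.e. the address a create_connection() client on this host tries first.
        family, _type, _proto, _canon, sockaddr = socket.getaddrinfo(
            host, port, 0, socket.SOCK_STREAM)[0]
        # Must be set before TCPServer.__init__ creates self.socket.
        self.address_family = family
        super().__init__(sockaddr, RequestHandlerClass, bind_and_activate)
```

Usage might look like AutoFamilyTCPServer(('localhost', 8000), handler). Taking only the first result means the server still listens on a single family. A client that connects by an explicit address of the other family (say 127.0.0.1 when ::1 came first) is therefore still refused. The dual-stack socket suggested in the "socket.create_connection slow" thread does not have that limitation.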

Does this sound reasonable? Any better ideas?

-- Christoph