Re: More urllib timeout issues.
Facundo Batista wrote:
> Steve Holden wrote:
>> 1) There is work afoot to build timeout arguments into network libraries for 2.6, and I know Facundo Batista has been involved; you might want to Google or email Facundo about that.
>
> Right now (in svn trunk) httplib, ftplib, telnetlib, etc. have a timeout argument. If you use it, the socket timeout will be set (through s.settimeout()). How the socket behaves once that timeout is set is beyond the scope of these changes, though.
>
> BTW, I still need to take the final step here, that is, adding a timeout argument to urllib2.urlopen().

urllib, robotparser, and M2Crypto also need to be updated to match.

John Nagle
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: More urllib timeout issues.
John Nagle wrote:
> I took a look at Facundo Batista's work in the tracker, and he currently seems to be trying to work out a good way to test the existing SSL module. It has to connect to something to be tested,

Right now, test_socket_ssl.py has, besides the previous tests, the ability to run openssl's s_server and connect to it. I'm less than an SSL beginner, so I don't have the knowledge to write interesting tests; I just wrote the obvious ones... If you know SSL, you can take that code and add new tests very easily.

Regards,

-- 
.   Facundo
.
Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Re: More urllib timeout issues.
Steve Holden wrote:
> 1) There is work afoot to build timeout arguments into network libraries for 2.6, and I know Facundo Batista has been involved; you might want to Google or email Facundo about that.

Right now (in svn trunk) httplib, ftplib, telnetlib, etc. have a timeout argument. If you use it, the socket timeout will be set (through s.settimeout()). How the socket behaves once that timeout is set is beyond the scope of these changes, though.

BTW, I still need to take the final step here, that is, adding a timeout argument to urllib2.urlopen().

Regards,

-- 
.   Facundo
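For readers on current Python, here is a minimal sketch of the per-call timeout Facundo describes, assuming Python 3, where httplib became `http.client`. It uses a local listening socket that never calls `accept()`; on common TCP stacks the kernel still completes the handshake from its backlog, so the connect succeeds and the read stalls, which is the failure mode John reports. The function name `demo_http_read_timeout` is made up for illustration.

```python
import http.client
import socket


def demo_http_read_timeout(timeout=0.5):
    """Return (timeout seen on the connection's socket, exception type raised)."""
    # A listening socket that never accepts: the kernel completes the TCP
    # handshake from the listen backlog, but no bytes are ever sent back.
    silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    silent.bind(("127.0.0.1", 0))
    silent.listen(1)
    port = silent.getsockname()[1]

    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    try:
        conn.request("GET", "/")  # connect and send both succeed
        # The timeout argument was applied to the underlying socket
        # (via settimeout()) when the connection was opened.
        applied = conn.sock.gettimeout()
        try:
            conn.getresponse()  # blocks waiting for a status line that never comes
        except socket.timeout as exc:  # an alias of TimeoutError on 3.10+
            return applied, type(exc)
        return applied, None
    finally:
        conn.close()
        silent.close()


print(demo_http_read_timeout())
```

The timeout bounds each blocking wait on the socket, so it limits how long a single read may sit idle rather than the total duration of the request.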
Re: More urllib timeout issues.
John Nagle wrote:
> [...]
> Then we'd have a reasonable network timeout system. We have about half of the above now, but it's not consistent.
>
> Comments?

The only comments I'll make for now are:

1) There is work afoot to build timeout arguments into network libraries for 2.6, and I know Facundo Batista has been involved; you might want to Google or email Facundo about that.

2) The main reason why socket.setdefaulttimeout is unsuitable for many purposes is that it isn't thread-safe: it is a single process-wide value, so all threads must either use the same default timeout or have it change at random according to the whim of the last thread to alter it.

3) This is important and sensible work, and if properly followed through it will likely lead to serious quality improvements in the network libraries.

regards
 Steve
-- 
Steve Holden       +1 571 484 6266   +1 800 494 3119
Holden Web LLC/Ltd           http://www.holdenweb.com
Skype: holdenweb      http://del.icio.us/steve.holden
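Steve's second point can be illustrated with a short sketch, assuming Python 3's socket module: `socket.setdefaulttimeout()` sets one process-wide value that every thread shares, while `settimeout()` on an individual socket (or the `timeout` argument of `socket.create_connection()`) stays local to that connection. The helper `make_socket` is hypothetical.

```python
import socket


def make_socket(timeout=None):
    """Create a TCP socket, applying a per-socket timeout only if one is given."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if timeout is not None:
        sock.settimeout(timeout)  # affects this socket only
    return sock


previous = socket.getdefaulttimeout()
try:
    # Process-wide: every thread that creates a socket after this call
    # inherits 15 seconds, until some thread changes the value again.
    socket.setdefaulttimeout(15.0)
    inherited = make_socket()            # picks up the global default
    explicit = make_socket(timeout=2.0)  # overrides it locally
    inherited_timeout = inherited.gettimeout()
    explicit_timeout = explicit.gettimeout()
    inherited.close()
    explicit.close()
finally:
    socket.setdefaulttimeout(previous)  # restore the global for other code

print(inherited_timeout, explicit_timeout)
```

Passing a timeout explicitly to each connection avoids the shared global altogether, so one thread's choice can no longer leak into another's sockets.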
Re: More urllib timeout issues.
Steve Holden wrote:
> John Nagle wrote:
>> Then we'd have a reasonable network timeout system. We have about half of the above now, but it's not consistent.
>>
>> Comments?
>
> The only comments I'll make for now are
>
> 1) There is work afoot to build timeout arguments into network libraries for 2.6, and I know Facundo Batista has been involved; you might want to Google or email Facundo about that.
>
> 2) The main reason why socket.setdefaulttimeout is unsuitable for many purposes is its thread-unsafe property, so all threads must use the same default timeout or have it randomly change according to the whim of the last thread to alter it.

It has other problems. If you set that value, it affects the socket's blocking/non-blocking mode. It can mess up M2Crypto, causing it to report "Peer did not return certificate".

> 3) This is important and sensible work and if properly followed through will likely lead to serious quality improvements in the network libraries.

Agreed.

> regards
>  Steve

I took a look at Facundo Batista's work in the tracker, and he currently seems to be trying to work out a good way to test the existing SSL module. It has to connect to something to be tested, of course. Testing network functionality is tough; to do it right, you need a little test network to talk to, one that forces some of the error cases. And network testing doesn't have the repeatability on which the Python test system and buildbots depend.

It's really tough to test this stuff properly. The best I've been able to do so far is to run the 11,000-site list from the Webspam Challenge through our web spider. Here's a list of URLs from our error log that have given us connection trouble of one kind or another. Most of these open an HTTP transaction but, for some reason, don't carry it through to completion, resulting in a long stall in urllib.

blaby.gov.uk
boys-brigade.org.uk
cam.ac.uk
essex.ac.uk
gla.ac.uk
open.ac.uk
soton.ac.uk
uea.ac.uk
ulster.ac.uk

So that's a short, but useful, set of timeout test cases.
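Part of the repeatability problem can be sidestepped for this particular failure: a local listening socket that never calls `accept()` completes the TCP handshake and then says nothing, which imitates these sites deterministically. Below is a minimal sketch of a test helper, assuming Python 3, that reports whether a host stalls during or after TCP connection opening. `classify_stall` and its result labels are hypothetical, and the backlog behavior it relies on is that of common TCP stacks such as Linux's.

```python
import socket
import time


def classify_stall(host, port, connect_timeout=5.0, read_timeout=5.0):
    """Return (label, elapsed_seconds) describing where a connection stalls.

    Labels (hypothetical, for test reporting): 'connect' if the TCP
    handshake timed out, 'read' if the handshake completed but no reply
    arrived in time, 'refused' if nothing was listening, and 'ok' if the
    server replied or closed the connection.
    """
    start = time.monotonic()
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except socket.timeout:
        return "connect", time.monotonic() - start
    except ConnectionRefusedError:
        return "refused", time.monotonic() - start
    with sock:
        sock.settimeout(read_timeout)
        request = b"HEAD / HTTP/1.0\r\nHost: " + host.encode("ascii") + b"\r\n\r\n"
        try:
            sock.sendall(request)
            sock.recv(1)  # one byte back (or EOF) counts as progress
        except socket.timeout:
            return "read", time.monotonic() - start
    return "ok", time.monotonic() - start


# Local stand-in for a server that opens the connection and then says nothing:
# it listens but never calls accept(), so the kernel completes the handshake.
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))
silent.listen(1)
print(classify_stall("127.0.0.1", silent.getsockname()[1], read_timeout=0.5))
silent.close()
```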
Those are the ones that timed out after, not during, TCP connection opening. It's interesting that this problem appears on the root domains of so many UK universities; they must all run the same server software. Some of these fail because robotparser, which uses urllib, hangs for minutes trying to read the robots.txt file associated with the domain.

This isn't something that requires a major redesign. These are bugs.

John Nagle
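On the robotparser point: as far as I know, `RobotFileParser.read()` in Python 3 still takes no timeout argument, so it falls back on the global default. A minimal workaround sketch fetches robots.txt itself with an explicit `timeout` and hands the lines to `RobotFileParser.parse()`. The helper names `parser_from_text` and `fetch_robots` are hypothetical, and the handling of 401/403 and other 4xx responses is meant to roughly mirror what `read()` does.

```python
import urllib.error
import urllib.parse
import urllib.request
import urllib.robotparser


def parser_from_text(text):
    """Build a RobotFileParser from robots.txt content already in hand."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(text.splitlines())
    rp.modified()  # ensure a fetch time is recorded; can_fetch() returns False until one is set
    return rp


def fetch_robots(site_url, timeout=15.0):
    """Fetch and parse robots.txt, bounding each network wait by `timeout`.

    Returns a RobotFileParser, or None if the fetch failed or stalled, so
    the caller can decide what to do with an unreachable site.
    """
    robots_url = urllib.parse.urljoin(site_url, "/robots.txt")
    try:
        with urllib.request.urlopen(robots_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        rp = urllib.robotparser.RobotFileParser(robots_url)
        if err.code in (401, 403):
            rp.disallow_all = True  # access restricted: treat as fully disallowed
        elif 400 <= err.code < 500:
            rp.allow_all = True  # e.g. 404: no robots.txt, so everything is allowed
        return rp
    except OSError:
        return None  # covers URLError, connection errors, and socket timeouts
    return parser_from_text(body)
```

Because the timeout applies to each blocking read, a server that drips the file one byte at a time could still hold the fetch for longer than `timeout` in total; a hard deadline would need an extra check around the read loop.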
More urllib timeout issues.
I thought I had all the timeout problems with urllib worked around, but no. socket.setdefaulttimeout is useful, but not always effective. I'm setting it to 15 seconds. If the host end won't open the connection within 15 seconds, urllib times out. But if the host end opens the connection and then never sends anything, urllib waits for many minutes before timing out.

Any idea how to deal with this? And don't just say "use urllib2" unless you KNOW it works better there and can explain why. I finally have M2Crypto and urllib playing well together, and don't want to mess with that.

For some weird reason, several UK academic sites have this behavior, including soton.ac.uk. If you try to open that in a browser, the browser just sits there and eventually, after several minutes, displays "The site is taking too long to respond."

What's the current status in this area? Some patches to sockets were proposed a while back. There's a long history of trouble in this area, and some fixes, but nothing that just works. The socket module has two timeout settings (socket.setdefaulttimeout and sock.settimeout), the M2Crypto module has two (sock.set_socket_read_timeout and sock.set_socket_write_timeout), and none of them play well together, or with the urllib/urllib2/httplib level and the blocking/non-blocking socket distinction.

What we really should have is something like this:

- Sockets should have
      set_socket_connect_timeout
      set_socket_read_timeout
      set_socket_write_timeout
  which set an upper limit on how long a socket can go with a connect, read, or write request pending but without progress on the connection. This needs to be independent of select/poll timeouts, and these timeouts should work on blocking sockets.
- The existing socket method settimeout should set all of the above, and socket.setdefaulttimeout should set the default value for settimeout to be used on new sockets.
- SSL and M2Crypto, which wrap socket functionality, should understand all the above functions.
- httplib, urllib, and urllib2 objects should understand settimeout. Making the connect/read/write timeout distinction at that level probably isn't worth the trouble.

Then we'd have a reasonable network timeout system. We have about half of the above now, but it's not consistent.

Comments?

John Nagle
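The proposed connect/read/write split can be approximated in user code on plain sockets. A minimal sketch, assuming Python 3: a hypothetical `PerOperationTimeoutSocket` wrapper (not part of the socket module or M2Crypto) stores separate connect, read, and write limits and applies the relevant one with `settimeout()` before each blocking call. It covers plain TCP only; SSL and M2Crypto wrapping are left out.

```python
import socket


class PerOperationTimeoutSocket:
    """Hypothetical wrapper giving separate connect/read/write limits.

    Each limit bounds how long one blocking call may wait without
    progress; it is applied via settimeout() just before that call.
    """

    def __init__(self, connect_timeout=15.0, read_timeout=15.0, write_timeout=15.0):
        self.connect_timeout = connect_timeout
        self.read_timeout = read_timeout
        self.write_timeout = write_timeout
        self.sock = None

    def settimeout(self, value):
        # As in the proposal: settimeout() sets all three limits at once.
        self.connect_timeout = self.read_timeout = self.write_timeout = value

    def connect(self, address):
        self.sock = socket.create_connection(address, timeout=self.connect_timeout)

    def recv(self, bufsize):
        self.sock.settimeout(self.read_timeout)
        return self.sock.recv(bufsize)

    def sendall(self, data):
        # Loop over send() so the write limit applies to each chunk
        # ("no progress for N seconds") rather than to the whole transfer.
        self.sock.settimeout(self.write_timeout)
        view = memoryview(data)
        while view:
            sent = self.sock.send(view)
            view = view[sent:]

    def close(self):
        if self.sock is not None:
            self.sock.close()
            self.sock = None


# Usage against a local server that completes the handshake but never replies.
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))
silent.listen(1)
conn = PerOperationTimeoutSocket(connect_timeout=2.0, read_timeout=0.5, write_timeout=5.0)
try:
    conn.connect(silent.getsockname())
    conn.sendall(b"GET / HTTP/1.0\r\n\r\n")
    conn.recv(1)
except socket.timeout:
    print("read stalled after", conn.read_timeout, "seconds")
finally:
    conn.close()
    silent.close()
```

One design choice worth noting: since Python 3.5, `sendall()` treats the socket timeout as a limit on the total time to send all the data, so the sketch loops over `send()` to get the "pending but without progress" semantics described above.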