[kamaelia-list] Re: Bug in SingleShotHTTPClient
Michael,

I was reviewing the TCPClient.py code. In the runClient method you have:

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM); yield 0.3
    self.sock = sock # We need this for shutdown later
    try:
        sock.setblocking(0); yield 0.6
        try:
            startConnect = time.time()
            while not self.safeConnect(sock,(self.host, self.port)):

And in safeConnect you have:

    sock.connect(*sockArgsList); # Expect socket.error: (115, 'Operation now in progress')

In the python socket module docs I see:

    s.setblocking(0) is equivalent to s.settimeout(0)

and

    Note that the connect() operation is subject to the timeout setting, and in general it is recommended to call settimeout() before calling connect().

So I get that you want the socket operations to be non-blocking. And non-blocking operations should fail if they can't complete rather than block. But the connect operation is using a timeout of zero because of the blocking setting. And it seems like the problem I'm having on windows is that the connection attempt never times out.

So, would it be reasonable to:

1) setblocking(0) in runClient as it is today
2) In safeConnect, sock.settimeout(20)
3) sock.connect() as it is today
4) sock.settimeout(0) after the connection

It seems like this would allow you to have a timeout honored for the connect operation without impacting non-blocking data operations post-connect.

--Steve

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups kamaelia group.
To post to this group, send email to kamaelia@googlegroups.com
To unsubscribe from this group, send email to kamaelia+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/kamaelia?hl=en
-~--~~~~--~~--~--~---
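The equivalence quoted from the socket docs can be checked without touching the network. The following is a minimal sketch using only the standard library; it is not Kamaelia code, and it assumes no process-wide default timeout has been set with socket.setdefaulttimeout().

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    default_timeout = sock.gettimeout()      # plain blocking socket: no timeout

    sock.setblocking(0)
    nonblocking_timeout = sock.gettimeout()  # same state settimeout(0) would give

    sock.settimeout(20)
    timed_timeout = sock.gettimeout()        # blocking again, bounded by 20 seconds
finally:
    sock.close()

print(default_timeout, nonblocking_timeout, timed_timeout)
```

The last step is the one that matters for the proposal above: once a non-zero timeout is set, the socket is no longer non-blocking, so connect() may hold the calling thread for up to that long.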
[kamaelia-list] Re: Bug in SingleShotHTTPClient
On Tuesday 03 March 2009 19:26:57 Steve wrote:
> Michael, I was reviewing the TCPClient.py code.

Many thanks for this. As a preface to what follows, I've put a different implementation into TCPClient - in line with my comments yesterday. The reason is to allow TCPClient to continue to not cause the system to freeze. The cost at present is higher CPU usage than would be ideal, but it's during a connection phase, so your example usage (making many many outbound connections simultaneously) is an edge case, which we can come back to and optimise.

(personal general viewpoint: get it working, make it work correctly[1], then optimise)

[1] eg handle edge cases you (me in this case) haven't considered :)

> In the runClient method you have:
>
>     sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM); yield 0.3
>     self.sock = sock # We need this for shutdown later
>     try:
>         sock.setblocking(0); yield 0.6
>         try:
>             startConnect = time.time()
>             while not self.safeConnect(sock,(self.host, self.port)):

Correct. For some history as to why it uses the raise Finality structure, you can see the history here:

    * http://mail.python.org/pipermail/python-list/2003-June/207723.html
      http://mail.python.org/pipermail/python-list/2003-June/thread.html#207723

> And in safeConnect you have:
>
>     sock.connect(*sockArgsList); # Expect socket.error: (115, 'Operation now in progress')
>
> In the python socket module docs I see:
>
>     s.setblocking(0) is equivalent to s.settimeout(0)
>
> and
>
>     Note that the connect() operation is subject to the timeout setting, and in general it is recommended to call settimeout() before calling connect().

Note, this code form is due to me being used to coding sockets stuff in C, C++ and perl previously, where socket calls don't contain any timeout. Indeed, if you want an idea of the complexity of implementing timeouts normally, it's perhaps worth looking at this page:

    * http://tinyurl.com/bu8tz2

(scroll down to just past 1/2 way - "There are three ways to place a timeout on an I/O operation involving a socket.")

The timeout you're referring to here is actually implemented inside Python/Modules/socketmodule.c, and behind the scenes actually uses either poll or select (depending on platform) in a blocking mode in order to do the right thing. ("do the right thing" being subjective here relative to blocking sockets)

However in this case, setting the timeout to non-zero eventually ends up with this piece of C code being executed:

    tv.tv_sec = (int)s->sock_timeout;
    ...
    if (writing)
        n = select(s->sock_fd+1, NULL, &fds, NULL, &tv);
    else
        n = select(s->sock_fd+1, &fds, NULL, NULL, &tv);

This turns into a blocking call, which then hangs the system. (Which is why sock.setblocking(0) has to set the timeout to 0 as well :)

> So I get that you want the socket operations to be non-blocking. And non-blocking operations should fail if they can't complete rather than block. But the connect operation is using a timeout of zero because of the blocking setting. And it seems like the problem I'm having on windows is that the connection attempt never times out.

This conflates the two issues really. The real issue is simply that I never thought of putting timeout handling into the TCPClient code, nor where.

> So, would it be reasonable to:
>
> 1) setblocking(0) in runClient as it is today
> 2) In safeConnect, sock.settimeout(20)
> 3) sock.connect() as it is today
> 4) sock.settimeout(0) after the connection
>
> It seems like this would allow you to have a timeout honored for the connect operation without impacting non-blocking data operations post-connect.

From the above you should see why this isn't reasonable, but in case it isn't clear, suppose you start 10 TCPClients as follows:

    for x in range(10):
        Pipeline(
            TCPClient(dest[x],port[x], connect_timeout=20),
            OutputHandler()
        ).activate()

And suppose every single one is blocked.

Rather than this timing out in about 20 seconds (as it would now given the fix just put in), it would effectively hang the system for 200 seconds, until all 10 connections time out - effectively serialising the connection attempts. 1000 failed/filtered consecutive connections in this manner would take 20,000 seconds, or just over 5 1/2 hours :)

Fundamentally that's why I've not taken this approach here :)

The fix put in, which solves the immediate issue, is here:

    * http://tinyurl.com/covwp6

Michael.
--
http://yeoldeclue.com/blog
http://twitter.com/kamaelian
http://www.kamaelia.org/Home
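The 20 seconds versus 200 seconds argument can be reproduced with a toy round-robin scheduler and a simulated clock. This is only a sketch under stated assumptions: FakeClock, connect_attempt and run_round_robin are hypothetical names for illustration, not Axon or Kamaelia APIs, and the "connection" simply never completes.

```python
class FakeClock(object):
    """Simulated time, so the example runs instantly."""
    def __init__(self):
        self.now = 0.0


def connect_attempt(name, clock, connect_timeout, timed_out):
    """Hypothetical stand-in for a connect loop whose peer never answers."""
    deadline = clock.now + connect_timeout
    while True:
        if clock.now >= deadline:
            timed_out.append((name, clock.now))
            return
        yield 1  # hand control back to the scheduler


def run_round_robin(tasks, clock, tick=1.0):
    """Give every live generator one step per pass, then advance the clock."""
    tasks = list(tasks)
    while tasks:
        still_running = []
        for task in tasks:
            try:
                next(task)
                still_running.append(task)
            except StopIteration:
                pass
        tasks = still_running
        clock.now += tick


clock = FakeClock()
timed_out = []
run_round_robin(
    [connect_attempt(n, clock, 20, timed_out) for n in range(10)], clock)

# Cooperative attempts share the waiting time; blocking ones would add up.
interleaved_total = max(when for _, when in timed_out)
serialised_total = 10 * 20
print(interleaved_total, serialised_total)
```

Because every attempt yields between checks, all ten deadlines run down in parallel and expire together, whereas ten blocking 20 second connects made one after another add up to 200 seconds.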
[kamaelia-list] Re: Bug in SingleShotHTTPClient
> class TCPClient(Axon.Component.component):
>     def __init__(self,host,port,delay=0,connect_timeout=60):
>         self.connect_timeout = connect_timeout
> ...
>     connect_start = time.time()
>     while not self.safeConnect(sock,(self.host, self.port)):
>         if self.shutdown():
>             return
>         if ( time.time() - connect_start ) > self.connect_timeout:
>             self.howDied = "timeout"
>             raise Finality
>         yield 1

I just updated my tcpclient to get the timeout you checked in. May I suggest rearranging the math a little to take it out of the loop:

    waitTill = time.time() + self.connect_timeout
    while not self.safeConnect(sock,(self.host, self.port)):
        if self.shutdown():
            return
        if time.time() >= self.connect_timeout:
            self.howDied = "timeout"
            raise Finality
        yield 1
[kamaelia-list] Re: Bug in SingleShotHTTPClient
Thinko, I meant:

    waitTill = time.time() + self.connect_timeout
    while not self.safeConnect(sock,(self.host, self.port)):
        if self.shutdown():
            return
        if time.time() >= waitTill:
            self.howDied = "timeout"
            raise Finality
        yield 1
[kamaelia-list] Re: Bug in SingleShotHTTPClient
Oh heck, this bug is in the underlying TCPClient! After spending days developing against localhost, I now find that I can't go live without having to do manual name resolution. :(

--Steve

On Mar 2, 1:39 am, Steve unetright.thebas...@xoxy.net wrote:
> FYI
>
> SingleShotHTTPClient on windows vista goes nuts opening thousands of ports when making a connection to an address which requires name resolution and which includes a port number.
>
> SingleShotHTTPClient('http://www.google.com/') = OK
> SingleShotHTTPClient('http://www.google.com:8000/') = Kaboom
> SingleShotHTTPClient('http://66.102.7.99:8000/') = OK
> SingleShotHTTPClient('http://localhost:8000/') = OK
>
> Cheers,
> Steve
[kamaelia-list] Re: Bug in SingleShotHTTPClient
On Monday 02 March 2009 09:49:36 Steve wrote:
> Oh heck, this bug is in the underlying TCPClient! After spending days developing against localhost, I now find that I can't go live without having to do manual name resolution.

That really should not be a problem (ie I've not seen that problem before).

Can you give a minimal example using TCPClient that doesn't work for you? What platform are you under?

I've not needed to change TCPClient with regard to basic functionality in several years, which is why I'm asking this. I've got a feeling it's a windows vs linux/Mac OS X thing...

Michael.
--
http://yeoldeclue.com/blog
http://twitter.com/kamaelian
http://www.kamaelia.org/Home
[kamaelia-list] Re: Bug in SingleShotHTTPClient
On Monday 02 March 2009 09:39:02 Steve wrote:
> SingleShotHTTPClient on windows vista goes nuts opening thousands of ports when making a connection to an address which requires name resolution and which includes a port number.
>
> 1 SingleShotHTTPClient('http://www.google.com/') = OK

Google is listening on port 80.

> 2 SingleShotHTTPClient('http://www.google.com:8000/') = Kaboom

www.google.com is not listening on port 8000 BUT they're filtering it, so rather than their system sending back a TCP RESET, it just doesn't respond.

Behaviour of Google with telnet:

    ~/> telnet www.google.com 8000
    Trying 209.85.229.147...
    [hang]

Behaviour with a system that's not listening on port 8000 AND not filtering (meaning the TCP stack responds with a TCP RESET):

    ~/code.google/kamaelia/trunk/Code/Python/Kamaelia/Examples> telnet 192.168.2.1 8001
    Trying 192.168.2.1...
    telnet: connect to address 192.168.2.1: Connection refused

> 3 SingleShotHTTPClient('http://66.102.7.99:8000/') = OK

66.102.7.99 is not listening on port 8000 (checked, it isn't - and like www.google.com it's also filtering rather than just resetting)

> 4 SingleShotHTTPClient('http://localhost:8000/') = OK

No idea :)

2 & 3 should result in the same behaviour on your system, and I'm confused as to why they don't at present.

I think I need slightly more detail here to help out fixing whatever is the issue.

Michael.
--
http://yeoldeclue.com/blog
http://twitter.com/kamaelian
http://www.kamaelia.org/Home
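The refused versus filtered distinction shows up in code as different error numbers, or as no answer at all. The helper below is a sketch for illustration only: describe_connect_result is a hypothetical name, and the mapping assumes the value comes from socket.connect_ex(), which returns an errno-style code instead of raising.

```python
import errno


def describe_connect_result(err):
    """Map an error number from socket.connect_ex() to the cases above."""
    if err == 0:
        return "connected"
    if err == errno.ECONNREFUSED:
        return "refused"      # the peer's TCP stack answered with a RESET
    if err in (errno.EINPROGRESS, errno.EWOULDBLOCK, errno.EALREADY):
        return "pending"      # non-blocking attempt still under way
    if err == errno.ETIMEDOUT:
        return "timed out"    # nothing answered at all, e.g. a filtered port
    return "other"


print(describe_connect_result(errno.ECONNREFUSED))
```

A refused connection resolves quickly because something answers. A filtered port never produces an answer, so a non-blocking caller keeps seeing the "pending" case until it applies a deadline of its own.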
[kamaelia-list] Re: Bug in SingleShotHTTPClient
On Monday 02 March 2009 09:49:36 Steve wrote:
> Oh heck, this bug is in the underlying TCPClient! After spending days developing against localhost, I now find that I can't go live without having to do manual name resolution. :(

I've done some digging. There's definitely something odd going on here. I got suspicious that there was a bug in TCPClient. Specifically, if you do:

    TCPClient("www.google.com", 80)

This results in a call inside TCPClient of:

    self.safeConnect(sock, ("www.google.com", 80))

Which results in (fundamentally):

    sock.connect(("www.google.com", 80))

This has worked (correctly) for the past several years, hence the surprise. However, your comment made me wonder if, on some platforms, this should instead result in:

    IP = socket.gethostbyname("www.google.com")
    sock.connect((IP, 80))

Since that's what you'd normally do with the C sockets API.

However, re-reading both the docs for the socket module *AND* looking at the source for Python-2.6.1/Modules/socketmodule.c, I note that the comments in the socketmodule say:

    /* Convert a string specifying a host name or one of a few symbolic
       names to a numeric IP address. This usually calls gethostbyname()
       to do the work; the names "" and "<broadcast>" are special.
       Return the length (IPv4 should be 4 bytes), or negative if
       an error occurred; then an exception is raised. */

Specifically this means that socket.connect itself decodes (for example) the IP address 66.102.7.99 - for example from:

    SingleShotHTTPClient('http://66.102.7.99:8000/') = OK

It uses the same function for both cases: through one path it checks for something looking like an IP and parses it, and through the other path it ends up calling gethostbyname (in getaddrinfo.c) UNLESS you have IPv6 enabled, in which case it ends up calling getipnodebyname.

As a result I'm rather puzzled as to what's causing your problem here...

Michael.
--
http://yeoldeclue.com/blog
http://twitter.com/kamaelian
http://www.kamaelia.org/Home
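For reference, the "manual name resolution" workaround mentioned in this thread would look roughly like the sketch below. It is an illustration only: is_numeric_ipv4 and resolve_ipv4 are hypothetical helper names, IPv4 is assumed, and, as the message above points out, socket.connect() already performs the equivalent of both paths itself.

```python
import socket


def is_numeric_ipv4(host):
    """True if host is already a dotted-quad IPv4 address (no DNS needed)."""
    try:
        socket.inet_aton(host)
    except (socket.error, TypeError):
        return False
    return host.count(".") == 3


def resolve_ipv4(host):
    """Return an IPv4 address string, calling gethostbyname() only for names."""
    if is_numeric_ipv4(host):
        return host
    return socket.gethostbyname(host)   # may block while DNS is queried


print(resolve_ipv4("66.102.7.99"))
```

The one practical difference from leaving it to connect() is control over where the potentially blocking lookup happens, since gethostbyname() does not honour the socket's non-blocking setting.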
[kamaelia-list] Re: Bug in SingleShotHTTPClient
Michael,

Thank you for taking a look at this. I slimmed down my test case to just:

    from Kamaelia.Internet.TCPClient import TCPClient
    if __name__ == '__main__':
        TCPClient('www.google.com', 80).run()

I also tested with this line instead:

    TCPClient('127.0.0.1', 80).run()

I was thrown off by the UDP traffic generated by the name resolution of localhost, but the real problem is the handling of a connection that is refused. I have no server running on port 80 so it should time out and return. Instead it tries (in a seemingly infinite loop) to make the connection. My software firewall (set to allow all) reports around 1 connection attempt every second. The python process also consumes ~85% of one of my cores during this time.

Things get even weirder when I test with:

    TCPClient('localhost', 80).run()

Here my firewall reports the same TCP connection attempts as the 127 test, but it also reports thousands of UDP packets as well. I suspect every second it is doing some name resolution which is generating the UDP traffic.

The end result is that I believe Kamaelia (or the socket layer) is not properly handling a silently refused connection. There may be a similar problem with UDP sockets because I was about to detail an inability to shutdown unconnected UDP peers.

Thanks,
Steve
[kamaelia-list] Re: Bug in SingleShotHTTPClient
Hi Steve,

This email may read a little odd. I've been writing this whilst reading and trying things out, saw your update, and having had a thought. As a result the train of thought changes as I go through this, but I've left that in since it may be of use.

On Monday 02 March 2009 21:11:53 Steve wrote:
..
> TCPClient('127.0.0.1', 80).run()
>
> I have no server running on port 80 so it should timeout and return. Instead it tries (in a seemingly infinite loop) to make the connection. My software firewall (set to allow all) reports around 1 connection attempt every second.

That's rather odd. I've just tried the same thing here, and I don't see any looping attempt to connect - it fails to connect and exits straight away:

    ~> python
    Python 2.5.1 (r251:54863, Jan 10 2008, 18:01:57)
    [GCC 4.2.1 (SUSE Linux)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from Kamaelia.Internet.TCPClient import TCPClient
    >>> TCPClient("127.0.0.1", 81).run()

That said, I'm not running a firewall though... I've also checked that it's passing on shutdown correctly. First the case that doesn't exit, until control-c:

    >>> from Kamaelia.Chassis.Pipeline import Pipeline
    >>> from Kamaelia.Util.Console import ConsoleEchoer
    >>> Pipeline( ConsoleEchoer() ).run()

And then a version that does exit because the TCPClient passes on its shutdown on error:

    ~> python
    Python 2.5.1 (r251:54863, Jan 10 2008, 18:01:57)
    [GCC 4.2.1 (SUSE Linux)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from Kamaelia.Chassis.Pipeline import Pipeline
    >>> from Kamaelia.Util.Console import ConsoleEchoer
    >>> from Kamaelia.Internet.TCPClient import TCPClient
    >>> Pipeline( TCPClient("127.0.0.1", 81), ConsoleEchoer() ).run()

All this tells me is that the situation I've tested it in and mainly used it in still works as expected. This isn't the same as your situation. As a result, I've gained no real information - I'm not reproducing your behaviour/environment correctly yet.

I'm beginning to think I ought to start up a windows VM and see if I can reproduce this there.

> The end result is that I believe Kamaelia (or the socket layer) is not properly handling a silently refused connection

I can see this is possible, simply because a filtered connection that doesn't send a TCP RESET won't cause an error at the socket layer (it just won't connect). That leads to this hanging:

    Pipeline( TCPClient("www.google.com", 8000), ConsoleEchoer() ).run()

... Which is (part of) why you're after timeouts etc, and want to go after that root cause.

> Michael, I'm starting to think that this whole TTL component that I've made might be completely unneeded. I think I was just reacting to the bugs that I've been describing in other threads. If we could identify why refused connections are infinite looping on vista and why unconnected udp peers are refusing to shutdown, there might be no need for this kind of time terminating component

I agree that going after the root cause is a good idea. (That said, the TTL component is sufficiently generic to be useful beyond this issue, even if we deal with it differently)

I think you're hitting a combination of things. In the main code, we have:

    while not self.safeConnect(sock,(self.host, self.port)):
        if self.shutdown():
            return
        yield 1

This in combination with the other parts is probably why you're seeing a looping connect attempt. This goes into safeConnect, and if you're connecting to a filtered connection, it would hit this logic path in safeConnect:

    try:
        sock.connect(*sockArgsList); # Expect socket.error: (115, 'Operation now in progress')
    except socket.error, socket.msg:
        (errorno, errmsg) = socket.msg.args
        if errorno==errno.EALREADY:
            # The socket is non-blocking and a previous connection attempt has not yet been completed
            # We handle this by allowing the code to come back and repeatedly retry
            # connecting. This is a valid, if brute force approach.
            assert(self.connecting==1)
            return False
        elif errorno==errno.EINPROGRESS or errorno==errno.EWOULDBLOCK:
            #The socket is non-blocking and the connection cannot be completed immediately.
            # We handle this by allowing the code to come back and repeatedly retry
            # connecting. Rather brute force.
            self.connecting=1
            return False # Not connected should retry until no error

ie it hits one of these three conditions. For what it's worth a non-filtered not connected socket hits this path:

    try:
        sock.connect(*sockArgsList); # Expect socket.error: (115, 'Operation now in progress')
    except socket.error, socket.msg:
        (errorno, errmsg) = socket.msg.args
        if errorno==errno.EALREADY:
            ...
        # Anything else is an error we don't handle
        else:
            raise socket.msg

That
[kamaelia-list] Re: Bug in SingleShotHTTPClient
Michael,

Thank you again for looking into this. Also, I don't have a lot of network code experience in python, so please take all this with ample salt.

> I'm beginning to think I ought to start up a windows VM and see if I can reproduce this there.

I would like to see big K be well supported on windows. And I'd be happy to help in any way I could. I can't donate a microsoft license to you, but I can test locally. I bet you could get MS to donate a license. But if you need any help setting up a VM, let me know. I use vmware and virtualbox.

> Now there's several ways that I could go down this, but I can see that probably the simplest would be to add a connection timeout this way:
>
> class TCPClient(Axon.Component.component):
>     def __init__(self,host,port,delay=0,connect_timeout=60):
>         self.connect_timeout = connect_timeout
> ...
>     connect_start = time.time()
>     while not self.safeConnect(sock,(self.host, self.port)):
>         if self.shutdown():
>             return
>         if ( time.time() - connect_start ) > self.connect_timeout:
>             self.howDied = "timeout"
>             raise Finality
>         yield 1

I am +1 to the idea of including a timeout parameter. In fact, I think every network operation call inside Kamaelia should expose a defaulted timeout parameter. That said, my gut feeling is that the timeout should be handled at the lowest level possible and then exposed all the way up the call tree. For example:

> connection, it would hit this logic path in safeConnect:
>
>     try:
>         sock.connect(*sockArgsList); # Expect socket.error: (115, 'Operation now in progress')
>     except socket.error, socket.msg:
>         (errorno, errmsg) = socket.msg.args
>         if errorno==errno.EALREADY:
>             # The socket is non-blocking and a previous connection attempt has not yet been completed
>             # We handle this by allowing the code to come back and repeatedly retry
>             # connecting. This is a valid, if brute force approach.
>             assert(self.connecting==1)
>             return False
>         elif errorno==errno.EINPROGRESS or errorno==errno.EWOULDBLOCK:
>             #The socket is non-blocking and the connection cannot be completed immediately.
>             # We handle this by allowing the code to come back and repeatedly retry
>             # connecting. Rather brute force.
>             self.connecting=1
>             return False # Not connected should retry until no error

Here we have EALREADY, EINPROGRESS and EWOULDBLOCK. I think there needs to be a way to time out these connection attempts rather than simply not starting another attempt after some timeout period. Why would we want to keep retrying the connection during the timeout period? I think it should only make 1 single connection attempt and wait at most the timeout period for success.

On an only slightly related note, I remember reading a posting once by Glyph where he was promoting one of the values for using twisted. He said that they had spent considerable time testing and debugging the frustratingly disparate socket behaviors on the 3 major platforms and only twisted really did the right thing while still exposing a uniform framework interface. My brain doesn't fit twisted, but now I'm starting to appreciate what he was talking about.

Thanks,
Steve
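One way to read the "single attempt, wait at most the timeout" suggestion is sketched below. This is not the fix that went into TCPClient, and single_attempt_connect is a hypothetical name: it starts one non-blocking connect with connect_ex(), then polls with a zero-timeout select() so the caller is never blocked, yielding between polls in the style of the loops quoted above. It assumes a numeric IPv4 address, since a host name would be resolved inside connect_ex() and that lookup can block.

```python
import errno
import select
import socket
import time


def single_attempt_connect(address, timeout):
    """Yield ("pending", None) until the one attempt resolves, then the outcome."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    err = sock.connect_ex(address)       # start the one and only attempt
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        yield ("error", err)
        return
    deadline = time.time() + timeout
    while True:
        # Zero-timeout select: reports readiness without blocking the caller.
        _, writable, errored = select.select([], [sock], [sock], 0)
        if writable or errored:
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            if err == 0:
                yield ("connected", sock)
            else:
                sock.close()
                yield ("error", err)
            return
        if time.time() >= deadline:
            sock.close()
            yield ("timeout", None)
            return
        yield ("pending", None)


# Usage against a throwaway listener on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
outcome = None
for status, value in single_attempt_connect(listener.getsockname(), 5.0):
    if status != "pending":
        outcome = (status, value)
print(outcome[0])
if outcome[0] == "connected":
    outcome[1].close()
listener.close()
```

The design choice here is that connect is called exactly once; everything afterwards only asks the operating system whether that attempt has finished, which avoids the repeated EALREADY retries while keeping the cooperative yield.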