[kamaelia-list] Re: Bug in SingleShotHTTPClient

2009-03-03 Thread Steve

Michael,

I was reviewing the TCPClient.py code.  In the runClient method you
have:

      sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM); yield 0.3
      self.sock = sock # We need this for shutdown later
      try:
         sock.setblocking(0); yield 0.6
         try:
            startConnect = time.time()
            while not self.safeConnect(sock,(self.host, self.port)):

And in safeConnect you have:

      sock.connect(*sockArgsList); # Expect socket.error: (115, 'Operation now in progress')

In the python socket module docs I see:

    s.setblocking(0) is equivalent to s.settimeout(0)

and

    Note that the connect() operation is subject to the timeout
    setting, and in general it is recommended to call settimeout()
    before calling connect().

So I get that you want the socket operations to be non-blocking.  And
non-blocking operations should fail if they can't complete rather than
block.  But the connect operation is using a timeout of zero because
of the blocking setting.  And it seems like the problem I'm having on
windows is that the connection attempt never times out.

So, would it be reasonable to:
1) setblocking(0) in runClient as it is today
2) In safeConnect, sock.settimeout(20)
3) sock.connect() as it is today
4) sock.settimeout(0) after the connection

It seems like this would allow you to have a timeout honored for the
connect operation without impacting non-blocking data operations post-
connect.
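
For concreteness, the four steps above could be sketched roughly as follows. This is only an illustration in Python 3 syntax, not the TCPClient code: the helper name `connect_with_timeout` is hypothetical, and the 20 second default is simply the value from the list above. Note that the connect() call holds up the caller for as long as the attempt takes, up to `timeout` seconds.

```python
import socket


def connect_with_timeout(sock, address, timeout=20.0):
    """Connect with a temporary timeout, then go back to non-blocking mode.

    Hypothetical helper illustrating steps 2-4 of the proposal.
    """
    sock.settimeout(timeout)    # step 2: non-zero timeout, so connect() may wait
    try:
        sock.connect(address)   # step 3: waits up to `timeout` seconds
    finally:
        sock.settimeout(0)      # step 4: same effect as setblocking(0)
    return sock
```

If the attempt does not complete in time, connect() raises a timeout error and the socket is still returned to non-blocking mode by the finally clause.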

--Steve

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
kamaelia group.
To post to this group, send email to kamaelia@googlegroups.com
To unsubscribe from this group, send email to 
kamaelia+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/kamaelia?hl=en
-~--~~~~--~~--~--~---



[kamaelia-list] Re: Bug in SingleShotHTTPClient

2009-03-03 Thread Michael Sparks

On Tuesday 03 March 2009 19:26:57 Steve wrote:
> Michael,
>
> I was reviewing the TCPClient.py code.

Many thanks for this. As a preface to what follows, I've put a different
implementation into TCPClient - in line with my comments yesterday. The
reason is to allow TCPClient to continue to not cause the system to freeze.

The cost at present is higher CPU usage than would be ideal, but it's during
a connection phase, so your example usage (making many many outbound
connections simultaneously) is an edge case, which we can come back to
and optimise. (personal general viewpoint: get it working, make it work
correctly[1], then optimise)

[1] eg handle edge cases you (me in this case) haven't considered :)

> In the runClient method you
> have:
>
>           sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM); yield 0.3
>           self.sock = sock # We need this for shutdown later
>           try:
>              sock.setblocking(0); yield 0.6
>              try:
>                 startConnect = time.time()
>                 while not self.safeConnect(sock,(self.host, self.port)):

Correct. For some background on why it uses the "raise Finality" structure,
see:
* http://mail.python.org/pipermail/python-list/2003-June/207723.html
  http://mail.python.org/pipermail/python-list/2003-June/thread.html#207723

> And in safeConnect you have:
>
>           sock.connect(*sockArgsList); # Expect socket.error: (115, 'Operation now in progress')
>
> In the python socket module docs I see:
>
>      s.setblocking(0) is equivalent to s.settimeout(0)
>
> and
>
>      Note that the connect() operation is subject to the timeout
>      setting, and in general it is recommended to call settimeout()
>      before calling connect().

Note, this code form is due to me being used to coding sockets stuff
in C, C++ & perl previously, where socket calls don't contain any timeout.

Indeed, if you want an idea of the complexity of implementing timeouts
normally, it's perhaps worth looking at this page:
* http://tinyurl.com/bu8tz2

(scroll down to just past 1/2 way - "There are three ways to place a timeout
on an I/O operation involving a socket.")

The timeout you're referring to here is actually implemented inside
Python/Modules/socketmodule.c, and behind the scenes actually
uses either poll or select (depending on platform) in a blocking mode
in order to do the right thing. (do the right thing being subjective
here relative to blocking sockets)

However in this case, setting the timeout to non-zero eventually ends up
with this piece of C code being executed:

    tv.tv_sec = (int)s->sock_timeout;
    ...
    if (writing)
        n = select(s->sock_fd+1, NULL, &fds, NULL, &tv);
    else
        n = select(s->sock_fd+1, &fds, NULL, NULL, &tv);

This turns into a blocking call, which then hangs the system. (Which is why
sock.setblocking(0) has to set the timeout to 0 as well :)
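
The relationship between the blocking flag and the timeout can be checked from Python itself. A small sketch (Python 3 syntax; the variable names are purely illustrative):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

sock.setblocking(False)                    # same as sock.settimeout(0)
nonblocking_timeout = sock.gettimeout()    # 0.0: calls fail rather than wait

sock.settimeout(20)                        # "timeout mode"
timeout_mode_timeout = sock.gettimeout()   # 20.0: calls may wait up to 20s

sock.setblocking(True)                     # same as sock.settimeout(None)
blocking_timeout = sock.gettimeout()       # None: calls wait indefinitely

sock.close()
```

Only the first of the three modes never waits inside the socket call, which is the property a single-threaded scheduler needs.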

> So I get that you want the socket operations to be non-blocking.  And
> non-blocking operations should fail if they can't complete rather than
> block.  But the connect operation is using a timeout of zero because
> of the blocking setting.  And it seems like the problem I'm having on
> windows is that the connection attempt never times out.

This conflates the two issues really. The real issue is simply that I
never thought of putting timeout handling into the TCPClient code, nor
where.

> So, would it be reasonable to:
> 1) setblocking(0) in runClient as it is today
> 2) In safeConnect, sock.settimeout(20)
> 3) sock.connect() as it is today
> 4) sock.settimeout(0) after the connection
>
> It seems like this would allow you to have a timeout honored for the
> connect operation without impacting non-blocking data operations post-
> connect.

From the above you should see why this isn't reasonable, but in case it
isn't clear, suppose you start 10 TCPClients as follows:

    for x in range(10):
        Pipeline( TCPClient(dest[x],port[x], connect_timeout=20),
                  OutputHandler() ).activate()

And suppose every single one is blocked. Rather than this timing out
in about 20 seconds (as it would now given the fix just put in), it would
effectively hang the system for 200 seconds, until all 10 connections time
out - effectively serialising the connection attempts. 1000 failed/filtered
consecutive connections in this manner would take 20,000 seconds or
just over 5 1/2 hours :)
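
The arithmetic can be checked with a couple of lines. This is just a sketch of the worst case described above; the function names are made up for the illustration.

```python
def serial_worst_case(n_clients, connect_timeout):
    """Blocking connects run one after another, so the waits add up."""
    return n_clients * connect_timeout


def concurrent_worst_case(n_clients, connect_timeout):
    """Non-blocking connects all wait at the same time, so one timeout covers them."""
    return connect_timeout if n_clients else 0


for n in (10, 1000):
    serial = serial_worst_case(n, 20)
    print("%d clients: %d s serialised (%.2f hours) vs %d s concurrent"
          % (n, serial, serial / 3600.0, concurrent_worst_case(n, 20)))
```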

Fundamentally that's why I've not taken this approach here :)

The fix put in, which solves the immediate issue, is here:
* http://tinyurl.com/covwp6


Michael.
-- 
http://yeoldeclue.com/blog
http://twitter.com/kamaelian
http://www.kamaelia.org/Home


[kamaelia-list] Re: Bug in SingleShotHTTPClient

2009-03-03 Thread Steve

>   class TCPClient(Axon.Component.component):
>      def __init__(self,host,port,delay=0,connect_timeout=60):
>           self.connect_timeout = connect_timeout
>           ...
>          connect_start = time.time()
>          while not self.safeConnect(sock,(self.host, self.port)):
>             if self.shutdown():
>                 return
>             if ( time.time() - connect_start ) > self.connect_timeout:
>                 self.howDied = "timeout"
>                 raise Finality
>             yield 1

I just updated my TCPClient to get the timeout you checked in.  May
I suggest rearranging the math a little to take it out of the loop:

                waitTill = time.time() + self.connect_timeout
                while not self.safeConnect(sock,(self.host, self.port)):
                   if self.shutdown():
                       return
                   if time.time() >= self.connect_timeout:
                       self.howDied = "timeout"
                       raise Finality
                   yield 1




[kamaelia-list] Re: Bug in SingleShotHTTPClient

2009-03-03 Thread Steve
Thinko, I meant:

                waitTill = time.time() + self.connect_timeout
                while not self.safeConnect(sock,(self.host, self.port)):
                   if self.shutdown():
                       return
                   if time.time() >= waitTill:
                       self.howDied = "timeout"
                       raise Finality
                   yield 1
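
As a standalone illustration of the same deadline pattern (Python 3 syntax, outside of any Kamaelia component), something like the following could be used to try it out. Everything here is hypothetical scaffolding: `try_connect` stands in for safeConnect, `should_stop` for self.shutdown(), and `ConnectTimeout` for setting howDied and raising Finality. The `clock` parameter exists only so the loop can be exercised without real waiting.

```python
import time


class ConnectTimeout(Exception):
    """Hypothetical stand-in for setting howDied and raising Finality."""


def wait_for_connect(try_connect, connect_timeout=60,
                     should_stop=lambda: False, clock=time.time):
    """Generator: retry try_connect() until it succeeds, we are asked to
    stop, or the deadline passes."""
    wait_till = clock() + connect_timeout   # computed once, outside the loop
    while not try_connect():
        if should_stop():
            return
        if clock() >= wait_till:
            raise ConnectTimeout("timeout")
        yield 1                             # hand control back to the scheduler
```

The deadline is fixed before the loop starts, so each pass only needs one clock read and one comparison.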




[kamaelia-list] Re: Bug in SingleShotHTTPClient

2009-03-02 Thread Steve

Oh heck, this bug is in the underlying TCPClient!  After spending days
developing against localhost, I now find that I can't go live without
having to do manual name resolution.  :(

--Steve


On Mar 2, 1:39 am, Steve <unetright.thebas...@xoxy.net> wrote:
> FYI
>
> SingleShotHTTPClient on windows vista goes nuts opening thousands of
> ports when making a connection to an address which requires name
> resolution and which includes a port number.
>
> SingleShotHTTPClient('http://www.google.com/') = OK
> SingleShotHTTPClient('http://www.google.com:8000/') = Kaboom
> SingleShotHTTPClient('http://66.102.7.99:8000/') = OK
> SingleShotHTTPClient('http://localhost:8000/') = OK
>
> Cheers,
> Steve





[kamaelia-list] Re: Bug in SingleShotHTTPClient

2009-03-02 Thread Michael Sparks

On Monday 02 March 2009 09:49:36 Steve wrote:
> Oh heck, this bug is in the underlying TCPClient!  After spending days
> developing against localhost, I now find that I can't go live without
> having to do manual name resolution.

That really should not be a problem (ie I've not seen that problem before).
Can you give a minimal example using TCPClient that doesn't work for you?
What platform are you under ?

I've not needed to change TCPClient with regard to basic functionality in 
several years which is why I'm asking this.

I've got a feeling it's a windows vs linux/Mac OS X thing...


Michael.
-- 
http://yeoldeclue.com/blog
http://twitter.com/kamaelian
http://www.kamaelia.org/Home




[kamaelia-list] Re: Bug in SingleShotHTTPClient

2009-03-02 Thread Michael Sparks

On Monday 02 March 2009 09:39:02 Steve wrote:
> SingleShotHTTPClient on windows vista goes nuts opening thousands of
> ports when making a connection to an address which requires name
> resolution and which includes a port number.

1> SingleShotHTTPClient('http://www.google.com/') = OK

Google is listening on port 80.

2> SingleShotHTTPClient('http://www.google.com:8000/') = Kaboom

www.google.com is not listening on port 8000

BUT they're filtering it, so rather than their system sending back a TCP
RESET, it just doesn't respond. Behaviour of Google with telnet:

~/> telnet www.google.com 8000
Trying 209.85.229.147...
[hang]

Behaviour with a system that's not listening on port 8000 AND not filtering
(meaning the TCP stack responds with a TCP RESET) :

~/code.google/kamaelia/trunk/Code/Python/Kamaelia/Examples> telnet 192.168.2.1 8001
Trying 192.168.2.1...
telnet: connect to address 192.168.2.1: Connection refused

3> SingleShotHTTPClient('http://66.102.7.99:8000/') = OK

66.102.7.99 is not listening on port 8000
(checked, it isn't - and like www.google.com it's also filtering rather than
just resetting)

4> SingleShotHTTPClient('http://localhost:8000/') = OK

No idea :)

2 & 3 should result in the same behaviour on your system, and I'm confused as
to why they don't at present. I think I need slightly more detail here to
help out fixing whatever is the issue.


Michael.
-- 
http://yeoldeclue.com/blog
http://twitter.com/kamaelian
http://www.kamaelia.org/Home




[kamaelia-list] Re: Bug in SingleShotHTTPClient

2009-03-02 Thread Michael Sparks

On Monday 02 March 2009 09:49:36 Steve wrote:
> Oh heck, this bug is in the underlying TCPClient!  After spending days
> developing against localhost, I now find that I can't go live without
> having to do manual name resolution.  :(

I've done some digging. There's definitely something odd going on here.

I got suspicious that there was a bug in TCPClient. Specifically, if you do:

    TCPClient("www.google.com", 80)

This results in a call inside TCPClient of:

    self.safeConnect(sock, ("www.google.com", 80))

Which results in (fundamentally) :

    sock.connect(("www.google.com", 80))

This has worked (correctly) for the past several years, hence the surprise.

However, your comment made me wonder if this should, on some platforms,
result in:

    IP = socket.gethostbyname("www.google.com")
    sock.connect((IP, 80))

Since that's what you'd normally do with the C sockets API.

However, re-reading both the docs for the socket module *AND* looking at the 
source for Python-2.6.1/Modules/socketmodule.c , I note that the comments in 
the socketmodule say:

/* Convert a string specifying a host name or one of a few symbolic
   names to a numeric IP address.  This usually calls gethostbyname()
   to do the work; the names "" and "<broadcast>" are special.
   Return the length (IPv4 should be 4 bytes), or negative if
   an error occurred; then an exception is raised. */

Specifically this means that socket.connect itself decodes an IP address
such as 66.102.7.99 - for example from:

    SingleShotHTTPClient('http://66.102.7.99:8000/') = OK

using the same function. Through one path it checks for something looking
like an IP and parses it; through the other path it ends up calling
gethostbyname (in getaddrinfo.c) UNLESS you have IPv6 enabled, in
which case it ends up calling getipnodebyname.
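
To make the two paths concrete, here is a rough Python 3 sketch of the same idea. It only mirrors the description above and is not how socketmodule.c is written; `is_ipv4_literal` and `resolve` are names invented for the illustration.

```python
import socket


def is_ipv4_literal(host):
    """Return True if host parses as a dotted-quad IPv4 address."""
    try:
        socket.inet_pton(socket.AF_INET, host)
        return True
    except (OSError, ValueError):
        return False


def resolve(host):
    """Literal addresses pass straight through; anything else goes to the
    resolver, which may block while it talks to DNS."""
    if is_ipv4_literal(host):
        return host
    return socket.gethostbyname(host)
```

Either way, the caller hands sock.connect() a (host, port) pair and does not need to resolve the name first.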

As a result I'm rather puzzled as to what's causing your problem here...


Michael.
-- 
http://yeoldeclue.com/blog
http://twitter.com/kamaelian
http://www.kamaelia.org/Home




[kamaelia-list] Re: Bug in SingleShotHTTPClient

2009-03-02 Thread Steve

Michael,

Thank you for taking a look at this.  I slimmed down my test case to
just:

    from Kamaelia.Internet.TCPClient import TCPClient

    if __name__ == '__main__':
        TCPClient('www.google.com', 80).run()

I also tested with this line instead:

        TCPClient('127.0.0.1', 80).run()

I was thrown off by the UDP traffic generated by the name resolution
of localhost, but the real problem is the handling of a connection
that is refused.  I have no server running on port 80 so it should
time out and return.  Instead it tries (in a seemingly infinite loop)
to make the connection.  My software firewall (set to allow all)
reports around 1 connection attempt every second.  The python process
also consumes ~85% of one of my cores during this time.

Things get even weirder when I test with:

        TCPClient('localhost', 80).run()

Here my firewall reports the same TCP connection attempts as the 127
test, but it also reports thousands of UDP packets as well.  I suspect
every second it is doing some name resolution which is generating the
UDP traffic.

The end result is that I believe Kamaelia (or the socket layer) is not
properly handling a silently refused connection.   There may be a
similar problem with UDP sockets because I was about to detail an
inability to shutdown unconnected UDP peers.

Thanks,
Steve




[kamaelia-list] Re: Bug in SingleShotHTTPClient

2009-03-02 Thread Michael Sparks

Hi Steve,


This email may read a little odd. I've been writing this whilst reading and
trying things out, saw your update, and having had a thought. As a result
the train of thought changes as I go through this, but I've left that in
since it may be of use.

On Monday 02 March 2009 21:11:53 Steve wrote:
..
> TCPClient('127.0.0.1', 80).run()
>
> I have no server running on port 80 so it should timeout and return.
> Instead it tries (in a seemingly infinite loop) to make the connection.  My
> software firewall (set to allow all) reports around 1 connection attempt
> every second.

That's rather odd. I've just tried the same thing here, and I don't see any
looping attempt to connect - it fails to connect and exits straight away:

~> python
Python 2.5.1 (r251:54863, Jan 10 2008, 18:01:57)
[GCC 4.2.1 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Kamaelia.Internet.TCPClient import TCPClient
>>> TCPClient("127.0.0.1", 81).run()


That said, I'm not running a firewall though...

I've also checked that it's passing on shutdown correctly. First the case
that doesn't exit, until control-c :

>>> from Kamaelia.Chassis.Pipeline import Pipeline
>>> from Kamaelia.Util.Console import ConsoleEchoer
>>> Pipeline( ConsoleEchoer() ).run()

And then a version that does exit because the TCPClient passes on its
shutdown on error:

~> python
Python 2.5.1 (r251:54863, Jan 10 2008, 18:01:57)
[GCC 4.2.1 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Kamaelia.Chassis.Pipeline import Pipeline
>>> from Kamaelia.Util.Console import ConsoleEchoer
>>> from Kamaelia.Internet.TCPClient import TCPClient
>>> Pipeline( TCPClient("127.0.0.1", 81), ConsoleEchoer() ).run()


All this tells me is that the situation I've tested it in and mainly used
it in still works as expected. This isn't the same as your situation.
As a result, I'm no further forward - I'm not reproducing your
behaviour/environment correctly yet.

I'm beginning to think I ought to start up a windows VM and see if I can
reproduce this there.

> The end result is that I believe Kamaelia (or the socket layer) is not
> properly handling a silently refused connection

I can see this is possible, simply because a filtered connection that
doesn't send a TCP RESET won't cause an error at the socket layer
(it just won't connect). That leads to this hanging:

>>> Pipeline( TCPClient("www.google.com", 8000), ConsoleEchoer() ).run()
...

Which is (part of) why you're after timeouts etc, and want to go after that
root cause.

> Michael, I'm starting to think that this whole TTL component that I've
> made might be completely unneeded.  I think I was just reacting to the
> bugs that I've been describing in other threads.  If we could identify
> why refused connections are infinite looping on vista and why
> unconnected udp peers are refusing to shutdown, there might be no need
> for this kind of time terminating component

I agree that going after the root cause is a good idea. (That said, the TTL
component is sufficiently generic to be useful beyond this issue, even if
we deal with it differently)

I think you're hitting a combination of things, in the main code, we have:

                while not self.safeConnect(sock,(self.host, self.port)):
                   if self.shutdown():
                       return
                   yield 1

This in combination with the other parts is probably why you're seeing a
looping connect attempt.

This goes into safeConnect, and if you're connecting to a filtered
connection, it would hit this logic path in safeConnect:
          try:
             sock.connect(*sockArgsList); # Expect socket.error: (115, 'Operation now in progress')

          except socket.error, socket.msg:
             (errorno, errmsg) = socket.msg.args
             if errorno==errno.EALREADY:
                # The socket is non-blocking and a previous connection attempt has not yet been completed
                # We handle this by allowing  the code to come back and repeatedly retry
                # connecting. This is a valid, if brute force approach.
                assert(self.connecting==1)
                return False
             elif errorno==errno.EINPROGRESS or errorno==errno.EWOULDBLOCK:
                #The socket is non-blocking and the connection cannot be completed immediately.
                # We handle this by allowing  the code to come back and repeatedly retry
                # connecting. Rather brute force.
                self.connecting=1
                return False # Not connected should retry until no error

ie it hits one of these three conditions.
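
The same decision can be written down in isolation. The sketch below (Python 3 syntax) is an illustration rather than the safeConnect code: `classify_connect_error` is a made-up name, and treating EISCONN as "connected" is an assumption that is not taken from the excerpt above.

```python
import errno

# Codes meaning "the non-blocking connect is still under way, come back later".
RETRY_ERRNOS = frozenset([errno.EALREADY, errno.EINPROGRESS, errno.EWOULDBLOCK])


def classify_connect_error(err):
    """Map an error code from a non-blocking connect attempt to an action."""
    if err == 0 or err == errno.EISCONN:   # EISCONN handling is an assumption
        return "connected"
    if err in RETRY_ERRNOS:
        return "retry"                     # the three conditions listed above
    return "fail"                          # anything else is not handled
```

With a filtered destination the code never leaves the "retry" branch, because no RESET ever arrives to turn the pending attempt into an error.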

For what it's worth a non-filtered not connected socket hits this path:
          try:
             sock.connect(*sockArgsList); # Expect socket.error: (115, 'Operation now in progress')

          except socket.error, socket.msg:
             (errorno, errmsg) = socket.msg.args
             if errorno==errno.EALREADY:

             # Anything else is an error we don't handle
             else:
                raise socket.msg

That 

[kamaelia-list] Re: Bug in SingleShotHTTPClient

2009-03-02 Thread Steve

Michael,

Thank you again for looking into this.  Also, I don't have a lot of
network code experience in python, so please take all this with ample
salt.

> I'm beginning to think I ought to start up a windows VM and see if I can
> reproduce this there.

I would like to see big K be well supported on windows.  And I'd be
happy to help in any way I could.  I can't donate a microsoft license
to you, but I can test locally.  I bet you could get MS to donate a
license.  But if you need any help setting up a VM, let me know.  I
use vmware and virtualbox.

> Now there's several ways that I could go down this, but I can see that
> probably the simplest would be to add a connection timeout this way:
>
> class TCPClient(Axon.Component.component):
>    def __init__(self,host,port,delay=0,connect_timeout=60):
>         self.connect_timeout = connect_timeout
>         ...
>        connect_start = time.time()
>        while not self.safeConnect(sock,(self.host, self.port)):
>           if self.shutdown():
>               return
>           if ( time.time() - connect_start ) > self.connect_timeout:
>               self.howDied = "timeout"
>               raise Finality
>           yield 1


I am +1 to the idea of including a timeout parameter.  In fact, I
think every network operation call inside Kamaelia should expose a
defaulted timeout parameter.  That said, my gut feeling is that the
timeout should be handled at the lowest level possible and then
exposed all the way up the call tree.  For example:

> connection, it would hit this logic path in safeConnect:
>           try:
>              sock.connect(*sockArgsList); # Expect socket.error: (115, 'Operation now in progress')
>
>           except socket.error, socket.msg:
>              (errorno, errmsg) = socket.msg.args
>              if errorno==errno.EALREADY:
>                 # The socket is non-blocking and a previous connection attempt has not yet been completed
>                 # We handle this by allowing  the code to come back and repeatedly retry
>                 # connecting. This is a valid, if brute force approach.
>                 assert(self.connecting==1)
>                 return False
>              elif errorno==errno.EINPROGRESS or errorno==errno.EWOULDBLOCK:
>                 #The socket is non-blocking and the connection cannot be completed immediately.
>                 # We handle this by allowing  the code to come back and repeatedly retry
>                 # connecting. Rather brute force.
>                 self.connecting=1
>                 return False # Not connected should retry until no error

Here we have EALREADY, EINPROGRESS and EWOULDBLOCK.  I think there
needs to be a way to timeout these connection attempts rather than
simply not starting another attempt after some timeout period.  Why
would we want to keep retrying the connection during the timeout
period?  I think it should only make 1 single connection attempt and
wait at most timeout period for success.
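
To show what "one attempt, then wait" might look like, here is a rough Python 3 sketch. It is not Kamaelia code and has not been tried against the Vista behaviour discussed in this thread; `connect_once` is a made-up name. It starts a single non-blocking connect, then polls select() with a zero timeout so the caller is never held up, and reads SO_ERROR to find out how the attempt ended.

```python
import errno
import os
import select
import socket
import time


def connect_once(host, port, connect_timeout=60.0):
    """Generator: issue ONE non-blocking connect, then poll for its outcome.

    Yields while the attempt is pending and returns the connected socket.
    Note: if `host` is a name, resolving it still blocks inside connect_ex().
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    err = sock.connect_ex((host, port))          # the single attempt
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        raise OSError(err, os.strerror(err))
    deadline = time.monotonic() + connect_timeout
    while True:
        # Zero timeout: select() returns at once. Some platforms report a
        # failed connect as "exceptional" rather than writable, so watch both.
        _, writable, failed = select.select([], [sock], [sock], 0)
        if writable or failed:
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            if err:
                sock.close()
                raise OSError(err, os.strerror(err))
            return sock
        if time.monotonic() >= deadline:
            sock.close()
            raise TimeoutError("connect timed out")
        yield 1                                   # let other work run
```

A scheduler would call next() on the generator until it finishes; the connected socket arrives as the value of the StopIteration.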

On an only slightly related note, I remember reading a posting once by
Glyph where he was promoting one of the values for using twisted.  He
said that they had spent considerable time testing and debugging the
frustratingly disparate socket behaviors on the 3 major platforms and
only twisted really did the right thing while still exposing a
uniform framework interface.  My brain doesn't fit twisted, but now
I'm starting to appreciate what he was talking about.

Thanks,
Steve