[kamaelia-list] Re: Windows socket errors and timeouts

Michael Sparks Tue, 03 Mar 2009 15:06:26 -0800

On Tuesday 03 March 2009 22:11:11 Steve wrote:
> The real problem is we need a way to set a timeout on the connection
> attempt in the background without making it blocking.

Yes, this is what I've done :-)

OK, not without sucking CPU, but I did say "The cost at present is higher
CPU usage than would be ideal". I didn't make it clear that I can also see
how we resolve that point.

Regarding your concerns around WSAEINVAL, you may wish to be aware that
what I'm doing mirrors what Twisted does inside
twisted.internet.tcp.BaseClient.doConnect.

Furthermore, there's an explanatory comment there:
    # on Windows EINVAL means sometimes that we should keep trying:
    # http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winsock/winsock/connect_2.asp

If you follow this back, you find the referred to rationale:

Until the connection attempt completes on a nonblocking socket, all
subsequent calls to connect on the same socket will fail with the
error code WSAEALREADY, and WSAEISCONN when the connection
completes successfully. Due to ambiguities in version 1.1 of the
Windows Sockets specification, error codes returned from connect
while a connection is already pending may vary among implementations.
As a result, it is not recommended that applications use multiple calls
to connect to detect connection completion. If they do, they must be
prepared to handle WSAEINVAL and WSAEWOULDBLOCK error values the
same way that they handle WSAEALREADY, to assure robust operation.

This tracks with what I've seen in the past (this code was added a long
while back - 6 line fix - IIRC by a colleague, 4 years ago? :-).
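
As a minimal sketch of the rule quoted above (not the actual Kamaelia or
Twisted code), the return value of connect_ex() can be sorted into three
buckets. The function name classify_connect_result and its windows parameter
are hypothetical, introduced only for illustration, and treating EINVAL as
"keep trying" is assumed to be appropriate on Windows only:

```python
import errno
import sys

# Return values that mean the connection is established.
_CONNECTED = {0, errno.EISCONN}

# Return values that mean "still in progress, try again later".
_PENDING = {errno.EINPROGRESS, errno.EALREADY, errno.EWOULDBLOCK}

# Per the MSDN text, Windows may also report EINVAL while a connect is
# pending. The WSA* names only exist in the errno module on Windows.
_WINDOWS_ONLY_PENDING = {errno.EINVAL, getattr(errno, "WSAEINVAL", errno.EINVAL)}


def classify_connect_result(err, windows=sys.platform == "win32"):
    """Map a connect_ex() return value to 'connected', 'pending' or 'failed'."""
    if err in _CONNECTED:
        return "connected"
    if err in _PENDING:
        return "pending"
    if windows and err in _WINDOWS_ONLY_PENDING:
        return "pending"
    return "failed"
```

Elsewhere EINVAL is left as a genuine failure, which is why the Windows check
is explicit rather than folded into the general "pending" set.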

The underlying issue you keep running into is that a socket in genuinely
blocking mode has no provision for timeouts. For example, the code in the
Python socket module that you're seeking to use looks like this:

http://pastebin.com/m1e2171fd

In order for that code to work, the underlying code does this:
    if (defaulttimeout >= 0.0)
        internal_setblocking(s, 0);

or for sock_settimeout, this line:
    internal_setblocking(s, timeout < 0.0);

The upshot is this: if you set a timeout, Python internally switches the
socket to non-blocking. Then any operation that can fail - for example
connecting - results (on Windows) in a select call that waits for the
operation to complete - cf:
    res = select(s->sock_fd+1, NULL, &fds, &fds_exc, &tv);

That &tv is the actual timeout you set originally, and then it's blocking
on select. Now the way we'd do this properly in Kamaelia is to get the
TCPClient to get access to the Selector service, and to ask the selector
service to let the TCPClient know when the socket is ready to read.
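
The "non-blocking socket plus select with a timeout" pattern described above
can be sketched in Python roughly as follows. This is an illustration of the
technique, not the code in the socket module or in Kamaelia, and the function
name connect_with_timeout is hypothetical:

```python
import errno
import select
import socket


def connect_with_timeout(address, timeout):
    """Return a connected TCP socket, or raise socket.timeout / OSError."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)  # same effect as internal_setblocking(s, 0)
    try:
        err = sock.connect_ex(address)
        if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            raise OSError(err, "connect failed immediately")
        if err != 0:
            # Block in select, not in connect: wait until the socket is
            # writable (connected) or flagged as exceptional (failed).
            _, writable, failed = select.select([], [sock], [sock], timeout)
            if not writable and not failed:
                raise socket.timeout("connect timed out")
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            if err != 0:
                raise OSError(err, "connect failed")
    except Exception:
        sock.close()
        raise
    return sock
```

The caller still blocks here, only inside select rather than inside connect,
which is why this shape does not drop straight into a generator component.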

BUT in the error case - which we're dealing with, the Selector would never
re-awaken the TCPClient since it's an error case. So waiting for the selector
would need to have a timeout mechanism itself. ie fundamentally we would
still need the timeout mechanism I added.
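
For illustration, a brute-force timeout of that kind can be sketched as a
generator that retries the connect and hands control back to the scheduler
between attempts. This is not the actual TCPClient code: the name
connect_with_retries, the yielded strings and the clock parameter are all
hypothetical, and treating EINVAL as retryable is assumed to be wanted (as on
Windows):

```python
import errno
import time


def connect_with_retries(sock, address, timeout, clock=time.monotonic):
    """Retry a non-blocking connect until it succeeds or the deadline passes.

    Yields "pending" after each unsuccessful attempt so other components can
    run, then yields "connected" or "timeout" once and stops.
    """
    retryable = (errno.EINPROGRESS, errno.EALREADY,
                 errno.EWOULDBLOCK, errno.EINVAL)
    deadline = clock() + timeout
    while True:
        err = sock.connect_ex(address)
        if err in (0, errno.EISCONN):
            yield "connected"
            return
        if err not in retryable:
            raise OSError(err, "connect failed")
        if clock() >= deadline:
            yield "timeout"
            return
        yield "pending"
```

Nothing in this loop waits, so it retries as fast as the scheduler calls it,
which is where the extra CPU usage comes from.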

That's the sort of thing that twisted implements with deferreds, and in
threaded components you can implement both aspects with self.pause().

The nice thing though about doing that with self.pause() is that it then
would give self.pause() the same sort of semantics for generator components
as it does for threaded components.

But beyond that, when we fail we also need to tell the selector that we no
longer want it to notify us, i.e. that we're done using it.

That's easy enough to do btw, but all of this is a significant jump in
complexity over what we currently have, which is why I've initially gone for
the simpler timeout mechanism. (ie to get something working correctly before
optimising it - which is what this would be)

This is admittedly a little complex, and not something most users have to ever
deal with, and it's fundamentally an optimisation really. However it makes
sense to address that now since there's a real case that needs it fixed :-)

Oh, as for this which has come in as I was typing:
> I don't understand why the TCPClient code only
> sees an infinite set of WSAEINVALIDs.

Sheer speed. However fast you type, you're unlikely to be able to repeat
anything manually faster than once every 8 to 20 ms (best case). That's at
least 2-3 orders of magnitude slower than Python will retry.

Please bear in mind that the code you're critiquing does in fact critique
itself:
"Rather brute force".

My personal view on dealing with this is this:
* Get the code working as it should - ie allow timeouts to occur.
* Get the code working such that it doesn't suck your CPU as it's
doing (ie performance improvement)
* Then refactor that "not sucking" CPU into nicer, more readable,
more reusable, modular code.

I think we've got to stage 1, and are now on stage 2.

Michael
--
http://yeoldeclue.com/blog
http://twitter.com/kamaelian
http://www.kamaelia.org/Home
