On 31/03/2009, at 1:12 AM, Erick Tryzelaar wrote:

> On Mon, Mar 30, 2009 at 12:49 AM, john skaller
> <skal...@users.sourceforge.net> wrote:
>> Yes. It never works properly, and is impossible to maintain on
>> multiple platforms.
>>
>> Also, Async TCP/IP is stuffed as we found out... someone should
>> write a paper on that and send it to some conference, it is a
>> SERIOUS issue costing billions of dollars and compromising world
>> network security.
>
> Which bugs are you referring to? Maybe the erlang folks have some
> ideas on working with it. Ulf Wiger, you still subscribed?
no no, this is a *design fault* in the TCP protocol C interface (sockets). Roughly speaking, SO_LINGER does not work with asynchronous sockets. Under Linux this means that when you close a socket, the buffers are simply flushed. This means you CANNOT close the socket (at least without setting a user space timer ***), because when you do, any transmissions you've made asynchronously might be lost.

The only reliable way to close a socket is with a synchronous close. To make that asynchronous, you have to launch a pthread for every socket, imposing a massive overhead and pthread-related limits on your program, and defeating the purpose of asynchronous I/O.

This bug manifested immediately in tools/webserver, so I have no idea how millions of people are doing high performance web stuff on Posix (Windows probably doesn't have this issue, a good reason to switch to Windows for networking .. arrrgggghhh :)

The guts of the problem is this: a webserver MUST use a finite number of threads to manage an unbounded number of sockets (any bounds are applied by connect failures or user counters). Unbounded threads aren't tenable because the OS can't schedule them fast enough (and threads are resource hungry).

Given the above assumptions, that we have to use async I/O, Felix uses just two threads: one uses notifications such as epoll to perform synchronous I/O on behalf of the client thread (which uses Felix f-threads for servicing, since these schedule in O(1)). All this works just fine.

The problem is that we want to avoid Denial of Service (DoS) attacks by a rogue client, but on the other hand clients can make requests of unbounded length. So we read until End of Message or until a maximum number of bytes has been read (in the latter case we can close the socket, assuming a rogue client). All fine.

Now also, clients can read/write data very slowly. In fact, they can read/write a bit and then just hang, and this blocks the socket -- another DoS attack possibility. So to write/read reliably, we have to set a timer to trigger even if the socket doesn't report itself ready (or use the notification service timer and/or any OS level facilities). Then if an I/O op fails (reads/writes 0 bytes) we can also close the socket. Otherwise we have progress at some minimal rate.

So by the above algorithm we can read and write everything from and to the client browser, and if the browser tries to flood-write to us, or if it tries to starve on either read or write, we can detect it with a timer and a progress failure. Note there is NO OTHER way to do this with sockets.

The problem is that there is no way to delay the actual close when closing a socket in async mode. For synchronous sockets, the TCP gurus decided reliability wasn't possible without SO_LINGER. This causes a close on a socket to hang for a while to give it a chance to finish writing before closing the underlying socket: without lingering, EVERY write followed by a close would fail! The reason is that the C interface is synchronous, but the underlying transport is not. So you basically have to say "if the writes don't go through in X seconds, we're not going to waste resources any longer, kill the connection".

The gurus stuffed up. SO_LINGER must work on asynchronous sockets too. Although async sockets cannot return an error code on closing, and the close function SHOULD return immediately, the OS should NOT be allowed to simply flush the buffers and close the socket (Linux DOES). It should wait the SO_LINGER time before doing that, otherwise there is NO possibility of a previous write succeeding. And that's what happens.
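To pin the failure mode down, here is a minimal C sketch of the reply path that goes wrong (the function name and arguments are illustrative, not from the Felix sources):

    /* Minimal sketch of the failure mode: on a non-blocking socket,
       write() may merely queue data in the kernel send buffer, and
       close() (which, per the behaviour described above, does not
       honour SO_LINGER here) may then drop that data instead of
       sending it. The client sees a truncated page. */
    #include <unistd.h>

    void reply_and_close(int fd, const char *page, size_t len)
    {
        (void)write(fd, page, len);  /* may only queue the bytes */
        close(fd);                   /* returns at once; queued bytes
                                        may be discarded, not sent */
    }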
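The pthread-per-socket workaround mentioned above looks roughly like this; a sketch only, with a 10 second linger picked out of the air:

    /* Hand the fd to a short-lived pthread, put it back in blocking
       mode, set SO_LINGER, and let close() block until the send
       buffer drains or the linger timeout expires. One thread per
       closing socket: exactly the overhead complained about above. */
    #include <pthread.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/socket.h>

    static void *lingering_close(void *arg)
    {
        int fd = (int)(long)arg;

        /* restore blocking mode so close() is allowed to wait */
        int flags = fcntl(fd, F_GETFL, 0);
        fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);

        /* linger up to 10 seconds trying to flush pending writes */
        struct linger lg = { 1, 10 };
        setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);

        close(fd);  /* blocks this thread, not the event loop */
        return 0;
    }

    void close_async_socket(int fd)
    {
        pthread_t t;
        pthread_create(&t, 0, lingering_close, (void *)(long)fd);
        pthread_detach(t);
    }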
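And for completeness, the shape of the event loop with the progress timer described above. The constants and the helpers handle_io(), kill_connection() and reap_stalled() are assumptions for illustration, not any real API:

    /* One event thread services every socket. epoll_wait() is given
       a timeout so a client that just hangs is noticed even though
       its socket never reports ready. */
    #include <stddef.h>
    #include <sys/epoll.h>

    #define TIMEOUT_MS 5000   /* minimal acceptable progress interval */
    #define MAX_REQ    65536  /* cap on request size: flood guard */

    typedef struct { int fd; size_t bytes_read; } conn_t;

    extern int  handle_io(conn_t *c);        /* returns bytes moved */
    extern void kill_connection(conn_t *c);  /* close and forget */
    extern void reap_stalled(void);          /* kill stalled conns */

    void event_loop(int epfd)
    {
        struct epoll_event ev[64];
        for (;;) {
            int n = epoll_wait(epfd, ev, 64, TIMEOUT_MS);
            if (n == 0) { reap_stalled(); continue; } /* timer fired */
            if (n < 0) continue;                      /* EINTR etc. */
            for (int i = 0; i < n; ++i) {
                conn_t *c = ev[i].data.ptr;
                /* zero bytes moved on a "ready" socket is a progress
                   failure; exceeding the byte cap marks a rogue
                   client */
                if (handle_io(c) == 0 || c->bytes_read > MAX_REQ)
                    kill_connection(c);
            }
        }
    }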
Clients of the Felix webserver lose the end of the page being downloaded a good fraction of the time, almost completely reliably when the page is long (and the connection is to another computer).

*** Setting a user space timer on close is NOT ACCEPTABLE because it leads to a DoS attack based on opening too many sockets: a socket that would ordinarily be closed in milliseconds may be held onto for seconds, starving the system of free sockets. The timer MUST be implemented in the OS (TCP library) so that the socket can be closed when the data is transferred OR the time limit is up, whichever comes first. It is NOT possible for the client to test if the data is transmitted.

Hence THIS IS A BUG IN THE POSIX SOCKET INTERFACE. HIGH PERFORMANCE (I.E. ASYNCHRONOUS) SOCKET I/O CANNOT BE MADE RELIABLE.

I hope I'm wrong .. but I doubt it. At least Linux should be fixed; at the moment it is screwed.

Note: a summary of the problem shows people just didn't think. Async I/O clearly implies async close. It's stupidity to have buffered I/O and an unbuffered close. The buffering (delay) must be inside the OS. Consequently there's no way to know whether the close succeeded in writing all the data or not. This COULD be fixed by a notification signal (e.g. on Linux, adding a case to the epoll service), though that is non-trivial because the socket can't be identified by its descriptor, which after the close is invalid.

Another solution would be to be able to TEST whether the underlying transport was ready. This can be done for lower level I/O operations (meaning lower down the ISO stack), and it can be done to see if you can READ, but there is no way to test whether the write buffers are empty.

--
john skaller
skal...@users.sourceforge.net