Re: SO_SNDTIMEO: 2.4 kernel bugs
[EMAIL PROTECTED] said: > I think the proper fix, long term, is to fix our internal I/O routine > APIs so that they are capable of returning a byte count _and_ an > error. One day, that might be a useful thing to export to userspace. The MTD code already does this. -- dwmw2 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SO_SNDTIMEO: 2.4 kernel bugs
On Mon, 19 Feb 2001 [EMAIL PROTECTED] wrote: > Wakeup does not happen until _enough_ (1/3 of snbuf) of space in sndbuf > is released, otherwise you will overschedule. So, as soon as > write() goes to sleep, it will sleep waiting until 1/3 is released. Of course. Thank you. > If it is interrupted, it use all the released space immediately before > exit. Again, to make more for in this context. This can be even wrong > and, probably, we should return instantly with -EAGAIN/-EINTR/partial > count, but it is most likely suboptimal (though I have already changed > this to instant return). But this does not look essential from > caller's viewpoint, except for sendfile() of course. 8) Cool. I think the proper fix, long term, is to fix our internal I/O routine APIs so that they are capable of returning a byte count _and_ an error. One day, that might be a useful thing to export to userspace. Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SO_SNDTIMEO: 2.4 kernel bugs
Hello! > You are right - our sendfile() implementation is broken. I have fixed it Thank you! > Investigation shows that the Linux network layer is behaving oddly. It > seems that we are writing 4096 bytes to a socket. This proceeds in 4096 > byte chunks until the send buffer on the socket is full, and a 4096 byte > write blocks. This blocking write is eventually interrupted by the > timeout, and the write call returns.. wait for it.. 4096! This suggests > there was socket space after all, and the call should not have blocked. Wakeup does not happen until _enough_ (1/3 of snbuf) of space in sndbuf is released, otherwise you will overschedule. So, as soon as write() goes to sleep, it will sleep waiting until 1/3 is released. If it is interrupted, it use all the released space immediately before exit. Again, to make more for in this context. This can be even wrong and, probably, we should return instantly with -EAGAIN/-EINTR/partial count, but it is most likely suboptimal (though I have already changed this to instant return). But this does not look essential from caller's viewpoint, except for sendfile() of course. 8) Alexey - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
sendfile() breakage was Re: SO_SNDTIMEO: 2.4 kernel bugs
On Mon, 19 Feb 2001, Chris Evans wrote: > > BTW, if you have enough fast network, you probably can observe > > that sendfile() is even not interrupted by signals. 8) But this > > is possible to fix at least. BTW the same fix will repair SO_*TIMEO > > partially, i.e. it will timeout after n*timeo, where n is an arbitrary > > number not exceeding size/sndbuf. > > Hi Alexey, > > You are right - our sendfile() implementation is broken. I have fixed it > (patch at end of mail). Actually the whole mess stems from our broken internal ->write() and ->read() APIs. The _single_ return value is trying to convery _two_ pieces of information - always a bad move. They are: 1) Success/failure (and error code if it's a failure) 2) Amount of bytes read or written This bogon does not allow for the following information to be returned (assume I asked for 8192 bytes to be written): "4096 bytes were written, and the operation was aborted due to EINTR" Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SO_SNDTIMEO: 2.4 kernel bugs
On Sun, 18 Feb 2001 [EMAIL PROTECTED] wrote: > Hello! > > > Unfortunately, I discovered a bug with SO_SNDTIMEO/sendfile(): > > None of the options apply to sendfile(). It is not socket level > operation. You have to use alarm for it. > > BTW, if you have enough fast network, you probably can observe > that sendfile() is even not interrupted by signals. 8) But this > is possible to fix at least. BTW the same fix will repair SO_*TIMEO > partially, i.e. it will timeout after n*timeo, where n is an arbitrary > number not exceeding size/sndbuf. Hi Alexey, You are right - our sendfile() implementation is broken. I have fixed it (patch at end of mail). However, I believe something is still wrong in the networking layer, even with my fix applied. Before I go into details, I want to step back and describe things from a _users_ perspective. That is most important after all. Take two different operations: write() to a socket and sendfile() down a socket. In both cases, the socket has a send timeout of 10 seconds. From a users' point of view, these are two socket write operations. The source of data is different (a buffer or a file descriptor), but that is irrelevant. The user has the right to expect a timeout after 10 seconds of no progress, on both operations. I have tried this on FreeBSD, and this is what happens: both sendfile() and write() timeout in the same way. On Linux, this is not the case => bug. I fixed a small sendfile() issue, which did not recognise partial writes as an interruption, but as I said above, the bug still remains. Investigation shows that the Linux network layer is behaving oddly. It seems that we are writing 4096 bytes to a socket. This proceeds in 4096 byte chunks until the send buffer on the socket is full, and a 4096 byte write blocks. This blocking write is eventually interrupted by the timeout, and the write call returns.. wait for it.. 4096! This suggests there was socket space after all, and the call should not have blocked. I wonder what is going on? I'd like to get this fixed. I think the FreeBSD behaviour is definitely correct and we want it on Linux. Cheers Chris --- filemap.c.old Sun Feb 18 23:35:06 2001 +++ filemap.c Mon Feb 19 00:13:38 2001 @@ -1062,7 +1062,7 @@ for (;;) { struct page *page, **hash; - unsigned long end_index, nr; + unsigned long end_index, nr, actor_ret; end_index = inode->i_size >> PAGE_CACHE_SHIFT; if (index > end_index) @@ -1110,13 +1110,13 @@ * "pos" here (the actor routine has to update the user buffer * pointers and the remaining count). */ - nr = actor(desc, page, offset, nr); - offset += nr; + actor_ret = actor(desc, page, offset, nr); + offset += actor_ret; index += offset >> PAGE_CACHE_SHIFT; offset &= ~PAGE_CACHE_MASK; page_cache_release(page); - if (nr && desc->count) + if (actor_ret == nr && desc->count) continue; break; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SO_SNDTIMEO: 2.4 kernel bugs
Hello! > .. unless that page was partially written, in which case a short write > count is returned (rather than a timeout error), and the loop goes around > again. sendfile() does not return on partial write and tries to push more until error. On fast link it most likely succeeds, so that it is unkillable even with SIGKILL. > Which is good, because SO_SNDTIMEO is an inactivity monitor. Then why did you blame? 8)8) I do not think so. It is rather scheduling breaker. If connection is idle 99% of time, but wakes each sndtimeo-1usec, it must yuild, otherwise thread is lost for production. Alexey - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SO_SNDTIMEO: 2.4 kernel bugs
On Sun, 18 Feb 2001 [EMAIL PROTECTED] wrote: > Hello! > > > So the actual timeout would be 2 * SO_SNDTIMEO. > > It will timeout if write of some page blocks for SO_SNDTIMEO. .. unless that page was partially written, in which case a short write count is returned (rather than a timeout error), and the loop goes around again. > If transmission of any page never takes more than SO_SNDTIMEO it never > times out. Which is good, because SO_SNDTIMEO is an inactivity monitor. Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SO_SNDTIMEO: 2.4 kernel bugs
Hello! > So the actual timeout would be 2 * SO_SNDTIMEO. It will timeout if write of some page blocks for SO_SNDTIMEO. If transmission of any page never takes more than SO_SNDTIMEO it never times out. You can think about sendfile() as subroutine doing: for (;;) { read(4K from fdin); write(4K to fdout); } All the options apply to each read()/write() separetely, so that... alas. Alexey - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SO_SNDTIMEO: 2.4 kernel bugs
On Sun, 18 Feb 2001 [EMAIL PROTECTED] wrote: > Hello! > > > Unfortunately, I discovered a bug with SO_SNDTIMEO/sendfile(): > > None of the options apply to sendfile(). It is not socket level > operation. You have to use alarm for it. Hi Alexey, Actually sendfile() _does_ timeout using SO_SNDTIMEO. It just takes longer to timeout because the kernel sendfile() page loop will (usually) need to timeout a short write, and then timeout a 0 byte write. So the actual timeout would be 2 * SO_SNDTIMEO. Unfortunately, I'm seeing timeout at (I think) 3 * SO_SNDTIMEO, which I can't account for. Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SO_SNDTIMEO: 2.4 kernel bugs
Hello! > Unfortunately, I discovered a bug with SO_SNDTIMEO/sendfile(): None of the options apply to sendfile(). It is not socket level operation. You have to use alarm for it. BTW, if you have enough fast network, you probably can observe that sendfile() is even not interrupted by signals. 8) But this is possible to fix at least. BTW the same fix will repair SO_*TIMEO partially, i.e. it will timeout after n*timeo, where n is an arbitrary number not exceeding size/sndbuf. Alexey - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SO_SNDTIMEO: 2.4 kernel bugs
Hi, By the way - I tested SO_RCVLOWAT, another 2.4 addition. Good news this time - seems to work fine. Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SO_SNDTIMEO: 2.4 kernel bugs
Hi Alexey, This patch fixes my simple read()/write() tests, nice one. The behaviour also now matches BSD (someone kindly donated me a FreeBSD shell for testing). Unfortunately, I discovered a bug with SO_SNDTIMEO/sendfile(): - Connect an AF_INET, SOCK_STREAM socket to a local listening socket. - Set 5 seconds SO_SNDTIMEO on the connected socket - Do a sendfile() from a big file down the connected socket. Make sure the size is big (e.g. 1Mb) so the call blocks. --> BUG!! The call blocks indefinitely rather than being interrupted after 5 seconds. Cheers Chris On Sat, 17 Feb 2001 [EMAIL PROTECTED] wrote: > Hello! > > > Unfortunately, it seems to be very buggy. Here are two buggy scenarios. > > > --- ../vger3-010210/linux/net/ipv4/tcp.c Sat Feb 10 23:16:51 2001 > +++ linux/net/ipv4/tcp.c Sat Feb 17 23:27:43 2001 > @@ -691,6 +691,8 @@ > > set_current_state(TASK_INTERRUPTIBLE); > > + if (!timeo) > + break; > if (signal_pending(current)) > break; > if (tcp_memory_free(sk) && !vm_wait) > --- ../vger3-010210/linux/net/core/sock.c Tue Jan 30 21:20:16 2001 > +++ linux/net/core/sock.c Sat Feb 17 23:27:44 2001 > @@ -727,6 +727,8 @@ > clear_bit(SOCK_ASYNC_NOSPACE, &sk->socket->flags); > add_wait_queue(sk->sleep, &wait); > for (;;) { > + if (!timeo) > + break; > if (signal_pending(current)) > break; > set_bit(SOCK_NOSPACE, &sk->socket->flags); > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SO_SNDTIMEO: 2.4 kernel bugs
Alexey, Damn you are quick! :) Testing immediately Cheers Chris On Sat, 17 Feb 2001 [EMAIL PROTECTED] wrote: > Hello! > > > Unfortunately, it seems to be very buggy. Here are two buggy scenarios. > > > --- ../vger3-010210/linux/net/ipv4/tcp.c Sat Feb 10 23:16:51 2001 > +++ linux/net/ipv4/tcp.c Sat Feb 17 23:27:43 2001 > @@ -691,6 +691,8 @@ > > set_current_state(TASK_INTERRUPTIBLE); > > + if (!timeo) > + break; > if (signal_pending(current)) > break; > if (tcp_memory_free(sk) && !vm_wait) > --- ../vger3-010210/linux/net/core/sock.c Tue Jan 30 21:20:16 2001 > +++ linux/net/core/sock.c Sat Feb 17 23:27:44 2001 > @@ -727,6 +727,8 @@ > clear_bit(SOCK_ASYNC_NOSPACE, &sk->socket->flags); > add_wait_queue(sk->sleep, &wait); > for (;;) { > + if (!timeo) > + break; > if (signal_pending(current)) > break; > set_bit(SOCK_NOSPACE, &sk->socket->flags); > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SO_SNDTIMEO: 2.4 kernel bugs
Hello! > Unfortunately, it seems to be very buggy. Here are two buggy scenarios. --- ../vger3-010210/linux/net/ipv4/tcp.cSat Feb 10 23:16:51 2001 +++ linux/net/ipv4/tcp.cSat Feb 17 23:27:43 2001 @@ -691,6 +691,8 @@ set_current_state(TASK_INTERRUPTIBLE); + if (!timeo) + break; if (signal_pending(current)) break; if (tcp_memory_free(sk) && !vm_wait) --- ../vger3-010210/linux/net/core/sock.c Tue Jan 30 21:20:16 2001 +++ linux/net/core/sock.c Sat Feb 17 23:27:44 2001 @@ -727,6 +727,8 @@ clear_bit(SOCK_ASYNC_NOSPACE, &sk->socket->flags); add_wait_queue(sk->sleep, &wait); for (;;) { + if (!timeo) + break; if (signal_pending(current)) break; set_bit(SOCK_NOSPACE, &sk->socket->flags); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
SO_SNDTIMEO: 2.4 kernel bugs
Hi, I was glad to see Linux gain SO_SNDTIMEO in kernel 2.4. It is a very use feature which can avoid complexity and pain in userspace programs. Unfortunately, it seems to be very buggy. Here are two buggy scenarios. 1) Create a socketpair(), PF_UNIX, SOCK_STREAM. Set a 5 second SO_SNDTIMEO on the socket. write() 100k down the socket in one write(), i.e. enough to cause the write to have to block. --> BUG!!! The call blocks indefinitely instead of returning after 5 seconds (Note that the same test but with SO_RCVTIMEO and a read() works as expected - I get EAGAIN after 5 seconds). 2) Create a localhost listening socket - AF_INET, SOCK_STREAM. Connect to the listening port Set a 5 second SO_SNDTIMEO on the socket. write() 1Mb down the socket in one write(), i.e. enough to cause it to have to block -> The write() will return after 5 seconds with a partial write count. GOOD! Repeat the write() - send another 1Mb. --> BUG!! The call blocks indefinitely instead of returning with EAGAIN after 5s. I hope this is detailled enough. I'm trying to gain access to a FreeBSD box to compare results.. Cheers Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/