Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
> BTW, there was a query about this back in Feb. 2006 on linux-kernel, > nobody replied, he reposted to linux-net in September 2006 and this > is likely where this kernel bugzilla comes from :-) > > This is not a kernel bug, let's close this and move on. Agreed closed -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
From: Evgeniy Polyakov <[EMAIL PROTECTED]> Date: Thu, 22 Feb 2007 16:46:21 +0300 > On Thu, Feb 22, 2007 at 05:31:38AM -0800, David Miller ([EMAIL PROTECTED]) > wrote: > > TCP_CLOSE is where we end up on a non-error close too, this has > > the same kind of bug as your previous attempt to set POLLERR > > here. > > > > One side gets TCP_TIMEWAIT the other goes straight to TCP_CLOSE. > > > > It really is not possible to change current semantics, they are the > > best possible unfortunately. > > AS a last attempt - we can have a sockt flag set when sk-err is > installed in tcp_reset(), and tcp_poll() will set POLLERR if that flag > exist (just like it checks for sk_err, which is cleared in tcp_sendmsg() > when error is being returned). Oh is that the problem? Someone sees a fatal connection error from write() then attempts to poll() the socket? That is illegal. Socket is dead, you cannot do anything reasonable with it and you know the socket is errored so there is nothing you can possibly try to poll() on it for. One should close() the file descriptor at this point. Even getpeername() cannot work at this point, since socket is closed and has lost identity. Socket errors are delivered as unique events, once error is delivered the socket is not in error state any more, it is instead closed. That's why we clear sk->sk_err after error delivery. BTW, there was a query about this back in Feb. 2006 on linux-kernel, nobody replied, he reposted to linux-net in September 2006 and this is likely where this kernel bugzilla comes from :-) This is not a kernel bug, let's close this and move on. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
On Thu, Feb 22, 2007 at 05:31:38AM -0800, David Miller ([EMAIL PROTECTED]) wrote: > TCP_CLOSE is where we end up on a non-error close too, this has > the same kind of bug as your previous attempt to set POLLERR > here. > > One side gets TCP_TIMEWAIT the other goes straight to TCP_CLOSE. > > It really is not possible to change current semantics, they are the > best possible unfortunately. AS a last attempt - we can have a sockt flag set when sk-err is installed in tcp_reset(), and tcp_poll() will set POLLERR if that flag exist (just like it checks for sk_err, which is cleared in tcp_sendmsg() when error is being returned). -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
From: Evgeniy Polyakov <[EMAIL PROTECTED]> Date: Thu, 22 Feb 2007 15:14:27 +0300 > if (sk->sk_shutdown == SHUTDOWN_MASK || sk->sk_state == TCP_CLOSE) > mask |= POLLHUP; > > to > > if (sk->sk_shutdown == SHUTDOWN_MASK || sk->sk_state == TCP_CLOSE) > mask |= POLLHUP | (sk->sk_state == TCP_CLOSE)?POLLERR:0; > > > Thoughts? TCP_CLOSE is where we end up on a non-error close too, this has the same kind of bug as your previous attempt to set POLLERR here. One side gets TCP_TIMEWAIT the other goes straight to TCP_CLOSE. It really is not possible to change current semantics, they are the best possible unfortunately. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
On Wed, Feb 21, 2007 at 10:45:43PM -0800, David Miller ([EMAIL PROTECTED]) wrote: > POLLHUP is a complete mess, because it means different things on > different kinds of fds. And exactly, because it is not maskable, > this makes it totally useless for sockets. POLLHUP does show that socket can not be used for transfer, that is what select() expects to receive. Given select() semantic and the fact, that socket error is cleared, there is no simple way to say that socket is not suitable for output - actually not - all reset events ends up with tcp_done() which marks socket as close, so if in polling time we have closed socket, it can be considered as error and thus we can add POLLERR into the mask - so we can extend following code in poll: if (sk->sk_shutdown == SHUTDOWN_MASK || sk->sk_state == TCP_CLOSE) mask |= POLLHUP; to if (sk->sk_shutdown == SHUTDOWN_MASK || sk->sk_state == TCP_CLOSE) mask |= POLLHUP | (sk->sk_state == TCP_CLOSE)?POLLERR:0; Thoughts? -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
From: Evgeniy Polyakov <[EMAIL PROTECTED]> Date: Sat, 17 Feb 2007 19:25:33 +0300 > On Fri, Feb 16, 2007 at 09:34:27PM +0300, Evgeniy Polyakov ([EMAIL > PROTECTED]) wrote: > > Otherwise we can extend select output mask to include hungup too > > (getting into account that hungup is actually output event). > > This is another possible way to fix select after write after connection > reset. > > Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]> ... > #define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP | POLLERR) > -#define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR) > +#define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR | POLLHUP) > #define POLLEX_SET (POLLPRI) POLLHUP is absolutely not maskable, this is very deeply entrenched in the poll() semantics and definitions. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
From: Evgeniy Polyakov <[EMAIL PROTECTED]> Date: Fri, 16 Feb 2007 21:34:27 +0300 > Otherwise we can extend select output mask to include hungup too > (getting into account that hungup is actually output event). POLLHUP is non-maskable and this is very clearly defined in just about every Unix definition of poll(). This non-maskability leads to all the strange semantics in this area. There is simply no way for sockets to return HUP for someone who is only waiting to write. Definitions of POLLHUP include language such as "This event and POLLOUT are mutually exclusive". I reviewed the current code and the many (and I mean MANY) threads on this topic that have occurred over the years going back to even 1999, and what we're doing now is the best possible set of semantics we can provide without breaking any existing applications. How many others have gone into the archives and actually looked at the discussions? That appears to be going out of style :-/ Now, on the read side, for the epoll folks, we added a POLLRDHUP to handle that case. It's not relevant here, but I'm mentioning it in passing since it's roughly related and it shows part of the devil in dealing with POLLHUP semantics. POLLHUP is a complete mess, because it means different things on different kinds of fds. And exactly, because it is not maskable, this makes it totally useless for sockets. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
On 17-02-2007 17:25, Evgeniy Polyakov wrote: > On Fri, Feb 16, 2007 at 09:34:27PM +0300, Evgeniy Polyakov ([EMAIL > PROTECTED]) wrote: >> Otherwise we can extend select output mask to include hungup too >> (getting into account that hungup is actually output event). > > This is another possible way to fix select after write after connection > reset. I hope you know what you are doing and that this will change functionality for some users. In my opinion it looks like a problem with interpretation and not a bug. From tcp.c: " * Some poll() documentation says that POLLHUP is incompatible * with the POLLOUT/POLLWR flags, so somebody should check this * all. But careful, it tends to be safer to return too many * bits than too few, and you can easily break real applications * if you don't tell them that something has hung up! ... * Actually, it is interesting to look how Solaris and DUX * solve this dilemma. I would prefer, if PULLHUP were maskable, * then we could set it on SND_SHUTDOWN. BTW examples given * in Stevens' books assume exactly this behaviour, it explains * why PULLHUP is incompatible with POLLOUT.--ANK * * NOTE. Check for TCP_CLOSE is added. The goal is to prevent * blocking on fresh not-connected or disconnected socket. --ANK */" So it seems ANK hesitated and somebody choose not to do this - maybe for some reason... Regards, Jarek P. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
On Fri, Feb 16, 2007 at 09:34:27PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote: > Otherwise we can extend select output mask to include hungup too > (getting into account that hungup is actually output event). This is another possible way to fix select after write after connection reset. Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]> diff --git a/fs/select.c b/fs/select.c index fe0893a..f2a8e46 100644 --- a/fs/select.c +++ b/fs/select.c @@ -186,7 +186,7 @@ get_max: #define SET(i,m) (*(m) |= (i)) #define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP | POLLERR) -#define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR) +#define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR | POLLHUP) #define POLLEX_SET (POLLPRI) int do_select(int n, fd_set_bits *fds, s64 *timeout) -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
On Fri, Feb 16, 2007 at 10:29:14AM -0800, Stephen Hemminger ([EMAIL PROTECTED]) wrote: > On Fri, 16 Feb 2007 19:10:45 +0300 > Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > > > one of the possible fixes for select() after write() after ECONNRESET. > > > > Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]> > > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > index b67e0dd..661ca0c 100644 > > --- a/net/ipv4/tcp.c > > +++ b/net/ipv4/tcp.c > > @@ -365,7 +365,7 @@ unsigned int tcp_poll(struct file *file, struct socket > > *sock, poll_table *wait) > > * blocking on fresh not-connected or disconnected socket. --ANK > > */ > > if (sk->sk_shutdown == SHUTDOWN_MASK || sk->sk_state == TCP_CLOSE) > > - mask |= POLLHUP; > > + mask |= POLLHUP | POLLERR; > > if (sk->sk_shutdown & RCV_SHUTDOWN) > > mask |= POLLIN | POLLRDNORM | POLLRDHUP; > > > > > > No, that would end up setting error bit on normal shutdown. This is incorrect > behaviour and might even be described in SUSE standard. > > There might even be LSB tests on this? Maybe - I do not have it handy. Otherwise we can extend select output mask to include hungup too (getting into account that hungup is actually output event). > -- > Stephen Hemminger <[EMAIL PROTECTED]> -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
On Fri, 16 Feb 2007 19:10:45 +0300 Evgeniy Polyakov <[EMAIL PROTECTED]> wrote: > one of the possible fixes for select() after write() after ECONNRESET. > > Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index b67e0dd..661ca0c 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -365,7 +365,7 @@ unsigned int tcp_poll(struct file *file, struct socket > *sock, poll_table *wait) >* blocking on fresh not-connected or disconnected socket. --ANK >*/ > if (sk->sk_shutdown == SHUTDOWN_MASK || sk->sk_state == TCP_CLOSE) > - mask |= POLLHUP; > + mask |= POLLHUP | POLLERR; > if (sk->sk_shutdown & RCV_SHUTDOWN) > mask |= POLLIN | POLLRDNORM | POLLRDHUP; > > No, that would end up setting error bit on normal shutdown. This is incorrect behaviour and might even be described in SUSE standard. There might even be LSB tests on this? -- Stephen Hemminger <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fw: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
one of the possible fixes for select() after write() after ECONNRESET. Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index b67e0dd..661ca0c 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -365,7 +365,7 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait) * blocking on fresh not-connected or disconnected socket. --ANK */ if (sk->sk_shutdown == SHUTDOWN_MASK || sk->sk_state == TCP_CLOSE) - mask |= POLLHUP; + mask |= POLLHUP | POLLERR; if (sk->sk_shutdown & RCV_SHUTDOWN) mask |= POLLIN | POLLRDNORM | POLLRDHUP; -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fw: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
On Fri, Feb 16, 2007 at 06:39:58PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote: > On Fri, Feb 16, 2007 at 06:28:58PM +0300, Evgeniy Polyakov ([EMAIL > PROTECTED]) wrote: > > On Fri, Feb 16, 2007 at 06:25:57PM +0300, Evgeniy Polyakov ([EMAIL > > PROTECTED]) wrote: > > > On Thu, Feb 15, 2007 at 04:04:05PM -0800, Stephen Hemminger ([EMAIL > > > PROTECTED]) wrote: > > > > Someone want to take a stab at fixing this?? > > Ok, I've fund a reason - select() only checks if valid output condition > is > #define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR) > > Linux never sets POLLERR on shutdown - only on socket error, on shutdown > POLLHUP is setup instead, so reading always shows that: > > #define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP | > POLLERR) > > A 'fix' can be to add POLLHUP into POLLOUT_SET or workaround that in > application to check both input and output events in select() or use > poll and explicitly check returned mask. Here is mask returned from poll() in the described case: pollin [ 2000.564000] do_select: 2076: mask: 104, i: 3/4, j: 3/32, ret: 0, in: pollin/pollout [ 2000.572000] do_select: 2076: mask: 145, i: 3/4, j: 3/32, ret: 1, in: [ 2000.572000] do_select: 2076: mask: 145, i: 3/4, j: 3/32, ret: 1, in: error is set! but only once since sock_error() replaces sk->sk_err with zero. Also pollhup is set. [ 2000.572000] do_select: 2076: mask: 2059, i: 3/4, j: 3/32, ret: 1, in: Only pollhup is set without pollerr. [ 2000.572000] do_select: 2076: mask: 2051, i: 3/4, j: 3/32, ret: 0, in: [ 2014.40] do_select: 2076: mask: 2051, i: 3/4, j: 3/32, ret: 0, in: Above sock_error() is called through the following codepath: write() -> tcp_sendmsg() -> disconnect detected -> sk_stream_error() -> sock_error() ==> ends up with zero sk->sk_err. select() -> tcp_poll() ==> check for sk->sk_err fails and thus poller is not set and tcp_poll() does not return that bit, so select() fails to say that file descriptor is ready for output. -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fw: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
On Fri, Feb 16, 2007 at 06:28:58PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote: > On Fri, Feb 16, 2007 at 06:25:57PM +0300, Evgeniy Polyakov ([EMAIL > PROTECTED]) wrote: > > On Thu, Feb 15, 2007 at 04:04:05PM -0800, Stephen Hemminger ([EMAIL > > PROTECTED]) wrote: > > > Someone want to take a stab at fixing this?? Ok, I've fund a reason - select() only checks if valid output condition is #define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR) Linux never sets POLLERR on shutdown - only on socket error, on shutdown POLLHUP is setup instead, so reading always shows that: #define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP | POLLERR) A 'fix' can be to add POLLHUP into POLLOUT_SET or workaround that in application to check both input and output events in select() or use poll and explicitly check returned mask. -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fw: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
On Fri, Feb 16, 2007 at 06:25:57PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote: > On Thu, Feb 15, 2007 at 04:04:05PM -0800, Stephen Hemminger ([EMAIL > PROTECTED]) wrote: > > Someone want to take a stab at fixing this?? > > Works for me with attached application - select on connreset socket > immediately returns one ready descriptor. > > $ ./a.out 192.168.0.48 22 > select: err: 1, errno: 0. > write: err: 128, errno: 0. > select: err: 1, errno: 0. > write: err: 128, errno: 0. > select: err: 1, errno: 0. > write: err: -1, errno: 104. > on exit: select: err: 1, errno: 104. Nope, fails if file descriptor is checked for write. -- Evgeniy Polyakov - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fw: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
On Thu, Feb 15, 2007 at 04:04:05PM -0800, Stephen Hemminger ([EMAIL PROTECTED]) wrote: > Someone want to take a stab at fixing this?? Works for me with attached application - select on connreset socket immediately returns one ready descriptor. $ ./a.out 192.168.0.48 22 select: err: 1, errno: 0. write: err: 128, errno: 0. select: err: 1, errno: 0. write: err: 128, errno: 0. select: err: 1, errno: 0. write: err: -1, errno: 104. on exit: select: err: 1, errno: 104. -- Evgeniy Polyakov #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define ulog(f, a...) fprintf(stderr, f, ##a) #define ulog_err(f, a...) ulog(f ": %s [%d].\n", ##a, strerror(errno), errno) int main(int argc, char *argv[]) { struct hostent *h; int s, err; char buf[128]; struct sockaddr_in sa; fd_set rfds; struct timeval tv; if (argc != 3) { ulog("Usage: %s \n", argv[0]); return -1; } h = gethostbyname(argv[1]); if (!h) { ulog_err("%s: Failed to get address of %s.\n", __func__, argv[1]); return -1; } s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); if (s == -1) { ulog_err("%s: Failed to create server socket", __func__); return -1; } memcpy(&(sa.sin_addr.s_addr), h->h_addr_list[0], 4); sa.sin_port = htons(atoi(argv[2])); sa.sin_family = AF_INET; if (connect(s, (struct sockaddr *)&sa, sizeof(sa))) { ulog_err("%s: Failed to connect to %s:%s", __func__, argv[1], argv[2]); return -1; } do { FD_ZERO(&rfds); FD_SET(s, &rfds); tv.tv_sec = 500; tv.tv_usec = 0; err = select(s+1, &rfds, NULL, NULL, &tv); ulog("select: err: %d, errno: %d.\n", err, errno); err = write(s, buf, sizeof(buf)); ulog("write: err: %d, errno: %d.\n", err, errno); } while (err > 0); FD_ZERO(&rfds); FD_SET(s, &rfds); tv.tv_sec = 500; tv.tv_usec = 0; err = select(s+1, &rfds, NULL, NULL, &tv); ulog("on exit: select: err: %d, errno: %d.\n", err, errno); }
Fw: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET
Someone want to take a stab at fixing this?? Begin forwarded message: Date: Wed, 14 Feb 2007 19:32:52 -0800 From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: [Bug 8013] New: select for write hangs on a socket after write returned ECONNRESET http://bugzilla.kernel.org/show_bug.cgi?id=8013 Summary: select for write hangs on a socket after write returned ECONNRESET Kernel Version: 2.6.16 Status: NEW Severity: normal Owner: [EMAIL PROTECTED] Submitter: [EMAIL PROTECTED] Distribution: Debian Also reproduced on: 2.4 based Redhat. Hardware Environment: i686/Xeon Problem Description: If you write() to a disconnected socket, write returns ECONNRESET. If you then select() on that socket, checking for write, the select never returns. For example from strace: write(4, "fred", 4) = 4 ... write(4, "fred", 4) = -1 ECONNRESET (Connection reset by peer) select(5, NULL, [4], NULL, NULL ... hung in select The select documentation says "those in writefds will be watched to see if a write will not block". A write on this socket will not block, therefore select should return immediately. When the program is run on Solaris, AIX and HPUX, the select returns immediately. --- You are receiving this mail because: --- You are the assignee for the bug, or are watching the assignee. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html