Re: strange tcp behavior
From: Evgeniy Polyakov <[EMAIL PROTECTED]>
Date: Sat, 4 Aug 2007 20:51:51 +0400

> On Fri, Aug 03, 2007 at 02:17:17PM -0700, David Miller ([EMAIL PROTECTED]) wrote:
> > From: Evgeniy Polyakov <[EMAIL PROTECTED]>
> > Date: Fri, 3 Aug 2007 12:22:42 +0400
> >
> > > Maybe recvmsg should be changed too for symmetry?
> >
> > I took a look at this, and it's not 100% trivial.
> >
> > Let's do this later, and only sendmsg for now in order to
> > fix the bug in the stable branches.
>
> I've tested your patch; apart from an offset in one of the hooks,
> it works perfectly.
>
> Feel free to add my ack, tested-by or whatever is needed for this :)
> Your patch fixes the problem.

It was already merged into Linus's tree long before you had a chance to test it :-) so it would be difficult for me to do so.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: strange tcp behavior
On Fri, Aug 03, 2007 at 02:17:17PM -0700, David Miller ([EMAIL PROTECTED]) wrote:
> From: Evgeniy Polyakov <[EMAIL PROTECTED]>
> Date: Fri, 3 Aug 2007 12:22:42 +0400
>
> > Maybe recvmsg should be changed too for symmetry?
>
> I took a look at this, and it's not 100% trivial.
>
> Let's do this later, and only sendmsg for now in order to
> fix the bug in the stable branches.

I've tested your patch; apart from an offset in one of the hooks, it works perfectly.

Feel free to add my ack, tested-by or whatever is needed for this :)
Your patch fixes the problem.

Actually, inet_sendmsg() could be renamed to something less misleading, since TCP no longer uses it.

--
	Evgeniy Polyakov
Re: strange tcp behavior
On Fri, Aug 03, 2007 at 01:04:51PM -0700, David Miller ([EMAIL PROTECTED]) wrote:
> From: Evgeniy Polyakov <[EMAIL PROTECTED]>
> Date: Fri, 3 Aug 2007 12:22:42 +0400
>
> > On Thu, Aug 02, 2007 at 07:21:34PM -0700, David Miller ([EMAIL PROTECTED]) wrote:
> > > What in the world are we doing allowing stream sockets to autobind?
> > > That is totally bogus. Even if we autobind, that won't make a connect
> > > happen.
> >
> > For an accepted socket it is a perfectly valid assumption - we could autobind
> > it during the first send. Or we may bind it during accept. It's a matter of
> > taste, I think. Autobinding during the first send could even end up being a
> > protection against DoS in some obscure, rare case...
>
> accept()ed socket is by definition fully bound and already in
> established state.

That's what I meant - it is bound during accept (well, it cannot be called real binding), but it could be autobound to the needed port during the first send. Maybe that was one of the intentions, I don't know.

--
	Evgeniy Polyakov
Re: strange tcp behavior
On Fri, Aug 03, 2007 at 07:29:58PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> On 03/08/07 18:39, Evgeniy Polyakov wrote:
> > On Fri, Aug 03, 2007 at 05:51:42PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> >
> >> 17:38:03.533589 IP 192.168.7.4.50550 > 192.168.7.8.2500: R 82517592:82517592(0) win 1500 (raw)
> >> vs
> >> 17:37:38.383085 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360
> >> What happened there?
>
> Erm... you seem to have removed parts of my message in a way that doesn't
> make sense...

Sorry, I kept only the lines I thought were enough to understand your point.

> On Fri, Aug 03, 2007 at 05:51:42PM +0100, Simon Arlott wrote:
> > 17:38:04.536277 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 1:1(0) ack 17 win 14360
> > vs
> > 17:37:38.383085 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360
> > What happened there?
>
> The first one is the RST sent when the connection is close()d without
> reading, and the second one is the same RST but after another connection
> has been made on the same ports using a different socket.

I understood that, and your question is whether those numbers could end up roughly the same. The answer is 'no', it is not possible (strictly speaking it is possible, but with extremely low probability). If it does happen, it is a bug in the ISN generation algorithm and must be fixed.

> > It is the same situation that would happen if you spammed the remote
> > side with RST packets with arbitrary sequence numbers in the hope of
> > resetting some connection.
>
> Isn't it still possible that the connection that got reset is left open
> (possibly for days) until another connection using the same ports is
> using roughly the same sequence numbers?

Of course it is possible, but it is very unlikely. In practice it is impossible in modern OSes - ISN generation algorithms are designed to prevent this from happening.

> --
> Simon Arlott

--
	Evgeniy Polyakov
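To put a rough number on "extremely low probability": assuming the stale RST's sequence number is effectively unrelated to the new connection (uniform over the 32-bit sequence space), that the receiver accepts any RST falling inside its receive window (the RFC 793 rule), and taking the 14360-byte window from the traces in this thread as an illustrative value, the chance that one stale RST lands in the window is about:

```latex
P(\text{stale RST accepted}) \approx \frac{\mathrm{RCV.WND}}{2^{32}}
  = \frac{14360}{4294967296}
  \approx 3.34 \times 10^{-6}
  \approx \frac{1}{299\,000}
```

This is only a back-of-the-envelope estimate; it ignores window scaling and any stricter RST checks a particular stack may apply.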
Re: strange tcp behavior
From: Evgeniy Polyakov <[EMAIL PROTECTED]>
Date: Fri, 3 Aug 2007 12:22:42 +0400

> Maybe recvmsg should be changed too for symmetry?

I took a look at this, and it's not 100% trivial.

Let's do this later, and only sendmsg for now in order to
fix the bug in the stable branches.
Re: strange tcp behavior
From: Evgeniy Polyakov <[EMAIL PROTECTED]>
Date: Fri, 3 Aug 2007 12:22:42 +0400

> On Thu, Aug 02, 2007 at 07:21:34PM -0700, David Miller ([EMAIL PROTECTED]) wrote:
> > What in the world are we doing allowing stream sockets to autobind?
> > That is totally bogus. Even if we autobind, that won't make a connect
> > happen.
>
> For an accepted socket it is a perfectly valid assumption - we could autobind
> it during the first send. Or we may bind it during accept. It's a matter of
> taste, I think. Autobinding during the first send could even end up being a
> protection against DoS in some obscure, rare case...

An accept()ed socket is by definition fully bound and already in the
established state.
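The point that an accept()ed socket is already fully bound can be checked from user space. The sketch below uses a hypothetical helper, `accepted_socket_has_listener_port()`, built only from standard POSIX socket calls over loopback. It compares the local port that getsockname() reports for the accepted socket with the listener's port, without ever calling bind() or send() on the accepted socket:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns 1 if the accept()ed socket already reports
 * the listener's local port via getsockname() (no bind()/send() on it),
 * 0 if it does not, or -1 if the loopback setup itself failed.
 */
static int accepted_socket_has_listener_port(void)
{
    struct sockaddr_in lsa, asa;
    socklen_t len = sizeof(lsa);
    int l, c = -1, s = -1, ret = -1;

    memset(&lsa, 0, sizeof(lsa));
    lsa.sin_family = AF_INET;
    lsa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    lsa.sin_port = 0;                       /* let the kernel pick a free port */

    l = socket(AF_INET, SOCK_STREAM, 0);
    if (l < 0)
        return -1;
    if (bind(l, (struct sockaddr *)&lsa, sizeof(lsa)) != 0 ||
        listen(l, 1) != 0 ||
        getsockname(l, (struct sockaddr *)&lsa, &len) != 0)
        goto out;

    c = socket(AF_INET, SOCK_STREAM, 0);
    if (c < 0 || connect(c, (struct sockaddr *)&lsa, sizeof(lsa)) != 0)
        goto out;
    s = accept(l, NULL, NULL);
    if (s < 0)
        goto out;

    len = sizeof(asa);
    if (getsockname(s, (struct sockaddr *)&asa, &len) != 0)
        goto out;
    ret = (asa.sin_port == lsa.sin_port);   /* both in network byte order */
out:
    if (s >= 0)
        close(s);
    if (c >= 0)
        close(c);
    close(l);
    return ret;
}
```

On Linux we would expect this to return 1; a return value of -1 only means the loopback setup failed (for example, in a sandbox without networking).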
Re: strange tcp behavior
On 03/08/07 18:39, Evgeniy Polyakov wrote:
> On Fri, Aug 03, 2007 at 05:51:42PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
>
>> 17:38:03.533589 IP 192.168.7.4.50550 > 192.168.7.8.2500: R 82517592:82517592(0) win 1500 (raw)
>> vs
>> 17:37:38.383085 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360
>> What happened there?

Erm... you seem to have removed parts of my message in a way that doesn't make sense...

On Fri, Aug 03, 2007 at 05:51:42PM +0100, Simon Arlott wrote:
> 17:38:04.536277 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 1:1(0) ack 17 win 14360
> vs
> 17:37:38.383085 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360
> What happened there?

The first one is the RST sent when the connection is close()d without reading, and the second one is the same RST but after another connection has been made on the same ports using a different socket.

> It is the same situation that would happen if you spammed the remote
> side with RST packets with arbitrary sequence numbers in the hope of
> resetting some connection.

Isn't it still possible that the connection that got reset is left open (possibly for days) until another connection using the same ports is using roughly the same sequence numbers?

--
Simon Arlott
Re: strange tcp behavior
On Fri, Aug 03, 2007 at 05:51:42PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> 17:38:03.533589 IP 192.168.7.4.50550 > 192.168.7.8.2500: R 82517592:82517592(0) win 1500 (raw)
> vs
> 17:37:38.383085 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360
> What happened there?

You mean what would happen if the second RST (4259643274) were close enough to the first (82517592) to reset the connection? That would be the session hijacking attack first (known to be) implemented by Kevin Mitnick. Things have moved forward since then, and sequence number generation algorithms have changed a lot.

It is the same situation that would happen if you spammed the remote side with RST packets with arbitrary sequence numbers in the hope of resetting some connection.

--
	Evgeniy Polyakov
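For reference, the RFC 1948 approach (later updated by RFC 6528) computes ISN = M + F(local addr, local port, remote addr, remote port, secret), where M is a timer that ticks roughly every 4 microseconds and F is a keyed hash. The sketch below shows only that structure. `isn_rfc1948()` and `fnv1a_mix()` are hypothetical names, and FNV-1a is a non-cryptographic placeholder for the cryptographic keyed hash a real stack would use, so this is not suitable as-is:

```c
#include <stdint.h>

/* Placeholder mixing step (32-bit FNV-1a over the 4 bytes of v).
 * NOT cryptographic: real stacks use a keyed cryptographic hash here. */
static uint32_t fnv1a_mix(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;                     /* FNV 32-bit prime */
    }
    return h;
}

/* RFC 1948-style ISN: clock term M (4 microsecond ticks) plus a
 * per-4-tuple offset F(saddr, sport, daddr, dport, secret). */
static uint32_t isn_rfc1948(uint32_t saddr, uint16_t sport,
                            uint32_t daddr, uint16_t dport,
                            uint32_t secret, uint64_t usec)
{
    uint32_t f = 2166136261u;               /* FNV 32-bit offset basis */

    f = fnv1a_mix(f, saddr);
    f = fnv1a_mix(f, ((uint32_t)sport << 16) | dport);
    f = fnv1a_mix(f, daddr);
    f = fnv1a_mix(f, secret);
    return f + (uint32_t)(usec / 4);        /* wraps modulo 2^32 */
}
```

For a fixed 4-tuple and secret, F is constant, so successive incarnations of the same ports get ISNs that advance with the clock, which helps keep old and new sequence spaces apart. An off-path attacker who does not know the secret cannot predict F.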
Re: strange tcp behavior
On 03/08/07 13:09, Evgeniy Polyakov wrote:
> On Fri, Aug 03, 2007 at 01:03:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
>> On Fri, August 3, 2007 12:56, Evgeniy Polyakov wrote:
>> > On Fri, Aug 03, 2007 at 12:21:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
>> >> Since the connection is considered closed, couldn't another socket re-use it?
>> >>
>> >> Socket A: Recv data (unread)
>> >> Socket A: Recv RST
>> >> Socket B: Reuses connection (same IPs/ports)
>> >> Socket A: Close
>> >>
>> >> Wouldn't that disrupt socket B's use of the connection?
>> >
>> > Then it will drop our data, since there was no appropriate handshake.
>>
>> Couldn't the sequence numbers be close enough to make the RST valid?
>
> It does not matter - if the connection is not in a synchronized state, all
> unrelated data is dropped, so the remote side is only allowed to receive the
> SYN flag; anything else must be dropped. If the remote side does not do
> that, it violates the RFC.

Except the remote side does have a connection, because another one can be made before the existing connection is closed:

17:37:37.377571 IP 192.168.7.4.50550 > 192.168.7.8.2500: S 134077329:134077329(0) win 1500 (raw)
17:37:37.382352 IP 192.168.7.8.2500 > 192.168.7.4.50550: S 3460060233:3460060233(0) ack 134077330 win 14360 (accept)
17:37:37.377966 IP 192.168.7.4.50550 > 192.168.7.8.2500: . ack 1 win 1500 (raw)
17:37:37.378128 IP 192.168.7.4.50550 > 192.168.7.8.2500: P 1:17(16) ack 1 win 1500 (raw)
17:37:37.378162 IP 192.168.7.8.2500 > 192.168.7.4.50550: . ack 17 win 14360
17:37:37.378131 IP 192.168.7.4.50550 > 192.168.7.8.2500: R 134077346:134077346(0) win 1500 (raw)
17:37:37.412709 IP 192.168.7.4.50550 > 192.168.7.8.2500: SWE 3257207813:3257207813(0) win 14280 (connect)
17:37:37.412785 IP 192.168.7.8.2500 > 192.168.7.4.50550: SE 3495384256:3495384256(0) ack 3257207814 win 14336 (accept)
17:37:37.412960 IP 192.168.7.4.50550 > 192.168.7.8.2500: . ack 1 win 447
17:37:38.383085 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360 (close (previous connection))
17:37:47.417649 IP 192.168.7.8.2500 > 192.168.7.4.50550: F 1:1(0) ack 1 win 224 (close)
17:37:47.417993 IP 192.168.7.4.50550 > 192.168.7.8.2500: F 1:1(0) ack 2 win 447 (read returned)
17:37:47.418466 IP 192.168.7.8.2500 > 192.168.7.4.50550: . ack 2 win 224

The second connection also changed the RST|ACK that was sent, compared to the case with no second connection:

17:38:03.532703 IP 192.168.7.4.50550 > 192.168.7.8.2500: S 82517575:82517575(0) win 1500 (raw)
17:38:03.532832 IP 192.168.7.8.2500 > 192.168.7.4.50550: S 3495449795:3495449795(0) ack 82517576 win 14360 (accept)
17:38:03.533388 IP 192.168.7.4.50550 > 192.168.7.8.2500: . ack 1 win 1500 (raw)
17:38:03.533457 IP 192.168.7.4.50550 > 192.168.7.8.2500: P 1:17(16) ack 1 win 1500 (raw)
17:38:03.533597 IP 192.168.7.8.2500 > 192.168.7.4.50550: . ack 17 win 14360
17:38:03.533589 IP 192.168.7.4.50550 > 192.168.7.8.2500: R 82517592:82517592(0) win 1500 (raw)
17:38:04.536277 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 1:1(0) ack 17 win 14360 (close)

17:38:04.536277 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 1:1(0) ack 17 win 14360
vs
17:37:38.383085 IP 192.168.7.8.2500 > 192.168.7.4.50550: R 4259643274:4259643274(0) ack 1171836829 win 14360
What happened there?

On the server, run tcptest-server.c, which waits for 1s on the first connection and then 10s on the second connection.

On the client, run:

iptables -I INPUT -i eth0 -p tcp --dport 50550 -j DROP; ./client; iptables -D INPUT -i eth0 -p tcp --dport 50550 -j DROP; ./tcptest-client

(client.c from John's original email)

--
Simon Arlott

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define PORT 2500
#define xerror(str) do { perror(str); exit(1); } while (0)

int main(void)
{
	struct sockaddr_in sa;
	int l, s, tmp;
	int t = 0;

	memset(&sa, 0, sizeof(sa));

	l = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (l < 0)	/* socket() returns -1 on error; 0 is a valid descriptor */
		xerror("socket");

	sa.sin_family = AF_INET;
	sa.sin_addr.s_addr = htonl(INADDR_ANY);
	sa.sin_port = htons(PORT);

	tmp = 1;
	setsockopt(l, SOL_SOCKET, SO_REUSEADDR, (char*)&tmp, sizeof(tmp));

	if (bind(l, (struct sockaddr*)&sa, sizeof(sa)) != 0)
		xerror("bind");

	if (listen(l, 0) != 0)
		xerror("listen");

	printf("server %d ready...\n", getpid());

	for (t = 1; t <= 2; t++) {
		s = accept(l, NULL, NULL);
		if (s < 0)
			xerror("accept");

		switch (fork()) {
		case -1:
			xerror("fork");
			break;

		case 0:
			switch (t) {
			case 1:
				printf("server %d accepted connection\n", getpid());
#if 0
				tmp = fcntl(s, F_GETFL, 0);
				if (fcntl(s, F_SETFL, tmp | O_NONBLOCK) != 0)
					xerror("fcntl");

				/* "AAA" is 3 bytes of payload (4 with the NUL) */
				if (send(s, "AAA", 3, 0) != 3)
					xerror("send");
#endif
				printf("server %d waiting for 1 second...\n", getpid());
				sleep(1);
				printf("server %d closing connection\n", getpid());
				close(s);
				return 0;
				break;

			case 2:
				printf("server %d acc
Re: strange tcp behavior
On Fri, Aug 03, 2007 at 01:03:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> On Fri, August 3, 2007 12:56, Evgeniy Polyakov wrote:
> > On Fri, Aug 03, 2007 at 12:21:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> >> Since the connection is considered closed, couldn't another socket re-use it?
> >>
> >> Socket A: Recv data (unread)
> >> Socket A: Recv RST
> >> Socket B: Reuses connection (same IPs/ports)
> >> Socket A: Close
> >>
> >> Wouldn't that disrupt socket B's use of the connection?
> >
> > Then it will drop our data, since there was no appropriate handshake.
>
> Couldn't the sequence numbers be close enough to make the RST valid?

It does not matter - if the connection is not in a synchronized state, all unrelated data is dropped, so the remote side is only allowed to receive the SYN flag; anything else must be dropped. If the remote side does not do that, it violates the RFC.

> --
> Simon Arlott

--
	Evgeniy Polyakov
Re: strange tcp behavior
On Fri, August 3, 2007 12:56, Evgeniy Polyakov wrote:
> On Fri, Aug 03, 2007 at 12:21:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
>> Since the connection is considered closed, couldn't another socket re-use it?
>>
>> Socket A: Recv data (unread)
>> Socket A: Recv RST
>> Socket B: Reuses connection (same IPs/ports)
>> Socket A: Close
>>
>> Wouldn't that disrupt socket B's use of the connection?
>
> Then it will drop our data, since there was no appropriate handshake.

Couldn't the sequence numbers be close enough to make the RST valid?

--
Simon Arlott
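The question of whether the sequence numbers are "close enough" comes down to RFC 793's acceptability test for a zero-length segment such as an RST: if RCV.WND is 0, SEG.SEQ must equal RCV.NXT; otherwise RCV.NXT =< SEG.SEQ < RCV.NXT + RCV.WND. Below is a minimal sketch of that test. The function name `rst_seq_acceptable()` is hypothetical. It uses unsigned 32-bit distance so that sequence-number wraparound is handled. Later specifications such as RFC 5961 tighten RST acceptance further, so treat this as the RFC 793 rule only:

```c
#include <stdint.h>

/*
 * RFC 793 acceptability test for a zero-length segment (e.g. an RST).
 * Sequence numbers are modulo 2^32, so the in-window test is done on the
 * unsigned distance from RCV.NXT, which handles wraparound.
 */
static int rst_seq_acceptable(uint32_t rcv_nxt, uint32_t rcv_wnd,
                              uint32_t seg_seq)
{
    uint32_t dist = seg_seq - rcv_nxt;      /* wraps modulo 2^32 */

    if (rcv_wnd == 0)
        return dist == 0;                   /* SEG.SEQ == RCV.NXT */
    return dist < rcv_wnd;                  /* RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND */
}
```

As an illustration using numbers from the traces in this thread: after the second handshake the client's RCV.NXT would be 3495384257 (the server's ISN 3495384256 plus one). A stale RST with sequence 4259643274 is then about 764 million bytes beyond RCV.NXT, so it would be rejected for any realistic window, such as the 14280 the client advertised in its SYN.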
Re: strange tcp behavior
On Fri, Aug 03, 2007 at 12:21:46PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> Since the connection is considered closed, couldn't another socket re-use it?
>
> Socket A: Recv data (unread)
> Socket A: Recv RST
> Socket B: Reuses connection (same IPs/ports)
> Socket A: Close
>
> Wouldn't that disrupt socket B's use of the connection?

Then it will drop our data, since there was no appropriate handshake.

> --
> Simon Arlott

--
	Evgeniy Polyakov
Re: strange tcp behavior
On Fri, August 3, 2007 09:25, Evgeniy Polyakov wrote:
> On Thu, Aug 02, 2007 at 07:58:03PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
>> 19:24:32.897071 IP 192.168.7.4.5 > 192.168.7.8.2500: S 705362199:705362199(0) win 1500
>> 19:24:32.897211 IP 192.168.7.8.2500 > 192.168.7.4.5: S 4159455228:4159455228(0) ack 705362200 win 14360
>> 19:24:32.920784 IP 192.168.7.4.5 > 192.168.7.8.2500: . ack 1 win 1500
>> 19:24:32.921732 IP 192.168.7.4.5 > 192.168.7.8.2500: P 1:17(16) ack 1 win 1500
>> 19:24:32.921795 IP 192.168.7.8.2500 > 192.168.7.4.5: . ack 17 win 14360
>> 19:24:32.922881 IP 192.168.7.4.5 > 192.168.7.8.2500: R 705362216:705362216(0) win 1500
>> 19:24:34.927717 IP 192.168.7.8.2500 > 192.168.7.4.5: R 1:1(0) ack 17 win 14360
>>
>> According to RFC 793, the RST from .4 means that the connection
>> is CLOSED.
>
> RFC 2525 (known TCP implementation problems) says we should send an RST in
> this case, although it does not specify whether we should send it when the
> socket is in the CLOSED state. Well, we send it :)
> Even if tcp_send_active_reset() checked whether the socket is in the CLOSED
> state and did not send anything, the problem would still be there; it would
> not be easily triggered, but it would be possible.

Since the connection is considered closed, couldn't another socket re-use it?

Socket A: Recv data (unread)
Socket A: Recv RST
Socket B: Reuses connection (same IPs/ports)
Socket A: Close

Wouldn't that disrupt socket B's use of the connection?

--
Simon Arlott
Re: strange tcp behavior
On Thu, Aug 02, 2007 at 07:58:03PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> 19:24:32.897071 IP 192.168.7.4.5 > 192.168.7.8.2500: S 705362199:705362199(0) win 1500
> 19:24:32.897211 IP 192.168.7.8.2500 > 192.168.7.4.5: S 4159455228:4159455228(0) ack 705362200 win 14360
> 19:24:32.920784 IP 192.168.7.4.5 > 192.168.7.8.2500: . ack 1 win 1500
> 19:24:32.921732 IP 192.168.7.4.5 > 192.168.7.8.2500: P 1:17(16) ack 1 win 1500
> 19:24:32.921795 IP 192.168.7.8.2500 > 192.168.7.4.5: . ack 17 win 14360
> 19:24:32.922881 IP 192.168.7.4.5 > 192.168.7.8.2500: R 705362216:705362216(0) win 1500
> 19:24:34.927717 IP 192.168.7.8.2500 > 192.168.7.4.5: R 1:1(0) ack 17 win 14360
>
> According to RFC 793, the RST from .4 means that the connection
> is CLOSED.

RFC 2525 (known TCP implementation problems) says we should send an RST in this case, although it does not specify whether we should send it when the socket is in the CLOSED state. Well, we send it :)

Even if tcp_send_active_reset() checked whether the socket is in the CLOSED state and did not send anything, the problem would still be there; it would not be easily triggered, but it would be possible.

--
	Evgeniy Polyakov
Re: strange tcp behavior
On Thu, Aug 02, 2007 at 07:21:34PM -0700, David Miller ([EMAIL PROTECTED]) wrote:
> > On Thu, Aug 02, 2007 at 10:08:42PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> > > So, the following patch fixes the problem for me.
> >
> > Or this one. Essentially the same, though.
> >
> > Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>
>
> So, this bug got introduced partly in 2.3.15, which is when
> we SMP threaded the networking stack.
>
> The error check was present in inet_sendmsg() previously, it
> looked like this:
>
> int inet_sendmsg(struct socket *sock, struct msghdr *msg, int size,
> 		 struct scm_cookie *scm)
> {
> 	struct sock *sk = sock->sk;
>
> 	if (sk->shutdown & SEND_SHUTDOWN) {
> 		if (!(msg->msg_flags&MSG_NOSIGNAL))
> 			send_sig(SIGPIPE, current, 1);
> 		return(-EPIPE);
> 	}

This one would have caught our problem.

> 	if (sk->prot->sendmsg == NULL)
> 		return(-EOPNOTSUPP);
> 	if(sk->err)
> 		return sock_error(sk);

And this one too.

> 	/* We may need to bind the socket. */
> 	if (inet_autobind(sk) != 0)
> 		return -EAGAIN;
>
> 	return sk->prot->sendmsg(sk, msg, size);
> }
>
> I believe the idea was to move the sk->err check down into
> tcp_sendmsg().
>
> But this raises a major issue.
>
> What in the world are we doing allowing stream sockets to autobind?
> That is totally bogus. Even if we autobind, that won't make a connect
> happen.

For an accepted socket it is a perfectly valid assumption - we could autobind it during the first send. Or we may bind it during accept. It's a matter of taste, I think. Autobinding during the first send could even end up being a protection against DoS in some obscure, rare case...

> There is logic down in TCP to handle all of these details properly
> as long as we don't do this bogus autobind stuff.

Yes, the TCP sending function will catch these problems.

> do_tcp_sendpages() and tcp_sendmsg() both invoke sk_stream_wait_connect()
> if TCP is in a state where data sending is not possible. Inside of
> sk_stream_wait_connect() it handles socket errors as first priority,
> then if no socket errors are pending it checks if we are trying to
> connect currently and if not returns -EPIPE. It is exactly what we
> want under these circumstances.
>
> So the bug is purely that autobind is attempted for TCP sockets at
> all.
>
> TCP's sendpage handles this correctly already, it calls directly down
> into tcp_sendpage(), inet_sendpage() is not used at all.
>
> So the fix is to make tcp_sendmsg() direct as well, that bypasses all
> of this autobind madness. The error checking and state verification
> in TCP's sendmsg() and sendpage() implementations will do the right
> thing.
>
> Comments?
>
> Signed-off-by: David S. Miller <[EMAIL PROTECTED]>
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index c209361..185c7ec 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -281,7 +281,7 @@ extern int tcp_v4_remember_stamp(struct sock *sk);
>
>  extern int tcp_v4_tw_remember_stamp(struct inet_timewait_sock *tw);
>
> -extern int tcp_sendmsg(struct kiocb *iocb, struct sock *sk,
> +extern int tcp_sendmsg(struct kiocb *iocb, struct socket *sock,
>  		       struct msghdr *msg, size_t size);

Maybe recvmsg should be changed too for symmetry?

--
	Evgeniy Polyakov
Re: strange tcp behavior
From: Evgeniy Polyakov <[EMAIL PROTECTED]>
Date: Thu, 2 Aug 2007 22:48:42 +0400

> On Thu, Aug 02, 2007 at 10:08:42PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> > So, the following patch fixes the problem for me.
>
> Or this one. Essentially the same, though.
>
> Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

So, this bug got introduced partly in 2.3.15, which is when
we SMP threaded the networking stack.

The error check was present in inet_sendmsg() previously, it
looked like this:

int inet_sendmsg(struct socket *sock, struct msghdr *msg, int size,
		 struct scm_cookie *scm)
{
	struct sock *sk = sock->sk;

	if (sk->shutdown & SEND_SHUTDOWN) {
		if (!(msg->msg_flags&MSG_NOSIGNAL))
			send_sig(SIGPIPE, current, 1);
		return(-EPIPE);
	}
	if (sk->prot->sendmsg == NULL)
		return(-EOPNOTSUPP);
	if(sk->err)
		return sock_error(sk);

	/* We may need to bind the socket. */
	if (inet_autobind(sk) != 0)
		return -EAGAIN;

	return sk->prot->sendmsg(sk, msg, size);
}

I believe the idea was to move the sk->err check down into
tcp_sendmsg().

But this raises a major issue.

What in the world are we doing allowing stream sockets to autobind?
That is totally bogus. Even if we autobind, that won't make a connect
happen.

There is logic down in TCP to handle all of these details properly
as long as we don't do this bogus autobind stuff.

do_tcp_sendpages() and tcp_sendmsg() both invoke sk_stream_wait_connect()
if TCP is in a state where data sending is not possible. Inside of
sk_stream_wait_connect() it handles socket errors as first priority,
then if no socket errors are pending it checks if we are trying to
connect currently and if not returns -EPIPE. It is exactly what we
want under these circumstances.

So the bug is purely that autobind is attempted for TCP sockets at
all.

TCP's sendpage handles this correctly already, it calls directly down
into tcp_sendpage(), inet_sendpage() is not used at all.

So the fix is to make tcp_sendmsg() direct as well, that bypasses all
of this autobind madness. The error checking and state verification
in TCP's sendmsg() and sendpage() implementations will do the right
thing.

Comments?

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/include/net/tcp.h b/include/net/tcp.h
index c209361..185c7ec 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -281,7 +281,7 @@ extern int tcp_v4_remember_stamp(struct sock *sk);
 
 extern int tcp_v4_tw_remember_stamp(struct inet_timewait_sock *tw);
 
-extern int tcp_sendmsg(struct kiocb *iocb, struct sock *sk,
+extern int tcp_sendmsg(struct kiocb *iocb, struct socket *sock,
 		       struct msghdr *msg, size_t size);
 extern ssize_t tcp_sendpage(struct socket *sock, struct page *page, int offset,
 			    size_t size, int flags);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 06c08e5..e681034 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -831,7 +831,7 @@ const struct proto_ops inet_stream_ops = {
 	.shutdown	   = inet_shutdown,
 	.setsockopt	   = sock_common_setsockopt,
 	.getsockopt	   = sock_common_getsockopt,
-	.sendmsg	   = inet_sendmsg,
+	.sendmsg	   = tcp_sendmsg,
 	.recvmsg	   = sock_common_recvmsg,
 	.mmap		   = sock_no_mmap,
 	.sendpage	   = tcp_sendpage,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index da4c0b6..7e74011 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -658,9 +658,10 @@ static inline int select_size(struct sock *sk)
 	return tmp;
 }
 
-int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
+int tcp_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 		size_t size)
 {
+	struct sock *sk = sock->sk;
 	struct iovec *iov;
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 3f5f742..9c94627 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2425,7 +2425,6 @@ struct proto tcp_prot = {
 	.shutdown		= tcp_shutdown,
 	.setsockopt		= tcp_setsockopt,
 	.getsockopt		= tcp_getsockopt,
-	.sendmsg		= tcp_sendmsg,
 	.recvmsg		= tcp_recvmsg,
 	.backlog_rcv		= tcp_v4_do_rcv,
 	.hash			= tcp_v4_hash,
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index eed0937..b5f9637 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -484,7 +484,7 @@ const struct proto_ops inet6_stream_ops = {
 	.shutdown	   = inet_shutdown,		/* ok		*/
 	.setsockopt	   = sock_common_sets
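The difference between the old and the new dispatch path can be illustrated with a toy user-space model. Nothing here is kernel code: `struct model_sock`, `model_inet_sendmsg()`, `model_tcp_sendmsg()` and the starting port 48186 (borrowed from the RST|ACK trace in this thread) are all hypothetical. In the old path, an already-reset socket whose port was released gets autobound before the TCP-level checks run, so it quietly acquires a fresh local port. The direct path reports the pending error (and then -EPIPE) without touching the port:

```c
#include <errno.h>

/* Hypothetical, heavily simplified model; NOT kernel code. */
enum model_state { M_ESTABLISHED, M_CLOSE };

struct model_sock {
    enum model_state state;
    int err;                 /* pending error as a positive errno value */
    unsigned short num;      /* local port, 0 = unbound / already released */
};

static unsigned short model_next_port = 48186;  /* illustrative ephemeral port */

/* Direct TCP send path: report a pending error once (then clear it),
 * otherwise -EPIPE if the socket cannot send. */
static int model_tcp_sendmsg(struct model_sock *sk)
{
    if (sk->err) {
        int e = sk->err;
        sk->err = 0;
        return -e;
    }
    if (sk->state != M_ESTABLISHED)
        return -EPIPE;
    return 1;                /* pretend one byte was queued */
}

/* Old generic path: autobind an unbound socket before calling TCP. */
static int model_inet_sendmsg(struct model_sock *sk)
{
    if (!sk->num)
        sk->num = model_next_port++;  /* dead socket picks up a new port */
    return model_tcp_sendmsg(sk);
}
```

Both paths return the same error to the caller here; the model only highlights the side effect (the new local port) that later shows up as an RST leaving from an unexpected port.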
Re: strange tcp behavior
From: Evgeniy Polyakov <[EMAIL PROTECTED]>
Date: Thu, 2 Aug 2007 22:48:42 +0400

> On Thu, Aug 02, 2007 at 10:08:42PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> > So, the following patch fixes the problem for me.
>
> Or this one. Essentially the same, though.

Thanks a lot for figuring out this bug, Evgeniy; I'll look at this later.

I'm very surprised autobind isn't guarded properly, as this is a case that Alexey Kuznetsov and I used to audit from time to time.
Re: strange tcp behavior
On 02/08/07 19:08, Evgeniy Polyakov wrote:
> On Thu, Aug 02, 2007 at 06:15:52PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
>> 17:33:45.351273 IP 192.168.7.4.5 > 192.168.7.8.2500: R 1385353596:1385353596(0) win 1500
>> 17:33:45.360878 IP 192.168.7.8.48186 > 192.168.7.4.5: R 1388203103:1388203103(0) ack 1385353596 win 14360
>
> The problem is not in tcp_send_active_reset(); by the time the socket is
> being released, it is already damaged.
> The problem is that inet_autobind() is called for a socket which is already
> dead, but not completely: it smells bad (its port has been freed), but it is
> still alive (accessible via send()), so for its last word inet_sendmsg()
> tries to bind it again, and only after that is it eventually closed and
> freed completely.
>
> So, the following patch fixes the problem for me.
> Another solution might be not to release the port until the socket is being
> released, but that can lead to performance degradation.
> Correct me if sk_err can be reset.

19:24:32.897071 IP 192.168.7.4.5 > 192.168.7.8.2500: S 705362199:705362199(0) win 1500
19:24:32.897211 IP 192.168.7.8.2500 > 192.168.7.4.5: S 4159455228:4159455228(0) ack 705362200 win 14360
19:24:32.920784 IP 192.168.7.4.5 > 192.168.7.8.2500: . ack 1 win 1500
19:24:32.921732 IP 192.168.7.4.5 > 192.168.7.8.2500: P 1:17(16) ack 1 win 1500
19:24:32.921795 IP 192.168.7.8.2500 > 192.168.7.4.5: . ack 17 win 14360
19:24:32.922881 IP 192.168.7.4.5 > 192.168.7.8.2500: R 705362216:705362216(0) win 1500
19:24:34.927717 IP 192.168.7.8.2500 > 192.168.7.4.5: R 1:1(0) ack 17 win 14360

According to RFC 793, the RST from .4 means that the connection is CLOSED.

  Reset Processing

  The receiver of a RST first validates it, then changes state. If the
  receiver was in the LISTEN state, it ignores it. If the receiver was
  in SYN-RECEIVED state and had previously been in the LISTEN state,
  then the receiver returns to the LISTEN state, otherwise the receiver
  aborts the connection and goes to the CLOSED state. If the receiver
  was in any other state, it aborts the connection and advises the user
  and goes to the CLOSED state.

So when the call to close() is made without reading:

  Abort

    Format: ABORT (local connection name)

    This command causes all pending SENDs and RECEIVES to be aborted,
    the TCB to be removed, and a special RESET message to be sent to
    the TCP on the other side of the connection.

Isn't it the case that there is no longer any other side of the connection to send the RESET to?

> Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 06c08e5..6790b23 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -168,8 +169,14 @@ void inet_sock_destruct(struct sock *sk)
>  static int inet_autobind(struct sock *sk)
>  {
>  	struct inet_sock *inet;
> +
>  	/* We may need to bind the socket. */
>  	lock_sock(sk);
> +	if (sk->sk_err) {
> +		release_sock(sk);
> +		return sk->sk_err;
> +	}
> +
>  	inet = inet_sk(sk);
>  	if (!inet->num) {
>  		if (sk->sk_prot->get_port(sk, 0)) {
> @@ -686,8 +703,11 @@ int inet_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
>  	struct sock *sk = sock->sk;
>
>  	/* We may need to bind the socket. */
> -	if (!inet_sk(sk)->num && inet_autobind(sk))
> -		return -EAGAIN;
> +	if (!inet_sk(sk)->num) {
> +		int err = inet_autobind(sk);
> +		if (err)
> +			return err;
> +	}
>
>  	return sk->sk_prot->sendmsg(iocb, sk, msg, size);
>  }
> @@ -698,8 +718,11 @@ static ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
>  	struct sock *sk = sock->sk;
>
>  	/* We may need to bind the socket. */
> -	if (!inet_sk(sk)->num && inet_autobind(sk))
> -		return -EAGAIN;
> +	if (!inet_sk(sk)->num) {
> +		int err = inet_autobind(sk);
> +		if (err)
> +			return err;
> +	}
>
>  	if (sk->sk_prot->sendpage)
>  		return sk->sk_prot->sendpage(sk, page, offset, size, flags);

--
Simon Arlott
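The "Reset Processing" rule quoted above is small enough to express as a state-transition function. This is a hedged sketch: `after_valid_rst()` and the `T_*` enum names are hypothetical. It assumes the RST has already passed sequence-number validation, and it models only the state change and whether the user is told, not the flushing of queues that goes with an abort:

```c
/* The RFC 793 connection states (names are illustrative). */
enum tcp793_state {
    T_CLOSED, T_LISTEN, T_SYN_SENT, T_SYN_RECEIVED, T_ESTABLISHED,
    T_FIN_WAIT_1, T_FIN_WAIT_2, T_CLOSE_WAIT, T_CLOSING, T_LAST_ACK,
    T_TIME_WAIT
};

/*
 * State after a *validated* RST, following RFC 793 "Reset Processing".
 * from_listen: nonzero if SYN-RECEIVED was entered from LISTEN.
 * *notify_user is set when the user should be told about the reset.
 */
static enum tcp793_state after_valid_rst(enum tcp793_state st,
                                         int from_listen, int *notify_user)
{
    *notify_user = 0;
    switch (st) {
    case T_CLOSED:
        return T_CLOSED;              /* no connection: RST is discarded */
    case T_LISTEN:
        return T_LISTEN;              /* "it ignores it" */
    case T_SYN_RECEIVED:
        if (from_listen)
            return T_LISTEN;          /* passive open: back to LISTEN */
        *notify_user = 1;             /* active open: connection refused */
        return T_CLOSED;
    default:
        *notify_user = 1;             /* abort and advise the user */
        return T_CLOSED;
    }
}
```

The `T_CLOSED` case and the notification in the active-open SYN-RECEIVED case come from RFC 793's detailed SEGMENT ARRIVES processing rather than from the summary paragraph quoted above.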
Re: strange tcp behavior
On Thu, Aug 02, 2007 at 10:08:42PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> So, the following patch fixes the problem for me.

Or this one. Essentially the same, though.

Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 06c08e5..7c47ef5 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -168,8 +168,14 @@ void inet_sock_destruct(struct sock *sk)
 static int inet_autobind(struct sock *sk)
 {
 	struct inet_sock *inet;
+
 	/* We may need to bind the socket. */
 	lock_sock(sk);
+	if (sk->sk_err || (sk->sk_state == TCP_CLOSE)) {
+		release_sock(sk);
+		return sk->sk_err;
+	}
+
 	inet = inet_sk(sk);
 	if (!inet->num) {
 		if (sk->sk_prot->get_port(sk, 0)) {
@@ -686,8 +692,11 @@ int inet_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	struct sock *sk = sock->sk;
 
 	/* We may need to bind the socket. */
-	if (!inet_sk(sk)->num && inet_autobind(sk))
-		return -EAGAIN;
+	if (!inet_sk(sk)->num) {
+		int err = inet_autobind(sk);
+		if (err)
+			return err;
+	}
 
 	return sk->sk_prot->sendmsg(iocb, sk, msg, size);
 }
@@ -698,8 +707,11 @@ static ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
 	struct sock *sk = sock->sk;
 
 	/* We may need to bind the socket. */
-	if (!inet_sk(sk)->num && inet_autobind(sk))
-		return -EAGAIN;
+	if (!inet_sk(sk)->num) {
+		int err = inet_autobind(sk);
+		if (err)
+			return err;
+	}
 
 	if (sk->sk_prot->sendpage)
 		return sk->sk_prot->sendpage(sk, page, offset, size, flags);

--
	Evgeniy Polyakov
Re: strange tcp behavior
On Thu, Aug 02, 2007 at 06:15:52PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> 17:33:45.351273 IP 192.168.7.4.5 > 192.168.7.8.2500: R
> 1385353596:1385353596(0) win 1500
> 17:33:45.360878 IP 192.168.7.8.48186 > 192.168.7.4.5: R
> 1388203103:1388203103(0) ack 1385353596 win 14360
>
> Seems to be losing the source port information when it decides to send
> that final RST|ACK. It's going through the "TCPAbortOnClose" path:
>
> tcp_close:
> -> tcp_set_state(sk, TCP_CLOSE)
>    -> inet_put_port(&tcp_hashinfo, sk)
>       Perhaps it's losing the port information here?
> -> tcp_send_active_reset(sk, GFP_KERNEL)
>
> "TCP_CLOSE socket is finished"
> Should these two calls be the other way round?
>
>
> Also, I don't think it should be sending a RST after the other side has
> sent one - the connection no longer exists so there is nothing on the
> other side to reset.

The problem is not in tcp_send_active_reset(): by the time the socket is
being released, it is already damaged. The problem is that inet_autobind()
is called for a socket which is already dead, but not completely: it smells
bad (its port has been freed), but it is still alive (accessible via send()).
So, as its last word, inet_sendmsg() tries to bind it again, and only after
that is it eventually closed and freed completely.

So, the following patch fixes the problem for me. Another solution might be
not to release the port until the socket itself is released, but that can
lead to performance degradation. Correct me if sk_err can be reset.

Signed-off-by: Evgeniy Polyakov <[EMAIL PROTECTED]>

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 06c08e5..6790b23 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -168,8 +169,14 @@ void inet_sock_destruct(struct sock *sk)
 static int inet_autobind(struct sock *sk)
 {
 	struct inet_sock *inet;
+
 	/* We may need to bind the socket. */
 	lock_sock(sk);
+	if (sk->sk_err) {
+		release_sock(sk);
+		return sk->sk_err;
+	}
+
 	inet = inet_sk(sk);
 	if (!inet->num) {
 		if (sk->sk_prot->get_port(sk, 0)) {
@@ -686,8 +703,11 @@ int inet_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 	struct sock *sk = sock->sk;
 
 	/* We may need to bind the socket. */
-	if (!inet_sk(sk)->num && inet_autobind(sk))
-		return -EAGAIN;
+	if (!inet_sk(sk)->num) {
+		int err = inet_autobind(sk);
+		if (err)
+			return err;
+	}
 
 	return sk->sk_prot->sendmsg(iocb, sk, msg, size);
 }
@@ -698,8 +718,11 @@ static ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
 	struct sock *sk = sock->sk;
 
 	/* We may need to bind the socket. */
-	if (!inet_sk(sk)->num && inet_autobind(sk))
-		return -EAGAIN;
+	if (!inet_sk(sk)->num) {
+		int err = inet_autobind(sk);
+		if (err)
+			return err;
+	}
 
 	if (sk->sk_prot->sendpage)
 		return sk->sk_prot->sendpage(sk, page, offset, size, flags);

-- 
	Evgeniy Polyakov
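A toy model of the sequence described in this message, assuming (as the message says) that the local port has already been released while the socket is still reachable from send(). The names toy_sock, toy_teardown(), toy_autobind_old() and toy_autobind_new() are hypothetical, and the starting port number is arbitrary. It only illustrates why an unguarded autobind hands the dying socket a fresh port, which matches the trace where the final RST leaves from 48186 instead of 2500. The guard here tests a "dead" flag; the patch above tests sk->sk_err instead.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model; names are hypothetical and this is not kernel code. */
struct toy_sock {
	unsigned short num;  /* local port, 0 means "not bound" */
	int dead;            /* connection already torn down */
};

/* Arbitrary starting point for handing out "ephemeral" ports. */
static unsigned short next_port = 32768;

/* Teardown (for example after a RST): the port is released while the
 * socket object itself is still reachable from send(). */
static void toy_teardown(struct toy_sock *sk)
{
	assert(sk != NULL);
	sk->dead = 1;
	sk->num = 0;
}

/* Unguarded autobind: any socket without a port gets a fresh one. */
static int toy_autobind_old(struct toy_sock *sk)
{
	assert(sk != NULL);
	if (!sk->num)
		sk->num = next_port++;
	return 0;
}

/* Guarded autobind: a torn-down socket is not rebound. */
static int toy_autobind_new(struct toy_sock *sk)
{
	assert(sk != NULL);
	if (sk->dead)
		return -1;   /* placeholder error value for the model */
	if (!sk->num)
		sk->num = next_port++;
	return 0;
}
```

After toy_teardown(), toy_autobind_old() assigns a new port, so anything sent afterwards carries a different source port, while toy_autobind_new() refuses and leaves the port at 0. A socket that was never bound is still autobound normally by either version.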
Re: strange tcp behavior
On 02/08/07 13:15, Simon Arlott wrote:
> (Don't remove CC:s, don't top post)
>>> On Thu, August 2, 2007 11:16, Evgeniy Polyakov wrote:
>>>> On Thu, Aug 02, 2007 at 01:55:50PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
>>>>> On Thu, Aug 02, 2007 at 09:19:06AM +0300, [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote:
>>>>>> 1186035057.207629 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [SYN] Seq=0 Len=0
>>>>>> 1186035057.207632 127.0.0.1 -> 127.0.0.1 TCP smtp > 5 [SYN, ACK] Seq=0 Ack=1 Win=32792 Len=0 MSS=16396
>>>>>> 1186035057.207666 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [ACK] Seq=1 Ack=1 Win=1500 Len=0
>>>>>> 1186035057.207699 127.0.0.1 -> 127.0.0.1 SMTP Command: EHLO localhost
>>>>>> 1186035057.207718 127.0.0.1 -> 127.0.0.1 TCP smtp > 5 [ACK] Seq=1 Ack=17 Win=32792 Len=0
>>>>>> 1186035057.207736 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [RST] Seq=17 Len=0
>>>>>> 1186035057.223934 127.0.0.1 -> 127.0.0.1 TCP 33787 > 5 [RST, ACK] Seq=0 Ack=0 Win=32792 Len=0
>>>>>>
>>>>>> Can someone please comment as to why, tcp stack sends rst packet from the
>>>>>> wrong source port in this situation.
> I don't know where that extra RST is coming from.
> This test would be more convincing between two hosts, since your bizarre
> client is using raw sockets as root and could be doing anything.

Server 192.168.7.8 (2.6.23)
Client 192.168.7.4 (2.6.20)

17:33:45.326246 IP 192.168.7.4.5 > 192.168.7.8.2500: S 1385353579:1385353579(0) win 1500
17:33:45.326418 IP 192.168.7.8.2500 > 192.168.7.4.5: S 1388203102:1388203102(0) ack 1385353580 win 14360
17:33:45.348833 IP 192.168.7.4.5 > 192.168.7.8.2500: . ack 1 win 1500
17:33:45.349977 IP 192.168.7.4.5 > 192.168.7.8.2500: P 1:17(16) ack 1 win 1500
17:33:45.350117 IP 192.168.7.8.2500 > 192.168.7.4.5: . ack 17 win 14360
17:33:45.351273 IP 192.168.7.4.5 > 192.168.7.8.2500: R 1385353596:1385353596(0) win 1500
17:33:45.360878 IP 192.168.7.8.48186 > 192.168.7.4.5: R 1388203103:1388203103(0) ack 1385353596 win 14360

Seems to be losing the source port information when it decides to send
that final RST|ACK. It's going through the "TCPAbortOnClose" path:

tcp_close:
-> tcp_set_state(sk, TCP_CLOSE)
   -> inet_put_port(&tcp_hashinfo, sk)
      Perhaps it's losing the port information here?
-> tcp_send_active_reset(sk, GFP_KERNEL)

"TCP_CLOSE socket is finished"
Should these two calls be the other way round?


Also, I don't think it should be sending a RST after the other side has
sent one - the connection no longer exists so there is nothing on the
other side to reset.

-- 
Simon Arlott
Re: strange tcp behavior
On Thu, Aug 02, 2007 at 04:04:53PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> On Thu, Aug 02, 2007 at 12:38:59PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> > I just got multiple RSTs instead of a connection too. The second RST looks
> > like it's from another connection - and a RST for a RST is wrong...
>
> You should use iptables rule to block non-raw access:
> iptables -I INPUT -p tcp --dport 5 -j DROP
>
> but even in that case I got valid session.

Ok, I can now reproduce the problem.
I will try to debug it further.

-- 
	Evgeniy Polyakov
Re: strange tcp behavior
(Don't remove CC:s, don't top post)

>> On Thu, August 2, 2007 11:16, Evgeniy Polyakov wrote:
>>> On Thu, Aug 02, 2007 at 01:55:50PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
>>>> On Thu, Aug 02, 2007 at 09:19:06AM +0300, [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote:
>>>>> 1186035057.207629 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [SYN] Seq=0 Len=0
>>>>> 1186035057.207632 127.0.0.1 -> 127.0.0.1 TCP smtp > 5 [SYN, ACK] Seq=0 Ack=1 Win=32792 Len=0 MSS=16396
>>>>> 1186035057.207666 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [ACK] Seq=1 Ack=1 Win=1500 Len=0
>>>>> 1186035057.207699 127.0.0.1 -> 127.0.0.1 SMTP Command: EHLO localhost
>>>>> 1186035057.207718 127.0.0.1 -> 127.0.0.1 TCP smtp > 5 [ACK] Seq=1 Ack=17 Win=32792 Len=0
>>>>> 1186035057.207736 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [RST] Seq=17 Len=0
>>>>> 1186035057.223934 127.0.0.1 -> 127.0.0.1 TCP 33787 > 5 [RST, ACK] Seq=0 Ack=0 Win=32792 Len=0
>>>>>
>>>>> Can someone please comment as to why, tcp stack sends rst packet from the
>>>>> wrong source port in this situation.
>>>>
>>>> Besides the fact, that test applications do not run if started not as
>>>> root, I got this:
>>>
>>> And it actually does not initializes a session, since tird line below
>>> shows RST, but not ack. The same with sendmail smtp server (i.e. 25 port
>>> like in your server) and unmodified client.
>>> Please provide application which can trigger the issue and I will help
>>> to debug this issue. If it will help you to debug client, I can run
>>> tcpdump on public server (say 194.85.82.65, please tell me your source
>>> address) to collect dumps. Current code does not trigger the issue on my
>>> machines (and works not like was intended by you). Ugh, and code really
>>> looks horrible...
>>>
>>
>> I just got multiple RSTs instead of a connection too. The second RST looks
>> like it's from another connection - and a RST for a RST is wrong...
On Thu, August 2, 2007 12:45, [EMAIL PROTECTED] wrote:
> you need to add iptables rule for this to
> work, or else the tcp resets connection too early because it does not know
> that something is listening on 5 port.
>
> iptables -I INPUT -p tcp --dport 5 -j DROP should do the job.

You didn't mention this before.

Without the server running:
13:02:23.314352 IP 127.0.0.1.5 > 127.0.0.1.2500: S 53123695:53123695(0) win 1500
13:02:23.314442 IP 127.0.0.1.2500 > 127.0.0.1.5: R 0:0(0) ack 53123696 win 0
13:02:25.906975 IP 127.0.0.1.3315 > 127.0.0.1.49197: P 1285306902:1285307318(416) ack 1267361915 win 1024
13:02:25.907060 IP 127.0.0.1.49197 > 127.0.0.1.3315: . ack 416 win 1541

With the server running:
13:05:55.234696 IP 127.0.0.1.5 > 127.0.0.1.2500: S 1960601450:1960601450(0) win 1500
13:05:55.234799 IP 127.0.0.1.2500 > 127.0.0.1.5: S 2171862150:2171862150(0) ack 1960601451 win 32792
13:05:55.238271 IP 127.0.0.1.5 > 127.0.0.1.2500: . ack 1 win 1500
13:05:55.240034 IP 127.0.0.1.5 > 127.0.0.1.2500: P 1:17(16) ack 1 win 1500
13:05:55.240132 IP 127.0.0.1.2500 > 127.0.0.1.5: . ack 17 win 32792
13:05:55.242251 IP 127.0.0.1.5 > 127.0.0.1.2500: R 1960601467:1960601467(0) win 1500
13:05:55.253884 IP 127.0.0.1.56434 > 127.0.0.1.5: R 2171862151:2171862151(0) ack 1960601467 win 32792

Weird.

I resent your final RST a few times with a delay:
13:13:05.199275 IP 127.0.0.1.5 > 127.0.0.1.2500: S 83018811:83018811(0) win 1500
13:13:05.199378 IP 127.0.0.1.2500 > 127.0.0.1.5: S 2627922927:2627922927(0) ack 83018812 win 32792
13:13:05.203368 IP 127.0.0.1.5 > 127.0.0.1.2500: . ack 1 win 1500
13:13:05.205049 IP 127.0.0.1.5 > 127.0.0.1.2500: P 1:17(16) ack 1 win 1500
13:13:05.205173 IP 127.0.0.1.2500 > 127.0.0.1.5: . ack 17 win 32792
13:13:05.206463 IP 127.0.0.1.5 > 127.0.0.1.2500: R 83018828:83018828(0) win 1500
13:13:05.207656 IP 127.0.0.1.5 > 127.0.0.1.2500: R 83018828:83018828(0) win 1500
13:13:05.217664 IP 127.0.0.1.55271 > 127.0.0.1.5: R 2627922928:2627922928(0) ack 83018828 win 32792
13:13:05.510239 IP 127.0.0.1.5 > 127.0.0.1.2500: R 83018828:83018828(0) win 1500
13:13:05.511644 IP 127.0.0.1.5 > 127.0.0.1.2500: R 83018828:83018828(0) win 1500
13:13:05.512764 IP 127.0.0.1.5 > 127.0.0.1.2500: R 83018828:83018828(0) win 1500

I don't know where that extra RST is coming from.
This test would be more convincing between two hosts, since your bizarre
client is using raw sockets as root and could be doing anything.

-- 
Simon Arlott
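The "Without the server running" exchange in this message follows the generic RFC 793 rule for a segment that reaches no connection: the SYN with seq 53123695 and no ACK is answered with R 0:0(0) ack 53123696, i.e. seq 0 and ack = SEG.SEQ + SEG.LEN, where the SYN counts as one. The same rule is behind the quoted explanation that, without the iptables DROP rule, the client host's own stack resets the raw client's connection: it has no socket on port 5, so it answers the SYN-ACK with a RST. A minimal sketch of that rule, assuming a simplified segment structure; struct seg and rst_for_closed() are hypothetical names, not a kernel or libc API. It also drops an incoming RST instead of answering it, as RFC 793 specifies for the CLOSED state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified view of a TCP segment: only the fields the reset rule uses.
 * Hypothetical type for illustration, not a kernel structure. */
struct seg {
	uint32_t seq;          /* sequence number */
	uint32_t ack;          /* acknowledgment number (valid if ack_flag) */
	uint32_t payload_len;  /* bytes of data carried */
	int syn, fin, rst, ack_flag;
};

/*
 * RFC 793 reset generation when no connection exists (state CLOSED).
 * Returns 1 and fills *out if a RST should be sent, 0 if the incoming
 * segment is simply dropped.
 */
static int rst_for_closed(const struct seg *in, struct seg *out)
{
	assert(in != NULL && out != NULL);

	/* An incoming RST is discarded, never answered with another RST. */
	if (in->rst)
		return 0;

	*out = (struct seg){ .rst = 1 };
	if (in->ack_flag) {
		/* <SEQ=SEG.ACK><CTL=RST> */
		out->seq = in->ack;
	} else {
		/* <SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>; SYN and FIN each
		 * occupy one sequence number. uint32_t arithmetic wraps
		 * modulo 2^32, like the TCP sequence space. */
		out->seq = 0;
		out->ack = in->seq + in->payload_len
			 + (uint32_t)(in->syn != 0) + (uint32_t)(in->fin != 0);
		out->ack_flag = 1;
	}
	return 1;
}
```

Fed the SYN from the trace above (seq 53123695), rst_for_closed() yields a RST with seq 0 and ack 53123696. Fed a SYN-ACK carrying ack 906222068, as in the FC7 loopback trace elsewhere in this thread, it yields a RST with seq 906222068 and no ACK flag, matching the "R 906222068:906222068(0) win 0" line there.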
Re: strange tcp behavior
On Thu, Aug 02, 2007 at 12:38:59PM +0100, Simon Arlott ([EMAIL PROTECTED]) wrote:
> I just got multiple RSTs instead of a connection too. The second RST looks
> like it's from another connection - and a RST for a RST is wrong...

You should use an iptables rule to block non-raw access:

iptables -I INPUT -p tcp --dport 5 -j DROP

but even in that case I got a valid session.

> -- 
> Simon Arlott

-- 
	Evgeniy Polyakov
Re: strange tcp behavior
On Thu, August 2, 2007 11:16, Evgeniy Polyakov wrote:
> On Thu, Aug 02, 2007 at 01:55:50PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
>> On Thu, Aug 02, 2007 at 09:19:06AM +0300, [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote:
>> > 1186035057.207629 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [SYN] Seq=0 Len=0
>> > 1186035057.207632 127.0.0.1 -> 127.0.0.1 TCP smtp > 5 [SYN, ACK] Seq=0 Ack=1 Win=32792 Len=0 MSS=16396
>> > 1186035057.207666 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [ACK] Seq=1 Ack=1 Win=1500 Len=0
>> > 1186035057.207699 127.0.0.1 -> 127.0.0.1 SMTP Command: EHLO localhost
>> > 1186035057.207718 127.0.0.1 -> 127.0.0.1 TCP smtp > 5 [ACK] Seq=1 Ack=17 Win=32792 Len=0
>> > 1186035057.207736 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [RST] Seq=17 Len=0
>> > 1186035057.223934 127.0.0.1 -> 127.0.0.1 TCP 33787 > 5 [RST, ACK] Seq=0 Ack=0 Win=32792 Len=0
>> >
>> > Can someone please comment as to why, tcp stack sends rst packet from the
>> > wrong source port in this situation.
>>
>> Besides the fact, that test applications do not run if started not as
>> root, I got this:
>
> And it actually does not initializes a session, since tird line below
> shows RST, but not ack. The same with sendmail smtp server (i.e. 25 port
> like in your server) and unmodified client.
> Please provide application which can trigger the issue and I will help
> to debug this issue. If it will help you to debug client, I can run
> tcpdump on public server (say 194.85.82.65, please tell me your source
> address) to collect dumps. Current code does not trigger the issue on my
> machines (and works not like was intended by you). Ugh, and code really
> looks horrible...
>

I just got multiple RSTs instead of a connection too. The second RST looks
like it's from another connection - and a RST for a RST is wrong...
-- 
Simon Arlott
Re: strange tcp behavior
On Thu, Aug 02, 2007 at 01:55:50PM +0400, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:
> On Thu, Aug 02, 2007 at 09:19:06AM +0300, [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote:
> > 1186035057.207629 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [SYN] Seq=0 Len=0
> > 1186035057.207632 127.0.0.1 -> 127.0.0.1 TCP smtp > 5 [SYN, ACK] Seq=0 Ack=1 Win=32792 Len=0 MSS=16396
> > 1186035057.207666 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [ACK] Seq=1 Ack=1 Win=1500 Len=0
> > 1186035057.207699 127.0.0.1 -> 127.0.0.1 SMTP Command: EHLO localhost
> > 1186035057.207718 127.0.0.1 -> 127.0.0.1 TCP smtp > 5 [ACK] Seq=1 Ack=17 Win=32792 Len=0
> > 1186035057.207736 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [RST] Seq=17 Len=0
> > 1186035057.223934 127.0.0.1 -> 127.0.0.1 TCP 33787 > 5 [RST, ACK] Seq=0 Ack=0 Win=32792 Len=0
> >
> > Can someone please comment as to why, tcp stack sends rst packet from the
> > wrong source port in this situation.
>
> Besides the fact, that test applications do not run if started not as
> root, I got this:

And it actually does not initialize a session, since the third line below
shows RST, not ACK. The same happens with the sendmail SMTP server (i.e. port
25, like in your server) and an unmodified client.
Please provide an application which can trigger the issue and I will help
to debug it. If it will help you to debug the client, I can run
tcpdump on a public server (say 194.85.82.65; please tell me your source
address) to collect dumps. The current code does not trigger the issue on my
machines (and does not work the way you intended). Ugh, and the code really
looks horrible...

-- 
	Evgeniy Polyakov
Re: strange tcp behavior
On Thu, Aug 02, 2007 at 09:19:06AM +0300, [EMAIL PROTECTED] ([EMAIL PROTECTED]) wrote:
> 1186035057.207629 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [SYN] Seq=0 Len=0
> 1186035057.207632 127.0.0.1 -> 127.0.0.1 TCP smtp > 5 [SYN, ACK] Seq=0 Ack=1 Win=32792 Len=0 MSS=16396
> 1186035057.207666 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [ACK] Seq=1 Ack=1 Win=1500 Len=0
> 1186035057.207699 127.0.0.1 -> 127.0.0.1 SMTP Command: EHLO localhost
> 1186035057.207718 127.0.0.1 -> 127.0.0.1 TCP smtp > 5 [ACK] Seq=1 Ack=17 Win=32792 Len=0
> 1186035057.207736 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [RST] Seq=17 Len=0
> 1186035057.223934 127.0.0.1 -> 127.0.0.1 TCP 33787 > 5 [RST, ACK] Seq=0 Ack=0 Win=32792 Len=0
>
> Can someone please comment as to why, tcp stack sends rst packet from the
> wrong source port in this situation.

Besides the fact that the test applications do not run unless started as
root, I got this:

13:51:12.180241 IP localhost.localdomain.5 > localhost.localdomain.10250: S 906222067:906222067(0) win 1500
13:51:12.180279 IP localhost.localdomain.10250 > localhost.localdomain.5: S 2011233747:2011233747(0) ack 906222068 win 32792
13:51:12.180293 IP localhost.localdomain.5 > localhost.localdomain.10250: R 906222068:906222068(0) win 0
13:51:12.180320 IP localhost.localdomain.5 > localhost.localdomain.10250: . ack 1 win 1500
13:51:12.180329 IP localhost.localdomain.10250 > localhost.localdomain.5: R 2011233748:2011233748(0) win 0
13:51:12.180341 IP localhost.localdomain.5 > localhost.localdomain.10250: P 1:17(16) ack 1 win 1500
13:51:12.180349 IP localhost.localdomain.10250 > localhost.localdomain.5: R 2011233748:2011233748(0) win 0
13:51:12.180361 IP localhost.localdomain.5 > localhost.localdomain.10250: R 906222084:906222084(0) win 1500

I.e. there is no bug in this session. FC7 2.6.22.1-27.fc7 kernel.
Here is vanilla (with my patches, unrelated to the problem though) 2.6.22-rc5:

09:33:37.650279 IP localhost.5 > localhost.10250: S 1326688203:1326688203(0) win 1500
09:33:37.664391 IP localhost.10250 > localhost.5: S 3637551175:3637551175(0) ack 1326688204 win 32792
09:33:37.664417 IP localhost.5 > localhost.10250: R 1326688204:1326688204(0) win 0
09:33:37.650451 IP localhost.5 > localhost.10250: . ack 1 win 1500
09:33:37.650467 IP localhost.10250 > localhost.5: R 3637551176:3637551176(0) win 0
09:33:37.650481 IP localhost.5 > localhost.10250: P 1:17(16) ack 1 win 1500
09:33:37.650493 IP localhost.10250 > localhost.5: R 3637551176:3637551176(0) win 0
09:33:37.650507 IP localhost.5 > localhost.10250: R 1326688220:1326688220(0) win 1500

Is it possible that your tcpdump is screwed?

-- 
	Evgeniy Polyakov
strange tcp behavior
1186035057.207629 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [SYN] Seq=0 Len=0
1186035057.207632 127.0.0.1 -> 127.0.0.1 TCP smtp > 5 [SYN, ACK] Seq=0 Ack=1 Win=32792 Len=0 MSS=16396
1186035057.207666 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [ACK] Seq=1 Ack=1 Win=1500 Len=0
1186035057.207699 127.0.0.1 -> 127.0.0.1 SMTP Command: EHLO localhost
1186035057.207718 127.0.0.1 -> 127.0.0.1 TCP smtp > 5 [ACK] Seq=1 Ack=17 Win=32792 Len=0
1186035057.207736 127.0.0.1 -> 127.0.0.1 TCP 5 > smtp [RST] Seq=17 Len=0
1186035057.223934 127.0.0.1 -> 127.0.0.1 TCP 33787 > 5 [RST, ACK] Seq=0 Ack=0 Win=32792 Len=0

Can someone please comment on why the TCP stack sends the RST packet from the
wrong source port in this situation?

This is the same problem that was described in my first two posts, which
unfortunately nobody seemed to notice. Here is source code which can reproduce
the behavior described; the client side code is a complete mess, but with a
little bit it works.

Server:

#include
#include
#include
#include
#include

void main(void)
{
	int ms;
	int ss;
	struct sockaddr_in sa;
	char *str = "HELLO FRIEND";
	struct pollfd fd;
	int flags;

	ms = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
	flags = fcntl(ms, F_GETFL, 0);
	fcntl(ms, F_SETFL, flags | O_NONBLOCK);
	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_addr.s_addr = htonl(INADDR_ANY);
	sa.sin_port = htons(25);
	bind(ms, (struct sockaddr *) &sa, sizeof(sa));
	listen(ms, 0);
	fd.fd = ms;
	fd.events = POLLIN;
	while(poll(&fd, 1, -1)) {
		ss = accept(ms, NULL, NULL);
		usleep(1);
		send(ss, str, strlen(str), MSG_NOSIGNAL);
		close(ss);
		memset(&fd, 0, sizeof(fd));
		fd.fd = ms;
		fd.events = POLLIN;
	}
}

Client:

#include
#include
#include
#include
#include
#include
//#include
//#include

struct sockaddr_in localaddr;
struct sockaddr_in remoteaddr;
struct sockaddr rawaddr;
int sdl, sdr;
struct tcphdr header;
struct pheader_t {
	uint32_t saddr;
	uint32_t daddr;
	uint8_t r;
	uint8_t protocol;
	uint16_t length;
};
struct pheader_t pheader;
unsigned short tbuf[2048];
unsigned char buf[2048];
char *msg = "EHLO localhost\r\n";
unsigned char *p;
char *src_addr = "127.0.0.1";
char *dst_addr = "127.0.0.1";
unsigned short sprt = 5;
unsigned short dprt = 25;
struct timeval tv;
unsigned seq, ack_seq;
int data;

void mysend(void)
{
	int i, sum;
	int len;

	if(data) {
		len = strlen(msg);
		memcpy((char *) tbuf + sizeof(pheader) + sizeof(header), msg, len);
	} else
		len = 0;
	bzero(&pheader, sizeof(pheader));
	pheader.saddr = (in_addr_t) inet_addr(src_addr);
	pheader.daddr = (in_addr_t) inet_addr(dst_addr);
	pheader.protocol = 6;
	pheader.length = htons(sizeof(header) + len);
	memcpy(tbuf, &pheader, sizeof(pheader));
	memcpy((char *) tbuf + sizeof(pheader), &header, sizeof(header));
	sum = 0;
	for(i = 0; i < (sizeof(pheader) + sizeof(header)) / 2 + len / 2; i++) {
		sum += tbuf[i];
		sum = (sum & 0x) + (sum >> 16);
	}
	header.check = ~sum;
	memcpy((char *) tbuf + sizeof(pheader), &header, sizeof(header));
	sendto(sdr, (char *) tbuf + sizeof(pheader), sizeof(header) + len, 0,
	       (struct sockaddr *) &remoteaddr, sizeof(remoteaddr));
}

void main(void)
{
	gettimeofday(&tv, NULL);
	srand(tv.tv_sec & tv.tv_usec);
	remoteaddr.sin_family = AF_INET;
	remoteaddr.sin_addr.s_addr = (in_addr_t) inet_addr(dst_addr);
	sdl = socket(PF_INET, SOCK_PACKET, htons(ETH_P_ALL));
	strcpy(rawaddr.sa_data, "lo");
	bind(sdl, (struct sockaddr *) &rawaddr, sizeof(rawaddr));
	sdr = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);
	bzero(&header, sizeof(header));
	header.source = htons(sprt);
	header.dest = htons(dprt);
	seq = rand();
	ack_seq = 0;
	header.seq = htonl(seq);
	header.ack_seq = htonl(ack_seq);
	header.doff = sizeof(header) / 4;
	header.syn = 1;
	header.window = htons(1500);
	mysend();
	while(1) {
		recvfrom(sdl, buf, sizeof(buf), 0, NULL, NULL);
//		p = buf + (*buf & 0x0f) * 4;
		p = (buf + 14) + (*(buf + 14) & 0x0f) * 4;
		if(ntohs(((struct tcphdr *)p)->source) == dprt &&
		   ntohs(((struct tcphdr *)p)->dest) == sprt &&
		   ((struct tcphdr *)p)->syn == 1 &&
		   ((struct tcphdr *)p)->ack == 1)
			break;
	}
	bzero(&header, sizeof(header));
	header.source =
htons(sprt); header.dest = htons(dpr