Re: kern/156978: [lagg][patch] Take lagg rlock before checking flags
Synopsis: [lagg][patch] Take lagg rlock before checking flags

State-Changed-From-To: patched-closed
State-Changed-By: maxim
State-Changed-When: Wed Jul 27 07:02:47 UTC 2011
State-Changed-Why: Merged to RELENG_8.

http://www.freebsd.org/cgi/query-pr.cgi?pr=156978
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: Debugging dropped shell connections over a VPN
On Tue, Jul 26, 2011 at 01:35:16PM -0500, Paul Keusemann wrote:
> On 07/26/11 08:05, Gary Palmer wrote:
>> On Tue, Jul 26, 2011 at 06:53:59AM -0500, Paul Keusemann wrote:
>>> Again, sorry for the sluggish response.
>>>
>>> On 07/20/11 15:15, Gary Palmer wrote:
>>>> On Tue, Jul 12, 2011 at 02:26:34PM -0500, Paul Keusemann wrote:
>>>>> On 07/07/11 14:39, Chuck Swiger wrote:
>>>>>> On Jul 7, 2011, at 4:45 AM, Paul Keusemann wrote:
>>>>>>> My setup is something like this:
>>>>>>>
>>>>>>> - My local network is a mix of AIX, HP-UX, Linux, FreeBSD and
>>>>>>>   Solaris machines running various OS versions.
>>>>>>> - My gateway / firewall machine is running
>>>>>>>   FreeBSD-8.1-RELEASE-p1 with ipfw, nat and racoon for the
>>>>>>>   firewall and VPN.
>>>>>>>
>>>>>>> The problem is that rlogin, ssh and telnet connections over the
>>>>>>> VPN get dropped after some period of inactivity.
>>>>>>
>>>>>> You're probably getting NAT timeouts against the VPN connection
>>>>>> if it is left idle.  racoon ought to have a config setting called
>>>>>> natt_keepalive which sends periodic keepalives-- see whether
>>>>>> that's disabled.
>>>>>>
>>>>>> Regards,
>>>>>
>>>>> Thanks for the suggestions Chuck, sorry it's taken so long to
>>>>> respond but I had to reconfigure and rebuild my kernel to enable
>>>>> IPSEC_NAT_T in order to try this out.
>>>>>
>>>>> One thing that I did not explicitly mention before is that I am
>>>>> routing a network over the VPN.
>>>>
>>>> Hi Paul,
>>>>
>>>> Even if you are not being NAT'd on the VPN there may be a firewall
>>>> (or other active network component like a load balancer) with an
>>>> overflowing state table somewhere at the remote end.  We see this
>>>> frequently where I work with customer networks and the
>>>> firewall/VPN/network admin denies that it's a timeout issue, so
>>>> there is likely some device in the network that has a state table
>>>> and if the connection is idle for a few minutes it gets dropped.
>>>
>>> Hmmm, this seems likely.  Have you had any luck in finding the
>>> culprit and resolving the problem?
>>
>> Unfortunately no.  We know the problem exists but as a vendor we have
>> very little success in getting the customer to identify the
>> problematic device inside their network as it only seems to affect
>> our connections to them when we are helping them with problems, so
>> there is almost always something more important going on and the
>> timeout issue gets put on the back burner and forgotten.
>>
>> We've worked around it in some places by using the ssh
>> 'ServerAliveInterval' directive to make ssh send packets and keep the
>> session open even if we're idle, but that doesn't always work.
>
> OK, I found the ClientAliveInterval and ClientAliveCountMax settings
> in the ssh_config man page.  I assume these are what you are referring
> to.  I tried setting ClientAliveInterval to 15 seconds with
> ClientAliveCountMax set to 3 and this seems to help.  I've only tried
> this a couple of times but I have seen an ssh session stay alive for
> over an hour.  The bad news is that the sessions are still getting
> dropped; at least now I know when it happens.  Now I'm getting the
> following message:
>
>   Received disconnect from 10.64.20.69: 2: Timeout, your session not
>   responding.
>
> From a quick perusal of the openssh source, it is not obvious whether
> this message is coming from the client or the server side.  Initially,
> because the keep alive timer is a server side setting, I assumed the
> message was coming from the server side but if the session is not
> responding how is the message getting to the client?  If it is a
> client side problem, then I have much more flexibility to fix.  All I
> can do is whine about server side problems.

Hi Paul,

ServerAliveInterval is actually a client setting.  For example, putting
this in your ~/.ssh/config file

host *
  ServerAliveInterval 15

will set the client to ping the server every 15 seconds and try to keep
the connection alive.  You can replace '*' if you want to be more
targeted in your configuration.

I've never played with the server side settings for various reasons.

Regards,

Gary
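
For reference, a minimal sketch of the client-side keepalive settings discussed above, as they might appear in ~/.ssh/config. ServerAliveInterval and ServerAliveCountMax are OpenSSH client options (ssh_config); the ClientAlive* options mentioned earlier in the thread are their server-side counterparts and belong in sshd_config. The count of 3 is an assumption chosen to mirror the value tried on the server side, not something Gary specified.

```
# ~/.ssh/config (client side)
Host *
    # Send a keepalive request to the server after 15 idle seconds
    ServerAliveInterval 15
    # Assumed value: disconnect after this many unanswered keepalives
    ServerAliveCountMax 3
```

Replacing '*' with a host name or pattern limits the setting to specific servers, as suggested in the message above.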
Re: Debugging dropped shell connections over a VPN
On 07/27/11 06:50, Gary Palmer wrote:
> On Tue, Jul 26, 2011 at 01:35:16PM -0500, Paul Keusemann wrote:
>> [...]
>> Initially, because the keep alive timer is a server side setting, I
>> assumed the message was coming from the server side but if the
>> session is not responding how is the message getting to the client?
>> If it is a client side problem, then I have much more flexibility to
>> fix.  All I can do is whine about server side problems.
>
> Hi Paul,
>
> ServerAliveInterval is actually a client setting.  For example,
> putting this in your ~/.ssh/config file
>
> host *
>   ServerAliveInterval 15
>
> will set the client to ping the server every 15 seconds and try to
> keep the connection alive.  You can replace '*' if you want to be more
> targeted in your configuration.

Ah, I see.  I was looking at the Solaris ssh_config man page.  The
OpenSSH ssh_config man page is third in the sequence.  The ServerAlive*
options are not documented in the Solaris ssh_config man page.  I'll
try it out too.  Thanks.

> I've never played with the server side settings for various reasons.
>
> Regards,
>
> Gary

--
Paul Keusemann                      pkeu...@visi.com
4266 Joppa Court                    (952) 894-7805
Savage, MN  55378
pxeboot hangs if link bounces
I'm trying to track down a very mysterious problem I'm having with
pxebooting on 8.2-RELEASE and head (as of r224149).  I've found that if
I bring the link to a PXE client down and back up while it's in pxeboot,
pxeboot hangs and never recovers.  I've tried bouncing the link to the
NFS server and I see the same behaviour, so I believe that the issue is
really due to packet loss.

I added printfs to the loader and I've found that the hang occurs in
vm86int() when it's called from pxe_call, from readudp.  Once I bring
down the link, vm86int() never returns.  That's a BIOS bug --
PXENV_UDP_READ is defined to be non-blocking by the spec -- but what
utterly mystifies me is that a 6.1-RELEASE pxeldr does not get blocked
in a PXENV_UDP_READ call.

I'm going to try a binary search to narrow down what set of changes
introduced the problem, but I'm hoping that somebody might have an idea
as to what could have changed that triggers this BIOS bug.  I've tried
building with -DOLD_NFSv2 but that didn't resolve the problem.

Ryan Stone
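
The binary search mentioned above could be sketched roughly as follows. This is only an illustration of the bisection idea, not Ryan's actual procedure: first_bad() and the is_bad callback are hypothetical names, and in practice is_bad would mean "build the loader at this revision, bounce the link, and see whether pxeboot hangs". It assumes the oldest revision is known good, the newest is known bad, and that every revision after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[int], is_bad: Callable[[int], bool]) -> int:
    """Return the earliest revision for which is_bad() is true.

    Assumes revisions is sorted oldest to newest, revisions[0] is known
    good, revisions[-1] is known bad, and the good/bad boundary is crossed
    exactly once.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return revisions[hi]


if __name__ == "__main__":
    # Hypothetical revision numbers; only r224149 appears in the message.
    candidates = [100, 200, 300, 400, 500, 224149]
    # Stand-in for a real build-and-test step (made-up boundary).
    print(first_bad(candidates, lambda rev: rev >= 400))
```

Each step halves the remaining range, so the number of builds to test grows with the logarithm of the number of candidate revisions.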
Re: pxeboot hangs if link bounces
On Wed, Jul 27, 2011 at 12:45 PM, Ryan Stone ryst...@gmail.com wrote:
> I'm going to try a binary search to narrow down what set of changes
> introduced the problem,

On that note, is there a faster way to cleanly build the boot code than
a full buildworld?  I can just make in sys/boot, but that picks up the
system's libstand and include files.