Re: Debugging dropped shell connections over a VPN
On 07/26/11 08:05, Gary Palmer wrote:
> On Tue, Jul 26, 2011 at 06:53:59AM -0500, Paul Keusemann wrote:
>> Again, sorry for the sluggish response.
>>
>> On 07/20/11 15:15, Gary Palmer wrote:
>>> On Tue, Jul 12, 2011 at 02:26:34PM -0500, Paul Keusemann wrote:
>>>> On 07/07/11 14:39, Chuck Swiger wrote:
>>>>> On Jul 7, 2011, at 4:45 AM, Paul Keusemann wrote:
>>>>>> My setup is something like this:
>>>>>>
>>>>>> - My local network is a mix of AIX, HP-UX, Linux, FreeBSD and
>>>>>>   Solaris machines running various OS versions.
>>>>>> - My gateway / firewall machine is running FreeBSD-8.1-RELEASE-p1
>>>>>>   with ipfw, nat and racoon for the firewall and VPN.
>>>>>>
>>>>>> The problem is that rlogin, ssh and telnet connections over the
>>>>>> VPN get dropped after some period of inactivity.
>>>>>
>>>>> You're probably getting NAT timeouts against the VPN connection if
>>>>> it is left idle.  racoon ought to have a config setting called
>>>>> natt_keepalive which sends periodic keepalives -- see whether
>>>>> that's disabled.
>>>>>
>>>>> Regards,
>>>>
>>>> Thanks for the suggestions, Chuck.  Sorry it's taken so long to
>>>> respond, but I had to reconfigure and rebuild my kernel to enable
>>>> IPSEC_NAT_T in order to try this out.
>>>>
>>>> One thing that I did not explicitly mention before is that I am
>>>> routing a network over the VPN.
>>>
>>> Hi Paul,
>>>
>>> Even if you are not being NAT'd on the VPN, there may be a firewall
>>> (or other active network component, like a load balancer) with an
>>> overflowing state table somewhere at the remote end.  We see this
>>> frequently where I work with customer networks: the
>>> firewall/VPN/network admin denies that it's a timeout issue, but
>>> there is likely some device in the network that has a state table,
>>> and if the connection is idle for a few minutes it gets dropped.
>>
>> Hmmm, this seems likely.  Have you had any luck in finding the
>> culprit and resolving the problem?
>
> Unfortunately no.  We know the problem exists, but as a vendor we have
> very little success in getting the customer to identify the
> problematic device inside their network.  It only seems to affect our
> connections to them when we are helping them with problems, so there
> is almost always something more important going on, and the timeout
> issue gets put on the back burner and forgotten.
>
> We've worked around it in some places by using the ssh
> 'ServerAliveInterval' directive to make ssh send packets and keep the
> session open even if we're idle, but that doesn't always work.
>
> Gary

OK, I found the ClientAliveInterval and ClientAliveCountMax settings in
the sshd_config man page; I assume these are the server-side counterpart
of what you are referring to.  I tried setting ClientAliveInterval to 15
seconds with ClientAliveCountMax set to 3, and this seems to help.  I've
only tried this a couple of times, but I have seen an ssh session stay
alive for over an hour.

The bad news is that the sessions are still getting dropped; at least
now I know when it happens.  Now I'm getting the following message:

    Received disconnect from 10.64.20.69: 2: Timeout, your session not responding.

From a quick perusal of the openssh source, it is not obvious whether
this message is coming from the client or the server side.  Initially,
because the keepalive timer is a server-side setting, I assumed the
message was coming from the server, but if the session is not
responding, how is the message getting to the client?  If it is a
client-side problem, then I have much more flexibility to fix it; all I
can do is whine about server-side problems.

Paul

-- 
Paul Keusemann                          pkeu...@visi.com
4266 Joppa Court                        (952) 894-7805
Savage, MN 55378

___ freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
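For reference, a minimal sketch of the server-side knobs discussed
above.  ClientAliveInterval and ClientAliveCountMax are real OpenSSH
options documented in sshd_config(5); the values are simply the ones
tried in this thread, not recommendations:

```
# /etc/ssh/sshd_config (server side):
# probe an unresponsive client every 15 seconds over the encrypted
# channel; disconnect after 3 consecutive unanswered probes.
ClientAliveInterval 15
ClientAliveCountMax 3
```

As a hedged data point on the "client or server?" question: in OpenSSH
of this vintage, the string "Timeout, your session not responding."
appears in the server's client-alive check, which sends one final
disconnect packet before closing the connection.  The session being
"not responding" at the application layer does not stop that last
packet from reaching the client, which is why the client can log it.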
Re: kern/156978: [lagg][patch] Take lagg rlock before checking flags
Synopsis: [lagg][patch] Take lagg rlock before checking flags

State-Changed-From-To: open->patched
State-Changed-By: maxim
State-Changed-When: Tue Jul 26 14:52:18 UTC 2011
State-Changed-Why: thompsa@ has committed the patch to HEAD in r223846.

http://www.freebsd.org/cgi/query-pr.cgi?pr=156978
Re: m_pkthdr.rcvif dangling pointer problem
On Tue, Jul 26, 2011 at 10:09:09AM +0100, Robert N. M. Watson wrote:
> On 25 Jul 2011, at 12:00, Daan Vreeken wrote:
>> Couldn't the dangling pointer problem be solved by adding a
>> 'generation' field to the mbuf structure?  The 'generation' could be
>> a system-wide number that gets incremented whenever an interface is
>> removed.  The mbuf* functions could keep a (per CPU?) reference
>> count on the number of mbufs allocated/freed during that
>> 'generation'.  After interface removal, the ifnet structure could be
>> freed when all the reference counters of generations before the
>> current generation reach zero (whenever that happens).
>
> I think a hybrid approach makes sense, combining a number of the
> ideas we've been kicking about:
>
> (1) Add per-CPU ifnet refcounts that don't imply cache-line misses on
>     each mbuf alloc/free
> (2) Add optional subsystem drain functions so that subsystems that
>     may have unbounded queueing times for mbufs deterministically
>     ensure reference release, perhaps by substituting a common deadif
>     for outstanding dying references.
>
> The former gives us actual correctness in terms of avoiding races;
> the latter gives us deterministic freeing by subsystems that
> potentially queue mbufs forever (i.e., TCP) but no longer require the
> ifnet reference.

I'd like to suggest that before doing all this work we try to see which
subsystems have a real need to dereference the rcvif pointer, which
fields they use, and how often.  Maybe just copying into the mbuf a
blob of 8-16 bytes with useful info (a cookie, fib index, some flags,
etc.) could cover the majority of cases (in terms of usage frequency,
not locations in the code) and let us deal with the other cases by
looking up the cookie in some data structure.

As an example:
- some functions just use rcvif to tell whether this is an incoming
  packet -- no actual dereference;
- others might only care that rcvif equals some other (already
  refcounted) value, so we don't have a race there.

cheers
luigi
Re: Debugging dropped shell connections over a VPN
On Tue, Jul 26, 2011 at 06:53:59AM -0500, Paul Keusemann wrote:
> Again, sorry for the sluggish response.
>
> On 07/20/11 15:15, Gary Palmer wrote:
>> On Tue, Jul 12, 2011 at 02:26:34PM -0500, Paul Keusemann wrote:
>>> On 07/07/11 14:39, Chuck Swiger wrote:
>>>> On Jul 7, 2011, at 4:45 AM, Paul Keusemann wrote:
>>>>> My setup is something like this:
>>>>>
>>>>> - My local network is a mix of AIX, HP-UX, Linux, FreeBSD and
>>>>>   Solaris machines running various OS versions.
>>>>> - My gateway / firewall machine is running FreeBSD-8.1-RELEASE-p1
>>>>>   with ipfw, nat and racoon for the firewall and VPN.
>>>>>
>>>>> The problem is that rlogin, ssh and telnet connections over the
>>>>> VPN get dropped after some period of inactivity.
>>>>
>>>> You're probably getting NAT timeouts against the VPN connection if
>>>> it is left idle.  racoon ought to have a config setting called
>>>> natt_keepalive which sends periodic keepalives -- see whether
>>>> that's disabled.
>>>
>>> Thanks for the suggestions, Chuck.  Sorry it's taken so long to
>>> respond, but I had to reconfigure and rebuild my kernel to enable
>>> IPSEC_NAT_T in order to try this out.
>>>
>>> One thing that I did not explicitly mention before is that I am
>>> routing a network over the VPN.
>>
>> Hi Paul,
>>
>> Even if you are not being NAT'd on the VPN, there may be a firewall
>> (or other active network component, like a load balancer) with an
>> overflowing state table somewhere at the remote end.  We see this
>> frequently where I work with customer networks: the
>> firewall/VPN/network admin denies that it's a timeout issue, but
>> there is likely some device in the network that has a state table,
>> and if the connection is idle for a few minutes it gets dropped.
>
> Hmmm, this seems likely.  Have you had any luck in finding the
> culprit and resolving the problem?

Unfortunately no.  We know the problem exists, but as a vendor we have
very little success in getting the customer to identify the problematic
device inside their network.  It only seems to affect our connections
to them when we are helping them with problems, so there is almost
always something more important going on, and the timeout issue gets
put on the back burner and forgotten.

We've worked around it in some places by using the ssh
'ServerAliveInterval' directive to make ssh send packets and keep the
session open even if we're idle, but that doesn't always work.

Gary
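Gary's ServerAliveInterval workaround is a client-side option,
documented in ssh_config(5).  A minimal sketch, with an interval chosen
purely for illustration:

```
# One-off, on the command line:
ssh -o ServerAliveInterval=15 -o ServerAliveCountMax=3 user@remote-host

# Or persistently, in ~/.ssh/config:
Host *
    ServerAliveInterval 15
    ServerAliveCountMax 3
```

These probes travel inside the encrypted SSH channel, so any stateful
middlebox on the path (NAT, firewall, load balancer) sees them as
ordinary traffic and keeps its state-table entry for the connection
fresh.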
Re: Debugging dropped shell connections over a VPN
Again, sorry for the sluggish response.

On 07/20/11 15:15, Gary Palmer wrote:
> On Tue, Jul 12, 2011 at 02:26:34PM -0500, Paul Keusemann wrote:
>> On 07/07/11 14:39, Chuck Swiger wrote:
>>> On Jul 7, 2011, at 4:45 AM, Paul Keusemann wrote:
>>>> My setup is something like this:
>>>>
>>>> - My local network is a mix of AIX, HP-UX, Linux, FreeBSD and
>>>>   Solaris machines running various OS versions.
>>>> - My gateway / firewall machine is running FreeBSD-8.1-RELEASE-p1
>>>>   with ipfw, nat and racoon for the firewall and VPN.
>>>>
>>>> The problem is that rlogin, ssh and telnet connections over the
>>>> VPN get dropped after some period of inactivity.
>>>
>>> You're probably getting NAT timeouts against the VPN connection if
>>> it is left idle.  racoon ought to have a config setting called
>>> natt_keepalive which sends periodic keepalives -- see whether
>>> that's disabled.
>>
>> Thanks for the suggestions, Chuck.  Sorry it's taken so long to
>> respond, but I had to reconfigure and rebuild my kernel to enable
>> IPSEC_NAT_T in order to try this out.
>>
>> One thing that I did not explicitly mention before is that I am
>> routing a network over the VPN.
>
> Hi Paul,
>
> Even if you are not being NAT'd on the VPN, there may be a firewall
> (or other active network component, like a load balancer) with an
> overflowing state table somewhere at the remote end.  We see this
> frequently where I work with customer networks: the
> firewall/VPN/network admin denies that it's a timeout issue, but
> there is likely some device in the network that has a state table,
> and if the connection is idle for a few minutes it gets dropped.
>
> Regards,
> Gary

Hmmm, this seems likely.  Have you had any luck in finding the culprit
and resolving the problem?
Re: Debugging dropped shell connections over a VPN
Once again, apologies for my sluggish response.  The VPN problem is a
background job, worked on when I can or when I'm too annoyed by it to
do anything else.

On 07/12/11 17:42, Chuck Swiger wrote:
> On Jul 12, 2011, at 12:26 PM, Paul Keusemann wrote:
>> So, any other ideas on how to debug this?
>
> Gather data with tcpdump.  If you do it on one of the VPN endpoints,
> you ought to see the VPN contents rather than just packets going by
> in the encrypted tunnel.

I assume by endpoint you are talking about the target of the remote
shell.  Unfortunately, running tcpdump on the endpoint shows only the
initial negotiation (and any interactive keyboard traffic) but nothing
to indicate the connection has been dropped or timed out.  If I can get
some time when I don't actually need to use the VPN for work, I'm going
to try to run tcpdump on the tunnel to see if there's anything going
across it that might shed some light on the cause of the dropped
connections.

>> Anybody know how to get racoon to log everything to one file?  Right
>> now, depending on the log level, I am getting messages in racoon.log
>> (specified with -l at startup), messages and debug.log.  It would
>> really be nice to have just one log to look at.
>
> This is likely governed by /etc/syslog.conf, but if you specify -l
> then racoon shouldn't use syslog logging.

My syslog.conf foo is not good, but it seems that some stuff from
racoon always ends up in the messages file, even when the -l option to
racoon is specified.

Thanks again for the tips.
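A sketch of the capture and logging steps discussed above, assuming a
FreeBSD gateway; the interface names and log path are examples, not
taken from this thread:

```
# See decrypted IPsec traffic as it crosses the gateway.  The enc(4)
# pseudo-interface needs 'device enc' in the kernel, and visibility is
# controlled by net.enc.in.ipsec_bpf_mask / net.enc.out.ipsec_bpf_mask:
tcpdump -n -i enc0

# Watch the outer tunnel instead: IKE, NAT-T (including keepalives),
# and ESP (IP protocol 50):
tcpdump -n -i em0 'udp port 500 or udp port 4500 or ip proto 50'

# /etc/syslog.conf: if some racoon messages still arrive via syslog
# even with -l, a facility line can at least divert them to one file
# (this assumes racoon logs at the daemon facility):
#   daemon.*                /var/log/racoon-syslog.log
```

Capturing on enc0 rather than the physical interface is what makes the
dropped-session traffic visible in cleartext, including any keepalive
probes that stop being answered.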
Re: m_pkthdr.rcvif dangling pointer problem
On 25 Jul 2011, at 12:00, Daan Vreeken wrote:
> Couldn't the dangling pointer problem be solved by adding a
> 'generation' field to the mbuf structure?  The 'generation' could be
> a system-wide number that gets incremented whenever an interface is
> removed.  The mbuf* functions could keep a (per CPU?) reference count
> on the number of mbufs allocated/freed during that 'generation'.
> After interface removal, the ifnet structure could be freed when all
> the reference counters of generations before the current generation
> reach zero (whenever that happens).

I think a hybrid approach makes sense, combining a number of the ideas
we've been kicking about:

(1) Add per-CPU ifnet refcounts that don't imply cache-line misses on
    each mbuf alloc/free
(2) Add optional subsystem drain functions so that subsystems that may
    have unbounded queueing times for mbufs deterministically ensure
    reference release, perhaps by substituting a common deadif for
    outstanding dying references.

The former gives us actual correctness in terms of avoiding races; the
latter gives us deterministic freeing by subsystems that potentially
queue mbufs forever (i.e., TCP) but no longer require the ifnet
reference.

Robert
Re: What does define COMMENT_ONLY mean?
On 07/23/11 04:21, Bruce Evans wrote:
> C didn't support variable-sized structs before C99, and doesn't
> really support them now.  Various hacks are used to make
> pseudo-structs work that are larger or smaller than ones that can
> actually be declared.  The above is one: the pseudo-struct is
> malloc()ed with a size larger than the declared one, and the comment
> says what would be in it if it could be declared.
>
> If this were written in C99, it might declare u_char ar_foo[] in the
> code instead of in a comment.  But C can't really support
> variable-sized structs.  It only allows one ar_foo[], which must be
> at the end of the struct.  ar_foo then has a known offset but an
> unknown size.  The other ar_bar[]'s still can't be declared, since
> they want to be further beyond the end of the struct, which places
> them at an unknown offset.
>
> A probably-less-unportable way was to declare everything in the
> struct but malloc() less.  This only works if all the magic fields
> are at known offsets.  That doesn't work in the above, since the
> fields want to have variable lengths and thus end up at variable
> offsets.  Such fields can be allocated from a single larger field
> (usually an array), but you lose the possibility of declaring them
> all.
>
> Bruce

I got the idea with "dynamic size", thanks. :)  But COMMENT_ONLY...
ah, never mind.  Thanks for the explanation, Bruce.