Re: Debugging dropped shell connections over a VPN

2011-07-27 Thread Gary Palmer
On Tue, Jul 26, 2011 at 01:35:16PM -0500, Paul Keusemann wrote:
 On 07/26/11 08:05, Gary Palmer wrote:
 On Tue, Jul 26, 2011 at 06:53:59AM -0500, Paul Keusemann wrote:
 Again, sorry for the sluggish response.
 
 On 07/20/11 15:15, Gary Palmer wrote:
 On Tue, Jul 12, 2011 at 02:26:34PM -0500, Paul Keusemann wrote:
 On 07/07/11 14:39, Chuck Swiger wrote:
 On Jul 7, 2011, at 4:45 AM, Paul Keusemann wrote:
 My setup is something like this:
 - My local network is a mix of AIX, HP-UX, Linux, FreeBSD and Solaris
 machines running various OS versions.
 - My gateway / firewall machine is running FreeBSD-8.1-RELEASE-p1 with
 ipfw, nat and racoon for the firewall and VPN.
 
 The problem is that rlogin, ssh and telnet connections over the VPN get
 dropped after some period of inactivity.
 You're probably getting NAT timeouts against the VPN connection if it is
 left idle.  racoon ought to have a config setting called natt_keepalive
 which sends periodic keepalives-- see whether that's disabled.
 
 Regards,
 Thanks for the suggestions Chuck, sorry it's taken so long to respond
 but I had to reconfigure and rebuild my kernel to enable IPSEC_NAT_T in
 order to try this out.
 
 One thing that I did not explicitly mention before is that I am routing
 a network over the VPN.
 Hi Paul,
 
 Even if you are not being NAT'd on the VPN there may be a firewall (or
 other active network component like a load balancer) with an
 overflowing state table somewhere at the remote end.  We see this
 frequently where I work with customer networks, even when the
 firewall/VPN/network admin denies that it's a timeout issue.  There is
 likely some device in the network that keeps a state table and drops
 the connection if it is idle for a few minutes.
 Hmmm,  this seems likely.  Have you had any luck in finding the culprit
 and resolving the problem?
 Unfortunately no.  We know the problem exists but as a vendor we have
 very little success in getting the customer to identify the problematic
 device inside their network as it only seems to affect our connections
 to them when we are helping them with problems, so there is almost
 always something more important going on and the timeout issue gets put
 on the back burner and forgotten.  We've worked around it in some
 places by using the ssh 'ServerAliveInterval' directive to make ssh
 send packets and keep the session open even if we're idle, but that
 doesn't always work.
 
 OK, I found the ClientAliveInterval and ClientAliveCountMax settings in
 the ssh_config man page.  I assume these are what you are referring to.
 I tried setting ClientAliveInterval to 15 seconds with
 ClientAliveCountMax set to 3 and this seems to help.  I've only tried
 this a couple of times but I have seen an ssh session stay alive for
 over an hour.  The bad news is that the sessions are still getting
 dropped, but at least now I know when it happens.  Now I'm getting the
 following message:
 
 Received disconnect from 10.64.20.69: 2: Timeout, your session not responding.
 
 From a quick perusal of the openssh source, it is not obvious whether 
 this message is coming from the client or the server side.   Initially, 
 because the keep alive timer is a server side setting, I assumed the 
 message was coming from the server side but if the session is not 
 responding how is the message getting to the client?  If it is a client 
 side problem, then I have much more flexibility to fix.  All I can do is 
 whine about server side problems.


Hi Paul,

ServerAliveInterval is actually a client setting.  For example, putting
this in your ~/.ssh/config file

Host *
    ServerAliveInterval 15

will set the client to ping the server every 15 seconds and try to
keep the connection alive.  You can replace '*' with a host name or
pattern if you want to be more targeted in your configuration.
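
For reference, a minimal Python sketch of how the interval interacts with
ServerAliveCountMax.  The helper names and the example.com pattern are made
up for illustration; the only assumption about ssh itself is the behaviour
described in the OpenSSH ssh_config man page, where the client disconnects
after ServerAliveCountMax unanswered probes (default 3).

```python
def keepalive_stanza(host_pattern="*", interval=15, count_max=3):
    """Render an ssh_config stanza that enables client-side keepalives.

    keepalive_stanza is a hypothetical helper, not part of OpenSSH.
    """
    return (
        f"Host {host_pattern}\n"
        f"    ServerAliveInterval {interval}\n"
        f"    ServerAliveCountMax {count_max}\n"
    )


def seconds_until_giveup(interval, count_max=3):
    """Approximate time before ssh drops a server that stopped answering."""
    return interval * count_max


if __name__ == "__main__":
    # Placeholder host pattern; substitute the hosts behind the VPN.
    print(keepalive_stanza("*.example.com"))
    print(seconds_until_giveup(15))
```

So with an interval of 15 and the default count, a dead link should be
noticed after roughly 45 seconds rather than hanging indefinitely.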

I've never played with the server side settings for various reasons.

Regards,

Gary
___
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org


Re: Debugging dropped shell connections over a VPN

2011-07-27 Thread Paul Keusemann

On 07/27/11 06:50, Gary Palmer wrote:

Hi Paul,

ServerAliveInterval is actually a client setting.  For example, putting
this in your ~/.ssh/config file

Host *
    ServerAliveInterval 15

will set the client to ping the server every 15 seconds and try to
keep the connection alive.  You can replace '*' with a host name or
pattern if you want to be more targeted in your configuration.


Ah, I see.  I was looking at the Solaris ssh_config man page.  The 
OpenSSH ssh_config man page is third in the sequence.  The ServerAlive* 
options are not documented in the Solaris ssh_config man page.  I'll try 
it out too.  Thanks.



I've never played with the server side settings for various reasons.

Regards,

Gary




--
Paul Keusemann    pkeu...@visi.com
4266 Joppa Court  (952) 894-7805
Savage, MN  55378



Re: Debugging dropped shell connections over a VPN

2011-07-26 Thread Paul Keusemann
Once again, apologies for my sluggish response.  The VPN problem is a 
background job worked on when I can or when I'm too annoyed by it to do 
anything else.


On 07/12/11 17:42, Chuck Swiger wrote:

On Jul 12, 2011, at 12:26 PM, Paul Keusemann wrote:

So, any other ideas on how to debug this?

Gather data with tcpdump.  If you do it on one of the VPN endpoints, you ought 
to see the VPN contents rather than just packets going by in the encrypted 
tunnel.



I assume by endpoint, you are talking about the target of the remote 
shell.  Unfortunately, running tcpdump on the endpoint shows only the 
initial negotiation (and any interactive keyboard traffic) but nothing 
to indicate the connection has been dropped or timed out.


If I can get some time when I don't actually need to use the VPN for 
work, I'm going to try to run tcpdump on the tunnel to see if there's 
anything going across it that might shed some light on the cause of the 
dropped connections.



Anybody know how to get racoon to log everything to one file?  Right now, 
depending on the log level, I am getting messages in racoon.log (specified with 
-l at startup), messages and debug.log.  It would really be nice to have just 
one log to look at.

This is likely governed by /etc/syslog.conf, but if you specify -l then racoon 
shouldn't use syslog logging.


My syslog.conf foo is not good but it seems that some stuff  from racoon 
always ends up in the messages file, even when the -l option to racoon 
is specified.


Thanks again for the tips.

--
Paul Keusemann    pkeu...@visi.com
4266 Joppa Court  (952) 894-7805
Savage, MN  55378



Re: Debugging dropped shell connections over a VPN

2011-07-26 Thread Paul Keusemann

Again, sorry for the sluggish response.

On 07/20/11 15:15, Gary Palmer wrote:

Hi Paul,

Even if you are not being NAT'd on the VPN there may be a firewall (or
other active network component like a load balancer) with an
overflowing state table somewhere at the remote end.  We see this
frequently where I work with customer networks, even when the
firewall/VPN/network admin denies that it's a timeout issue.  There is
likely some device in the network that keeps a state table and drops
the connection if it is idle for a few minutes.


Hmmm,  this seems likely.  Have you had any luck in finding the culprit 
and resolving the problem?




Regards,

Gary




--
Paul Keusemann    pkeu...@visi.com
4266 Joppa Court  (952) 894-7805
Savage, MN  55378



Re: Debugging dropped shell connections over a VPN

2011-07-26 Thread Gary Palmer
On Tue, Jul 26, 2011 at 06:53:59AM -0500, Paul Keusemann wrote:
 Again, sorry for the sluggish response.
 
 On 07/20/11 15:15, Gary Palmer wrote:
 Hi Paul,
 
 Even if you are not being NAT'd on the VPN there may be a firewall (or
 other active network component like a load balancer) with an
 overflowing state table somewhere at the remote end.  We see this
 frequently where I work with customer networks, even when the
 firewall/VPN/network admin denies that it's a timeout issue.  There is
 likely some device in the network that keeps a state table and drops
 the connection if it is idle for a few minutes.
 
 Hmmm,  this seems likely.  Have you had any luck in finding the culprit 
 and resolving the problem?

Unfortunately no.  We know the problem exists but as a vendor we have
very little success in getting the customer to identify the problematic
device inside their network as it only seems to affect our connections
to them when we are helping them with problems, so there is almost
always something more important going on and the timeout issue gets put
on the back burner and forgotten.  We've worked around it in some
places by using the ssh 'ServerAliveInterval' directive to make ssh
send packets and keep the session open even if we're idle, but that
doesn't always work.

Gary


Re: Debugging dropped shell connections over a VPN

2011-07-26 Thread Paul Keusemann

On 07/26/11 08:05, Gary Palmer wrote:

Unfortunately no.  We know the problem exists but as a vendor we have
very little success in getting the customer to identify the problematic
device inside their network as it only seems to affect our connections
to them when we are helping them with problems, so there is almost
always something more important going on and the timeout issue gets put
on the back burner and forgotten.  We've worked around it in some
places by using the ssh 'ServerAliveInterval' directive to make ssh
send packets and keep the session open even if we're idle, but that
doesn't always work.


OK, I found the ClientAliveInterval and ClientAliveCountMax settings in
the ssh_config man page.  I assume these are what you are referring to.
I tried setting ClientAliveInterval to 15 seconds with
ClientAliveCountMax set to 3 and this seems to help.  I've only tried
this a couple of times but I have seen an ssh session stay alive for
over an hour.  The bad news is that the sessions are still getting
dropped, but at least now I know when it happens.  Now I'm getting the
following message:


Received disconnect from 10.64.20.69: 2: Timeout, your session not responding.


From a quick perusal of the openssh source, it is not obvious whether 
this message is coming from the client or the server side.   Initially, 
because the keep alive timer is a server side setting, I assumed the 
message was coming from the server side but if the session is not 
responding how is the message getting to the client?  If it is a client 
side problem, then I have much more flexibility to fix.  All I can do is 
whine about server side problems.


Paul



Gary




--
Paul Keusemann    pkeu...@visi.com
4266 Joppa Court  (952) 894-7805
Savage, MN  55378



Re: Debugging dropped shell connections over a VPN

2011-07-20 Thread Gary Palmer
On Tue, Jul 12, 2011 at 02:26:34PM -0500, Paul Keusemann wrote:
 On 07/07/11 14:39, Chuck Swiger wrote:
 On Jul 7, 2011, at 4:45 AM, Paul Keusemann wrote:
 My setup is something like this:
 - My local network is a mix of AIX, HP-UX, Linux, FreeBSD and Solaris 
 machines running various OS versions.
 - My gateway / firewall  machine is running FreeBSD-8.1-RELEASE-p1 with 
 ipfw, nat and racoon for the firewall and VPN.
 
 The problem is that rlogin, ssh and telnet connections over the VPN get 
 dropped after some period of inactivity.
 You're probably getting NAT timeouts against the VPN connection if it is 
 left idle.  racoon ought to have a config setting called natt_keepalive 
 which sends periodic keepalives-- see whether that's disabled.
 
 Regards,
 
 Thanks for the suggestions Chuck, sorry it's taken so long to respond 
 but I had to reconfigure and rebuild my kernel to enable IPSEC_NAT_T in 
 order to try this out.
 
 One thing that I did not explicitly mention before is that I am routing 
 a network over the VPN.

Hi Paul,

Even if you are not being NAT'd on the VPN there may be a firewall (or
other active network component like a load balancer) with an
overflowing state table somewhere at the remote end.  We see this
frequently where I work with customer networks, even when the
firewall/VPN/network admin denies that it's a timeout issue.  There is
likely some device in the network that keeps a state table and drops
the connection if it is idle for a few minutes.
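
One way to narrow down how long "a few minutes" really is would be to
probe with increasing idle periods.  The following is only a rough Python
sketch under stated assumptions: it needs an echo-style service (anything
that sends back what it receives) reachable on the far side of the VPN,
and the function names and candidate intervals are invented for the
example.

```python
import socket
import time


def survives_idle(host, port, idle_seconds, timeout=5.0):
    """Return True if a TCP connection still works after sitting idle.

    Assumes an echo-style service is listening on host:port.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        time.sleep(idle_seconds)          # leave the connection idle
        try:
            sock.sendall(b"ping\n")
            return sock.recv(16) != b""   # empty read means the peer closed
        except OSError:                   # covers timeouts and resets
            return False


def find_idle_limit(host, port, candidates=(60, 120, 300, 600, 1800)):
    """Return the first idle period (in seconds) that kills the connection.

    Returns None if every candidate period was survived.
    """
    for idle in candidates:
        if not survives_idle(host, port, idle):
            return idle
    return None
```

If a device along the path silently forgets the connection, the send
succeeds locally but no reply arrives, so the probe reports a failure once
the receive times out.  Running it against a host on the local network and
then against one behind the VPN should show whether the limit is specific
to the tunnel.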

Regards,

Gary


Re: Debugging dropped shell connections over a VPN

2011-07-12 Thread Paul Keusemann

On 07/07/11 14:39, Chuck Swiger wrote:

On Jul 7, 2011, at 4:45 AM, Paul Keusemann wrote:

My setup is something like this:
- My local network is a mix of AIX, HP-UX, Linux, FreeBSD and Solaris machines 
running various OS versions.
- My gateway / firewall  machine is running FreeBSD-8.1-RELEASE-p1 with ipfw, 
nat and racoon for the firewall and VPN.

The problem is that rlogin, ssh and telnet connections over the VPN get dropped 
after some period of inactivity.

You're probably getting NAT timeouts against the VPN connection if it is left 
idle.  racoon ought to have a config setting called natt_keepalive which sends 
periodic keepalives-- see whether that's disabled.

Regards,


Thanks for the suggestions Chuck, sorry it's taken so long to respond 
but I had to reconfigure and rebuild my kernel to enable IPSEC_NAT_T in 
order to try this out.


One thing that I did not explicitly mention before is that I am routing 
a network over the VPN.


I did not previously have NAT-Traversal enabled, nor was it configured in
my kernel.  After reconfiguring, compiling and installing the new
kernel, I added the following to the phase 1 configuration for my VPN:


timer
{
        # Default is 20 seconds.
        natt_keepalive 10 sec;
}

# Enable NAT traversal.
#nat_traversal on;
nat_traversal force;

# Enable IKE fragmentation.
ike_frag on;

# Enable ESP fragmentation at 552 bytes.
esp_frag 552;

The only immediately noticeable change is that I am no longer getting 
the following warnings at racoon startup:


WARNING: setsockopt(UDP_ENCAP_ESPINUDP_NON_IKE): UDP_ENCAP Invalid argument


I assume this is the result of adding IPSEC_NAT_T to the kernel config.  
My shell connections are still being dropped, so I'm pretty much back to 
square one.


So, any other ideas on how to debug this?

Anybody know how to get racoon to log everything to one file?  Right 
now, depending on the log level, I am getting messages in racoon.log 
(specified with -l at startup), messages and debug.log.  It would really 
be nice to have just one log to look at.


--
Paul Keusemann    pkeu...@visi.com
4266 Joppa Court  (952) 894-7805
Savage, MN  55378



Re: Debugging dropped shell connections over a VPN

2011-07-12 Thread Chuck Swiger
On Jul 12, 2011, at 12:26 PM, Paul Keusemann wrote:
 So, any other ideas on how to debug this?

Gather data with tcpdump.  If you do it on one of the VPN endpoints, you ought 
to see the VPN contents rather than just packets going by in the encrypted 
tunnel.
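
As a starting point, here is a small Python sketch that assembles a
tcpdump command line for one remote host and port.  The interface name
em0 and the function name are placeholders, not anything specific to this
setup; -n, -i and -w are standard tcpdump options.

```python
import shlex


def tcpdump_command(interface, peer, port=22, outfile=None):
    """Build a tcpdump argv that watches one host's traffic on one port.

    interface, peer and outfile are placeholders to fill in.
    """
    argv = ["tcpdump", "-n", "-i", interface]
    if outfile:
        argv += ["-w", outfile]           # save raw packets for later analysis
    argv += ["host", peer, "and", "tcp", "port", str(port)]
    return argv


if __name__ == "__main__":
    # Print the command rather than running it; tcpdump normally needs root.
    print(shlex.join(tcpdump_command("em0", "10.64.20.69")))
```

Capturing on both sides at the same time and comparing the last packets
seen before a drop should show where the traffic stops.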

 Anybody know how to get racoon to log everything to one file?  Right now, 
 depending on the log level, I am getting messages in racoon.log (specified 
 with -l at startup), messages and debug.log.  It would really be nice to have 
 just one log to look at.

This is likely governed by /etc/syslog.conf, but if you specify -l then racoon 
shouldn't use syslog logging.

Regards,
-- 
-Chuck



Re: Debugging dropped shell connections over a VPN

2011-07-07 Thread Chuck Swiger
On Jul 7, 2011, at 4:45 AM, Paul Keusemann wrote:
 My setup is something like this:
 - My local network is a mix of AIX, HP-UX, Linux, FreeBSD and Solaris 
 machines running various OS versions.
 - My gateway / firewall  machine is running FreeBSD-8.1-RELEASE-p1 with ipfw, 
 nat and racoon for the firewall and VPN.
 
 The problem is that rlogin, ssh and telnet connections over the VPN get 
 dropped after some period of inactivity.

You're probably getting NAT timeouts against the VPN connection if it is left 
idle.  racoon ought to have a config setting called natt_keepalive which sends 
periodic keepalives-- see whether that's disabled.

Regards,
-- 
-Chuck
