I sure hope this will help.

***Setup***
Two servers on OpenBSD 5.8. Establish a VPN with iked (IKEv2). One side
active, one side passive. Use RSA keys, or a passphrase if you like.

Active side:
# cat /etc/iked.conf
ikev2 Ouellet active from re0 to 66.63.5.250 from 66.63.50.16/28 to
0.0.0.0/0 peer 66.63.5.250

Passive side:
# cat /etc/iked.conf
ikev2 Ouellet passive from em0 to 108.56.142.37 from 0.0.0.0/0 to
66.63.50.16/28 peer 108.56.142.37

***Issues***
1. Under heavy traffic, you will get many SAD entries that only get
cleaned up when the time lifetime expires, even if the byte lifetime has
been exceeded multiple times. In other words, cleanup is only done on the
timer, not when the data limit is reached.

2. On a heavy download with the passive side as the destination, on a
few occasions when the data limit is reached, the passive side will try
to switch the tunnel to NAT-T, even though there is no NAT. The only
solution then is to stop/start the active side to establish the tunnel
again.

***How to trigger and reproduce at will***
To trigger the issue easily and often, just reduce the defaults by
adding a much shorter lifetime on both sides:

lifetime 1m bytes 100k

like this:

ikev2 Ouellet active from re0 to 66.63.5.250 from 66.63.50.16/28 to
0.0.0.0/0 peer 66.63.5.250 lifetime 1m bytes 100k

Then just watch the logs live on the passive side with

tail -f /var/log/daemon | grep iked

and you will very quickly see this:

------------------------------------------------------------------------
Dec 11 20:01:32 tunnel iked[1801]: pfkey_reply: message: No such process
Dec 11 20:01:32 tunnel iked[1801]: ikev2_pld_delete: deleted 1 spis
Dec 11 20:01:32 tunnel iked[1801]: ikev2_msg_send: INFORMATIONAL
response from 66.63.5.250:500 to 108.56.142.37:500 msgid 3, 80 bytes, NAT-T
------------------------------------------------------------------------

Then you will lose access to the tunnel completely, and it will not
recover until you manually reset the active side with rc.d/iked stop and
then start.
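
To catch the moment the passive side flips to NAT-T without staring at
the log, something like the following could help. It is only a sketch:
natt_lines is a made-up helper name, and it assumes the log lines look
like the excerpt above, that is, they contain "iked[" and end up
mentioning the literal string "NAT-T".

```shell
#!/bin/sh
# Print only the iked log lines that mention NAT-T.
# Assumption: the daemon log format matches the excerpt above.
# Reads log lines from stdin, so it can sit behind tail -f.
natt_lines() {
    grep 'iked\[.*NAT-T'
}
```

It would be used as: tail -f /var/log/daemon | natt_lines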

The data limit is small, so you can trigger it with just

ping -s 1500 66.63.5.250

from the active side of the network, or with whatever way you want to
generate traffic. Before you know it, you could see this:

# ipsecctl -sa | wc -l
     493

and the number of SAD entries will ONLY be reduced when the time limit
is reached, even though they are not valid anymore and were replaced
because the data limit was hit.
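
To see the growth over time rather than a single snapshot, a small
loop can print the SA count every few seconds. This is a rough sketch:
count_sas and watch_sas are made-up names, and it assumes each SA shows
up in the ipsecctl -sa output as one line starting with "esp ". Unlike
wc -l, it does not count the flow lines or the section headers, so the
number will be lower than the one above.

```shell
#!/bin/sh
# Count SA entries in `ipsecctl -sa` output read from stdin.
# Assumption: each SA is printed on one line beginning with "esp ".
count_sas() {
    grep -c '^esp '
}

# Hypothetical watcher: print a timestamped SA count every N seconds
# (default 5). Stop it with Ctrl-C.
watch_sas() {
    interval=${1:-5}
    while :; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$(ipsecctl -sa | count_sas)"
        sleep "$interval"
    done
}
```

Running watch_sas 10 while traffic flows should show the count climbing
until the time lifetime expires.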

Maybe the cleanup should happen on both time and data limits. Just a
thought.

***Work Around***
To work around the problem for now, simply change the lifetime on the
PASSIVE side. I just picked 2x the active side's values for both time
and data so that it NEVER triggers the NAT-T issue. Not an ideal
solution, but for now it fixes the loss of the VPN at random times.

You can test this the same way as above, with only the active side
keeping the same

lifetime 1m bytes 100k

and then the passive side with

lifetime 2m bytes 200k

Then just push traffic through the tunnel.

You will still see the huge increase in SAD entries on the active side
as the data limit gets reached and new child SAs get created, since the
old ones are not cleaned up then, but only when the time limit is
reached.

But this way, at a minimum, you will NOT lose your VPN.
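
The 2x rule is simple enough to script if you keep the two configs in
sync by hand. The helper below is only an illustration with made-up
names (double_value, passive_lifetime); it assumes each value is an
integer followed by at most one unit letter, as in the examples here.

```shell
#!/bin/sh
# Double a value such as "1m" or "100k": multiply the number and keep
# the unit letter. Assumption: integer plus at most one unit letter.
double_value() {
    num=${1%[A-Za-z]}      # strip a trailing unit letter, if any
    unit=${1#"$num"}       # whatever was stripped is the unit
    echo "$((num * 2))$unit"
}

# Print the passive-side lifetime clause for the given active-side
# time and byte limits.
passive_lifetime() {
    echo "lifetime $(double_value "$1") bytes $(double_value "$2")"
}
```

For the values used above, passive_lifetime 1m 100k prints
"lifetime 2m bytes 200k".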

The same issue shows up even if both sides are active, so it is possibly
more of a timing issue, I guess. But really, if a VPN works without NAT,
I think it should never try to establish NAT-T anyway, especially if it
has passed traffic constantly all the way up to 500Mb, that being the
default.

Also, when the VPN carries huge traffic, maybe it should clean up the
old child SAs in the SAD when a data limit is reached and a new child is
created, instead of doing it only when the time limit is reached. That
way, if you decide to set no limit on time, your box doesn't explode for
lack of resources because old child SAs are never released.

Hopefully this will be useful to someone, as it took me a week to
isolate why in hell I was losing the VPN at random times on an otherwise
perfectly working setup.

Best,

Daniel
