I sure hope this will help.

***Setup***

Two servers on 5.8. Establish a VPN with IKEv2 (iked). One side active, one side passive. Use RSA keys, or a passphrase if you like.
Active side:

    # cat /etc/iked.conf
    ikev2 Ouellet active from re0 to 66.63.5.250 from 66.63.50.16/28 to 0.0.0.0/0 peer 66.63.5.250

Passive side:

    # cat /etc/iked.conf
    ikev2 Ouellet passive from em0 to 108.56.142.37 from 0.0.0.0/0 to 66.63.50.16/28 peer 108.56.142.37

***Issues***

1. Under heavy traffic you get many SAD entries that are only cleaned up when the time lifetime expires, even if the byte lifetime has been exceeded multiple times. In other words, cleanup is only done on the timer, not when the data limit is reached.

2. On heavy downloads to the destination (passive side), when the data limit is reached, on a few occasions the passive side will try to switch the tunnel to NAT-T even though there is no NAT. The only solution then is to stop and start the active side to establish the tunnel again.

***How to trigger and reproduce at will***

To trigger the issue easily and often, just reduce the defaults by adding a much shorter lifetime on both sides:

    lifetime 1m bytes 100k

like this:

    ikev2 Ouellet active from re0 to 66.63.5.250 from 66.63.50.16/28 to 0.0.0.0/0 peer 66.63.5.250 lifetime 1m bytes 100k

Then just watch the logs live on the passive side with

    tail -f /var/log/daemon | grep iked

and you will very quickly see this:

------------------------------------------------------------------------
Dec 11 20:01:32 tunnel iked[1801]: pfkey_reply: message: No such process
Dec 11 20:01:32 tunnel iked[1801]: ikev2_pld_delete: deleted 1 spis
Dec 11 20:01:32 tunnel iked[1801]: ikev2_msg_send: INFORMATIONAL response from 66.63.5.250:500 to 108.56.142.37:500 msgid 3, 80 bytes, NAT-T
------------------------------------------------------------------------

At that point you lose access to the tunnel completely, and it will not recover until you manually reset the active side with rc.d/iked stop and start.

The data limit is small, so you can trigger it with just:

    ping -s 1500 66.63.5.250

from the active side of the network.
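While the traffic flows, it can help to watch both symptoms side by side. What follows is only a minimal sketch, assuming the log format shown above and the default /var/log/daemon location; count_natt and watch_tunnel are made-up helper names for illustration, not part of iked or ipsecctl.

```shell
#!/bin/sh
# Hypothetical helpers for watching the two symptoms described above.

# Read log text on stdin and print how many iked lines mention NAT-T.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_natt() {
    grep 'iked' | grep -c 'NAT-T' || true
}

# Every INTERVAL seconds (default 10), print the number of lines from
# "ipsecctl -sa" and the number of NAT-T lines in the log (default
# /var/log/daemon). Assumption: run on a box where ipsecctl is usable.
watch_tunnel() {
    interval=${1:-10}
    log=${2:-/var/log/daemon}
    while :; do
        sas=$(ipsecctl -sa | wc -l | tr -d ' ')
        natt=$(count_natt < "$log")
        echo "$(date '+%H:%M:%S') sa_lines=$sas natt_lines=$natt"
        sleep "$interval"
    done
}
```

After sourcing the file, something like watch_tunnel 5 on the passive side (most likely as root) should show the SA line count climbing, and the NAT-T count going above zero around the time the tunnel drops.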
However you generate the traffic, before you know it you could see this:

    # ipsecctl -sa | wc -l
    493

The number of SAD entries will ONLY be reduced when the time limit is reached, even if they are no longer valid and were replaced because of the data limit. Maybe the cleanup should happen on both the time and the data limit. Just a thought.

***Work Around***

To work around the problem for now, simply change the lifetime on the PASSIVE side. I just picked 2x the active side for both time and data, so that it NEVER triggers the NAT-T issue. Not an ideal solution, but for now it fixes the loss of the VPN at random times.

You can test this the same way as above: leave the active side with the same

    lifetime 1m bytes 100k

and set the passive side to

    lifetime 2m bytes 200k

and just flow traffic. You will still see the huge increase in SAD entries on the active side as the data limit is reached and new children get created, because the old ones are not cleaned up then, only when the time limit is reached. But this way, at a minimum, you will NOT lose your VPN.

The same issue shows up even if both sides are active. It's more like a timing issue, I guess. But really, if a VPN works without NAT, I think it should never try to establish NAT-T anyway, especially if it has passed traffic constantly all the way to 500Mb, which is the default.

And when the VPN carries huge traffic, maybe it should clean up the old child in the SAD when a data limit is reached and a new child is created, instead of doing it only when the time limit is reached. That way, if you decide to set no time limit, your box doesn't explode from lack of resources or whatnot because old children are never released.

Hopefully this will be useful to someone, as it took me a week to isolate why in hell I lose the VPN at random times on an otherwise perfectly working VPN.

Best,
Daniel