On Fri, Nov 24, 2006 at 05:22:05PM +0100, Håkan Olsson wrote:
> 5. the selected SPI (or "larval" SA state) on the local system is  
> updated with the keying material, timeouts etc. - i.e. the "real" SA  
> is finalized
> 
> This continues until all negotiations are complete -- however there  
> is a limit on how long this "larval" SA lives in the kernel... as you  
> may guess it's 60 seconds. (The idea being that if a negotiation has  
> not completed in 60 seconds, something has probably failed.)
> 
> Since the hosts seem to be a bit slow in running IKE negotiations,  
> you hit the 60 second limit before all negotiations are complete: all  
> remaining "larval" SAs are dropped, and when isakmpd tries to  
> "update" them into real SAs, this of course fails. ("No such process"  
> approximately means "no SA found" here.)

Thank you for that very clear description.

Is this 60 second timeout a tunable? Or can you point me to where it's
defined in the kernel? I'd like to try increasing it.
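
Just to check that I've understood the mechanism, here's a toy model of
what I think is happening. Obviously this is not the real kernel code --
the names and layout are all made up -- it's just the shape of what you
described:

    /* Userland sketch of the larval-SA lifetime, as I understand it. */
    #include <errno.h>
    #include <stdio.h>
    #include <time.h>

    #define LARVAL_LIFETIME 60              /* seconds */

    struct larval_sa {
            unsigned int    spi;
            time_t          created;
            int             valid;
    };

    /* Step 5 above: finalize the SA with keys, timeouts etc. */
    static int
    larval_update(struct larval_sa *sa, time_t now)
    {
            if (!sa->valid || now - sa->created > LARVAL_LIFETIME) {
                    sa->valid = 0;
                    return ESRCH;   /* "No such process" */
            }
            /* ...install keying material; the SA becomes "real"... */
            return 0;
    }

    int
    main(void)
    {
            struct larval_sa sa = { 0x1000, time(NULL), 1 };

            /* a negotiation that takes 61 seconds: */
            if (larval_update(&sa, sa.created + 61) == ESRCH)
                    printf("update failed: No such process\n");
            return 0;
    }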

However, at this stage I don't really understand why setting -D 5=99, which
generates copious logs, makes it work. In fact I can get to 3,000 tunnels
(6,000 flows) within a couple of minutes with this flag set. Perhaps this
extra logging delays the start of some of the negotiations, somehow
spreading the workload.

(Maybe a workload-spreading option, so that no more than N exchanges are
outstanding at once, would be a useful control anyway; a rough sketch of
what I mean follows.)
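
Something along these lines, say. Entirely hypothetical -- the names and
the cap are invented, and I haven't looked at how isakmpd actually
schedules its exchanges internally:

    #include <stdio.h>

    #define MAX_OUTSTANDING 32      /* the "N" above; arbitrary here */

    static int outstanding;

    /* Before initiating an exchange: 0 = go ahead, -1 = defer. */
    static int
    exchange_gate_enter(void)
    {
            if (outstanding >= MAX_OUTSTANDING)
                    return -1;
            outstanding++;
            return 0;
    }

    /* When an exchange completes or fails. */
    static void
    exchange_gate_leave(void)
    {
            if (outstanding > 0)
                    outstanding--;
    }

    int
    main(void)
    {
            int i, started = 0;

            for (i = 0; i < 100; i++)
                    if (exchange_gate_enter() == 0)
                            started++;
            printf("started %d of 100, deferred %d\n",
                started, 100 - started);

            exchange_gate_leave();  /* one completes... */
            if (exchange_gate_enter() == 0)
                    printf("...so one deferred exchange can start\n");
            return 0;
    }

A deferred exchange would just be retried from the event loop once
exchange_gate_leave() has made room.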

> PS
> When I tried between two ~700MHz P-III machines a while back, setting  
> up 4096 (or was it 8k) SAs was no problem. Another developer had a  
> scenario setting up 40960 SAs over loopback on his laptop -- mainly a  
> test of kernel memory usage, but he did not hit the 60s larval-SA  
> time limit there either.

I can think of several possibilities as to why some negotiations are taking
more than 60 seconds. For instance:

(1) The Cisco 7301 may be slow to respond. It does have a VAM2+ crypto
accelerator installed, but I don't know whether it's used for ISAKMP
exchanges or only for symmetric encryption/decryption. (However, 'show
proc cpu history' suggests CPU load is no more than about 25%.)

(2) There may be packet loss and retransmissions, maybe due to some
network buffer overflowing, either on the OpenBSD box or the Cisco.

The OpenBSD box is using a nasty rl0 card, because that's the only spare
interface I had available to go into the test LAN. Having said that,
watching with 'top' I don't see the interrupt load go above 10%.
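
Some counters that might show this (standard netstat, nothing exotic --
and since IKE is UDP port 500, socket-buffer drops should show up in the
UDP statistics):

    # per-interface errors (watch Ierrs/Oerrs on rl0)
    netstat -in
    # mbuf pool usage / requests denied
    netstat -m
    # UDP drops, e.g. "dropped due to full socket buffers"
    netstat -s -p udp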

I'm not sure how to probe deeper to get a handle on what's actually
happening though. Perhaps isakmpd -L logging might shed some light, although
I don't fancy decoding QM exchanges by hand :-(
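
(Actually, if I remember the man page right, -L writes the unencrypted
negotiation packets to /var/run/isakmpd.pcap, so tcpdump could at least
do the first pass of decoding:

    tcpdump -n -v -r /var/run/isakmpd.pcap

which should beat picking apart the payloads by hand.)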

Regards,

Brian.
