On 09.03.2019 at 12:43, Otto Moerbeek wrote:
On Sat, Mar 09, 2019 at 12:10:34PM +0100, Michał Koc wrote:

On 09.03.2019 at 08:15, Otto Moerbeek wrote:
On Fri, Mar 08, 2019 at 12:03:25PM +0100, Michał Koc wrote:

Hi all,

We have a triple redundant vpn gateway setup with sasyncd running and tons
of tunnels, about 1000 flows.

Looking at the graph of memory usage, you can clearly see that something is
sucking up the memory.

The graph can be viewed here: https://pasteboard.co/I4sjzQ8.jpg

Looking at the ps, sasyncd shows huge memory consumption:

USER       PID  %CPU  %MEM     VSZ     RSS  TT  STAT  STARTED     TIME  COMMAND
_isakmpd 33560   0.0  17.0  699264  708508  ??  S     26Feb19  6:58.81  /usr/sbin/sasyncd

It only happens on the master node. The slaves do not show this behavior.

There is nothing about sasyncd in the logs.

After sasyncd restart memory consumption is minimal, but tends to grow.

Is this normal, or am I missing something?

Best regards
M.K.

This is not normal. You could try to run with -vv to see if some error
path is taken that triggers a leak.

        -Otto

Should I look for something specific?

The log grows pretty fast and it looks like it could contain some security
data which I wouldn't like to post online.

The statistics of the log (about 2 hours) look like this:
carp_init:       1
config:       7
monitor_get_pfkey_snap:       4
monitor_loop:       1
net:       1
net_connect:       3
net_ctl:       4
net_disconnect_peer:       3
net_handle_messages:       2
net_queue:   91780
net_read:      10
net_send_messages:   39192
pfkey_send_flush:       4
pfkey_snapshot:    6832
timer_add:      19
timer_run:      18
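
A per-tag tally like the one above can be reproduced with a few lines of Python. This is only a sketch: the default pattern assumes each log line carries a "sasyncd[PID]: <tag>:" prefix, which is a guess about the log layout and not something confirmed in this thread, and the names count_tags and DEFAULT_TAG are made up for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: lines look like "... sasyncd[PID]: <tag>: <message>".
# Adjust the pattern to whatever the real -vv output looks like.
DEFAULT_TAG = re.compile(r"sasyncd\[\d+\]:\s+(\w+):")

def count_tags(lines, pattern=DEFAULT_TAG):
    """Count how many log lines each tag (first capture group) produced."""
    counts = Counter()
    for line in lines:
        match = pattern.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Lines that do not match the pattern are skipped, so a wrong pattern shows up as an empty or suspiciously small tally and no sensitive message text ever leaves the machine.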

Best regards
M.K.

The counts alone do not reveal anything. I did a quick audit of the
memory allocation logic of sasyncd and did not spot an error. If you
do not want to post the logs, you'll need to analyze them yourself.
This requires matching the log lines to the code and tracking where
stuff gets allocated and deallocated. Some digging could reveal the error.

I used to run sasyncd, but I no longer do. Setting up a test env is
quite some work I do not have time for.

        -Otto

First of all, thank you for your time. I know it is one of the most valuable resources.

We have done some analysis and we have found the problem.

The problem is that the master machine itself is listed as a peer in its own config.
We did this to keep the config files as similar as possible across all members of the cluster.

Removing it from peers fixes the problem.

So there are three main roads we can follow:
1. Fix sasyncd to recognize self and handle it properly (a result)
2. Mention in the manual that a node must not list itself as a peer (an excuse)
3. We can change our internal config handling (no one else benefits)
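
The check that roads 1 and 3 both need, "is this peer actually me?", can be sketched as follows. This is not sasyncd code, only an illustration of the logic: drop_self is a made-up name, peers are assumed to be given as IP address literals (not hostnames), and the caller is assumed to already know the addresses owned by the local host.

```python
import ipaddress

def drop_self(peers, local_addrs):
    """Return the peer list with every address owned by this host removed.

    ASSUMPTION: both arguments contain IP literals; a hostname would
    raise ValueError and would have to be resolved first.
    """
    local = {ipaddress.ip_address(addr) for addr in local_addrs}
    return [peer for peer in peers if ipaddress.ip_address(peer) not in local]
```

For road 3, a filter like this could run when the shared config is generated for each node, so the source list stays identical across the cluster while each generated file omits the node's own address.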

What would you recommend as a next step? We are willing to put real effort into road 1, but we might need help, e.g. code review and some guidance on meeting OpenBSD's requirements.

Furthermore, if we follow road 1, is there any chance of getting a reviewed, correct, accepted fix into the tree?

Best regards
M.K.
