> OVN scale test: 3 clustered SB servers, 10 SB relay servers, 1000 sandboxes.
> The maximum memory used by an SB relay server is over 26G.
> 
> 
> [root@node-4 ~]# for i in `kubectl get pods -n openstack | grep ovsdb-sb-relay | awk '{print $1}'`; do kubectl exec -it -n openstack $i -- top -bn1 | grep ovsdb; done
>    9 root  20  0 7558724  7.2g  8408 S   0.0  1.4  33:36.03 ovsdb-serv+
>    9 root  20  0 8589716  8.1g  7924 S   0.0  1.6  28:19.82 ovsdb-serv+
>    9 root  20  0   14.5g 14.5g  8284 R 100.0  2.9  78:00.86 ovsdb-serv+
>    9 root  20  0   26.2g 24.9g  7744 R 100.0  5.0  77:10.86 ovsdb-serv+
>    9 root  20  0   27.5g 26.7g  8076 R 100.0  5.3  30:55.49 ovsdb-serv+
>   10 root  20  0 8835412  8.0g  8148 R 100.0  3.2  11:30.74 ovsdb-serv+
>    9 root  20  0 8835424  8.0g  8396 S   6.7  1.6   7:44.34 ovsdb-serv+
>    9 root  20  0 7678636  7.3g  8132 S   0.0  2.9   1:25.33 ovsdb-serv+
>    9 root  20  0   12.6g 10.7g  8188 R 100.0  2.1 107:08.83 ovsdb-serv+
>    9 root  20  0 7479468  7.1g  8344 S  80.0  1.4  45:50.82 ovsdb-serv+
> 

Hi.  Thanks for testing and for the report!

There might be different causes for the memory consumption.
A memory leak is one of them, obviously, but there are other
possibilities as well.  It would be great if you could provide logs
from these relay servers or, at least, grep the messages from the
'memory' module out of them, so we can rule out the common suspects
for memory consumption.  For example, since the cluster is fairly
big (I assume you have 1000 ovn-controllers connected, right?), it
might be that ovn-controllers are slow in receiving updates, so the
backlog grows on the jsonrpc monitors on the ovsdb-relay side.
You may look for log messages that contain the word 'backlog' or
for other suspicious memory-related entries, e.g. as sketched below.
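Something along these lines might be enough (assuming the relay
containers log to stdout so 'kubectl logs' can see the messages;
otherwise grep the ovsdb-server log file inside the container
instead):

 for i in `kubectl get pods -n openstack | grep ovsdb-sb-relay | awk '{print $1}'`; do
   echo "=== $i ==="
   kubectl logs -n openstack $i | grep -E 'memory|backlog'
 done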

We should also try to exclude glibc issues from the equation.
For this, you can enable memory trimming on compaction with:
 ovs-appctl -t <ovsdb-relay> ovsdb-server/memory-trim-on-compaction on
and request a manual compaction with:
 ovs-appctl -t <ovsdb-relay> ovsdb-server/compact OVN_Southbound
While compaction itself doesn't make much sense for a relay, it will
call malloc_trim() at the end and release all the unused memory
back to the system, so you can see how much memory is actually in use
by the application, without the noise from glibc not releasing
fastbins.
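To apply that to all relays at once, a loop similar to the one you
already used should work (a sketch; <relay-ctl-socket> here stands
for whatever control socket path your relay ovsdb-server uses inside
the container, which depends on your deployment):

 for i in `kubectl get pods -n openstack | grep ovsdb-sb-relay | awk '{print $1}'`; do
   kubectl exec -n openstack $i -- ovs-appctl -t <relay-ctl-socket> ovsdb-server/memory-trim-on-compaction on
   kubectl exec -n openstack $i -- ovs-appctl -t <relay-ctl-socket> ovsdb-server/compact OVN_Southbound
 done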

If it's really a memory leak, it would be great if you could build
and test with AddressSanitizer enabled.  It will hurt performance
to some degree, but at least it may detect the leaks.
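Roughly, such a build could look like this (a sketch; exact flags
may need to be adapted to your compiler and build environment):

 ./boot.sh
 ./configure CFLAGS="-g -O1 -fsanitize=address -fno-omit-frame-pointer"
 make -j$(nproc)

The leak report, if any, will be printed when the process exits.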

Best regards, Ilya Maximets.