> ovn scale test, 3 clustered sb, 10 sb relay server, 1000 sandbox.
> The maximum memory used by the sb relay server is over 26G
>
> [root at node-4 ~]# for i in `kubectl get pods -n openstack | grep ovsdb-sb-relay | awk '{print $1}'`; do kubectl exec -it -n openstack $i -- top -bn1 | grep ovsdb; done
>     9 root  20  0 7558724  7.2g  8408 S   0.0 1.4  33:36.03 ovsdb-serv+
>     9 root  20  0 8589716  8.1g  7924 S   0.0 1.6  28:19.82 ovsdb-serv+
>     9 root  20  0   14.5g 14.5g  8284 R 100.0 2.9  78:00.86 ovsdb-serv+
>     9 root  20  0   26.2g 24.9g  7744 R 100.0 5.0  77:10.86 ovsdb-serv+
>     9 root  20  0   27.5g 26.7g  8076 R 100.0 5.3  30:55.49 ovsdb-serv+
>    10 root  20  0 8835412  8.0g  8148 R 100.0 3.2  11:30.74 ovsdb-serv+
>     9 root  20  0 8835424  8.0g  8396 S   6.7 1.6   7:44.34 ovsdb-serv+
>     9 root  20  0 7678636  7.3g  8132 S   0.0 2.9   1:25.33 ovsdb-serv+
>     9 root  20  0   12.6g 10.7g  8188 R 100.0 2.1 107:08.83 ovsdb-serv+
>     9 root  20  0 7479468  7.1g  8344 S  80.0 1.4  45:50.82 ovsdb-serv+
Hi.  Thanks for testing and for the report!

There might be different causes for the memory consumption.  A memory
leak is one of them, obviously, but there are other possibilities as
well.  It would be great if you could provide logs from these relay
servers or, at least, grep messages from the 'memory' module out of
them, so we can exclude common suspects for memory consumption.

For example, since the cluster is fairly big (I assume you have 1000
ovn-controllers connected, right?), it might be that if
ovn-controllers are slow in receiving updates, the backlog grows on
the jsonrpc monitors on the ovsdb-relay side.  You may look for log
messages that contain the word 'backlog' or for other suspicious
parts of the memory logs.

We should also try to exclude glibc issues from the equation.  For
this, you can enable memory trimming on compaction with:

  ovs-appctl -t <ovsdb-relay> ovsdb-server/memory-trim-on-compaction on

And request a manual compaction with:

  ovs-appctl -t <ovsdb-relay> ovsdb-server/compact OVN_Southbound

While compaction itself makes no sense for a relay, it will call
malloc_trim() in the end and release all the unused memory back to
the system, so you can see how much memory is actually in use by the
application without the noise from glibc not releasing fastbins.

If it's really a memory leak, it would be great if you could build
and test with the address sanitizer enabled.  This will hurt
performance to some degree, but at least it may detect leaks.

Best regards, Ilya Maximets.
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
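[Editor's note: the log check suggested above can be scripted.  A minimal
sketch follows; the helper name `log_memory_report` and the sample log
path are illustrative assumptions, not from the thread — point it at
wherever your relay actually writes its log.]

```shell
# Sketch: filter an ovsdb relay log down to the memory-related
# diagnostics discussed above ('memory' module lines and jsonrpc
# 'backlog' warnings).  The default path is an assumption; override
# it with the first argument.
log_memory_report() {
  log="${1:-/var/log/openvswitch/ovsdb-server.log}"
  grep -E 'memory|backlog' "$log"
}
```

Run it once per relay pod (e.g. via `kubectl exec`) to compare which
relays are accumulating backlog versus genuinely leaking.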