On 10/28/20 12:03 PM, Dumitru Ceara wrote:
> On 10/26/20 2:42 AM, Ilya Maximets wrote:
>> Under a heavy load or in a long run, memory consumption of the
>> ovsdb-server process that is part of a RAFT cluster could reach very
>> high values.  From my experience it could be up to 60-100 GB.  In these
>> conditions it's likely that ovsdb-server will be killed by the
>> OOM-killer or just will not be able to work properly, wasting time on
>> processing outdated or unneeded data.  There are 3 main parts that
>> consume most of the memory:
>>
>>   1. Backlog on RAFT connections between servers.
>>   2. Local RAFT log.
>>   3. Libc doesn't return memory back to the system.
>>
>> The backlog could start growing if one of the remote servers isn't doing
>> well and is not able to process requests in time.  This sending
>> backlog could contain snapshots or even just a big number of big
>> append requests.  It could grow to tens of GBs really fast, and
>> most of this data might even be unnecessary if it becomes obsolete
>> by one of the previous requests or if the current 'term' changes and
>> all the old messages should be dropped.  The solution for this is
>> to monitor the size of the current backlog and disconnect if it grows
>> too big, since it will be easier to just reconnect and send one new
>> snapshot.
>>
>> The local RAFT log contains all the DB changes that are not part of a
>> snapshot yet.  Since snapshots are taken at most once in 10 minutes,
>> the log could grow pretty big: up to tens of thousands of entries, and
>> each of these entries could be fairly big by itself.  That being
>> said, the RAFT log could grow up to tens of GBs too.
>>
>> One extra point for memory consumption is that memory likely doesn't
>> go away even after calling free() due to the implementation of C memory
>> allocators.  And this happens a lot.  The ovsdb-server process usually
>> holds a lot of system memory even if the database is almost empty.
>> This heap memory might be returned back to the OS by using malloc_trim().
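
For illustration only, here is a minimal sketch of the backlog check
described in the cover letter (item 1), assuming a per-connection count
of queued messages and bytes.  The struct and function names are
hypothetical and made up for this example.  They are not the jsonrpc or
raft code from the patches, and the limits used are arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting for one connection's send queue.
 * Illustrative only; not the OVS jsonrpc API. */
struct conn_backlog {
    size_t n_msgs;      /* Messages queued but not yet sent. */
    size_t n_bytes;     /* Total size of the queued messages. */
    size_t max_msgs;    /* Limit on queued messages, 0 for no limit. */
    size_t max_bytes;   /* Limit on queued bytes, 0 for no limit. */
};

/* Returns true if the backlog is over either configured limit. */
static bool
backlog_exceeded(const struct conn_backlog *b)
{
    return (b->max_msgs && b->n_msgs > b->max_msgs)
           || (b->max_bytes && b->n_bytes > b->max_bytes);
}

/* Drops all queued data, as a disconnect would. */
static void
backlog_reset(struct conn_backlog *b)
{
    b->n_msgs = 0;
    b->n_bytes = 0;
}

/* Accounts for one more queued message of 'size' bytes.  Returns true
 * if the backlog grew past a limit and was dropped, in which case the
 * caller is expected to disconnect and reconnect. */
static bool
backlog_send(struct conn_backlog *b, size_t size)
{
    b->n_msgs++;
    b->n_bytes += size;
    if (backlog_exceeded(b)) {
        backlog_reset(b);
        return true;
    }
    return false;
}
```

In this sketch a caller would check the return value of backlog_send()
and, on true, close the connection so that the reconnect path can send
one fresh snapshot instead of the stale queue.
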
>>
>> --
>> All of these issues were found on branch-2.13, but it's always hard to
>> distinguish new features from bug fixes when we're talking about
>> scaling issues.  Anyway, I think it'll be good to have these
>> patches (if they are any good) backported to 2.13, especially because
>> it's going to be our next LTS.  Thoughts?
>>
>
> Hi Ilya,
>
> I think although these might be considered features, without them there
> doesn't seem to be a way to address the memory consumption issues in
> production deployments.
>
> In my opinion, these should definitely go to the 2.13 branch too.

OK.  Thanks!  I'll backport them.

>
> Thanks,
> Dumitru
>
>> Ilya Maximets (5):
>>   raft: Add log length to the memory report.
>>   ovsdb-server: Reclaim heap memory after compaction.
>>   raft: Set threshold on backlog for raft connections.
>>   raft: Make backlog thresholds configurable.
>>   raft: Avoid having more than one snapshot in-flight.
>>
>>  NEWS                    |  6 +++
>>  configure.ac            |  1 +
>>  lib/jsonrpc.c           | 57 ++++++++++++++++++++++++-
>>  lib/jsonrpc.h           |  6 +++
>>  ovsdb/ovsdb-server.1.in |  9 ++++
>>  ovsdb/ovsdb-server.c    | 41 +++++++++++++++++-
>>  ovsdb/ovsdb.c           | 12 +++++-
>>  ovsdb/ovsdb.h           |  3 +-
>>  ovsdb/raft-private.c    |  1 -
>>  ovsdb/raft-private.h    |  4 +-
>>  ovsdb/raft.c            | 93 +++++++++++++++++++++++++++++------------
>>  11 files changed, 199 insertions(+), 34 deletions(-)
>>