On 1/14/21 2:11 AM, Ilya Maximets wrote:
Currently, ovsdb-server stores the complete value of a column in the
database file and in the raft log whenever that column changes.  This
means that a transaction that adds, for example, one new acl to a port
group creates a log entry with the UUIDs of all existing acls plus the
new one.  The same applies to ports in logical switches and routers, and
to many other columns with sets in the Northbound DB.

There could be thousands of acls in one port group or thousands of ports
in a single logical switch.  The typical use case is to add one new
entry when starting a new service/VM/container or adding a new node to a
kubernetes or OpenStack cluster.  This generates a huge amount of traffic
within the ovsdb raft cluster, increases overall memory consumption, and
hurts performance, since all these UUIDs are parsed from and formatted to
json several times and stored on disk.  The more values a set contains,
the more space a single log entry occupies and the more time ovsdb-server
cluster members take to process it.
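
To make the growth concrete, here is a rough Python sketch (not ovsdb-server code) that sums the sizes of the log entries written while acls are added one at a time, assuming every entry carries the whole set.  The entry layout is simplified and the function names are made up for illustration; only the ["set", [["uuid", ...]]] notation follows the OVSDB JSON encoding.

```python
import json
import uuid


def full_value_entry(acl_uuids):
    """Serialize a simplified log entry that carries the whole 'acls' set."""
    return json.dumps({"acls": ["set", [["uuid", u] for u in acl_uuids]]})


def total_logged_bytes(n_acls):
    """Sum the sizes of all entries written while adding n_acls one by one."""
    acls, total = [], 0
    for _ in range(n_acls):
        acls.append(str(uuid.uuid4()))
        # Every transaction re-serializes all UUIDs accumulated so far.
        total += len(full_value_entry(acls))
    return total


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, total_logged_bytes(n))
```

Doubling the number of acls roughly quadruples the total bytes logged in this model, because each entry grows linearly with the current size of the set.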

Simple test:

1. Start OVN sandbox with clustered DBs:
    # make sandbox SANDBOXFLAGS='--nbdb-model=clustered --sbdb-model=clustered'

2. Run a script that creates one port group and adds 4000 acls into it:
    # cat ../memory-test.sh
    pg_name=my_port_group
    export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach --log-file \
                           -vsocket_util:off)
    ovn-nbctl pg-add $pg_name
    for i in $(seq 1 4000); do
      echo "Iteration: $i"
      ovn-nbctl --log acl-add $pg_name from-lport $i udp drop
    done
    ovn-nbctl acl-del $pg_name
    ovn-nbctl pg-del $pg_name
    ovs-appctl -t $(pwd)/sandbox/nb1 memory/show
    ovn-appctl -t ovn-nbctl exit
    ---

4. Check the current memory consumption of ovsdb-server processes and
    space occupied by database files:
    # ls sandbox/[ns]b*.db -alh
    # ps -eo vsz,rss,comm,cmd | egrep '=[ns]b[123].pid'

Test results with current ovsdb log format:

    On-disk Nb DB size     :  ~369 MB
    RSS of Nb ovsdb-servers:  ~2.7 GB
    Time to finish the test:  ~2m

In order to mitigate memory consumption issues and reduce the
computational load on ovsdb-servers, let's store the diff between the old
and new values instead.  This makes the size of each log entry that adds
a single acl to a port group (or a port to a logical switch, or anything
else like that) very small and independent of the number of already
existing acls (ports, etc.).
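
A minimal Python sketch of the idea for set columns follows.  It assumes the diff of two sets is their symmetric difference, so the same operation both computes and applies a diff; the text above only says "diff between old and new values", so treat this as an illustration and not as the actual implementation.  The function names are hypothetical.

```python
def column_diff(old, new):
    """Elements present in exactly one of the two sets (assumed diff format)."""
    return old ^ new


def apply_diff(old, diff):
    """Rebuild the new value: diff elements already in old are removed,
    the rest are added."""
    return old ^ diff


if __name__ == "__main__":
    old = {"acl-1", "acl-2", "acl-3"}
    new = old | {"acl-4"}
    diff = column_diff(old, new)
    print(diff)                          # only the newly added acl
    print(apply_diff(old, diff) == new)
```

With this representation, the entry for adding one acl holds a single element no matter how many acls the port group already has.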

A new marker, '_is_diff', is added to a file transaction to indicate
that the transaction contains diffs instead of replacements for the
existing data.
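
How a reader of the file might act on the marker can be sketched in Python as below.  Only the '_is_diff' name comes from the text above; the record layout (table -> row -> column), the use of Python sets as column values and the helper name are assumptions made for illustration, and row deletion is not modelled.

```python
def apply_record(db, record):
    """Apply one simplified file transaction to an in-memory database.

    Records without '_is_diff' are treated as plain replacements, which is
    how records written in the old format would be read.
    """
    is_diff = record.get("_is_diff", False)
    for table, rows in record.items():
        if table.startswith("_"):
            continue  # skip metadata keys such as the marker itself
        for row_uuid, columns in rows.items():
            row = db.setdefault(table, {}).setdefault(row_uuid, {})
            for column, value in columns.items():
                if is_diff:
                    # Assumed set diff: symmetric difference with old value.
                    row[column] = row.get(column, frozenset()) ^ value
                else:
                    row[column] = value


if __name__ == "__main__":
    db = {}
    apply_record(db, {"Port_Group": {"pg1": {"acls": frozenset({"a", "b"})}}})
    apply_record(db, {"_is_diff": True,
                      "Port_Group": {"pg1": {"acls": frozenset({"c"})}}})
    print(sorted(db["Port_Group"]["pg1"]["acls"]))
```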

One side effect is that this change will actually increase the size of a
file transaction that removes more than half of the entries from a set,
because the diff will be larger than the resulting new value.  However,
such operations are rare.
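
The break-even point can be checked with a few lines of Python.  The sketch counts set elements only and assumes that a diff for a removal lists exactly the removed elements; the function name is made up for illustration.

```python
def entry_sizes(n_existing, n_removed):
    """Return (elements in the diff, elements in the full new value)."""
    diff_size = n_removed
    full_size = n_existing - n_removed
    return diff_size, full_size


if __name__ == "__main__":
    # Removing one acl out of 4000: the diff is far smaller.
    print(entry_sizes(4000, 1))
    # Removing 3000 out of 4000: the diff is larger than the new value.
    print(entry_sizes(4000, 3000))
```

In this model the diff becomes the larger of the two as soon as n_removed exceeds n_existing / 2.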

Test results with change applied:

    On-disk Nb DB size     :  ~2.7 MB  ---> reduced by 99%
    RSS of Nb ovsdb-servers:  ~580 MB  ---> reduced by 78%
    Time to finish the test:  ~1m27s   ---> reduced by 27%
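
The percentages can be rechecked from the two result tables with a short Python snippet.  It assumes 1 GB = 1000 MB and reads "~2m" as 120 seconds; both inputs are approximate, so the results are too.

```python
def reduction_percent(before, after):
    """Relative reduction from 'before' to 'after', in percent."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    print("disk:", reduction_percent(369, 2.7))    # MB
    print("rss :", reduction_percent(2700, 580))   # MB
    print("time:", reduction_percent(120, 87))     # seconds
```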

After this change, a new ovsdb-server is still able to read old databases,
but an old ovsdb-server will not be able to read new ones.
Since new servers could join the ovsdb cluster dynamically, it's hard to
implement any runtime mechanism to handle cases where different
versions of ovsdb-server join the cluster.  However, we still need to
handle cluster upgrades.  For this case, a special command line argument
is added to disable the new functionality.  The documentation is updated
with the recommended way to upgrade the ovsdb cluster.

Acked-by: Dumitru Ceara <dce...@redhat.com>
Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
---

This still looks good to me, thanks!

Regards,
Dumitru
