Currently, database schema conversion for a clustered database
produces a transaction record with both the new schema and the
converted database data.  The sequence of events is as follows:

  1. Get the new schema.
  2. Convert the database to the new schema.
  3. Translate the newly converted database into JSON.
  4. Write the schema + data JSON to the storage.
  5. Destroy the converted version of the database.
  6. Read the schema + data JSON from the storage and parse it.
  7. Create a new database from the parsed database data.
  8. Replace current database with the new one.

Most of these steps are computationally very expensive.  Also,
conversion to/from JSON is much more expensive than direct database
conversion with ovsdb_convert(), which can make use of shallow data
copies.

Instead of doing all that, let's make use of the previously introduced
ability to not write the converted data into the storage.  The process
will then look like this:

  1. Get the new schema.
  2. Convert the database to the new schema
     (to verify that it is possible).
  3. Write the schema to the storage.
  4. Destroy the converted version of the database.
  5. Read the new schema from the storage and parse it.
  6. Convert the database to the new schema.
  7. Replace current database with the new one.

Most of the operations here are performed on the small schema object
instead of the actual database data.  The two remaining data operations
(the actual conversions) are noticeably faster than conversion to/from
JSON due to reference counting and shallow data copies.
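
To illustrate why reference counting and shallow data copies make the
direct conversion cheap, here is a minimal sketch.  The type and
function names below (shared_value, shared_value_clone, ...) are
hypothetical and exist only for this illustration; they are not the
actual OVSDB data structures, and the real implementation may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted value, for illustration only.
 * This is not the actual OVSDB data structure. */
struct shared_value {
    size_t ref_cnt;     /* Number of owners sharing 'bytes'. */
    char *bytes;        /* Payload, allocated once. */
};

/* Creates a new value with a single owner and a private copy of 's'. */
static struct shared_value *
shared_value_create(const char *s)
{
    struct shared_value *v = malloc(sizeof *v);
    size_t len = strlen(s) + 1;

    if (!v) {
        abort();
    }
    v->bytes = malloc(len);
    if (!v->bytes) {
        abort();
    }
    memcpy(v->bytes, s, len);
    v->ref_cnt = 1;
    return v;
}

/* Shallow copy: takes another reference in O(1), payload is not copied. */
static struct shared_value *
shared_value_clone(struct shared_value *v)
{
    assert(v->ref_cnt > 0);
    v->ref_cnt++;
    return v;
}

/* Drops one reference; frees the payload when the last owner is gone. */
static void
shared_value_unref(struct shared_value *v)
{
    if (v && !--v->ref_cnt) {
        free(v->bytes);
        free(v);
    }
}
```

In this sketch a "copy" of an unchanged value costs one counter
increment regardless of the payload size, whereas a round trip through
JSON has to serialize and re-parse every byte.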

Steps 4-6 can be optimized later to avoid converting twice in the
process that initiates the conversion.

The change results in the following performance improvements in the
conversion of the OVN_Southbound database schema from version 20.23.0
to 20.27.0 (measured on a single-server RAFT cluster with no clients):

          |       Before                |         After
          +---------+-------------------+---------+------------------
  DB size |  Total  | Max poll interval |  Total  | Max poll interval
  --------+---------+-------------------+---------+------------------
   542 MB | 47 sec. |     26 sec.       | 15 sec. |     10 sec.
   225 MB | 19 sec. |     10 sec.       |  6 sec. |    4.5 sec.

The 542 MB database had 19.5 M atoms; the 225 MB database had 7.5 M atoms.

Overall performance improvement is about 3x.
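
The "about 3x" figure can be checked directly against the table above.
A small sketch, where speedup() is a helper introduced only for this
illustration:

```c
#include <assert.h>

/* Ratio of the time before the change to the time after it. */
static double
speedup(double before_sec, double after_sec)
{
    assert(after_sec > 0.0);
    return before_sec / after_sec;
}
```

For the total conversion time this gives speedup(47, 15), roughly 3.1,
and speedup(19, 6), roughly 3.2.  For the maximum poll interval it gives
speedup(26, 10) = 2.6 and speedup(10, 4.5), roughly 2.2.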

Also, note that before this change database conversion basically
doubled the size of the database file on disk.  Now it only writes a
small schema JSON.

Since the change requires backward-incompatible database file format
changes, the documentation is updated on how to perform an upgrade.
This is handled the same way as the previous incompatible format
change in 2.15 (column diffs).

Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-December/052140.html
Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
---
 Documentation/ref/ovsdb.7.rst | 63 +++++++++++++++++++++++++++++++++++
 NEWS                          | 10 ++++++
 ovsdb/ovsdb-server.c          |  7 ++++
 ovsdb/ovsdb.c                 | 34 +++++++++++++++++++
 ovsdb/ovsdb.h                 |  3 ++
 ovsdb/trigger.c               | 11 ++++--
 6 files changed, 125 insertions(+), 3 deletions(-)

diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
index 980ba29e7..84b153d24 100644
--- a/Documentation/ref/ovsdb.7.rst
+++ b/Documentation/ref/ovsdb.7.rst
@@ -213,6 +213,12 @@ Open vSwitch 2.6 introduced support for the active-backup service model.
    `Upgrading from version 2.14 and earlier to 2.15 and later`_ and
    `Downgrading from version 2.15 and later to 2.14 and earlier`_.
 
+   Another change happened in version 3.2.  To upgrade/downgrade the
+   ``ovsdb-server`` processes across this version follow the instructions
+   described under
+   `Upgrading from version 3.1 and earlier to 3.2 and later`_ and
+   `Downgrading from version 3.2 and later to 3.1 and earlier`_.
+
 Clustered Database Service Model
 --------------------------------
 
@@ -287,6 +293,12 @@ schema, which is covered later under `Upgrading or Downgrading a Database`_.)
    `Upgrading from version 2.14 and earlier to 2.15 and later`_ and
    `Downgrading from version 2.15 and later to 2.14 and earlier`_.
 
+   Another change happened in version 3.2.  To upgrade/downgrade the
+   ``ovsdb-server`` processes across this version follow the instructions
+   described under
+   `Upgrading from version 3.1 and earlier to 3.2 and later`_ and
+   `Downgrading from version 3.2 and later to 3.1 and earlier`_.
+
 Clustered OVSDB does not support the OVSDB "ephemeral columns" feature.
 ``ovsdb-tool`` and ``ovsdb-client`` change ephemeral columns into persistent
 ones when they work with schemas for clustered databases.  Future versions of
@@ -341,6 +353,57 @@ For all service models it's required to:
 
 3. Downgrade and restart ``ovsdb-server`` processes.
 
+Upgrading from version 3.1 and earlier to 3.2 and later
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There is another change of a database file format in version 3.2 that doesn't
+allow older versions of ``ovsdb-server`` to read the database file modified by
+the ``ovsdb-server`` version 3.2 or later.  This also affects runtime
+communications between servers in **cluster** service models.  To upgrade the
+``ovsdb-server`` processes from one version of Open vSwitch (3.1 or earlier) to
+another (3.2 or higher) instructions below should be followed. (This is
+different from upgrading a database schema, which is covered later under
+`Upgrading or Downgrading a Database`_.)
+
+In case of **standalone** or **active-backup** service model no special
+handling during upgrade is required.
+
+For the **cluster** service model the recommended upgrade strategy is:
+
+1. Upgrade processes one at a time.  Each ``ovsdb-server`` process after
+   upgrade should be started with ``--disable-file-no-data-conversion`` command
+   line argument.
+
+2. When all ``ovsdb-server`` processes are upgraded, use ``ovs-appctl`` to
+   invoke ``ovsdb/file/no-data-conversion-enable`` command on each of them or
+   restart all ``ovsdb-server`` processes one at a time without
+   ``--disable-file-no-data-conversion`` command line option.
+
+Downgrading from version 3.2 and later to 3.1 and earlier
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Similar to upgrading covered under `Upgrading from version 3.1 and earlier to
+3.2 and later`_, downgrading from the ``ovsdb-server`` version 3.2 and later
+to 3.1 and earlier requires additional steps. (This is different from
+upgrading a database schema, which is covered later under
+`Upgrading or Downgrading a Database`_.)
+
+For all service models it's required to:
+
+1. Compact all database files via ``ovsdb-server/compact`` command with
+   ``ovs-appctl`` utility.  This should be done for each involved
+   ``ovsdb-server`` process separately (single process for **standalone**
+   service model, all involved processes for **active-backup** and **cluster**
+   service models).
+
+2. Stop all ``ovsdb-server`` processes.  Make sure that no database schema
+   conversion operations were performed between steps 1 and 2.  For
+   **standalone** and **active-backup** service models, the database compaction
+   can be performed after stopping all the processes instead with the
+   ``ovsdb-tool compact`` command.
+
+3. Downgrade and restart ``ovsdb-server`` processes.
+
 Understanding Cluster Consistency
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/NEWS b/NEWS
index 8771ee618..cf9df6106 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,15 @@
 Post-v3.1.0
 --------------------
+   - OVSDB:
+     * Changed format in which ovsdb schema conversion operations are stored in
+       clustered database files.  Such operations now may not contain the data,
+       only the new schema.  This allows to significantly improve the schema
+       conversion performance.
+       New ovsdb-server process will be able to read old database format, but
+       old processes will *fail* to read database created by the new one, if
+       conversion operation is present.  For the cluster service model follow
+       upgrade instructions in 'Upgrading from version 3.1 and earlier to 3.2
+       and later' section of ovsdb(7).
    - IPFIX template and statistics intervals can now be configured through two
      new options in the IPFIX table: 'template_interval' and 'stats_interval'.
    - Linux kernel datapath:
diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index 91c284e99..b64814076 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -1971,6 +1971,7 @@ parse_options(int argc, char *argv[],
         OPT_ACTIVE,
         OPT_NO_DBS,
         OPT_FILE_COLUMN_DIFF,
+        OPT_FILE_NO_DATA_CONVERSION,
         VLOG_OPTION_ENUMS,
         DAEMON_OPTION_ENUMS,
         SSL_OPTION_ENUMS,
@@ -1996,6 +1997,8 @@ parse_options(int argc, char *argv[],
         {"active", no_argument, NULL, OPT_ACTIVE},
         {"no-dbs", no_argument, NULL, OPT_NO_DBS},
         {"disable-file-column-diff", no_argument, NULL, OPT_FILE_COLUMN_DIFF},
+        {"disable-file-no-data-conversion", no_argument, NULL,
+         OPT_FILE_NO_DATA_CONVERSION},
         {NULL, 0, NULL, 0},
     };
     char *short_options = ovs_cmdl_long_options_to_short_options(long_options);
@@ -2092,6 +2095,10 @@ parse_options(int argc, char *argv[],
             ovsdb_file_column_diff_disable();
             break;
 
+        case OPT_FILE_NO_DATA_CONVERSION:
+            ovsdb_no_data_conversion_disable();
+            break;
+
         case '?':
             exit(EXIT_FAILURE);
 
diff --git a/ovsdb/ovsdb.c b/ovsdb/ovsdb.c
index afec96264..f67b836d7 100644
--- a/ovsdb/ovsdb.c
+++ b/ovsdb/ovsdb.c
@@ -39,6 +39,7 @@
 #include "transaction.h"
 #include "transaction-forward.h"
 #include "trigger.h"
+#include "unixctl.h"
 
 #include "openvswitch/vlog.h"
 VLOG_DEFINE_THIS_MODULE(ovsdb);
@@ -177,6 +178,39 @@ ovsdb_is_valid_version(const char *s)
     return ovsdb_parse_version(s, &version);
 }
 
+/* If set to 'true', database schema conversion operations in the storage
+ * may not contain the converted data, only the schema.  Currently affects
+ * only the clustered storage. */
+static bool use_no_data_conversion = true;
+
+static void
+ovsdb_no_data_conversion_enable(struct unixctl_conn *conn, int argc OVS_UNUSED,
+                                const char *argv[] OVS_UNUSED,
+                                void *arg OVS_UNUSED)
+{
+    use_no_data_conversion = true;
+    unixctl_command_reply(conn, NULL);
+}
+
+void
+ovsdb_no_data_conversion_disable(void)
+{
+    if (!use_no_data_conversion) {
+        return;
+    }
+    use_no_data_conversion = false;
+    unixctl_command_register("ovsdb/file/no-data-conversion-enable", "",
+                             0, 0, ovsdb_no_data_conversion_enable, NULL);
+}
+
+/* Returns true if the database storage allows conversion records without
+ * data specified. */
+bool
+ovsdb_conversion_with_no_data_supported(const struct ovsdb *db)
+{
+    return use_no_data_conversion && ovsdb_storage_is_clustered(db->storage);
+}
+
 /* Returns the number of tables in 'schema''s root set. */
 static size_t
 root_set_size(const struct ovsdb_schema *schema)
diff --git a/ovsdb/ovsdb.h b/ovsdb/ovsdb.h
index 13d8bf407..d45630e8f 100644
--- a/ovsdb/ovsdb.h
+++ b/ovsdb/ovsdb.h
@@ -132,6 +132,9 @@ extern size_t n_weak_refs;
 struct ovsdb *ovsdb_create(struct ovsdb_schema *, struct ovsdb_storage *);
 void ovsdb_destroy(struct ovsdb *);
 
+void ovsdb_no_data_conversion_disable(void);
+bool ovsdb_conversion_with_no_data_supported(const struct ovsdb *);
+
 void ovsdb_get_memory_usage(const struct ovsdb *, struct simap *usage);
 
 struct ovsdb_table *ovsdb_get_table(const struct ovsdb *, const char *);
diff --git a/ovsdb/trigger.c b/ovsdb/trigger.c
index 3c93ae580..0706d66cc 100644
--- a/ovsdb/trigger.c
+++ b/ovsdb/trigger.c
@@ -280,9 +280,14 @@ ovsdb_trigger_try(struct ovsdb_trigger *t, long long int now)
                 return false;
             }
 
-            /* Make the new copy into a transaction log record. */
-            struct json *txn_json = ovsdb_to_txn_json(
-                newdb, "converted by ovsdb-server", true);
+            struct json *txn_json;
+            if (ovsdb_conversion_with_no_data_supported(t->db)) {
+                txn_json = json_null_create();
+            } else {
+                /* Make the new copy into a transaction log record. */
+                txn_json = ovsdb_to_txn_json(
+                                newdb, "converted by ovsdb-server", true);
+            }
 
             /* Propose the change. */
             t->progress = ovsdb_txn_propose_schema_change(
-- 
2.39.2
