[ovs-dev] ovsdb-idl: potential issues with the persistent UUID implementation

Terry Wilson Mon, 08 Apr 2024 06:35:01 -0700

There was a patch in OVS 3.1 that added support to the IDL code for
specifying the permanent UUID of a row when inserting [1]. There are
both C and Python implementations. Initially, I was adding support to
ovsdbapp for this feature and noticed that the Python tests for this
feature passed `new_uuid` to `insert()` as a string:


    elif name == "insert_uuid":
...
        s = txn.insert(idl.tables["simple"], new_uuid=args[0],
                       persist_uuid=True)

when the existing code in `insert()` treats `new_uuid` as a `uuid.UUID` or None.

    def insert(self, table, new_uuid=None, persist_uuid=False):
...

        if new_uuid is None:
            new_uuid = uuid.uuid4()
        row = Row(self.idl, table, new_uuid, None, persist_uuid=persist_uuid)
        table.rows[row.uuid] = row
        self._txn_rows[row.uuid] = row
        return row

It creates a `Row` with `row.uuid` of type string in this case instead
of a `uuid.UUID` and then stores it in `table.rows` under a string
key. This `Row` is fairly short-lived in that it gets deleted once we
send the request to the ovsdb-server and call
`Transaction.__disassemble()`. Then, when we get the OVSDB update
notification, the `Row` is created normally and stored in
`table.rows`. So at this point there's no real problem except that
from an interface perspective, new_uuid is the wrong type. In client
code, this can probably be problem having the returned `Row` having a
string for `uuid`.

This starts to get interesting when you try to simply fix this type
issue. For the case where `new_uuid` already exists in the DB, If you
pass `new_uuid=uuid.uuid4()` and fix the code in commit

    op["op"] = "insert"
    if row._persist_uuid:
        op["uuid"] = row.uuid

to convert `row.uuid` to a string for the JSONRPC request, the code in
insert that does `table.rows[row.uuid] = row` will then *overwrite*
the existing row, then when the transaction is disassembled, that row
will be deleted. Since the row exists in the server, the txn will
fail. So your insert() call ends up deleting the row instead of adding
a new one. The existing test case will catch this issue. The tests
currently depend on getting a response from ovsdb-server regarding the
duplicate UUID. So fixing it just in the Python implementation by
checking locally `insert()` if the UUID was already stored in
`table.rows` would be a behavior difference between the two. Trying to
just pass the old row to the transaction, since `Row._data` would no
longer be `None` would convert the insert request to an update.

Knowing that the C IDL and Python IDL are very similar, I went to
compare what they were doing. It was essentially the same. The C code
does take a `const struct uuid *` as an argument to
`ovsdb_idl_txn_insert()` like one would expect, and insert the new row
into `table->rows`. The difference is, that `table->rows` is an hmap
and hmaps allow duplicate keys. So the C IDL version is storing two
copies of the same row in the DB after `ovsdb_idl_txn_insert()` and
the one that is inserted, having been inserted into
`txn->txn_rows` is looked up in `ovsdb_idl_txn_disassemble` and
deleted directly with `hmap_remove()`. So things, perhaps
accidentally, end up working fine in practice there. One caveat would
be that any operation that tried to find the row while the transaction
was active with something like `HMAP_FOR_EACH_WITH_HASH()` which
`ovsdb_idl_get_row()`, will get the first one that is iterated over
based on the hmap implementation.

So my questions are:

Do we really mean to be storing two identical rows in the C version?

If not, should we also do local checks for the row client-side in both
versions of the IDL code and fail early? It would be an API change to
either assert or return NULL/None in the insert methods. One possible
issue with this is that it would enable a race condition: Imagine two
requests for deleting and then recreating an object with a persistent
UUID and they are routed to different worker IDLs. The recreate could
fail if it doesn't receive the update notification from ovsdb-server
with the delete in time, and one of the reasons we want to use this
feature is to avoid this kind of issue.

If we actually intend to store multiple rows with the same UUID
temporarily in the in-memory tables, then how do we want to do that in
the Python code? table.rows technically isn't a dict, it's a custom
class that subclasses dict, so if we absolutely needed to we could
override `__setitem__`, `__delitem__`, and `__getitem__` to handle
duplicates by storing/retrieving them from a list.

Ultimately, it seems like we need to a) always send the insert
transactions for persistent uuids (as the current code does) and b) do
that without passing the wrong type in the Python code or
inadvertently causing issues with blindly storing multiple rows with
the same key in the table.rows hmap in the C code. Ideas?

[1] 
https://github.com/openvswitch/ovs/commit/55b9507e6824b935ffa0205fc7c7bebfe4e54279

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

[ovs-dev] ovsdb-idl: potential issues with the persistent UUID implementation

Reply via email to