Thanks Ilya!

On Fri, Sep 8, 2023 at 7:18 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>
> While sending a reply to the monitor_cond_since request, server
> includes the last transaction ID.  And it sends new IDs with each
> subsequent update.  Current implementation doesn't use the one
> supplied with a monitor reply, and only takes into account IDs
> provided with monitor updates.  That may cause various issues:
>
> 1. Performance: During initialization, the last-id is set to zero.
>    If re-connection will happen after receiving a monitor reply,
>    but before any monitor update, the client will send a new
>    monitor request with an all-zero last-id and will re-download
>    the whole database again.
>
> 2. Data inconsistency: Assuming one of the clients sends a
>    transaction, but our python client disconnects before receiving
>    a monitor update for this transaction.  The las-id will point

nit: s/las-id/last-id

This example clearly shows the problem, but I think it may be even more
simplified: the client doesn't have to send a transaction to hit the
problem. If there are any transactions (sent by any clients) committed to
the DB between the disconnection and reconnection of the client, then there
will be such an inconsistency problem if a second disconnection (and later
reconnect again) happens immediately after receiving the monitor reply.

Acked-by: Han Zhou <hz...@ovn.org>

>    to a database state before this transaction.  On re-connection,
>    this last-id will be sent and the monitor reply will contain
>    a diff with a new data from that transaction.  But if another
>    disconnection happens right after that, on second re-connection
>    our python client will send another monitor_cond_since with
>    exactly the same last-id.  That will cause receiving the same
>    set of updates again.  And since it's an update2 message with
>    a diff of the data, the client will remove previously applied
>    result of the transaction.  At this point it will have a
>    different database view with the server potentially leading
>    to all sorts of data inconsistency problems.
>
> Fix that by always updating the last-id from the latest monitor
> reply.
>
> Fixes: 46d44cf3be0d ("python: idl: Add monitor_cond_since support.")
> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
> ---
>  python/ovs/db/idl.py |  1 +
>  tests/ovsdb-idl.at   | 22 +++++++++++++++++++++-
>  2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
> index 9fc2159b0..16ece0334 100644
> --- a/python/ovs/db/idl.py
> +++ b/python/ovs/db/idl.py
> @@ -494,6 +494,7 @@ class Idl(object):
>                          if not msg.result[0]:
>                              self.__clear()
>                          self.__parse_update(msg.result[2], OVSDB_UPDATE3)
> +                        self.last_id = msg.result[1]
>                      elif self.state ==
self.IDL_S_DATA_MONITOR_COND_REQUESTED:
>                          self.__clear()
>                          self.__parse_update(msg.result, OVSDB_UPDATE2)
> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> index df5a9d2fd..1028b0237 100644
> --- a/tests/ovsdb-idl.at
> +++ b/tests/ovsdb-idl.at
> @@ -2332,6 +2332,23 @@ CHECK_STREAM_OPEN_BLOCK([Python3], [$PYTHON3
$srcdir/test-stream.py],
>  CHECK_STREAM_OPEN_BLOCK([Python3], [$PYTHON3 $srcdir/test-stream.py],
>                          [ssl6], [[[::1]]])
>
> +dnl OVSDB_CLUSTER_CHECK_MONITOR_COND_SINCE_TXN_IDS(LOG)
> +dnl
> +dnl Looks up transaction IDs in the log of OVSDB client application.
> +dnl All-zero UUID should not be sent within a monitor request more than
once,
> +dnl unless some database requests were lost (not replied).
> +m4_define([OVSDB_CLUSTER_CHECK_MONITOR_COND_SINCE_TXN_IDS],
> +[
> +   requests=$(grep -c 'send request' $1)
> +   replies=$(grep -c 'received reply' $1)
> +
> +   if test "$requests" -eq "$replies"; then
> +     AT_CHECK([grep 'monitor_cond_since' $1 \
> +                | grep -c "00000000-0000-0000-0000-000000000000" | tr -d
'\n'],
> +              [0], [1])
> +   fi
> +])
> +
>  # same as OVSDB_CHECK_IDL but uses Python IDL implementation with tcp
>  # with multiple remotes to assert the idl connects to the leader of the
Raft cluster
>  m4_define([OVSDB_CHECK_IDL_LEADER_ONLY_PY],
> @@ -2347,10 +2364,11 @@ m4_define([OVSDB_CHECK_IDL_LEADER_ONLY_PY],
>     pids=$(cat s2.pid s3.pid s1.pid | tr '\n' ',')
>     echo $pids
>     AT_CHECK([$PYTHON3 $srcdir/test-ovsdb.py  -t30 idl-cluster
$srcdir/idltest.ovsschema $remotes $pids $3],
> -        [0], [stdout], [ignore])
> +        [0], [stdout], [stderr])
>     remote=$(ovsdb_cluster_leader $remotes "idltest")
>     leader=$(echo $remote | cut -d'|' -f 1)
>     AT_CHECK([grep -F -- "${leader}" stdout], [0], [ignore])
> +   OVSDB_CLUSTER_CHECK_MONITOR_COND_SINCE_TXN_IDS([stderr])
>     AT_CLEANUP])
>
>  OVSDB_CHECK_IDL_LEADER_ONLY_PY([Check Python IDL connects to leader], 3,
['remote'])
> @@ -2393,6 +2411,7 @@ m4_define([OVSDB_CHECK_CLUSTER_IDL_C],
>     AT_CHECK([sort stdout | uuidfilt]m4_if([$7],,, [[| $7]]),
>              [0], [$5])
>     m4_ifval([$8], [AT_CHECK([grep '$8' stderr], [1])], [], [])
> +   OVSDB_CLUSTER_CHECK_MONITOR_COND_SINCE_TXN_IDS([stderr])
>     AT_CLEANUP])
>
>  # Same as OVSDB_CHECK_CLUSTER_IDL_C but uses the Python IDL
implementation.
> @@ -2413,6 +2432,7 @@ m4_define([OVSDB_CHECK_CLUSTER_IDL_PY],
>     AT_CHECK([sort stdout | uuidfilt]m4_if([$7],,, [[| $7]]),
>              [0], [$5])
>     m4_if([$8], [AT_CHECK([grep '$8' stderr], [1])], [], [])
> +   OVSDB_CLUSTER_CHECK_MONITOR_COND_SINCE_TXN_IDS([stderr])
>     AT_CLEANUP])
>
>  m4_define([OVSDB_CHECK_CLUSTER_IDL],
> --
> 2.41.0
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to