Re: [ovs-dev] [RFC v2 0/7] Fast OVSDB resync after restart or failover.

2019-02-15 Thread Han Zhou
On Tue, Jan 29, 2019 at 12:01 PM Han Zhou wrote:
>
> v1 -> v2:
> - Fixed a bug in json cache handling in patch 3/7.
> - Fixed XXXs.
> - Other minor improvements.
>
> ---
> In scalability tests with ovn-scale-test, ovsdb-server SB load is not a
> problem, at least with 1k HVs. However, if we restart ovsdb-server, then
> depending on the number of HVs and the scale of logical objects (e.g. the
> number of logical ports), the SB ovsdb-server becomes an obvious bottleneck.
>
> In our test with 1k HVs and 20k logical ports (200 lports * 100 lswitches
> connected by a single logical router), restarting the SB ovsdb-server
> resulted in 100% CPU usage by ovsdb-server for more than 1 hour. All HVs
> (and northd) reconnect and resync a large amount of data at the same
> time.
>
> A similar problem would happen in a failover scenario. With an active-active
> cluster, the problem is alleviated slightly, because only 1/3 of the HVs
> (assuming a 3-node cluster) need to resync data from new servers,
> but it is still a serious problem.
>
> For a detailed discussion of the problem and solutions, see:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047591.html
>
> These patches implement the proposal in that discussion. They introduce
> a new method, monitor_cond_since, that lets a client request only the
> changes that happened after a specific point, so that data already cached
> in the client is not re-transferred.
>
> The current patches support all 3 modes of ovsdb-server, but only clustered
> mode can benefit from them, since it is the only one that supports
> transaction IDs out of the box.
>
> Han Zhou (7):
>   ovsdb-client.c: fix typo
>   ovsdb_monitor: Fix style of prototypes.
>   ovsdb-monitor: Refactor ovsdb monitor implementation.
>   ovsdb-server: Transaction history tracking.
>   ovsdb-monitor: Support monitor_cond_since.
>   ovsdb-idl.c: Support monitor_cond_since method in C IDL.
>   ovsdb-idl.c: Fast resync from server when connection reset.
>
>  Documentation/ref/ovsdb-server.7.rst |  78 +-
>  lib/ovsdb-idl.c                      | 229 +
>  ovsdb/jsonrpc-server.c               | 101 ++--
>  ovsdb/monitor.c                      | 467 +--
>  ovsdb/monitor.h                      |  78 +++---
>  ovsdb/ovsdb-client.c                 | 106 +++-
>  ovsdb/ovsdb-server.c                 |  11 +
>  ovsdb/ovsdb.c                        |   3 +
>  ovsdb/ovsdb.h                        |  10 +
>  ovsdb/transaction.c                  | 117 -
>  ovsdb/transaction.h                  |   4 +
>  tests/ovsdb-monitor.at               | 294 ++
>  12 files changed, 1201 insertions(+), 297 deletions(-)
>
> --
> 2.1.0
>

Please review the formal patch instead:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=92329
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC v2 0/7] Fast OVSDB resync after restart or failover.

2019-01-29 Thread Han Zhou
v1 -> v2:
- Fixed a bug in json cache handling in patch 3/7.
- Fixed XXXs.
- Other minor improvements.

---
In scalability tests with ovn-scale-test, ovsdb-server SB load is not a
problem, at least with 1k HVs. However, if we restart ovsdb-server, then
depending on the number of HVs and the scale of logical objects (e.g. the
number of logical ports), the SB ovsdb-server becomes an obvious bottleneck.

In our test with 1k HVs and 20k logical ports (200 lports * 100 lswitches
connected by a single logical router), restarting the SB ovsdb-server
resulted in 100% CPU usage by ovsdb-server for more than 1 hour. All HVs
(and northd) reconnect and resync a large amount of data at the same
time.

A similar problem would happen in a failover scenario. With an active-active
cluster, the problem is alleviated slightly, because only 1/3 of the HVs
(assuming a 3-node cluster) need to resync data from new servers,
but it is still a serious problem.

For a detailed discussion of the problem and solutions, see:
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047591.html

These patches implement the proposal in that discussion. They introduce
a new method, monitor_cond_since, that lets a client request only the
changes that happened after a specific point, so that data already cached
in the client is not re-transferred.
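
To make the idea concrete, here is a minimal toy model in Python of
"request only what changed since the last transaction I saw". It is a
sketch under stated assumptions, not the code in these patches and not the
actual wire format: ToyServer, ToyClient, monitor_since, resync and
history_limit are all hypothetical names invented for illustration. The
server keeps a bounded history of recent transactions; a reconnecting
client sends the last transaction ID it applied and receives either a
delta or, if that ID is no longer in the history, a full snapshot.

```python
import uuid
from collections import deque


class ToyServer:
    """Hypothetical stand-in for a server that remembers recent transactions."""

    def __init__(self, history_limit=100):
        self.rows = {}  # current contents: key -> value
        # Bounded history of (txn_id, {key: value, or None for a delete}).
        self.history = deque(maxlen=history_limit)

    def commit(self, changes):
        """Apply {key: value} changes (None deletes the key) and record them."""
        txn_id = str(uuid.uuid4())
        for key, value in changes.items():
            if value is None:
                self.rows.pop(key, None)
            else:
                self.rows[key] = value
        self.history.append((txn_id, dict(changes)))
        return txn_id

    def monitor_since(self, last_txn_id):
        """Return (found, latest_txn_id, updates).

        found=True:  updates holds only changes made after last_txn_id.
        found=False: last_txn_id is unknown, updates is a full snapshot.
        """
        ids = [txn_id for txn_id, _ in self.history]
        latest = ids[-1] if ids else None
        if last_txn_id in ids:
            delta = {}
            newer = list(self.history)[ids.index(last_txn_id) + 1:]
            for _, changes in newer:
                delta.update(changes)  # later changes override earlier ones
            return True, latest, delta
        return False, latest, dict(self.rows)


class ToyClient:
    """Hypothetical client that keeps its cache across reconnects."""

    def __init__(self):
        self.cache = {}
        self.last_txn_id = None

    def resync(self, server):
        """Resync the cache; return (found, number of rows transferred)."""
        found, latest, updates = server.monitor_since(self.last_txn_id)
        if not found:
            self.cache.clear()  # full resync: discard the stale cache
        for key, value in updates.items():
            if value is None:
                self.cache.pop(key, None)
            else:
                self.cache[key] = value
        self.last_txn_id = latest
        return found, len(updates)
```

In this model the cost of a reconnect is proportional to what changed while
the client was away rather than to the size of the database, as long as the
client's last transaction ID is still within the server's history.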

The current patches support all 3 modes of ovsdb-server, but only clustered
mode can benefit from them, since it is the only one that supports
transaction IDs out of the box.

Han Zhou (7):
  ovsdb-client.c: fix typo
  ovsdb_monitor: Fix style of prototypes.
  ovsdb-monitor: Refactor ovsdb monitor implementation.
  ovsdb-server: Transaction history tracking.
  ovsdb-monitor: Support monitor_cond_since.
  ovsdb-idl.c: Support monitor_cond_since method in C IDL.
  ovsdb-idl.c: Fast resync from server when connection reset.

 Documentation/ref/ovsdb-server.7.rst |  78 +-
 lib/ovsdb-idl.c                      | 229 +
 ovsdb/jsonrpc-server.c               | 101 ++--
 ovsdb/monitor.c                      | 467 +--
 ovsdb/monitor.h                      |  78 +++---
 ovsdb/ovsdb-client.c                 | 106 +++-
 ovsdb/ovsdb-server.c                 |  11 +
 ovsdb/ovsdb.c                        |   3 +
 ovsdb/ovsdb.h                        |  10 +
 ovsdb/transaction.c                  | 117 -
 ovsdb/transaction.h                  |   4 +
 tests/ovsdb-monitor.at               | 294 ++
 12 files changed, 1201 insertions(+), 297 deletions(-)

-- 
2.1.0
