Re: [ovs-dev] [PATCH ovn] northd: Use datapath groups for southbound load balancers.

2022-08-17 Thread Mark Michelson

Thanks Ilya and Ales.

I removed the paragraph about ovn-northd from the commit message and 
replaced it with the suggested bullet point.


I then pushed the change to the main branch.

On 8/9/22 04:31, Ales Musil wrote:

On Wed, Aug 3, 2022 at 4:44 PM Ilya Maximets  wrote:


Some CMSes, like ovn-kubernetes, tend to create load balancers attached
to a big number of logical switches or routers.  For each of these
load balancers northd creates a record in Load_Balancer table of the
Southbound database with all the logical datapaths (switches, routers)
listed in the 'datapaths' column.  All these logical datapaths are
references to datapath bindings.  With large number of load balancers
like these applied to the same set of load balancers, the size of
the Southbound database can grow significantly and these references
can take up to 90% of all the traffic between Sb DB and ovn-controllers.

For example, while creating 3 load balancers (1 for tcp, udp and sctp)
in a 250-node cluster in ovn-heater cluster-density test, the database
update sent to every ovn-controller is about 40 KB in size, out of
which 36 KB are just UUIDs of 250 logical datapaths repeated 3 times.

Introducing a new column 'datapath_group' in a Load_Balancer table,
so we can create a group once and just refer to it from each load
balancer.  This saves a lot of CPU time, memory and network bandwidth.
Re-using already existing Logical_DP_Group table to store these groups.

In 250 node cluster-density test with ovn-heater that creates 30K load
balancers applied to all 250 logical switches the following improvement
was observed:

  - Southbound database size decreased from 310 MB to 118 MB.
  - CPU time on Southbound ovsdb-server process decreased by a
factor of 7, from ~35 minutes per server to ~5.
  - Memory consumption on Southbound ovsdb-server process decreased
from 12 GB (peaked at ~14) RSS down to ~1 GB (peaked at ~2).
  - Memory consumption on ovn-controller processes decreased
by 400 MB on each node, from 1300 MB to 900 MB.  Similar decrease
observed for ovn-northd as well.

We're adding some extra work to ovn-northd process with this change
to find/create datapath groups.  CPU time in the test above increased
by ~10%, but overall round trip time for changes in OVN to be
propagated to OVN controllers is still noticeably lower due to
improvements in other components like Sb DB.

Implementation details:
  - Groups are not shared between logical flows and load balancers,
so there could be duplicated groups this way, but that should
not be a big problem.  Sharing groups will require some code
re-structuring with unclear benefits.
  - ovn-controller and ovn-sbctl are parsing both the 'datapaths' column
and the 'datapath_group' to keep them backward compatible.
  - Load_Balancer table is not conditionally monitored by ovn-controller
right now, so not changing that behavior.  If conditional monitoring
will be introduced later, same considerations as for logical flows
should be applied.

Signed-off-by: Ilya Maximets 
---
  controller/lflow.c  |  44 --
  controller/ovn-controller.c |  54 
  northd/northd.c | 168 
  ovn-sb.ovsschema|   8 +-
  ovn-sb.xml  |   5 ++
  tests/ovn-northd.at |  68 ---
  utilities/ovn-sbctl.c   |  40 +
  7 files changed, 278 insertions(+), 109 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index 6055097b5..eef44389f 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -2063,6 +2063,20 @@ add_lb_vip_hairpin_flows(struct ovn_controller_lb
*lb,
  ofpbuf_uninit(&ofpacts);
  }

+static void
+add_lb_ct_snat_hairpin_for_dp(const struct ovn_controller_lb *lb,
+  const struct sbrec_datapath_binding
*datapath,
+  struct match *dp_match,
+  struct ofpbuf *dp_acts,
+  struct ovn_desired_flow_table *flow_table)
+{
+match_set_metadata(dp_match, htonll(datapath->tunnel_key));
+ofctrl_add_or_append_flow(flow_table, OFTABLE_CT_SNAT_HAIRPIN, 200,
+  lb->slb->header_.uuid.parts[0],
+  dp_match, dp_acts, &lb->slb->header_.uuid,
+  NX_CTLR_NO_METER, NULL);
+}
+
  static void
  add_lb_ct_snat_hairpin_dp_flows(struct ovn_controller_lb *lb,
  uint32_t id,
@@ -2088,12 +2102,15 @@ add_lb_ct_snat_hairpin_dp_flows(struct
ovn_controller_lb *lb,
  struct match dp_match = MATCH_CATCHALL_INITIALIZER;

  for (size_t i = 0; i < lb->slb->n_datapaths; i++) {
-match_set_metadata(&dp_match,
-   htonll(lb->slb->datapaths[i]->tunnel_key));
-ofctrl_add_or_append_flow(flow_table, OFTABLE_CT_SNAT_HAIRPIN,
200,
-  lb->slb->header_.uuid.parts[0],
-

Re: [ovs-dev] [PATCH ovn] northd: Use datapath groups for southbound load balancers.

2022-08-09 Thread Ales Musil
On Wed, Aug 3, 2022 at 4:44 PM Ilya Maximets  wrote:

> Some CMSes, like ovn-kubernetes, tend to create load balancers attached
> to a big number of logical switches or routers.  For each of these
> load balancers northd creates a record in Load_Balancer table of the
> Southbound database with all the logical datapaths (switches, routers)
> listed in the 'datapaths' column.  All these logical datapaths are
> references to datapath bindings.  With large number of load balancers
> like these applied to the same set of load balancers, the size of
> the Southbound database can grow significantly and these references
> can take up to 90% of all the traffic between Sb DB and ovn-controllers.
>
> For example, while creating 3 load balancers (1 for tcp, udp and sctp)
> in a 250-node cluster in ovn-heater cluster-density test, the database
> update sent to every ovn-controller is about 40 KB in size, out of
> which 36 KB are just UUIDs of 250 logical datapaths repeated 3 times.
>
> Introducing a new column 'datapath_group' in a Load_Balancer table,
> so we can create a group once and just refer to it from each load
> balancer.  This saves a lot of CPU time, memory and network bandwidth.
> Re-using already existing Logical_DP_Group table to store these groups.
>
> In 250 node cluster-density test with ovn-heater that creates 30K load
> balancers applied to all 250 logical switches the following improvement
> was observed:
>
>  - Southbound database size decreased from 310 MB to 118 MB.
>  - CPU time on Southbound ovsdb-server process decreased by a
>factor of 7, from ~35 minutes per server to ~5.
>  - Memory consumption on Southbound ovsdb-server process decreased
>from 12 GB (peaked at ~14) RSS down to ~1 GB (peaked at ~2).
>  - Memory consumption on ovn-controller processes decreased
>by 400 MB on each node, from 1300 MB to 900 MB.  Similar decrease
>observed for ovn-northd as well.
>
> We're adding some extra work to ovn-northd process with this change
> to find/create datapath groups.  CPU time in the test above increased
> by ~10%, but overall round trip time for changes in OVN to be
> propagated to OVN controllers is still noticeably lower due to
> improvements in other components like Sb DB.
>
> Implementation details:
>  - Groups are not shared between logical flows and load balancers,
>so there could be duplicated groups this way, but that should
>not be a big problem.  Sharing groups will require some code
>re-structuring with unclear benefits.
>  - ovn-controller and ovn-sbctl are parsing both the 'datapaths' column
>and the 'datapath_group' to keep them backward compatible.
>  - Load_Balancer table is not conditionally monitored by ovn-controller
>right now, so not changing that behavior.  If conditional monitoring
>will be introduced later, same considerations as for logical flows
>should be applied.
>
> Signed-off-by: Ilya Maximets 
> ---
>  controller/lflow.c  |  44 --
>  controller/ovn-controller.c |  54 
>  northd/northd.c | 168 
>  ovn-sb.ovsschema|   8 +-
>  ovn-sb.xml  |   5 ++
>  tests/ovn-northd.at |  68 ---
>  utilities/ovn-sbctl.c   |  40 +
>  7 files changed, 278 insertions(+), 109 deletions(-)
>
> diff --git a/controller/lflow.c b/controller/lflow.c
> index 6055097b5..eef44389f 100644
> --- a/controller/lflow.c
> +++ b/controller/lflow.c
> @@ -2063,6 +2063,20 @@ add_lb_vip_hairpin_flows(struct ovn_controller_lb
> *lb,
>  ofpbuf_uninit(&ofpacts);
>  }
>
> +static void
> +add_lb_ct_snat_hairpin_for_dp(const struct ovn_controller_lb *lb,
> +  const struct sbrec_datapath_binding
> *datapath,
> +  struct match *dp_match,
> +  struct ofpbuf *dp_acts,
> +  struct ovn_desired_flow_table *flow_table)
> +{
> +match_set_metadata(dp_match, htonll(datapath->tunnel_key));
> +ofctrl_add_or_append_flow(flow_table, OFTABLE_CT_SNAT_HAIRPIN, 200,
> +  lb->slb->header_.uuid.parts[0],
> +  dp_match, dp_acts, &lb->slb->header_.uuid,
> +  NX_CTLR_NO_METER, NULL);
> +}
> +
>  static void
>  add_lb_ct_snat_hairpin_dp_flows(struct ovn_controller_lb *lb,
>  uint32_t id,
> @@ -2088,12 +2102,15 @@ add_lb_ct_snat_hairpin_dp_flows(struct
> ovn_controller_lb *lb,
>  struct match dp_match = MATCH_CATCHALL_INITIALIZER;
>
>  for (size_t i = 0; i < lb->slb->n_datapaths; i++) {
> -match_set_metadata(&dp_match,
> -   htonll(lb->slb->datapaths[i]->tunnel_key));
> -ofctrl_add_or_append_flow(flow_table, OFTABLE_CT_SNAT_HAIRPIN,
> 200,
> -  lb->slb->header_.uuid.parts[0],
> -  &dp_match, &dp_acts,
> &lb->slb->

Re: [ovs-dev] [PATCH ovn] northd: Use datapath groups for southbound load balancers.

2022-08-04 Thread Ilya Maximets
On 8/3/22 16:43, Ilya Maximets wrote:
> Some CMSes, like ovn-kubernetes, tend to create load balancers attached
> to a big number of logical switches or routers.  For each of these
> load balancers northd creates a record in Load_Balancer table of the
> Southbound database with all the logical datapaths (switches, routers)
> listed in the 'datapaths' column.  All these logical datapaths are
> references to datapath bindings.  With large number of load balancers
> like these applied to the same set of load balancers, the size of
> the Southbound database can grow significantly and these references
> can take up to 90% of all the traffic between Sb DB and ovn-controllers.
> 
> For example, while creating 3 load balancers (1 for tcp, udp and sctp)
> in a 250-node cluster in ovn-heater cluster-density test, the database
> update sent to every ovn-controller is about 40 KB in size, out of
> which 36 KB are just UUIDs of 250 logical datapaths repeated 3 times.
> 
> Introducing a new column 'datapath_group' in a Load_Balancer table,
> so we can create a group once and just refer to it from each load
> balancer.  This saves a lot of CPU time, memory and network bandwidth.
> Re-using already existing Logical_DP_Group table to store these groups.
> 
> In 250 node cluster-density test with ovn-heater that creates 30K load
> balancers applied to all 250 logical switches the following improvement
> was observed:
> 
>  - Southbound database size decreased from 310 MB to 118 MB.
>  - CPU time on Southbound ovsdb-server process decreased by a
>factor of 7, from ~35 minutes per server to ~5.
>  - Memory consumption on Southbound ovsdb-server process decreased
>from 12 GB (peaked at ~14) RSS down to ~1 GB (peaked at ~2).
>  - Memory consumption on ovn-controller processes decreased
>by 400 MB on each node, from 1300 MB to 900 MB.  Similar decrease
>observed for ovn-northd as well.
> 
> We're adding some extra work to ovn-northd process with this change
> to find/create datapath groups.  CPU time in the test above increased
> by ~10%, but overall round trip time for changes in OVN to be
> propagated to OVN controllers is still noticeably lower due to
> improvements in other components like Sb DB.

Actually, I was using the data from the prliminary run for this
commit message and also looked into a wrong place while checking
northd performance numbers :/ .  Re-run with the version of the
code in this patch shows 12% performance improvement on ovn-northd
looking at actual length of poll intervals.  So, northd-related
paragraph above shuold be thrown away and replaced with one more
bullet in observed improvements:

  - Poll intervals on ovn-northd reduced by 12%.



Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] northd: Use datapath groups for southbound load balancers.

2022-08-03 Thread Ilya Maximets
Some CMSes, like ovn-kubernetes, tend to create load balancers attached
to a big number of logical switches or routers.  For each of these
load balancers northd creates a record in Load_Balancer table of the
Southbound database with all the logical datapaths (switches, routers)
listed in the 'datapaths' column.  All these logical datapaths are
references to datapath bindings.  With large number of load balancers
like these applied to the same set of load balancers, the size of
the Southbound database can grow significantly and these references
can take up to 90% of all the traffic between Sb DB and ovn-controllers.

For example, while creating 3 load balancers (1 for tcp, udp and sctp)
in a 250-node cluster in ovn-heater cluster-density test, the database
update sent to every ovn-controller is about 40 KB in size, out of
which 36 KB are just UUIDs of 250 logical datapaths repeated 3 times.

Introducing a new column 'datapath_group' in a Load_Balancer table,
so we can create a group once and just refer to it from each load
balancer.  This saves a lot of CPU time, memory and network bandwidth.
Re-using already existing Logical_DP_Group table to store these groups.

In 250 node cluster-density test with ovn-heater that creates 30K load
balancers applied to all 250 logical switches the following improvement
was observed:

 - Southbound database size decreased from 310 MB to 118 MB.
 - CPU time on Southbound ovsdb-server process decreased by a
   factor of 7, from ~35 minutes per server to ~5.
 - Memory consumption on Southbound ovsdb-server process decreased
   from 12 GB (peaked at ~14) RSS down to ~1 GB (peaked at ~2).
 - Memory consumption on ovn-controller processes decreased
   by 400 MB on each node, from 1300 MB to 900 MB.  Similar decrease
   observed for ovn-northd as well.

We're adding some extra work to ovn-northd process with this change
to find/create datapath groups.  CPU time in the test above increased
by ~10%, but overall round trip time for changes in OVN to be
propagated to OVN controllers is still noticeably lower due to
improvements in other components like Sb DB.

Implementation details:
 - Groups are not shared between logical flows and load balancers,
   so there could be duplicated groups this way, but that should
   not be a big problem.  Sharing groups will require some code
   re-structuring with unclear benefits.
 - ovn-controller and ovn-sbctl are parsing both the 'datapaths' column
   and the 'datapath_group' to keep them backward compatible.
 - Load_Balancer table is not conditionally monitored by ovn-controller
   right now, so not changing that behavior.  If conditional monitoring
   will be introduced later, same considerations as for logical flows
   should be applied.

Signed-off-by: Ilya Maximets 
---
 controller/lflow.c  |  44 --
 controller/ovn-controller.c |  54 
 northd/northd.c | 168 
 ovn-sb.ovsschema|   8 +-
 ovn-sb.xml  |   5 ++
 tests/ovn-northd.at |  68 ---
 utilities/ovn-sbctl.c   |  40 +
 7 files changed, 278 insertions(+), 109 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index 6055097b5..eef44389f 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -2063,6 +2063,20 @@ add_lb_vip_hairpin_flows(struct ovn_controller_lb *lb,
 ofpbuf_uninit(&ofpacts);
 }
 
+static void
+add_lb_ct_snat_hairpin_for_dp(const struct ovn_controller_lb *lb,
+  const struct sbrec_datapath_binding *datapath,
+  struct match *dp_match,
+  struct ofpbuf *dp_acts,
+  struct ovn_desired_flow_table *flow_table)
+{
+match_set_metadata(dp_match, htonll(datapath->tunnel_key));
+ofctrl_add_or_append_flow(flow_table, OFTABLE_CT_SNAT_HAIRPIN, 200,
+  lb->slb->header_.uuid.parts[0],
+  dp_match, dp_acts, &lb->slb->header_.uuid,
+  NX_CTLR_NO_METER, NULL);
+}
+
 static void
 add_lb_ct_snat_hairpin_dp_flows(struct ovn_controller_lb *lb,
 uint32_t id,
@@ -2088,12 +2102,15 @@ add_lb_ct_snat_hairpin_dp_flows(struct 
ovn_controller_lb *lb,
 struct match dp_match = MATCH_CATCHALL_INITIALIZER;
 
 for (size_t i = 0; i < lb->slb->n_datapaths; i++) {
-match_set_metadata(&dp_match,
-   htonll(lb->slb->datapaths[i]->tunnel_key));
-ofctrl_add_or_append_flow(flow_table, OFTABLE_CT_SNAT_HAIRPIN, 200,
-  lb->slb->header_.uuid.parts[0],
-  &dp_match, &dp_acts, &lb->slb->header_.uuid,
-  NX_CTLR_NO_METER, NULL);
+add_lb_ct_snat_hairpin_for_dp(lb, lb->slb->datapaths[i],
+  &dp_match, &dp_acts, flow_table);
+}
+if (lb->slb->datap