Re: [ovs-dev] [PATCH ovn] lflow: Use learn() action to generate LB hairpin reply flows.

2021-02-08 Thread Numan Siddique
On Mon, Feb 8, 2021 at 4:02 PM Dumitru Ceara  wrote:
>
> On 2/8/21 11:09 AM, Numan Siddique wrote:
> > On Sat, Feb 6, 2021 at 4:00 AM Dumitru Ceara  wrote:
> >>
> >> The main trait of load balancer hairpin traffic is that it never leaves
> >> the local hypervisor.  Essentially this means that only hairpin
> >> openflows installed for logical switches that have at least one logical
> >> switch port bound locally can ever be hit.
> >>
> >> Until now, if a load balancer was applied on multiple logical switches
> >> that are connected through a distributed router, ovn-controller would
> >> install flows to detect hairpin replies for all logical switches. In
> >> practice this leads to a very high number of openflows out of which
> >> most will never be used.
> >>
> >> Instead we now use an additional action, learn(), on flows that match on
> >> packets that create the hairpin session.  The learn() action will then
> >> generate the necessary flows to handle hairpin replies, but only for
> >> the local datapaths which actually generate hairpin traffic.
> >>
> >> For example, simulating how ovn-k8s uses load balancer for services,
> >> in a "switch per node" scenario, the script below would generate
> >> 10K (n_nodes * n_vips * n_backends) openflows on every node in table=69
> >> (hairpin reply).  With this patch the maximum number of openflows that
> >> can be created for hairpin replies is 200 (n_vips * n_backends).
> >>
> >> In general, for deployments that leverage switch-per-node topologies,
> >> the number of openflows is reduced by a factor of N, where N is the
> >> number of nodes.
> >>
> >>$ cat lbs.sh
> >>NODES=50
> >>VIPS=20
> >>BACKENDS=10
> >>ovn-nbctl lr-add rtr
> >>for ((i = 1; i <= $NODES; i++)); do
> >>ovn-nbctl \
> >>-- ls-add ls$i \
> >>-- lsp-add ls$i vm$i \
> >>-- lsp-add ls$i ls$i-rtr \
> >>-- lsp-set-type ls$i-rtr router \
> >>-- lsp-set-options ls$i-rtr router-port=rtr-ls$i \
> >>-- lrp-add rtr rtr-ls$i 00:00:00:00:01:00 42.42.42.$i/24
> >>done
> >>
> >>for ((i = 1; i <= $VIPS; i++)); do
> >>lb=lb$i
> >>vip=10.10.10.$i:1
> >>bip=20.20.20.1:2
> >>for ((j = 2; j <= $BACKENDS; j++)); do
> >>bip="$bip,20.20.20.$j:2"
> >>done
> >>ovn-nbctl lb-add $lb $vip $bip
> >>done
> >>
> >>for ((i = 1; i <= $NODES; i++)); do
> >>for ((j = 1; j <= $VIPS; j++)); do
> >>ovn-nbctl ls-lb-add ls$i lb$j
> >>done
> >>done
> >>
> >>ovs-vsctl add-port br-int vm1 \
> >>-- set interface vm1 type=internal \
> >>-- set interface vm1 external-ids:iface-id=vm1
> >>
> >> Suggested-by: Ilya Maximets 
> >> Signed-off-by: Dumitru Ceara 
> >
> > Hi Dumitru,
>
> Hi Numan,
>
> >
> > Thanks for this patch. The patch LGTM. I tested it with a huge NB DB
> > with lots of load balancers and backends. I see a significant reduction
> > in OF flows.
>
> Thanks for the review!
>
> >
> > Please see a few comments below.
> >
> > Thanks
> > Numan
> >
> >> ---
> >>   controller/lflow.c  | 204 ---
> >>   tests/ofproto-macros.at |   5 +-
> >>   tests/ovn.at| 516 +---
> >>   3 files changed, 455 insertions(+), 270 deletions(-)
> >>
> >> diff --git a/controller/lflow.c b/controller/lflow.c
> >> index 946c1e0..2b7d356 100644
> >> --- a/controller/lflow.c
> >> +++ b/controller/lflow.c
> >> @@ -1171,6 +1171,178 @@ add_neighbor_flows(struct ovsdb_idl_index *sbrec_port_binding_by_name,
> >>   }
> >>   }
> >>
> >> +/* Builds the "learn()" action to be triggered by packets initiating a
> >> + * hairpin session.
> >> + *
> >> + * This will generate flows in table OFTABLE_CHK_LB_HAIRPIN_REPLY of the form:
> >> + * - match:
> >> + * metadata=,ip/ipv6,ip.src=,ip.dst=
> >> + * nw_proto='lb_proto',tp_src_port=
> >> + * - action:
> >> + * set MLF_LOOKUP_LB_HAIRPIN_BIT=1
> >> + */
> >> +static void
> >> +add_lb_vip_hairpin_reply_action(struct in6_addr *vip6, ovs_be32 vip,
> >> +                                uint8_t lb_proto, bool has_l4_port,
> >> +                                uint64_t cookie, struct ofpbuf *ofpacts)
> >> +{
> >> +    struct match match = MATCH_CATCHALL_INITIALIZER;
> >> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
> >> +    struct ofpact_learn_spec *ol_spec;
> >> +    unsigned int imm_bytes;
> >> +    uint8_t *src_imm;
> >> +
> >> +    /* Once learned, hairpin reply flows are permanent until the VIP/backend
> >> +     * is removed.
> >> +     */
> >> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
> >> +    ol->idle_timeout = OFP_FLOW_PERMANENT;
> >> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
> >> +    ol->priority = OFP_DEFAULT_PRIORITY;
> >> +    ol->table_id = OFTABLE_CHK_LB_HAIRPIN_REPLY;
> >> +    ol->cookie = htonll(cookie);
> >> +
> >> +    /* Match on metadata of the packet that created the hairpin session. */
> >> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >> +
> >> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
> >> +    ol_spec->dst.ofs = 0;
> >> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> >> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> >> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
> >> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
> >> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
> >> +
> >> +    /* Match on the same ETH type as the packet that created the hairpin
> >> +     * session.
> >> +     */
> >> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >> +    ol_spec->dst.field = mf_from_id(MFF_ETH_TYPE);