Re: [ovs-dev] [PATCH ovn] lflow: Use learn() action to generate LB hairpin reply flows.
On Mon, Feb 8, 2021 at 4:02 PM Dumitru Ceara wrote:
> On 2/8/21 11:09 AM, Numan Siddique wrote:
> > On Sat, Feb 6, 2021 at 4:00 AM Dumitru Ceara wrote:
> >>
> >> The main trait of load balancer hairpin traffic is that it never leaves
> >> the local hypervisor.  Essentially this means that only hairpin
> >> openflows installed for logical switches that have at least one logical
> >> switch port bound locally can ever be hit.
> >>
> >> Until now, if a load balancer was applied on multiple logical switches
> >> that are connected through a distributed router, ovn-controller would
> >> install flows to detect hairpin replies for all logical switches.  In
> >> practice this leads to a very high number of openflows, most of which
> >> will never be hit.
> >>
> >> Instead, we now use an additional action, learn(), on flows that match
> >> on packets that create the hairpin session.  The learn() action then
> >> generates the flows needed to handle hairpin replies, but only for the
> >> local datapaths that actually generate hairpin traffic.
> >>
> >> For example, simulating how ovn-k8s uses load balancers for services,
> >> in a "switch per node" scenario, the script below would generate
> >> 10K (n_nodes * n_vips * n_backends) openflows on every node in table=69
> >> (hairpin reply).  With this patch the maximum number of openflows that
> >> can be created for hairpin replies is 200 (n_vips * n_backends).
> >>
> >> In general, for deployments that leverage switch-per-node topologies,
> >> the number of openflows is reduced by a factor of N, where N is the
> >> number of nodes.
> >>
> >> $ cat lbs.sh
> >> NODES=50
> >> VIPS=20
> >> BACKENDS=10
> >>
> >> ovn-nbctl lr-add rtr
> >> for ((i = 1; i <= $NODES; i++)); do
> >>     ovn-nbctl \
> >>         -- ls-add ls$i \
> >>         -- lsp-add ls$i vm$i \
> >>         -- lsp-add ls$i ls$i-rtr \
> >>         -- lsp-set-type ls$i-rtr router \
> >>         -- lsp-set-options ls$i-rtr router-port=rtr-ls$i \
> >>         -- lrp-add rtr rtr-ls$i 00:00:00:00:01:00 42.42.42.$i/24
> >> done
> >>
> >> for ((i = 1; i <= $VIPS; i++)); do
> >>     lb=lb$i
> >>     vip=10.10.10.$i:1
> >>     bip=20.20.20.1:2
> >>     for ((j = 2; j <= $BACKENDS; j++)); do
> >>         bip="$bip,20.20.20.$j:2"
> >>     done
> >>     ovn-nbctl lb-add $lb $vip $bip
> >> done
> >>
> >> for ((i = 1; i <= $NODES; i++)); do
> >>     for ((j = 1; j <= $VIPS; j++)); do
> >>         ovn-nbctl ls-lb-add ls$i lb$j
> >>     done
> >> done
> >>
> >> ovs-vsctl add-port br-int vm1 \
> >>     -- set interface vm1 type=internal \
> >>     -- set interface vm1 external-ids:iface-id=vm1
> >>
> >> Suggested-by: Ilya Maximets
> >> Signed-off-by: Dumitru Ceara
> >
> > Hi Dumitru,

Hi Numan,

> > Thanks for this patch.  The patch LGTM.  I tested it out with a huge
> > NB DB with lots of load balancers and backends.  I see a significant
> > reduction in OF flows.

Thanks for the review!

> > Please see the few comments below.
> >
> > Thanks
> > Numan
> >
> >> ---
> >>  controller/lflow.c      | 204 ---
> >>  tests/ofproto-macros.at |   5 +-
> >>  tests/ovn.at            | 516 +---
> >>  3 files changed, 455 insertions(+), 270 deletions(-)
> >>
> >> diff --git a/controller/lflow.c b/controller/lflow.c
> >> index 946c1e0..2b7d356 100644
> >> --- a/controller/lflow.c
> >> +++ b/controller/lflow.c
> >> @@ -1171,6 +1171,178 @@ add_neighbor_flows(struct ovsdb_idl_index *sbrec_port_binding_by_name,
> >>  }
> >>  }
> >>
> >> +/* Builds the "learn()" action to be triggered by packets initiating a
> >> + * hairpin session.
> >> + *
> >> + * This will generate flows in table OFTABLE_CHK_LB_HAIRPIN_REPLY of
> >> + * the form:
> >> + * - match:
> >> + *     metadata=<orig. metadata>,ip/ipv6,ip.src=<backend>,ip.dst=<orig. source>
> >> + *     nw_proto='lb_proto',tp_src_port=<backend port>
> >> + * - action:
> >> + *     set MLF_LOOKUP_LB_HAIRPIN_BIT=1
> >> + */
> >> +static void
> >> +add_lb_vip_hairpin_reply_action(struct in6_addr *vip6, ovs_be32 vip,
> >> +                                uint8_t lb_proto, bool has_l4_port,
> >> +                                uint64_t cookie, struct ofpbuf *ofpacts)
> >> +{
> >> +    struct match match = MATCH_CATCHALL_INITIALIZER;
> >> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
> >> +    struct ofpact_learn_spec *ol_spec;
> >> +    unsigned int imm_bytes;
> >> +    uint8_t *src_imm;
> >> +
> >> +    /* Once learned, hairpin reply flows are permanent until the VIP/backend
> >> +     * is removed.
> >> +     */
> >> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
> >> +    ol->idle_timeout = OFP_FLOW_PERMANENT;
> >> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
> >> +    ol->priority = OFP_DEFAULT_PRIORITY;
> >> +    ol->table_id = OFTABLE_CHK_LB_HAIRPIN_REPLY;
> >> +    ol->cookie = htonll(cookie);
> >> +
> >> +    /* Match on metadata of the packet that created the hairpin session. */
> >> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >> +
> >> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
> >> +    ol_spec->dst.ofs = 0;
> >> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> >> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> >> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
> >> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
> >> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
> >> +
> >> +    /* Match on the same ETH type as the packet that created the hairpin
> >> +     * session.
> >> +     */
> >> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >> +    ol_spec->dst.field = mf_from_id(MFF_ETH_TYPE);
> >> +    ol_spec->dst.ofs =