Re: [ovs-dev] [PATCH ovn v2 3/7] controller: Add load balancer hairpin OF flows.
On Fri, Oct 30, 2020 at 2:48 AM Dumitru Ceara wrote:
>
> On 10/27/20 6:18 PM, num...@ovn.org wrote:
> > From: Numan Siddique
> >
>
> Hi Numan,
>
> > Presently, to handle load balancer hairpin traffic (traffic destined to
> > the load balancer VIP that is dnatted to the backend which originated
> > it), ovn-northd adds a lot of logical flows to check for this scenario.
> > This patch attempts to reduce these logical flows. Each ovn-controller
> > will read the load balancers from the newly added southbound
> > Load_Balancer table and add the load balancer hairpin OF flows in
> > tables 68, 69 and 70. If the below load balancer is configured:
> >
> > 10.0.0.10:80 = 10.0.0.4:8080, 10.0.0.5:8090, then the below flows are
> > added:
> >
> > table=68, ip.src=10.0.0.4,ip.dst=10.0.0.4,tcp.dst=8080
> >           actions=load:1->NXM_NX_REG9[7]
> > table=68, ip.src=10.0.0.5,ip.dst=10.0.0.5,tcp.dst=8090
> >           actions=load:1->NXM_NX_REG9[7]
> > table=69, ip.src=10.0.0.4,ip.dst=10.0.0.10,tcp.src=8080
> >           actions=load:1->NXM_NX_REG9[7]
> > table=69, ip.src=10.0.0.5,ip.dst=10.0.0.10,tcp.src=8090
> >           actions=load:1->NXM_NX_REG9[7]
>
> Shouldn't this be REG10[7]?

No. The same register is used for hairpinning the traffic and for the
reverse hairpin since we have a different table for each.

> > table=70, ct.trk && ct.dnat && ct.nw_dst == 10.0.0.10,
> >           actions=ct(commit, zone=reg12, nat(src=10.0.0.5))
> >
> > An upcoming patch will add OVN actions which do the lookup in these
> > tables to handle the hairpin traffic.
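[Editor's note: for intuition, the table 68/69 classification described in the commit message can be modeled in plain C. This is a simplified, hypothetical sketch (IPv4 only, one backend, host-byte-order integers), not the patch's actual OpenFlow code.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal packet model for illustrating the hairpin matches. */
struct pkt {
    uint32_t nw_src;   /* IPv4 source, host byte order for simplicity */
    uint32_t nw_dst;   /* IPv4 destination */
    uint16_t tp_src;   /* L4 source port */
    uint16_t tp_dst;   /* L4 destination port */
};

/* Table 68: hairpin traffic, i.e. a backend talking to itself after
 * the VIP was DNATted to it: ip.src == ip.dst == backend,
 * tcp.dst == backend port. */
static bool
is_hairpin(const struct pkt *p, uint32_t backend_ip, uint16_t backend_port)
{
    return p->nw_src == backend_ip && p->nw_dst == backend_ip
           && p->tp_dst == backend_port;
}

/* Table 69: the hairpin reply, seen before un-DNAT:
 * ip.src == backend, ip.dst == VIP, tcp.src == backend port. */
static bool
is_hairpin_reply(const struct pkt *p, uint32_t backend_ip,
                 uint16_t backend_port, uint32_t vip)
{
    return p->nw_src == backend_ip && p->nw_dst == vip
           && p->tp_src == backend_port;
}
```

In the real flows, a hit in either table loads 1 into the same logical flag bit (REG9 bit 7 above); the two directions can share the bit precisely because they live in different tables.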
> >
> > Signed-off-by: Numan Siddique
> > ---
> >  controller/lflow.c           | 253 +++
> >  controller/lflow.h           |   6 +-
> >  controller/ovn-controller.c  |  27 +-
> >  include/ovn/logical-fields.h |   3 +
> >  tests/ovn.at                 | 469 +++
> >  5 files changed, 756 insertions(+), 2 deletions(-)
> >
> > diff --git a/controller/lflow.c b/controller/lflow.c
> > index f631679c3f..657482626d 100644
> > --- a/controller/lflow.c
> > +++ b/controller/lflow.c
> > @@ -26,6 +26,7 @@
> >  #include "ovn-controller.h"
> >  #include "ovn/actions.h"
> >  #include "ovn/expr.h"
> > +#include "lib/lb.h"
> >  #include "lib/ovn-l7.h"
> >  #include "lib/ovn-sb-idl.h"
> >  #include "lib/extend-table.h"
> > @@ -1138,6 +1139,213 @@ add_neighbor_flows(struct ovsdb_idl_index *sbrec_port_binding_by_name,
> >      }
> >  }
> >
> > +static void
> > +add_lb_vip_hairpin_flows(struct ovn_lb *lb, struct lb_vip *lb_vip,
> > +                         struct lb_vip_backend *lb_backend,
> > +                         uint8_t lb_proto,
> > +                         struct ovn_desired_flow_table *flow_table)
> > +{
> > +    uint64_t stub[1024 / 8];
> > +    struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(stub);
> > +
> > +    uint8_t value = 1;
> > +    put_load(&value, sizeof value, MFF_LOG_FLAGS,
> > +             MLF_LOOKUP_LB_HAIRPIN_BIT, 1, &ofpacts);
> > +
> > +    ovs_be32 vip4;
> > +    struct in6_addr vip6;
> > +
> > +    if (lb_vip->addr_family == AF_INET) {
> > +        ovs_assert(ip_parse(lb_vip->vip, &vip4));
> > +    } else {
> > +        ovs_assert(ipv6_parse(lb_vip->vip, &vip6));
> > +    }
> > +
> > +    struct match hairpin_match = MATCH_CATCHALL_INITIALIZER;
> > +    struct match hairpin_reply_match = MATCH_CATCHALL_INITIALIZER;
> > +
> > +    if (lb_vip->addr_family == AF_INET) {
> > +        ovs_be32 ip4;
> > +        ovs_assert(ip_parse(lb_backend->ip, &ip4));
>
> Because we share the same ovn_lb structure between NB and SB we're now
> forced to parse the backend IP twice:
>
> ovn_sb_lb_create()
> - ovn_lb_create()
> -- ip_address_and_port_from_key()
> --- this parses the string into a sockaddr_storage

The output of the function ip_address_and_port_from_key() is the ip
address as a char * and the port as a uint16_t. We need to call
ip_parse/ipv6_parse anyway, whether we have the same 'struct lb_vip'
for both NB and SB or different structures. Otherwise we should not use
ip_address_and_port_from_key() at all. Also I don't see any issues with
using ip_parse here.

> --- then builds an IP string and extracts the port
>
> Here we reparse the IP string into an ovs_be32.
>
> If the structs were independent for NB/SB I think we would potentially
> have simpler code and we would avoid reparsing IPs.
>
> Also, what if the SB DB gets corrupted, do we really want
> ovn-controller to abort?

If the lb vip is invalid, then the lib/lb.c util function
ovn_lb_create() will not even create the 'struct lb_vip' instance. The
same holds for the backend ip. So I think it's OK to assert here.

> > +
> > +        match_set_dl_type(&hairpin_match, htons(ETH_TYPE_IP));
> > +        match_set_nw_src(&hairpin_match, ip4);
> > +        match_set_nw_dst(&hairpin_match, ip4);
> > +
> > +        match_set_dl_type(&hairpin_reply_match,
> > +                          htons(ETH_TYPE_IP));
> > +        match_set_nw_src(&hairpin_reply_match, ip4);
> > +        match_set_nw_dst(&hairpin_reply_match, vip4);
> > +    } else {
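[Editor's note: the double-parse round trip under discussion (string key -> address string + port -> binary address) can be sketched with a simplified stand-in for ip_address_and_port_from_key() built on libc only. The helper name and shape here are illustrative, not the OVS implementation.]

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for ip_address_and_port_from_key(): split an
 * IPv4 "addr:port" key into an address string and a port.  Like the
 * OVS helper, it hands the address back as a *string*, which is why
 * the flow-building code later has to parse that string again to get
 * a binary address. */
static int
split_key(const char *key, char *ip, size_t ip_len, uint16_t *port)
{
    const char *colon = strchr(key, ':');
    if (!colon || (size_t) (colon - key) >= ip_len) {
        return -1;                 /* malformed key or buffer too small */
    }
    memcpy(ip, key, colon - key);
    ip[colon - key] = '\0';
    *port = (uint16_t) atoi(colon + 1);
    return 0;
}
```

The caller then performs the second parse, e.g. `inet_pton(AF_INET, ip, &bin)` (where the patch uses `ip_parse()`), turning the very string that was just produced back into a binary address, which is the redundancy the review points out.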
Re: [ovs-dev] [PATCH ovn v2 3/7] controller: Add load balancer hairpin OF flows.
On 10/27/20 6:18 PM, num...@ovn.org wrote:
> From: Numan Siddique
>

Hi Numan,

> Presently, to handle load balancer hairpin traffic (traffic destined to
> the load balancer VIP that is dnatted to the backend which originated
> it), ovn-northd adds a lot of logical flows to check for this scenario.
> This patch attempts to reduce these logical flows. Each ovn-controller
> will read the load balancers from the newly added southbound
> Load_Balancer table and add the load balancer hairpin OF flows in
> tables 68, 69 and 70. If the below load balancer is configured:
>
> 10.0.0.10:80 = 10.0.0.4:8080, 10.0.0.5:8090, then the below flows are
> added:
>
> table=68, ip.src=10.0.0.4,ip.dst=10.0.0.4,tcp.dst=8080
>           actions=load:1->NXM_NX_REG9[7]
> table=68, ip.src=10.0.0.5,ip.dst=10.0.0.5,tcp.dst=8090
>           actions=load:1->NXM_NX_REG9[7]
> table=69, ip.src=10.0.0.4,ip.dst=10.0.0.10,tcp.src=8080
>           actions=load:1->NXM_NX_REG9[7]
> table=69, ip.src=10.0.0.5,ip.dst=10.0.0.10,tcp.src=8090
>           actions=load:1->NXM_NX_REG9[7]

Shouldn't this be REG10[7]?

> table=70, ct.trk && ct.dnat && ct.nw_dst == 10.0.0.10,
>           actions=ct(commit, zone=reg12, nat(src=10.0.0.5))
>
> An upcoming patch will add OVN actions which do the lookup in these
> tables to handle the hairpin traffic.
>
> Signed-off-by: Numan Siddique
> ---
>  controller/lflow.c           | 253 +++
>  controller/lflow.h           |   6 +-
>  controller/ovn-controller.c  |  27 +-
>  include/ovn/logical-fields.h |   3 +
>  tests/ovn.at                 | 469 +++
>  5 files changed, 756 insertions(+), 2 deletions(-)
>
> diff --git a/controller/lflow.c b/controller/lflow.c
> index f631679c3f..657482626d 100644
> --- a/controller/lflow.c
> +++ b/controller/lflow.c
> @@ -26,6 +26,7 @@
>  #include "ovn-controller.h"
>  #include "ovn/actions.h"
>  #include "ovn/expr.h"
> +#include "lib/lb.h"
>  #include "lib/ovn-l7.h"
>  #include "lib/ovn-sb-idl.h"
>  #include "lib/extend-table.h"
> @@ -1138,6 +1139,213 @@ add_neighbor_flows(struct ovsdb_idl_index *sbrec_port_binding_by_name,
>      }
>  }
>
> +static void
> +add_lb_vip_hairpin_flows(struct ovn_lb *lb, struct lb_vip *lb_vip,
> +                         struct lb_vip_backend *lb_backend,
> +                         uint8_t lb_proto,
> +                         struct ovn_desired_flow_table *flow_table)
> +{
> +    uint64_t stub[1024 / 8];
> +    struct ofpbuf ofpacts = OFPBUF_STUB_INITIALIZER(stub);
> +
> +    uint8_t value = 1;
> +    put_load(&value, sizeof value, MFF_LOG_FLAGS,
> +             MLF_LOOKUP_LB_HAIRPIN_BIT, 1, &ofpacts);
> +
> +    ovs_be32 vip4;
> +    struct in6_addr vip6;
> +
> +    if (lb_vip->addr_family == AF_INET) {
> +        ovs_assert(ip_parse(lb_vip->vip, &vip4));
> +    } else {
> +        ovs_assert(ipv6_parse(lb_vip->vip, &vip6));
> +    }
> +
> +    struct match hairpin_match = MATCH_CATCHALL_INITIALIZER;
> +    struct match hairpin_reply_match = MATCH_CATCHALL_INITIALIZER;
> +
> +    if (lb_vip->addr_family == AF_INET) {
> +        ovs_be32 ip4;
> +        ovs_assert(ip_parse(lb_backend->ip, &ip4));

Because we share the same ovn_lb structure between NB and SB we're now
forced to parse the backend IP twice:

ovn_sb_lb_create()
- ovn_lb_create()
-- ip_address_and_port_from_key()
--- this parses the string into a sockaddr_storage
--- then builds an IP string and extracts the port

Here we reparse the IP string into an ovs_be32.

If the structs were independent for NB/SB I think we would potentially
have simpler code and we would avoid reparsing IPs.

Also, what if the SB DB gets corrupted, do we really want
ovn-controller to abort?

> +
> +        match_set_dl_type(&hairpin_match, htons(ETH_TYPE_IP));
> +        match_set_nw_src(&hairpin_match, ip4);
> +        match_set_nw_dst(&hairpin_match, ip4);
> +
> +        match_set_dl_type(&hairpin_reply_match,
> +                          htons(ETH_TYPE_IP));
> +        match_set_nw_src(&hairpin_reply_match, ip4);
> +        match_set_nw_dst(&hairpin_reply_match, vip4);
> +    } else {
> +        struct in6_addr ip6;
> +        ovs_assert(ipv6_parse(lb_backend->ip, &ip6));

Same here.

> +
> +        match_set_dl_type(&hairpin_match, htons(ETH_TYPE_IPV6));
> +        match_set_ipv6_src(&hairpin_match, &ip6);
> +        match_set_ipv6_dst(&hairpin_match, &ip6);
> +
> +        match_set_dl_type(&hairpin_reply_match,
> +                            htons(ETH_TYPE_IPV6));

Indentation is a bit odd.

> +        match_set_ipv6_src(&hairpin_reply_match, &ip6);
> +        match_set_ipv6_dst(&hairpin_reply_match, &vip6);
> +    }
> +
> +    if (lb_backend->port) {
> +        match_set_nw_proto(&hairpin_match, lb_proto);
> +        match_set_tp_dst(&hairpin_match, htons(lb_backend->port));
> +
> +        match_set_nw_proto(&hairpin_reply_match, lb_proto);
> +        match_set_tp_src(&hairpin_reply_match,
> +                            htons(lb_backend->port));

Same here.

> +    }
> +
> +    for (size_t i = 0; i < lb->slb->n_datapaths; i++) {
> +        match_set_metadata(&hairpin_match,
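[Editor's note: the review's "do we really want ovn-controller to abort?" question contrasts ovs_assert() with a tolerant parse. A hypothetical alternative, sketched here with libc's inet_pton() standing in for ip_parse()/ipv6_parse(), would skip a malformed backend and warn instead of aborting; the function name and behavior are illustrative, not the patch's code, which asserts because lib/lb.c has already validated the strings.]

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Tolerant parse: on a corrupted SB address string, log and tell the
 * caller to skip this backend rather than aborting the daemon.
 * 'af' is AF_INET or AF_INET6; 'addr' must point at a struct in_addr
 * or struct in6_addr accordingly. */
static bool
parse_backend_ip(const char *ip_str, int af, void *addr)
{
    if (inet_pton(af, ip_str, addr) != 1) {
        fprintf(stderr, "lflow: ignoring malformed backend ip '%s'\n",
                ip_str);
        return false;
    }
    return true;
}
```

The trade-off: asserting turns a corrupted database row into a hard crash that is immediately visible, while skip-and-warn keeps the controller running at the cost of silently dropping flows; the thread settles on asserting because invalid VIPs and backends never reach this code path.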