On Fri, Sep 21, 2018 at 11:06 PM Guru Shetty <g...@ovn.org> wrote: > > > > > > I have tried to make sense of this patch series a few times. I think > > adding increasing complications like this will make gateway code > > unmaintainable. The whole gateway redirect chassis already makes it > > un-understandable and now this will mean that no one will be able to > > understand it as we go forward. [/rant off] > > > > Sorry about the "rant" if it came out as rude - was not my intention. I > will try to offer something more constructive. When you initially looked at > the gateway redirect code, how long did it take for you to understand it? > My main concern is not with your code, but the underlying code. I initially > reviewed the underlying redirect code. And it took me a good 2 days to > understand it. Back then, we wanted something working and it looked like > something that can eventually be simplified. But, we have been adding more > things to it. And right now, if this patch series is added, it will likely > get very very hard to understand it as we go forward. Do you reckon that a > year from now, you will be able to debug a corner case in the code? > > If there are others who have understood the "redirect" gateway code easily, > please do chime in. > > My suggestion is to re-look at the redirect gateway design and see whether > the code can be simplified. Or whether we can write detailed documentation > about it to make it easy to understand. Only once we are satisfied with it, > we can attempt to add more stuff to it. > > I always forget the gateway code and the design. (May be because I haven't worked on this part of the code much) and I have to refresh my memory everytime. I agree that we need to have a detailed documentation. May be "a journey of a packet " kind of documentation in ovn-architecture would also help.
The proposed solution in this patch series, If I understand it correctly runs the router pipeline in the source chassis and then resumes it on the gateway chassis. Recently I was looking into SR-IOV kind of scenarios, where the VM traffic by passes the host kernel, and I think the proposed solution will not work. In order to support N/S traffic for SR-IOV instances we have to find probably another way and that would also complicate the code further. However I think it is possible to have one solution to support gateway traffic for VLAN tenant networks which covers SR-IOV and the normal case. I think the router pipeline should be executed only on the gateway chassis. I will have a discussion with Anil on this, this monday and see if it's possible. But unfortunately, supporting these use cases will add some amount of complexity :(. Thanks Numan > > > > >> --- > >> > >> v6->v7: > >> * Added this patch > >> > >> > >> ovn/controller/physical.c | 60 > >> ++++++++++++++++++++++++++++++++++++++++++--- > >> ovn/northd/ovn-northd.8.xml | 10 ++++++++ > >> ovn/northd/ovn-northd.c | 29 ++++++++++++++++++++++ > >> ovn/ovn-architecture.7.xml | 4 ++- > >> 4 files changed, 99 insertions(+), 4 deletions(-) > >> > >> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c > >> index f269a1d..1f41f59 100644 > >> --- a/ovn/controller/physical.c > >> +++ b/ovn/controller/physical.c > >> @@ -190,7 +190,9 @@ get_zone_ids(const struct sbrec_port_binding > *binding, > >> static void > >> put_local_common_flows(uint32_t dp_key, uint32_t port_key, > >> bool nested_container, const struct zone_ids > >> *zone_ids, > >> - struct ofpbuf *ofpacts_p, struct hmap > *flow_table) > >> + struct ofpbuf *ofpacts_p, struct hmap > *flow_table, > >> + struct local_datapath *ld, > >> + const struct hmap *local_datapaths) > >> { > >> struct match match; > >> > >> @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t dp_key, uint32_t > >> port_key, > >> } > >> } > >> > >> + struct ofpbuf *clone = NULL; > >> + clone = ofpbuf_clone(ofpacts_p); > >> + > >> /* Resubmit to table 34. */ > >> put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p); > >> ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0, > >> &match, ofpacts_p); > >> > >> + /* For a reply packet from gateway with VLAN switch port as > >> destination > >> + * (excluding localnet_port and external VLAN networks), gateway > >> router > >> + * will use gateway MAC address as source MAC instead of router > >> internal > >> + * port MAC, so that external switches can learn gateway MAC > address. > >> + * Here (before packet is given to the port) we replace router > >> gateway > >> + * MAC address with router internal port MAC. */ > >> + if (ld->localnet_port && (port_key != > >> ld->localnet_port->tunnel_key)) { > >> + for (int i = 0; i < ld->n_peer_dps; i++) { > >> + struct local_datapath *peer_ldp = get_local_datapath( > >> + local_datapaths, ld->peer_dps[i]->peer_dp->tunnel_key); > >> + const struct sbrec_port_binding *crp; > >> + crp = peer_ldp->chassisredirect_port; > >> + if (!crp) { > >> + continue; > >> + } > >> + > >> + if (strcmp(smap_get(&crp->options, "distributed-port"), > >> + ld->peer_dps[i]->peer->logical_port) && > >> + (port_key != ld->peer_dps[i]->patch->tunnel_key)) { > >> + for (int j = 0; j < crp->n_mac; j++) { > >> + struct lport_addresses laddrs; > >> + if (!extract_lsp_addresses(crp->mac[j], &laddrs)) { > >> + continue; > >> + } > >> + match_set_dl_src(&match, laddrs.ea); > >> + destroy_lport_addresses(&laddrs); > >> + break; > >> + } > >> + for (int j = 0; j < ld->peer_dps[i]->peer->n_mac; j++) > { > >> + struct lport_addresses laddrs; > >> + uint64_t mac64; > >> + if (!extract_lsp_addresses( > >> + ld->peer_dps[i]->peer->mac[j], &laddrs)) { > >> + continue; > >> + } > >> + mac64 = eth_addr_to_uint64(laddrs.ea); > >> + put_load(mac64, > >> + MFF_ETH_SRC, 0, 48, clone); > >> + destroy_lport_addresses(&laddrs); > >> + break; > >> + } > >> + put_resubmit(OFTABLE_CHECK_LOOPBACK, clone); > >> + ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 150, > 0, > >> + &match, clone); > >> + } > >> + } > >> + } > >> + ofpbuf_delete(clone); > >> + > >> /* Table 34, Priority 100. > >> * ======================= > >> * > >> @@ -330,7 +384,7 @@ consider_port_binding(struct ovsdb_idl_index > >> *sbrec_chassis_by_name, > >> > >> struct zone_ids binding_zones = get_zone_ids(binding, > ct_zones); > >> put_local_common_flows(dp_key, port_key, false, &binding_zones, > >> - ofpacts_p, flow_table); > >> + ofpacts_p, flow_table, ld, > >> local_datapaths); > >> > >> match_init_catchall(&match); > >> ofpbuf_clear(ofpacts_p); > >> @@ -531,7 +585,7 @@ consider_port_binding(struct ovsdb_idl_index > >> *sbrec_chassis_by_name, > >> > >> struct zone_ids zone_ids = get_zone_ids(binding, ct_zones); > >> put_local_common_flows(dp_key, port_key, nested_container, > >> &zone_ids, > >> - ofpacts_p, flow_table); > >> + ofpacts_p, flow_table, ld, > >> local_datapaths); > >> > >> /* Table 0, Priority 200, 150 and 100. > >> * ============================== > >> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml > >> index 8fa5272..876c121 100644 > >> --- a/ovn/northd/ovn-northd.8.xml > >> +++ b/ovn/northd/ovn-northd.8.xml > >> @@ -2013,6 +2013,16 @@ next; > >> </li> > >> > >> <li> > >> + A priority-100 logical flow with match > >> + <code>inport == <var>GW</var> && > >> + flags.rcv_from_vlan == 1</code> has actions > >> + <code>eth.dst = <var>E</var>; next;</code>, where > >> + <var>GW</var> is the logical router distributed gateway > >> + port and <var>E</var> is the MAC address of router > >> + distributed gateway port. > >> + </li> > >> + > >> + <li> > >> For each NAT rule in the OVN Northbound database that can > >> be handled in a distributed manner, a priority-100 logical > >> flow with match <code>ip4.src == <var>B</var> && > >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c > >> index bcf0b66..d012bb8 100644 > >> --- a/ovn/northd/ovn-northd.c > >> +++ b/ovn/northd/ovn-northd.c > >> @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const struct > >> ovn_port *op, > >> } else { > >> ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" : "6"); > >> } > >> + > >> + if (op->peer && op->peer->od->localnet_port && > >> + op->od->l3dgw_port && op->od->l3redirect_port && > >> + (op != op->od->l3redirect_port) && > >> + (op != op->od->l3dgw_port)) { > >> + ds_put_format(&match, " && is_chassis_resident(%s)", > >> + op->od->l3redirect_port->json_key); > >> + ds_put_format(&actions, "; flags.rcv_from_vlan = 1"); > >> + } > >> ds_put_format(&actions, "; " > >> "%sreg1 = %s; " > >> "eth.src = %s; " > >> @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap *datapaths, > struct > >> hmap *ports, > >> op->lrp_networks.ipv6_addrs[i].network_s, > >> op->lrp_networks.ipv6_addrs[i].plen, NULL, NULL); > >> } > >> + > >> + /* For a reply packet from gateway with VLAN switch port as > >> + * destination, replace router internal port MAC with router > >> gateway > >> + * MAC address, so that external switches can learn gateway MAC > >> + * address. Later before delivering the packet to the port, > >> + * controller will replace the gateway MAC with router internal > >> port > >> + * MAC in table 33. */ > >> + if (op->od->l3dgw_port && (op == op->od->l3dgw_port) && > >> + op->od->l3redirect_port) { > >> + ds_clear(&actions); > >> + ds_clear(&match); > >> + ds_put_format(&match, "inport == %s", op->json_key); > >> + ds_put_format(&match, " && flags.rcv_from_vlan == 1"); > >> + ds_put_format(&match, " && is_chassis_resident(%s)", > >> + op->od->l3redirect_port->json_key); > >> + ds_put_format(&actions, > >> + "eth.src = %s; next;", > op->lrp_networks.ea_s); > >> + ovn_lflow_add(lflows, op->od, S_ROUTER_IN_GW_REDIRECT, 100, > >> + ds_cstr(&match), ds_cstr(&actions)); > >> + } > >> } > >> > >> /* Convert the static routes to flows. */ > >> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml > >> index ad2101c..0de41d2 100644 > >> --- a/ovn/ovn-architecture.7.xml > >> +++ b/ovn/ovn-architecture.7.xml > >> @@ -1067,7 +1067,9 @@ > >> > >> <p> > >> Flows in table 33 resemble those in table 32 but for logical > >> ports that > >> - reside locally rather than remotely. For unicast logical > output > >> ports > >> + reside locally rather than remotely. If these are VLAN ports > and > >> + packet has router gateway port MAC address as source, replace > it > >> with > >> + router internal port MAC address. For unicast logical output > >> ports > >> on the local hypervisor, the actions just resubmit to table 34. > >> For > >> multicast output ports that include one or more logical ports > on > >> the > >> local hypervisor, for each such logical port <var>P</var>, the > >> actions > >> -- > >> 1.8.3.1 > >> > >> _______________________________________________ > >> dev mailing list > >> d...@openvswitch.org > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >> > > > _______________________________________________ > dev mailing list > d...@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev