On Tue, 16 Oct 2018 at 15:43, Ankur Sharma <ankur.sha...@nutanix.com> wrote:
> Hi,
>
> We have done some effort in evaluating usage of OVN for
> Distributed Virtual Routing (DVR) for vlan backed networks.
>

Would you mind explaining the above statement in more detail? I would
like to understand the problem well before looking at the proposed solution.

> We would like to take it forward with the community.
>
> We understand that some of the work could be overlapping with existing
> patches in review.
>
> We would appreciate the feedback and would be happy to update our patches
> to avoid known overlaps.
>
> This email explains the proposal. We will be following it up with patches.
> Each "CODE CHANGES" section summarizes the change that the corresponding
> patch would have.
>
>
> DISTRIBUTED VIRTUAL ROUTING FOR VLAN BACKED NETWORKS
> ====================================================
>
>
> 1. OVN Bridge Deployment
> ------------------------
>
> Our design follows the following ovn-bridge deployment model
> (please refer to figure OVN Bridge deployment).
>      i. br-int ==> OVN managed bridge.
>         br-pif ==> Learning Bridge, where physical NICs will be connected.
>
>     ii. Any packet that should be on physical network will travel from
>         br-int to br-pif, via patch ports (localnet ports).
>
> 2. Layer 2
> ----------
>
> DESIGN:
> ~~~~~~~
>    a. Leverage the localnet logical port type as patch port between br-int
>       and br-pif.
>    b. Each VLAN backed logical switch will have a localnet port connected
>       to it.
>    c. Tagging and untagging of vlan headers happens at localnet port
>       boundary.
>
> PIPELINE EXECUTION:
> ~~~~~~~~~~~~~~~~~~~
>    a. Unlike geneve encap based solution, where we execute ingress pipeline
>       on source chassis and egress pipeline on destination chassis, for vlan
>       backed logical switches, packet will go through ingress pipeline
>       on destination chassis as well.
>
> PACKET FLOW (Figure 1. shows topology and Figure 2.
> shows the packet flow):
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    a. VM sends unicast traffic (destined to VM2_MAC) to br-int.
>    b. For br-int, destination mac is not local, hence it will forward it to
>       localnet port (by design), which is attached to br-pif. This is
>       the stage at which vlan tag is added. br-pif forwards the packet
>       to physical interface.
>    c. br-pif on destination chassis sends the received traffic to
>       patch-ports on br-int (as unicast or unknown unicast).
>    d. br-int does vlan tag check, strips the vlan header and sends
>       the packet to ingress pipeline of the corresponding datapath.
>
>
> KEY DIFFERENCES AS COMPARED TO OVERLAY:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    a. No encapsulation.
>    b. Both ingress and egress pipelines of logical switch are executed on
>       both source and destination hypervisor (unlike overlay where ingress
>       pipeline is executed on source hypervisor and egress on
>       destination).
>
> CODE CHANGES:
> ~~~~~~~~~~~~~
>    a. ovn-nb.ovsschema:
>         1. Add a new column to table Logical_Switch.
>         2. Column name would be "type".
>         3. Values would be either "vlan" or "overlay", with "overlay"
>            being default.
>
>    b. ovn-nbctl:
>         1. Add a new cli which sets the "type" of logical-switch.
>            ovn-nbctl ls-set-network-type SWITCH TYPE
>
>    c. ovn-northd:
>         1. Add a new enum to ovn_datapath struct, which will indicate
>            if logical_switch datapath type is overlay or vlan.
>         2. Populate a new key value pair in southbound database for
>            Datapath Bindings of Logical_Switch.
>         3. Key value pair: <logical-switch-type, "vlan" or "overlay">,
>            default will be overlay.
>
>
> 3. Layer 3 East West
> --------------------
>
> DESIGN:
> ~~~~~~~
>    a. Since the router port is distributed and there is no encapsulation,
>       packets with router port mac as source mac cannot go on wire.
>    b. We propose replacing router port mac with a chassis specific mac,
>       whenever packet goes on wire.
>    c.
>       Number of chassis_mac per chassis could depend on the number of
>       physical nics and corresponding bond policy on br-pif.
>
>       As of now, we propose only one chassis_mac per chassis
>       (shared by all resident logical routers). However, we are analyzing
>       if br-pif's bond policy would require more macs per chassis.
>
> PIPELINE EXECUTION:
> ~~~~~~~~~~~~~~~~~~~
>    a. For a DVR E-W flow, both ingress and egress pipelines for
>       logical_router will execute on source chassis only.
>
> PACKET FLOW (Figure 3. shows topology and Figure 4. shows the packet flow):
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    a. VM1 sends packet (destined to IP2) to br-int.
>    b. On source hypervisor, packet goes through following pipelines:
>         1. Ingress: logical-switch1
>         2. Egress: logical-switch1
>         3. Ingress: logical-router
>         4. Egress: logical-router
>         5. Ingress: logical-switch2
>         6. Egress: logical-switch2
>
>       On wire, packet goes out with destination logical switch's vlan.
>       As mentioned in design, source mac (RP2_MAC) would be replaced with
>       CHASSIS_MAC and destination mac would be that of VM2.
>
>    c. Packet reaches destination chassis and enters logical-switch2
>       pipeline in br-int.
>    d. Packet goes through logical-switch2 pipeline (both ingress and
>       egress) and gets forwarded to VM2.
>
> CODE CHANGES:
> ~~~~~~~~~~~~~
>    a. ovn-sb.ovsschema:
>         1. Add a new column to the table Chassis.
>         2. Column name would be "chassis_macs", type being string and no
>            limit on range of values.
>         3. This column will hold a list of chassis unique macs.
>         4. This table will be populated from ovn-controller.
>
>    b. ovn-sbctl:
>         1. CLI to add/delete chassis_macs to/from the south bound database.
>
>    c. ovn-controller:
>         1. Read chassis macs from OVS Open_Vswitch table and populate
>            south bound database.
>         2. In table=65, add a new flow at priority 150, which will do
>            following:
>               a.
>                  Match: source_mac == router_port_mac, metadata ==
>                  destination_logical_switch, logical_outport == localnet_port
>               b. Action: Replace source mac with chassis_mac, add vlan tag.
>
>
> 4. LAYER 3 North South (NO NAT)
> -------------------------------
>
> DESIGN:
> ~~~~~~~
>    a. For talking to external network endpoint, we will need a gateway
>       on OVN DVR.
>    b. We propose to use the gateway_chassis construct to achieve the same.
>    c. LRP will be attached to Gateway Chassis(s) and only on the active
>       chassis we will respond to ARP request for the LRP IP from underlay
>       network.
>    d. If NATing (keeping state) is not involved then traffic need not go
>       via the gateway chassis always, i.e. traffic from OVN chassis to
>       external network need not go via the gateway chassis.
>
> PIPELINE EXECUTION:
> ~~~~~~~~~~~~~~~~~~~
>    a. From endpoint on OVN chassis to endpoint on underlay.
>         i. Like DVR E-W, logical_router ingress and egress pipelines are
>            executed on source chassis.
>
>    b. From endpoint on underlay to endpoint on OVN chassis.
>         i. logical_router ingress and egress pipelines are executed on
>            gateway chassis.
>
> PACKET FLOW LS ENDPOINT to UNDERLAY ENDPOINT (Figure 5. shows topology):
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    a. Packet flow in this case is exactly same as Layer 3 E-W.
>
>
> PACKET FLOW UNDERLAY ENDPOINT to LS ENDPOINT (Figure 5. shows topology
> and Figure 6. shows the packet flow):
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    a. Gateway for endpoints behind DVR will be resident on only
>       gateway-chassis.
>    b. Unicast packets will come to gateway-chassis, with destination MAC
>       being RP2_MAC.
>    c. From now on, it is like L3 E-W flow.
>
> CODE CHANGES:
> ~~~~~~~~~~~~~
>    a. ovn-northd:
>         1. Changes to respond to vlan backed router port ARP from uplink,
>            only if it is on a gateway chassis.
>         2.
>            Changes to make sure that in the absence of NAT configuration,
>            OVN_CHASSIS to external network traffic does not go via the
>            gateway chassis.
>
>    b. ovn-controller:
>         1. Send out garps, advertising the vlan backed router port
>            (which has gateway chassis attached to it) from the
>            active gateway chassis.
>
>
> 5. LAYER 3 North South (NAT)
> ----------------------------
>
> SNAT, DNAT, SNAT_AND_DNAT (without external mac):
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    a. Our proposal aligns with following patch series which is out for
>       review:
>       link <http://patchwork.ozlabs.org/patch/952119/>
>
>    b. However, our implementation deviates from that proposal in following
>       areas:
>         i. Usage of lr_in_ip_routing:
>            Our implementation sets the redirect flag after routing decision
>            is taken. This is to ensure that a user entered static route
>            will not affect the redirect decision (unless it is meant to).
>
>        ii. Using Tenant VLAN ID for "redirection":
>            Our implementation uses external network router port's
>            (router port that has gateway chassis attached to it) vlan id
>            for redirection. This is because chassisredirect port is NOT on
>            tenant network and logically packet is being forwarded to
>            chassisredirect port.
>
>
> SNAT_AND_DNAT (with external mac):
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    a. Current OVN implementation of not going via gateway chassis aligns
>       with our design and it worked fine.
>
>
> This is just an initial proposal. We have identified more areas that
> should be worked upon; we will submit patches (and put forth topics/design
> for discussion) as we make progress.
>
>
> Thanks
>
> Regards,
> Ankur
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev