On Tue, 16 Oct 2018 at 15:43, Ankur Sharma <ankur.sha...@nutanix.com> wrote:

> Hi,
>
> We have put some effort into evaluating the use of OVN for
> Distributed Virtual Routing (DVR) on VLAN-backed networks.
>

Would you mind explaining the above statement in more detail? I
would like to understand the problem well before looking at the proposed
solution.


>
> We would like to take it forward with the community.
>
> We understand that some of this work may overlap with existing
> patches under review.
>
> We would appreciate feedback and would be happy to update our patches
> to avoid any known overlap.
>
> This email explains the proposal; we will follow it up with patches.
> Each "CODE CHANGES" section summarizes the changes that the corresponding
> patch will contain.
>
>
> DISTRIBUTED VIRTUAL ROUTING FOR VLAN BACKED NETWORKS
> ======================================================
>
>
> 1. OVN Bridge Deployment
> ------------------------------------
>
> Our design follows the OVN bridge deployment model below
> (please refer to the figure "OVN Bridge Deployment").
>     i. br-int ==> OVN-managed bridge.
>        br-pif ==> learning bridge, to which the physical NICs are connected.
>
>    ii. Any packet that has to go on the physical network travels from
>        br-int to br-pif via patch ports (localnet ports).
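A minimal Python sketch of how this deployment could be set up with ovs-vsctl, assuming a physical network named "physnet1" and a NIC "eth1" (both hypothetical). ovn-controller creates the br-int/br-pif patch ports for a localnet port whose options:network_name matches an entry in external-ids:ovn-bridge-mappings; the sketch only builds the command lines and does not run them.

```python
from typing import List


def bridge_setup_commands(physnet: str, bridge: str,
                          nics: List[str]) -> List[List[str]]:
    """Build (but do not run) ovs-vsctl argv lists for the br-pif model."""
    # Create the learning bridge that owns the physical NICs.
    cmds = [["ovs-vsctl", "--may-exist", "add-br", bridge]]
    # Attach each physical NIC to br-pif.
    for nic in nics:
        cmds.append(["ovs-vsctl", "--may-exist", "add-port", bridge, nic])
    # Map the physical network name to br-pif; ovn-controller uses this
    # mapping to create patch ports between br-int and br-pif for localnet
    # ports on that network.
    cmds.append(["ovs-vsctl", "set", "Open_vSwitch", ".",
                 f"external-ids:ovn-bridge-mappings={physnet}:{bridge}"])
    return cmds


if __name__ == "__main__":
    # "physnet1" and "eth1" are hypothetical names.
    for argv in bridge_setup_commands("physnet1", "br-pif", ["eth1"]):
        print(" ".join(argv))
```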
>
> 2. Layer 2
> -------------
>
>    DESIGN:
>    ~~~~~~~
>    a. Use the localnet logical port type as the patch port between br-int
>       and br-pif.
>    b. Each VLAN-backed logical switch will have a localnet port connected
>       to it.
>    c. VLAN tags are added and stripped at the localnet port boundary.
>
>    PIPELINE EXECUTION:
>    ~~~~~~~~~~~~~~~~~~~
>    a. Unlike the Geneve encapsulation based solution, where the ingress
>       pipeline is executed on the source chassis and the egress pipeline
>       on the destination chassis, for VLAN-backed logical switches the
>       packet also goes through the ingress pipeline on the destination
>       chassis.
>
>    PACKET FLOW (Figure 1. shows topology and Figure 2. shows the packet
> flow):
>
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    a. VM1 sends unicast traffic (destined to VM2_MAC) to br-int.
>    b. Since the destination MAC is not local to br-int, br-int forwards
>       the packet (by design) to the localnet port, which is attached to
>       br-pif. The VLAN tag is added at this stage. br-pif then forwards
>       the packet to the physical interface.
>    c. br-pif on the destination chassis sends the received traffic to the
>       patch port on br-int (as unicast or unknown unicast).
>    d. br-int checks the VLAN tag, strips the VLAN header and sends the
>       packet to the ingress pipeline of the corresponding datapath.
>
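The tag push/pop at the localnet boundary is ordinary 802.1Q handling. A minimal Python sketch on raw Ethernet frames; the function names are hypothetical, and in OVN this would be done by OpenFlow actions rather than by code like this.

```python
import struct

TPID_8021Q = 0x8100  # 802.1Q tag protocol identifier


def push_vlan(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Insert an 802.1Q tag after the destination and source MACs."""
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    if not 0 <= pcp <= 7:
        raise ValueError("PCP must be in 0..7")
    tci = (pcp << 13) | vid
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


def pop_vlan(frame: bytes, expected_vid: int) -> bytes:
    """Check the tag against the logical switch's VLAN, then strip it."""
    if len(frame) < 18:
        raise ValueError("frame too short to carry an 802.1Q tag")
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        raise ValueError("frame is not 802.1Q tagged")
    vid = tci & 0x0FFF
    if vid != expected_vid:
        raise ValueError(f"frame is on VLAN {vid}, expected {expected_vid}")
    return frame[:12] + frame[16:]
```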
>
>    KEY DIFFERENCES AS COMPARED TO OVERLAY:
>    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    a. No encapsulation.
>    b. Both the ingress and egress pipelines of the logical switch are
>       executed on both the source and destination hypervisors (unlike
>       overlay, where the ingress pipeline is executed on the source
>       hypervisor and the egress pipeline on the destination).
>
>    CODE CHANGES:
>    ~~~~~~~~~~~~~
>    a. ovn-nb.ovsschema:
>         1. Add a new column to the Logical_Switch table.
>         2. The column name would be "type".
>         3. Values would be either "vlan" or "overlay", with "overlay"
>             being the default.
>
>    b. ovn-nbctl:
>         1. Add a new command that sets the "type" of a logical switch:
>             ovn-nbctl ls-set-network-type SWITCH TYPE
>
>    c. ovn-northd:
>         1. Add a new enum to the ovn_datapath struct, indicating whether
>             the logical switch datapath type is overlay or vlan.
>         2. Populate a new key/value pair in the southbound database for
>             the Datapath_Binding of each Logical_Switch.
>         3. Key/value pair: <logical-switch-type, "vlan" or "overlay">;
>             the default will be overlay.
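A rough Python model of the proposed northd logic, assuming the key/value pair is stored as a string map on the Datapath_Binding row. `LsType` is a hypothetical stand-in for the proposed enum, and rows are plain dicts rather than OVSDB IDL records.

```python
from enum import Enum
from typing import Dict, Optional


class LsType(Enum):
    """Stand-in for the proposed ovn_datapath enum."""
    OVERLAY = "overlay"
    VLAN = "vlan"


def ls_type(nb_type: Optional[str]) -> LsType:
    """Parse the proposed Logical_Switch "type" column; unset means overlay."""
    if not nb_type:
        return LsType.OVERLAY
    return LsType(nb_type)  # raises ValueError for unknown values


def datapath_binding_keys(nb_switch: Dict[str, str]) -> Dict[str, str]:
    """Key/value pair northd would add to the switch's Datapath_Binding."""
    return {"logical-switch-type": ls_type(nb_switch.get("type")).value}
```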
>
>
> 3. Layer 3 East West
> --------------------
>
>    DESIGN:
>    ~~~~~~~
>    a. Since the router port is distributed and there is no encapsulation,
>       packets with the router port MAC as the source MAC cannot go on the
>       wire: the same MAC would otherwise be seen from every chassis,
>       confusing MAC learning in the physical network.
>    b. We propose replacing the router port MAC with a chassis-specific MAC
>       whenever a packet goes on the wire.
>    c. The number of chassis_macs per chassis could depend on the number of
>       physical NICs and the corresponding bond policy on br-pif.
>
>       For now, we propose only one chassis_mac per chassis
>       (shared by all resident logical routers). However, we are analyzing
>       whether br-pif's bond policy would require more MACs per chassis.
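Because every chassis must be distinguishable on the wire, a deployment tool might sanity-check the chosen MACs. This is a hypothetical Python helper (not part of the proposal), assuming one chassis_mac per chassis, that flags multicast, duplicate, or router-port-colliding chassis MACs.

```python
from typing import Dict, Iterable, List


def parse_mac(mac: str) -> bytes:
    """Parse "aa:bb:cc:dd:ee:ff" (case-insensitive) into 6 bytes."""
    parts = mac.split(":")
    if len(parts) != 6 or any(len(p) != 2 for p in parts):
        raise ValueError(f"malformed MAC {mac!r}")
    return bytes(int(p, 16) for p in parts)


def check_chassis_macs(chassis_macs: Dict[str, str],
                       router_port_macs: Iterable[str]) -> List[str]:
    """Return human-readable problems; an empty list means no issue found."""
    problems: List[str] = []
    owner: Dict[bytes, str] = {}
    rp_macs = {parse_mac(m) for m in router_port_macs}
    for chassis, mac in sorted(chassis_macs.items()):
        raw = parse_mac(mac)
        if raw[0] & 0x01:  # I/G bit set: group (multicast) address
            problems.append(f"{chassis}: {mac} is a multicast MAC")
        if raw in owner:
            problems.append(f"{chassis}: {mac} already used by {owner[raw]}")
        else:
            owner[raw] = chassis
        if raw in rp_macs:
            problems.append(f"{chassis}: {mac} collides with a router port MAC")
    return problems
```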
>
>    PIPELINE EXECUTION:
>    ~~~~~~~~~~~~~~~~~~~
>    a. For a DVR E-W flow, both the ingress and egress pipelines of the
>       logical router execute on the source chassis only.
>
>    PACKET FLOW (Figure 3. shows topology and Figure 4. shows the packet
> flow):
>
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    a. VM1 sends a packet (destined to IP2) to br-int.
>    b. On the source hypervisor, the packet goes through the following
>       pipelines:
>       1. Ingress: logical-switch1
>       2. Egress:  logical-switch1
>       3. Ingress: logical-router
>       4. Egress:  logical-router
>       5. Ingress: logical-switch2
>       6. Egress:  logical-switch2
>
>       On the wire, the packet goes out with the destination logical
>       switch's VLAN tag. As described in the design, the source MAC
>       (RP2_MAC) is replaced with CHASSIS_MAC, and the destination MAC
>       is VM2's MAC.
>
>    c. The packet reaches the destination chassis and enters the
>       logical-switch2 pipeline in br-int.
>    d. The packet goes through the logical-switch2 pipeline (both ingress
>       and egress) and is forwarded to VM2.
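The routing step executed on the source chassis (stages 3 and 4 above) can be modeled as follows. This is a simplified Python sketch, assuming IP2 has already been resolved to VM2_MAC; the `mac_bindings` dict is hypothetical. On the wire, the source MAC is further replaced with CHASSIS_MAC at the localnet port, as the text describes.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Packet:
    eth_src: str
    eth_dst: str
    ip_dst: str
    ttl: int


def route_east_west(pkt: Packet, out_rp_mac: str,
                    mac_bindings: Dict[str, str]) -> Packet:
    """Logical router ingress + egress, as run on the source chassis."""
    if pkt.ttl <= 1:
        # A real router would answer with ICMP time exceeded instead.
        raise ValueError("TTL expired")
    try:
        nexthop_mac = mac_bindings[pkt.ip_dst]
    except KeyError:
        # Unresolved next hop: the router would have to ARP for it first.
        raise LookupError(f"no MAC binding for {pkt.ip_dst}") from None
    return replace(pkt, eth_src=out_rp_mac, eth_dst=nexthop_mac,
                   ttl=pkt.ttl - 1)
```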
>
>    CODE CHANGES:
>    ~~~~~~~~~~~~~
>    a. ovn-sb.ovsschema:
>         1. Add a new column to the Chassis table.
>         2. The column name would be "chassis_macs", of type string, with
>             no limit on the number of values.
>         3. This column will hold a list of chassis-unique MACs.
>         4. This column will be populated by ovn-controller.
>
>    b. ovn-sbctl:
>         1. CLI to add/delete chassis_macs to/from the southbound database.
>
>    c. ovn-controller:
>         1. Read chassis MACs from the OVS Open_vSwitch table and populate
>             the southbound database.
>         2. In table=65, add a new flow at priority 150 that does the
>             following:
>            a. Match: source_mac == router_port_mac, metadata ==
>               destination_logical_switch, logical_outport == localnet_port
>            b. Action: replace the source MAC with chassis_mac and add the
>               VLAN tag.
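A Python model of the proposed priority-150 flow, with packets as immutable records. The per-datapath dictionaries (`rp_mac`, `localnet_port`, `ls_vlan`) are hypothetical stand-ins for state ovn-controller would derive from the southbound database; the fall-through branch models plain localnet tagging for bridged traffic.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Packet:
    eth_src: str
    eth_dst: str
    metadata: int    # logical datapath key (the destination logical switch)
    outport: str     # logical output port
    vlan: int = 0    # 0 means untagged


def localnet_egress(pkt: Packet, rp_mac: Dict[int, str],
                    localnet_port: Dict[int, str], ls_vlan: Dict[int, int],
                    chassis_mac: str) -> Packet:
    """Model of the proposed priority-150 flow plus plain localnet tagging."""
    if localnet_port.get(pkt.metadata) != pkt.outport:
        return pkt                                   # not going on the wire
    vlan = ls_vlan[pkt.metadata]
    if pkt.eth_src == rp_mac.get(pkt.metadata):
        # Routed packet: hide the distributed router port MAC.
        return replace(pkt, eth_src=chassis_mac, vlan=vlan)
    return replace(pkt, vlan=vlan)                   # bridged packet: tag only
```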
>
>
> 4. LAYER 3 North South (NO NAT)
> -------------------------------
>
>    DESIGN:
>    ~~~~~~~
>    a. To talk to endpoints on the external network, the OVN DVR needs
>       a gateway.
>    b. We propose to use the gateway_chassis construct to achieve this.
>    c. The LRP will be attached to one or more gateway chassis, and only
>       the active gateway chassis will respond to ARP requests for the LRP
>       IP from the underlay network.
>    d. If NAT (which keeps state) is not involved, traffic need not always
>       go via the gateway chassis, i.e. traffic from an OVN chassis to the
>       external network can bypass the gateway chassis.
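One way to model the "only the active gateway chassis answers ARP" rule: OVN's Gateway_Chassis rows carry a priority, and the highest-priority chassis that is up acts as the active one. In this Python sketch the `alive` flag is an assumed input (in OVN, liveness is typically derived from BFD), and ties are broken by list order.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GatewayChassis:
    chassis: str
    priority: int
    alive: bool  # assumed liveness input (e.g. derived from BFD)


def active_gateway(gws: Sequence[GatewayChassis]) -> Optional[str]:
    """Highest-priority live chassis; None if no gateway chassis is up."""
    live = [g for g in gws if g.alive]
    if not live:
        return None
    # max() keeps the first element among equal priorities.
    return max(live, key=lambda g: g.priority).chassis


def should_answer_arp(local_chassis: str, target_ip: str, lrp_ip: str,
                      gws: Sequence[GatewayChassis]) -> bool:
    """Answer ARP for the LRP IP only on the active gateway chassis."""
    return target_ip == lrp_ip and active_gateway(gws) == local_chassis
```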
>
>    PIPELINE EXECUTION:
>    ~~~~~~~~~~~~~~~~~~~
>    a. From an endpoint on an OVN chassis to an endpoint on the underlay:
>       i. As with DVR E-W, the logical_router ingress and egress pipelines
>          are executed on the source chassis.
>
>    b. From an endpoint on the underlay to an endpoint on an OVN chassis:
>       i. The logical_router ingress and egress pipelines are executed on
>          the gateway chassis.
>
>    PACKET FLOW LS ENDPOINT to UNDERLAY ENDPOINT (Figure 5. shows topology):
>    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    a. The packet flow in this case is exactly the same as for Layer 3 E-W.
>
>
>    PACKET FLOW UNDERLAY ENDPOINT to LS ENDPOINT (Figure 5 shows the
>    topology and Figure 6 shows the packet flow):
>    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    a. The gateway for endpoints behind the DVR is resident only on the
>       gateway chassis.
>    b. Unicast packets arrive at the gateway chassis with RP2_MAC as the
>       destination MAC.
>    c. From there on, the flow is the same as the L3 E-W flow.
>
>    CODE CHANGES:
>    ~~~~~~~~~~~~~
>    a. ovn-northd:
>         1. Changes to respond to ARP requests from the uplink for a
>            VLAN-backed router port only on a gateway chassis.
>         2. Changes to ensure that, in the absence of NAT configuration,
>            traffic from an OVN chassis to the external network does not
>            go via the gateway chassis.
>
>    b. ovn-controller:
>         1. Send out GARPs from the active gateway chassis, advertising the
>            VLAN-backed router port (the one that has a gateway chassis
>            attached to it).
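A GARP is an ARP packet whose sender and target protocol addresses are both the advertised IP. A minimal Python sketch that builds such a frame as a broadcast ARP request, optionally 802.1Q-tagged; the choice of VLAN (presumably that of the external logical switch) is an assumption here, and no Ethernet padding is added.

```python
import ipaddress
import struct
from typing import Optional

BROADCAST = b"\xff" * 6


def mac_bytes(mac: str) -> bytes:
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"malformed MAC {mac!r}")
    return raw


def build_garp(mac: str, ip: str, vlan: Optional[int] = None) -> bytes:
    """Broadcast gratuitous ARP announcing ip -> mac, optionally tagged."""
    sha = mac_bytes(mac)
    spa = ipaddress.IPv4Address(ip).packed
    eth = BROADCAST + sha
    if vlan is not None:
        eth += struct.pack("!HH", 0x8100, vlan & 0x0FFF)
    eth += struct.pack("!H", 0x0806)             # EtherType: ARP
    # htype=Ethernet(1), ptype=IPv4, hlen=6, plen=4, oper=request(1)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += sha + spa + b"\x00" * 6 + spa         # sender IP == target IP
    return eth + arp
```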
>
>
> 5. LAYER 3 North South (NAT)
> ----------------------------
>
>    SNAT, DNAT, SNAT_AND_DNAT (without external mac):
>    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    a. Our proposal aligns with the following patch series, which is out
>       for review:
>       link <http://patchwork.ozlabs.org/patch/952119/>
>
>    b. However, our implementation deviates from that proposal in the
>       following areas:
>       i. Usage of lr_in_ip_routing:
>          Our implementation sets the redirect flag after the routing
>          decision is taken. This ensures that a user-entered static route
>          does not affect the redirect decision (unless it is meant to).
>
>      ii. VLAN ID used for "redirection":
>          Our implementation uses the VLAN ID of the external network router
>          port (the router port that has a gateway chassis attached to it)
>          for redirection, rather than the tenant VLAN ID. This is because
>          the chassisredirect port is NOT on the tenant network, and
>          logically the packet is being forwarded to the chassisredirect
>          port.
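The ordering in point i. can be illustrated with a simplified Python model: a longest-prefix-match lookup over connected and static routes runs first, and the redirect decision is derived from the port it selects. Treating "NAT configured" as a single boolean, and returning the external router port's VLAN for redirection, are simplifying assumptions of this sketch.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str                    # e.g. "10.0.1.0/24"
    out_port: str                  # logical router port
    nexthop: Optional[str] = None  # None for directly connected routes


def lookup(routes: List[Route], dst: str) -> Route:
    """Longest-prefix match over connected and static routes."""
    addr = ipaddress.ip_address(dst)
    best: Optional[Route] = None
    best_len = -1
    for r in routes:
        net = ipaddress.ip_network(r.prefix)
        if addr in net and net.prefixlen > best_len:
            best, best_len = r, net.prefixlen
    if best is None:
        raise LookupError(f"no route to {dst}")
    return best


def route_then_redirect(routes: List[Route], dst: str, gw_port: str,
                        nat_configured: bool, external_vlan: int
                        ) -> Tuple[Route, bool, Optional[int]]:
    """Routing decision first; redirect only if it selected the gateway port."""
    route = lookup(routes, dst)
    redirect = nat_configured and route.out_port == gw_port
    # When redirecting, use the external router port's VLAN, not the tenant's.
    return route, redirect, external_vlan if redirect else None
```

With a more specific static route pointing elsewhere, the redirect flag stays unset, which is the behaviour point i. describes.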
>
>
>    SNAT_AND_DNAT (with external mac):
>    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    a. The current OVN implementation, in which this traffic does not go
>       via the gateway chassis, aligns with our design, and it worked fine
>       in our evaluation.
>
>
> This is just an initial proposal. We have identified more areas that
> need work; we will submit patches (and put forth topics/designs for
> discussion) as we make progress.
>
>
> Thanks
>
> Regards,
> Ankur
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>