On Wed, Jul 18, 2018 at 6:52 PM, Miguel Angel Ajo Pelayo <
majop...@redhat.com> wrote:

>
> I have been testing the patches, and seeing them work as expected
> (L3HA failovers, N/S, E/W, etc...), but I have found a couple of
> issues. I'm not sure that one of them, "2", is a real issue, but I
> will describe it too; if it turns out not to be one, we can move it
> to disc...@openvswitch.org.
>
>
Issue 2 is a consequence of the design of the VLAN implementation.
It's not related to this patch (we can see the same behaviour you
explained without this patch), so we can move it to ovs-discuss.


>
> 1) The expiry of the chassisredirect port MACs on the switch CAM
>    table: In N/S routing, when any traffic needs to be handled
>    by the master Chassis for a router the dst.mac is the MAC of
>    the chassisredirect port.
>
>    The switch knows about this MAC because it's announced via gARP
>    at the L2 level. The problem is that, for incoming N to S traffic,
>    the router pipeline rewrites src.mac to the MAC of the router's
>    internal leg before sending the packet to the destination Chassis.
>
>    Because of that, the switch never again sees the chassisredirect
>    MAC/VLAN combination as a source on the right port (master gw
>    chassis port), so the entry eventually expires after 300 seconds.
>
>    From that moment any traffic directed to the "chassisredirect"
>    port MAC will be flooded until any other gARP happens. Everything
>    seems to work fine at a very small scale, but that would really
>    kill the network in real life conditions.
>
>    You can see it live here:
>
>    https://www.youtube.com/watch?v=VDwoXbZqUto
>
>   (sorry for the audio which is missing in a couple of non-important
>    moments, not sure why)
>
>
Thanks for identifying this issue.


>     The problematic MAC in that video is "fa:16:3e:48:66:e7", the one
>     of this chassisredirect port:
>
> logical_port        : "cr-lrp-4823af55-cd17-4de8-8120-6d13c44dc86b"
> mac                 : ["fa:16:3e:48:66:e7 172.24.4.8/24"]
> nat_addresses       : []
> options             : {distributed-port=
>                        "lrp-4823af55-cd17-4de8-8120-6d13c44dc86b"}
> parent_port         : []
> tag                 : []
> tunnel_key          : 3
> type                : chassisredirect
>
>
> Here I can think of one solution:
>
> a) Make sure that the traffic is not fully processed by the lrouter
> flows on the gateway chassis, and let the packet egress the host with the
> src.mac = "chassisredirect" mac.   That would make switches again
> relearn the MAC/VLAN to port association every time a packet flows N to S.
>
>
I feel this is a better solution for the issue.
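
To make the difference concrete, here is a rough toy model of a learning
switch with a 300 second aging time. It is only a sketch, not OVN or
switch code: the LearningSwitch class, the port numbers, the VLAN id and
every MAC except the chassisredirect one quoted above are made up for
illustration. It compares N to S traffic leaving the gateway chassis
with the router internal leg MAC as src.mac (current behaviour) against
option (a), where src.mac is the chassisredirect MAC.

```python
AGING_SECONDS = 300                    # CAM aging time mentioned in this thread
VLAN = 100                             # hypothetical tenant VLAN id
GW_PORT, COMPUTE_PORT = 1, 2           # hypothetical physical switch ports
CR_MAC = "fa:16:3e:48:66:e7"           # chassisredirect MAC from the thread
INTERNAL_LEG_MAC = "fa:16:3e:00:00:01" # hypothetical router internal leg MAC
VM_MAC = "fa:16:3e:00:00:aa"           # hypothetical VM MAC
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy L2 switch: learns (vlan, src MAC) -> port and ages entries out."""

    def __init__(self, aging=AGING_SECONDS):
        self.aging = aging
        self.cam = {}  # (vlan, mac) -> (port, last_seen)

    def receive(self, now, in_port, vlan, src, dst):
        """Learn the source, then return the egress port for dst or 'flood'."""
        self.cam[(vlan, src)] = (in_port, now)
        entry = self.cam.get((vlan, dst))
        if entry is None or now - entry[1] > self.aging:
            self.cam.pop((vlan, dst), None)  # unknown or expired entry
            return "flood"
        return entry[0]


def simulate(n_to_s_src_mac, duration=600, step=10):
    """Return [(time, decision)] for traffic sent towards CR_MAC."""
    switch = LearningSwitch()
    # t=0: the gateway chassis announces the chassisredirect MAC via gARP.
    switch.receive(0, GW_PORT, VLAN, CR_MAC, BROADCAST)
    decisions = []
    for now in range(step, duration + 1, step):
        # N to S packet leaving the gateway chassis.
        switch.receive(now, GW_PORT, VLAN, n_to_s_src_mac, VM_MAC)
        # Packet from the compute chassis addressed to the chassisredirect MAC.
        decisions.append(
            (now, switch.receive(now, COMPUTE_PORT, VLAN, VM_MAC, CR_MAC)))
    return decisions


def first_flood(decisions):
    """Time of the first flooded packet, or None if nothing was flooded."""
    return next((t for t, d in decisions if d == "flood"), None)


if __name__ == "__main__":
    print("current behaviour:", first_flood(simulate(INTERNAL_LEG_MAC)))
    print("option (a):       ", first_flood(simulate(CR_MAC)))
```

In this model the entry learned from the gARP is never refreshed in the
first case, so packets towards the chassisredirect MAC start being
flooded once the aging time has passed. In the second case every N to S
packet refreshes the entry and nothing is flooded.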


> b) which I believe doesn't work: make sure gARPs don't stop happening
> (or happen at intervals shorter than 300 seconds). That would not be a
> valid solution, since CAM table entries can be evicted early on
> switches if the table overflows.
>
>
> 2) MAC flipping on E/W traffic, which is easier to see in this blog post:
>      https://ajo.es/ovn-distributed-ew-on-vlan/#the-end-oh-no
>
>    If you want the TL;DR version for more context go to the top:
>      https://ajo.es/ovn-distributed-ew-on-vlan/
>
>    Where the VLAN/MAC combination lives is not really important, since we
>    never direct traffic to such mac, all the lrouter flow processing
>    happens in OpenFlow before leaving the host.
>
>    My worry here is: for a switch, is it enough to disable port
>    flapping protection, as we already have to do for L3HA (a MAC can
>    move around ports based on master/backup status)? Or, given the
>    higher rate of port flapping, can it be problematic? (For example,
>    the switch might log every port flap, but I don't know if that
>    would be the case.)
>
>
>    One solution for this could be:
>
> a)  Making sure that packets that leave the host carry, as source, the
> MAC address of the host's physical interface on the provider bridge where
> the Logical Switch has a localport attached.  It would be fine, since that
> MAC address is never matched on destination, but we would also need to
> restore the original MAC with another lflow when the packet arrives at the
> final Chassis.
>
>      (As far as I've been told, this is what neutron/dvr does for VLAN
>      tenant networks)
>
>
Yes, this solution is similar to the OpenStack Neutron reference
implementation of DVR E/W traffic (i.e., replacing the router internal
port MAC with the host MAC address when the packet is leaving the host).
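
As a rough illustration of why the rewrite helps, the toy model below
counts how often a physical switch would see a source MAC move between
ports. It is only a sketch under assumed values, not the Neutron or OVN
implementation: the function names, the MAC addresses, the VLAN id and
the port numbers are all hypothetical. Two chassis alternately send
routed E/W packets on the same VLAN, first with the shared router
internal port MAC as source, then with the per-host MAC swapped in on
egress and restored on the destination chassis.

```python
ROUTER_LEG_MAC = "fa:16:3e:00:00:01"   # hypothetical router internal port MAC
DST_VM_MAC = "fa:16:3e:00:00:bb"       # hypothetical destination VM MAC
VLAN = 200                             # hypothetical tenant VLAN id
# Hypothetical per-host MACs, keyed by the switch port each chassis uses.
CHASSIS_MAC = {1: "52:54:00:00:00:01", 2: "52:54:00:00:00:02"}


def egress_rewrite(packet, chassis_port, use_host_mac):
    """Swap the router MAC for the host MAC when the packet leaves the host."""
    if use_host_mac and packet["src"] == ROUTER_LEG_MAC:
        return dict(packet, src=CHASSIS_MAC[chassis_port])
    return packet


def ingress_restore(packet):
    """On the destination chassis, put the router MAC back before delivery."""
    if packet["src"] in CHASSIS_MAC.values():
        return dict(packet, src=ROUTER_LEG_MAC)
    return packet


def count_mac_moves(frames):
    """Count how often a (vlan, src MAC) shows up on a different port."""
    cam, moves = {}, 0
    for port, packet in frames:
        key = (packet["vlan"], packet["src"])
        if key in cam and cam[key] != port:
            moves += 1
        cam[key] = port
    return moves


def run(use_host_mac, n_packets=10):
    """Return (MAC moves seen by the switch, packets as delivered)."""
    frames, delivered = [], []
    for i in range(n_packets):
        port = 1 if i % 2 == 0 else 2  # the two chassis alternate
        routed = {"vlan": VLAN, "src": ROUTER_LEG_MAC,
                  "dst": DST_VM_MAC, "payload": i}
        on_wire = egress_rewrite(routed, port, use_host_mac)
        frames.append((port, on_wire))
        delivered.append(ingress_restore(on_wire))
    return count_mac_moves(frames), delivered


if __name__ == "__main__":
    print("router MAC on the wire:", run(use_host_mac=False)[0], "moves")
    print("host MAC on the wire:  ", run(use_host_mac=True)[0], "moves")
```

With the shared router MAC on the wire, the switch sees that MAC move on
nearly every packet. With the host MAC it sees none, and the destination
still receives packets whose source is the router MAC.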


> I plan to start working (with some help from Anil) on a follow up
> patch to make sure "1" does not happen, and then "2" if we confirm that's
> problematic.
>
> Best,
> Miguel Ángel.
>
>
>
> On Tue, Jul 10, 2018 at 8:25 AM, Miguel Angel Ajo Pelayo <
> majop...@redhat.com> wrote:
>
>> Anil, good work! Thank you.
>>
>> I'm reviewing the patches and the behaviour of the series to make sure
>> everything is all right.
>>
>> E/W distributed L3 routing over L2 is an interesting problem. I'm
>> documenting what I see so that I can share it on this thread.
>>
>> Best,
>> Miguel Ángel
>>
>>
>> On Mon, Jun 25, 2018 at 9:33 AM Anil Venkata <anilvenk...@redhat.com>
>> wrote:
>>
>>> On Sat, Jun 16, 2018 at 12:05 AM, Ben Pfaff <b...@ovn.org> wrote:
>>>
>>> > On Thu, Jun 07, 2018 at 02:59:46PM +0530, vkomm...@redhat.com wrote:
>>> > > From: Venkata Anil <vkomm...@redhat.com>
>>> > >
>>> > > This patch avoids tunneling and instead uses source tenant vlan network
>>> > > across hypervisors for traffic from vlan network on local hypervisor
>>> > > towards gateway hypervisor hosting redirect chassis port.
>>> > >
>>> > > On the local hypervisor, when the packet enters logical router ingress
>>> > > pipeline from tenant vlan network, router will set REGBIT_NAT_REDIRECT
>>> > > and redirect the packet to gateway hypervisor, which is hosting the
>>> > > chassis redirect port, using tenant vlan network.
>>> > > Packet travelling across hypervisors will have source vlan tag and
>>> > > distributed gateway port MAC as destination MAC (other packet data
>>> > > unchanged).
>>> > >
>>> > > Gateway hypervisor will check the vlan tag and destination MAC and
>>> > > resubmit it to router logical ingress pipeline for routing and finding
>>> > > the logical output port (i.e., it treats this packet as coming from the
>>> > > local patch port connected to tenant vlan network for routing).
>>> > >
>>> > > No changes done for return path as return path to source hypervisor
>>> > > always uses tenant vlan networks.
>>> >
>>> > Thanks a lot for revising the patch series.
>>> >
>>> > We've had a lot of churn in ovn-controller over the last week, and it
>>> > has caused some patch rejects for this patch series.  Would you mind
>>> > rebasing and reposting it?
>>> >
>>>
>>> Thanks Ben. Sorry for the delay, I was on vacation. I will rebase it now.
>>>
>>> Thanks
>>> Anil
>>> _______________________________________________
>>> dev mailing list
>>> d...@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>>
>