On Thu, May 21, 2020 at 7:12 PM Girish Moodalbail <gmoodalb...@gmail.com>
wrote:

>
>
> On Thu, May 21, 2020 at 6:58 PM Tim Rozet <tro...@redhat.com> wrote:
>
>> On Thu, May 21, 2020 at 8:45 PM Venugopal Iyer <venugop...@nvidia.com>
>> wrote:
>>
>>> Hi, Han:
>>>
>>> ________________________________________
>>> From: ovn-kuberne...@googlegroups.com <ovn-kuberne...@googlegroups.com>
>>> on behalf of Han Zhou <zhou...@gmail.com>
>>> Sent: Thursday, May 21, 2020 4:42 PM
>>> To: Tim Rozet
>>> Cc: Venugopal Iyer; Dumitru Ceara; Girish Moodalbail; Han Zhou; Dan
>>> Winship; ovs-discuss; ovn-kuberne...@googlegroups.com; Michael Cambria
>>> Subject: Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve
>>> table
>>>
>>>
>>>
>>> On Thu, May 21, 2020 at 2:35 PM Tim Rozet <tro...@redhat.com> wrote:
>>> I think that if you directly connect GR to DR you don't need to learn
>>> any ARP with packet_in and you can preprogram the static entries. Each GR
>>> will have 1 entry for the DR, while the DR will have N number of entries for
>>> N nodes.
>>>
>>> Hi Tim, as mentioned by Girish, directly connecting GRs to DR requires N
>>> ports on the DR and also requires a lot of small subnets, which is not
>>> desirable. And since changes are needed anyway in OVN to support that, we
>>> moved forward with the current approach of avoiding the static ARP flows to
>>> solve the problem instead of directly connecting GRs to DR.
>>>
>> Why is that not desirable? They are all private subnets with /30 (if
>> using IPv4). If IPv6, it's even less of a concern from an addressing
>> perspective.
>>
>
> It is not just about the subnet management but also the additional logical
> flows that get created, which differ between the two ways of connecting DR
> and GR.
>
> Say, we have a fix that efficiently allows one to connect 1000s of GR
> using a single logical switch, then would you rather use that instead of
> 1000 patch cables connecting a GR to DR? It is not only the issue of Subnet
> Management for those 1000 point-to-point connections but also those 1000
> patch ports are local to each of the chassis, so we need to understand in
> such a topology how many additional logical flows get created in the SB and
> how many OpenFlow flows get created on each of the 1000 chassis for those
> 1000 patch cables.
>
>
>>
>>> The real issue with ARP learning comes from the GR-----External. You have
>>> to learn these, and from my conversation with Girish it seems like every GR
>>> is adding an entry on every ARP request it sees. This means 1 GR sends ARP
>>> request to external L2 network and every GR sees the ARP request and adds
>>> an entry. I think the behavior should be:
>>>
>>> GRs only add ARP entries when:
>>>
>>>   1.  An ARP Response is sent to it
>>>   2.  The GR receives a GARP broadcast, and already has an entry in its
>>> cache for that IP (Girish mentioned this is similar to Linux arp_accept
>>> behavior)
>>>
>>> For 2), it is expensive to do in OVN because OpenFlow doesn't support a
>>> match condition of "field1 == field2", which is required to check if the
>>> incoming ARP request is a GARP, i.e. SPA == TPA. However, it is ok to
>>> support something similar like linux arp_accept configuration but slightly
>>> different. In OVN we can configure it to allow/disable learning from all
>>> ARP requests to IPs not belonging to the router, including GARPs. Would
>>> that solve the problem here? (@Venugopal Iyer brought up the same thing
>>> about "arp_accept". I hope this reply addresses that as well)
>>>
>>
>> I think the issue there is if you have an external device, which is using
>> a VIP and it fails over, it will usually send GARP to inform of the mac
>> change. In this case if you ignore GARP, what happens? You won't send
>> another ARP because OVN programs the arp entry forever and doesn't expire
>> it right? So you won't learn the new mac and keep sending packets to a dead
>> mac?
>>
>
> I think we will have to support GARP otherwise VIPs will not work like Tim
> mentions. If we do learn from GARP and as long as the GARP itself is not
> originated by any of the 1000s GRs, then we should be fine.
>
Right, I didn't think this through. I thought it was just a configurable
option, but it seems we will always need to support GARP, so the option
becomes useless.
However, there is no easy way to achieve "do learn from GARP and as long
as the GARP itself is not originated by any of the 1000s GRs", because OVN
has no knowledge of the use case. The requirement amounts to: don't learn
neighbours from ARP requests if the ARP's source belongs to an OVN router.
Firstly, this requirement is hard to understand for users outside this
particular ovn-k8s setup. Secondly, implementing it already requires O(n^2)
flows just to bypass the OVN-owned router IPs, which defeats the purpose of
solving the original problem. We will have to figure out a clean way.
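
To put rough numbers on the O(n^2) concern, here is a back-of-the-envelope
sketch. It assumes (my simplification, not taken from ovn-northd) that each
of the n routers needs one bypass flow per other OVN-owned router IP, and it
compares that with the point-to-point layout described earlier in the thread
(1 entry per GR for the DR, N entries on the DR). The function names are
made up for illustration.

```python
def bypass_flows(n: int) -> int:
    """Flows needed if each of n routers carries one flow per *other*
    OVN-owned router IP (assumed model: n * (n - 1))."""
    return n * (n - 1)


def point_to_point_entries(n: int) -> int:
    """Static entries when every GR is patched directly to the DR:
    one entry on each GR for the DR, plus n entries on the DR."""
    return n + n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, bypass_flows(n), point_to_point_entries(n))
```

The real flow counts depend on how the logical flows are generated and
translated to OpenFlow, so treat this only as an illustration of the growth
rate, not as a measurement.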

For the internal join-switch this is easier. I think allowing LRs to
broadcast only GARP requests and ARP requests for unknown IPs (all others
will be unicast) will solve the problem. But for the external logical
switch, I have no idea. Can it be handled from the operator's side, by
initiating a ping from the external network to the GR, so that the GR learns
the external GW IP-MAC binding before sending a broadcast to all neighbours?
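
For reference, the learning rules proposed earlier in the thread (learn from
an ARP reply sent to the router; learn from a GARP, i.e. SPA == TPA, only if
an entry for that IP already exists) can be modelled in a few lines. This is
a plain-Python toy model, not OVN code; the `Arp` and `Router` classes and
the addresses are hypothetical and exist only to show the intended behaviour,
including the VIP failover case.

```python
from dataclasses import dataclass, field

ARP_REQUEST, ARP_REPLY = 1, 2  # ARP opcodes (RFC 826)


@dataclass
class Arp:
    op: int
    spa: str  # sender protocol (IP) address
    sha: str  # sender hardware (MAC) address
    tpa: str  # target protocol (IP) address


@dataclass
class Router:
    ip: str
    neighbors: dict = field(default_factory=dict)  # IP -> MAC

    def handle(self, pkt: Arp) -> bool:
        """Apply the two rules; return True if the cache changed."""
        is_garp = pkt.spa == pkt.tpa
        reply_to_me = pkt.op == ARP_REPLY and pkt.tpa == self.ip
        known_garp = is_garp and pkt.spa in self.neighbors
        if not (reply_to_me or known_garp):
            return False  # e.g. another GR's broadcast ARP request
        if self.neighbors.get(pkt.spa) == pkt.sha:
            return False  # binding already up to date
        self.neighbors[pkt.spa] = pkt.sha
        return True


if __name__ == "__main__":
    gr = Router(ip="10.0.0.2")
    # Broadcast request from another GR: not learned.
    print(gr.handle(Arp(ARP_REQUEST, "10.0.0.3", "0a:00:00:00:00:03", "10.0.0.1")))
    # Reply addressed to this GR: learned.
    print(gr.handle(Arp(ARP_REPLY, "10.0.0.1", "0a:00:00:00:00:01", "10.0.0.2")))
    # GARP after a VIP failover for a known IP: binding updated.
    print(gr.handle(Arp(ARP_REQUEST, "10.0.0.1", "0a:00:00:00:00:99", "10.0.0.1")))
```

Note that the model checks SPA == TPA directly, which is exactly the
field-to-field comparison said above to be expensive to express in OpenFlow,
so it describes the desired semantics rather than a way to implement them.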


> Regards,
> ~Girish
>
>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
