A couple of comments below:

________________________________________
From: ovn-kuberne...@googlegroups.com <ovn-kuberne...@googlegroups.com> on 
behalf of Han Zhou <zhou...@gmail.com>
Sent: Thursday, May 21, 2020 7:43 PM
To: Girish Moodalbail
Cc: Tim Rozet; Venugopal Iyer; Dumitru Ceara; Han Zhou; Dan Winship; 
ovs-discuss; ovn-kuberne...@googlegroups.com; Michael Cambria
Subject: Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table

External email: Use caution opening links or attachments



On Thu, May 21, 2020 at 7:12 PM Girish Moodalbail 
<gmoodalb...@gmail.com> wrote:


On Thu, May 21, 2020 at 6:58 PM Tim Rozet <tro...@redhat.com> wrote:
On Thu, May 21, 2020 at 8:45 PM Venugopal Iyer <venugop...@nvidia.com> wrote:
Hi, Han:

________________________________________
From: ovn-kuberne...@googlegroups.com <ovn-kuberne...@googlegroups.com> on 
behalf of Han Zhou <zhou...@gmail.com>
Sent: Thursday, May 21, 2020 4:42 PM
To: Tim Rozet
Cc: Venugopal Iyer; Dumitru Ceara; Girish Moodalbail; Han Zhou; Dan Winship; 
ovs-discuss; ovn-kuberne...@googlegroups.com; Michael Cambria
Subject: Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table




On Thu, May 21, 2020 at 2:35 PM Tim Rozet <tro...@redhat.com> wrote:
I think that if you directly connect GR to DR you don't need to learn any ARP 
with packet_in and you can preprogram the static entries. Each GR will have 1 
entry for the DR, while the DR will have N entries for N nodes.

Hi Tim, as mentioned by Girish, directly connecting GRs to DR requires N ports 
on the DR and also requires a lot of small subnets, which is not desirable. And 
since changes are needed anyway in OVN to support that, we moved forward with 
the current approach of avoiding the static ARP flows to solve the problem 
instead of directly connecting GRs to DR.

Why is that not desirable? They are all private subnets with /30 (if using 
ipv4). If IPv6, it's even less of a concern from an addressing perspective.

It is not just about subnet management but also about the additional logical 
flows that get created, which differ between the two ways of connecting DR and 
GR.

Say we have a fix that efficiently allows one to connect 1000s of GRs using a 
single logical switch. Would you rather use that instead of 1000 patch cables, 
each connecting a GR to the DR? It is not only the issue of subnet management 
for those 1000 point-to-point connections; those 1000 patch ports are also 
local to each of the chassis, so we need to understand, in such a topology, how 
many additional logical flows get created in the SB and how many OpenFlow flows 
get created on each of the 1000 chassis for those 1000 patch cables.


The real issue with ARP learning comes from the GR-----External link. You have 
to learn these, and from my conversation with Girish it seems like every GR is 
adding an entry on every ARP request it sees. This means one GR sends an ARP 
request to the external L2 network, and every GR sees that request and adds an 
entry. I think the behavior should be:

GRs only add ARP entries when:

  1.  An ARP response is sent to it
  2.  The GR receives a GARP broadcast, and already has an entry in its cache 
for that IP (Girish mentioned this is similar to Linux arp_accept behavior)
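
A minimal sketch of the two learning rules above, written in Python rather than 
as OVN logical flows. The ArpPacket type, the should_learn helper and the 
dict-based cache are hypothetical names used only for illustration; they are 
not part of OVN.

```python
from dataclasses import dataclass

ARP_REQUEST = 1
ARP_REPLY = 2


@dataclass
class ArpPacket:
    op: int    # ARP_REQUEST or ARP_REPLY
    spa: str   # sender protocol (IP) address
    sha: str   # sender hardware (MAC) address
    tpa: str   # target protocol (IP) address


def is_garp(pkt: ArpPacket) -> bool:
    # A gratuitous ARP announces the sender's own binding: SPA == TPA.
    return pkt.spa == pkt.tpa


def should_learn(pkt: ArpPacket, router_ips: set, cache: dict) -> bool:
    # Rule 1: an ARP reply addressed to one of this router's own IPs.
    if pkt.op == ARP_REPLY and pkt.tpa in router_ips:
        return True
    # Rule 2: a GARP may refresh an existing entry but never creates one.
    if is_garp(pkt) and pkt.spa in cache:
        return True
    # Anything else, e.g. another GR's broadcast request, is not learned.
    return False
```

Under these rules a broadcast ARP request from one GR would not add an entry 
on the other GRs, while a failover GARP for an already known VIP would still 
be accepted.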

For 2), it is expensive to do in OVN because OpenFlow doesn't support a match 
condition of "field1 == field2", which is required to check whether the 
incoming ARP request is a GARP, i.e. SPA == TPA. However, it is OK to support 
something similar to the Linux arp_accept configuration, though slightly 
different. In OVN we can make it configurable whether to learn from all ARP 
requests to IPs not belonging to the router, including GARPs. Would that solve 
the problem here? (@Venugopal Iyer <venugop...@nvidia.com> brought up the same 
thing about "arp_accept". I hope this reply addresses that as well.)

I think the issue there is if you have an external device which is using a VIP 
and it fails over, it will usually send a GARP to inform of the MAC change. In 
this case, if you ignore the GARP, what happens? You won't send another ARP 
because OVN programs the ARP entry forever and doesn't expire it, right? So you 
won't learn the new MAC and will keep sending packets to a dead MAC?

I think we will have to support GARP, otherwise VIPs will not work, as Tim 
mentions. If we do learn from GARP, and as long as the GARP itself is not 
originated by any of the 1000s of GRs, then we should be fine.

Right, I didn't think this through. I thought it was just a configurable 
option, but it seems we will always need to support GARP, so the option becomes 
useless.
However, there is no easy way to achieve "do learn from GARP as long as the 
GARP itself is not originated by any of the 1000s of GRs", because OVN doesn't 
have knowledge of the use case. The requirement amounts to: don't learn 
neighbours from ARP requests if the ARP's source belongs to an OVN router. 
Firstly, this requirement is hard to understand for users outside this 
particular ovn-k8s setup. Secondly, implementing it already requires O(n^2) 
flows just to bypass the OVN-owned router IPs, which does not help with the 
original problem. We will have to figure out a clean way.
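
To put rough numbers on the O(n^2) concern, here is a back-of-the-envelope 
sketch in Python. It assumes one bypass flow per (router, other router's IP) 
pair, which is a simplification for illustration and not a measured OVN 
figure; the function name is hypothetical.

```python
def bypass_flow_count(num_routers: int) -> int:
    # Assumption: each router needs one flow for every *other* router's IP.
    return num_routers * (num_routers - 1)


for n in (10, 100, 1000):
    print(n, bypass_flow_count(n))
```

Under that assumption, 1000 GRs would need 999000 such flows.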


<vi> I suppose the use of GARP as a reply vs. a request is not very clear; [1],
<vi> Section 3 seems to offer a concise summary of this. If the application
<vi> sends GARP as a reply we are covered, but the question is what our response
<vi> should be if the GARP is a request (which is allowed). Tim is right, we
<vi> can't ignore the request (more so since aging is not currently supported);
<vi> however, "arp_accept" only ignores the request for creating a new cache
<vi> entry, not for updating an existing one (see the last paragraph below).

[2]
arp_accept - BOOLEAN
        Define behavior for gratuitous ARP frames who's IP is not
        already present in the ARP table:
        0 - don't create new entries in the ARP table
        1 - create new entries in the ARP table

        Both replies and requests type gratuitous arp will trigger the
        ARP table to be updated, if this setting is on.

        If the ARP table already contains the IP address of the
        gratuitous arp frame, the arp table will be updated regardless
        if this setting is on or off.
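
A minimal sketch of the behaviour described in the text above, assuming a 
plain dict as the ARP table. The handle_garp function is a hypothetical 
illustration, not kernel or OVN code.

```python
def handle_garp(table: dict, ip: str, mac: str, arp_accept: bool) -> bool:
    """Apply a gratuitous ARP to `table`; return True if the table changed."""
    if ip in table:
        # Existing entry: always updated, whatever arp_accept says.
        changed = table[ip] != mac
        table[ip] = mac
        return changed
    if arp_accept:
        # Unknown IP: a new entry is created only when arp_accept is on.
        table[ip] = mac
        return True
    # Unknown IP and arp_accept off: the frame is ignored.
    return False
```

With arp_accept off, a failover GARP for a known VIP still replaces the stored 
MAC, while a GARP for an IP that was never resolved leaves the table untouched.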

<vi> If we look up and get a hit, we should still process the GARP; only if we
<vi> don't have a hit should we ignore it (instead of creating an entry). BTW,
<vi> do we update today? If I understand the use of reg9[2] /
<vi> REGBIT_LOOKUP_NEIGHBOR_RESULT correctly (assuming lookup_arp returns 1 if
<vi> the entry exists), I am not sure it does. Maybe I missed it.

thanks,

-venu

[1] https://www.ietf.org/rfc/rfc5227.txt


For the internal join-switch this is easier. I think allowing LRs to broadcast 
only GARP requests and ARP requests for unknown IPs (all others would be 
unicast) will solve the problem. But for the external logical switch, I have 
no idea. Can it be handled from the operator's perspective, by initiating a 
ping from the external side to the GR, so that the GR learns the external GW 
IP-MAC binding before sending a broadcast to all neighbours?

Regards,
~Girish


_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
