Hi, Dumitru:

-----Original Message-----
From: Dumitru Ceara <dce...@redhat.com> 
Sent: Monday, May 25, 2020 3:55 AM
To: Girish Moodalbail <gmoodalb...@gmail.com>; Han Zhou <zhou...@gmail.com>
Cc: Venugopal Iyer <venugop...@nvidia.com>; Tim Rozet <tro...@redhat.com>; Han 
Zhou <hz...@ovn.org>; Dan Winship <danwins...@redhat.com>; ovs-discuss 
<ovs-discuss@openvswitch.org>; ovn-kuberne...@googlegroups.com; Michael Cambria 
<mcamb...@redhat.com>
Subject: Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table

On 5/23/20 12:56 AM, Girish Moodalbail wrote:
>
>
> On Fri, May 22, 2020 at 1:51 PM Han Zhou <zhou...@gmail.com 
> <mailto:zhou...@gmail.com>> wrote:
>
>
>
>     On Fri, May 22, 2020 at 8:39 AM Venugopal Iyer
>     <venugop...@nvidia.com <mailto:venugop...@nvidia.com>> wrote:
>
>         A couple of comments below:
>
>
>
>
>         <vi> I suppose the use of GARP as a request vs. a reply is not
>         very clear; [1], Section 3 seems to offer a concise summary of
>         this. If the application sends the GARP as a reply we are
>         covered, but the question is what our response should be if
>         the GARP is a request (which is allowed). Tim is right, we
>         can't ignore the request (more so since aging is not currently
>         supported); however, "arp_accept" ignores the request only for
>         creating a new cache entry, not for updating an existing one
>         (see the last paragraph of [2] below).
>
>         [2]
>         arp_accept - BOOLEAN
>                 Define behavior for gratuitous ARP frames who's IP is not
>                 already present in the ARP table:
>                 0 - don't create new entries in the ARP table
>                 1 - create new entries in the ARP table
>
>                 Both replies and requests type gratuitous arp will trigger
>                 the ARP table to be updated, if this setting is on.
>
>                 If the ARP table already contains the IP address of the
>                 gratuitous arp frame, the arp table will be updated
>                 regardless if this setting is on or off.
>
>         <vi> If we look up and get a hit, we should still process the
>         GARP; only if we don't have a hit should we ignore it (instead
>         of creating an entry). BTW, do we update today? If I understand
>         the use of reg9[2] / REGBIT_LOOKUP_NEIGHBOR_RESULT correctly
>         (assuming lookup_arp returns 1 if the entry exists), I am not
>         sure it does. Maybe I missed it.
>
>         thanks,
>
>         -venu
>
>         [1] https://www.ietf.org/rfc/rfc5227.txt
>
>
>     (Not sure why the indent format of your reply is not correct at
>     least on my client - it mixes all previous replies together so one
>     cannot tell which part was from whom, so I truncated all of them.)
>
>     Thanks Venu. I think this would work: we can add an option similar
>     to, but different from, arp_accept (because it is not easy for OVN
>     to tell if it is a GARP on the ingress pipeline). The option can be
>     named something like learn_from_arp_request.
>     When an ARP request is received, always check whether an entry
>     already exists for the SPA. If it exists and the MAC is different,
>     update the mac-binding entry. If the entry doesn't exist, check the
>     option setting:
>     "true" - add a new entry.
>     "false" - if the TPA is on the router, add a new entry (it means the
>     remote wants to communicate with this node, so it makes sense to
>     learn the remote as well). Otherwise, ignore it and add no new entry.
>
>     Do you think this works?
>
>
> I think this should work as well.
>
> For the single join switch connected to 1000 GRs, it should work as 
> well (assuming your other fix for dynamic learning is present as well).
> However, in this case,  even with this option set we will still be 
> sending the ARP broadcast out from Node1 to each of the other 999 Nodes.
> After the packets have travelled through the tunnel, we are going to 
> drop the packet on the target hypervisor, if 
> `learn_from_arp_request=true'. As I understand, we are waiting for 
> reply from @Dumitru Ceara <mailto:dce...@redhat.com> to understand why 
> such a flow is required, correct?
>
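
For reference, the learn_from_arp_request behavior proposed above can be 
modelled roughly as follows. This is only a Python sketch of the decision 
logic, not ovn-northd code: the function name, the dict-based mac-binding 
table and the returned strings are all hypothetical.

```python
from typing import Dict, Set


def handle_arp_request(mac_bindings: Dict[str, str], router_ips: Set[str],
                       spa: str, sha: str, tpa: str,
                       learn_from_arp_request: bool) -> str:
    """Model of the proposed handling of an ARP request on a router.

    mac_bindings maps IP -> MAC (hypothetical stand-in for MAC_Binding).
    Returns "update", "keep", "add" or "ignore" (hypothetical labels).
    """
    known_mac = mac_bindings.get(spa)
    if known_mac is not None:
        # An entry for the SPA already exists: refresh it if the MAC changed.
        if known_mac != sha:
            mac_bindings[spa] = sha
            return "update"
        return "keep"
    # No entry yet: learn if the option is "true", or if the request
    # targets an IP owned by this router.
    if learn_from_arp_request or tpa in router_ips:
        mac_bindings[spa] = sha
        return "add"
    return "ignore"
```

With the option set to "false", a request whose TPA is not on the router 
creates no new entry, which is the case that matters for the 1000 GR topology.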

As Han pointed out, commit 32f5ebb062 ("ovn-northd: Limit ARP/ND broadcast 
domain whenever possible.") added logical flows in the LS S_SWITCH_IN_L2_LKUP 
stage to explicitly flood ARP/ND requests originated from router owned IP 
interfaces. This was done for a couple of reasons:

1. ARP requests for destinations/next-hops outside OVN need to be flooded in 
the broadcast domain anyway and would otherwise match the lowest priority rule 
in S_SWITCH_IN_L2_LKUP that would flood them nevertheless.

2. OVN sends periodic GARP requests for router owned IPs (i.e., NAT addresses 
and logical_router_port addresses) to update external switch/router FDB/ARP 
caches in scenarios like VM migration; see commit 6bfbb4c24187 ("ovn: Send 
GARP on localnet."). These packets should be flooded in the broadcast domain 
too.

I think we have a few options:

1. Change OVN behavior and use GARP replies instead of GARP requests.
The effect should be (almost [1]) the same from the external devices' 
perspective but the advantage is that we can completely remove the logical 
flows that match on self originated ARP packets. This is quite easy to achieve 
and I have a patch ready for it if we decide to go this way.

2. Make the flows that match on self originated ARP traffic more explicit and 
restrict them to GARP requests. For example, for a logical router port with 
addresses MAC, IP1, IP2 and NAT entries with external_mac MAC-E and external IP 
IP-E:

Right now we have a flow:
if "eth.src == {MAC, MAC-E} && (arp_req || nd_ns)" then "flood"

We could instead create:
if "eth.src == MAC && arp.tpa == {IP1, IP2} && arp_req" then "flood"
if "eth.src == MAC-E && arp.tpa == {IP-E} && arp_req" then "flood"
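
As a rough illustration of how the more explicit matches could be built per 
logical router port, here is a small Python sketch. It only formats the match 
strings shown above; the function names and the (external_mac, external_ip) 
tuple layout are hypothetical and do not correspond to ovn-northd code.

```python
from typing import List, Tuple


def garp_flood_match(mac: str, ips: List[str]) -> str:
    """Build one match of the form used in the example above."""
    ip_set = ", ".join(ips)
    return f"eth.src == {mac} && arp.tpa == {{{ip_set}}} && arp_req"


def router_port_flood_matches(mac: str, ips: List[str],
                              nat_entries: List[Tuple[str, str]]) -> List[str]:
    """One match for the port's own addresses, plus one per NAT entry
    given as (external_mac, external_ip)."""
    matches = [garp_flood_match(mac, ips)]
    for ext_mac, ext_ip in nat_entries:
        matches.append(garp_flood_match(ext_mac, [ext_ip]))
    return matches


if __name__ == "__main__":
    # Placeholder names, mirroring MAC, IP1, IP2, MAC-E and IP-E above.
    for match in router_port_flood_matches("MAC", ["IP1", "IP2"],
                                           [("MAC-E", "IP-E")]):
        print(f'if "{match}" then "flood"')
```

The number of matches grows with the number of router ports and NAT entries 
that carry an external_mac, so it stays linear in what is attached to the 
switch.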

I would prefer option 1 above but I'd like to hear more opinions about 
disadvantages of using GARP replies instead of GARP requests for OVN owned IP 
addresses.

<vi> I'd prefer option 1 too, unless we need to account for external devices, 
if any, that don't support unsolicited replies. In that case, we would have to 
wait until their cache entries time out.

Thanks,

-venu

Option 2 is also relatively straightforward to implement but will generate a 
few more logical flows, still O(N) though, with N="number of logical routers 
connected to the logical switch".

Thanks,
Dumitru

[1] https://tools.ietf.org/html/rfc5227#page-15


> Regards,
> ~Girish

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss