Han,

Just a quick question below.

________________________________________
From: ovn-kuberne...@googlegroups.com <ovn-kuberne...@googlegroups.com> on 
behalf of Girish Moodalbail <gmoodalb...@gmail.com>
Sent: Tuesday, May 19, 2020 11:09 PM
To: Han Zhou
Cc: Han Zhou; Dan Winship; ovs-discuss; ovn-kuberne...@googlegroups.com
Subject: Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table

Hello Han,

Please see in-line:

On Sat, May 16, 2020 at 11:17 PM Han Zhou <zhou...@gmail.com> wrote:


On Sat, May 16, 2020 at 12:13 PM Girish Moodalbail <gmoodalb...@gmail.com> wrote:
Hello Han,

Can you please explain how the dynamic resolution of the IP-to-MAC will work 
with this new option set?

Say the packet is being forwarded from router2 towards the distributed router. 
The nexthop (reg0) is then set to IP1, and we need to find the MAC address M1 
to set eth.dst to.

+----------------+        +----------------+
|   l3gateway    |        |   l3gateway    |
|    router2     |        |    router3     |
+-------------+--+        +-+--------------+
            IP2,M2         IP3,M3
              |             |
           +--+-------------+---+
           |    join switch     |
           +---------+----------+
                     |
                  IP1,M1
             +-------+--------+
             |  distributed   |
             |     router     |
             +----------------+

The MAC M1 will obviously not be in the MAC_Binding table. On the hypervisor 
where the packet originated, router2's port and the distributed router's port 
are both locally present. So, does this result in a PACKET_IN to 
ovn-controller, with the resolution happening there?

Yes, there will be a PACKET_IN, and then:
1. ovn-controller will generate the ARP request for IP1 and send a PACKET_OUT 
to OVS.
2. The ARP request will be delivered to the distributed router pipeline only, 
even though it is a broadcast, because OVN handles ARP requests for router 
port IPs specially. (Without that special handling it would have been 
broadcast to all GRs.)
3. The distributed router pipeline should learn the IP-MAC binding IP2-M2 
(through a PACKET_IN to ovn-controller) and, at the same time, send an ARP 
reply to router2 from within the distributed router pipeline.
4. The router2 pipeline will handle the ARP reply and learn the IP-MAC binding 
IP1-M1 (through a PACKET_IN to ovn-controller).
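The four steps above can be sketched as a small Python model. This is only an illustration of the sequence as described in this message, not OVN code: the `RouterPort` class and the `resolve` function are hypothetical names, and the addresses are the placeholders from the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class RouterPort:
    """A router port on the join switch (hypothetical model, not an OVN type)."""
    name: str
    ip: str
    mac: str
    mac_bindings: dict = field(default_factory=dict)  # learned IP -> MAC


def resolve(requester: RouterPort, target: RouterPort) -> list:
    """Walk through the four steps for `requester` resolving `target.ip`."""
    trace = []
    # Step 1: the MAC_Binding miss causes a PACKET_IN; ovn-controller builds
    # the ARP request and injects it with a PACKET_OUT.
    trace.append(f"{requester.name}: PACKET_IN, ARP request for {target.ip} via PACKET_OUT")
    # Step 2: the request goes only to the port that owns the target IP.
    trace.append(f"join switch: request delivered to {target.name} only")
    # Step 3: the target learns the requester's binding and replies.
    target.mac_bindings[requester.ip] = requester.mac
    trace.append(f"{target.name}: learned {requester.ip}-{requester.mac}, sent ARP reply")
    # Step 4: the requester learns the target's binding from the reply.
    requester.mac_bindings[target.ip] = target.mac
    trace.append(f"{requester.name}: learned {target.ip}-{target.mac}")
    return trace


router2 = RouterPort("router2", "IP2", "M2")
distributed = RouterPort("distributed router", "IP1", "M1")
for line in resolve(router2, distributed):
    print(line)
print(router2.mac_bindings)      # {'IP1': 'M1'}
print(distributed.mac_bindings)  # {'IP2': 'M2'}
```

In this model the exchange leaves exactly one binding on each side, which is the behaviour the steps describe when the request is not flooded.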

Unfortunately, the ARP request (who has IP1) from router2 is broadcast to all 
of the chassis through the Geneve tunnels. The other gateway routers learn the 
source MAC 'M2'. Now, each gateway router has an entry for (IP2, M2) in the 
MAC_Binding table on its respective rtoj-<blah> router port. So, the 
MAC_Binding table will now have N x N entries, where N is the number of 
gateway routers.
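To make the N x N figure concrete, here is a rough Python count. It is a model under assumptions that are not stated in the thread: every gateway router sends one ARP request for IP1, every other gateway router learns the sender from the flooded request, and the sender learns the distributed router from the reply. The function name and the `gr<i>` names are hypothetical.

```python
def broadcast_learning(n_gateway_routers: int) -> int:
    """Count MAC_Binding entries held by gateway routers when requests flood."""
    gateways = [f"gr{i}" for i in range(n_gateway_routers)]
    learned = {gr: set() for gr in gateways}  # per-GR set of learned neighbours
    for sender in gateways:
        # Every other gateway router sees the broadcast and learns the sender.
        for receiver in gateways:
            if receiver != sender:
                learned[receiver].add(sender)
        # The sender learns the distributed router from the ARP reply.
        learned[sender].add("distributed-router")
    return sum(len(entries) for entries in learned.values())


for n in (3, 10, 1000):
    print(n, broadcast_learning(n))
```

Under these assumptions each gateway router ends up with N entries (N - 1 peers plus the distributed router), so the total is N x N; with 1000 gateway routers that is 1,000,000 entries.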

Per your explanation above, the ARP request should not have been broadcast, right? 


<vi> This is probably obvious and I am missing it, but..
<vi> I see the lflow that directs the ARP request to the router port instead of 
<vi> broadcasting it. However, we also add flows to broadcast self-originated 
<vi> (unsolicited?) ARP requests (we should not see these for router IPs, I 
<vi> suppose). But given that we match only on the source MAC address of the 
<vi> packet in those flows, does that case differ from the ARP request 
<vi> generated for the router IP?

thanks,

-venu

Note that the direction of the ARP request is from the Gateway Router to the 
Distributed Router.

Regards,
~Girish




How would the resolution of IP3-to-M3 happen on gateway router2? Will an ARP 
request packet be broadcast on the join switch in this case?

I think in the use case of ovn-k8s, as you described before, this should not 
happen. However, if it does happen, it is similar to the steps above, except 
that in steps 2) and 3) the ARP request and reply will be sent between the 
chassis through the tunnel. If this happens between all pairs of GRs, then 
there will again be O(n^2) MAC_Binding entries.

I haven't tested the GR scenario yet, so I can't guarantee it works as 
expected. Please let me know if you see any problems. I will submit a formal 
patch with more test cases if it is confirmed in your environment.

Thanks,
Han


Regards,
~Girish

On Sat, May 16, 2020 at 10:25 AM Girish Moodalbail <gmoodalb...@gmail.com> wrote:


On Sat, May 16, 2020 at 12:36 AM Han Zhou <zhou...@gmail.com> wrote:


On Tue, May 5, 2020 at 11:57 AM Han Zhou <hz...@ovn.org> wrote:
>
>
>
> On Fri, May 1, 2020 at 2:14 PM Dan Winship <danwins...@redhat.com> wrote:
> >
> > On 5/1/20 12:37 PM, Girish Moodalbail wrote:
> > > If we now look at table=12 (lr_in_arp_resolve) in the ingress pipeline
> > > of Gateway Router-1, then you will see that there will be 2000 logical
> > > flow entries...
> >
> > > In the topology above, the only intended path is North-South between
> > > each gateway router and the logical router. There is no east-west
> > > traffic between the gateway routers
> >
> > > Is there another way to solve the above problem while keeping just the
> > > single join logical switch?
> >
> > Two thoughts:
> >
> > 1. In openshift-sdn, the bridge doesn't try to handle ARP itself. It
> > just lets ARP requests pass through normally, and lets ARP replies pass
> > through normally as long as they are correct (ie, it doesn't let
> > spoofing through). This means fewer flows but more traffic. Maybe that's
> > the right tradeoff?
> >
> The 2M entries here are not for the ARP responder; they are closer to the 
> neighbour table (or ARP cache) on each LR. The ARP responder resides in the 
> LS (join logical switch) and is O(n) instead of O(n^2), so it is not a 
> problem here.
>
> However, a similar idea may work here to avoid the O(n^2) scale issue. The 
> neighbour table in OVN actually has two parts: one is statically built, 
> which is the 2M entries mentioned in this case, and the other is the dynamic 
> ARP resolution, i.e. the MAC_Binding table, which is dynamically populated by 
> handling ARP messages. To solve the problem here, it is possible to change 
> OVN to support configuring an LR to skip the static neighbour table and rely 
> only on dynamic ARP resolution. In this case, all the gateway routers can be 
> configured not to use static ARP resolution, and eventually there will be 
> only 2 entries (one for IPv4 and one for IPv6) for each gateway router in the 
> MAC_Binding table for the north-south traffic to the join router. (Of course, 
> each router will still have the same number of MAC bindings for the 
> external traffic on the other side of the gateway routers.)
>
> This change seems straightforward, but I am not sure whether there are any 
> corner cases.
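The arithmetic behind the static and dynamic cases can be written out as a short Python sketch. It assumes 1000 gateway routers (inferred from the 2000 flows per router and the 2M total quoted in this thread) and two address families, and it approximates the static count as one entry per gateway router per address family. The function names are hypothetical.

```python
def static_entries(n_gateway_routers: int, address_families: int = 2):
    """Static neighbour entries: each GR gets one entry per GR per family."""
    per_router = n_gateway_routers * address_families
    return per_router, per_router * n_gateway_routers


def dynamic_entries(n_gateway_routers: int, address_families: int = 2):
    """Dynamic-only case: each GR resolves just the distributed router."""
    per_router = address_families
    return per_router, per_router * n_gateway_routers


n = 1000  # assumed number of gateway routers
print("static  (per router, total):", static_entries(n))
print("dynamic (per router, total):", dynamic_entries(n))
```

With these assumptions the static case gives 2000 entries per router and 2,000,000 in total, while the dynamic-only case gives 2 per router and 2000 in total, which is the O(n^2) versus O(n) difference described above.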

Hi Girish,

I've sent an RFC patch here for the above proposal: 
https://patchwork.ozlabs.org/project/openvswitch/patch/1589614395-99499-1-git-send-email-hz...@ovn.org/
For this use case, just set options:dynamic_neigh_routes=true for all the 
Gateway Routers. Could you try it in your scale environment and see if it 
solves the problem?
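As a rough sketch of applying that setting to many routers, the Python below only builds the command lines and prints them; it does not run anything. The option name is copied from this message and applies to the RFC patch, so it is an assumption that a given build accepts it. The router names `GR_node1` and `GR_node2` and the helper name are hypothetical; `ovn-nbctl set Logical_Router <name> options:<key>=<value>` is the generic database command form.

```python
import shlex

# Option as named in this message; valid only with the RFC patch applied.
OPTION = "options:dynamic_neigh_routes=true"


def dynamic_neigh_commands(gateway_routers):
    """Return one ovn-nbctl argv list per gateway router name."""
    return [
        ["ovn-nbctl", "set", "Logical_Router", name, OPTION]
        for name in gateway_routers
    ]


# Hypothetical router names; substitute the names used in your deployment.
for argv in dynamic_neigh_commands(["GR_node1", "GR_node2"]):
    print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before running them, for example through `subprocess.run(argv, check=True)` on a host with access to the northbound database.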

Thanks,
Han

>
> > 2. In most places in ovn-kubernetes, our MAC addresses are
> > programmatically related to the corresponding IP addresses, and in
> > places where that's not currently true, we could try to make it true,
> > and then perhaps the thousands of rules could just be replaced by a
> > single rule?
> >
> This may be a good idea, but I am not sure how to implement it in OVN in a 
> generic way, since most OVN users can't make such an assumption.
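To illustrate what "programmatically related" could mean, here is a minimal Python sketch. The scheme is an assumption chosen for illustration: a fixed two-byte prefix followed by the four IPv4 octets. The prefix value `0a:58`, the function names, and the sample address are not taken from this thread.

```python
import ipaddress

PREFIX = bytes((0x0A, 0x58))  # illustrative two-byte prefix (assumption)


def mac_from_ipv4(ip: str) -> str:
    """Derive a MAC by appending the four IPv4 octets to the prefix."""
    octets = ipaddress.IPv4Address(ip).packed
    return ":".join(f"{b:02x}" for b in PREFIX + octets)


def ipv4_from_mac(mac: str) -> str:
    """Invert the mapping: drop the prefix and read back the IPv4 address."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6 or raw[:2] != PREFIX:
        raise ValueError(f"{mac} does not follow the assumed scheme")
    return str(ipaddress.IPv4Address(raw[2:]))


mac = mac_from_ipv4("100.64.0.2")
print(mac)                 # 0a:58:64:40:00:02
print(ipv4_from_mac(mac))  # 100.64.0.2
```

When such a mapping holds for every port, the MAC can be computed from the next-hop IP instead of looked up, which is the property that would let a single rule replace the per-neighbour ones. As noted above, a generic OVN deployment cannot assume it.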
>
> On the other hand, why wouldn't splitting the join logical switch into 1000 
> LSes solve the problem? I understand that there will be 1000 more datapaths 
> and 1000 more LRPs, but these are all O(n), which is much more efficient than 
> the O(n^2) explosion. What other scale issues would this create?
>
> In addition, Girish, for the external LS, I am not sure why it can't be 
> shared if all the nodes are connected to a single L2 network. (If they are 
> connected to separate L2 networks, different external LSes should be created, 
> at least according to the current OVN model.)

Thanks Han for the patch. Will give it a try and let you know.

Regards,
~Girish

>
> Thanks,
> Han
