Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table

Tim Rozet Thu, 21 May 2020 14:35:54 -0700

I think that if you directly connect each GR to the DR, you don't need to
learn any ARP with packet_in and can preprogram the static entries. Each GR
will have 1 entry for the DR, while the DR will have N entries for N
nodes.
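A back-of-the-envelope count of static neighbour entries shows why the direct connection scales linearly. This sketch assumes one address family and one router port per router on the link. Both assumptions are only for illustration, and the function names are hypothetical.

```python
def shared_join_switch_entries(n_nodes: int) -> int:
    """Static neighbour entries when n_nodes GRs and one DR share one join switch.

    Every router pre-programs an entry for every other router port on the
    switch, so each of the n_nodes + 1 routers holds n_nodes entries.
    """
    routers = n_nodes + 1
    return routers * n_nodes


def direct_connect_entries(n_nodes: int) -> int:
    """Static neighbour entries when each GR is connected directly to the DR.

    Each GR holds 1 entry (for the DR), and the DR holds 1 entry per GR.
    """
    gr_entries = n_nodes * 1
    dr_entries = n_nodes
    return gr_entries + dr_entries


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, shared_join_switch_entries(n), direct_connect_entries(n))
```

With 1000 nodes this gives 1,001,000 entries for the shared switch versus 2,000 for direct links. Doubling for IPv4 plus IPv6 lands near the "2M entries" figure quoted further down the thread.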


The real issue with ARP learning comes from the GR-----External side. You
have to learn these, and from my conversation with Girish it seems like
every GR adds an entry for every ARP request it sees. This means that when
1 GR sends an ARP request to the external L2 network, every GR sees the ARP
request and adds an entry. I think the behavior should be:

GRs only add ARP entries when:

   1. An ARP *Response* is sent to it
   2. The GR receives a GARP broadcast and already has an entry in its
   cache for that IP (Girish mentioned this is similar to the Linux
   arp_accept behavior)
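The two learning rules above can be expressed as a single predicate. This is a minimal sketch, not OVN code. The `ArpPacket` type, the `should_learn` helper, and the idea of passing the router's own IPs and current cache are hypothetical, chosen only to show the decision logic. The GARP test (sender IP equals target IP) is the usual way to recognize a gratuitous ARP.

```python
from dataclasses import dataclass

ARP_REQUEST = 1  # ARP opcode for a request
ARP_REPLY = 2    # ARP opcode for a reply


@dataclass(frozen=True)
class ArpPacket:
    op: int    # ARP opcode
    spa: str   # sender protocol (IP) address
    sha: str   # sender hardware (MAC) address
    tpa: str   # target protocol (IP) address


def is_garp(pkt: ArpPacket) -> bool:
    # A gratuitous ARP announces the sender's own address: SPA == TPA.
    return pkt.spa == pkt.tpa


def should_learn(pkt: ArpPacket, cache: dict[str, str], my_ips: set[str]) -> bool:
    """Hypothetical policy: learn only from rule 1 or rule 2 above."""
    # Rule 1: an ARP reply addressed to one of this router's IPs.
    if pkt.op == ARP_REPLY and pkt.tpa in my_ips:
        return True
    # Rule 2: a GARP that refreshes an IP this router already has cached.
    if is_garp(pkt) and pkt.spa in cache:
        return True
    # Everything else (e.g. other routers' requests flooded on the L2
    # network) is ignored, so bystander GRs do not grow their caches.
    return False
```

The key design choice is that a GARP can only refresh an existing entry, never create one, so a single broadcast cannot add a row to every GR.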

In addition, as Michael Cambria pointed out in our weekly meeting, these
ARP cache entries should have expiry timers on them. If they are
permanently learned, you will end up with a growing ARP table over time,
and end up in the same place. We can probably just program the GR ARP flows
with an idle_timeout so that unused flows are removed. What do you think?
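The expiry idea mirrors OpenFlow's `idle_timeout`: a flow that sees no traffic for that many seconds is removed, and each hit resets the timer. The sketch below models that behaviour for an ARP cache. `IdleTimeoutArpCache` and its methods are hypothetical names, and the injectable clock is there only to make the behaviour easy to test.

```python
import time
from typing import Callable, Optional


class IdleTimeoutArpCache:
    """ARP cache whose entries expire after idle_timeout seconds without a hit."""

    def __init__(self, idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout = idle_timeout
        self.clock = clock
        # ip -> (mac, time of last use)
        self._entries: dict[str, tuple[str, float]] = {}

    def learn(self, ip: str, mac: str) -> None:
        self._entries[ip] = (mac, self.clock())

    def lookup(self, ip: str) -> Optional[str]:
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, last_used = entry
        now = self.clock()
        if now - last_used >= self.idle_timeout:
            del self._entries[ip]  # expired, like a flow removed on idle_timeout
            return None
        self._entries[ip] = (mac, now)  # a hit resets the idle timer
        return mac

    def expire(self) -> int:
        """Drop every idle entry; return how many were removed."""
        now = self.clock()
        stale = [ip for ip, (_, used) in self._entries.items()
                 if now - used >= self.idle_timeout]
        for ip in stale:
            del self._entries[ip]
        return len(stale)

    def __len__(self) -> int:
        return len(self._entries)
```

Because active neighbours keep refreshing their entries, the table size tracks the set of peers in recent use rather than every peer ever seen.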

Should I file a bugzilla outlining the above so we can have proper tracking?

Thanks,

Tim Rozet
Red Hat CTO Networking Team


On Thu, May 21, 2020 at 5:01 PM Han Zhou <zhou...@gmail.com> wrote:

>
>
> On Thu, May 21, 2020 at 10:33 AM Venugopal Iyer <venugop...@nvidia.com>
> wrote:
>
>> Han,
>>
>> just a quick question below..
>>
>> ________________________________________
>> From: ovn-kuberne...@googlegroups.com <ovn-kuberne...@googlegroups.com>
>> on behalf of Girish Moodalbail <gmoodalb...@gmail.com>
>> Sent: Tuesday, May 19, 2020 11:09 PM
>> To: Han Zhou
>> Cc: Han Zhou; Dan Winship; ovs-discuss; ovn-kuberne...@googlegroups.com
>> Subject: Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table
>>
>>
>> Hello Han,
>>
>> Please see in-line:
>>
>> On Sat, May 16, 2020 at 11:17 PM Han Zhou <zhou...@gmail.com> wrote:
>>
>>
>> On Sat, May 16, 2020 at 12:13 PM Girish Moodalbail <gmoodalb...@gmail.com>
>> wrote:
>> Hello Han,
>>
>> Can you please explain how the dynamic resolution of the IP-to-MAC will
>> work with this new option set?
>>
>> Say the packet is being forwarded from router2 towards the distributed
>> router. The nexthop (reg0) is set to IP1, and we need to find the MAC
>> address M1 to set eth.dst to.
>>
>> +----------------+        +----------------+
>> |   l3gateway    |        |   l3gateway    |
>> |    router2     |        |    router3     |
>> +-------------+--+        +-+--------------+
>>             IP2,M2         IP3,M3
>>               |             |
>>            +--+-------------+---+
>>            |    join switch     |
>>            +---------+----------+
>>                      |
>>                   IP1,M1
>>              +-------+--------+
>>              |  distributed   |
>>              |     router     |
>>              +----------------+
>>
>> The MAC M1 will obviously not be in the MAC_Binding table. On the
>> hypervisor where the packet originated, router2's port and the distributed
>> router's port are both locally present. So, does this result in a
>> PACKET_IN to ovn-controller, with the resolution happening there?
>>
>> Yes, there will be a PACKET_IN, and then:
>> 1. ovn-controller will generate the ARP request for IP1 and send a
>> PACKET_OUT to OVS.
>> 2. The ARP request will be delivered only to the distributed router
>> pipeline, even though it is a broadcast, because OVN handles ARP for
>> router port IPs specially. (Without that special handling, it would have
>> been broadcast to all GRs.)
>> 3. The distributed router pipeline should learn the IP-MAC binding of
>> IP2-M2 (through a PACKET_IN to ovn-controller), and at the same time send
>> an ARP reply to router2.
>> 4. The router2 pipeline will handle the ARP response and learn the IP-MAC
>> binding of IP1-M1 (through a PACKET_IN to ovn-controller).
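A toy walk-through of steps 1 to 4 makes it explicit which router ends up holding which binding. This is a minimal model, not ovn-controller logic. The `Router` type and `resolve_nexthop` function are hypothetical, each router's `mac_bindings` dict stands in for its MAC_Binding rows, and the IP/MAC labels come from the diagram above.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    name: str
    ip: str
    mac: str
    # Stand-in for this router's MAC_Binding rows: IP -> MAC.
    mac_bindings: dict[str, str] = field(default_factory=dict)


def resolve_nexthop(src: Router, dst: Router, log: list[str]) -> str:
    """src needs the MAC for dst.ip; returns the MAC it learns."""
    # Step 1: lookup misses -> PACKET_IN -> ovn-controller emits an ARP request.
    log.append(f"{src.name}: ARP who-has {dst.ip} tell {src.ip}")
    # Step 2: the request is delivered only to the router that owns dst.ip.
    # Step 3: that router learns the requester's binding and sends a reply.
    dst.mac_bindings[src.ip] = src.mac
    log.append(f"{dst.name}: ARP reply {dst.ip} is-at {dst.mac}")
    # Step 4: the requester learns the target's binding from the reply.
    src.mac_bindings[dst.ip] = dst.mac
    return dst.mac


if __name__ == "__main__":
    router2 = Router("router2", "IP2", "M2")
    dr = Router("distributed-router", "IP1", "M1")
    events: list[str] = []
    resolve_nexthop(router2, dr, events)
    print(events, router2.mac_bindings, dr.mac_bindings)
```

In this model exactly two bindings are created per exchange, one on each end, which is the O(n) outcome the special ARP handling is meant to guarantee.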
>>
>> Unfortunately, the ARP request (who has IP1) from router2 is broadcast
>> to all of the chassis through the Geneve tunnel. The other gateway routers
>> learn the source MAC 'M2'. Now each gateway router has an entry for
>> (IP2, M2) in the MAC_Binding table on its respective rtoj-<blah> router
>> port. So the MAC_Binding table will end up with N x N entries, where N
>> is the number of gateway routers.
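The N x N growth can be reproduced by counting bindings with and without flooding. This sketch assumes every GR resolves the DR's address exactly once and that bystander GRs learn from any flooded request. `count_mac_bindings` is a hypothetical helper, not part of OVN.

```python
def count_mac_bindings(n_gateways: int, broadcast: bool) -> int:
    """Count MAC_Binding rows after every GR resolves the DR's IP once.

    When the ARP request is restricted to the DR (step 2 above), only the
    requester and the DR learn. When the request is flooded on the join
    switch, every other GR also learns the requester's binding.
    """
    bindings: set[tuple[str, str]] = set()  # (router that learned, learned IP owner)
    gateways = [f"GR{i}" for i in range(n_gateways)]
    for requester in gateways:
        bindings.add((requester, "DR"))  # requester learns IP1 -> M1 from the reply
        bindings.add(("DR", requester))  # DR learns the requester's IP -> MAC
        if broadcast:
            for other in gateways:
                if other != requester:
                    bindings.add((other, requester))  # bystanders learn from the flood
    return len(bindings)


if __name__ == "__main__":
    for n in (3, 100, 1000):
        print(n, count_mac_bindings(n, broadcast=False),
              count_mac_bindings(n, broadcast=True))
```

Restricted delivery yields 2N rows; flooding adds N(N-1) bystander rows, for N^2 + N in total.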
>>
>> Per your explanation above, the ARP request should not have been
>> broadcast, right?
>>
>>
>> <vi> Probably obvious and I am missing it, but...
>> <vi> I see the lflow that directs ARP requests to the router port instead
>> <vi> of broadcasting them. However, we also add flows to broadcast
>> <vi> self-originated (unsolicited?) ARP requests (we should not see these
>> <vi> for router IPs, I suppose). But given that we only match on the
>> <vi> source MAC address of the packet for such packets, does that differ
>> <vi> from the ARP request generated for a router IP?
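The ambiguity venu describes can be shown with two predicates. A match on the source MAC alone cannot tell a GARP from a regular request sent by the same router port. Adding the standard GARP condition (sender IP equals target IP) can. This sketch is not the logical flow that ovn-northd generates. `Arp`, `ROUTER_PORT_MACS`, and both predicate names are hypothetical, and the addresses reuse the diagram's labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arp:
    eth_src: str  # Ethernet source MAC
    spa: str      # ARP sender IP
    tpa: str      # ARP target IP


# Hypothetical set of MACs owned by router ports (router2's port here).
ROUTER_PORT_MACS = {"M2"}


def flooded_by_src_mac_match(pkt: Arp) -> bool:
    # Any ARP whose source MAC belongs to a router port is flooded,
    # so router2's regular "who has IP1" request is flooded too.
    return pkt.eth_src in ROUTER_PORT_MACS


def flooded_if_garp_only(pkt: Arp) -> bool:
    # Additionally require SPA == TPA, i.e. a gratuitous ARP.
    return pkt.eth_src in ROUTER_PORT_MACS and pkt.spa == pkt.tpa


if __name__ == "__main__":
    garp = Arp("M2", "IP2", "IP2")     # router2 announcing its own IP
    request = Arp("M2", "IP2", "IP1")  # router2 resolving the DR's IP
    for pkt in (garp, request):
        print(pkt, flooded_by_src_mac_match(pkt), flooded_if_garp_only(pkt))
```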
>>
> Good catch! That seems to be the reason why it is broadcast. I thought
> the feature only allowed GARP to be broadcast, but it actually allows
> (G)ARP, including regular ARP generated by the LRs. It could be an easy
> fix to commit 32f5ebb062 ("ovn-northd: Limit ARP/ND broadcast domain
> whenever possible."), but I am not sure if there are other concerns with
> doing that. @Dumitru Ceara <dce...@redhat.com>, could you comment on
> whether we can restrict it to GARP only?
>
> On the other hand, in this use case, if there is any ARP from the
> distributed router to any of the GRs, then all the GRs should have learned
> the IP1-M1 MAC binding, and they won't send ARP for IP1 any more, so this
> would not result in N x N MAC bindings, right? In the real use case, it
> may depend on which direction of traffic comes first. If it is always
> from external to k8s workloads first, then yes, it will eventually end up
> with N x N MAC bindings.
>
>
>> thanks,
>>
>> -venu
>>
>> Note that the direction of the ARP request is from the Gateway Router to
>> the Distributed Router.
>>
>> Regards,
>> ~Girish
>>
>>
>>
>>
>> How does the resolution of IP3-to-M3 happen on gateway router2? Will
>> there be an ARP request packet broadcast on the join switch in this
>> case?
>>
>> I think in the use case of ovn-k8s, as you described before, this should
>> not happen. However, if it does happen, it is similar to the above steps,
>> except that in steps 2) and 3) the ARP request and response will be sent
>> between the chassis through the tunnel. If this happens between all pairs
>> of GRs, then there will again be O(n^2) MAC_Binding entries.
>>
>> I haven't tested the GR scenario yet, so I can't guarantee it works as
>> expected. Please let me know if you see any problems. I will submit a
>> formal patch with more test cases if it is confirmed in your environment.
>>
>> Thanks,
>> Han
>>
>>
>> Regards,
>> ~Girish
>>
>> On Sat, May 16, 2020 at 10:25 AM Girish Moodalbail <gmoodalb...@gmail.com>
>> wrote:
>>
>>
>> On Sat, May 16, 2020 at 12:36 AM Han Zhou <zhou...@gmail.com> wrote:
>>
>>
>> On Tue, May 5, 2020 at 11:57 AM Han Zhou <hz...@ovn.org> wrote:
>> >
>> >
>> >
>> > On Fri, May 1, 2020 at 2:14 PM Dan Winship <danwins...@redhat.com> wrote:
>> > >
>> > > On 5/1/20 12:37 PM, Girish Moodalbail wrote:
>> > > > If we now look at table=12 (lr_in_arp_resolve) in the ingress
>> > > > pipeline of Gateway Router-1, then you will see that there will be
>> > > > 2000 logical flow entries...
>> > >
>> > > > In the topology above, the only intended path is North-South between
>> > > > each gateway router and the logical router. There is no east-west
>> > > > traffic between the gateway routers.
>> > >
>> > > > Is there another way to solve the above problem while keeping just
>> > > > the single join logical switch?
>> > >
>> > > Two thoughts:
>> > >
>> > > 1. In openshift-sdn, the bridge doesn't try to handle ARP itself. It
>> > > just lets ARP requests pass through normally, and lets ARP replies
>> > > pass through normally as long as they are correct (i.e., it doesn't
>> > > let spoofing through). This means fewer flows but more traffic. Maybe
>> > > that's the right tradeoff?
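A pass-through design still needs an anti-spoofing check on replies. One plausible form of "correct" is that the reply may only claim the address assigned to the port it arrived on. This sketch is an assumption and is not taken from openshift-sdn's actual OpenFlow rules. `ArpReply`, `PORT_ADDRESSES`, and `allow_reply` are hypothetical names, and the example addresses are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArpReply:
    in_port: str  # port the reply arrived on
    eth_src: str  # Ethernet source MAC
    sha: str      # sender MAC inside the ARP payload
    spa: str      # sender IP inside the ARP payload


# Hypothetical per-port address assignments: port -> (mac, ip).
PORT_ADDRESSES = {
    "pod-a": ("0a:58:0a:80:00:05", "10.128.0.5"),
}


def allow_reply(pkt: ArpReply) -> bool:
    """Pass an ARP reply only if it claims the sending port's own address."""
    expected = PORT_ADDRESSES.get(pkt.in_port)
    if expected is None:
        return False  # unknown port: drop
    mac, ip = expected
    # Ethernet source, ARP sender MAC, and ARP sender IP must all match.
    return pkt.eth_src == mac and pkt.sha == mac and pkt.spa == ip
```

The tradeoff Dan describes appears here: one check per port keeps the flow count O(n), while every ARP request still travels the network.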
>> > >
>> > The 2M entries here are not for the ARP responder; they are more
>> > equivalent to the neighbour table (or ARP cache) on each LR. The ARP
>> > responder resides in the LS (join logical switch), which is O(n) instead
>> > of O(n^2), so it is not a problem here.
>> >
>> > However, a similar idea may work here to avoid the O(n^2) scale issue.
>> > For the neighbour table, OVN actually has two parts: one is statically
>> > built, which is the 2M entries mentioned in this case, and the other is
>> > dynamic ARP resolution, i.e. the mac_binding table, which is populated
>> > dynamically by handling ARP messages. To solve the problem here, OVN
>> > could be changed to support configuring an LR to avoid the static
>> > neighbour table and rely only on dynamic ARP resolution. In that case,
>> > all the gateway routers can be configured not to use static ARP
>> > resolution, and eventually there will be only 2 entries (one for IPv4
>> > and one for IPv6) per gateway router in the mac_binding table for the
>> > north-south traffic to the join router. (Of course, there will still be
>> > the same number of mac_bindings in each router for the external traffic
>> > on the other side of the gateway routers.)
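The arithmetic behind "2000 entries per GR, 2M in total" versus "2 entries per GR" can be checked directly. This sketch assumes one router port per router on the join switch and two address families. The function names are hypothetical.

```python
def static_entries_per_gr(n_nodes: int, families: int = 2) -> int:
    # The other n_nodes - 1 GRs plus the DR share the join switch with this
    # GR, so it gets n_nodes peers, one entry per peer per address family.
    return n_nodes * families


def static_entries_total(n_nodes: int, families: int = 2) -> int:
    # Every router on the join switch (n_nodes GRs + 1 DR) carries such a table.
    return (n_nodes + 1) * static_entries_per_gr(n_nodes, families)


def dynamic_entries_per_gr(families: int = 2) -> int:
    # North-south only: one MAC_Binding per family, for the DR's port.
    return families


if __name__ == "__main__":
    n = 1000
    print(static_entries_per_gr(n), static_entries_total(n), dynamic_entries_per_gr())
```

For 1000 nodes this prints 2000 per GR (matching the table=12 count for Gateway Router-1), 2,002,000 in total, and 2 per GR with dynamic resolution.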
>> >
>> > This change seems straightforward, but I am not sure if there are any
>> > corner cases.
>>
>> Hi Girish,
>>
>> I've sent an RFC patch here for the above proposal:
>> https://patchwork.ozlabs.org/project/openvswitch/patch/1589614395-99499-1-git-send-email-hz...@ovn.org/
>> For this use case, just set options:dynamic_neigh_routes=true for all the
>> Gateway Routers. Could you try it in your scale environment and see if it
>> solves the problem?
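Applying the option to many gateway routers is easy to script. This sketch only builds `ovn-nbctl set Logical_Router <name> options:dynamic_neigh_routes=true` commands and prints them unless `dry_run=False`. The option only has an effect on an OVN build that includes the patch above. The router names `GR_node1`, `GR_node2` are assumptions modelled on the ovn-kubernetes `GR_<node>` naming, and the helper names are hypothetical.

```python
import subprocess
from typing import Iterable


def enable_dynamic_neigh_routes_cmds(gateway_routers: Iterable[str]) -> list[list[str]]:
    """Build one ovn-nbctl command per gateway router (not executed here)."""
    return [
        ["ovn-nbctl", "set", "Logical_Router", gr,
         "options:dynamic_neigh_routes=true"]
        for gr in gateway_routers
    ]


def apply(cmds: list[list[str]], dry_run: bool = True) -> None:
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))  # show what would be run
        else:
            subprocess.run(cmd, check=True)  # raise if ovn-nbctl fails


if __name__ == "__main__":
    # Hypothetical router names; list the real ones with `ovn-nbctl lr-list`.
    apply(enable_dynamic_neigh_routes_cmds(["GR_node1", "GR_node2"]))
```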
>>
>> Thanks,
>> Han
>>
>> >
>> > > 2. In most places in ovn-kubernetes, our MAC addresses are
>> > > programmatically related to the corresponding IP addresses, and in
>> > > places where that's not currently true, we could try to make it true,
>> > > and then perhaps the thousands of rules could just be replaced by a
>> > > single rule?
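The idea relies on the MAC being a pure function of the IP, so one rule could compute `eth.dst` from the nexthop instead of looking it up. This sketch shows such a mapping: a fixed 2-byte prefix followed by the 4 IPv4 octets. The `0a:58` prefix is an assumption here (it matches the convention ovn-kubernetes uses for pod MACs, but treat it as illustrative). The helper names are hypothetical, and nothing here implies OVN has a generic action for this.

```python
import ipaddress

PREFIX = b"\x0a\x58"  # assumed 2-byte, locally administered MAC prefix


def mac_from_ipv4(ip: str, prefix: bytes = PREFIX) -> str:
    """Derive a MAC by appending the 4 IPv4 octets to a 2-byte prefix."""
    raw = prefix + ipaddress.IPv4Address(ip).packed
    return ":".join(f"{b:02x}" for b in raw)


def ipv4_from_mac(mac: str, prefix: bytes = PREFIX) -> str:
    """Invert mac_from_ipv4; reject MACs not built with the prefix."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6 or raw[:2] != prefix:
        raise ValueError(f"{mac} was not derived with prefix {prefix.hex()}")
    return str(ipaddress.IPv4Address(raw[2:]))


if __name__ == "__main__":
    mac = mac_from_ipv4("10.128.0.5")
    print(mac, ipv4_from_mac(mac))
```

Han's reply below gives the catch: this works only where the deployment controls every address, which most OVN users cannot assume.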
>> > >
>> > This may be a good idea, but I am not sure how to implement it in OVN
>> > generically, since most OVN users can't make such an assumption.
>> >
>> > On the other hand, why wouldn't splitting the join logical switch into
>> > 1000 LSes solve the problem? I understand that there will be 1000 more
>> > datapaths and 1000 more LRPs, but these are all O(n), which is much more
>> > efficient than the O(n^2) explosion. What are the other scale issues
>> > created by this?
>> >
>> > In addition, Girish, for the external LS, I am not sure why it can't be
>> > shared if all the nodes are connected to a single L2 network. (If they
>> > are connected to separate L2 networks, different external LSes should
>> > be created, at least according to the current OVN model.)
>>
>> Thanks Han for the patch. Will give it a try and let you know.
>>
>> Regards,
>> ~Girish
>>
>> >
>> > Thanks,
>> > Han
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "ovn-kubernetes" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to ovn-kubernetes+unsubscr...@googlegroups.com<mailto:
>> ovn-kubernetes+unsubscr...@googlegroups.com>.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/ovn-kubernetes/CAAF2STTq4WSwvwHbws5e0yozT7OM9RYcpWwaA2v49k83JDmEqA%40mail.gmail.com
>> <
>> https://groups.google.com/d/msgid/ovn-kubernetes/CAAF2STTq4WSwvwHbws5e0yozT7OM9RYcpWwaA2v49k83JDmEqA%40mail.gmail.com?utm_medium=email&utm_source=footer
>> >.
>
>

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
