Thanks Han for the explanation. Yes, there is no east-west traffic between
the GRs (I was just curious to know). So, if the ARP request/response
between GR and DR is confined to the same chassis, then there shouldn't be
an O(n^2) explosion, per your explanation.
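
A rough sketch of the arithmetic behind the numbers in this thread, assuming
1000 gateway routers on a single join switch and two address families (IPv4
and IPv6). The function names are made up for illustration and are not part
of OVN.

```python
# Back-of-the-envelope counts only; function names are hypothetical, not OVN APIs.

def static_arp_resolve_flows(num_grs, families=2):
    """Static neighbour entries: every GR knows every other router port on the join switch."""
    # Peers of one GR: (num_grs - 1) other GRs plus the distributed router.
    peers = (num_grs - 1) + 1
    per_gr = peers * families
    return per_gr, per_gr * num_grs


def dynamic_north_south_bindings(num_grs, families=2):
    """Dynamic resolution, north-south traffic only: each GR learns just the DR port."""
    per_gr = families
    return per_gr, per_gr * num_grs


if __name__ == "__main__":
    n = 1000
    print("static :", static_arp_resolve_flows(n))
    print("dynamic:", dynamic_north_south_bindings(n))
```

With n = 1000 this gives 2000 entries per GR (2M in total) for the static
case, versus 2 entries per GR when only the DR port is resolved dynamically.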

Will get back to you on how the test goes in the next few days.

Regards,
Girish

On Sat, May 16, 2020 at 11:17 PM Han Zhou <zhou...@gmail.com> wrote:

>
>
> On Sat, May 16, 2020 at 12:13 PM Girish Moodalbail <gmoodalb...@gmail.com>
> wrote:
>
>> Hello Han,
>>
>> Can you please explain how the dynamic resolution of the IP-to-MAC will
>> work with this new option set?
>>
>> Say the packet is being forwarded from router2 towards the distributed
>> router. So, nexthop (reg0) is set to IP1 and we need to find the MAC
>> address M1 to set eth.dst to.
>>
>> +----------------+        +----------------+
>> |   l3gateway    |        |   l3gateway    |
>> |    router2     |        |    router3     |
>> +-------------+--+        +-+--------------+
>>             IP2,M2         IP3,M3
>>               |             |
>>            +--+-------------+---+
>>            |    join switch     |
>>            +---------+----------+
>>                      |
>>                   IP1,M1
>>              +-------+--------+
>>              |  distributed   |
>>              |     router     |
>>              +----------------+
>>
>> The MAC M1 will obviously not be in the MAC_binding table. On the hypervisor
>> where the packet originated, the router2's port and the distributed
>> router's port are locally present. So, does this result in a PACKET_IN to
>> the ovn-controller and the resolution happens there?
>>
>
> Yes there will be a PACKET_IN, and then:
> 1. ovn-controller will generate the ARP request for IP1, and send
> PACKET_OUT to OVS.
> 2. The ARP request will be delivered to the distributed router pipeline
> only, because of special handling of ARP in OVN for IPs of router ports,
> even though it is a broadcast. (It would have been broadcast to all GRs
> without that special handling.)
> 3. The distributed router pipeline should learn the IP-MAC binding of
> IP2-M2 (through a PACKET_IN to ovn-controller), and at the same time send
> an ARP reply to router2 in the distributed router pipeline.
> 4. Router2 pipeline will handle the ARP response and learn the IP-MAC
> binding of IP1-M1 (through a PACKET_IN to ovn-controller).
>
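
The four steps above can be sketched as a toy model. This is only an
illustration: the dictionaries and the resolve() helper are hypothetical and
do not correspond to ovn-controller code. The port names and addresses come
from the diagram.

```python
# Toy model of the described ARP exchange; not OVN code, all names are hypothetical.

PORTS = {
    "router2": ("IP2", "M2"),
    "router3": ("IP3", "M3"),
    "dr": ("IP1", "M1"),
}
OWNER_BY_IP = {ip: name for name, (ip, _mac) in PORTS.items()}

# One MAC_Binding-like table per router, initially empty.
mac_binding = {name: {} for name in PORTS}
events = []


def resolve(src_router, target_ip):
    """Resolve target_ip from src_router, learning bindings on both sides."""
    if target_ip in mac_binding[src_router]:
        return mac_binding[src_router][target_ip]

    src_ip, src_mac = PORTS[src_router]

    # Step 1: PACKET_IN, the controller generates an ARP request (PACKET_OUT).
    events.append(f"{src_router}: ARP request for {target_ip}")

    # Step 2: the request reaches only the router that owns target_ip.
    owner = OWNER_BY_IP[target_ip]
    events.append(f"request delivered to {owner} only")

    # Step 3: the owner learns the requester's binding and replies.
    mac_binding[owner][src_ip] = src_mac
    owner_mac = PORTS[owner][1]
    events.append(f"{owner}: ARP reply with {owner_mac}")

    # Step 4: the requester learns the binding from the reply.
    mac_binding[src_router][target_ip] = owner_mac
    return owner_mac


if __name__ == "__main__":
    print(resolve("router2", "IP1"))
    print(mac_binding)
```

In this model router3 never sees the request, so its table stays empty.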
>
>>
>> How will the resolution of IP3-to-M3 happen on gateway router2? Will
>> there be an ARP request packet that is broadcast on the join switch
>> for this case?
>>
>
> I think in the use case of ovn-k8s, as you described before, this should
> not happen. However, if this does happen, it is similar to the above steps,
> except that in steps 2) and 3) the ARP request and response will be sent
> between the chassis through the tunnel. If this happens between all pairs
> of GRs, then there will again be O(n^2) MAC_Binding entries.
>
> I haven't tested the GR scenario yet, so I can't guarantee it works as
> expected. Please let me know if you see any problems. I will submit a formal
> patch with more test cases if it is confirmed in your environment.
>
> Thanks,
> Han
>
>
>>
>> Regards,
>> ~Girish
>>
>> On Sat, May 16, 2020 at 10:25 AM Girish Moodalbail <gmoodalb...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Sat, May 16, 2020 at 12:36 AM Han Zhou <zhou...@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, May 5, 2020 at 11:57 AM Han Zhou <hz...@ovn.org> wrote:
>>>> >
>>>> >
>>>> >
>>>> > On Fri, May 1, 2020 at 2:14 PM Dan Winship <danwins...@redhat.com>
>>>> wrote:
>>>> > >
>>>> > > On 5/1/20 12:37 PM, Girish Moodalbail wrote:
>>>> > > > If we now look at table=12 (lr_in_arp_resolve) in the ingress
>>>> pipeline
>>>> > > > of Gateway Router-1, then you will see that there will be 2000
>>>> logical
>>>> > > > flow entries...
>>>> > >
>>>> > > > In the topology above, the only intended path is North-South
>>>> between
>>>> > > > each gateway router and the logical router. There is no east-west
>>>> > > > traffic between the gateway routers
>>>> > >
>>>> > > > Is there another way to solve the above problem with just
>>>> keeping the
>>>> > > > single join logical switch?
>>>> > >
>>>> > > Two thoughts:
>>>> > >
>>>> > > 1. In openshift-sdn, the bridge doesn't try to handle ARP itself. It
>>>> > > just lets ARP requests pass through normally, and lets ARP replies
>>>> pass
>>>> > > through normally as long as they are correct (ie, it doesn't let
>>>> > > spoofing through). This means fewer flows but more traffic. Maybe
>>>> that's
>>>> > > the right tradeoff?
>>>> > >
>>>> > The 2M entries here are not for the ARP responder, but are closer to
>>>> the neighbour table (or ARP cache) on each LR. The ARP responder resides
>>>> in the LS (join logical switch), which is O(n) instead of O(n^2), so it is
>>>> not a problem here.
>>>> >
>>>> > However, a similar idea may work here to avoid the O(n^2) scale
>>>> issue. For the neighbour table, OVN actually has two parts: one is
>>>> statically built, which is the 2M entries mentioned in this case, and the
>>>> other is the dynamic ARP resolve - the mac_binding table, which is
>>>> dynamically populated by handling ARP messages. To solve the problem here,
>>>> it is possible to change OVN to support configuring an LR to avoid the
>>>> static neighbour table and rely only on dynamic ARP resolving. In this case,
>>>> all the gateway routers can be configured as not using static ARP
>>>> resolving, and eventually there will be only 2 entries (one for IPv4 and
>>>> one for IPv6) for each gateway router in the mac_binding table for the
>>>> north-south traffic to the join router. (Of course, there will still be the
>>>> same amount of mac_bindings in each router for the external traffic on the
>>>> other side of the gateway routers.)
>>>> >
>>>> > This change seems straightforward, but I am not sure if there are any
>>>> corner cases.
>>>>
>>>> Hi Girish,
>>>>
>>>> I've sent an RFC patch here for the above proposal:
>>>> https://patchwork.ozlabs.org/project/openvswitch/patch/1589614395-99499-1-git-send-email-hz...@ovn.org/
>>>> For this use case, just set options:dynamic_neigh_routes=true for all
>>>> the Gateway Routers. Could you try it in your scale environment and see if
>>>> it solves the problem?
>>>>
>>>> Thanks,
>>>> Han
>>>>
>>>> >
>>>> > > 2. In most places in ovn-kubernetes, our MAC addresses are
>>>> > > programmatically related to the corresponding IP addresses, and in
>>>> > > places where that's not currently true, we could try to make it
>>>> true,
>>>> > > and then perhaps the thousands of rules could just be replaced by a
>>>> > > single rule?
>>>> > >
>>>> > This may be a good idea, but I am not sure how to implement it in OVN to
>>>> make it generic, since most OVN users can't make such an assumption.
>>>> >
>>>> > On the other hand, why wouldn't splitting the join logical switch to
>>>> 1000 LSes solve the problem? I understand that there will be 1000 more
>>>> datapaths, and 1000 more LRPs, but these are all O(n), which is much more
>>>> efficient than the O(n^2) explosion. What are the other scale issues
>>>> created by this?
>>>> >
>>>> > In addition, Girish, for the external LS, I am not sure why it can't
>>>> be shared, if all the nodes are connected to a single L2 network. (If they
>>>> are connected to separate L2 networks, different external LSes should be
>>>> created, at least according to the current OVN model.)
>>>>
>>>
>>> Thanks Han for the patch. Will give it a try and let you know.
>>>
>>> Regards,
>>> ~Girish
>>>
>>>
>>>> >
>>>> > Thanks,
>>>> > Han
>>>>
>>>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
