No worries, thanks for the update, Han. Once you have the patch, we can test your changes on our cluster and provide you with an update.
Regards,
~Girish

On Wed, Jun 3, 2020 at 4:27 PM Han Zhou <zhou...@gmail.com> wrote:
> Hi Girish, yes, that's what we concluded in the last OVN meeting, but sorry
> that I forgot to update here.
>
> On Wed, Jun 3, 2020 at 3:32 PM Girish Moodalbail <gmoodalb...@gmail.com>
> wrote:
> >
> > Hello all,
> >
> > To proceed with the proposed fixes with minimal impact, is the
> > following a reasonable approach?
> >
> > Add an option, namely dynamic_neigh_routes={true|false}, for a gateway
> > router. With this option enabled, the next hop IP's MAC will be learned
> > through an ARP request on the physical network. The ARP request will be
> > flooded on the L2 broadcast domain (for both the join switch and the
> > external switch).
> >
>
> The RFC patch fulfils this purpose:
> https://patchwork.ozlabs.org/project/openvswitch/patch/1589614395-99499-1-git-send-email-hz...@ovn.org/
> I am working on the formal patch.
>
> > Add an option, namely learn_from_arp_request={true|false}, for a gateway
> > router. The option is interpreted as below:
> > "true" - learn the MAC/IP binding and add a new MAC_Binding entry
> > (default behavior)
> > "false" - if there is a MAC_Binding for that IP and the MAC is
> > different, then update that MAC/IP binding. The external entity might be
> > trying to advertise the new MAC for that IP. (If we don't do this, then we
> > will never learn External VIP to MAC changes)
> >
> > (Irrespective of whether learn_from_arp_request is true or false, always
> > do this: if the TPA is on the router, add a new entry (it means the remote
> > wants to communicate with this node, so it makes sense to learn the remote
> > as well))
> >
>
> I am working on this as well, but delayed a little. I hope to have
> something this week.
>
> >
> > For now, I think it is fine for ARP packets to be broadcast on the
> > tunnel for the `join` switch case. If it becomes a problem, then we can
> > start looking into changing the logical flows.
> >
> > Thanks everyone for the lively discussion.
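The learn_from_arp_request semantics proposed above boil down to a small decision per received ARP request. Below is a hedged Python sketch of that decision as described in this thread, not ovn-controller code: `Decision`, `should_learn`, `_learn` and the dict standing in for the MAC_Binding table are all hypothetical names chosen for illustration.

```python
from enum import Enum
from typing import Dict, Optional


class Decision(Enum):
    ADD = "add a new MAC_Binding entry"
    UPDATE = "update the existing MAC_Binding entry"
    IGNORE = "leave MAC_Binding unchanged"


def _learn(existing: Optional[str], sha: str) -> Decision:
    """Plain learning: add unknown IPs, refresh IPs whose MAC changed."""
    if existing is None:
        return Decision.ADD
    return Decision.UPDATE if existing != sha else Decision.IGNORE


def should_learn(
    learn_from_arp_request: bool,
    spa: str,                  # sender IP of the received ARP request
    sha: str,                  # sender MAC of the received ARP request
    tpa_on_router: bool,       # True if the target IP is owned by this router
    bindings: Dict[str, str],  # hypothetical stand-in for MAC_Binding (IP -> MAC)
) -> Decision:
    """Hypothetical model of the learn_from_arp_request knob proposed here."""
    existing = bindings.get(spa)

    # Regardless of the knob, learn when the request targets this router;
    # with the knob at "true" (the default), learn from every request.
    if tpa_on_router or learn_from_arp_request:
        return _learn(existing, sha)

    # learn_from_arp_request=false and the request is not for us: only
    # refresh a binding we already hold (e.g. an external VIP's new MAC).
    if existing is not None and existing != sha:
        return Decision.UPDATE
    return Decision.IGNORE


bindings = {"10.10.10.1": "0a:00:00:00:00:01"}
# Another gateway router resolving its next hop; the request is not for us.
print(should_learn(False, "10.10.10.3", "0a:00:00:00:00:03", False, bindings).name)  # IGNORE
```

The "false" branch is what keeps the table from growing with every peer's ARP traffic while still tracking MAC moves for IPs the router already cares about.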
> >
> > Regards,
> > ~Girish
> >
> > On Thu, May 28, 2020 at 7:33 AM Tim Rozet <tro...@redhat.com> wrote:
> >>
> >> On Thu, May 28, 2020 at 7:26 AM Dumitru Ceara <dce...@redhat.com>
> >> wrote:
> >>>
> >>> On 5/28/20 12:48 PM, Daniel Alvarez Sanchez wrote:
> >>> > Hi all,
> >>> >
> >>> > Sorry for top posting. I want to thank you all for the discussion and
> >>> > also give some feedback from the OpenStack perspective, which is
> >>> > affected by the problem described here.
> >>> >
> >>> > In OpenStack, it's fairly common to have a shared external network
> >>> > (logical switch with a localnet port) across many tenants. Each tenant
> >>> > user may create their own router, to which their instances will be
> >>> > connected to access the external network.
> >>> >
> >>> > In such a scenario, we are hitting the issue described here. In
> >>> > particular, in our tests we exercise 3K VIFs (each with 1 FIP) spanning
> >>> > 300 LS; each LS is connected to an LR (i.e. 300 LRs) and that router is
> >>> > connected to the public LS. This creates a huge problem in terms of
> >>> > performance and tons of events due to the MAC_Binding entries
> >>> > generated as a consequence of the GARPs sent for the floating IPs.
> >>> >
> >>>
> >>> Just as an addition to this, GARPs wouldn't be the only reason why all
> >>> routers would learn the MAC_Binding. Even if we weren't sending GARPs
> >>> for the FIPs, when a VM that's behind a FIP sends traffic to the
> >>> outside, the router will generate an ARP request for the next hop
> >>> using the FIP-IP and FIP-MAC. This will be broadcast to all routers
> >>> connected to the public LS and will trigger them to learn the
> >>> FIP-IP:FIP-MAC binding.
> >>
> >> Yeah, we shouldn't be learning on regular ARP requests.
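To put the OpenStack numbers above in perspective, here is a back-of-the-envelope upper bound, not a measurement. It assumes (our assumption, not stated in the thread) that the 3K FIPs are spread evenly over the 300 routers, i.e. 10 per router, and that in the worst case every router's port on the public LS learns a MAC_Binding for every FIP owned by the other routers. The helper name is hypothetical.

```python
def worst_case_mac_bindings(num_routers: int, num_fips: int) -> int:
    """Upper bound on MAC_Binding rows on the public LS under the assumptions above."""
    fips_per_router = num_fips // num_routers  # assumes FIPs are spread evenly
    # Every router's public-LS port learns each FIP except the ones it owns.
    return num_routers * (num_fips - fips_per_router)


# Daniel's scale test: 300 routers, 3K FIPs (one per VIF).
print(worst_case_mac_bindings(300, 3000))  # 897000
```

The bound grows with the product of routers and FIPs, which is why learning on every broadcast request quickly dominates the southbound database at this scale.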
> >>
> >>>
> >>> > Thanks,
> >>> > Daniel
> >>> >
> >>> > On Thu, May 28, 2020 at 10:51 AM Dumitru Ceara <dce...@redhat.com>
> >>> > wrote:
> >>> >>
> >>> >> On 5/28/20 8:34 AM, Han Zhou wrote:
> >>> >>>
> >>> >>> On Wed, May 27, 2020 at 1:10 AM Dumitru Ceara <dce...@redhat.com>
> >>> >>> wrote:
> >>> >>>>
> >>> >>>> Hi Girish, Han,
> >>> >>>>
> >>> >>>> On 5/26/20 11:51 PM, Han Zhou wrote:
> >>> >>>>>
> >>> >>>>> On Tue, May 26, 2020 at 1:07 PM Girish Moodalbail
> >>> >>>>> <gmoodalb...@gmail.com> wrote:
> >>> >>>>>>
> >>> >>>>>> On Tue, May 26, 2020 at 12:42 PM Han Zhou <zhou...@gmail.com>
> >>> >>>>>> wrote:
> >>> >>>>>>>
> >>> >>>>>>> Hi Girish,
> >>> >>>>>>>
> >>> >>>>>>> Thanks for the summary. I agree with you that GARP request
> >>> >>>>>>> vs. reply is irrelevant to the problem here.
> >>> >>>>
> >>> >>>> Well, actually I think GARP request vs. reply is relevant (at least
> >>> >>>> for case 1 below) because if OVN were generating GARP replies we
> >>> >>>> wouldn't need the priority 80 flow to determine if an ARP request
> >>> >>>> packet is actually an OVN self-originated GARP that needs to be
> >>> >>>> flooded in the L2 broadcast domain.
> >>> >>>>
> >>> >>>> On the other hand, router3 would be learning mac_binding IP2,M2
> >>> >>>> from the GARP reply originated by router2 and vice versa, so we'd
> >>> >>>> have to restrict flooding of GARP replies to non-patch ports.
> >>> >>>>
> >>> >>>
> >>> >>> Hi Dumitru, the point was that, on the external LS, the GRs will
> >>> >>> have to send ARP requests to resolve unknown IPs (at least for the
> >>> >>> external GW), and these have to be broadcast, which will cause all
> >>> >>> the GRs to learn all MACs of other GRs.
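The GARP request vs. reply debate above turns on two ARP header fields: the opcode (1 = request, 2 = reply, per RFC 826) and whether the sender and target protocol addresses are equal, which is what makes an ARP "gratuitous". These correspond to OVN's `arp.op`, `arp.spa` and `arp.tpa` match fields. The Python sketch below only illustrates that classification; `ArpPacket` and `classify` are hypothetical helpers.

```python
from dataclasses import dataclass

ARP_OP_REQUEST = 1  # RFC 826 opcode values
ARP_OP_REPLY = 2


@dataclass(frozen=True)
class ArpPacket:
    op: int
    sha: str  # sender hardware (MAC) address
    spa: str  # sender protocol (IP) address
    tpa: str  # target protocol (IP) address


def classify(pkt: ArpPacket) -> str:
    """Label an ARP packet the way this thread talks about it."""
    gratuitous = pkt.spa == pkt.tpa  # a GARP announces the sender's own IP
    if pkt.op == ARP_OP_REQUEST:
        return "GARP request" if gratuitous else "ARP request"
    if pkt.op == ARP_OP_REPLY:
        return "GARP reply" if gratuitous else "ARP reply"
    return "unknown"


# A GR resolving the external gateway 10.10.10.1 sends a plain ARP request.
print(classify(ArpPacket(ARP_OP_REQUEST, "M2", "10.10.10.2", "10.10.10.1")))  # ARP request
```

This is Han's point in miniature: next-hop resolution on the external LS is a plain ARP request, so it must be broadcast regardless of whether OVN announces its own addresses with GARP requests or replies.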
> >>> >>> This is regardless of the GARP behavior. You are right that if we
> >>> >>> only consider the join switch then GARP request vs. reply does make
> >>> >>> a difference. However, GARP request/reply may really be needed only
> >>> >>> on the external LS.
> >>> >>>
> >>> >>
> >>> >> Ok, but do you see an easy way to determine if we need to add the
> >>> >> logical flows that flood self-originated GARP packets on a given
> >>> >> logical switch? Right now we add them on all switches.
> >>> >>
> >>> >>>>>>> Please see my comment inline below.
> >>> >>>>>>>
> >>> >>>>>>> On Tue, May 26, 2020 at 12:09 PM Girish Moodalbail
> >>> >>>>>>> <gmoodalb...@gmail.com> wrote:
> >>> >>>>>>>>
> >>> >>>>>>>> Hello Dumitru,
> >>> >>>>>>>>
> >>> >>>>>>>> There are several things that are being discussed on this
> >>> >>>>>>>> thread. Let me see if I can tease them out for clarity.
> >>> >>>>>>>>
> >>> >>>>>>>> 1. All the router IPs are known to OVN (the join switch case).
> >>> >>>>>>>> 2. Some IPs are known and some are not known (the external
> >>> >>>>>>>>    logical switch that connects to the physical network case).
> >>> >>>>>>>>
> >>> >>>>>>>> Let us look at each of the cases above:
> >>> >>>>>>>>
> >>> >>>>>>>> 1. Join Switch Case
> >>> >>>>>>>>
> >>> >>>>>>>> +----------------+      +----------------+
> >>> >>>>>>>> |   l3gateway    |      |   l3gateway    |
> >>> >>>>>>>> |    router2     |      |    router3     |
> >>> >>>>>>>> +-------------+--+      +-+--------------+
> >>> >>>>>>>>        IP2,M2 |           | IP3,M3
> >>> >>>>>>>>               |           |
> >>> >>>>>>>>            +--+-----------+---+
> >>> >>>>>>>>            |   join switch    |
> >>> >>>>>>>>            +--------+---------+
> >>> >>>>>>>>                     | IP1,M1
> >>> >>>>>>>>              +------+---------+
> >>> >>>>>>>>              |  distributed   |
> >>> >>>>>>>>              |     router     |
> >>> >>>>>>>>              +----------------+
> >>> >>>>>>>>
> >>> >>>>>>>> Say GR router2 wants to send a packet out to the DR, and we
> >>> >>>>>>>> don't have static MAC-to-IP mappings in the lr_in_arp_resolve
> >>> >>>>>>>> table on GR router2 (with Han's patch of dynamic_neigh_routes=true
> >>> >>>>>>>> for all the Gateway Routers). With this in mind, when an ARP
> >>> >>>>>>>> request is sent out by router2's hypervisor, the packet should
> >>> >>>>>>>> be sent directly to the distributed router alone. Your commit
> >>> >>>>>>>> 32f5ebb0622 (ovn-northd: Limit ARP/ND broadcast domain whenever
> >>> >>>>>>>> possible) should have allowed only unicast. However, in the
> >>> >>>>>>>> ls_in_l2_lkup table we have:
> >>> >>>>>>>>
> >>> >>>>>>>> table=19(ls_in_l2_lkup ), priority=80 , match=(eth.src == { M2 } && (arp.op == 1 || nd_ns)), action=(outport = "_MC_flood"; output;)
> >>> >>>>>>>> table=19(ls_in_l2_lkup ), priority=75 , match=(flags[1] == 0 && arp.op == 1 && arp.tpa == { IP1}), action=(outport = "jtor-router2"; output;)
> >>> >>>>>>>>
> >>> >>>>>>>> As you can see, the `priority=80` rule will always be hit and
> >>> >>>>>>>> the packet sent out to all the GRs. The `priority=75` rule is
> >>> >>>>>>>> never hit. So, we will see ARP packets on the GENEVE tunnel. So,
> >>> >>>>>>>> we need to change `priority=80` to match only GARP request
> >>> >>>>>>>> packets. That way, for the known OVN IPs case we don't do
> >>> >>>>>>>> broadcast.
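The reason the priority-75 flow is never hit can be seen with a toy model of one logical-switch table: OVN, like OpenFlow, executes the highest-priority flow whose match is satisfied. The Python below is a hypothetical simulation, not ovn-northd code. It encodes only the two flows quoted above, drops the IPv6 `nd_ns` branch, and uses placeholder strings for M2 and IP1.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Pkt:
    eth_src: str
    arp_op: int
    arp_tpa: str
    flags1: int = 0  # stand-in for flags[1] in the logical-flow match


@dataclass(frozen=True)
class Flow:
    priority: int
    match: Callable[[Pkt], bool]
    outport: str


def lookup(flows: List[Flow], pkt: Pkt) -> str:
    """Return the outport chosen by the highest-priority matching flow."""
    for flow in sorted(flows, key=lambda f: f.priority, reverse=True):
        if flow.match(pkt):
            return flow.outport
    return "drop"


M2, IP1 = "M2", "IP1"  # placeholders for router2's MAC and the DR's IP
current_flows = [
    Flow(80, lambda p: p.eth_src == M2 and p.arp_op == 1, "_MC_flood"),
    Flow(75, lambda p: p.flags1 == 0 and p.arp_op == 1 and p.arp_tpa == IP1,
         "jtor-router2"),
]

# router2 (eth.src M2) resolving the DR's IP1: the priority-80 flow wins.
print(lookup(current_flows, Pkt(eth_src=M2, arp_op=1, arp_tpa=IP1)))  # _MC_flood
```

Any ARP request sourced from M2 satisfies the broader priority-80 match first, so the narrower unicast flow only fires for packets from other MACs.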
> >>> >>>>>>>
> >>> >>>>>>> Since the solution to case 2) below (i.e.
> >>> >>>>>>> learn_from_arp_request=false) solves the problem of case 1),
> >>> >>>>>>> too, I think we don't need this change just for case 1). As
> >>> >>>>>>> @Dumitru Ceara mentioned, there is some cost because it adds
> >>> >>>>>>> extra flows. It would be a significant number of flows if there
> >>> >>>>>>> are a lot of dnat_and_snat IPs. What do you think?
> >>> >>>>
> >>> >>>> I think the following might be a solution, although at the cost of
> >>> >>>> adding as many flows as there are configured dnat_and_snat IPs:
> >>> >>>>
> >>> >>>> - priority 80: explicitly determine if an ARP request is a
> >>> >>>>   self-originated GARP for configured IP addresses and dnat_and_snat
> >>> >>>>   IPs (by matching on all eth.src and arp.tpa pairs) and, if so,
> >>> >>>>   flood it on all non-patch ports.
> >>> >>>> - priority 75: if arp.tpa is owned by an OVN logical router port,
> >>> >>>>   "unicast" it only on the patch port towards the router.
> >>> >>>> - priority 1: flood any broadcast packet.
> >>> >>>>
> >>> >>>> Together with the learn_from_arp_request=false knob, this would
> >>> >>>> cover both case 1 (join switch) and case 2 (external switch).
> >>> >>>>
> >>> >>>> Wdyt?
> >>> >>>>
> >>> >>> Would the "learn_from_arp_request=false knob" cover both cases? If
> >>> >>> yes, we don't need to add more flows of priority 80, or more
> >>> >>> accurately: whether to update the priority-80 flows is not directly
> >>> >>> related to the current problem.
> >>> >>>
> >>> >>
> >>> >> Yes, it would, except for the fact that the ARP requests would still
> >>> >> be flooded to all routers (and ignored at the destination), which is,
> >>> >> afaiu, what Girish was worried about. In order to address that part
> >>> >> too, I'm afraid we have to update the priority-80 flows.
> >>> >>
> >>> >> Regards,
> >>> >> Dumitru
> >>> >>
> >>> >>>>>>
> >>> >>>>>> Han, yes it will work.
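Dumitru's proposed three-level ordering can be sketched the same way. This is an illustrative Python model under stated assumptions: the set of `(eth.src, arp.tpa)` pairs, the port names such as `jtor-dr`, and the `flood-non-patch` label are hypothetical, and each `if` stands in for one logical-flow priority level. In ovn-northd each pair in the set would be its own priority-80 flow, which is the cost Han raises for deployments with many dnat_and_snat IPs.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class ArpRequest:
    eth_src: str
    spa: str
    tpa: str


def proposed_l2_lookup(
    pkt: ArpRequest,
    self_garp_pairs: Set[Tuple[str, str]],  # hypothetical (eth.src, arp.tpa) pairs
    router_port_for_ip: Dict[str, str],     # hypothetical: router-owned IP -> patch port
) -> str:
    # priority 80: a GARP that OVN itself originated (router IPs and
    # dnat_and_snat IPs) -> flood on non-patch ports only.
    if (pkt.eth_src, pkt.tpa) in self_garp_pairs:
        return "flood-non-patch"
    # priority 75: target IP owned by an OVN logical router port ->
    # "unicast" it only on the patch port towards that router.
    if pkt.tpa in router_port_for_ip:
        return router_port_for_ip[pkt.tpa]
    # priority 1: flood any other broadcast packet.
    return "flood"


self_garps = {("M2", "IP2"), ("M3", "IP3")}
owners = {"IP1": "jtor-dr", "IP2": "jtor-router2", "IP3": "jtor-router3"}

print(proposed_l2_lookup(ArpRequest("M2", "IP2", "IP2"), self_garps, owners))  # flood-non-patch
print(proposed_l2_lookup(ArpRequest("M2", "IP2", "IP1"), self_garps, owners))  # jtor-dr
print(proposed_l2_lookup(ArpRequest("M2", "IP2", "10.10.10.1"), self_garps, owners))  # flood
```

Unlike the current flows, router2's request for IP1 now reaches only the DR's patch port, while its own GARP is still flooded and the external next-hop request still falls through to the broadcast rule.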
> >>> >>>>>> However, my only concern is that we would send all these ARP
> >>> >>>>>> requests via tunnel to each of 1000 hypervisors, and these
> >>> >>>>>> hypervisors will just drop them on the floor when they see
> >>> >>>>>> learn_from_arp_request=false.
> >>> >>>>>
> >>> >>>>> I think maybe it is not a problem since it happens only once on
> >>> >>>>> the join switch. Once the MAC is learned, it won't broadcast
> >>> >>>>> again. It may be more of a problem on the external LS if periodic
> >>> >>>>> GARP is required there. However, I'd suggest running some tests to
> >>> >>>>> see if it is really a problem before trying to solve it.
> >>> >>>>>
> >>> >>>>>>
> >>> >>>>>> Han, Dumitru,
> >>> >>>>>>
> >>> >>>>>> Why can't we swap the priorities of the above two flows so that
> >>> >>>>>> the ARP request for a NextHop IP known to OVN is always sent via
> >>> >>>>>> `unicast`?
> >>> >>>>>
> >>> >>>>> If swapped, even GARP won't get broadcast. Maybe that's not the
> >>> >>>>> desired behavior.
> >>> >>>>>
> >>> >>>>
> >>> >>>> This is definitely not desired, as we'd be hitting the prio 75 flow
> >>> >>>> that would send the self-originated GARP request (IPx) packet back
> >>> >>>> towards the router port that owns IPx.
> >>> >>>>
> >>> >>>>>>
> >>> >>>>>> Regards,
> >>> >>>>>> ~Girish
> >>> >>>>>>
> >>> >>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>> 2. External Logical Switch Case
> >>> >>>>>>>>
> >>> >>>>>>>>                    10.10.10.0/24
> >>> >>>>>>>> -------------------------+--------------------------
> >>> >>>>>>>>                          |
> >>> >>>>>>>>                       localnet
> >>> >>>>>>>>                    +-----+-----+
> >>> >>>>>>>>                    | external  |
> >>> >>>>>>>>       +------------+    LS1    +-------------+
> >>> >>>>>>>>       |            +-----+-----+             |
> >>> >>>>>>>>       |                  |                   |
> >>> >>>>>>>>   10.10.10.2         10.10.10.3          10.10.10.4
> >>> >>>>>>>>      SNAT               SNAT                SNAT
> >>> >>>>>>>> +-----+-----+      +-----+-----+       +-----+-----+
> >>> >>>>>>>> | l3gateway |      | l3gateway |       | l3gateway |
> >>> >>>>>>>> |   node1   |      |   node2   |       |   node3   |
> >>> >>>>>>>> +-----------+      +-----------+       +-----------+
> >>> >>>>>>>>
> >>> >>>>>>>> In this case, we have some of the IPs in OVN and some in the
> >>> >>>>>>>> physical network. If we fix (1) above, all the ARP requests for
> >>> >>>>>>>> OVN's router IPs will be unicast. However, all the ARP requests
> >>> >>>>>>>> to external IPs, say 10.10.10.1 on the "physical router", will
> >>> >>>>>>>> be broadcast. Now, we will see these ARP broadcasts on all the
> >>> >>>>>>>> L3 gateway routers. With 'learn_from_arp_request=false' [a], the
> >>> >>>>>>>> MAC_Binding table will not explode for either ARP or GARP
> >>> >>>>>>>> requests.
> >>> >>>>>>>>
> >>> >>>>>>>> So, I don't think GARP request vs. reply is the issue here.
> >>> >>>>>>>> Furthermore, learning from GARP replies is blocked on certain
> >>> >>>>>>>> routers. For example:
> >>> >>>>>>>> https://www.juniper.net/documentation/en_US/junose15.1/topics/concept/ip-gratuitous-arps-transmission-overview.html
> >>> >>>>>>>> says "By default, updating the ARP cache on GARP replies is
> >>> >>>>>>>> disabled on the router." So, our NAT address mappings will not
> >>> >>>>>>>> be learnt.
> >>> >>>>
> >>> >>>> Just as a side note, the above doesn't mean Juniper boxes don't
> >>> >>>> support learning from GARP replies, just that they'd need extra
> >>> >>>> configuration. I don't necessarily think that's a bad thing if it is
> >>> >>>> properly documented in OVN that we would be generating GARP replies.
> >>> >>>>
> >>> >>>> Regards,
> >>> >>>> Dumitru
> >>> >>>>
> >>> >>>>>>>>
> >>> >>>>>>>> Regards,
> >>> >>>>>>>> ~Girish
> >>> >>>>>>>>
> >>> >>>>>>>> [a] - From Han's mail, the meaning of learn_from_arp_request=false
> >>> >>>>>>>> --> if the TPA is on the router, add a new entry (it means the
> >>> >>>>>>>> remote wants to communicate with this node, so it makes sense to
> >>> >>>>>>>> learn the remote as well). Otherwise, ignore it; no new entry is
> >>> >>>>>>>> added.
> >>> >>>>>>
> >>> >>>>>> --
> >>> >>>>>> You received this message because you are subscribed to the Google
> >>> >>>>>> Groups "ovn-kubernetes" group.
> >>> >>>>>> To unsubscribe from this group and stop receiving emails from it,
> >>> >>>>>> send an email to ovn-kubernetes+unsubscr...@googlegroups.com.
> >>> >>>>>> To view this discussion on the web visit
> >>> >>>>>> https://groups.google.com/d/msgid/ovn-kubernetes/CAAF2STRnem2PeSahuwhro1t%2BQJxchZNC7viq8n-ngM9KU%2B%2B-Xw%40mail.gmail.com.
> >>> >>>>
> >>> >>>
> >>> >>
> >>> >> _______________________________________________
> >>> >> discuss mailing list
> >>> >> disc...@openvswitch.org
> >>> >> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> >>> >
> >>>
> >>
>