On 6/25/20 9:34 PM, Girish Moodalbail wrote:
> Hello Dumitru, Han,
>
> So, we applied this patchset and gave it a spin on our large scale
> cluster and saw a significant reduction in the number of logical flows
> in lr_in_ip_input table. Before this patch there were around 1.6M flows
> in lr_in_ip_input table. However, after the patch we see about 26K
> flows. So that is significant reduction in number of logical flows.
>
> In lr_in_ip_input, I see
>
> * priority 92 flows matching ARP requests for dnat_and_snat IPs on
>   distributed gateway port with is_chassis_resident() and
>   corresponding ARP reply
> * priority 91 flows matching ARP requests for dnat_and_snat IPs on
>   distributed gateway port with !is_chassis_resident() and
>   corresponding drop
> * priority 90 flow matching ARP request for dnat_and_snat IPs and
>   corresponding ARP replies
>
> So far so good.

Hi Girish,

Great, thanks for testing out the series and confirming that it's
working ok.

>
> However, not directly related to this patch per-se but directly related
> to the behaviour of ARP and dnat_and_snat IP, on the OVN chassis we are
> seeing a significant number of OpenFlow flows in table 27 (around 2.3M
> OpenFlow flows). This table gets populated from logical flows in
> table=19 (ls_in_l2_lkup) of logical switch.
>
> The two logical flows in ls_in_l2_lkup that are contributing to huge
> number of OpenFlow flows are: (for the entire logical flow entry,
> please see:
> https://gist.github.com/girishmg/57b3005030d421c59b30e6c36cfc9c18)
>
> Priority=75 flow
> =============
> This flow looks like below (where 169.254.0.0/29 is dnat_and_snat
> subnet and 192.168.0.1 is the logical_switch's gateway IP)
>
> table=19(ls_in_l2_lkup ), priority=75 , match=(flags[1] == 0 &&
> arp.op == 1 && arp.tpa == { 169.254.3.107, 169.254.1.85, 192.168.0.1,
> 169.254.10.155, 169.254.1.6}), action=(outport = "stor-sdn-test1"; output;)
>
> What this flow says is that any ARP request packet from the switch
> heading towards the default gateway or any of those 1-to-1 NAT IPs is
> sent out through the port towards the ovn_cluster_router's ingress
> pipeline. Question though is why any Pod on the logical switch would
> send an ARP for an IP that is not in its subnet. A packet from a Pod
> towards a non-subnet IP should ARP only for the default gateway IP.
>

This is a bug. I'll start working on a fix and send a patch for it soon.

> Priority=80 Flow
> =============
> This flow looks like below
>
> table=19(ls_in_l2_lkup ), priority=80 , match=(eth.src == {
> 0a:58:c0:a8:00:01, 6a:93:f4:55:aa:a7, ae:92:2d:33:24:ea,
> ba:0a:d3:7d:bc:e8, b2:2f:40:4d:d9:2b} && (arp.op == 1 || nd_ns)),
> action=(outport = "_MC_flood"; output;)
>
> The question again for this flow is why there would be self-originated
> ARP requests for the dnat_and_snat IPs from inside of the node's logical
> switch. I can see how this is a possibility on the switch that has
> `localnet port` on it and to which the distributed router connects
> through a gateway port.
>

This is also a bug, similar to the one above. We should only deal with
external_mac values that might be used on this port. I'll fix it soon
too.

Thanks,
Dumitru

> Regards,
> ~Girish
>
> On Wed, Jun 24, 2020 at 8:55 AM Dumitru Ceara <dce...@redhat.com> wrote:
>
> > Hi Girish,
> >
> > I sent a patch series to implement Han's suggestion:
> > https://patchwork.ozlabs.org/project/openvswitch/list/?series=185580
> > https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/372005.html
> >
> > It would be great if you could give it a run on your setup too.
> >
> > Thanks,
> > Dumitru
> >
> > On 6/16/20 5:18 PM, Girish Moodalbail wrote:
> > > Thanks Han for the update.
> > >
> > > Regards,
> > > ~Girish
> > >
> > > On Mon, Jun 15, 2020 at 12:55 PM Han Zhou <zhou...@gmail.com> wrote:
> > >
> > > > Sorry Girish, I can't promise for now. I will see if I have
> > > > time in the next couple of weeks, but welcome anyone to
> > > > volunteer on this if it is urgent.
> > > >
> > > > On Mon, Jun 15, 2020 at 10:56 AM Girish Moodalbail
> > > > <gmoodalb...@gmail.com> wrote:
> > > >
> > > > > Hello Han,
> > > > >
> > > > > On Wed, Jun 3, 2020 at 9:39 PM Han Zhou <zhou...@gmail.com> wrote:
> > > > >
> > > > > > On Wed, Jun 3, 2020 at 7:16 PM Girish Moodalbail
> > > > > > <gmoodalb...@gmail.com> wrote:
> > > > > >
> > > > > > > Hello all,
> > > > > > >
> > > > > > > While working on an extension, see the diagram below, to
> > > > > > > the existing OVN logical topology for the ovn-kubernetes
> > > > > > > project, I am seeing an explosion of the "Reply to ARP
> > > > > > > requests" logical flows in the `lr_in_ip_input` table
> > > > > > > for the distributed router (ovn_cluster_router)
> > > > > > > configured with gateway port (rtol-LS)
> > > > > > >
> > > > > > >                  internet
> > > > > > >             ---------+-------------->
> > > > > > >                      |
> > > > > > >                      |
> > > > > > >     +----------localnet-port---------+
> > > > > > >     |LS                              |
> > > > > > >     +-----------------ltor-LS--------+
> > > > > > >                          |
> > > > > > >                          |
> > > > > > > +---------------------rtol-LS------------+
> > > > > > > |           ovn_cluster_router           |
> > > > > > > |          (Distributed Router)          |
> > > > > > > +-rtos-ls0------rtos-ls1--------rtos-ls2-+
> > > > > > >       |            |                |
> > > > > > >       |            |                |
> > > > > > > +-----+-+     +----+--+       +-----+-+
> > > > > > > |  LS0  |     |  LS1  |       |  LS2  |
> > > > > > > +-+-----+     +-+-----+       +-+-----+
> > > > > > >   |             |               |
> > > > > > >   p0            p1              p2
> > > > > > >  IA0           IA1             IA2
> > > > > > >  EA0           EA1             EA2
> > > > > > > (Node0)       (Node1)         (Node2)
> > > > > > >
> > > > > > > In the topology above, each of the three logical switch
> > > > > > > port has an internal address of IAx and an external
> > > > > > > address of EAx (dnat_and_snat IP). They are all bound to
> > > > > > > their respective nodes (Nodex). A packet from `p0`
> > > > > > > heading towards the internet will be SNAT'ed to EA0 on
> > > > > > > the local hypervisor and then sent out through the LS's
> > > > > > > localnet-port on that hypervisor. Basically, they are
> > > > > > > configured for distributed NATing.
> > > > > > >
> > > > > > > I am seeing interesting "Reply to ARP requests" flows
> > > > > > > for arp.tpa set to "EAX".
> > > > > > > Flows are like this:
> > > > > > >
> > > > > > > For EA0
> > > > > > > priority=90, match=(inport == "rtos-ls0" && arp.tpa ==
> > > > > > > EA0 && arp.op == 1), action=(/* ARP reply */)
> > > > > > > priority=90, match=(inport == "rtos-ls1" && arp.tpa ==
> > > > > > > EA0 && arp.op == 1), action=(/* ARP reply */)
> > > > > > > priority=90, match=(inport == "rtos-ls2" && arp.tpa ==
> > > > > > > EA0 && arp.op == 1), action=(/* ARP reply */)
> > > > > > >
> > > > > > > For EA1
> > > > > > > priority=90, match=(inport == "rtos-ls0" && arp.tpa ==
> > > > > > > EA1 && arp.op == 1), action=(/* ARP reply */)
> > > > > > > priority=90, match=(inport == "rtos-ls1" && arp.tpa ==
> > > > > > > EA1 && arp.op == 1), action=(/* ARP reply */)
> > > > > > > priority=90, match=(inport == "rtos-ls2" && arp.tpa ==
> > > > > > > EA1 && arp.op == 1), action=(/* ARP reply */)
> > > > > > >
> > > > > > > Similarly, for EA2.
> > > > > > >
> > > > > > > So, we have N * N "Reply to ARP requests" flows for N
> > > > > > > nodes each with 1 dnat_and_snat ip.
> > > > > > > This is causing scale issues.
> > > > > > >
> > > > > > > If you look at the flows for `EA0`, I am confused as to
> > > > > > > why they are needed.
> > > > > > >
> > > > > > > 1. When will one see an ARP request for EA0 from any of
> > > > > > >    the LS{0,1,2}'s logical switch ports?
> > > > > > > 2. If it is needed at all, can't we just remove the
> > > > > > >    `inport` match altogether, since the flow is
> > > > > > >    configured for every logical router port except for
> > > > > > >    the distributed gateway port rtol-LS? For this port,
> > > > > > >    we could add a higher priority rule with action set
> > > > > > >    to `next`.
> > > > > > > 3. Say, we don't need east-west NAT connectivity. Is
> > > > > > >    there a way to make these ARPs be learnt dynamically,
> > > > > > >    like we are doing for join and external logical
> > > > > > >    switch (the other thread [1])?
> > > > > > >
> > > > > > > Regards,
> > > > > > > ~Girish
> > > > > > >
> > > > > > > [1]
> > > > > > > https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html
> > > > > >
> > > > > > In general, these flows should be per router instead of per
> > > > > > router port, since the nat addresses are not attached to any
> > > > > > router port. For distributed gateway ports, there will need
> > > > > > to be per-port flows to match
> > > > > > is_chassis_resident(gateway-chassis).
> > > > > > I think this can be handled by:
> > > > > > - priority X + 20 flows for each distributed gateway port
> > > > > >   with is_chassis_resident(), reply ARP
> > > > > > - priority X + 10 flows for each distributed gateway port
> > > > > >   without is_chassis_resident(), drop
> > > > > > - priority X flows for each router (no need to match
> > > > > >   inport), reply ARP
> > > > > >
> > > > > > This way, there are N * (2D + 1) flows per router. N =
> > > > > > number of NAT IPs, D = number of distributed gateway ports.
> > > > > > This would optimize the above scenario where there is only 1
> > > > > > distributed gateway port but many regular router ports.
> > > > > > Thoughts?
> > > > >
> > > > > We went ahead and added support for this topology in
> > > > > ovn-kubernetes project in this commit
> > > > >
> > > > > https://github.com/ovn-org/ovn-kubernetes/commit/edb24e6a71142f2e835b67b29c11e1688c645683
> > > > >
> > > > > Han, was curious to know if the above fix is on your radar?
> > > > > Thanks.
> > > > >
> > > > > The number of OpenFlow flows in each of the hypervisors is
> > > > > insanely high and is consuming a lot of memory.
> > > > >
> > > > > Regards,
> > > > > ~Girish
> > > > >
> > > > > > Thanks,
> > > > > > Han
> > > > > >
> > > > > > --
> > > > > > You received this message because you are subscribed to the
> > > > > > Google Groups "ovn-kubernetes" group.
> > > > > > To unsubscribe from this group and stop receiving emails from
> > > > > > it, send an email to
> > > > > > ovn-kubernetes+unsubscr...@googlegroups.com.
> > > > > > To view this discussion on the web visit
> > > > > > https://groups.google.com/d/msgid/ovn-kubernetes/CAAF2STTOrzx-zy48TKpbxx4yxxQ_X5bN05VPqBHA79gpCBQfwg%40mail.gmail.com

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
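
For readers following the arithmetic in the thread, here is a rough Python sketch comparing the two flow layouts: one ARP-reply flow per (NAT IP, router port) pair, which gives the N * N growth Girish describes, against Han's proposed N * (2D + 1) layout. The function names are made up for illustration and are not part of OVN. The argument to is_chassis_resident() is left elided as in the thread. The X + 20 / X + 10 / X offsets follow Han's proposal; per Girish's report, the applied series ended up using priorities 92, 91 and 90.

```python
from typing import List, Tuple

# (priority, match, action); a simplified stand-in for a logical flow.
Flow = Tuple[int, str, str]


def per_port_flow_count(nat_ips: int, router_ports: int) -> int:
    """One ARP-reply flow per NAT IP per regular router port."""
    return nat_ips * router_ports


def per_router_flow_count(nat_ips: int, gateway_ports: int) -> int:
    """Han's proposal: N * (2D + 1) flows per router."""
    return nat_ips * (2 * gateway_ports + 1)


def proposed_flows(nat_ips: List[str], gateway_ports: List[str],
                   base_priority: int) -> List[Flow]:
    """Build the three priority tiers described in the thread.

    base_priority is the "X" from Han's message (hypothetical value).
    """
    flows: List[Flow] = []
    for ip in nat_ips:
        arp_match = f"arp.op == 1 && arp.tpa == {ip}"
        for port in gateway_ports:
            on_port = f'inport == "{port}" && {arp_match}'
            # Resident chassis answers the ARP request.
            flows.append((base_priority + 20,
                          f"{on_port} && is_chassis_resident(...)",
                          "/* ARP reply */"))
            # Every other chassis drops it.
            flows.append((base_priority + 10,
                          f"{on_port} && !is_chassis_resident(...)",
                          "drop;"))
        # Per-router fallback: no inport match needed.
        flows.append((base_priority, arp_match, "/* ARP reply */"))
    return flows


if __name__ == "__main__":
    for priority, match, action in proposed_flows(
            ["EA0", "EA1", "EA2"], ["rtol-LS"], base_priority=90):
        print(f"priority={priority}, match=({match}), action=({action})")
```

With the three-node diagram (N = 3, three rtos-* ports, D = 1) both layouts happen to give 9 flows. The difference only shows as the number of node switches grows: for a hypothetical 1000 nodes with one dnat_and_snat IP each, the per-port layout gives 1000 * 1000 = 1,000,000 flows, while N * (2D + 1) with D = 1 gives 3,000. These are computed from the formulas, not measured on any cluster.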