Re: [ovs-discuss] [OVN] logical flow explosion in lr_in_ip_input table for dnat_and_snat IPs

Dumitru Ceara Thu, 25 Jun 2020 12:48:53 -0700

On 6/25/20 9:34 PM, Girish Moodalbail wrote:
> Hello Dumitru, Han,
> 
> So, we applied this patchset and gave it a spin on our large-scale
> cluster and saw a significant reduction in the number of logical flows
> in the lr_in_ip_input table: from around 1.6M flows before the patch to
> about 26K flows after it.
> 
> In lr_in_ip_input, I see
> 
>   * priority 92 flows matching ARP requests for dnat_and_snat IPs on
>     distributed gateway port with is_chassis_resident() and
>     corresponding ARP reply
>   * priority 91 flows matching ARP requests for dnat_and_snat IPs on
>     distributed gateway port with !is_chassis_resident() and
>     corresponding drop
>   * priority 90 flow matching ARP request for dnat_and_snat IPs and
>     corresponding ARP replies
> 
> So far so good.


Hi Girish,

Great, thanks for testing out the series and confirming that it's
working ok.
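
For reference, the counting argument behind the reduction can be sketched in
a few lines of Python. This is only an illustration of the three-tier scheme
Han proposed further down the thread, not the ovn-northd code: the function
names, the NAT-IP-to-logical-port mapping, and the exact match strings are
hypothetical, and the priorities 92/91/90 are simply the ones Girish lists
above. It does not try to reproduce the 1.6M and 26K figures, which include
other flows in the table.

```python
def per_port_flows(nat_ips, router_ports):
    """Old behaviour as described in the thread: one ARP-reply flow per
    (NAT IP, router port) pair."""
    return [
        (90, f'inport == "{port}" && arp.tpa == {ip} && arp.op == 1', "arp_reply")
        for ip in nat_ips
        for port in router_ports
    ]


def per_router_flows(nat_ips, gateway_ports):
    """Three-tier scheme: two flows per (NAT IP, distributed gateway port)
    plus one port-independent flow per NAT IP."""
    flows = []
    for ip, resident_port in nat_ips.items():
        base = f"arp.tpa == {ip} && arp.op == 1"
        # Hypothetical: which port is_chassis_resident() takes is an assumption.
        resident = f'is_chassis_resident("{resident_port}")'
        for gw in gateway_ports:
            flows.append((92, f'inport == "{gw}" && {base} && {resident}', "arp_reply"))
            flows.append((91, f'inport == "{gw}" && {base} && !{resident}', "drop"))
        flows.append((90, base, "arp_reply"))
    return flows


# N nodes, each with one dnat_and_snat IP (EAx) bound to port px.
N = 100
nat = {f"EA{i}": f"p{i}" for i in range(N)}
ports = [f"rtos-ls{i}" for i in range(N)]

old = per_port_flows(nat, ports)
new = per_router_flows(nat, ["rtol-LS"])
print(len(old), len(new))  # 10000 300
```

With a single distributed gateway port (D = 1) the count grows as 3N instead
of N * N, which is where the saving comes from on a topology with many
regular router ports.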

> 
> However, not directly related to this patch per se, but directly related
> to the behaviour of ARP and dnat_and_snat IPs: on the OVN chassis we are
> seeing a significant number of OpenFlow flows in table 27 (around 2.3M
> OpenFlow flows). This table gets populated from logical flows in
> table=19 (ls_in_l2_lkup) of the logical switch.
> 
> The two logical flows in ls_in_l2_lkup that are contributing to the huge
> number of OpenFlow flows are described below (for the entire logical flow
> entries, please see:
> https://gist.github.com/girishmg/57b3005030d421c59b30e6c36cfc9c18)
> 
> Priority=75 flow 
> =============
> This flow looks like below (where 169.254.0.0/29 is the dnat_and_snat
> subnet and 192.168.0.1 is the logical_switch's gateway IP)
> 
> table=19(ls_in_l2_lkup      ), priority=75   , match=(flags[1] == 0 &&
> arp.op == 1 && arp.tpa == { 169.254.3.107, 169.254.1.85, 192.168.0.1,
> 169.254.10.155, 169.254.1.6}), action=(outport = "stor-sdn-test1"; output;)
> 
> What this flow says is that any ARP request from the switch for the
> default gateway or for any of those 1-to-1 NAT IPs is sent out through
> the port towards the ovn_cluster_router's ingress pipeline. The question,
> though, is why any Pod on the logical switch would send an ARP request
> for an IP that is not in its subnet. A packet from a Pod towards a
> non-subnet IP should ARP only for the default gateway IP.
> 

This is a bug. I'll start working on a fix and send a patch for it soon.
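
A small Python sketch makes the point about off-subnet targets concrete. It
is not the fix, only an illustration of the observation above. The switch
subnet 192.168.0.0/24 is an assumption (the thread only gives the gateway IP
192.168.0.1), and the helper name is hypothetical. The arp.tpa list is copied
from the priority-75 flow quoted above.

```python
import ipaddress

# Assumption: the thread does not state the switch subnet; 192.168.0.0/24
# is a guess based on the gateway IP 192.168.0.1.
switch_subnet = ipaddress.ip_network("192.168.0.0/24")

# arp.tpa set copied from the priority-75 flow quoted above.
arp_targets = [
    "169.254.3.107",
    "169.254.1.85",
    "192.168.0.1",
    "169.254.10.155",
    "169.254.1.6",
]


def split_by_subnet(targets, subnet):
    """Separate targets a Pod on this subnet could ARP for directly
    (on-link) from those it would reach via its default gateway."""
    on_link, off_link = [], []
    for target in targets:
        if ipaddress.ip_address(target) in subnet:
            on_link.append(target)
        else:
            off_link.append(target)
    return on_link, off_link


on_link, off_link = split_by_subnet(arp_targets, switch_subnet)
print(on_link)   # ['192.168.0.1']
print(off_link)  # the four 169.254.x.x addresses
```

Under that assumption only the gateway IP is on-link, so the other four
entries in the match are targets a Pod on this switch would not normally ARP
for.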

> Priority=80 flow
> =============
> This flow looks like below
> 
> table=19(ls_in_l2_lkup      ), priority=80   , match=(eth.src == {
> 0a:58:c0:a8:00:01, 6a:93:f4:55:aa:a7, ae:92:2d:33:24:ea,
> ba:0a:d3:7d:bc:e8, b2:2f:40:4d:d9:2b} && (arp.op == 1 || nd_ns)),
> action=(outport = "_MC_flood"; output;)
> 
> The question again for this flow is why there would be self-originated
> ARP requests for the dnat_and_snat IPs from inside the node's logical
> switch. I can see how this is a possibility on the switch that has a
> `localnet port` on it and to which the distributed router connects
> through a gateway port.
> 

This is also a bug, similar to the one above: we should only deal with
the external_macs that might be used on this port. I'll fix it soon too.

Thanks,
Dumitru

> Regards,
> ~Girish
> 
> On Wed, Jun 24, 2020 at 8:55 AM Dumitru Ceara <dce...@redhat.com> wrote:
> 
>     Hi Girish,
> 
>     I sent a patch series to implement Han's suggestion:
>     https://patchwork.ozlabs.org/project/openvswitch/list/?series=185580
>     https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/372005.html
> 
>     It would be great if you could give it a run on your setup too.
> 
>     Thanks,
>     Dumitru
> 
>     On 6/16/20 5:18 PM, Girish Moodalbail wrote:
>     > Thanks Han for the update.
>     >
>     > Regards,
>     > ~Girish 
>     >
>     > On Mon, Jun 15, 2020 at 12:55 PM Han Zhou <zhou...@gmail.com> wrote:
>     >
>     >     Sorry Girish, I can't promise for now. I will see if I have time in
>     >     the next couple of weeks, but welcome anyone to volunteer on this
>     >     if it is urgent.
>     >
>     >     On Mon, Jun 15, 2020 at 10:56 AM Girish Moodalbail
>     >     <gmoodalb...@gmail.com> wrote:
>     >
>     >         Hello Han,
>     >
>     >         On Wed, Jun 3, 2020 at 9:39 PM Han Zhou <zhou...@gmail.com> wrote:
>     >
>     >
>     >
>     >             On Wed, Jun 3, 2020 at 7:16 PM Girish Moodalbail
>     >             <gmoodalb...@gmail.com> wrote:
>     >
>     >                 Hello all,
>     >
>     >                 While working on an extension (see the diagram below) to
>     >                 the existing OVN logical topology for the ovn-kubernetes
>     >                 project, I am seeing an explosion of the "Reply to ARP
>     >                 requests" logical flows in the `lr_in_ip_input` table
>     >                 for the distributed router (ovn_cluster_router)
>     >                 configured with a gateway port (rtol-LS).
>     >
>     >                                         internet          
>     >                                ---------+-------------->  
>     >                                         |                  
>     >                                         |
>     >                       +----------localnet-port---------+  
>     >                       |LS                              |  
>     >                       +-----------------ltor-LS--------+  
>     >                                            |              
>     >                                            |              
>     >                  +---------------------rtol-LS------------+
>     >                  |           ovn_cluster_router           |
>     >                  |          (Distributed Router)          |
>     >                  +-rtos-ls0------rtos-ls1--------rtos-ls2-+
>     >                       |              |              |        
>     >                       |              |              |      
>     >                 +-----+-+       +----+--+     +-----+-+    
>     >                 |  LS0  |       |  LS1  |     |  LS2  |    
>     >                 +-+-----+       +-+-----+     +-+-----+        
>     >                   |               |             |          
>     >                   p0              p1            p2        
>     >                  IA0             IA1           IA2        
>     >                  EA0             EA1           EA2 
>     >                 (Node0)          (Node1)       (Node2)
>     >
>     >                 In the topology above, each of the three logical switch
>     >                 ports has an internal address of IAx and an external
>     >                 address of EAx (dnat_and_snat IP). They are all bound to
>     >                 their respective nodes (Nodex). A packet from `p0`
>     >                 heading towards the internet will be SNAT'ed to EA0 on
>     >                 the local hypervisor and then sent out through the LS's
>     >                 localnet-port on that hypervisor. Basically, they are
>     >                 configured for distributed NATing.
>     >
>     >                 I am seeing interesting "Reply to ARP requests" flows
>     >                 for arp.tpa set to EAx. Flows are like this:
>     >
>     >                 For EA0
>     >                 priority=90, match=(inport == "rtos-ls0" && arp.tpa ==
>     >                 EA0 && arp.op == 1), action=(/* ARP reply */)
>     >                 priority=90, match=(inport == "rtos-ls1" && arp.tpa ==
>     >                 EA0 && arp.op == 1), action=(/* ARP reply */)
>     >                 priority=90, match=(inport == "rtos-ls2" && arp.tpa ==
>     >                 EA0 && arp.op == 1), action=(/* ARP reply */)
>     >
>     >                 For EA1
>     >                 priority=90, match=(inport == "rtos-ls0" && arp.tpa ==
>     >                 EA1 && arp.op == 1), action=(/* ARP reply */)
>     >                 priority=90, match=(inport == "rtos-ls1" && arp.tpa ==
>     >                 EA1 && arp.op == 1), action=(/* ARP reply */)
>     >                 priority=90, match=(inport == "rtos-ls2" && arp.tpa ==
>     >                 EA1 && arp.op == 1), action=(/* ARP reply */)
>     >
>     >                 Similarly, for EA2.
>     >
>     >                 So, we have N * N "Reply to ARP requests" flows for N
>     >                 nodes, each with 1 dnat_and_snat IP.
>     >                 This is causing scale issues.
>     >
>     >                 If you look at the flows for `EA0`, I am confused as to
>     >                 why they are needed:
>     >
>     >                  1. When will one see an ARP request for EA0 from
>     >                     any of the LS{0,1,2}'s logical switch ports?
>     >                  2. If it is needed at all, can't we just remove the
>     >                     `inport` match altogether, since the flow is
>     >                     configured for every logical router port except
>     >                     for the distributed gateway port rtol-LS? For
>     >                     this port, we could add a higher priority rule
>     >                     with action set to `next`.
>     >                  3. Say we don't need east-west NAT connectivity. Is
>     >                     there a way to make these ARPs be learnt
>     >                     dynamically, like we are doing for the join and
>     >                     external logical switches (the other thread [1])?
>     >
>     >                 Regards,
>     >                 ~Girish
>     >
>     >                 [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html
>     >
>     >
>     >             In general, these flows should be per router instead of per
>     >             router port, since the NAT addresses are not attached to any
>     >             router port. For distributed gateway ports, there will need
>     >             to be per-port flows to match
>     >             is_chassis_resident(gateway-chassis). I think this can be
>     >             handled by:
>     >             - priority X + 20 flows for each distributed gateway port
>     >             with is_chassis_resident(), reply ARP
>     >             - priority X + 10 flows for each distributed gateway port
>     >             without is_chassis_resident(), drop
>     >             - priority X flows for each router (no need to match
>     >             inport), reply ARP
>     >
>     >             This way, there are N * (2D + 1) flows per router, where
>     >             N = number of NAT IPs and D = number of distributed gateway
>     >             ports. This would optimize the above scenario where there is
>     >             only 1 distributed gateway port but many regular router ports.
>     >             Thoughts?
>     >
>     >
>     >         We went ahead and added support for this topology in the
>     >         ovn-kubernetes project in this commit:
>     >         https://github.com/ovn-org/ovn-kubernetes/commit/edb24e6a71142f2e835b67b29c11e1688c645683
>     >
>     >         Han, I was curious to know if the above fix is on your radar.
>     >         Thanks.
>     >
>     >         The number of OpenFlow flows in each of the hypervisors is
>     >         insanely high and is consuming a lot of memory.
>     >
>     >         Regards,
>     >         ~Girish
>     >
>     >
>     >
>     >          
>     >
>     >
>     >             Thanks,
>     >             Han
>     >
> 

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
