[root@ovn-db-2 ~]# ovn-nbctl list nb_global
_uuid               : b7b3aa05-f7ed-4dbc-979f-10445ac325b8
connections         : []
external_ids        : {"neutron:liveness_check_at"="2020-07-22 04:03:17.726917+00:00"}
hv_cfg              : 312
ipsec               : false
name                : ""
nb_cfg              : 2636
options             : {mac_prefix="ca:e8:07", svc_monitor_mac="4e:d0:3a:80:d4:b7"}
sb_cfg              : 2005
ssl                 : []
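The NB_Global row above already carries all three sequence counters, so the lag can be read from it directly: clients increment nb_cfg, ovn-northd copies it into sb_cfg once the southbound database has been updated, and hv_cfg tracks the smallest value reported by the chassis. A minimal Python sketch of that comparison follows. It assumes the usual one-field-per-line `name : value` layout of `ovn-nbctl list`; the helper names `parse_row` and `cfg_lag` are hypothetical and are not part of any OVN tool.

```python
import re

# Counter fields copied from the NB_Global listing in this thread
# (other columns omitted for brevity).
NB_GLOBAL = """\
hv_cfg              : 312
nb_cfg              : 2636
sb_cfg              : 2005
"""


def parse_row(text):
    """Parse 'name : value' lines, as printed by `ovn-nbctl list`, into a dict."""
    row = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+)\s*:\s*(.*)$", line)
        if match:
            row[match.group(1)] = match.group(2).strip()
    return row


def cfg_lag(row):
    """Return how far sb_cfg and hv_cfg trail nb_cfg (hypothetical helper)."""
    nb_cfg = int(row["nb_cfg"])
    return {
        "southbound": nb_cfg - int(row["sb_cfg"]),
        "hypervisors": nb_cfg - int(row["hv_cfg"]),
    }


lag = cfg_lag(parse_row(NB_GLOBAL))
print(lag)  # {'southbound': 631, 'hypervisors': 2324}
```

A lag that stays non-zero, or keeps growing, while no new changes are being made suggests that ovn-northd or the chassis are not keeping up.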
[root@ovn-db-2 ~]# ovn-sbctl list sb_global
_uuid               : 3720bc1d-b0da-47ce-85ca-96fa8d398489
connections         : []
external_ids        : {}
ipsec               : false
nb_cfg              : 312
options             : {mac_prefix="ca:e8:07", svc_monitor_mac="4e:d0:3a:80:d4:b7"}
ssl                 : []

The NBDB and SBDB are definitely out of sync.
Is there any way to force ovn-northd to sync them?

Thanks!

Tony
________________________________
From: Tony Liu <tonyliu0...@hotmail.com>
Sent: July 21, 2020 08:39 PM
To: Daniel Alvarez <dalva...@redhat.com>
Cc: Cory Hawkless <c...@hawkless.id.au>; ovs-discuss@openvswitch.org <ovs-discuss@openvswitch.org>; Dumitru Ceara <dce...@redhat.com>
Subject: Re: [ovs-discuss] OVN scale

When a network (and subnet) is created on OpenStack, a GW port and a service port (for DHCP and metadata) are also created. They are created in Neutron and ovn-nb-db by the ML2 driver. Then ovn-northd translates the update from NBDB to SBDB. My question is: with 20.03, is this translation incremental?

After creating 4000 networks successfully on OpenStack, I see 4000 logical switches and 8000 LS ports in NBDB. But in SBDB, there are only 1567 port-bindings. The break happened when translating the 1568th port. If ovn-northd recompiles the whole DB for every update, this problem can be explained: the DB is too big for ovn-northd to compile in time, so all the subsequent updates are lost. Does that make sense?

I recall that DB updates are coordinated by some "version": when changes happen in NBDB, the version bumps up; ovn-northd updates SBDB and bumps up its version as well, so they match. So, if the NBDB version bumps up more than once while ovn-northd is updating SBDB, is that still going to work? If yes, then it's just a matter of time: no matter how fast updates happen in NBDB, ovn-northd will catch up eventually. Am I right about that?

Any comment is welcome.

Thanks!
Tony
________________________________
From: Tony Liu <tonyliu0...@hotmail.com>
Sent: July 21, 2020 10:22 AM
To: Daniel Alvarez <dalva...@redhat.com>
Cc: Cory Hawkless <c...@hawkless.id.au>; ovs-discuss@openvswitch.org <ovs-discuss@openvswitch.org>; Dumitru Ceara <dce...@redhat.com>
Subject: Re: [ovs-discuss] OVN scale

Hi Daniel, all

The 4000 networks and 50 routers, 200 networks on each router, are all created. CPU usage of Neutron server, ovn-nb-db, ovn-northd, ovn-sb-db, ovn-controller and ovs-vswitchd is OK: not consistently 100%, but still with some spikes to it.

Now, when creating a VM, I get "waiting for vif-plugged-in timeout". This brings up another question: it used to be neutron-agent notifying the Neutron server of port status changes; with OVN, who does it? How should I look into this?

Please see my other comments inline...

Thanks!

Tony
________________________________
From: Daniel Alvarez <dalva...@redhat.com>
Sent: July 21, 2020 12:06 AM
To: Tony Liu <tonyliu0...@hotmail.com>
Cc: Cory Hawkless <c...@hawkless.id.au>; ovs-discuss@openvswitch.org <ovs-discuss@openvswitch.org>; Dumitru Ceara <dce...@redhat.com>
Subject: Re: [ovs-discuss] OVN scale

Hi Tony, all

On 21 Jul 2020, at 07:53, Tony Liu <tonyliu0...@hotmail.com> wrote:

Hi Cory,

With 4000 networks all connecting to one router with external GW, all networks and the router are created and connected. I launched a few VMs on some networks; they are connected and all have external connectivity. When running ping on a VM, there is a slow ping (a few seconds) out of 10+ normal pings (< 1ms).

When checking CPU usage, I see Neutron server, OVN DB, OVN controller and ovs-vswitchd all take almost 100% CPU. It's been like that for hours already. Since they are all created and some of them work fine (I didn't validate all networks), I'm not sure what those services are busy with. Checking the log, ovn-controller keeps switching between ovn-sb-db servers because of heartbeat timeouts.
How are you deploying OpenStack and in particular the OVN dbs? Is it a RAFT cluster?

> Kolla Ansible. I see cluster-local-address and remote address (to the first node)
> is specified for all 3 nodes. I assume clustering is enabled.
> Is there a different type of cluster?

What’s your current value for ovn-remote-probe-interval? If it’s too low, this can be triggering reconnections all the time and creating a snowball effect.

> external_ids : {ovn-encap-ip="10.6.30.22", ovn-encap-type=geneve,
> ovn-remote="tcp:10.6.20.84:6642,tcp:10.6.20.85:6642,tcp:10.6.20.86:6642",
> ovn-remote-probe-interval="60000", system-id="compute-3"}

You can bump the probe interval timeout like this:

ovs-vsctl set open . external_ids:ovn-remote-probe-interval=<TIME IN MS>

I'd like to know if that's expected, or something I can tune to fix the problem. If that's expected, I can't think of anything other than building multiple clusters to support that kind of scale. I am running a test with 4000 networks with 50 routers, 80 networks on each router. Wondering if that's going to help.

Reducing the number of routers should help. Also, there are some improvements in the 20.06 release when it comes to the number of logical flows, from a series of patches by Han. I will post the links later, sorry.

Also, there is a big improvement around large Port Groups, as they are now split by datapath, dramatically reducing the calculations in ovn-controller, especially in scenarios with a large number of networks like yours. However, you seem to have no security groups and hence no Port Groups in the NB database. Is this correct?

> Yes. For now, I want to avoid scale impact from SG, so I disabled it.

Is there any chance you can rerun the initial scenario but with 20.06?

> Is there a container for 20.06? Or where can I get the packages for 20.06?
> I should be able to upgrade 20.03 to 20.06 by upgrading packages.

The goal is to have thousands of networks connecting to external.
I'd like to know what's the expected scale supported by current OVN.

+Dumitru, as we know that there is a limit of 3000 on the number of resubmissions. So having 3K routers connected to the public logical switch may hit this limitation. Please @Dumitru correct me if I’m wrong.

Any comment is welcome.

Thanks!

Tony
________________________________
From: Cory Hawkless <c...@hawkless.id.au>
Sent: July 20, 2020 10:04 PM
To: Tony Liu <tonyliu0...@hotmail.com>; ovs-discuss@openvswitch.org <ovs-discuss@openvswitch.org>
Subject: RE: OVN scale

I would expect to see 100% CPU utilisation on anything involved in the process of creating 4000 networks and routers, but the question is: for how long do you see high utilisation? Does it last for seconds, minutes, hours? Do the resources actually get created after some period of time, or is the process failing?

From: discuss [mailto:ovs-discuss-boun...@openvswitch.org] On Behalf Of Tony Liu
Sent: Tuesday, 21 July 2020 1:53 PM
To: ovs-discuss@openvswitch.org
Subject: [ovs-discuss] OVN scale

Hi folks,

This is my first email here. Please let me know if there is any rule or convention I need to follow. I don't want to break it.

I started with OpenStack Ussuri and OVN 20.03.0 recently and am currently running some scaling tests. I searched around for scaling info and noticed some improvements already presented, which is pretty cool. I'm wondering whether "incremental" processing by DDlog is implemented yet?

With a 3-node OVN DB cluster and 3 compute nodes (with OVN controller), I created 4000 networks from OpenStack and 4000 logical routers with external GW, and added one network to each LR. Port security is disabled on all networks. Then I see ovn-northd, ovn-controller and ovs-vswitchd all take almost 100% CPU. Is this expected?

I revised the setup and am running a test with 4000 networks, 20 LRs and 200 networks on each LR. I will see if this makes any difference.

Is there any scaling and performance report for the latest OVN release that I can use as a reference?

Thanks!
Tony
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss