Re: [ovs-discuss] [ovn] recommended upgrade/restart procedure for ovn components
On Wed, Aug 18, 2021 at 1:47 PM Krzysztof Klimonda wrote:
>
> Hi Numan,
>
> On Wed, Aug 18, 2021, at 17:42, Numan Siddique wrote:
> > On Wed, Aug 18, 2021 at 3:55 AM Krzysztof Klimonda wrote:
> > >
> > > Hi,
> > >
> > > After reading the OVN upgrade documentation[1], my understanding is
> > > that the order of upgrading components is pretty important to ensure
> > > controlplane & dataplane stability. As I understand it, these are the
> > > upgrade steps:
> > >
> > > 1. upgrade and restart ovn-controller on every chassis
> > > 2. upgrade ovn-nb-db and ovn-sb-db and migrate database schema
> > > 3. upgrade ovn-northd as the last component
> >
> > Even though this is the recommended procedure, I know that OpenStack
> > TripleO deployments and OpenShift upgrade ovn-northd and the
> > ovsdb-servers first.
> >
> > > First, is the schema upgrade done by ovn-ctl somehow? It didn't
> > > upgrade the schema for me and I had to run the "ovsdb-client migrate"
> > > command on both the northbound and southbound databases.
> >
> > I think ovn-ctl should take care of upgrading the database to the
> > updated schema. Before restarting the ovsdb-servers, the ovn packages
> > were upgraded to provide the desired schema files, right?
> > If so, I think ovn-ctl should upgrade the database.
>
> Yeah, those are kolla containers and after a restart we use a new image
> with new ovn packages. This is how kolla starts the northbound db:
> "/usr/share/ovn/scripts/ovn-ctl run_nb_ovsdb --db-nb-addr=172.16.0.213
> --db-nb-cluster-local-addr=172.16.0.213 --db-nb-sock=/run/ovn/ovnnb_db.sock
> --db-nb-pid=/run/ovn/ovnnb_db.pid
> --db-nb-file=/var/lib/openvswitch/ovn-nb/ovnnb.db
> --ovn-nb-logfile=/var/log/kolla/openvswitch/ovn-nb-db.log" - I'll double
> check if I can figure out why the schema wasn't upgraded.
>
> > > Second, in large deployments (250+ ovn-controllers) restarting ovn
> > > southbound cluster nodes leads to complete failure of the southbound
> > > database in my environment - once all ovn-controllers (and
> > > neutron-ovn-metadata-agents) start reconnecting to the cluster, the
> > > load generated by them makes the cluster lose quorum, or even
> > > corrupts the database on some nodes.
> >
> > If there are a lot of connections to the ovsdb-servers, they would
> > definitely slow down. Maybe you can restart the ovn-controllers in a
> > phased manner? Or pause all ovn-controllers and then unpause them
> > in a few groups so that the ovsdb-servers are not overloaded.
> > I think in one of our production scale deployments we did something
> > similar.
>
> By pause do you mean "debug/pause"? Thanks, I'll check it out.

Yes.

> > > I'm running OVN 21.06 with ovsdb-server 2.14.0 - should I be
> > > upgrading to 2.15.x? I've also seen the new relay-based architecture
> > > introduced in the 2.16.0 release, but this seems to be a rather
> > > recent development and I'm worried about stability (I've seen some
> > > reports about crashes and high memory usage).
> > >
> > > When running scale tests for ovn with kubernetes with hundreds of
> > > nodes, how are cluster upgrades handled?
> >
> > As I mentioned above, I think in the case of OpenShift, the master
> > nodes are upgraded first and then the worker nodes are upgraded.
> > I think during the master node upgrades, the worker nodes are paused.
> > My Kubernetes/OpenShift knowledge is limited though.
>
> Thanks, any idea on upgrading ovsdb-server to the 2.15.1 release? I see
> that there is a new database format - would that give any performance
> boost to the northbound and southbound clusters? Or should I just start
> looking into a relay-based southbound deployment to scale my cluster to
> 200+ nodes?

If you want to try the relay deployment, I'd suggest using 2.16.0.
I'm not really sure what improvements went into 2.15.1.
If you can, I'd suggest moving to 2.16.0.

Thanks
Numan

> Thanks
> Krzysztof
>
> > Thanks
> > Numan
> >
> > > Regards,
> > > Krzysztof
> > >
> > > [1] https://docs.ovn.org/en/latest/intro/install/ovn-upgrades.html

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] [ovn] recommended upgrade/restart procedure for ovn components
Hi Numan,

On Wed, Aug 18, 2021, at 17:42, Numan Siddique wrote:
> On Wed, Aug 18, 2021 at 3:55 AM Krzysztof Klimonda wrote:
> >
> > Hi,
> >
> > After reading the OVN upgrade documentation[1], my understanding is
> > that the order of upgrading components is pretty important to ensure
> > controlplane & dataplane stability. As I understand it, these are the
> > upgrade steps:
> >
> > 1. upgrade and restart ovn-controller on every chassis
> > 2. upgrade ovn-nb-db and ovn-sb-db and migrate database schema
> > 3. upgrade ovn-northd as the last component
>
> Even though this is the recommended procedure, I know that OpenStack
> TripleO deployments and OpenShift upgrade ovn-northd and the
> ovsdb-servers first.
>
> > First, is the schema upgrade done by ovn-ctl somehow? It didn't
> > upgrade the schema for me and I had to run the "ovsdb-client migrate"
> > command on both the northbound and southbound databases.
>
> I think ovn-ctl should take care of upgrading the database to the
> updated schema. Before restarting the ovsdb-servers, the ovn packages
> were upgraded to provide the desired schema files, right?
> If so, I think ovn-ctl should upgrade the database.

Yeah, those are kolla containers and after a restart we use a new image
with new ovn packages. This is how kolla starts the northbound db:
"/usr/share/ovn/scripts/ovn-ctl run_nb_ovsdb --db-nb-addr=172.16.0.213
--db-nb-cluster-local-addr=172.16.0.213 --db-nb-sock=/run/ovn/ovnnb_db.sock
--db-nb-pid=/run/ovn/ovnnb_db.pid
--db-nb-file=/var/lib/openvswitch/ovn-nb/ovnnb.db
--ovn-nb-logfile=/var/log/kolla/openvswitch/ovn-nb-db.log" - I'll double
check if I can figure out why the schema wasn't upgraded.

> > Second, in large deployments (250+ ovn-controllers) restarting ovn
> > southbound cluster nodes leads to complete failure of the southbound
> > database in my environment - once all ovn-controllers (and
> > neutron-ovn-metadata-agents) start reconnecting to the cluster, the
> > load generated by them makes the cluster lose quorum, or even
> > corrupts the database on some nodes.
>
> If there are a lot of connections to the ovsdb-servers, they would
> definitely slow down. Maybe you can restart the ovn-controllers in a
> phased manner? Or pause all ovn-controllers and then unpause them
> in a few groups so that the ovsdb-servers are not overloaded.
> I think in one of our production scale deployments we did something
> similar.

By pause do you mean "debug/pause"? Thanks, I'll check it out.

> > I'm running OVN 21.06 with ovsdb-server 2.14.0 - should I be
> > upgrading to 2.15.x? I've also seen the new relay-based architecture
> > introduced in the 2.16.0 release, but this seems to be a rather
> > recent development and I'm worried about stability (I've seen some
> > reports about crashes and high memory usage).
> >
> > When running scale tests for ovn with kubernetes with hundreds of
> > nodes, how are cluster upgrades handled?
>
> As I mentioned above, I think in the case of OpenShift, the master
> nodes are upgraded first and then the worker nodes are upgraded.
> I think during the master node upgrades, the worker nodes are paused.
> My Kubernetes/OpenShift knowledge is limited though.

Thanks, any idea on upgrading ovsdb-server to the 2.15.1 release? I see
that there is a new database format - would that give any performance
boost to the northbound and southbound clusters? Or should I just start
looking into a relay-based southbound deployment to scale my cluster to
200+ nodes?

Thanks
Krzysztof

> Thanks
> Numan
>
> > Regards,
> > Krzysztof
> >
> > [1] https://docs.ovn.org/en/latest/intro/install/ovn-upgrades.html

--
Krzysztof Klimonda
kklimo...@syntaxhighlighted.com
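One way to narrow down why the schema was not upgraded is to compare the schema version the running database reports with the version of the schema file shipped in the new image. The sketch below only does the comparison. It assumes the two version strings have already been collected by hand (as far as I know, "ovsdb-client get-schema-version" reports the former and "ovsdb-tool schema-version" the latter), and the function names and the version numbers in the usage line are made up for illustration.

```python
from typing import Tuple


def parse_schema_version(text: str) -> Tuple[int, int, int]:
    """Parse an OVSDB schema version string of the form "x.y.z"."""
    parts = text.strip().split(".")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not an x.y.z schema version: {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def compare_schema(running: str, packaged: str) -> str:
    """Say how the running database relates to the packaged schema file.

    Returns "same", "upgrade" (the packaged schema is newer, so a
    conversion is still pending) or "downgrade" (the database is newer
    than the schema file on disk).
    """
    have = parse_schema_version(running)
    want = parse_schema_version(packaged)
    if have == want:
        return "same"
    # Compare numerically: as plain strings "20.9.0" sorts after "20.17.0".
    return "upgrade" if have < want else "downgrade"


if __name__ == "__main__":
    # Hypothetical version numbers, not taken from any real OVN release.
    print(compare_schema("20.9.0", "20.17.0"))
```

If this reports "upgrade" after the containers have been restarted, the conversion did not happen and the ovn-ctl invocation is the place to look.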
Re: [ovs-discuss] [ovn] recommended upgrade/restart procedure for ovn components
On Wed, Aug 18, 2021 at 3:55 AM Krzysztof Klimonda wrote:
>
> Hi,
>
> After reading the OVN upgrade documentation[1], my understanding is that
> the order of upgrading components is pretty important to ensure
> controlplane & dataplane stability. As I understand it, these are the
> upgrade steps:
>
> 1. upgrade and restart ovn-controller on every chassis
> 2. upgrade ovn-nb-db and ovn-sb-db and migrate database schema
> 3. upgrade ovn-northd as the last component

Even though this is the recommended procedure, I know that OpenStack
TripleO deployments and OpenShift upgrade ovn-northd and the
ovsdb-servers first.

> First, is the schema upgrade done by ovn-ctl somehow? It didn't upgrade
> the schema for me and I had to run the "ovsdb-client migrate" command on
> both the northbound and southbound databases.

I think ovn-ctl should take care of upgrading the database to the
updated schema. Before restarting the ovsdb-servers, the ovn packages
were upgraded to provide the desired schema files, right?
If so, I think ovn-ctl should upgrade the database.

> Second, in large deployments (250+ ovn-controllers) restarting ovn
> southbound cluster nodes leads to complete failure of the southbound
> database in my environment - once all ovn-controllers (and
> neutron-ovn-metadata-agents) start reconnecting to the cluster, the load
> generated by them makes the cluster lose quorum, or even corrupts the
> database on some nodes.

If there are a lot of connections to the ovsdb-servers, they would
definitely slow down. Maybe you can restart the ovn-controllers in a
phased manner? Or pause all ovn-controllers and then unpause them
in a few groups so that the ovsdb-servers are not overloaded.
I think in one of our production scale deployments we did something similar.

> I'm running OVN 21.06 with ovsdb-server 2.14.0 - should I be upgrading to
> 2.15.x? I've also seen the new relay-based architecture introduced in the
> 2.16.0 release, but this seems to be a rather recent development and I'm
> worried about stability (I've seen some reports about crashes and high
> memory usage).
>
> When running scale tests for ovn with kubernetes with hundreds of nodes,
> how are cluster upgrades handled?

As I mentioned above, I think in the case of OpenShift, the master
nodes are upgraded first and then the worker nodes are upgraded.
I think during the master node upgrades, the worker nodes are paused.
My Kubernetes/OpenShift knowledge is limited though.

Thanks
Numan

> Regards,
> Krzysztof
>
> [1] https://docs.ovn.org/en/latest/intro/install/ovn-upgrades.html
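The "pause all, then unpause in a few groups" idea from the reply above can be sketched roughly as follows. This is only an illustration under assumptions: the chassis names, the group size and the `run` callback (which stands in for whatever transport is used to reach a chassis, e.g. ssh) are hypothetical, and "debug/resume" is assumed to be the ovn-controller counterpart of the "debug/pause" command.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(hosts: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


def resume_in_groups(
    hosts: Sequence[str],
    size: int,
    run: Callable[[str, List[str]], None],
    settle_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Resume paused ovn-controllers one group at a time.

    `run(host, command)` is a hypothetical hook that executes `command`
    on `host`; it is not part of OVN.
    """
    resumed: List[str] = []
    for group in batches(hosts, size):
        for host in group:
            # Assumed counterpart of "debug/pause".
            run(host, ["ovn-appctl", "-t", "ovn-controller", "debug/resume"])
            resumed.append(host)
        # Give the southbound cluster time to absorb this group's
        # reconnections before resuming the next one.
        sleep(settle_seconds)
    return resumed


if __name__ == "__main__":
    chassis = [f"chassis-{i}" for i in range(1, 6)]  # made-up names
    resume_in_groups(chassis, 2, lambda host, cmd: print(host, " ".join(cmd)))
```

The group size and the settle time are the knobs to tune: smaller groups and longer pauses put less reconnect load on the southbound cluster at the cost of a longer rollout.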
[ovs-discuss] [ovn] recommended upgrade/restart procedure for ovn components
Hi,

After reading the OVN upgrade documentation[1], my understanding is that the
order of upgrading components is pretty important to ensure controlplane &
dataplane stability. As I understand it, these are the upgrade steps:

1. upgrade and restart ovn-controller on every chassis
2. upgrade ovn-nb-db and ovn-sb-db and migrate database schema
3. upgrade ovn-northd as the last component

First, is the schema upgrade done by ovn-ctl somehow? It didn't upgrade the
schema for me and I had to run the "ovsdb-client migrate" command on both the
northbound and southbound databases.

Second, in large deployments (250+ ovn-controllers) restarting ovn southbound
cluster nodes leads to complete failure of the southbound database in my
environment - once all ovn-controllers (and neutron-ovn-metadata-agents)
start reconnecting to the cluster, the load generated by them makes the
cluster lose quorum, or even corrupts the database on some nodes.

I'm running OVN 21.06 with ovsdb-server 2.14.0 - should I be upgrading to
2.15.x? I've also seen the new relay-based architecture introduced in the
2.16.0 release, but this seems to be a rather recent development and I'm
worried about stability (I've seen some reports about crashes and high memory
usage).

When running scale tests for ovn with kubernetes with hundreds of nodes, how
are cluster upgrades handled?

Regards,
Krzysztof

[1] https://docs.ovn.org/en/latest/intro/install/ovn-upgrades.html

--
Krzysztof Klimonda
kklimo...@syntaxhighlighted.com