Re: [ovs-discuss] OVN nb-db and sb-db out of sync
Hi Numan,

This is how sb-db is brought up.
```
/usr/share/ovn/scripts/ovn-ctl run_sb_ovsdb \
    --db-sb-create-insecure-remote=yes \
    --db-sb-addr=10.6.20.84 \
    --db-sb-cluster-local-addr=10.6.20.84 \
    --db-sock=/run/ovn/ovnsb_db.sock \
    --db-sb-pid=/run/ovn/ovnsb_db.pid \
    --db-sb-file=/var/lib/openvswitch/ovn-sb/ovnsb.db \
    --ovn-sb-logfile=/var/log/kolla/openvswitch/ovn-sb-db.log
```

The script you pointed me to starts both nb-db and sb-db without "run_sb_ovsdb", but I don't think that really matters. In this case, I assume "ovnsb.db" will be initialized properly?

I checked the code, and the "stale data" warning is caused by some index mismatch. Any clues?

Thanks!
Tony
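Since the "stale data" warning points at a raft index mismatch, one thing that may help narrow it down is comparing the raft state reported by each of the three sb-db servers. The sketch below is only an illustration: `raft_summary` is a hypothetical helper, the control-socket path `/run/ovn/ovnsb_db.ctl` is an assumption based on the `--db-sock` directory above, and the filter simply picks out `Key: value` lines from whatever `cluster/status` prints.

```shell
#!/bin/sh
# Sketch: summarize raft state from `cluster/status` text read on stdin.
# raft_summary is a hypothetical helper, not part of OVN.
#
# Live usage on each DB node (socket path is an assumption, adjust as needed):
#   ovs-appctl -t /run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound | raft_summary
raft_summary() {
    # Split each line on ": " and keep only the fields of interest.
    awk -F': *' '
        $1 == "Role" || $1 == "Term" || $1 == "Leader" {
            printf "%s=%s\n", $1, $2
        }
    '
}
```

If the three nodes do not agree on the term and the leader, or each node reports itself as leader, the databases were probably created independently rather than joined into one cluster.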
Re: [ovs-discuss] OVN nb-db and sb-db out of sync
On Thu, Jul 23, 2020 at 11:35 PM Tony Liu wrote:

> Hi Numan,
>
> I did each of the following on all 3 OVN DB nodes.
> ```
> docker stop ovn_sb_db
> mv /var/lib/docker/volumes/ovn_sb_db/_data/ovnsb.db /var/lib/docker/volumes/ovn_sb_db/_data/ovnsb.db.bak
> docker start ovn_sb_db
> docker restart ovn_northd
> ```
>
> I see a new DB file is created, but I got complaints from ovn-northd.
> ```
> 2020-07-22T23:37:27.274Z|80540|ovsdb_idl|WARN|tcp:10.6.20.84:6642: clustered database server has stale data; trying another server
> ```
>
> Should I use ovsdb-tool to initialize the DB, instead of relying on
> ovn-sb-db, or is there something else I am missing?

I would suggest using ovn-ctl for initializing/starting the cluster.

Please take a look at this as an example:
https://github.com/ovn-org/ovn-fake-multinode/blob/master/ovn_cluster.sh#L337

Thanks
Numan

> I also tried to use "ovn-sbctl destroy" to remove the record, but
> ovn-sbctl is stuck there forever.
>
> Thanks!
>
> Tony

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
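To make the ovn-ctl suggestion above a little more concrete, here is a rough sketch of how the sb-db command line might differ between the first node and the nodes that join it. It only builds and prints the command. `sb_cluster_cmd` is a hypothetical helper, the second IP address is made up for illustration, and the assumption is that a node started with only `--db-sb-cluster-local-addr` creates a new cluster while a node that also gets `--db-sb-cluster-remote-addr` joins an existing one.

```shell
#!/bin/sh
# Sketch: print the ovn-ctl invocation for one sb-db cluster member.
# sb_cluster_cmd is a hypothetical helper, not part of OVN.
#   $1 = this node's IP
#   $2 = IP of the node that created the cluster (omit on that node itself)
sb_cluster_cmd() {
    local_ip=$1
    first_ip=${2:-}
    cmd="/usr/share/ovn/scripts/ovn-ctl run_sb_ovsdb"
    cmd="$cmd --db-sb-addr=$local_ip --db-sb-cluster-local-addr=$local_ip"
    # Only joining nodes point at an existing member of the cluster.
    if [ -n "$first_ip" ] && [ "$first_ip" != "$local_ip" ]; then
        cmd="$cmd --db-sb-cluster-remote-addr=$first_ip"
    fi
    echo "$cmd"
}

# First node (address taken from this thread):
sb_cluster_cmd 10.6.20.84
# A joining node (10.6.20.85 is a made-up address):
sb_cluster_cmd 10.6.20.85 10.6.20.84
```

If all three nodes are started without a remote address after their DB files are removed, each may end up creating its own single-node cluster, which would be worth ruling out here.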
Re: [ovs-discuss] OVN nb-db and sb-db out of sync
Hi Numan,

I did each of the following on all 3 OVN DB nodes.
```
docker stop ovn_sb_db
mv /var/lib/docker/volumes/ovn_sb_db/_data/ovnsb.db \
   /var/lib/docker/volumes/ovn_sb_db/_data/ovnsb.db.bak
docker start ovn_sb_db
docker restart ovn_northd
```

I see a new DB file is created, but I got complaints from ovn-northd.
```
2020-07-22T23:37:27.274Z|80540|ovsdb_idl|WARN|tcp:10.6.20.84:6642: clustered database server has stale data; trying another server
```

Should I use ovsdb-tool to initialize the DB instead of relying on ovn-sb-db, or is there something else I am missing?

I also tried to use "ovn-sbctl destroy" to remove the record, but ovn-sbctl is stuck there forever.

Thanks!
Tony
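One rough way to tell whether ovn-northd has caught up after a rebuild like this is to compare `nb_cfg` and `sb_cfg` in `NB_Global`. As far as I understand the schema, `sb_cfg` is the value of `nb_cfg` that ovn-northd has finished applying to the southbound DB, so a persistent gap suggests it is stuck. The sketch below is just arithmetic on the two numbers; `cfg_lag` is a hypothetical helper, and the example values are the ones from the `NB_Global` output posted earlier in this thread.

```shell
#!/bin/sh
# Sketch: how far the southbound DB lags behind the northbound DB.
# cfg_lag is a hypothetical helper, not part of OVN.
#   $1 = NB_Global nb_cfg, $2 = NB_Global sb_cfg
cfg_lag() {
    nb_cfg=$1
    sb_cfg=$2
    echo $((nb_cfg - sb_cfg))
}

# Values reported in this thread: nb_cfg 2636, sb_cfg 2005.
cfg_lag 2636 2005

# Live usage against a running northbound DB:
#   cfg_lag "$(ovn-nbctl get NB_Global . nb_cfg)" \
#           "$(ovn-nbctl get NB_Global . sb_cfg)"
```

A lag that keeps growing, or never drops to 0 once changes stop, would match the symptom described here. `ovn-nbctl --wait=sb sync` can also be used to block until the southbound DB has caught up, though it will simply hang if ovn-northd is stuck.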
Re: [ovs-discuss] OVN nb-db and sb-db out of sync
On Thu, Jul 23, 2020 at 8:22 AM Tony Liu wrote:

> Hi,
>
> I see why sb-db broke at the 1568th port-binding.
> The 1568th datapath-binding in sb-db references the same logical switch
> as the 1567th:
>
> _uuid       : 108cf745-db82-43c0-a9d3-afe27a41e4aa
> external_ids: {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473", name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
> tunnel_key  : 1567
>
> _uuid       : d934ed92-2f3c-4b31-8a76-2a5047a3bb46
> external_ids: {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473", name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
> tunnel_key  : 1568
>
> I don't believe this is supposed to happen. Any idea how it could happen?
> Then ovn-northd is stuck trying to delete this duplicate, and it
> ignores all the following updates.
> That caused nb-db and sb-db to go out of sync.
> Any way I can fix it manually, like with ovn-sbctl to delete it?

If you delete the OVN sb-db resources manually, ovn-northd should sync it up.

But I'm surprised that ovn-northd didn't sync earlier. There's something wrong related to raft going on here. Not sure what.

Thanks
Numan
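For the manual deletion suggested above, something along these lines might be a starting point. It is a sketch, not a tested procedure: `delete_datapath_binding` is a hypothetical wrapper, it defaults to a dry run that only prints the command, and it adds `--timeout` so the client gives up instead of hanging. Which of the two duplicate rows should go is not established in this thread, so the UUID in the example is only a placeholder choice, and the database may reject the deletion if other rows still reference that datapath.

```shell
#!/bin/sh
# Sketch: remove one Datapath_Binding row, bounded by a client timeout.
# delete_datapath_binding is a hypothetical wrapper, not part of OVN.
#   $1 = row UUID, $2 = timeout in seconds (default 30)
# DRY_RUN=1 (the default) only prints the command; set DRY_RUN=0 to run it.
delete_datapath_binding() {
    uuid=$1
    timeout_secs=${2:-30}
    # Rebuild the positional parameters as the command to run.
    set -- ovn-sbctl --timeout="$timeout_secs" destroy Datapath_Binding "$uuid"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Dry run against the second duplicate reported in this thread:
delete_datapath_binding d934ed92-2f3c-4b31-8a76-2a5047a3bb46 10
```

If the command still times out, that points back at the database cluster itself rather than at ovn-northd.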
Re: [ovs-discuss] OVN nb-db and sb-db out of sync
Hi,

I see why sb-db broke at the 1568th port-binding. The 1568th datapath-binding in sb-db references the same logical switch as the 1567th:

_uuid       : 108cf745-db82-43c0-a9d3-afe27a41e4aa
external_ids: {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473", name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
tunnel_key  : 1567

_uuid       : d934ed92-2f3c-4b31-8a76-2a5047a3bb46
external_ids: {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473", name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
tunnel_key  : 1568

I don't believe this is supposed to happen. Any idea how it could happen? ovn-northd is then stuck trying to delete this duplicate, and it ignores all the following updates. That caused nb-db and sb-db to go out of sync. Is there any way I can fix it manually, for example by deleting it with ovn-sbctl?

Thanks!
Tony

From: dev on behalf of Tony Liu
Sent: July 22, 2020 11:33 AM
To: ovs-dev
Subject: [ovs-dev] OVN nb-db and sb-db out of sync

Hi,

During a scaling test where 4000 networks are created from OpenStack, I see that nb-db and sb-db are out of sync. All 4000 logical switches and 8000 LS ports (GW port and service port of each network) are created in nb-db. In sb-db, there are only 1567 port-bindings; 4000 are expected.

[root@ovn-db-2 ~]# ovn-nbctl list nb_global
_uuid       : b7b3aa05-f7ed-4dbc-979f-10445ac325b8
connections : []
external_ids: {"neutron:liveness_check_at"="2020-07-22 04:03:17.726917+00:00"}
hv_cfg      : 312
ipsec       : false
name        : ""
nb_cfg      : 2636
options     : {mac_prefix="ca:e8:07", svc_monitor_mac="4e:d0:3a:80:d4:b7"}
sb_cfg      : 2005
ssl         : []

[root@ovn-db-2 ~]# ovn-sbctl list sb_global
_uuid       : 3720bc1d-b0da-47ce-85ca-96fa8d398489
connections : []
external_ids: {}
ipsec       : false
nb_cfg      : 312
options     : {mac_prefix="ca:e8:07", svc_monitor_mac="4e:d0:3a:80:d4:b7"}
ssl         : []

Is there any way to force ovn-northd to rebuild sb-db to sync with nb-db, like manipulating nb_cfg or anything else? Note, it's a 3-node RAFT cluster for both nb-db and sb-db.

Is that "incremental update" implemented in 20.03? If not, in which release is it going to be available?

Thanks!
Tony
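The duplicate described above was found by reading the records by hand. A small filter like the one below could flag such cases automatically. It is a sketch under a few assumptions: `find_duplicate_datapaths` is a hypothetical helper, and it expects input in the `_uuid` / `external_ids` layout shown in this message, with the `logical-switch` key present in `external_ids`.

```shell
#!/bin/sh
# Sketch: report logical switches that own more than one Datapath_Binding.
# find_duplicate_datapaths is a hypothetical helper, not part of OVN.
# Reads `ovn-sbctl list Datapath_Binding` style records on stdin and prints
# "<logical-switch uuid>: <binding uuid> <binding uuid> ..." per duplicate.
find_duplicate_datapaths() {
    awk '
        # Remember the UUID of the record currently being read.
        $1 == "_uuid" { uuid = $3 }
        # Pull the logical-switch UUID out of the external_ids column.
        /logical-switch=/ {
            if (match($0, /logical-switch="?[0-9a-f-]+/)) {
                ls = substr($0, RSTART, RLENGTH)
                sub(/^logical-switch="?/, "", ls)
                count[ls]++
                uuids[ls] = uuids[ls] " " uuid
            }
        }
        END {
            for (ls in count)
                if (count[ls] > 1)
                    print ls ":" uuids[ls]
        }
    '
}

# Live usage against a running southbound DB:
#   ovn-sbctl list Datapath_Binding | find_duplicate_datapaths
```

An empty result means every logical switch maps to a single datapath binding, at least as far as `external_ids` shows.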