Re: [ovs-discuss] OVN nb-db and sb-db out of sync

2020-07-23 Thread Tony Liu
Hi Numan,

This is how sb-db is brought up.
```
/usr/share/ovn/scripts/ovn-ctl run_sb_ovsdb \
  --db-sb-create-insecure-remote=yes \
  --db-sb-addr=10.6.20.84 --db-sb-cluster-local-addr=10.6.20.84 \
  --db-sock=/run/ovn/ovnsb_db.sock --db-sb-pid=/run/ovn/ovnsb_db.pid \
  --db-sb-file=/var/lib/openvswitch/ovn-sb/ovnsb.db \
  --ovn-sb-logfile=/var/log/kolla/openvswitch/ovn-sb-db.log
```
The script you pointed me to starts both nb-db and sb-db without using
"run_sb_ovsdb", but I don't think that really matters. In this case, I assume
"ovnsb.db" will be initialized properly?

I checked the code: the "stale data" warning is caused by some index mismatch. Any clues?
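
To make the index mismatch concrete: as far as I understand it, a clustered
OVSDB client remembers the highest log index it has seen, and it rejects a
server that reports a lower one. A server whose database file was recreated
could end up in that position. The sketch below only mimics that comparison.
The function name and the sample numbers are made up for illustration, and
this is not the actual ovsdb_idl code.

```shell
#!/bin/sh
# Illustration only: mimics the comparison a clustered OVSDB client is
# understood to make. Names and numbers here are hypothetical.

# server_is_stale CLIENT_MIN_INDEX SERVER_INDEX
# Succeeds (exit 0) when the server's log index is behind what the client
# has already seen, which is the case logged as "stale data".
server_is_stale() {
    [ "$2" -lt "$1" ]
}

# A client that has already seen index 5000 talks to a server that is
# back at index 12.
if server_is_stale 5000 12; then
    echo "stale: try another server"
else
    echo "ok: server is up to date"
fi
```

If this is what is happening, the warning should stop once the client
connects to a server whose index has caught up.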


Thanks!

Tony


From: Numan Siddique 
Sent: July 23, 2020 11:54 AM
To: Tony Liu 
Cc: ovs-dev; ovs-discuss@openvswitch.org
Subject: Re: [ovs-discuss] OVN nb-db and sb-db out of sync



On Thu, Jul 23, 2020 at 11:35 PM Tony Liu <tonyliu0...@hotmail.com> wrote:
Hi Numan,

I did each of the following on all 3 OVN DB nodes.
```
docker stop ovn_sb_db
mv /var/lib/docker/volumes/ovn_sb_db/_data/ovnsb.db \
   /var/lib/docker/volumes/ovn_sb_db/_data/ovnsb.db.bak
docker start ovn_sb_db
docker restart ovn_northd
```

I see new DB file is created, but I got complaints from ovn-northd.
```
2020-07-22T23:37:27.274Z|80540|ovsdb_idl|WARN|tcp:10.6.20.84:6642: clustered database server has stale data; trying another server
```

Should I use ovsdb-tool to initialize the DB, instead of relying on ovn-sb-db, 
or is there something else I am missing?

I would suggest using ovn-ctl for initializing/starting the cluster.

Please take a look at this as an example - 
https://github.com/ovn-org/ovn-fake-multinode/blob/master/ovn_cluster.sh#L337

Thanks
Numan


I also tried to use "ovn-sbctl destroy" to remove the record, but ovn-sbctl is 
stuck there forever.


Thanks!

Tony


From: Numan Siddique <num...@ovn.org>
Sent: July 23, 2020 03:15 AM
To: Tony Liu <tonyliu0...@hotmail.com>
Cc: ovs-dev <ovs-...@openvswitch.org>; ovs-discuss@openvswitch.org
Subject: Re: [ovs-discuss] OVN nb-db and sb-db out of sync



On Thu, Jul 23, 2020 at 8:22 AM Tony Liu <tonyliu0...@hotmail.com> wrote:
Hi,

I see why sb-db broke at the 1568th port-binding.
The 1568th datapath-binding in sb-db references the same logical switch as the 1567th:

_uuid   : 108cf745-db82-43c0-a9d3-afe27a41e4aa
external_ids: {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473", 
name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
tunnel_key  : 1567

_uuid   : d934ed92-2f3c-4b31-8a76-2a5047a3bb46
external_ids: {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473", 
name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
tunnel_key  : 1568

I don't believe this is supposed to happen. Any idea how could it happen?
Then ovn-northd is stuck in trying to delete this duplication, and it ignores 
all the following updates.
That caused out-of-sync between nb-db and sb-db.
Any way I can fix it manually, like with ovn-sbctl to delete it?

If you delete the ovn sb db resources manually, ovn-northd should sync it up.

But I'm surprised why ovn-northd didn't sync earlier. There's something wrong 
related to raft going
on here. Not sure what.

Thanks
Numan




Thanks!

Tony


From: dev <ovs-dev-boun...@openvswitch.org> on behalf of Tony Liu <tonyliu0...@hotmail.com>
Sent: July 22, 2020 11:33 AM
To: ovs-dev <ovs-...@openvswitch.org>
Subject: [ovs-dev] OVN nb-db and sb-db out of sync

Hi,

During a scaling test where 4000 networks are created from OpenStack, I see that
nb-db and sb-db are out of sync. All 4000 logical switches and 8000 LS ports
(GW port and service port of each network) are created in nb-db. In sb-db,
only 1567 port-bindings, 4000 is expected.

[root@ovn-db-2 ~]# ovn-nbctl list nb_global
_uuid   : b7b3aa05-f7ed-4dbc-979f-10445ac325b8
connections : []
external_ids: {"neutron:liveness_check_at"="2020-07-22 
04:03:17.726917+00:00"}
hv_cfg  : 312
ipsec   : false
name: ""
nb_cfg  : 2636
options : {mac_prefix="ca:e8:07", 
svc_monitor_mac="4e:d0:3a:80:d4:b7"}
sb_cfg  : 2005
ssl : []

[root@ovn-db-2 ~]# ovn-sbctl list sb_global
_uuid   : 3720bc1d-b0da-47ce-85ca-96fa8d398489
connections : []
external_ids: {}
ipsec   : false
nb_cfg  : 312
options : {mac_prefix="ca:e8:07", 
svc_monitor_mac="4e:d0:3a:80:d4:b7"}
ssl : []

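Is there any way to force ovn-northd to rebuild sb-db to sync with nb-db,
like manipulating nb_cfg or anything else? Note, it's 3-node RAFT cluster for
both nb-db and sb-db.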

Re: [ovs-discuss] OVN nb-db and sb-db out of sync

2020-07-23 Thread Numan Siddique
On Thu, Jul 23, 2020 at 11:35 PM Tony Liu  wrote:

> Hi Numan,
>
> I did each of the following on all 3 OVN DB nodes.
> ```
> docker stop ovn_sb_db
> mv /var/lib/docker/volumes/ovn_sb_db/_data/ovnsb.db
> /var/lib/docker/volumes/ovn_sb_db/_data/ovnsb.db.bak
> docker start ovn_sb_db
> docker restart ovn_northd
> ```
>
> I see new DB file is created, but I got complaints from ovn-northd.
> ```
> 2020-07-22T23:37:27.274Z|80540|ovsdb_idl|WARN|tcp:10.6.20.84:6642:
> clustered database server has stale data; trying another server
> ```
>
> Should I use ovsdb-tool to initialize the DB, instead of relying on
> ovn-sb-db, or something else I am missing?
>

I would suggest using ovn-ctl for initializing/starting the cluster.

Please take a look at this as an example -
https://github.com/ovn-org/ovn-fake-multinode/blob/master/ovn_cluster.sh#L337
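
As a rough sketch of the idea for the SB database: the first node starts
without a remote address and bootstraps the cluster, and every other node
points --db-sb-cluster-remote-addr at it in order to join. The helper name
and the second node address 10.6.20.85 are hypothetical, and the function
only prints the command line instead of running it.

```shell
#!/bin/sh
# Sketch: build the ovn-ctl command line for one SB cluster member.
# The helper name and the joining node's address are illustrative.

# sb_cluster_cmd LOCAL_IP [FIRST_NODE_IP]
# With one argument the node bootstraps a new cluster; with two it joins
# the cluster that FIRST_NODE_IP already created.
sb_cluster_cmd() {
    local_ip=$1
    first_ip=${2:-}
    cmd="/usr/share/ovn/scripts/ovn-ctl run_sb_ovsdb"
    cmd="$cmd --db-sb-addr=$local_ip"
    cmd="$cmd --db-sb-cluster-local-addr=$local_ip"
    if [ -n "$first_ip" ]; then
        cmd="$cmd --db-sb-cluster-remote-addr=$first_ip"
    fi
    cmd="$cmd --db-sb-create-insecure-remote=yes"
    printf '%s\n' "$cmd"
}

sb_cluster_cmd 10.6.20.84               # first node
sb_cluster_cmd 10.6.20.85 10.6.20.84    # joining node (hypothetical address)
```

Printing the command first makes it easy to compare against what the
container actually runs before changing anything.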

Thanks
Numan


>
> I also tried to use "ovn-sbctl destroy" to remove the record, but
> ovn-sbctl is stuck there forever.
>
>
> Thanks!
>
> Tony
>
> --
> *From:* Numan Siddique 
> *Sent:* July 23, 2020 03:15 AM
> *To:* Tony Liu 
> *Cc:* ovs-dev ; ovs-discuss@openvswitch.org <
> ovs-discuss@openvswitch.org>
> *Subject:* Re: [ovs-discuss] OVN nb-db and sb-db out of sync
>
>
>
> On Thu, Jul 23, 2020 at 8:22 AM Tony Liu  wrote:
>
> Hi,
>
> I see why sb-db broke at the 1568th port-binding.
> The 1568th datapath-binding in sb-db references the same logical switch
> as the 1567th:
>
> _uuid   : 108cf745-db82-43c0-a9d3-afe27a41e4aa
> external_ids:
> {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473",
> name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
> tunnel_key  : 1567
>
> _uuid   : d934ed92-2f3c-4b31-8a76-2a5047a3bb46
> external_ids:
> {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473",
> name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
> tunnel_key  : 1568
>
> I don't believe this is supposed to happen. Any idea how could it happen?
> Then ovn-northd is stuck in trying to delete this duplication, and it
> ignores all the following updates.
> That caused out-of-sync between nb-db and sb-db.
> Any way I can fix it manually, like with ovn-sbctl to delete it?
>
>
> If you delete the ovn sb db resources manually, ovn-northd should sync it
> up.
>
> But I'm surprised why ovn-northd didn't sync earlier. There's something
> wrong related to raft going
> on here. Not sure what.
>
> Thanks
> Numan
>
>
>
>
> Thanks!
>
> Tony
>
> --
> *From:* dev  on behalf of Tony Liu <
> tonyliu0...@hotmail.com>
> *Sent:* July 22, 2020 11:33 AM
> *To:* ovs-dev 
> *Subject:* [ovs-dev] OVN nb-db and sb-db out of sync
>
> Hi,
>
> During a scaling test where 4000 networks are created from OpenStack, I
> see that
> nb-db and sb-db are out of sync. All 4000 logical switches and 8000 LS
> ports
> (GW port and service port of each network) are created in nb-db. In sb-db,
> only 1567 port-bindings, 4000 is expected.
>
> [root@ovn-db-2 ~]# ovn-nbctl list nb_global
> _uuid   : b7b3aa05-f7ed-4dbc-979f-10445ac325b8
> connections : []
> external_ids: {"neutron:liveness_check_at"="2020-07-22
> 04:03:17.726917+00:00"}
> hv_cfg  : 312
> ipsec   : false
> name: ""
> nb_cfg  : 2636
> options : {mac_prefix="ca:e8:07",
> svc_monitor_mac="4e:d0:3a:80:d4:b7"}
> sb_cfg  : 2005
> ssl : []
>
> [root@ovn-db-2 ~]# ovn-sbctl list sb_global
> _uuid   : 3720bc1d-b0da-47ce-85ca-96fa8d398489
> connections : []
> external_ids: {}
> ipsec   : false
> nb_cfg  : 312
> options : {mac_prefix="ca:e8:07",
> svc_monitor_mac="4e:d0:3a:80:d4:b7"}
> ssl : []
>
> Is there any way to force ovn-northd to rebuild sb-db to sync with nb-db,
> like manipulating nb_cfg or anything else? Note, it's 3-node RAFT cluster
> for both
> nb-db and sb-db.
>
> Is that "incremental update" implemented in 20.03?
> If not, in which release it's going to be available?
>
>
> Thanks!
>
> Tony
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>


Re: [ovs-discuss] OVN nb-db and sb-db out of sync

2020-07-23 Thread Tony Liu
Hi Numan,

I did each of the following on all 3 OVN DB nodes.
```
docker stop ovn_sb_db
mv /var/lib/docker/volumes/ovn_sb_db/_data/ovnsb.db \
   /var/lib/docker/volumes/ovn_sb_db/_data/ovnsb.db.bak
docker start ovn_sb_db
docker restart ovn_northd
```

I see new DB file is created, but I got complaints from ovn-northd.
```
2020-07-22T23:37:27.274Z|80540|ovsdb_idl|WARN|tcp:10.6.20.84:6642: clustered database server has stale data; trying another server
```

Should I use ovsdb-tool to initialize the DB, instead of relying on ovn-sb-db, 
or is there something else I am missing?
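
One way to check whether the recreated databases really formed a single
cluster is to look at the cluster/status output of each server. Below is a
small sketch that pulls one field out of that output. The helper name is
made up, and the sample text is a hand-written, shortened stand-in for the
real output, not a capture from this setup.

```shell
#!/bin/sh
# Sketch: extract one "Field: value" entry from cluster/status style output.
# The helper name and the sample text are illustrative.

# cluster_field FIELD   (status text on stdin)
cluster_field() {
    awk -v key="$1" '
        # Match lines that start with "FIELD: " and print the rest.
        index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }
    '
}

sample='Name: OVN_Southbound
Status: cluster member
Role: follower
Term: 3
Leader: 1f2e'

printf '%s\n' "$sample" | cluster_field Role
printf '%s\n' "$sample" | cluster_field Status
```

On a live node the input would come from something like
"ovs-appctl -t /run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound", where
the control socket path is an assumption based on the pid file location
above. If every node reports itself as leader, each one probably created its
own single-node cluster.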

I also tried to use "ovn-sbctl destroy" to remove the record, but ovn-sbctl is 
stuck there forever.


Thanks!

Tony


From: Numan Siddique 
Sent: July 23, 2020 03:15 AM
To: Tony Liu 
Cc: ovs-dev; ovs-discuss@openvswitch.org
Subject: Re: [ovs-discuss] OVN nb-db and sb-db out of sync



On Thu, Jul 23, 2020 at 8:22 AM Tony Liu <tonyliu0...@hotmail.com> wrote:
Hi,

I see why sb-db broke at the 1568th port-binding.
The 1568th datapath-binding in sb-db references the same logical switch as the 1567th:

_uuid   : 108cf745-db82-43c0-a9d3-afe27a41e4aa
external_ids: {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473", 
name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
tunnel_key  : 1567

_uuid   : d934ed92-2f3c-4b31-8a76-2a5047a3bb46
external_ids: {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473", 
name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
tunnel_key  : 1568

I don't believe this is supposed to happen. Any idea how could it happen?
Then ovn-northd is stuck in trying to delete this duplication, and it ignores 
all the following updates.
That caused out-of-sync between nb-db and sb-db.
Any way I can fix it manually, like with ovn-sbctl to delete it?

If you delete the ovn sb db resources manually, ovn-northd should sync it up.

But I'm surprised why ovn-northd didn't sync earlier. There's something wrong 
related to raft going
on here. Not sure what.

Thanks
Numan




Thanks!

Tony


From: dev <ovs-dev-boun...@openvswitch.org> on behalf of Tony Liu <tonyliu0...@hotmail.com>
Sent: July 22, 2020 11:33 AM
To: ovs-dev <ovs-...@openvswitch.org>
Subject: [ovs-dev] OVN nb-db and sb-db out of sync

Hi,

During a scaling test where 4000 networks are created from OpenStack, I see that
nb-db and sb-db are out of sync. All 4000 logical switches and 8000 LS ports
(GW port and service port of each network) are created in nb-db. In sb-db,
only 1567 port-bindings, 4000 is expected.

[root@ovn-db-2 ~]# ovn-nbctl list nb_global
_uuid   : b7b3aa05-f7ed-4dbc-979f-10445ac325b8
connections : []
external_ids: {"neutron:liveness_check_at"="2020-07-22 
04:03:17.726917+00:00"}
hv_cfg  : 312
ipsec   : false
name: ""
nb_cfg  : 2636
options : {mac_prefix="ca:e8:07", 
svc_monitor_mac="4e:d0:3a:80:d4:b7"}
sb_cfg  : 2005
ssl : []

[root@ovn-db-2 ~]# ovn-sbctl list sb_global
_uuid   : 3720bc1d-b0da-47ce-85ca-96fa8d398489
connections : []
external_ids: {}
ipsec   : false
nb_cfg  : 312
options : {mac_prefix="ca:e8:07", 
svc_monitor_mac="4e:d0:3a:80:d4:b7"}
ssl : []

Is there any way to force ovn-northd to rebuild sb-db to sync with nb-db,
like manipulating nb_cfg or anything else? Note, it's 3-node RAFT cluster for 
both
nb-db and sb-db.

Is that "incremental update" implemented in 20.03?
If not, in which release it's going to be available?


Thanks!

Tony



Re: [ovs-discuss] OVN nb-db and sb-db out of sync

2020-07-23 Thread Numan Siddique
On Thu, Jul 23, 2020 at 8:22 AM Tony Liu  wrote:

> Hi,
>
> I see why sb-db broke at the 1568th port-binding.
> The 1568th datapath-binding in sb-db references the same logical switch
> as the 1567th:
>
> _uuid   : 108cf745-db82-43c0-a9d3-afe27a41e4aa
> external_ids:
> {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473",
> name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
> tunnel_key  : 1567
>
> _uuid   : d934ed92-2f3c-4b31-8a76-2a5047a3bb46
> external_ids:
> {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473",
> name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
> tunnel_key  : 1568
>
> I don't believe this is supposed to happen. Any idea how could it happen?
> Then ovn-northd is stuck in trying to delete this duplication, and it
> ignores all the following updates.
> That caused out-of-sync between nb-db and sb-db.
> Any way I can fix it manually, like with ovn-sbctl to delete it?
>

If you delete the OVN SB DB resources manually, ovn-northd should sync it
up.

But I'm surprised that ovn-northd didn't sync earlier. Something seems to be
wrong with Raft here, but I'm not sure what.
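
For the manual cleanup, a sketch along these lines may help. It only prints
the destroy commands so they can be reviewed first, and it adds a timeout so
that ovn-sbctl gives up instead of waiting forever. The helper name and the
30 second value are arbitrary, and picking the row with tunnel_key 1568 as
the one to drop is an assumption, not something established in this thread.

```shell
#!/bin/sh
# Sketch: turn a list of row UUIDs into reviewable ovn-sbctl commands.
# Nothing is executed here; the commands are only printed.

# destroy_cmds TABLE   (UUIDs on stdin, one per line)
destroy_cmds() {
    table=$1
    while read -r uuid; do
        # Skip blank lines in the input.
        [ -n "$uuid" ] || continue
        printf 'ovn-sbctl --timeout=30 destroy %s %s\n' "$table" "$uuid"
    done
}

# Example: the duplicate datapath binding quoted above (assumed choice).
printf '%s\n' d934ed92-2f3c-4b31-8a76-2a5047a3bb46 | destroy_cmds Datapath_Binding
```

Note that the database may reject the delete if other rows, such as port
bindings, still reference the datapath, in which case those rows would have
to be removed first.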

Thanks
Numan



>
> Thanks!
>
> Tony
>
> --
> *From:* dev  on behalf of Tony Liu <
> tonyliu0...@hotmail.com>
> *Sent:* July 22, 2020 11:33 AM
> *To:* ovs-dev 
> *Subject:* [ovs-dev] OVN nb-db and sb-db out of sync
>
> Hi,
>
> During a scaling test where 4000 networks are created from OpenStack, I
> see that
> nb-db and sb-db are out of sync. All 4000 logical switches and 8000 LS
> ports
> (GW port and service port of each network) are created in nb-db. In sb-db,
> only 1567 port-bindings, 4000 is expected.
>
> [root@ovn-db-2 ~]# ovn-nbctl list nb_global
> _uuid   : b7b3aa05-f7ed-4dbc-979f-10445ac325b8
> connections : []
> external_ids: {"neutron:liveness_check_at"="2020-07-22
> 04:03:17.726917+00:00"}
> hv_cfg  : 312
> ipsec   : false
> name: ""
> nb_cfg  : 2636
> options : {mac_prefix="ca:e8:07",
> svc_monitor_mac="4e:d0:3a:80:d4:b7"}
> sb_cfg  : 2005
> ssl : []
>
> [root@ovn-db-2 ~]# ovn-sbctl list sb_global
> _uuid   : 3720bc1d-b0da-47ce-85ca-96fa8d398489
> connections : []
> external_ids: {}
> ipsec   : false
> nb_cfg  : 312
> options : {mac_prefix="ca:e8:07",
> svc_monitor_mac="4e:d0:3a:80:d4:b7"}
> ssl : []
>
> Is there any way to force ovn-northd to rebuild sb-db to sync with nb-db,
> like manipulating nb_cfg or anything else? Note, it's 3-node RAFT cluster
> for both
> nb-db and sb-db.
>
> Is that "incremental update" implemented in 20.03?
> If not, in which release it's going to be available?
>
>
> Thanks!
>
> Tony
>


Re: [ovs-discuss] OVN nb-db and sb-db out of sync

2020-07-22 Thread Tony Liu
Hi,

I see why sb-db broke at the 1568th port-binding.
The 1568th datapath-binding in sb-db references the same logical switch as the 1567th:

_uuid               : 108cf745-db82-43c0-a9d3-afe27a41e4aa
external_ids        : {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473", name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
tunnel_key          : 1567

_uuid               : d934ed92-2f3c-4b31-8a76-2a5047a3bb46
external_ids        : {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473", name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
tunnel_key          : 1568

I don't believe this is supposed to happen. Any idea how it could happen?
ovn-northd then gets stuck trying to delete this duplicate and ignores all
subsequent updates, which leaves nb-db and sb-db out of sync.
Is there any way I can fix it manually, for example by deleting it with ovn-sbctl?
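
To look for other duplicates of this kind, something like the following
sketch could be run over the output of "ovn-sbctl list Datapath_Binding".
It groups rows by the logical-switch value in external_ids and reports any
value that appears more than once. The function name is made up, and the
parsing assumes each record is printed with one column per line.

```shell
#!/bin/sh
# Sketch: report logical switches that have more than one datapath binding.
# Reads "ovn-sbctl list Datapath_Binding" style text on stdin.

find_dup_datapaths() {
    awk '
        /^_uuid/ { uuid = $NF }
        /^external_ids/ {
            if (match($0, /logical-switch="[^"]*"/)) {
                # Skip the 16-character prefix and drop the closing quote.
                ls = substr($0, RSTART + 16, RLENGTH - 17)
                count[ls]++
                uuids[ls] = uuids[ls] " " uuid
            }
        }
        END {
            for (ls in count)
                if (count[ls] > 1)
                    print ls ":" uuids[ls]
        }
    '
}

# The two records from this mail as sample input.
find_dup_datapaths <<'EOF'
_uuid               : 108cf745-db82-43c0-a9d3-afe27a41e4aa
external_ids        : {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473", name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
tunnel_key          : 1567

_uuid               : d934ed92-2f3c-4b31-8a76-2a5047a3bb46
external_ids        : {logical-switch="8a5d1d3c-e9fc-4cbe-a461-98ff838e6473", name=neutron-e907dc17-f1e8-4217-a37d-86e9a98c86c2, name2=net-97-192}
tunnel_key          : 1568
EOF
```

For the sample input this prints the logical switch UUID followed by the two
datapath binding UUIDs that share it.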


Thanks!

Tony


From: dev on behalf of Tony Liu
Sent: July 22, 2020 11:33 AM
To: ovs-dev 
Subject: [ovs-dev] OVN nb-db and sb-db out of sync

Hi,

During a scaling test where 4000 networks are created from OpenStack, I see that
nb-db and sb-db are out of sync. All 4000 logical switches and 8000 LS ports
(GW port and service port of each network) are created in nb-db. In sb-db,
there are only 1567 port-bindings; 4000 are expected.

[root@ovn-db-2 ~]# ovn-nbctl list nb_global
_uuid               : b7b3aa05-f7ed-4dbc-979f-10445ac325b8
connections         : []
external_ids        : {"neutron:liveness_check_at"="2020-07-22 04:03:17.726917+00:00"}
hv_cfg              : 312
ipsec               : false
name                : ""
nb_cfg              : 2636
options             : {mac_prefix="ca:e8:07", svc_monitor_mac="4e:d0:3a:80:d4:b7"}
sb_cfg              : 2005
ssl                 : []

[root@ovn-db-2 ~]# ovn-sbctl list sb_global
_uuid               : 3720bc1d-b0da-47ce-85ca-96fa8d398489
connections         : []
external_ids        : {}
ipsec               : false
nb_cfg              : 312
options             : {mac_prefix="ca:e8:07", svc_monitor_mac="4e:d0:3a:80:d4:b7"}
ssl                 : []

Is there any way to force ovn-northd to rebuild sb-db to sync with nb-db,
like manipulating nb_cfg or anything else? Note that both nb-db and sb-db
run as 3-node Raft clusters.
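
For reference, the lag can be read straight from the NB_Global row above:
nb_cfg is the sequence number requested on the northbound side and sb_cfg is
the one ovn-northd reports as applied to the southbound database. Here is a
small sketch that extracts the two values from "ovn-nbctl list nb_global"
style output and subtracts them. The helper name is made up, and the sample
input is just the three counters from the listing above.

```shell
#!/bin/sh
# Sketch: read one column from "ovn-nbctl list ..." style output and
# compute how far sb_cfg trails nb_cfg. Helper name is illustrative.

# column_value COLUMN   (list output on stdin)
column_value() {
    awk -v col="$1" '
        {
            # Split "name : value" at the first colon.
            i = index($0, ":")
            if (i == 0) next
            name = substr($0, 1, i - 1)
            gsub(/[ \t]+$/, "", name)
            if (name == col) {
                val = substr($0, i + 1)
                gsub(/^[ \t]+/, "", val)
                print val
                exit
            }
        }
    '
}

nb_global='hv_cfg  : 312
nb_cfg  : 2636
sb_cfg  : 2005'

nb=$(printf '%s\n' "$nb_global" | column_value nb_cfg)
sb=$(printf '%s\n' "$nb_global" | column_value sb_cfg)
echo "sb_cfg trails nb_cfg by $((nb - sb))"   # 2636 - 2005 = 631
```

A gap that never closes suggests ovn-northd is not completing its runs.
"ovn-nbctl --wait=sb sync" is another way to check, since it returns only
once the southbound database has caught up.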

Is the "incremental update" feature implemented in 20.03?
If not, in which release will it be available?


Thanks!

Tony
