[ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update

2020-07-31 Thread Tony Liu
Hi,

I see the active ovn-northd using a lot of CPU (30% - 100%) even when there is
no configuration change from OpenStack and nothing is happening on any chassis
node.

Is this expected? What is it busy with?


2020-07-31T23:08:09.511Z|04267|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 
(10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (68% CPU usage)
2020-07-31T23:08:09.512Z|04268|jsonrpc|DBG|tcp:10.6.20.84:6641: received 
request, method="echo", params=[], id="echo"
2020-07-31T23:08:09.512Z|04269|jsonrpc|DBG|tcp:10.6.20.84:6641: send reply, 
result=[], id="echo"
2020-07-31T23:08:12.777Z|04270|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 
(10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (34% CPU usage)
2020-07-31T23:08:12.777Z|04271|reconnect|DBG|tcp:10.6.20.85:6642: idle 5002 ms, 
sending inactivity probe
2020-07-31T23:08:12.777Z|04272|reconnect|DBG|tcp:10.6.20.85:6642: entering IDLE
2020-07-31T23:08:12.777Z|04273|jsonrpc|DBG|tcp:10.6.20.85:6642: send request, 
method="echo", params=[], id="echo"
2020-07-31T23:08:12.777Z|04274|jsonrpc|DBG|tcp:10.6.20.85:6642: received 
request, method="echo", params=[], id="echo"
2020-07-31T23:08:12.777Z|04275|reconnect|DBG|tcp:10.6.20.85:6642: entering 
ACTIVE
2020-07-31T23:08:12.777Z|04276|jsonrpc|DBG|tcp:10.6.20.85:6642: send reply, 
result=[], id="echo"
2020-07-31T23:08:13.635Z|04277|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 
(10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (34% CPU usage)
2020-07-31T23:08:13.635Z|04278|jsonrpc|DBG|tcp:10.6.20.85:6642: received reply, 
result=[], id="echo"
2020-07-31T23:08:14.480Z|04279|hmap|DBG|Dropped 129 log messages in last 5 
seconds (most recently, 0 seconds ago) due to excessive rate
2020-07-31T23:08:14.480Z|04280|hmap|DBG|lib/shash.c:112: 2 buckets with 6+ 
nodes, including 2 buckets with 6 nodes (32 nodes total across 32 buckets)
2020-07-31T23:08:14.513Z|04281|poll_loop|DBG|wakeup due to 27-ms timeout at 
lib/reconnect.c:643 (34% CPU usage)
2020-07-31T23:08:14.513Z|04282|reconnect|DBG|tcp:10.6.20.84:6641: idle 5001 ms, 
sending inactivity probe
2020-07-31T23:08:14.513Z|04283|reconnect|DBG|tcp:10.6.20.84:6641: entering IDLE
2020-07-31T23:08:14.513Z|04284|jsonrpc|DBG|tcp:10.6.20.84:6641: send request, 
method="echo", params=[], id="echo"
2020-07-31T23:08:15.370Z|04285|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 
(10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (34% CPU usage)
2020-07-31T23:08:15.370Z|04286|jsonrpc|DBG|tcp:10.6.20.84:6641: received 
request, method="echo", params=[], id="echo"
2020-07-31T23:08:15.370Z|04287|reconnect|DBG|tcp:10.6.20.84:6641: entering 
ACTIVE
2020-07-31T23:08:15.370Z|04288|jsonrpc|DBG|tcp:10.6.20.84:6641: send reply, 
result=[], id="echo"
2020-07-31T23:08:16.236Z|04289|poll_loop|DBG|wakeup due to 0-ms timeout at 
tcp:10.6.20.84:6641 (100% CPU usage)
2020-07-31T23:08:16.236Z|04290|jsonrpc|DBG|tcp:10.6.20.84:6641: received reply, 
result=[], id="echo"
2020-07-31T23:08:17.778Z|04291|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 
(10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (100% CPU usage)
2020-07-31T23:08:17.778Z|04292|jsonrpc|DBG|tcp:10.6.20.85:6642: received 
request, method="echo", params=[], id="echo"
2020-07-31T23:08:17.778Z|04293|jsonrpc|DBG|tcp:10.6.20.85:6642: send reply, 
result=[], id="echo"
2020-07-31T23:08:20.372Z|04294|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 
(10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (41% CPU usage)
2020-07-31T23:08:20.372Z|04295|reconnect|DBG|tcp:10.6.20.84:6641: idle 5002 ms, 
sending inactivity probe
2020-07-31T23:08:20.372Z|04296|reconnect|DBG|tcp:10.6.20.84:6641: entering IDLE
2020-07-31T23:08:20.372Z|04297|jsonrpc|DBG|tcp:10.6.20.84:6641: send request, 
method="echo", params=[], id="echo"
2020-07-31T23:08:20.372Z|04298|jsonrpc|DBG|tcp:10.6.20.84:6641: received 
request, method="echo", params=[], id="echo"
2020-07-31T23:08:20.372Z|04299|reconnect|DBG|tcp:10.6.20.84:6641: entering 
ACTIVE
2020-07-31T23:08:20.372Z|04300|jsonrpc|DBG|tcp:10.6.20.84:6641: send reply, 
result=[], id="echo"
2020-07-31T23:08:20.376Z|04301|hmap|DBG|Dropped 181 log messages in last 6 
seconds (most recently, 1 seconds ago) due to excessive rate
2020-07-31T23:08:20.376Z|04302|hmap|DBG|northd/ovn-northd.c:595: 2 buckets with 
6+ nodes, including 2 buckets with 6 nodes (256 nodes total across 256 buckets)
2020-07-31T23:08:21.222Z|04303|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 
(10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (41% CPU usage)
2020-07-31T23:08:21.223Z|04304|jsonrpc|DBG|tcp:10.6.20.84:6641: received reply, 
result=[], id="echo"
2020-07-31T23:08:22.779Z|04305|poll_loop|DBG|wakeup due to 706-ms timeout at 
lib/reconnect.c:643 (41% CPU usage)
2020-07-31T23:08:22.779Z|04306|reconnect|DBG|tcp:10.6.20.85:6642: idle 5001 ms, 
sending inactivity probe
2020-07-31T23:08:22.779Z|04307|reconnect|DBG|tcp:10.6.20.85:6642: entering IDLE
2020-07-31T23:08:22.779Z|04308|jsonrpc|DBG|tcp:10.6.20.85:6642: 
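
A debug log like the one above can be summarized by tallying the poll_loop
wakeup records, which shows where the wakeups come from and which CPU figure
each record reports. The following is a minimal sketch in Python and is not
part of OVN: the summarize() helper and the sample text are illustrative, and
the regular expression assumes the record layout seen in this log (it
tolerates the line wrapping added by the mail client).

```python
import re
from collections import Counter

# Matches poll_loop records such as
#   "...|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 (...) at lib/stream-fd.c:157 (68% CPU usage)"
# \s+ is used between tokens so that records wrapped across lines still match.
WAKEUP = re.compile(
    r"poll_loop\|DBG\|wakeup due to\s+(?P<cause>.+?)\s+at\s+(?P<where>\S+)"
    r"\s+\((?P<cpu>\d+)% CPU usage\)",
    re.S,
)


def summarize(log_text):
    """Count wakeups per reported location and collect the CPU percentages."""
    by_where = Counter()
    cpu = []
    for m in WAKEUP.finditer(log_text):
        by_where[m.group("where")] += 1
        cpu.append(int(m.group("cpu")))
    return by_where, cpu


# Three records copied from the log above, including the mail client's wrapping.
sample = """\
2020-07-31T23:08:09.511Z|04267|poll_loop|DBG|wakeup due to [POLLIN] on fd 8
(10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (68% CPU usage)
2020-07-31T23:08:14.513Z|04281|poll_loop|DBG|wakeup due to 27-ms timeout at
lib/reconnect.c:643 (34% CPU usage)
2020-07-31T23:08:16.236Z|04289|poll_loop|DBG|wakeup due to 0-ms timeout at
tcp:10.6.20.84:6641 (100% CPU usage)
"""

by_where, cpu = summarize(sample)
for where, count in by_where.most_common():
    print(count, where)
print("max CPU reported:", max(cpu))
```

Run over a complete log, this only shows how the wakeups are distributed (for
example, how many are the 5-second inactivity probes); it does not by itself
explain what the process is busy with between wakeups.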

Re: [ovs-discuss] [OVN] DB backup and restore

2020-07-31 Thread Tony Liu
Thanks Han! It's clear!

Tony

> -----Original Message-----
> From: Han Zhou
> Sent: Friday, July 31, 2020 10:11 AM
> To: Tony Liu
> Cc: Han Zhou; Numan Siddique; ovs-dev; ovs-discuss
> Subject: Re: [ovs-discuss] [OVN] DB backup and restore
>
> On Thu, Jul 30, 2020 at 11:07 PM Tony Liu wrote:
> 
> 
>   Hi Han,
>
>   ovsdb-client backup and restore work as expected. Sorry for the
>   false alarm.
>   I messed up with the container. When I restore the snapshot for
>   nb-db, sb-db is updated accordingly by ovn-northd.
> 
> 
> 
> Great!
> 
> 
> 
>   I think this man page should be updated to say that RAFT clusters
>   are also supported by backup and restore.
>   http://www.openvswitch.org/support/dist-docs/ovsdb-client.1.txt
> 
> 
> 
> I didn't see it saying RAFT cluster is not supported in the above
> document. Probably you misunderstood this statement:
> "Reads snapshot, which must be a OVSDB standalone or active-backup
> database"
> The backup file you generated from ovsdb-client backup command is in
> OVSDB standalone format, which is mentioned in the "backup" document.
> 
> 
> In addition, the document ovsdb(7) also made it clear that this is the
> right way to backup/restore clustered DB.
> Did this clarify?
> 
> 
> 
>   Thanks!
> 
>   Tony
>   > -----Original Message-----
>   > From: Han Zhou
>   > Sent: Thursday, July 30, 2020 7:19 PM
>   > To: Tony Liu
>   > Cc: Numan Siddique; Han Zhou; ovs-dev; ovs-discuss
>   > Subject: Re: [ovs-discuss] [OVN] DB backup and restore
>   >
>   > On Thu, Jul 30, 2020 at 7:04 PM Tony Liu wrote:
>   >
>   >
>   >   Hi,
>   >
>   >   Just an update: I finally got this snapshot/rollback to work.
>   >   The rollback is not live though. Here is what I did.
>   >
>   >   1. Make a snapshot with ovsdb-client, assuming there are no
>   >      ongoing transactions and data is consistent on all nodes.
>   >      The snapshot can be taken on any node. It doesn't include
>   >      any cluster info. That's probably why the man page says this
>   >      is for standalone and A/B only. But that cluster info seems
>   >      not to be required to restore.
>   >
>   >   2. To rollback/restore, stop services on all nodes, starting
>   >      from the followers and ending with the leader.
>   >
>   >   3. Pick a node as the new leader and copy the snapshot to be
>   >      the DB file. Then start the service. A cluster with a new
>   >      cluster ID will be created. The node will be allocated a new
>   >      server ID as well.
>   >
>   >   4. On the other two nodes, remove the DB file and restart the
>   >      service with remote-address pointing to the leader.
>   >
>   >   Now the new cluster starts working with the rollback data.
>   >
>   >
>   > The steps you gave may work, but it is weird. It is better to just
>   > follow the steps mentioned in this section:
>   >
>   > https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb.7.rst#backing-up-and-restoring-a-database
>   >
>   >
>   >
>   >
>   >
>   >
>   >   "ovsdb-client restore" doesn't work for me, not sure why.
>   >
>   >   ovsdb-client: ovsdb error: /dev/stdin: cannot identify file type
>   >
>   >   I tried to restore the snapshot created by backup, and also the
>   >   directly copied DB file; neither of them works. Has anyone
>   >   experienced such an issue?
>   >
>   >
>   >
>   > Maybe your command was wrong. Could you share your command line,
>   > and the version used?
>   >
>   >
>   >
>   >
>   >
>   >   To Numan, it would be great if you could share the details on
>   >   how to use neutron-ovn-db-sync-util.
>   >
>   >
>   >
>   >
>   >
>   >   Thanks!
>   >
>   >
>   >
>   >   Tony
>   >
>   >
>   >
>   >   From: Tony Liu
>   >   Sent: Thursday, July 30, 2020 4:51 PM
>   >   

Re: [ovs-discuss] [OVN] DB backup and restore

2020-07-31 Thread Han Zhou
On Thu, Jul 30, 2020 at 11:07 PM Tony Liu wrote:

> Hi Han,
>
> ovsdb-client backup and restore work as expected. Sorry for the false
> alarm.
> I messed up with the container. When I restore the snapshot for nb-db,
> sb-db is updated accordingly by ovn-northd.
>

Great!


> I think this man page should be updated to say that RAFT clusters are
> also supported by backup and restore.
> http://www.openvswitch.org/support/dist-docs/ovsdb-client.1.txt
>

I didn't see it saying RAFT cluster is not supported in the above
document. Probably you misunderstood this statement:

"Reads snapshot, which must be a OVSDB standalone or
active-backup database"

The backup file you generated from ovsdb-client backup command is in OVSDB
standalone format, which is mentioned in the "backup" document.

In addition, the document ovsdb(7) also made it clear that this is the
right way to backup/restore clustered DB.
Did this clarify?
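
For reference, the backup and restore invocations discussed here can be
sketched as follows. This is a minimal Python wrapper around the
"ovsdb-client backup" and "ovsdb-client restore" commands described in
ovsdb-client(1), where backup writes the snapshot to standard output and
restore reads it from standard input. The helper names are illustrative, and
the server address tcp:10.6.20.84:6641 and database name OVN_Northbound in the
usage lines are assumptions, not values confirmed in this thread.

```python
import subprocess


def backup_cmd(server, database):
    """argv for `ovsdb-client backup SERVER DATABASE` (snapshot goes to stdout)."""
    return ["ovsdb-client", "backup", server, database]


def restore_cmd(server, database):
    """argv for `ovsdb-client restore SERVER DATABASE` (snapshot comes from stdin)."""
    return ["ovsdb-client", "restore", server, database]


def backup(server, database, snapshot_path):
    """Write a standalone-format snapshot of DATABASE to snapshot_path."""
    with open(snapshot_path, "wb") as out:
        subprocess.run(backup_cmd(server, database), stdout=out, check=True)


def restore(server, database, snapshot_path):
    """Replace the contents of DATABASE on SERVER with the snapshot."""
    with open(snapshot_path, "rb") as src:
        subprocess.run(restore_cmd(server, database), stdin=src, check=True)


# Hypothetical values for illustration only.
SERVER = "tcp:10.6.20.84:6641"
DATABASE = "OVN_Northbound"
print(" ".join(backup_cmd(SERVER, DATABASE)), "> nb.snapshot")
print(" ".join(restore_cmd(SERVER, DATABASE)), "< nb.snapshot")
```

Calling backup(SERVER, DATABASE, "nb.snapshot") and later
restore(SERVER, DATABASE, "nb.snapshot") would run the two commands against a
live server; the print lines above only show the equivalent shell form.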


> Thanks!
>
> Tony
> > -----Original Message-----
> > From: Han Zhou
> > Sent: Thursday, July 30, 2020 7:19 PM
> > To: Tony Liu
> > Cc: Numan Siddique; Han Zhou; ovs-dev; ovs-discuss
> > Subject: Re: [ovs-discuss] [OVN] DB backup and restore
> >
> > On Thu, Jul 30, 2020 at 7:04 PM Tony Liu wrote:
> >
> >
> >   Hi,
> >
> >   Just an update: I finally got this snapshot/rollback to work.
> >   The rollback is not live though. Here is what I did.
> >
> >   1. Make a snapshot with ovsdb-client, assuming there are no
> >      ongoing transactions and data is consistent on all nodes.
> >      The snapshot can be taken on any node. It doesn't include
> >      any cluster info. That's probably why the man page says this
> >      is for standalone and A/B only. But that cluster info seems
> >      not to be required to restore.
> >
> >   2. To rollback/restore, stop services on all nodes, starting
> >      from the followers and ending with the leader.
> >
> >   3. Pick a node as the new leader and copy the snapshot to be
> >      the DB file. Then start the service. A cluster with a new
> >      cluster ID will be created. The node will be allocated a new
> >      server ID as well.
> >
> >   4. On the other two nodes, remove the DB file and restart the
> >      service with remote-address pointing to the leader.
> >
> >   Now the new cluster starts working with the rollback data.
> >
> >
> > The steps you gave may work, but it is weird. It is better to just
> > follow the steps mentioned in this section:
> >
> > https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb.7.rst#backing-up-and-restoring-a-database
> >
> >
> >
> >
> >
> >
> >   "ovsdb-client restore" doesn't work for me, not sure why.
> >
> >   ovsdb-client: ovsdb error: /dev/stdin: cannot identify file type
> >
> >   I tried to restore the snapshot created by backup, and also the
> >   directly copied DB file; neither of them works. Has anyone
> >   experienced such an issue?
> >
> >
> >
> > Maybe your command was wrong. Could you share your command line, and the
> > version used?
> >
> >
> >
> >
> >
> >   To Numan, it would be great if you could share the details on
> >   how to use neutron-ovn-db-sync-util.
> >
> >
> >
> >
> >
> >   Thanks!
> >
> >
> >
> >   Tony
> >
> >
> >
> >   From: Tony Liu
> >   Sent: Thursday, July 30, 2020 4:51 PM
> >   To: Numan Siddique; Han Zhou
> >   Cc: Han Zhou; ovs-dev; ovs-discuss
> >   Subject: Re: [ovs-dev] [ovs-discuss] [OVN] DB backup and restore
> >
> >
> >
> >   Hi Numan,
> >
> >   I found this comment you made a few years back.
> >
> >   - At neutron-server startup, OVN ML2 driver syncs the neutron
> >   DB and OVN DB if sync mode is set to repair.
> >   - Admin can run the "neutron-ovn-db-sync-util" to sync the DBs.
> >
> >   Could you share the details to try those two options?
> >
> >
> >   Thanks!
> >
> >   Tony
> >
> >   From: Tony Liu
> >   Sent: Thursday, July 30, 2020 4:38 PM
> >   To: Han Zhou
> >   Cc: Han Zhou; ovs-dev; ovs-discuss
> >   Subject: Re: [ovs-dev] [ovs-discuss] [OVN] DB backup and restore
> >
> >   Hi,
> >
> >   I have another thought after some digging. Since I am with
> >   OpenStack, all networking configurations are from OpenStack.
> >   I could snapshot OpenStack MariaDB, restore and run
> >   neutron-ovn-db-sync to update OVN DB. Would that be a cleaner
> >   solution?
> >
> >   

Re: [ovs-discuss] [OVN] DB backup and restore

2020-07-31 Thread Tony Liu
Hi Han,

ovsdb-client backup and restore work as expected. Sorry for the false alarm.
I messed up with the container. When I restore the snapshot for nb-db, sb-db is
updated accordingly by ovn-northd.

I think this man page should be updated to say that RAFT clusters are also
supported by backup and restore.
http://www.openvswitch.org/support/dist-docs/ovsdb-client.1.txt


Thanks!

Tony
> -----Original Message-----
> From: Han Zhou
> Sent: Thursday, July 30, 2020 7:19 PM
> To: Tony Liu
> Cc: Numan Siddique; Han Zhou; ovs-dev; ovs-discuss
> Subject: Re: [ovs-discuss] [OVN] DB backup and restore
>
> On Thu, Jul 30, 2020 at 7:04 PM Tony Liu wrote:
> 
> 
>   Hi,
>
>   Just an update: I finally got this snapshot/rollback to work.
>   The rollback is not live though. Here is what I did.
>
>   1. Make a snapshot with ovsdb-client, assuming there are no
>      ongoing transactions and data is consistent on all nodes.
>      The snapshot can be taken on any node. It doesn't include
>      any cluster info. That's probably why the man page says this
>      is for standalone and A/B only. But that cluster info seems
>      not to be required to restore.
>
>   2. To rollback/restore, stop services on all nodes, starting
>      from the followers and ending with the leader.
>
>   3. Pick a node as the new leader and copy the snapshot to be
>      the DB file. Then start the service. A cluster with a new
>      cluster ID will be created. The node will be allocated a new
>      server ID as well.
>
>   4. On the other two nodes, remove the DB file and restart the
>      service with remote-address pointing to the leader.
>
>   Now the new cluster starts working with the rollback data.
> 
> 
> The steps you gave may work, but it is weird. It is better to just
> follow the steps mentioned in this section:
>
> https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb.7.rst#backing-up-and-restoring-a-database
> 
> 
> 
> 
> 
> 
>   "ovsdb-client restore" doesn't work for me, not sure why.
>
>   ovsdb-client: ovsdb error: /dev/stdin: cannot identify file type
>
>   I tried to restore the snapshot created by backup, and also the
>   directly copied DB file; neither of them works. Has anyone
>   experienced such an issue?
> 
> 
> 
> Maybe your command was wrong. Could you share your command line, and the
> version used?
> 
> 
> 
> 
> 
>   To Numan, it would be great if you could share the details on
>   how to use neutron-ovn-db-sync-util.
> 
> 
> 
> 
> 
>   Thanks!
> 
> 
> 
>   Tony
> 
> 
> 
>   From: Tony Liu
>   Sent: Thursday, July 30, 2020 4:51 PM
>   To: Numan Siddique; Han Zhou
>   Cc: Han Zhou; ovs-dev; ovs-discuss
>   Subject: Re: [ovs-dev] [ovs-discuss] [OVN] DB backup and restore
> 
> 
> 
>   Hi Numan,
> 
>   I found this comment you made a few years back.
> 
>   - At neutron-server startup, OVN ML2 driver syncs the neutron
>   DB and OVN DB if sync mode is set to repair.
>   - Admin can run the "neutron-ovn-db-sync-util" to sync the DBs.
> 
>   Could you share the details to try those two options?
> 
> 
>   Thanks!
> 
>   Tony
> 
>   From: Tony Liu
>   Sent: Thursday, July 30, 2020 4:38 PM
>   To: Han Zhou
>   Cc: Han Zhou; ovs-dev; ovs-discuss
>   Subject: Re: [ovs-dev] [ovs-discuss] [OVN] DB backup and restore
> 
>   Hi,
> 
>   I have another thought after some digging. Since I am with
>   OpenStack, all networking configurations are from OpenStack.
>   I could snapshot OpenStack MariaDB, restore and run
>   neutron-ovn-db-sync to update OVN DB. Would that be a cleaner
>   solution?
> 
>   BTW, I got this error when restoring the OVN DB.
>   ovsdb-client: ovsdb error: /dev/stdin: cannot identify file type
> 
>   The file was created by "backup" command.
> 
> 
>   Thanks!
> 
>   Tony
> 
>   From: Tony Liu
>   Sent: Thursday, July 30, 2020 3:41 PM
>   To: Han Zhou
>   Cc: Han Zhou; ovs-dev; ovs-discuss
>   Subject: Re: [ovs-dev] [ovs-discuss] [OVN] DB backup and restore
> 
>   Hi,
> 
>   A quick question here. Given this man page.
>   http://www.openvswitch.org/support/dist-docs/ovsdb-client.1.txt
> 
>   It says backup and restore commands are for OVSDB standalone and
> 
>