Re: [ovs-discuss] [ovs-dev] [OVN] ovn-northd HA
Hi Tony,

Please find my answers inlined.

On Sat, Aug 1, 2020 at 5:55 PM Tony Liu wrote:
> When I restore 4096 LS, 4354 LSP, 256 LR and 256 LRP (I clean up all
> DBs before restore), it takes a few seconds to restore the nb-db.
> But ovn-northd takes forever to update sb-db.
>
> I changed the sb-db election timer from 1s to 10s. Then it takes just a
> few minutes for sb-db to get fully synced.
>
> How does that sb-db leader switch affect such sync?

Most likely it is because the SB-DB was busy, which resulted in timeouts
for the RAFT election, so it kept doing leader elections (this can be
confirmed by checking the "term" number) and thus never got synced. When
you changed the timer to 10s, it could complete the work without leader
flapping.

> > Does the lock for active ovn-northd have to be acquired from the leader
> > of sb-db?

Yes, because ovn-northd sets "leader_only" to true for the connection. I
remember it is also required that all lock participants connect to the
leader for the OVSDB lock to work properly.

> > If ovn-northd didn't acquire the lock, it becomes standby. Does it keep
> > trying to acquire the lock, or wait for notification, or monitor the
> > active ovn-northd?
> >
> > If it keeps trying, what's the period?

It is based on OVSDB notification.

> > Say the active ovn-northd is down, the connection to sb-db is down,
> > sb-db releases the lock, so another ovn-northd can acquire it.
> > Is that correct?

Yes.

> > When sb-db is busy, the connection from ovn-northd is dropped. Not sure
> > from which side it's dropped. And that triggers an active ovn-northd
> > switch. Is that right?

It is possible, but the same northd may get the lock again, if it is lucky.

> > In case the sb-db leader switches, is that going to cause an active
> > ovn-northd switch as well?

It is possible, but the same northd may get the lock again, if it is lucky.

> > For whatever reason, in case the active ovn-northd switches, is the new
> > active ovn-northd going to continue the work left by the previous one,
> > or start all over again?

Even for the same ovn-northd, it always recomputes everything in response
to any change. So during a switch-over, the new active ovn-northd doesn't
need to "continue" - it just recomputes everything as usual. When
incremental processing is implemented, the new active ovn-northd may need
to do a recompute first and then handle further changes incrementally. In
either case, there is no need to "continue the work left by the previous
leader".

Thanks,
Han

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
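[Editor's note] The leader flapping and election-timer tuning discussed above can be checked and applied from the CLI. This is a sketch, not from the original thread: the control socket path /var/run/ovn/ovnsb_db.ctl and the 10000 ms target are assumptions (the path varies by installation), and as I recall the timer can only be roughly doubled per call, on the current leader:

```shell
# Check the RAFT status of the SB cluster; a rapidly increasing "Term"
# indicates repeated leader elections (flapping).
ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound

# Raise the election timer on the leader. Going from 1000 ms to 10000 ms
# takes several steps because each call may at most double the value:
ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 2000
ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 4000
ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 8000
ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 10000
```

The new timer value takes effect once the change commits through the cluster, so it is best done while the cluster is healthy rather than mid-flap.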
Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update
On Fri, Jul 31, 2020 at 4:14 PM Tony Liu wrote:
> Hi,
>
> I see the active ovn-northd takes much CPU (30% - 100%) when there is no
> configuration from OpenStack, and nothing happening on any chassis nodes
> either.
>
> Is this expected? What is it busy with?

Yes, this is expected. It is due to the OVSDB probes between ovn-northd
and the NB/SB OVSDB servers, which are used to detect OVSDB connection
failures. Usually this is not a concern (unlike the probes with a large
number of ovn-controller clients), because ovn-northd is a centralized
component and its CPU cost when there is no configuration change doesn't
matter that much. However, if it is a concern, the probe interval
(default 5 sec) can be changed. If you change it, remember to change it
on both the server side and the client side. For the client side
(ovn-northd), it is configured in the NB DB's NB_Global table, in
options:northd_probe_interval. See the man page of ovn-nb(5). For the
server side (NB and SB), it is configured in the NB and SB DBs'
Connection table, in the inactivity_probe column.
Thanks,
Han

> 2020-07-31T23:08:09.511Z|04267|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (68% CPU usage)
> 2020-07-31T23:08:09.512Z|04268|jsonrpc|DBG|tcp:10.6.20.84:6641: received request, method="echo", params=[], id="echo"
> 2020-07-31T23:08:09.512Z|04269|jsonrpc|DBG|tcp:10.6.20.84:6641: send reply, result=[], id="echo"
> 2020-07-31T23:08:12.777Z|04270|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (34% CPU usage)
> 2020-07-31T23:08:12.777Z|04271|reconnect|DBG|tcp:10.6.20.85:6642: idle 5002 ms, sending inactivity probe
> 2020-07-31T23:08:12.777Z|04272|reconnect|DBG|tcp:10.6.20.85:6642: entering IDLE
> 2020-07-31T23:08:12.777Z|04273|jsonrpc|DBG|tcp:10.6.20.85:6642: send request, method="echo", params=[], id="echo"
> 2020-07-31T23:08:12.777Z|04274|jsonrpc|DBG|tcp:10.6.20.85:6642: received request, method="echo", params=[], id="echo"
> 2020-07-31T23:08:12.777Z|04275|reconnect|DBG|tcp:10.6.20.85:6642: entering ACTIVE
> 2020-07-31T23:08:12.777Z|04276|jsonrpc|DBG|tcp:10.6.20.85:6642: send reply, result=[], id="echo"
> 2020-07-31T23:08:13.635Z|04277|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (34% CPU usage)
> 2020-07-31T23:08:13.635Z|04278|jsonrpc|DBG|tcp:10.6.20.85:6642: received reply, result=[], id="echo"
> 2020-07-31T23:08:14.480Z|04279|hmap|DBG|Dropped 129 log messages in last 5 seconds (most recently, 0 seconds ago) due to excessive rate
> 2020-07-31T23:08:14.480Z|04280|hmap|DBG|lib/shash.c:112: 2 buckets with 6+ nodes, including 2 buckets with 6 nodes (32 nodes total across 32 buckets)
> 2020-07-31T23:08:14.513Z|04281|poll_loop|DBG|wakeup due to 27-ms timeout at lib/reconnect.c:643 (34% CPU usage)
> 2020-07-31T23:08:14.513Z|04282|reconnect|DBG|tcp:10.6.20.84:6641: idle 5001 ms, sending inactivity probe
> 2020-07-31T23:08:14.513Z|04283|reconnect|DBG|tcp:10.6.20.84:6641: entering IDLE
> 2020-07-31T23:08:14.513Z|04284|jsonrpc|DBG|tcp:10.6.20.84:6641: send request, method="echo", params=[], id="echo"
> 2020-07-31T23:08:15.370Z|04285|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (34% CPU usage)
> 2020-07-31T23:08:15.370Z|04286|jsonrpc|DBG|tcp:10.6.20.84:6641: received request, method="echo", params=[], id="echo"
> 2020-07-31T23:08:15.370Z|04287|reconnect|DBG|tcp:10.6.20.84:6641: entering ACTIVE
> 2020-07-31T23:08:15.370Z|04288|jsonrpc|DBG|tcp:10.6.20.84:6641: send reply, result=[], id="echo"
> 2020-07-31T23:08:16.236Z|04289|poll_loop|DBG|wakeup due to 0-ms timeout at tcp:10.6.20.84:6641 (100% CPU usage)
> 2020-07-31T23:08:16.236Z|04290|jsonrpc|DBG|tcp:10.6.20.84:6641: received reply, result=[], id="echo"
> 2020-07-31T23:08:17.778Z|04291|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (100% CPU usage)
> 2020-07-31T23:08:17.778Z|04292|jsonrpc|DBG|tcp:10.6.20.85:6642: received request, method="echo", params=[], id="echo"
> 2020-07-31T23:08:17.778Z|04293|jsonrpc|DBG|tcp:10.6.20.85:6642: send reply, result=[], id="echo"
> 2020-07-31T23:08:20.372Z|04294|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (41% CPU usage)
> 2020-07-31T23:08:20.372Z|04295|reconnect|DBG|tcp:10.6.20.84:6641: idle 5002 ms, sending inactivity probe
> 2020-07-31T23:08:20.372Z|04296|reconnect|DBG|tcp:10.6.20.84:6641: entering IDLE
> 2020-07-31T23:08:20.372Z|04297|jsonrpc|DBG|tcp:10.6.20.84:6641: send request, method="echo", params=[], id="echo"
> 2020-07-31T23:08:20.372Z|04298|jsonrpc|DBG|tcp:10.6.20.84:6641: received request, method="echo", params=[], id="echo"
> 2020-07-31T23:08:20.372Z|04299|reconnect|DBG|tcp:10.6.20.84:6641: entering ACTIVE
> 2020-07-31T23:08:20.372Z|04300|jsonrpc|DBG|tc
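[Editor's note] The probe-interval changes described in the answer above can be made with the standard OVN CLI tools. A sketch, not from the original thread; the 30000 ms value is an arbitrary example, and the `.` syntax assumes a single NB_Global row and a single Connection row per database:

```shell
# Client side (ovn-northd): probe interval toward the NB/SB servers,
# in milliseconds, stored in the NB DB's NB_Global table.
ovn-nbctl set NB_Global . options:northd_probe_interval=30000

# Server side: inactivity probe of the NB and SB ovsdb-servers,
# stored in each DB's Connection table, also in milliseconds.
ovn-nbctl set connection . inactivity_probe=30000
ovn-sbctl set connection . inactivity_probe=30000
```

Setting the value to 0 disables the probe entirely, at the cost of slower detection of dead connections.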
Re: [ovs-discuss] [OVN] ovn-northd HA
When I restore 4096 LS, 4354 LSP, 256 LR and 256 LRP (I clean up all DBs
before restore), it takes a few seconds to restore the nb-db. But
ovn-northd takes forever to update sb-db.

I changed the sb-db election timer from 1s to 10s. Then it takes just a
few minutes for sb-db to get fully synced.

How does that sb-db leader switch affect such sync?

Thanks!
Tony

> -Original Message-
> From: dev On Behalf Of Tony Liu
> Sent: Saturday, August 1, 2020 5:26 PM
> To: ovs-discuss ; ovs-dev
> Subject: [ovs-dev] [OVN] ovn-northd HA
>
> Hi,
>
> I have a few questions about ovn-northd HA.
>
> Does the lock for active ovn-northd have to be acquired from the leader
> of sb-db?
>
> If ovn-northd didn't acquire the lock, it becomes standby. Does it keep
> trying to acquire the lock, or wait for notification, or monitor the
> active ovn-northd?
>
> If it keeps trying, what's the period?
>
> Say the active ovn-northd is down, the connection to sb-db is down,
> sb-db releases the lock, so another ovn-northd can acquire it.
> Is that correct?
>
> When sb-db is busy, the connection from ovn-northd is dropped. Not sure
> from which side it's dropped. And that triggers an active ovn-northd
> switch. Is that right?
>
> In case the sb-db leader switches, is that going to cause an active
> ovn-northd switch as well?
>
> For whatever reason, in case the active ovn-northd switches, is the new
> active ovn-northd going to continue the work left by the previous one,
> or start all over again?
>
> Thanks!
> Tony

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
[ovs-discuss] [OVN] ovn-northd HA
Hi,

I have a few questions about ovn-northd HA.

Does the lock for active ovn-northd have to be acquired from the leader
of sb-db?

If ovn-northd didn't acquire the lock, it becomes standby. Does it keep
trying to acquire the lock, or wait for notification, or monitor the
active ovn-northd?

If it keeps trying, what's the period?

Say the active ovn-northd is down, the connection to sb-db is down,
sb-db releases the lock, so another ovn-northd can acquire it.
Is that correct?

When sb-db is busy, the connection from ovn-northd is dropped. Not sure
from which side it's dropped. And that triggers an active ovn-northd
switch. Is that right?

In case the sb-db leader switches, is that going to cause an active
ovn-northd switch as well?

For whatever reason, in case the active ovn-northd switches, is the new
active ovn-northd going to continue the work left by the previous one,
or start all over again?

Thanks!
Tony

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
[ovs-discuss] connecting Mininet Topology to Internet
Hello,

I'm trying to connect a mininet topology to the internet so that I can
test out some actions I created. I created a simple topology:

  h1, h2, h3, h4 --- s1 --- s2 --- Internet

I did this by adding eth1 (the interface the VM uses to connect to the
internet) to s2 with the ovs-vsctl add-port command, and then running
dhclient on each of the hX-eth0 interfaces of the hosts.

Something weird happens: every host can connect to the internet (tried
both the ping and links commands), but when I look at the flows that are
automatically created by OVS, it seems like only one of the hosts (the
first one I pinged from) is receiving the replies from the internet,
even though no packets are dropped on the other hosts. How is this
possible? Does anyone have a better way to connect a mininet topology
to the internet?

Help is very appreciated, since I've been hitting my head on this issue
for a couple of days now.

Luca

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
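[Editor's note] For reference, the setup described in the message above is commonly done along these lines. This is a sketch under assumptions, not an answer from the thread: the interface name eth1 and switch name s2 are taken from the description, and a frequent pitfall is leaving the uplink's IP address on eth1 instead of on the bridge once it becomes a bridge port:

```shell
# Attach the VM's uplink interface to the switch that should reach
# the internet (names taken from the description above; adjust as needed).
ovs-vsctl add-port s2 eth1

# Once eth1 is a bridge port it should not carry an IP address itself;
# any address for the VM should live on the bridge interface instead.
# Caution: this drops the VM's existing connectivity on eth1.
ip addr flush dev eth1

# Then, from the mininet CLI, request an address on each host:
#   mininet> h1 dhclient h1-eth0
#   mininet> h2 dhclient h2-eth0
```

If the hosts' traffic must share the VM's single uplink MAC, the upstream network or DHCP server may also need to tolerate multiple leases on one port, which could explain reply traffic reaching only one host.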