Re: [ovs-discuss] ovsdb relay server active connection probe interval does not work
Hi Ilya Maximets,

Thanks for your reply. I am running an OVN large-scale test with 1000 sandboxes
(that is, 1000 ovn-controllers), 3 clustered NB servers, 3 NB relays, 3 clustered
SB servers and 20 SB relays. The connection flow is:

    neutron-server <> nb-relay <> nb <> northd <> sb <> sb-relay <> ovn-controller

The default 5-second probe interval causes connection flapping during large
transaction handling, DB log compaction, and so on.

The ovsdb relay server has two kinds of connections: an active connection and a
passive connection. The active connection acts as an ovsdb client and connects to
the clustered ovsdb-server; the passive connection listens for other clients that
connect to the relay itself. I configured both kinds of connections in the NB
database:

    active connection:  "tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641"
    passive connection: "ptcp:6641:0.0.0.0"

Can the relay server not share the same connection configuration with the
clustered ovsdb-server? Keeping another small database just for the relay
configuration is not a good approach.

An example: ovn-northd has no database of its own, and its probe interval is read
from the NB database, configured like this:

    ovn-nbctl set NB_Global . options:northd_probe_interval=6

Could the relay server read its probe interval from NB or SB in the same way?
If the relay server's probe interval cannot be read from NB or SB, an appctl
command could be considered, because it would allow reconfiguration without a
restart.

Best regards,
Wentao Jia

The following is the configuration of my test.

Clustered ovsdb-server:

    ovsdb-server -vconsole:info -vsyslog:off -vfile:off \
        --log-file=/var/log/ovn/ovsdb-server-nb.log \
        --remote=punix:/var/run/ovn/ovnnb_db.sock \
        --pidfile=/var/run/ovn/ovnnb_db.pid \
        --unixctl=/var/run/ovn/ovnnb_db.ctl \
        --remote=db:OVN_Northbound,NB_Global,connections \
        --private-key=db:OVN_Northbound,SSL,private_key \
        --certificate=db:OVN_Northbound,SSL,certificate \
        --ca-cert=db:OVN_Northbound,SSL,ca_cert \
        --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols \
        --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers \
        /etc/ovn/ovnnb_db.db

ovsdb relay server:

    ovsdb-server --remote=db:OVN_Northbound,NB_Global,connections \
        -vconsole:info -vsyslog:off -vfile:off \
        --log-file=/var/log/ovn/ovsdb-server-nb.log \
        relay:OVN_Northbound:tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641

Connection configuration (one active connection and one passive connection):

    ()[root@ovn-busybox-0 /]# ovn-nbctl list connection
    _uuid               : 5ddab5a4-a267-42b4-9dd4-76d55855a109
    external_ids        : {}
    inactivity_probe    : 12
    is_connected        : true
    max_backoff         : []
    other_config        : {}
    status              : {sec_since_connect="143208", state=ACTIVE}
    target              : "tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641"

    _uuid               : 351b99bb-dd6a-4ba3-9c30-c0b4cff183e7
    external_ids        : {}
    inactivity_probe    : 0
    is_connected        : true
    max_backoff         : []
    other_config        : {}
    status              : {bound_port="6641", sec_since_connect="0", sec_since_disconnect="0"}
    target              : "ptcp:6641:0.0.0.0"

From: Ilya Maximets
Sent: 2021-08-26 02:38:58
To: ovs-discuss@openvswitch.org, "Wentao Jia"
Cc: i.maxim...@ovn.org
Subject: Re: [ovs-discuss] ovsdb relay server active connection probe interval does not work

>> hi, all
>>
>> the default inactivity probe interval of ovsdb relay server to nb/sb ovsdb
>> server is 5000ms.
>> I set an active connection as follows, with the inactivity probe interval
>> set to 12ms:
>> _uuid               : 5ddab5a4-a267-42b4-9dd4-76d55855a109
>> external_ids        : {}
>> inactivity_probe    : 12
>> is_connected        : true
>> max_backoff         : []
>> other_config        : {}
>> status              : {sec_since_connect="0", state=ACTIVE}
>> target              : "tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641"
>
> Hmm.  How exactly did you configure that?
>
>> ovn-ovsdb-nb.openstack.svc.cluster.local is a vip,
>> but the inactivity probe is still 5000.
>>
>> 2021-08-24T12:34:17.313Z|04924|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 120225 ms, sending inactivity probe
>> 2021-08-24T12:36:17.759Z|05854|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 120446 ms, sending inactivity probe
>> 2021-08-24T12:37:06.326Z|06145|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 6853 ms, sending inactivity probe
>> 2021-08-24T12:37:11.330Z|06155|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
>
> This looks like you have 2 different connections.  One with 5000 and
> one with 12 inactivity probe interval.
>
> I suspect that relay server is started something like this:
>
> ovsdb-server ... --remo
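For context, here is a sketch of how Connection rows like the ones listed above
are typically created with ovn-nbctl; the exact commands used in this deployment
are an assumption, and the 60000 ms probe value is only an illustration:

    # Passive listener row, attached to NB_Global so that ovsdb-server picks it
    # up via --remote=db:OVN_Northbound,NB_Global,connections.
    ovn-nbctl -- --id=@listen create Connection target='"ptcp:6641:0.0.0.0"' \
        -- add NB_Global . connections @listen

    # Active connection toward the clustered ovsdb-server, with an example
    # 60-second inactivity probe.
    ovn-nbctl -- --id=@up create Connection \
        target='"tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641"' \
        inactivity_probe=60000 \
        -- add NB_Global . connections @up

    # The probe interval of an existing row can be changed by UUID.
    ovn-nbctl set Connection 5ddab5a4-a267-42b4-9dd4-76d55855a109 inactivity_probe=60000

For the ovn-controller side of the flapping problem, each chassis reads its own
probe setting from the local Open_vSwitch table, for example (value again only an
illustration):

    ovs-vsctl set Open_vSwitch . external_ids:ovn-remote-probe-interval=60000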
[ovs-discuss] ovsdb relay server active connection probe interval does not work
hi, all

The default inactivity probe interval from the ovsdb relay server to the NB/SB
ovsdb server is 5000 ms. I set an active connection as follows, with the
inactivity probe interval set to 12ms:

    _uuid               : 5ddab5a4-a267-42b4-9dd4-76d55855a109
    external_ids        : {}
    inactivity_probe    : 12
    is_connected        : true
    max_backoff         : []
    other_config        : {}
    status              : {sec_since_connect="0", state=ACTIVE}
    target              : "tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641"

ovn-ovsdb-nb.openstack.svc.cluster.local is a VIP, but the inactivity probe is
still 5000. The following is the log of the ovsdb relay server:

    2021-08-24T12:34:17.313Z|04924|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 120225 ms, sending inactivity probe
    2021-08-24T12:36:17.759Z|05854|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 120446 ms, sending inactivity probe
    2021-08-24T12:37:06.326Z|06145|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 6853 ms, sending inactivity probe
    2021-08-24T12:37:11.330Z|06155|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
    2021-08-24T12:37:16.334Z|06165|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5003 ms, sending inactivity probe
    2021-08-24T12:37:21.339Z|06175|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5005 ms, sending inactivity probe
    2021-08-24T12:37:33.850Z|06226|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 6681 ms, sending inactivity probe
    2021-08-24T12:37:38.855Z|06236|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5003 ms, sending inactivity probe
    2021-08-24T12:37:43.859Z|06246|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
    2021-08-24T12:37:48.864Z|06256|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
    2021-08-24T12:37:53.870Z|06266|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5006 ms, sending inactivity probe
    2021-08-24T12:37:58.876Z|06276|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5006 ms, sending inactivity probe
    2021-08-24T12:38:08.882Z|06293|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 6299 ms, sending inactivity probe
    2021-08-24T12:38:13.887Z|06303|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5003 ms, sending inactivity probe
    2021-08-24T12:38:18.890Z|06313|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 121131 ms, sending inactivity probe
    2021-08-24T12:38:18.891Z|06316|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
    2021-08-24T12:38:23.895Z|06330|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
    2021-08-24T12:38:28.901Z|06340|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5005 ms, sending inactivity probe
    2021-08-24T12:38:33.905Z|06350|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
    2021-08-24T12:38:38.909Z|06360|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5003 ms, sending inactivity probe
    2021-08-24T12:38:43.913Z|06370|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5003 ms, sending inactivity probe
    2021-08-24T12:38:48.922Z|06380|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5009 ms, sending inactivity probe
    2021-08-24T12:38:53.926Z|06390|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5003 ms, sending inactivity probe
    2021-08-24T12:38:58.930Z|06400|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5003 ms, sending inactivity probe
    2021-08-24T12:39:03.934Z|06410|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5003 ms, sending inactivity probe
    2021-08-24T12:39:08.938Z|06420|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
    2021-08-24T12:39:13.941Z|06430|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5002 ms, sending inactivity probe
    2021-08-24T12:39:18.946Z|06440|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
    2021-08-24T12:39:23.951Z|06452|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5005 ms, sending inactivity probe
    2021-08-24T12:39:28.956Z|06462|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5004 ms, sending inactivity probe
    2021-08-24T12:39:33.962Z|06472|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641: idle 5006 ms, sending inactivity probe

best regards,
Wentao Jia
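The reconnect DBG messages above can be enabled at runtime through the relay's
unixctl socket to check which probe interval is actually in effect; a sketch,
where the socket path is an assumption (the relay here is started without
--unixctl, so it defaults to <rundir>/ovsdb-server.<pid>.ctl):

    # Enable debug logging for the reconnect module on the relay process.
    ovs-appctl -t /var/run/ovn/ovsdb-server.<pid>.ctl vlog/set reconnect:file:dbg
    # Watch how often probes are actually sent on each connection.
    grep 'sending inactivity probe' /var/log/ovn/ovsdb-server-nb.log | tail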
[ovs-discuss] OVN nbctl and sbctl daemon cannot set probe interval
Hi,

ovn-nbctl and ovn-sbctl in daemon mode are long-running processes. If the
connection to the database is broken, it should be re-established, but the nbctl
and sbctl daemons will not reconnect because their probe interval is not set.

best regards,
Wentao Jia
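For reference, the daemon mode referred to above is started roughly like this
(a sketch; the --db target is only an example):

    # Start ovn-nbctl as a long-running daemon and point later invocations at it.
    export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach \
        --db=tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641)
    # Subsequent commands reuse the daemon's single long-lived DB connection,
    # the connection that the report says does not probe and so never notices
    # a broken link.
    ovn-nbctl show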
[ovs-discuss] ovsdb relay server possible memory leak
OVN scale test: 3 clustered SB servers, 10 SB relay servers, 1000 sandboxes.
The maximum memory used by an SB relay server is over 26 GB.

    [root@node-4 ~]# for i in `kubectl get pods -n openstack | grep ovsdb-sb-relay | awk '{print $1}'`; do kubectl exec -it -n openstack $i -- top -bn1 | grep ovsdb; done
     9 root      20   0 7558724   7.2g   8408 S   0.0  1.4  33:36.03 ovsdb-serv+
     9 root      20   0 8589716   8.1g   7924 S   0.0  1.6  28:19.82 ovsdb-serv+
     9 root      20   0   14.5g  14.5g   8284 R 100.0  2.9  78:00.86 ovsdb-serv+
     9 root      20   0   26.2g  24.9g   7744 R 100.0  5.0  77:10.86 ovsdb-serv+
     9 root      20   0   27.5g  26.7g   8076 R 100.0  5.3  30:55.49 ovsdb-serv+
    10 root      20   0 8835412   8.0g   8148 R 100.0  3.2  11:30.74 ovsdb-serv+
     9 root      20   0 8835424   8.0g   8396 S   6.7  1.6   7:44.34 ovsdb-serv+
     9 root      20   0 7678636   7.3g   8132 S   0.0  2.9   1:25.33 ovsdb-serv+
     9 root      20   0   12.6g  10.7g   8188 R 100.0  2.1 107:08.83 ovsdb-serv+
     9 root      20   0 7479468   7.1g   8344 S  80.0  1.4  45:50.82 ovsdb-serv+
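One way to see where a relay's memory is going (a sketch; the pod name and the
unixctl socket path are assumptions, since the relay is started without
--unixctl):

    # memory/show prints the ovsdb-server daemon's internal memory counters.
    kubectl exec -n openstack <ovsdb-sb-relay-pod> -- \
        ovs-appctl -t /var/run/ovn/ovsdb-server.<pid>.ctl memory/show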
[ovs-discuss] ovsdb relay server segmentation fault
OVN scale test: 3 clustered SB servers, 10 SB relay servers, 1000 sandboxes.
The SB relay server occasionally crashes with a segmentation fault.

    [root@node-4 ~]# kubectl logs -n openstack ovn-ovsdb-sb-relay-79d5dd7ff4-tqbbd --tail 10 -p
    2021-08-01T03:09:44Z|15758|poll_loop|INFO|wakeup due to [POLLOUT] on fd 101 (10.232.2.213:6642<->10.232.7.147:39998) at lib/stream-fd.c:153 (66% CPU usage)
    2021-08-01T03:09:52Z|15759|timeval|WARN|Unreasonably long 5223ms poll interval (2209ms user, 126ms system)
    2021-08-01T03:09:52Z|15760|timeval|WARN|faults: 19955 minor, 0 major
    2021-08-01T03:09:52Z|15761|timeval|WARN|context switches: 0 voluntary, 5818 involuntary
    2021-08-01T03:09:55Z|15762|timeval|WARN|Unreasonably long 3550ms poll interval (2277ms user, 71ms system)
    2021-08-01T03:09:55Z|15763|timeval|WARN|faults: 3652 minor, 0 major
    2021-08-01T03:09:55Z|15764|timeval|WARN|context switches: 0 voluntary, 1438 involuntary
    2021-08-01T03:09:55Z|15765|poll_loop|INFO|Dropped 43 log messages in last 11 seconds (most recently, 10 seconds ago) due to excessive rate
    2021-08-01T03:09:55Z|15766|poll_loop|INFO|wakeup due to [POLLOUT] on fd 95 (10.232.2.213:6642<->10.232.7.132:53042) at lib/stream-fd.c:153 (67% CPU usage)
    /tmp/start_sb_relay.sh: line 5:     9 Segmentation fault      ovsdb-server --remote=db:OVN_Southbound,SB_Global,connections relay:OVN_Southbound:tcp:${SERVICE_NAME}.${NAMESPACE}.svc.cluster.local:6642
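A sketch of one way to capture a backtrace for the next crash; the binary path,
core location, and permission to write core_pattern from inside the container are
all assumptions:

    # Allow core dumps and send them to a known location.
    ulimit -c unlimited
    echo '/tmp/core.%e.%p' > /proc/sys/kernel/core_pattern
    # After the next segmentation fault, extract a backtrace from the core file.
    gdb -batch -ex bt /usr/sbin/ovsdb-server /tmp/core.ovsdb-server.<pid>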
[ovs-discuss] port group referencing a large number of ports causes ovn-northd and ovn-nbctl daemon CPU 100% forever
Hi, all

OVN scale test: I created 47194 logical switch ports, and a port group references
all of the switch ports in one row. This causes ovn-northd and the ovn-nbctl
daemon to stay at 100% CPU forever.

gdb info:

    ()[root@ovn-northd-0 /]# date
    Tue Jul 20 11:07:34 CST 2021
    ()[root@ovn-northd-0 /]# gdb attach 41
    ..
    (gdb) bt
    #0  0x7f4513cedb35 in malloc () from /lib64/libc.so.6
    #1  0x564b7cdc0188 in xmalloc__ (size=size@entry=48) at lib/util.c:137
    #2  0x564b7cdc01ab in xmalloc (size=size@entry=48) at lib/util.c:172
    #3  0x564b7cdaa9c2 in ovsdb_idl_get_row_arc (src=src@entry=0x564b93325d40, dst_table_class=dst_table_class@entry=0x564b7d175ad0, dst_uuid=0x564b895bb750) at lib/ovsdb-idl.c:2328
    #4  0x564b7cd25eaa in nbrec_port_group_parse_ports (row_=0x564b93325d40, datum=0x564b93325ea8) at lib/ovn-nb-idl.c:33264
    #5  0x564b7cda5b06 in ovsdb_idl_row_parse (row=row@entry=0x564b93325d40) at lib/ovsdb-idl.c:1656
    #6  0x564b7cda77e9 in ovsdb_idl_row_reparse_backrefs (row=row@entry=0x564b9316fcb0) at lib/ovsdb-idl.c:2074
    #7  0x564b7cda7a07 in ovsdb_idl_insert_row (row=row@entry=0x564b9316fcb0, data=0x564b8ae27f90) at lib/ovsdb-idl.c:2230
    #8  0x564b7cda8ee2 in ovsdb_idl_process_update (ru=0x564b8af95470, table=0x564b7d423690) at lib/ovsdb-idl.c:1464
    #9  ovsdb_idl_parse_update__ (du=0x564b8de0ea30, idl=0x564b7d422930) at lib/ovsdb-idl.c:1336
    #10 ovsdb_idl_parse_update (update=0x564bdb13b7e8, idl=0x564b7d422930) at lib/ovsdb-idl.c:1375
    #11 ovsdb_idl_run (idl=<optimized out>) at lib/ovsdb-idl.c:444
    #12 0x564b7cdad89c in ovsdb_idl_loop_run (loop=loop@entry=0x7ffcfcff8d80) at lib/ovsdb-idl.c:4122
    #13 0x564b7ccf9747 in main (argc=9, argv=0x7ffcfcff8f68) at northd/ovn-northd.c:14445
    (gdb) frame 9
    #9  ovsdb_idl_parse_update__ (du=0x564b8de0ea30, idl=0x564b7d422930) at lib/ovsdb-idl.c:1336
    1336        switch (ovsdb_idl_process_update(table, ru)) {
    (gdb) info local
    ru = 0x564b8af95470
    j = 4247
    tu = 0x564b8af6fd68
    table = 0x564b7d423690
    i = 9
    (gdb) print du->n
    $1 = 10
    (gdb) print du->table_updates[0]
    $2 = {table_name = 0x564b907a15e0 "NB_Global", row_updates = 0x564b88cf2f90, n = 1}
    (gdb) print du->table_updates[1]
    $3 = {table_name = 0x564b94d13f60 "ACL", row_updates = 0x564b8af6fd90, n = 9}
    (gdb) print du->table_updates[2]
    $4 = {table_name = 0x564b7f9a7d10 "Logical_Switch", row_updates = 0x564b8af6fec0, n = 237}
    (gdb) print du->table_updates[3]
    $5 = {table_name = 0x564bda9c4c90 "HA_Chassis", row_updates = 0x564b8af71c70, n = 59}
    (gdb) print du->table_updates[4]
    $6 = {table_name = 0x564bdb893c60 "HA_Chassis_Group", row_updates = 0x564b90dad0f0, n = 1}
    (gdb) print du->table_updates[5]
    $7 = {table_name = 0x564c5ccb7b10 "Logical_Router", row_updates = 0x564b86f8ddd0, n = 1}
    (gdb) print du->table_updates[6]
    $9 = {table_name = 0x564bdb284750 "Port_Group", row_updates = 0x564b8f658470, n = 3}
    (gdb) print du->table_updates[7]
    $10 = {table_name = 0x564b8af6f920 "Connection", row_updates = 0x564b8769f0e0, n = 1}
    (gdb) print du->table_updates[8]
    $11 = {table_name = 0x564c38e30d10 "DHCP_Options", row_updates = 0x564b8af723e0, n = 237}
    (gdb) print du->table_updates[9]
    $12 = {table_name = 0x564bd8be5ed0 "Logical_Switch_Port", row_updates = 0x564b8af74190, n = 47194}
    (gdb)
    $13 = {table_name = 0x564bd8be5ed0 "Logical_Switch_Port", row_updates = 0x564b8af74190, n = 47194}
    (gdb) print tu->n
    $14 = 47194
    (gdb) print j
    $15 = 4247

gdb again:

    ()[root@ovn-northd-0 /]# date
    Tue Jul 20 11:26:25 CST 2021
    ()[root@ovn-northd-0 /]# gdb attach 41
    6.6-1.el7.centos.es.x86_64 zlib-1.2.7-18.el7.centos.es.x86_64
    (gdb) bt
    #0  xmalloc__ (size=size@entry=48) at lib/util.c:136
    #1  0x564b7cdc01ab in xmalloc (size=size@entry=48) at lib/util.c:172
    #2  0x564b7cdaa9c2 in ovsdb_idl_get_row_arc (src=src@entry=0x564b7d710c40, dst_table_class=dst_table_class@entry=0x564b7d175ad0, dst_uuid=0x564b8dc1a8b0) at lib/ovsdb-idl.c:2328
    #3  0x564b7cd25eaa in nbrec_port_group_parse_ports (row_=0x564b7d710c40, datum=0x564b93d9df48) at lib/ovn-nb-idl.c:33264
    #4  0x564b7cda5b06 in ovsdb_idl_row_parse (row=row@entry=0x564b7d710c40) at lib/ovsdb-idl.c:1656
    #5  0x564b7cda77e9 in ovsdb_idl_row_reparse_backrefs (row=row@entry=0x564b946e3470) at lib/ovsdb-idl.c:2074
    #6  0x564b7cda7a07 in ovsdb_idl_insert_row (row=row@entry=0x564b946e3470, data=0x564bd7f1cc60) at lib/ovsdb-idl.c:2230
    #7  0x564b7cda8ee2 in ovsdb_idl_process_update (ru=0x564b8af96f70, table=0x564b7d423690) at lib/ovsdb-idl.c:1464
    #8  ovsdb_idl_parse_update__ (du=0x564b8de0ea30, idl=0x564b7d422930) at lib/ovsdb-idl.c:1336
    #9  ovsdb_idl_parse_update (update=0x564bdb13b7e8, idl=0x564b7d422930) at lib/ovsdb-idl.c:1375
    #10 ovsdb_idl_run (idl=<optimized out>) at lib/ovsdb-idl.c:444
    #11 0x564b7cdad89c in ovsdb_idl_loop_run (loop=loop@entry=0x7ffcfcff8d80) at lib/ovsdb-idl.c:4122
    #12 0x564b7ccf9747 in main (argc=9, argv=0x7ffcfcff8f68) at northd/ovn-northd.c:14445
    (gdb) frame 8
    #8  ovsdb_idl_parse_update__ (du=0x564b8de0ea30, idl=0x564b7d422930)
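The backtrace suggests that every newly inserted Logical_Switch_Port row triggers
ovsdb_idl_row_reparse_backrefs on the Port_Group row, which re-runs
nbrec_port_group_parse_ports over all 47194 port references, so processing the
initial update appears to be roughly quadratic in the number of ports. As a
possible mitigation on the configuration side (a sketch only; the group names and
the 1000-port chunk size are illustrative, and this does not change the IDL
behaviour itself), the ports can be spread over several smaller port groups:

    # Hypothetical sketch: build port groups of at most 1000 ports each instead
    # of one Port_Group row that references every Logical_Switch_Port.
    ovn-nbctl --bare --columns=name list Logical_Switch_Port |
        sed '/^$/d' | split -l 1000 - /tmp/pg_chunk_
    n=0
    for f in /tmp/pg_chunk_*; do
        n=$((n + 1))
        ovn-nbctl pg-add pg_chunk_$n $(cat "$f")
    done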