Re: [ovs-discuss] ovsdb relay server active connection probe interval do not work

2021-08-25 Thread

Hi Ilya Maximets, thanks for your reply.


I am running an OVN large-scale test with 1000 sandboxes (that is, 1000
ovn-controllers), 3 clustered NB servers, 3 NB relays, 3 clustered SB servers and 20 SB relays.
Configuration flow: neutron-server <> nb-relay <> nb <> northd <> sb <> sb-relay <> ovn-controller
The default 5-second probe interval causes connection flapping during large
transaction handling, database log compaction, and so on.
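For reference, these are the probe-related knobs involved at this scale; the values below are only illustrative, not the exact ones from my test:

# ovn-controller side: probe interval towards the SB database (or SB relay).
ovs-vsctl set Open_vSwitch . external_ids:ovn-remote-probe-interval=120000

# NB/SB server side: probe interval of the Connection row(s); the "." record
# selector assumes a single row in the table.
ovn-nbctl set Connection . inactivity_probe=120000
ovn-sbctl set Connection . inactivity_probe=120000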


The ovsdb relay server has two kinds of connections: an active connection, where the
relay acts as an ovsdb client and connects to the clustered ovsdb-server, and a passive
connection, where the relay listens for other clients connecting to it.
I configured both kinds of connections in the NB database:
  active connection: "tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641"
  passive connection: "ptcp:6641:0.0.0.0"
Can't the relay server share the same connection configuration with the clustered
ovsdb-server?
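For clarity, a sketch of how these two Connection rows can be set up and how the 120000 ms probe is applied to the active row (this assumes set-connection accepts the active tcp: target as well; the UUID is the one from the listing below and would change if the rows are recreated):

# Replace the Connection table contents with the two targets:
ovn-nbctl set-connection \
    tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641 \
    ptcp:6641:0.0.0.0
# Raise the inactivity probe on the active row:
ovn-nbctl set Connection 5ddab5a4-a267-42b4-9dd4-76d55855a109 inactivity_probe=120000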


It is not a good way to have another small database just for the relay configuration.
An example: ovn-northd has no database of its own, and its probe interval is read from
the NB database; the northd probe interval is configured like this:
ovn-nbctl set NB_Global . options:northd_probe_interval=6. Can the relay server read
its probe interval from NB or SB in the same way? If the relay server's probe interval
cannot be read from NB or SB, an appctl command could be considered, because it would
allow reconfiguration without a restart.
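Written out as commands, the existing northd mechanism and the kind of knob I am asking about (the 60000 value is only an example, and relay_probe_interval is hypothetical, it does not exist today):

# Existing: ovn-northd reads its probe interval from NB_Global options.
ovn-nbctl set NB_Global . options:northd_probe_interval=60000
# Wanted (hypothetical option, not implemented): something equivalent that the
# relay could read from the database it serves, e.g.
#   ovn-nbctl set NB_Global . options:relay_probe_interval=60000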
Best regards, Wentao Jia
The following is the configuration of my test:
clustered ovsdb-server:
ovsdb-server -vconsole:info -vsyslog:off -vfile:off \
    --log-file=/var/log/ovn/ovsdb-server-nb.log \
    --remote=punix:/var/run/ovn/ovnnb_db.sock \
    --pidfile=/var/run/ovn/ovnnb_db.pid \
    --unixctl=/var/run/ovn/ovnnb_db.ctl \
    --remote=db:OVN_Northbound,NB_Global,connections \
    --private-key=db:OVN_Northbound,SSL,private_key \
    --certificate=db:OVN_Northbound,SSL,certificate \
    --ca-cert=db:OVN_Northbound,SSL,ca_cert \
    --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols \
    --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers \
    /etc/ovn/ovnnb_db.db


ovsdb relay server:
ovsdb-server --remote=db:OVN_Northbound,NB_Global,connections \
    -vconsole:info -vsyslog:off -vfile:off \
    --log-file=/var/log/ovn/ovsdb-server-nb.log \
    relay:OVN_Northbound:tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641
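One variant that avoids sharing the clustered server's Connection table (just a sketch; the relay then takes its listener from the command line instead of from the database it relays):

ovsdb-server -vconsole:info -vsyslog:off -vfile:off \
    --log-file=/var/log/ovn/ovsdb-server-nb.log \
    --remote=ptcp:6641:0.0.0.0 \
    relay:OVN_Northbound:tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641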



Connection configuration: one active connection and one passive connection
()[root@ovn-busybox-0 /]# ovn-nbctl list connection
_uuid   : 5ddab5a4-a267-42b4-9dd4-76d55855a109
external_ids: {}
inactivity_probe: 120000
is_connected: true
max_backoff : []
other_config: {}
status  : {sec_since_connect="143208", state=ACTIVE}
target  : "tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641"


_uuid   : 351b99bb-dd6a-4ba3-9c30-c0b4cff183e7
external_ids: {}
inactivity_probe: 0
is_connected: true
max_backoff : []
other_config: {}
status  : {bound_port="6641", sec_since_connect="0", 
sec_since_disconnect="0"}
target  : "ptcp:6641:0.0.0.0"






From: Ilya Maximets 
Date: 2021-08-26 02:38:58
To: ovs-discuss@openvswitch.org, "贾文涛" 
Cc: i.maxim...@ovn.org
Subject: Re: [ovs-discuss] ovsdb relay server active connection probe interval do not
work

>> hi,all
>> 
>> 
>>  the default inactivity probe interval of ovsdb relay server to nb/sb ovsdb 
>> server is 5000ms.
>>  I set an active connection as follows, with the inactivity probe interval set
>> to 120000 ms:
>> _uuid   : 5ddab5a4-a267-42b4-9dd4-76d55855a109
>> external_ids: {}
>> inactivity_probe: 120000
>> is_connected: true
>> max_backoff : []
>> other_config: {}
>> status  : {sec_since_connect="0", state=ACTIVE}
>> target  : "tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641"
>
>
>Hmm.  How exactly did you configure that?
>
>> 
>> ovn-ovsdb-nb.openstack.svc.cluster.local is a VIP
>> but the inactivity probe is still 5000
>> 2021-08-24T12:34:17.313Z|04924|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
>>  idle 120225 ms, sending inactivity probe
>> 2021-08-24T12:36:17.759Z|05854|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
>>  idle 120446 ms, sending inactivity probe
>> 2021-08-24T12:37:06.326Z|06145|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
>>  idle 6853 ms, sending inactivity probe
>> 2021-08-24T12:37:11.330Z|06155|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
>>  idle 5004 ms, sending inactivity probe
>
>This looks like you have 2 different connections.  One with 5000 and
>one with 120000 inactivity probe interval.
>
>I suspect that relay server is started something like this:
>
>ovsdb-server ... --remo

[ovs-discuss] ovsdb relay server active connection probe interval do not work

2021-08-24 Thread
Hi, all


 The default inactivity probe interval from the ovsdb relay server to the NB/SB
ovsdb-server is 5000 ms.
 I set an active connection as follows, with the inactivity probe interval set to
120000 ms:
_uuid   : 5ddab5a4-a267-42b4-9dd4-76d55855a109
external_ids: {}
inactivity_probe: 120000
is_connected: true
max_backoff : []
other_config: {}
status  : {sec_since_connect="0", state=ACTIVE}
target  : "tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641"



ovn-ovsdb-nb.openstack.svc.cluster.local is a VIP.
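To compare what is configured with what the relay actually does, the Connection rows can be listed like this:

ovn-nbctl --columns=target,inactivity_probe list Connection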


But the inactivity probe is still 5000 ms.
The following is the log of the ovsdb relay server:
2021-08-24T12:34:17.313Z|04924|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 120225 ms, sending inactivity probe
2021-08-24T12:36:17.759Z|05854|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 120446 ms, sending inactivity probe
2021-08-24T12:37:06.326Z|06145|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 6853 ms, sending inactivity probe
2021-08-24T12:37:11.330Z|06155|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5004 ms, sending inactivity probe
2021-08-24T12:37:16.334Z|06165|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5003 ms, sending inactivity probe
2021-08-24T12:37:21.339Z|06175|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5005 ms, sending inactivity probe
2021-08-24T12:37:33.850Z|06226|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 6681 ms, sending inactivity probe
2021-08-24T12:37:38.855Z|06236|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5003 ms, sending inactivity probe
2021-08-24T12:37:43.859Z|06246|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5004 ms, sending inactivity probe
2021-08-24T12:37:48.864Z|06256|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5004 ms, sending inactivity probe
2021-08-24T12:37:53.870Z|06266|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5006 ms, sending inactivity probe
2021-08-24T12:37:58.876Z|06276|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5006 ms, sending inactivity probe
2021-08-24T12:38:08.882Z|06293|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 6299 ms, sending inactivity probe
2021-08-24T12:38:13.887Z|06303|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5003 ms, sending inactivity probe
2021-08-24T12:38:18.890Z|06313|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 121131 ms, sending inactivity probe
2021-08-24T12:38:18.891Z|06316|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5004 ms, sending inactivity probe
2021-08-24T12:38:23.895Z|06330|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5004 ms, sending inactivity probe
2021-08-24T12:38:28.901Z|06340|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5005 ms, sending inactivity probe
2021-08-24T12:38:33.905Z|06350|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5004 ms, sending inactivity probe
2021-08-24T12:38:38.909Z|06360|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5003 ms, sending inactivity probe
2021-08-24T12:38:43.913Z|06370|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5003 ms, sending inactivity probe
2021-08-24T12:38:48.922Z|06380|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5009 ms, sending inactivity probe
2021-08-24T12:38:53.926Z|06390|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5003 ms, sending inactivity probe
2021-08-24T12:38:58.930Z|06400|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5003 ms, sending inactivity probe
2021-08-24T12:39:03.934Z|06410|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5003 ms, sending inactivity probe
2021-08-24T12:39:08.938Z|06420|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5004 ms, sending inactivity probe
2021-08-24T12:39:13.941Z|06430|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5002 ms, sending inactivity probe
2021-08-24T12:39:18.946Z|06440|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5004 ms, sending inactivity probe
2021-08-24T12:39:23.951Z|06452|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5005 ms, sending inactivity probe
2021-08-24T12:39:28.956Z|06462|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5004 ms, sending inactivity probe
2021-08-24T12:39:33.962Z|06472|reconnect|DBG|tcp:ovn-ovsdb-nb.openstack.svc.cluster.local:6641:
 idle 5006 ms, sending inactivity probe



Best regards, Wentao Jia





[ovs-discuss] OVN nbctl and sbctl deamon cannot set probe interval

2021-08-09 Thread
Hi,
  The nbctl and sbctl daemons are long-running processes. If their connection to the
database breaks, it should be reconnected, but the nbctl and sbctl daemons never detect
a dead connection and so never reconnect, because no probe interval is set for them.
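For reference, this is the daemon mode I mean (standard ovn-nbctl usage); the daemon keeps one long-lived connection to the NB database and answers later ovn-nbctl invocations over a local socket:

# Start the long-running nbctl daemon and point later invocations at it.
export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
ovn-nbctl show    # answered by the daemon over its persistent NB connection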
Best regards, Wentao Jia






[ovs-discuss] ovsdb relay server maybe memleak

2021-08-04 Thread
OVN scale test: 3 clustered SB servers, 10 SB relay servers, 1000 sandboxes.
The maximum memory used by an SB relay server is over 26 GB.


[root@node-4 ~]# for i in `kubectl get pods -n openstack | grep ovsdb-sb-relay | awk '{print $1}'`; do kubectl exec -it -n openstack $i -- top -bn1 | grep ovsdb; done
    9 root      20   0 7558724  7.2g  8408 S   0.0  1.4  33:36.03 ovsdb-serv+
    9 root      20   0 8589716  8.1g  7924 S   0.0  1.6  28:19.82 ovsdb-serv+
    9 root      20   0   14.5g 14.5g  8284 R 100.0  2.9  78:00.86 ovsdb-serv+
    9 root      20   0   26.2g 24.9g  7744 R 100.0  5.0  77:10.86 ovsdb-serv+
    9 root      20   0   27.5g 26.7g  8076 R 100.0  5.3  30:55.49 ovsdb-serv+
   10 root      20   0 8835412  8.0g  8148 R 100.0  3.2  11:30.74 ovsdb-serv+
    9 root      20   0 8835424  8.0g  8396 S   6.7  1.6   7:44.34 ovsdb-serv+
    9 root      20   0 7678636  7.3g  8132 S   0.0  2.9   1:25.33 ovsdb-serv+
    9 root      20   0   12.6g 10.7g  8188 R 100.0  2.1 107:08.83 ovsdb-serv+
    9 root      20   0 7479468  7.1g  8344 S  80.0  1.4  45:50.82 ovsdb-serv+
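A per-server breakdown can be pulled from inside one of the pods with ovsdb-server's memory/show command. This is a sketch only: the pod name is a placeholder and the --unixctl socket path is an assumption that has to match how the relay was actually started:

kubectl exec -n openstack ovn-ovsdb-sb-relay-<pod> -- \
    ovs-appctl -t /var/run/ovn/ovnsb_db.ctl memory/show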






[ovs-discuss] ovsdb relay server segment fault

2021-08-04 Thread
OVN scale test: 3 clustered SB servers, 10 SB relay servers, 1000 sandboxes.
The SB relay server occasionally crashes with a segmentation fault.
[root@node-4 ~]# kubectl logs -n openstack ovn-ovsdb-sb-relay-79d5dd7ff4-tqbbd --tail 10 -p
2021-08-01T03:09:44Z|15758|poll_loop|INFO|wakeup due to [POLLOUT] on fd 101 (10.232.2.213:6642<->10.232.7.147:39998) at lib/stream-fd.c:153 (66% CPU usage)
2021-08-01T03:09:52Z|15759|timeval|WARN|Unreasonably long 5223ms poll interval (2209ms user, 126ms system)
2021-08-01T03:09:52Z|15760|timeval|WARN|faults: 19955 minor, 0 major
2021-08-01T03:09:52Z|15761|timeval|WARN|context switches: 0 voluntary, 5818 involuntary
2021-08-01T03:09:55Z|15762|timeval|WARN|Unreasonably long 3550ms poll interval (2277ms user, 71ms system)
2021-08-01T03:09:55Z|15763|timeval|WARN|faults: 3652 minor, 0 major
2021-08-01T03:09:55Z|15764|timeval|WARN|context switches: 0 voluntary, 1438 involuntary
2021-08-01T03:09:55Z|15765|poll_loop|INFO|Dropped 43 log messages in last 11 seconds (most recently, 10 seconds ago) due to excessive rate
2021-08-01T03:09:55Z|15766|poll_loop|INFO|wakeup due to [POLLOUT] on fd 95 (10.232.2.213:6642<->10.232.7.132:53042) at lib/stream-fd.c:153 (67% CPU usage)
/tmp/start_sb_relay.sh: line 5:     9 Segmentation fault      ovsdb-server --remote=db:OVN_Southbound,SB_Global,connections relay:OVN_Southbound:tcp:${SERVICE_NAME}.${NAMESPACE}.svc.cluster.local:6642















[ovs-discuss] port group reference large number of ports cause ovn-northd and ovn-nbctl deamon cpu 100% forever

2021-07-21 Thread
Hi, all

In an OVN scale test I created 47194 logical switch ports, with a port group
referencing all of the switch ports in one row. This causes ovn-northd and the
ovn-nbctl daemon to run at 100% CPU forever.
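A minimal sketch of the shape of the data (the group name pg_all is made up; in practice the 47194 ports would have to be added in batches):

# One Port_Group row whose "ports" column references every logical switch port.
ovn-nbctl pg-add pg_all
ovn-nbctl pg-set-ports pg_all \
    $(ovn-nbctl --bare --columns=name list Logical_Switch_Port)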


gdb info:
()[root@ovn-northd-0 /]# date
Tue Jul 20 11:07:34 CST 2021
()[root@ovn-northd-0 /]# gdb attach 41
..
(gdb) bt
#0  0x7f4513cedb35 in malloc () from /lib64/libc.so.6
#1  0x564b7cdc0188 in xmalloc__ (size=size@entry=48) at lib/util.c:137
#2  0x564b7cdc01ab in xmalloc (size=size@entry=48) at lib/util.c:172
#3  0x564b7cdaa9c2 in ovsdb_idl_get_row_arc (src=src@entry=0x564b93325d40, 
dst_table_class=dst_table_class@entry=0x564b7d175ad0 , 
dst_uuid=0x564b895bb750) at lib/ovsdb-idl.c:2328
#4  0x564b7cd25eaa in nbrec_port_group_parse_ports (row_=0x564b93325d40, 
datum=0x564b93325ea8) at lib/ovn-nb-idl.c:33264
#5  0x564b7cda5b06 in ovsdb_idl_row_parse (row=row@entry=0x564b93325d40) at 
lib/ovsdb-idl.c:1656
#6  0x564b7cda77e9 in ovsdb_idl_row_reparse_backrefs 
(row=row@entry=0x564b9316fcb0) at lib/ovsdb-idl.c:2074
#7  0x564b7cda7a07 in ovsdb_idl_insert_row (row=row@entry=0x564b9316fcb0, 
data=0x564b8ae27f90) at lib/ovsdb-idl.c:2230
#8  0x564b7cda8ee2 in ovsdb_idl_process_update (ru=0x564b8af95470, 
table=0x564b7d423690) at lib/ovsdb-idl.c:1464
#9  ovsdb_idl_parse_update__ (du=0x564b8de0ea30, idl=0x564b7d422930) at 
lib/ovsdb-idl.c:1336
#10 ovsdb_idl_parse_update (update=0x564bdb13b7e8, idl=0x564b7d422930) at 
lib/ovsdb-idl.c:1375
#11 ovsdb_idl_run (idl=) at lib/ovsdb-idl.c:444
#12 0x564b7cdad89c in ovsdb_idl_loop_run (loop=loop@entry=0x7ffcfcff8d80) 
at lib/ovsdb-idl.c:4122
#13 0x564b7ccf9747 in main (argc=9, argv=0x7ffcfcff8f68) at 
northd/ovn-northd.c:14445
(gdb) frame 9
#9  ovsdb_idl_parse_update__ (du=0x564b8de0ea30, idl=0x564b7d422930) at 
lib/ovsdb-idl.c:1336
1336switch (ovsdb_idl_process_update(table, ru)) {
(gdb) info local
ru = 0x564b8af95470
j = 4247
tu = 0x564b8af6fd68
table = 0x564b7d423690
i = 9
(gdb) print du->n
$1 = 10
(gdb) print du->table_updates[0]
$2 = {table_name = 0x564b907a15e0 "NB_Global", row_updates = 0x564b88cf2f90, n 
= 1}
(gdb) print du->table_updates[1]
$3 = {table_name = 0x564b94d13f60 "ACL", row_updates = 0x564b8af6fd90, n = 9}
(gdb) print du->table_updates[2]
$4 = {table_name = 0x564b7f9a7d10 "Logical_Switch", row_updates = 
0x564b8af6fec0, n = 237}
(gdb) print du->table_updates[3]
$5 = {table_name = 0x564bda9c4c90 "HA_Chassis", row_updates = 0x564b8af71c70, n 
= 59}
(gdb) print du->table_updates[4]
$6 = {table_name = 0x564bdb893c60 "HA_Chassis_Group", row_updates = 
0x564b90dad0f0, n = 1}
(gdb) print du->table_updates[5]
$7 = {table_name = 0x564c5ccb7b10 "Logical_Router", row_updates = 
0x564b86f8ddd0, n = 1}


(gdb) print du->table_updates[6]
$9 = {table_name = 0x564bdb284750 "Port_Group", row_updates = 0x564b8f658470, n 
= 3}
(gdb) print du->table_updates[7]
$10 = {table_name = 0x564b8af6f920 "Connection", row_updates = 0x564b8769f0e0, 
n = 1}
(gdb) print du->table_updates[8]
$11 = {table_name = 0x564c38e30d10 "DHCP_Options", row_updates = 
0x564b8af723e0, n = 237}
(gdb) print du->table_updates[9]
$12 = {table_name = 0x564bd8be5ed0 "Logical_Switch_Port", row_updates = 
0x564b8af74190, n = 47194}
(gdb) 
$13 = {table_name = 0x564bd8be5ed0 "Logical_Switch_Port", row_updates = 
0x564b8af74190, n = 47194}
(gdb) print tu->n
$14 = 47194
(gdb) print j
$15 = 4247


gdb again:
()[root@ovn-northd-0 /]# date
Tue Jul 20 11:26:25 CST 2021
()[root@ovn-northd-0 /]# 
()[root@ovn-northd-0 /]# gdb attach 41
..
(gdb) bt
#0  xmalloc__ (size=size@entry=48) at lib/util.c:136
#1  0x564b7cdc01ab in xmalloc (size=size@entry=48) at lib/util.c:172
#2  0x564b7cdaa9c2 in ovsdb_idl_get_row_arc (src=src@entry=0x564b7d710c40, 
dst_table_class=dst_table_class@entry=0x564b7d175ad0 , 
dst_uuid=0x564b8dc1a8b0) at lib/ovsdb-idl.c:2328
#3  0x564b7cd25eaa in nbrec_port_group_parse_ports (row_=0x564b7d710c40, 
datum=0x564b93d9df48) at lib/ovn-nb-idl.c:33264
#4  0x564b7cda5b06 in ovsdb_idl_row_parse (row=row@entry=0x564b7d710c40) at 
lib/ovsdb-idl.c:1656
#5  0x564b7cda77e9 in ovsdb_idl_row_reparse_backrefs 
(row=row@entry=0x564b946e3470) at lib/ovsdb-idl.c:2074
#6  0x564b7cda7a07 in ovsdb_idl_insert_row (row=row@entry=0x564b946e3470, 
data=0x564bd7f1cc60) at lib/ovsdb-idl.c:2230
#7  0x564b7cda8ee2 in ovsdb_idl_process_update (ru=0x564b8af96f70, 
table=0x564b7d423690) at lib/ovsdb-idl.c:1464
#8  ovsdb_idl_parse_update__ (du=0x564b8de0ea30, idl=0x564b7d422930) at 
lib/ovsdb-idl.c:1336
#9  ovsdb_idl_parse_update (update=0x564bdb13b7e8, idl=0x564b7d422930) at 
lib/ovsdb-idl.c:1375
#10 ovsdb_idl_run (idl=) at lib/ovsdb-idl.c:444
#11 0x564b7cdad89c in ovsdb_idl_loop_run (loop=loop@entry=0x7ffcfcff8d80) 
at lib/ovsdb-idl.c:4122
#12 0x564b7ccf9747 in main (argc=9, argv=0x7ffcfcff8f68) at 
northd/ovn-northd.c:14445
(gdb) frame 8
#8  ovsdb_idl_parse_update__ (du=0x564b8de0ea30, idl=0x564b7d422930)