Hi all, I’m running OVN 22.09 and sometimes see ovn-controller crash with a segmentation fault. The backtrace is as follows:
(gdb) bt
#0  0x00007f0742707de1 in __strlen_sse2 () from /lib64/libc.so.6
#1  0x00007f0742788c5d in inet_pton () from /lib64/libc.so.6
#2  0x0000564f45a1c784 in ip_parse (s=<optimized out>, ip=ip@entry=0x7f074040f90c) at lib/packets.c:698
#3  0x0000564f4594cbfb in svc_monitor_send_tcp_health_check__ (swconn=swconn@entry=0x7f0738000940, svc_mon=svc_mon@entry=0x564f4c2960c0, ctl_flags=ctl_flags@entry=2, tcp_seq=3858078915, tcp_ack=tcp_ack@entry=0, tcp_src=<optimized out>) at controller/pinctrl.c:7513
#4  0x0000564f4594d47c in svc_monitor_send_tcp_health_check__ (tcp_src=<optimized out>, tcp_ack=0, tcp_seq=<optimized out>, ctl_flags=2, svc_mon=0x564f4c2960c0, swconn=0x7f0738000940) at controller/pinctrl.c:7502
#5  svc_monitor_send_health_check (swconn=swconn@entry=0x7f0738000940, svc_mon=svc_mon@entry=0x564f4c2960c0) at controller/pinctrl.c:7621
#6  0x0000564f4595869b in svc_monitors_run (svc_monitors_next_run_time=0x564f45dd3970 <svc_monitors_next_run_time.37793>, swconn=0x7f0738000940) at controller/pinctrl.c:7693
#7  pinctrl_handler (arg_=0x564f45e11240 <pinctrl>) at controller/pinctrl.c:3499
#8  0x0000564f45a0ad6f in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:422
#9  0x00007f074325bea5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f07427798dd in clone () from /lib64/libc.so.6

After moving to frame #3 I can read the actual data from the svc_mon structure (port/protocol/dp_key/port_key). I looked these up in the SB DB and found the corresponding Port_Binding; it belongs to a logical port that resides on this chassis and has a load balancer with a health check configured. Up to this point everything looks fine. But the svc_mon->sb_svc_mon structure seems to contain garbage: "Address 0x564f00000000 out of bounds", logical_port == 0, and so on (though I may be wrong):

(gdb) print svc_mon->sb_svc_mon
$1 = (const struct sbrec_service_monitor *) 0x564f54db2b40
(gdb) print *svc_mon->sb_svc_mon
$2 = {header_ = {hmap_node = {hash = 94898726054728, next = 0x0}, uuid = {parts = {0, 0, 0, 0}}, src_arcs = {prev = 0x564f54aae0d0, next = 0x0}, dst_arcs = {prev = 0x564f7f8bd470, next = 0x564f7f8bd540}, table = 0x64, old_datum = 0xf, parsed = 152, reparse_node = {prev = 0x0, next = 0x0}, new_datum = 0x0, prereqs = 0x52eb8916, written = 0x171, txn_node = {hash = 1, next = 0x564f54db2db0}, map_op_written = 0x0, map_op_lists = 0x0, set_op_written = 0x0, set_op_lists = 0x0, change_seqno = {0, 0, 0}, track_node = {prev = 0x564f00000000, next = 0x0}, updated = 0x0, tracked_old_datum = 0x0}, external_ids = {map = {buckets = 0x1, one = 0x564f54db2d90, mask = 0, n = 0}}, ip = 0x564f00000000 <Address 0x564f00000000 out of bounds>, logical_port = 0x0, options = {map = {buckets = 0x0, one = 0x0, mask = 1, n = 94898780242768}}, port = 0, protocol = 0x0, src_ip = 0x1 <Address 0x1 out of bounds>, src_mac = 0x564f54db2d70 "`Ջ\177OV", status = 0x0}
…
(gdb) print svc_mon->state
$8 = SVC_MON_S_ONLINE
(gdb) print svc_mon->status
$9 = SVC_MON_ST_ONLINE
(gdb) print svc_mon->protocol
$10 = SVC_MON_PROTO_TCP

The crash occurred right after the SB ovsdb connection was lost due to an inactivity probe failure, so ovn-controller was re-connecting to the SB at the time, and I guess this could somehow re-initialize the SB IDL objects. I’m not sure I can reproduce this behaviour on the latest main branch, so my questions are: can this theoretically be connected to re-initialization of the IDL? If so, what should be done to avoid it? Should ovn-controller process changes at all while its IDL is in an inconsistent state? Any help is appreciated.
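To make my suspicion concrete, here is a toy model of the lifecycle I have in mind. All struct and field names below are illustrative stand-ins for struct svc_monitor / sbrec_service_monitor, not OVN's real code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for sbrec_service_monitor: an IDL-owned row. */
struct sb_row {
    char *ip;
};

/* Stand-in for struct svc_monitor: a long-lived object in the pinctrl
 * thread that caches a raw pointer to the IDL row; nothing pins the
 * row's lifetime. */
struct monitor {
    const struct sb_row *sb_row;
};

int
main(void)
{
    struct sb_row *row = malloc(sizeof *row);
    row->ip = strdup("10.0.0.1");

    struct monitor mon = { .sb_row = row };

    /* Simulated SB reconnect: the IDL frees and rebuilds its rows,
     * but the monitor still holds the old pointer. */
    free(row->ip);
    free(row);

    /* The next health-check round then does the equivalent of
     * ip_parse(svc_mon->sb_svc_mon->ip, ...): strlen()/inet_pton()
     * walk freed memory, giving garbage at best and SIGSEGV at worst. */
    printf("%zu\n", strlen(mon.sb_row->ip));
    return 0;
}

This would also match what I see in gdb: svc_mon itself (state/status/protocol) is intact, because ovn-controller owns it, while everything reachable through sb_svc_mon looks like freed or reused memory.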
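If that is indeed what happens, a guard of the following shape at the top of the health-check path would only paper over the crash (if the row memory was really freed, even reading sb_svc_mon->ip is already undefined behaviour), so I suspect the real fix is to invalidate or rebuild the svc_monitor entries whenever the IDL is re-initialized. Untested sketch, with the field names taken from the gdb output above:

/* Untested sketch: skip this probe round if the cached SB row looks
 * gone.  Narrows the window, but does not help if sb_svc_mon already
 * points at freed memory. */
if (!svc_mon->sb_svc_mon || !svc_mon->sb_svc_mon->ip) {
    return;
}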
Regards,
Vladislav Odintsov