Hi all,

I’m running OVN 22.09 and occasionally see ovn-controller crash with a
segmentation fault.  The backtrace is as follows:

(gdb) bt
#0  0x00007f0742707de1 in __strlen_sse2 () from /lib64/libc.so.6
#1  0x00007f0742788c5d in inet_pton () from /lib64/libc.so.6
#2  0x0000564f45a1c784 in ip_parse (s=<optimized out>, 
ip=ip@entry=0x7f074040f90c) at lib/packets.c:698
#3  0x0000564f4594cbfb in svc_monitor_send_tcp_health_check__ 
(swconn=swconn@entry=0x7f0738000940,
    svc_mon=svc_mon@entry=0x564f4c2960c0, ctl_flags=ctl_flags@entry=2, 
tcp_seq=3858078915, tcp_ack=tcp_ack@entry=0,
    tcp_src=<optimized out>) at controller/pinctrl.c:7513
#4  0x0000564f4594d47c in svc_monitor_send_tcp_health_check__ 
(tcp_src=<optimized out>, tcp_ack=0, tcp_seq=<optimized out>,
    ctl_flags=2, svc_mon=0x564f4c2960c0, swconn=0x7f0738000940) at 
controller/pinctrl.c:7502
#5  svc_monitor_send_health_check (swconn=swconn@entry=0x7f0738000940, 
svc_mon=svc_mon@entry=0x564f4c2960c0)
    at controller/pinctrl.c:7621
#6  0x0000564f4595869b in svc_monitors_run 
(svc_monitors_next_run_time=0x564f45dd3970 <svc_monitors_next_run_time.37793>,
    swconn=0x7f0738000940) at controller/pinctrl.c:7693
#7  pinctrl_handler (arg_=0x564f45e11240 <pinctrl>) at controller/pinctrl.c:3499
#8  0x0000564f45a0ad6f in ovsthread_wrapper (aux_=<optimized out>) at 
lib/ovs-thread.c:422
#9  0x00007f074325bea5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f07427798dd in clone () from /lib64/libc.so.6
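
For context (I’m paraphrasing lib/packets.c from memory, so treat this as a
sketch rather than the exact source), ip_parse() is essentially just a thin
wrapper around inet_pton(), so frames #0-#2 only make sense if the string
pointer handed to it points at unmapped memory:

#include <arpa/inet.h>
#include <sys/socket.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ovs_be32;   /* stand-in for the OVS typedef */

/* Roughly what lib/packets.c:ip_parse() does: inet_pton() scans the string
 * byte by byte, which is where __strlen_sse2 faults when 's' is dangling. */
static bool
ip_parse(const char *s, ovs_be32 *ip)
{
    return inet_pton(AF_INET, s, ip) == 1;
}

So the crash is not in the parsing logic itself; the IP string taken from the
sb_svc_mon row is what is bad.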

After switching to frame #3, I can read sensible data from the svc_mon
structure (port/protocol/dp_key/port_key).  I looked these up in the SB DB
and found the corresponding port_binding, which belongs to a logical port
residing on this chassis, and that port has a load balancer with a health
check configured.  So far everything looks fine.  However, the
svc_mon->sb_svc_mon structure appears to contain garbage: "Address
0x564f00000000 out of bounds", logical_port == 0, and so on (though I may be
wrong):

$1 = (const struct sbrec_service_monitor *) 0x564f54db2b40
(gdb) print *svc_mon->sb_svc_mon
$2 = {header_ = {hmap_node = {hash = 94898726054728, next = 0x0}, uuid = {parts 
= {0, 0, 0, 0}}, src_arcs = {prev = 0x564f54aae0d0, next = 0x0}, dst_arcs = 
{prev = 0x564f7f8bd470, next = 0x564f7f8bd540}, table = 0x64, old_datum = 0xf,
    parsed = 152, reparse_node = {prev = 0x0, next = 0x0}, new_datum = 0x0, 
prereqs = 0x52eb8916, written = 0x171, txn_node = {hash = 1, next = 
0x564f54db2db0}, map_op_written = 0x0, map_op_lists = 0x0, set_op_written = 0x0,
    set_op_lists = 0x0, change_seqno = {0, 0, 0}, track_node = {prev = 
0x564f00000000, next = 0x0}, updated = 0x0, tracked_old_datum = 0x0}, 
external_ids = {map = {buckets = 0x1, one = 0x564f54db2d90, mask = 0, n = 0}},
  ip = 0x564f00000000 <Address 0x564f00000000 out of bounds>, logical_port = 
0x0, options = {map = {buckets = 0x0, one = 0x0, mask = 1, n = 
94898780242768}}, port = 0, protocol = 0x0, src_ip = 0x1 <Address 0x1 out of 
bounds>,
  src_mac = 0x564f54db2d70 "`Ջ\177OV", status = 0x0}
…
(gdb) print svc_mon->state
$8 = SVC_MON_S_ONLINE
(gdb) print svc_mon->status
$9 = SVC_MON_ST_ONLINE
(gdb) print svc_mon->protocol
$10 = SVC_MON_PROTO_TCP
(gdb) print svc_mon->sb_svc_mon
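
If I read the pinctrl code correctly (hedging here; the layout below is only
my sketch, not the real struct), this is exactly the pattern you would expect
from a local cache entry that copies a few scalars out of the SB row but
keeps the row itself only as a borrowed pointer: the copies
(state/status/protocol) remain valid, while the pointer ends up referencing
freed IDL memory:

struct sbrec_service_monitor;

/* Sketch only; the real struct svc_monitor in controller/pinctrl.c has more
 * fields and may differ in detail. */
struct svc_monitor_sketch {
    /* Borrowed pointer into the SB IDL; becomes dangling if the IDL frees
     * or re-creates the row behind our back. */
    const struct sbrec_service_monitor *sb_svc_mon;

    /* Values copied out of the row at sync time; they stay valid even after
     * the row is gone, which matches the gdb output above: state, status and
     * protocol look sane while *sb_svc_mon is garbage. */
    int state;      /* SVC_MON_S_ONLINE in the core dump. */
    int status;     /* SVC_MON_ST_ONLINE. */
    int protocol;   /* SVC_MON_PROTO_TCP. */
};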

This crash occurred right after the SB ovsdb connection was lost due to an
inactivity probe failure.  So ovn-controller was reconnecting to the SB DB,
and I suspect this could somehow re-initialize the SB IDL objects.
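
To make that suspicion concrete (a generic, self-contained sketch of the
hazard, not OVN code; all names below are made up): if the IDL drops and
re-creates its rows on reconnect while some local table still holds pointers
to the old rows, any later dereference, for example from the pinctrl thread
sending the next health check, lands on freed memory.  The usual guard is to
drop or re-resolve those borrowed pointers by key (e.g. by row UUID) whenever
the IDL contents are rebuilt, before anything consumes them:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an IDL row. */
struct row {
    unsigned int key;           /* think: row UUID */
    char ip[16];
};

/* Toy stand-in for the IDL contents, rebuilt on (re)connect. */
static struct row *rows;
static size_t n_rows;

static void
idl_rebuild(void)
{
    /* On reconnect the old rows are freed and new ones allocated: every
     * pointer handed out before this point is now dangling. */
    free(rows);
    n_rows = 1;
    rows = calloc(n_rows, sizeof *rows);
    rows[0].key = 42;
    strcpy(rows[0].ip, "10.0.0.1");
}

static const struct row *
idl_lookup(unsigned int key)
{
    for (size_t i = 0; i < n_rows; i++) {
        if (rows[i].key == key) {
            return &rows[i];
        }
    }
    return NULL;
}

int
main(void)
{
    idl_rebuild();

    /* Local cache entry: copies the key, borrows the row pointer. */
    unsigned int cached_key = 42;
    const struct row *cached_row = idl_lookup(cached_key);

    idl_rebuild();                  /* "reconnect" */

    /* BAD: using the old cached_row here would read freed memory (the shape
     * of the crash above).  GOOD: re-resolve by key after every rebuild and
     * skip the entry if the row is gone. */
    cached_row = idl_lookup(cached_key);
    if (cached_row) {
        printf("monitor %u -> %s\n", cached_row->key, cached_row->ip);
    }
    return 0;
}

Whether ovn-controller actually misses such a re-resolution step somewhere on
the reconnect path is exactly what I’d like to confirm.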

I’m not sure I’ll be able to reproduce this behaviour on the latest main
branch, so my question is: could this, at least in theory, be related to
re-initialization of the IDL?  If so, what should be done to avoid it?
Should ovn-controller process changes at all while its IDL is in an
inconsistent state?

Any help is appreciated.

Regards,
Vladislav Odintsov
