With Guru's help, I believe I have fixed it: https://patchwork.ozlabs.org/patch/954247/
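To re-test the fix, the reproduction from Girish's report below can be scripted. This is a minimal sketch, not something posted in the thread. It assumes OVN_NB_DB is already set to the comma-separated NB remotes of all three servers, since ovn-nbctl reads that variable. It also assumes the NB control socket is /var/run/openvswitch/ovnnb_db.ctl, as in the report. The helper names (run, populate, delete_ports, compact) and the DRY_RUN, N and CTL variables are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch of the compaction-crash reproduction from the ovs-dev report.
# Hypothetical helpers: run, populate, delete_ports, compact; hypothetical
# knobs: DRY_RUN (print instead of execute), N (switch count), CTL (socket).
# Assumes OVN_NB_DB already lists the NB remotes of every cluster member.

N=${N:-50}
CTL=${CTL:-/var/run/openvswitch/ovnnb_db.ctl}

# Execute a command, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Leader: create N logical switches, each with one logical port.
populate() {
    i=1
    while [ "$i" -le "$N" ]; do
        run ovn-nbctl ls-add "ls$i"
        run ovn-nbctl lsp-add "ls$i" "port0_$i"
        i=$((i + 1))
    done
}

# Leader: delete the ports created by populate, one ovn-nbctl call each.
delete_ports() {
    i=1
    while [ "$i" -le "$N" ]; do
        run ovn-nbctl lsp-del "port0_$i"
        i=$((i + 1))
    done
}

# Any node: ask the local NB ovsdb-server to compact its database.
compact() {
    run ovs-appctl -t "$CTL" ovsdb-server/compact OVN_Northbound
}

# Dispatch on the first argument; with no argument only define the helpers.
case "${1:-}" in
    populate|delete_ports|compact) "$1" ;;
    "") ;;
    *) echo "usage: $0 populate|delete_ports|compact" >&2; exit 2 ;;
esac
```

If the script is saved as repro.sh (an example name), the sequence is as follows. First, run `sh repro.sh populate` on the leader and `sh repro.sh compact` on every node. Then start `sh repro.sh delete_ports` on the leader and, while it is still running, run `sh repro.sh compact` on a follower. Prefixing any step with `DRY_RUN=1` only prints the commands.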
On Wed, Aug 01, 2018 at 11:46:38AM -0700, Guru Shetty wrote:
> I was able to reproduce it. I will work with Ben to get this fixed.
>
> On 26 July 2018 at 23:14, Girish Moodalbail <gmoodalb...@gmail.com> wrote:
>
> > Hello Ben,
> >
> > Sorry, got distracted with something else at work. I am still able to
> > reproduce the issue, and this is what I have and what I did
> > (if you need the core, let me know and I can share it with you)
> >
> > - 3-cluster RAFT setup in Ubuntu VM (2 VCPUs with 8GB RAM)
> > $ uname -r
> > Linux u1804-HVM-domU 4.15.0-23-generic #25-Ubuntu SMP Wed May 23
> > 18:02:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> >
> > - On all of the VMs, I have installed openvswitch-switch=2.9.2,
> > openvswitch-dbg=2.9.2, and ovn-central=2.9.2
> > (all of these packages are from http://packages.wand.net.nz/)
> >
> > - I bring up the node in the cluster one after the other -- leader 1st and
> > followed by two followers
> > - I check for cluster status and everything is healthy
> > - ovn-nbctl show and ovn-sbctl show are all empty
> >
> > - on the leader with OVN_NB_DB set to comma-separated-NB connection
> > strings I did
> > for i in `seq 1 50`; do ovn-nbctl ls-add ls$i; ovn-nbctl lsp-add ls$i
> > port0_$i; done
> >
> > - Check for the presence of 50 logical switches and 50 logical ports (one
> > on each switch). Compact the database on all the nodes.
> >
> > - Next I try to delete the ports and whilst the deletion is happening I
> > run compact on one of the followers
> >
> > leader_node# for i in `seq 1 50`; do ovn-nbctl lsp-del port0_$i;done
> > follower_node# ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl
> > ovsdb-server/compact OVN_Northbound
> >
> > - On the follower node I see the crash:
> >
> > ● ovn-central.service - LSB: OVN central components
> > Loaded: loaded (/etc/init.d/ovn-central; generated)
> > Active: active (running) since Thu 2018-07-26 22:48:53 PDT; 19min ago
> > Docs: man:systemd-sysv-generator(8)
> > Process: 21883 ExecStop=/etc/init.d/ovn-central stop (code=exited,
> > status=0/SUCCESS)
> > Process: 21934 ExecStart=/etc/init.d/ovn-central start (code=exited,
> > status=0/SUCCESS)
> > Tasks: 10 (limit: 4915)
> > CGroup: /system.slice/ovn-central.service
> > ├─22047 ovsdb-server: monitoring pid 22134 (*1 crashes: pid
> > 22048 died, killed (Aborted), core dumped*
> > ├─22059 ovsdb-server: monitoring pid 22060 (healthy)
> > ├─22060 ovsdb-server -vconsole:off -vfile:info
> > --log-file=/var/log/openvswitch/ovsdb-server-sb.log -
> > ├─22072 ovn-northd: monitoring pid 22073 (healthy)
> > ├─22073 ovn-northd -vconsole:emer -vsyslog:err -vfile:info
> > --ovnnb-db=tcp:10.0.7.33:6641,tcp:10.0.7.
> > └─22134 ovsdb-server -vconsole:off -vfile:info
> > --log-file=/var/log/openvswitch/ovsdb-server-nb.log
> >
> >
> > Same call trace and reason:
> >
> > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> > #1 0x00007f79599a1801 in __GI_abort () at abort.c:79
> > #2 0x00005596879c017c in json_serialize (json=<optimized out>,
> > s=<optimized out>) at ../lib/json.c:1554
> > #3 0x00005596879c01eb in json_serialize_object_member (i=<optimized out>,
> > s=<optimized out>, node=<optimized out>, node=<optimized out>) at
> > ../lib/json.c:1583
> > #4 0x00005596879c0132 in json_serialize_object (s=0x7ffc17013bf0,
> > object=0x55968993dcb0) at ../lib/json.c:1612
> > #5 json_serialize (json=<optimized out>, s=0x7ffc17013bf0) at
> > ../lib/json.c:1533
> > #6 0x00005596879c249c in json_to_ds (json=json@entry=0x559689950670,
> > flags=flags@entry=0, ds=ds@entry=0x7ffc17013c80) at ../lib/json.c:1511
> > #7 0x00005596879ae8df in ovsdb_log_compose_record
> > (json=json@entry=0x559689950670,
> > magic=0x55968993dc60 "CLUSTER", header=header@entry=0x7ffc17013c60,
> > data=data@entry=0x7ffc17013c80) at ../ovsdb/log.c:570
> > #8 0x00005596879aebbf in ovsdb_log_write (file=0x5596899b5df0,
> > json=0x559689950670) at ../ovsdb/log.c:618
> > #9 0x00005596879aed3e in ovsdb_log_write_and_free
> > (log=log@entry=0x5596899b5df0,
> > json=0x559689950670) at ../ovsdb/log.c:651
> > #10 0x00005596879b0954 in raft_write_snapshot
> > (raft=raft@entry=0x5596899151a0,
> > log=0x5596899b5df0, new_log_start=new_log_start@entry=166,
> > new_snapshot=new_snapshot@entry=0x7ffc17013e30) at
> > ../ovsdb/raft.c:3588
> > #11 0x00005596879b0ec3 in raft_save_snapshot
> > (raft=raft@entry=0x5596899151a0,
> > new_start=new_start@entry=166, new_snapshot=new_snapshot@entry=0x7ffc17013e30)
> > at ../ovsdb/raft.c:3647
> > #12 0x00005596879b8aed in raft_store_snapshot (raft=0x5596899151a0,
> > new_snapshot_data=new_snapshot_data@entry=0x5596899505f0) at
> > ../ovsdb/raft.c:3849
> > #13 0x00005596879a579e in ovsdb_storage_store_snapshot__
> > (storage=0x5596899137a0, schema=0x559689938ca0, data=0x559689946ea0) at
> > ../ovsdb/storage.c:541
> > #14 0x00005596879a625e in ovsdb_storage_store_snapshot
> > (storage=0x5596899137a0, schema=schema@entry=0x559689938ca0,
> > data=data@entry=0x559689946ea0) at ../ovsdb/storage.c:568
> > #15 0x000055968799f5ab in ovsdb_snapshot (db=0x5596899137e0) at
> > ../ovsdb/ovsdb.c:519
> > #16 0x0000559687999f23 in ovsdb_server_compact (conn=0x559689938440,
> > argc=<optimized out>, argv=<optimized out>, dbs_=0x7ffc170141c0) at
> > ../ovsdb/ovsdb-server.c:1443
> > #17 0x00005596879d9cc0 in process_command (request=<optimized out>,
> > conn=0x559689938440) at ../lib/unixctl.c:315
> > #18 run_connection (conn=0x559689938440) at ../lib/unixctl.c:349
> > #19 unixctl_server_run (server=0x559689937370) at ../lib/unixctl.c:400
> > #20 0x0000559687996e1e in main_loop (is_backup=0x7ffc1701412e,
> > exiting=0x7ffc1701412f, run_process=0x0, remotes=0x7ffc17014180,
> > unixctl=0x559689937370, all_dbs=0x7ffc170141c0,
> > jsonrpc=0x559689915120, config=0x7ffc170141e0) at
> > ../ovsdb/ovsdb-server.c:201
> > #21 main (argc=<optimized out>, argv=<optimized out>) at
> > ../ovsdb/ovsdb-server.c:457
> >
> > Thanks,
> > ~Girish
> >
> >
> >
> > On Wed, Jul 25, 2018 at 3:06 PM, Ben Pfaff <b...@ovn.org> wrote:
> >
> >> On Wed, Jul 18, 2018 at 10:48:08AM -0700, Girish Moodalbail wrote:
> >> > Hello all,
> >> >
> >> > We are able to reproduce this issue on OVS 2.9.2 at will. The OVSDB NB
> >> > server or OVSDB SB server dumps core while it is trying to compact the
> >> > database.
> >> >
> >> > You can reproduce the issue by using:
> >> >
> >> > root@u1804-HVM-domU:/var/crash# ovs-appctl -t
> >> > /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/compact OVN_Southbound
> >> >
> >> > 2018-07-18T17:34:29Z|00001|unixctl|WARN|error communicating with
> >> > unix:/var/run/openvswitch/ovnsb_db.ctl: End of file
> >> > ovs-appctl: /var/run/openvswitch/ovnsb_db.ctl: transaction error (End of
> >> > file)
> >>
> >> Hmm. I've now spent some time playing with clustered OVSDB, in 3-server
> >> and 5-server configurations, and triggering compaction at various points
> >> while starting and stopping servers. But I haven't yet managed to
> >> trigger this crash.
> >>
> >> Is there anything else that seems to be an important element?
> >>
> >> Thanks,
> >>
> >> Ben.

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev