Follow-up email here has the backtrace for the second method of reproduction. In this case the bridge is not deleted, it was using the loop logic of effectively these commands:
ovs-vsctl del-port br0 bond0 ip link del bond0 <recreate the bond0 in linux> ovs-vsctl add-port br0 bond0 I have seen other crashes while adding/removing other ports, I really don't think the bond has anything to do with it. Not sure why I did it that way but it's what we had in this test when we saw the first crashes. The log leading up to it and backtrace are below: 2015-11-20T12:52:10.539Z|06360|netdev_linux|WARN|dock0: removing policing failed: Operation not supported 2015-11-20T12:52:10.539Z|06361|netdev_linux|WARN|vi_l3_3: removing policing failed: Operation not supported 2015-11-20T12:52:10.912Z|06362|bridge|INFO|bridge lan0: added interface bond0 on port 1625 2015-11-20T12:52:30.654Z|06363|netdev_linux|WARN|Dropped 99 log messages in last 20 seconds (most recently, 20 seconds ago) due to excessive rate 2015-11-20T12:52:30.654Z|06364|netdev_linux|WARN|dock0: removing policing failed: Operation not supported 2015-11-20T12:52:30.945Z|06365|bridge|INFO|bridge lan0: added interface bond0 on port 1626 2015-11-20T12:52:50.759Z|06366|netdev_linux|WARN|Dropped 100 log messages in last 20 seconds (most recently, 20 seconds ago) due to excessive rate 2015-11-20T12:52:50.759Z|06367|netdev_linux|WARN|dock0: removing policing failed: Operation not supported 2015-11-20T12:52:50.759Z|06368|netdev_linux|WARN|vi_l3_3: removing policing failed: Operation not supported 2015-11-20T12:52:51.100Z|06369|bridge|INFO|bridge lan0: added interface bond0 on port 1627 2015-11-20T12:53:10.885Z|06370|netdev_linux|WARN|Dropped 99 log messages in last 20 seconds (most recently, 19 seconds ago) due to excessive rate 2015-11-20T12:53:10.885Z|06371|netdev_linux|WARN|dock0: removing policing failed: Operation not supported 2015-11-20T12:53:10.885Z|06372|netdev_linux|WARN|vi_l3_3: removing policing failed: Operation not supported 2015-11-20T12:53:11.374Z|06373|bridge|INFO|bridge lan0: added interface bond0 on port 1628 2015-11-20T12:53:11.381Z|00001|dpif(revalidator250)|WARN|system@ovs-system: failed to put[modify] (No such file or directory) ufid:d6f5b091-1231-4050-b7e3-1c7d59d7879a recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(3),skb_mark(0x4/0),ct_state(0x21/0xffffffe7),ct_zone(0/0),ct_mark(0x4),ct_label(0/0),eth(src=dc:39:79:80:28:00,dst=84:18:88:d2:09:28),eth_type(0x0800),ipv4(src= 10.20.254.1/0.0.0.0,dst=10.11.192.150/0.0.0.0,proto=17,tos=0/0,ttl=64/0,frag=no),udp(src=59515/0x800,dst=53/0), actions:ct(zone=2,mark=0/0xffffffff),recirc(0x4),ct(zone=3,mark=0/0xffffffff),recirc(0x5),4 2015-11-20T12:53:11.381Z|00002|util(revalidator250)|EMER|lib/cmap.c:846: assertion ok failed in cmap_replace() 2015-11-20T12:53:12.753Z|00002|daemon_unix(monitor)|ERR|1 crashes: pid 959 died, killed (Aborted), core dumped, restarting 2015-11-20T12:53:12.763Z|00003|ovs_numa|INFO|Discovered 0 NUMA nodes and 0 CPU cores 2015-11-20T12:53:12.763Z|00004|memory|INFO|7592 kB peak resident set size after 32992.6 seconds 2015-11-20T12:53:12.763Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting... 2015-11-20T12:53:12.764Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected (gdb) back #0 0x000000fb5413e9dc in raise () from /lib/mips64el-linux-gnuabi64/libc.so.6 #1 0x000000fb54140470 in abort () from /lib/mips64el-linux-gnuabi64/libc.so.6 #2 0x000000012014cf18 in ovs_abort_valist (err_no=<optimized out>, format=<optimized out>, args=<optimized out>) at lib/util.c:323 #3 0x0000000120156bac in vlog_abort_valist (module_=<optimized out>, message=0x1201d3fe0 "%s: assertion %s failed in %s()", args=0xfb25ffc7e0) at lib/vlog.c:1129 #4 0x0000000120156bf4 in vlog_abort (module=<optimized out>, message=<optimized out>) at lib/vlog.c:1143 #5 0x000000012014cb7c in ovs_assert_failure (where=<optimized out>, function=<optimized out>, condition=<optimized out>) at lib/util.c:72 #6 0x00000001200919c8 in cmap_replace (cmap=<optimized out>, old_node=<optimized out>, new_node=0x0, hash=<optimized out>) at lib/cmap.c:846 #7 0x0000000120066b88 in cmap_remove (hash=<optimized out>, node=0xfb10011a70, cmap=0x12340ac78) at ./lib/cmap.h:265 #8 ukey_delete (ukey=0xfb10011a70, umap=0x12340ac48) at ofproto/ofproto-dpif-upcall.c:1729 #9 push_ukey_ops (udpif=udpif@entry=0x123434c70, umap=umap@entry=0x12340ac48, ops=ops@entry=0xfb25ffc890, n_ops=n_ops@entry=1) at ofproto/ofproto-dpif-upcall.c:2046 #10 0x0000000120067f18 in revalidator_sweep__ (revalidator=revalidator@entry=0x123438588, purge=purge@entry=false) at ofproto/ofproto-dpif-upcall.c:2257 #11 0x0000000120068294 in revalidator_sweep (revalidator=0x123438588) at ofproto/ofproto-dpif-upcall.c:2268 #12 udpif_revalidator (arg=0x123438588) at ofproto/ofproto-dpif-upcall.c:863 #13 0x0000000120116d1c in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:340 #14 0x000000fb54387cec in start_thread () from /lib/mips64el-linux-gnuabi64/libpthread.so.0 #15 0x000000fb541f984c in ?? () from /lib/mips64el-linux-gnuabi64/libc.so.6 (gdb) Any known fixes in this area? On Thu, Nov 19, 2015 at 4:54 PM, Keith Holleman <keith.holle...@gmail.com> wrote: > > While obviously not supported or fully in mainline, we've been using the > latest conntrack development code and seeing a number of crashes that have > a very similar failure and the stack trace is below. Curious to know if > this issue is fixed in master - it doesn't really seem to be a conntrack > specific issue but we'd like to get past this and continue to use and test > the conntrack stuff. > > I can reproduce this two ways. The first is to remove a linux bond > interface that is in a bridge using the "ip link del bond0" and then remove > the bridge via "ovs-vsctl del-br <br>". That is the backtrace that is > below and I can reproduce this pretty quickly doing this behavior in a loop > (delete the bond, delete the bridge, wait, create-the bridge, create and > add the bond and another interface, repeat). We weren't really trying > to be that forceful and I figured if the bridge remained up and we just > removed the bond from the bridge cleanly we may avoid the problem. But > simply doing "ovs-vsctl del-port <br> bond0", "ip link del bond0" also > causes a similar problem but with much lower frequency. (I will attach > this backtrace once I get it again if useful) > > Before the crashes, we always see a message like this in ovs-vswitchd.log: > > 2015-11-12T01:29:03.945Z|01200|netdev_linux|WARN|Dropped 52 log messages > in last 74 seconds (most recently, 73 seconds ago) due to excessive rate > 2015-11-12T01:29:03.945Z|01201|netdev_linux|WARN|dock0: removing policing > failed: Operation not supported > 2015-11-12T01:29:03.945Z|01202|netdev_linux|WARN|br0: removing policing > failed: Operation not supported > 2015-11-12T01:29:03.945Z|01203|netdev_linux|WARN|vi_l3_1: removing > policing failed: Operation not supported > 2015-11-12T01:29:03.946Z|01204|netdev_linux|WARN|lan0: removing policing > failed: Operation not supported > 2015-11-12T01:29:03.959Z|01205|bridge|WARN|could not open network device > bond0 (No such device) > 2015-11-12T01:29:03.960Z|01206|netdev_linux|WARN|dock0: removing policing > failed: Operation not supported > 2015-11-12T01:29:03.960Z|01207|netdev_linux|WARN|br0: removing policing > failed: Operation not supported > *2015-11-12T01:29:04.435Z|01208|dpif|WARN|system@ovs-system: failed to > flow_del (No such file or directory) > ufid:734136a4-5748-4691-a52c-cd5bfad4da68 > recirc_id(0),dp_hash(0),skb_priority(0),in_port(2),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=04:00:00:00:00:02,dst=04:00:00:00:00:fe),eth_type(0x0800),ipv4(src=192.168.27.2,dst=10.131.0.87,proto=6,tos=0,ttl=64,frag=no),tcp(src=51582,dst=443),tcp_flags(psh|ack)* > 2015-11-12T01:29:04.435Z|01209|util|EMER|lib/cmap.c:846: assertion ok > failed in cmap_replace() > > We are currently using the conntrack branch at this point: > https://github.com/justinpettit/ovs/commit/86e6bfcb999ed134aa12bf947cea4da1426af2c2 > . We know this is about a month old but updated and didn't see an > improvement and ran into a few other issues. Here is the back trace from > method #1: > > Reading symbols from > /tmp/ovs-sym/usr/lib/debug/usr/sbin/ovs-vswitchd...done. > (gdb) back > #0 0x000000f37a6959dc in raise () from > /lib/mips64el-linux-gnuabi64/libc.so.6 > #1 0x000000f37a697470 in abort () from > /lib/mips64el-linux-gnuabi64/libc.so.6 > #2 0x000000012014cf18 in ovs_abort_valist (err_no=<optimized out>, > format=<optimized out>, args=<optimized out>) at lib/util.c:323 > #3 0x0000000120156bac in vlog_abort_valist (module_=<optimized out>, > message=0x1201d3fe0 "%s: assertion %s failed in %s()", > args=0xf57c5ad230) at lib/vlog.c:1129 > #4 0x0000000120156bf4 in vlog_abort (module=<optimized out>, > message=<optimized out>) at lib/vlog.c:1143 > #5 0x000000012014cb7c in ovs_assert_failure (where=<optimized out>, > function=<optimized out>, condition=<optimized out>) > at lib/util.c:72 > #6 0x00000001200919c8 in cmap_replace (cmap=<optimized out>, > old_node=<optimized out>, new_node=0x0, hash=<optimized out>) > at lib/cmap.c:846 > #7 0x0000000120066b88 in cmap_remove (hash=<optimized out>, > node=0xf2ec00af30, cmap=0x1209031a8) at ./lib/cmap.h:265 > #8 ukey_delete (ukey=0xf2ec00af30, umap=0x120903178) at > ofproto/ofproto-dpif-upcall.c:1729 > #9 push_ukey_ops (udpif=udpif@entry=0x12092ea40, umap=umap@entry=0x120903178, > ops=ops@entry=0xf57c5ad2e0, n_ops=n_ops@entry=1) > at ofproto/ofproto-dpif-upcall.c:2046 > #10 0x0000000120067f18 in revalidator_sweep__ (revalidator=<optimized > out>, purge=purge@entry=true) > at ofproto/ofproto-dpif-upcall.c:2257 > #11 0x0000000120068128 in revalidator_purge (revalidator=<optimized out>) > at ofproto/ofproto-dpif-upcall.c:2274 > #12 udpif_stop_threads (udpif=udpif@entry=0x12092ea40) at > ofproto/ofproto-dpif-upcall.c:461 > #13 0x0000000120068d50 in udpif_stop_threads (udpif=0x12092ea40) at > ofproto/ofproto-dpif-upcall.c:590 > #14 udpif_synchronize (udpif=0x12092ea40) at > ofproto/ofproto-dpif-upcall.c:587 > #15 0x0000000120054458 in destruct (ofproto_=0x1209ad410) at > ofproto/ofproto-dpif.c:1451 > #16 0x00000001200473f0 in ofproto_destroy (p=0x1209ad410) at > ofproto/ofproto.c:1609 > #17 0x000000012002c92c in bridge_destroy (br=br@entry=0x120962d30) at > vswitchd/bridge.c:3207 > #18 0x000000012002ce88 in add_del_bridges (cfg=0x12092e460, > cfg=0x12092e460) at vswitchd/bridge.c:1713 > #19 0x000000012002e404 in bridge_reconfigure > (ovs_cfg=ovs_cfg@entry=0x12092e460) > at vswitchd/bridge.c:597 > #20 0x0000000120032c3c in bridge_run () at vswitchd/bridge.c:2973 > #21 0x0000000120025a28 in main (argc=11, argv=0xf57c5af888) at > vswitchd/ovs-vswitchd.c:120 > > Again, if it helps I can get one from repro method #2. > > Does anyone else see anything similar? Or does anyone know of a fix in > master that may be relevant to this? > > Thanks in advance for any time spent or help. > > >
_______________________________________________ discuss mailing list discuss@openvswitch.org http://openvswitch.org/mailman/listinfo/discuss