Follow-up email here has the backtrace for the second method of
reproduction.  In this case the bridge is not deleted, it was using the
loop logic of effectively these commands:

ovs-vsctl del-port br0 bond0
ip link del bond0
<recreate the bond0 in linux>
ovs-vsctl add-port br0 bond0

I have seen other crashes while adding/removing other ports, I really don't
think the bond has anything to do with it.  Not sure why I did it that way
but it's what we had in this test when we saw the first crashes.   The log
leading up to it and backtrace are below:

2015-11-20T12:52:10.539Z|06360|netdev_linux|WARN|dock0: removing policing
failed: Operation not supported
2015-11-20T12:52:10.539Z|06361|netdev_linux|WARN|vi_l3_3: removing policing
failed: Operation not supported
2015-11-20T12:52:10.912Z|06362|bridge|INFO|bridge lan0: added interface
bond0 on port 1625
2015-11-20T12:52:30.654Z|06363|netdev_linux|WARN|Dropped 99 log messages in
last 20 seconds (most recently, 20 seconds ago) due to excessive rate
2015-11-20T12:52:30.654Z|06364|netdev_linux|WARN|dock0: removing policing
failed: Operation not supported
2015-11-20T12:52:30.945Z|06365|bridge|INFO|bridge lan0: added interface
bond0 on port 1626
2015-11-20T12:52:50.759Z|06366|netdev_linux|WARN|Dropped 100 log messages
in last 20 seconds (most recently, 20 seconds ago) due to excessive rate
2015-11-20T12:52:50.759Z|06367|netdev_linux|WARN|dock0: removing policing
failed: Operation not supported
2015-11-20T12:52:50.759Z|06368|netdev_linux|WARN|vi_l3_3: removing policing
failed: Operation not supported
2015-11-20T12:52:51.100Z|06369|bridge|INFO|bridge lan0: added interface
bond0 on port 1627
2015-11-20T12:53:10.885Z|06370|netdev_linux|WARN|Dropped 99 log messages in
last 20 seconds (most recently, 19 seconds ago) due to excessive rate
2015-11-20T12:53:10.885Z|06371|netdev_linux|WARN|dock0: removing policing
failed: Operation not supported
2015-11-20T12:53:10.885Z|06372|netdev_linux|WARN|vi_l3_3: removing policing
failed: Operation not supported
2015-11-20T12:53:11.374Z|06373|bridge|INFO|bridge lan0: added interface
bond0 on port 1628
2015-11-20T12:53:11.381Z|00001|dpif(revalidator250)|WARN|system@ovs-system:
failed to put[modify] (No such file or directory)
ufid:d6f5b091-1231-4050-b7e3-1c7d59d7879a
recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(3),skb_mark(0x4/0),ct_state(0x21/0xffffffe7),ct_zone(0/0),ct_mark(0x4),ct_label(0/0),eth(src=dc:39:79:80:28:00,dst=84:18:88:d2:09:28),eth_type(0x0800),ipv4(src=
10.20.254.1/0.0.0.0,dst=10.11.192.150/0.0.0.0,proto=17,tos=0/0,ttl=64/0,frag=no),udp(src=59515/0x800,dst=53/0),
actions:ct(zone=2,mark=0/0xffffffff),recirc(0x4),ct(zone=3,mark=0/0xffffffff),recirc(0x5),4
2015-11-20T12:53:11.381Z|00002|util(revalidator250)|EMER|lib/cmap.c:846:
assertion ok failed in cmap_replace()
2015-11-20T12:53:12.753Z|00002|daemon_unix(monitor)|ERR|1 crashes: pid 959
died, killed (Aborted), core dumped, restarting
2015-11-20T12:53:12.763Z|00003|ovs_numa|INFO|Discovered 0 NUMA nodes and 0
CPU cores
2015-11-20T12:53:12.763Z|00004|memory|INFO|7592 kB peak resident set size
after 32992.6 seconds
2015-11-20T12:53:12.763Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock:
connecting...
2015-11-20T12:53:12.764Z|00006|reconnect|INFO|unix:/var/run/openvswitch/db.sock:
connected


(gdb) back
#0  0x000000fb5413e9dc in raise () from
/lib/mips64el-linux-gnuabi64/libc.so.6
#1  0x000000fb54140470 in abort () from
/lib/mips64el-linux-gnuabi64/libc.so.6
#2  0x000000012014cf18 in ovs_abort_valist (err_no=<optimized out>,
format=<optimized out>, args=<optimized out>) at lib/util.c:323
#3  0x0000000120156bac in vlog_abort_valist (module_=<optimized out>,
message=0x1201d3fe0 "%s: assertion %s failed in %s()",
    args=0xfb25ffc7e0) at lib/vlog.c:1129
#4  0x0000000120156bf4 in vlog_abort (module=<optimized out>,
message=<optimized out>) at lib/vlog.c:1143
#5  0x000000012014cb7c in ovs_assert_failure (where=<optimized out>,
function=<optimized out>, condition=<optimized out>)
    at lib/util.c:72
#6  0x00000001200919c8 in cmap_replace (cmap=<optimized out>,
old_node=<optimized out>, new_node=0x0, hash=<optimized out>)
    at lib/cmap.c:846
#7  0x0000000120066b88 in cmap_remove (hash=<optimized out>,
node=0xfb10011a70, cmap=0x12340ac78) at ./lib/cmap.h:265
#8  ukey_delete (ukey=0xfb10011a70, umap=0x12340ac48) at
ofproto/ofproto-dpif-upcall.c:1729
#9  push_ukey_ops (udpif=udpif@entry=0x123434c70, umap=umap@entry=0x12340ac48,
ops=ops@entry=0xfb25ffc890, n_ops=n_ops@entry=1)
    at ofproto/ofproto-dpif-upcall.c:2046
#10 0x0000000120067f18 in revalidator_sweep__
(revalidator=revalidator@entry=0x123438588,
purge=purge@entry=false)
    at ofproto/ofproto-dpif-upcall.c:2257
#11 0x0000000120068294 in revalidator_sweep (revalidator=0x123438588) at
ofproto/ofproto-dpif-upcall.c:2268
#12 udpif_revalidator (arg=0x123438588) at ofproto/ofproto-dpif-upcall.c:863
#13 0x0000000120116d1c in ovsthread_wrapper (aux_=<optimized out>) at
lib/ovs-thread.c:340
#14 0x000000fb54387cec in start_thread () from
/lib/mips64el-linux-gnuabi64/libpthread.so.0
#15 0x000000fb541f984c in ?? () from /lib/mips64el-linux-gnuabi64/libc.so.6
(gdb)


Any known fixes in this area?

On Thu, Nov 19, 2015 at 4:54 PM, Keith Holleman <keith.holle...@gmail.com>
wrote:

>
> While obviously not supported or fully in mainline, we've been using the
> latest conntrack development code and seeing a number of crashes that have
> a very similar failure and the stack trace is below.  Curious to know if
> this issue is fixed in master - it doesn't really seem to be a conntrack
> specific issue but we'd like to get past this and continue to use and test
> the conntrack stuff.
>
> I can reproduce this two ways.  The first is to remove a linux bond
> interface that is in a bridge using the "ip link del bond0" and then remove
> the bridge via "ovs-vsctl del-br <br>".   That is the backtrace that is
> below and I can reproduce this pretty quickly doing this behavior in a loop
> (delete the bond, delete the bridge, wait, create-the bridge, create and
> add the bond and another interface, repeat).     We weren't really trying
> to be that forceful and I figured if the bridge remained up and we just
> removed the bond from the bridge cleanly we may avoid the problem.  But
> simply doing "ovs-vsctl del-port <br> bond0", "ip link del bond0" also
> causes a similar problem but with much lower frequency.  (I will attach
> this backtrace once I get it again if useful)
>
> Before the crashes, we always see a message like this in ovs-vswitchd.log:
>
> 2015-11-12T01:29:03.945Z|01200|netdev_linux|WARN|Dropped 52 log messages
> in last 74 seconds (most recently, 73 seconds ago) due to excessive rate
> 2015-11-12T01:29:03.945Z|01201|netdev_linux|WARN|dock0: removing policing
> failed: Operation not supported
> 2015-11-12T01:29:03.945Z|01202|netdev_linux|WARN|br0: removing policing
> failed: Operation not supported
> 2015-11-12T01:29:03.945Z|01203|netdev_linux|WARN|vi_l3_1: removing
> policing failed: Operation not supported
> 2015-11-12T01:29:03.946Z|01204|netdev_linux|WARN|lan0: removing policing
> failed: Operation not supported
> 2015-11-12T01:29:03.959Z|01205|bridge|WARN|could not open network device
> bond0 (No such device)
> 2015-11-12T01:29:03.960Z|01206|netdev_linux|WARN|dock0: removing policing
> failed: Operation not supported
> 2015-11-12T01:29:03.960Z|01207|netdev_linux|WARN|br0: removing policing
> failed: Operation not supported
> *2015-11-12T01:29:04.435Z|01208|dpif|WARN|system@ovs-system: failed to
> flow_del (No such file or directory)
> ufid:734136a4-5748-4691-a52c-cd5bfad4da68
> recirc_id(0),dp_hash(0),skb_priority(0),in_port(2),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=04:00:00:00:00:02,dst=04:00:00:00:00:fe),eth_type(0x0800),ipv4(src=192.168.27.2,dst=10.131.0.87,proto=6,tos=0,ttl=64,frag=no),tcp(src=51582,dst=443),tcp_flags(psh|ack)*
> 2015-11-12T01:29:04.435Z|01209|util|EMER|lib/cmap.c:846: assertion ok
> failed in cmap_replace()
>
> We are currently using the conntrack branch at this point:
> https://github.com/justinpettit/ovs/commit/86e6bfcb999ed134aa12bf947cea4da1426af2c2
>  .  We know this is about a month old but updated and didn't see an
> improvement and ran into a few other issues.  Here is the back trace from
> method #1:
>
> Reading symbols from
> /tmp/ovs-sym/usr/lib/debug/usr/sbin/ovs-vswitchd...done.
> (gdb) back
> #0  0x000000f37a6959dc in raise () from
> /lib/mips64el-linux-gnuabi64/libc.so.6
> #1  0x000000f37a697470 in abort () from
> /lib/mips64el-linux-gnuabi64/libc.so.6
> #2  0x000000012014cf18 in ovs_abort_valist (err_no=<optimized out>,
> format=<optimized out>, args=<optimized out>) at lib/util.c:323
> #3  0x0000000120156bac in vlog_abort_valist (module_=<optimized out>,
> message=0x1201d3fe0 "%s: assertion %s failed in %s()",
>     args=0xf57c5ad230) at lib/vlog.c:1129
> #4  0x0000000120156bf4 in vlog_abort (module=<optimized out>,
> message=<optimized out>) at lib/vlog.c:1143
> #5  0x000000012014cb7c in ovs_assert_failure (where=<optimized out>,
> function=<optimized out>, condition=<optimized out>)
>     at lib/util.c:72
> #6  0x00000001200919c8 in cmap_replace (cmap=<optimized out>,
> old_node=<optimized out>, new_node=0x0, hash=<optimized out>)
>     at lib/cmap.c:846
> #7  0x0000000120066b88 in cmap_remove (hash=<optimized out>,
> node=0xf2ec00af30, cmap=0x1209031a8) at ./lib/cmap.h:265
> #8  ukey_delete (ukey=0xf2ec00af30, umap=0x120903178) at
> ofproto/ofproto-dpif-upcall.c:1729
> #9  push_ukey_ops (udpif=udpif@entry=0x12092ea40, umap=umap@entry=0x120903178,
> ops=ops@entry=0xf57c5ad2e0, n_ops=n_ops@entry=1)
>     at ofproto/ofproto-dpif-upcall.c:2046
> #10 0x0000000120067f18 in revalidator_sweep__ (revalidator=<optimized
> out>, purge=purge@entry=true)
>     at ofproto/ofproto-dpif-upcall.c:2257
> #11 0x0000000120068128 in revalidator_purge (revalidator=<optimized out>)
> at ofproto/ofproto-dpif-upcall.c:2274
> #12 udpif_stop_threads (udpif=udpif@entry=0x12092ea40) at
> ofproto/ofproto-dpif-upcall.c:461
> #13 0x0000000120068d50 in udpif_stop_threads (udpif=0x12092ea40) at
> ofproto/ofproto-dpif-upcall.c:590
> #14 udpif_synchronize (udpif=0x12092ea40) at
> ofproto/ofproto-dpif-upcall.c:587
> #15 0x0000000120054458 in destruct (ofproto_=0x1209ad410) at
> ofproto/ofproto-dpif.c:1451
> #16 0x00000001200473f0 in ofproto_destroy (p=0x1209ad410) at
> ofproto/ofproto.c:1609
> #17 0x000000012002c92c in bridge_destroy (br=br@entry=0x120962d30) at
> vswitchd/bridge.c:3207
> #18 0x000000012002ce88 in add_del_bridges (cfg=0x12092e460,
> cfg=0x12092e460) at vswitchd/bridge.c:1713
> #19 0x000000012002e404 in bridge_reconfigure 
> (ovs_cfg=ovs_cfg@entry=0x12092e460)
> at vswitchd/bridge.c:597
> #20 0x0000000120032c3c in bridge_run () at vswitchd/bridge.c:2973
> #21 0x0000000120025a28 in main (argc=11, argv=0xf57c5af888) at
> vswitchd/ovs-vswitchd.c:120
>
> Again, if it helps I can get one from repro method #2.
>
> Does anyone else see anything similar?  Or does anyone know of a fix in
> master that may be relevant to this?
>
> Thanks in advance for any time spent or help.
>
>
>
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss

Reply via email to