Strange indeed. Ubuntu built this binary with this GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Let me know if there is any other information I can provide that would be
helpful here.

On Fri, Oct 31, 2014 at 3:42 PM, Ben Pfaff <[email protected]> wrote:

> On Mon, Oct 27, 2014 at 04:21:54PM -0700, Duncan Idaho wrote:
> > We're currently seeing this crash several times a day in our Ubuntu
> > Icehouse OpenStack environment of about 60 nodes.
> >
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > #0  nl_attr_get_size (nla=nla@entry=0x0) at ../lib/netlink.c:506
> > #1  0x0000000000460473 in format_generic_odp_key (a=a@entry=0x0,
> >     ds=ds@entry=0x7fffc803d290) at ../lib/odp-util.c:767
> > #2  0x0000000000460cd2 in format_odp_key_attr (a=a@entry=0x1a63c98,
> >     ma=ma@entry=0x0, ds=ds@entry=0x7fffc803d290,
> >     verbose=verbose@entry=true) at ../lib/odp-util.c:1332
> > #3  0x00000000004609d7 in odp_flow_format (key=key@entry=0x1a63c50,
> >     key_len=key_len@entry=80, mask=mask@entry=0x0,
> >     mask_len=mask_len@entry=0, ds=ds@entry=0x7fffc803d290,
> >     verbose=verbose@entry=true) at ../lib/odp-util.c:1402
> > #4  0x00000000004450f3 in log_flow_message (error=error@entry=2,
> >     operation=operation@entry=0x4d0e73 "flow_del", key=0x1a63c50,
> >     key_len=80, mask=mask@entry=0x0, mask_len=mask_len@entry=0,
> >     stats=0x0, actions=actions@entry=0x0,
> >     actions_len=actions_len@entry=0, dpif=<optimized out>)
> >     at ../lib/dpif.c:1354
> > #5  0x00000000004453c9 in log_flow_del_message
> >     (dpif=dpif@entry=0x1a06c70, del=del@entry=0x7fffc803d340,
> >     error=error@entry=2) at ../lib/dpif.c:1397
> > #6  0x0000000000445433 in log_flow_del_message (error=2,
> >     del=0x7fffc803d340, dpif=0x1a06c70) at ../lib/dpif.c:1396
> > #7  dpif_flow_del__ (dpif=0x1a06c70, del=del@entry=0x7fffc803d340)
> >     at ../lib/dpif.c:945
> > #8  0x00000000004455ca in dpif_flow_del (dpif=<optimized out>,
> >     key=<optimized out>, key_len=<optimized out>,
> >     stats=stats@entry=0x7fffc803d370) at ../lib/dpif.c:965
> > #9  0x000000000041b423 in subfacet_uninstall (subfacet=0x1be9a80)
> >     at ../ofproto/ofproto-dpif.c:4686
> > #10 0x0000000000420f18 in facet_remove (facet=0x1be9680)
> >     at ../ofproto/ofproto-dpif.c:4014
> > #11 0x0000000000422f52 in facet_revalidate (facet=facet@entry=0x1be9680)
> >     at ../ofproto/ofproto-dpif.c:4321
> > #12 0x0000000000423b96 in type_run (type=<optimized out>)
> >     at ../ofproto/ofproto-dpif.c:836
> > #13 0x000000000041224f in ofproto_type_run
> >     (datapath_type=<optimized out>,
> >     datapath_type@entry=0x1ab88a0 "system") at ../ofproto/ofproto.c:1309
> > #14 0x000000000040d755 in bridge_run () at ../vswitchd/bridge.c:2384
> > #15 0x00000000004059bb in main (argc=<optimized out>,
> >     argv=<optimized out>) at ../vswitchd/ovs-vswitchd.c:118
>
> This backtrace doesn't quite add up.
>
> We can see from frames 4 and 3 that we've got a nonnull 'key', which
> becomes a nonnull nlattr 'a' in frame 2.  Along the same chain, we
> have a null 'mask' that becomes a null 'ma'.  I often don't trust GDB
> to give me correct arguments in backtraces but all of that adds up
> nicely so I tend to believe it.
>
> Take a look at the code for format_odp_key_attr().  It always
> dereferences 'a' to get its type 'attr':
>
>     enum ovs_key_attr attr = nl_attr_type(a);
>
> A few lines later we can see 'is_exact' getting set to true (since
> 'ma' is NULL):
>
>     bool is_exact;
>
>     is_exact = ma ? odp_mask_attr_is_exact(ma) : true;
>
> We're evidently hitting the default case in the switch statement given
> the line number cited in the backtrace, which runs this code:
>
>     case OVS_KEY_ATTR_UNSPEC:
>     case __OVS_KEY_ATTR_MAX:
>     default:
>         format_generic_odp_key(a, ds);
>         if (!is_exact) {
>             ds_put_char(ds, '/');
>             format_generic_odp_key(ma, ds);    <---- line 1332
>         }
>         break;
>
> but that doesn't make sense--we should never get there, because
> is_exact is true.  So--WTF?
>
> > This is probably related to the following "fixed" Ubuntu bug:
> > https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1352570
> >
> > The fix referenced was:
> > https://github.com/openvswitch/ovs/commit/dd2e44f835fac8c2df99f84c54250c3ca981f2f5
> >
> > Not sure if it's relevant but part of this patch was reverted prior to
> > the 2.0.2 release:
> > https://github.com/openvswitch/ovs/commit/e8ac8c3940535fb439eba980afa6c61bdd428003
>
> commit dd2e44f835 is about a race between two threads when a bridge is
> being deleted.  I don't see any evidence that there's a bridge being
> deleted here.
>
> > Any help will be appreciated! Let me know if I can provide any more
> > relevant information.
>
> What GCC version was used for this build?  I've seen an unusual number
> of code generation bugs with GCC 4.9.x.
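
For anyone who wants to check the control-flow argument in the quoted analysis in isolation, here is a minimal standalone sketch. It is not the OVS code: fake_attr, fake_format_generic(), fake_mask_is_exact() and fake_format_key_attr() are hypothetical stand-ins, and the mask helper is assumed to always return false so that the masked branch is exercised whenever a mask is present. It only models the default case of the switch, counting calls instead of formatting anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nlattr; NOT the OVS definition. */
struct fake_attr {
    int type;
};

/* Call counters.  A call with a NULL attribute models the crash that
 * frame #1 of the backtrace reports (a=0x0). */
static int total_calls;
static int null_calls;

/* Stand-in for format_generic_odp_key(): records what it was given. */
static void fake_format_generic(const struct fake_attr *a)
{
    total_calls++;
    if (a == NULL) {
        null_calls++;
    }
}

/* Stand-in for odp_mask_attr_is_exact().  Assumption: every non-NULL
 * mask is treated as inexact, so the masked branch always runs. */
static bool fake_mask_is_exact(const struct fake_attr *ma)
{
    (void) ma;
    return false;
}

/* Models only the default case quoted from format_odp_key_attr(). */
static void fake_format_key_attr(const struct fake_attr *a,
                                 const struct fake_attr *ma)
{
    bool is_exact;

    assert(a != NULL);  /* the real function dereferences 'a' first */
    is_exact = ma ? fake_mask_is_exact(ma) : true;

    fake_format_generic(a);
    if (!is_exact) {
        /* Only reachable when 'ma' is non-NULL. */
        fake_format_generic(ma);
    }
}
```

Calling fake_format_key_attr(&key, NULL) makes exactly one call to fake_format_generic(), with a non-NULL argument. Under this model the source as written cannot produce the NULL argument seen in frame #1, which is consistent with suspecting the compiler or the reported frame arguments rather than the logic itself.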
_______________________________________________
discuss mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/discuss
