There are several fixes for upcall handling on branch-2.7 after 2.7.0.

commit 634e1d0f3a5b9b40b665fca1bcf6e63a07bda2d2
Author: Joe Stringer <j...@ovn.org>
Date:   Mon Jul 31 16:54:22 2017 -0700

    ofproto-dpif-upcall: Fix key attr iteration.

    This call is operating on messages generated by the datapath. If a
    datapath implementation sends improperly formatted netlink attributes,
    then it's possible for a revalidator thread to end up trapped in an
    infinite loop iterating across these attributes. Rather than using the
    UNSAFE variation of this iterator, use the regular version.

    From master commit f2d3fef3d90253dda3e03822df2e921ec853192d.

    Fixes: 994fcc5a15d3 ("upcall: Check for recirc_id in ukey_create_from_dpif_flow()")
    Signed-off-by: Joe Stringer <j...@ovn.org>
    Reviewed-by: Greg Rose <gvrose8...@gmail.com>
    Acked-by: Ben Pfaff <b...@ovn.org>

commit 8e4e55f17e07c45143338c543239c556c2214df1
Author: Joe Stringer <j...@ovn.org>
Date:   Mon Jul 31 16:54:21 2017 -0700

    ofproto-dpif-upcall: Fix action attr iteration.

    This call is operating on messages generated by the datapath. If a
    datapath implementation sends improperly formatted netlink attributes,
    then it's possible for a revalidator thread to end up trapped in an
    infinite loop iterating across the actions attributes. Rather than
    using the UNSAFE variation of this iterator, use the regular version.

    From master commit 55f854b9d51edcbccf4ae1655855dddd1d9ec1fe.

    Fixes: e672ff9b4d22 ("ofproto-dpif: Restore metadata and registers on recirculation.")
    Signed-off-by: Joe Stringer <j...@ovn.org>
    Acked-by: Ben Pfaff <b...@ovn.org>

commit 58e06d7ffeb9e3625a8156632c1139f4cd93d66e
Author: Joe Stringer <j...@ovn.org>
Date:   Mon May 1 12:58:07 2017 -0700

    revalidator: Fix logging of xlate_key() failure.

    This was being logged using xlate_strerror(), but the return code is
    actually an errno code. Use ovs_strerror() instead.

    Fixes: dd0dc9eda0e0 ("revalidator: Reuse xlate_ukey from deletion.")
    Signed-off-by: Joe Stringer <j...@ovn.org>
    Acked-by: Jarno Rajahalme <ja...@ovn.org>

commit 928d21efb8e6b25e06131dd9b2a58ca7ffe7adb8
Author: Joe Stringer <j...@ovn.org>
Date:   Mon May 1 12:58:06 2017 -0700

    revalidator: Revalidate ukeys created from flows.

    If there is no active ukey for a particular datapath flow, and it is
    dumped from the datapath, then the revalidator threads will assemble a
    ukey based on the datapath flow. This will allow tracking of the stats
    for proper attribution, and future validation of the flow.

    However, until now when creating the ukey in this context, the ukey's
    'reval_seq' has been set to the current udpif's reval_seq. This implies
    that the flow has been validated against the current flow table.
    However, this is not true: the flow appeared in the datapath without
    any prior knowledge in this OVS instance, so we should set up the
    reval_seq of the ukey to ensure that the flow will be validated during
    the current dump/revalidation cycle. Refer also to revalidate_ukey().

    Fixes: 23597df05226 ("upcall: Create ukeys in handler threads.")
    Signed-off-by: Joe Stringer <j...@ovn.org>
    Acked-by: Jarno Rajahalme <ja...@ovn.org>

commit f0320ddadeec012487807cb2c71dd8da4dc9dad9
Author: Joe Stringer <j...@ovn.org>
Date:   Wed Apr 26 18:03:12 2017 -0700

    revalidator: Improve logging for transition_ukey().

    There are a few cases where more introspection into ukey transitions
    would be relevant for logging or assertion. Track the SOURCE_LOCATOR
    and thread id when states are transitioned and use these for logging.

    Suggested-by: Jarno Rajahalme <ja...@ovn.org>
    Signed-off-by: Joe Stringer <j...@ovn.org>
    Acked-by: Ben Pfaff <b...@ovn.org>

commit 1c13aa391bb1404e763919b22f64ab2b3a350ada
Author: Joe Stringer <j...@ovn.org>
Date:   Wed Apr 26 18:03:11 2017 -0700

    revalidator: Avoid assert in transition_ukey().

    There is a case where a flow is dumped from the kernel after the ukey
    is already transitioned into an EVICTING/EVICTED/DELETED state, and the
    revalidator thread attempts to shift that into UKEY_OPERATIONAL because
    it was able to dump the flow from the datapath. This resulted in
    triggering the assert in transition_ukey(). Detect this condition and
    skip handling the flow (as it's already on its way out).

    Users report:

    > Program terminated with signal SIGABRT, Aborted.
    > raise () from /lib/x86_64-linux-gnu/libc.so.6
    > raise () from /lib/x86_64-linux-gnu/libc.so.6
    > abort () from /lib/x86_64-linux-gnu/libc.so.6
    > ovs_abort_valist
    > vlog_abort_valist
    > vlog_abort
    > ovs_assert_failure
    > transition_ukey (ukey=<optimized out>, dst=<optimized out>)
    >     at ofproto/ofproto-dpif-upcall.c:1674
    > revalidate (revalidator=0x1cb36c8) at ofproto/ofproto-dpif-upcall.c:2324
    > udpif_revalidator (arg=0x1cb36c8) at ofproto/ofproto-dpif-upcall.c:901
    > ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:348
    > start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
    > clone () from /lib/x86_64-linux-gnu/libc.so.6

    VMware-BZ: #1857694
    Signed-off-by: Joe Stringer <j...@ovn.org>
    Acked-by: Ben Pfaff <b...@ovn.org>

commit bb494eca7e21d976c617c7983312ae230f847811
Author: Joe Stringer <j...@ovn.org>
Date:   Mon Mar 20 14:08:19 2017 -0700

    ofproto-dpif-upcall: Fix flow setup/delete race.

    If a handler thread takes a long time to set up a set of flows, it is
    possible for one of the installed flows to be dumped and scheduled for
    deletion by a revalidator thread before the handler is able to
    transition the ukey into an operational state. This can happen between
    the dpif_operate() above this function and the ukey lock / transition
    logic modified by this patch.

    Only transition the ukey for the flow if it wasn't already transitioned
    to a later state by a revalidator thread.

    Fixes: 54ebeff4c03d ("upcall: Track ukey states.")
    Reported-by: Paul Blakey <pa...@mellanox.com>
    Signed-off-by: Joe Stringer <j...@ovn.org>
    Tested-by: Paul Blakey <pa...@mellanox.com>

On Mon, Nov 26, 2018 at 06:29:31AM +0000, Zhengjingzhou wrote:
> Any ideas? It seems there were several similar issues in earlier
> versions, and several patches fixed them.
>
> From: Zhengjingzhou
> Sent: November 21, 2018 19:02
> To: 'ovs-discuss@openvswitch.org' <ovs-discuss@openvswitch.org>
> Subject: A coredump caused by the race condition between handler, upcall
> and revalidation threads
>
> We've found a coredump in a daily test case (OVS 2.7.0 + DPDK).
> After a deep analysis we found that it dumped core after an OVS service
> restart (with bridges and ports), with the revalidator and upcall threads
> racing on the ukey state. However, the coredump seems hard to reproduce.
> We came up with one possible sequence:
>
> 1. handlerA handles packet1, prepares to put a flow, and waits (on a lock).
> 2. handlerB handles another packet, packet2 (same MAC as packet1), finds
>    the device has been deleted, and generates a flow.
> 3. handlerA hits an error (EEXIST, in function flow_put_on_pmd), prepares
>    to transition the ukey state to evicted (in function handle_upcalls),
>    and waits to acquire the lock:
>
>     for (i = 0; i < n_ops; i++) {
>         struct udpif_key *ukey = ops[i].ukey;
>
>         if (ukey) {
>             ovs_mutex_lock(&ukey->mutex);
>             if (ops[i].dop.error) {
>                 /* Doubt: maybe a state condition should be added here,
>                  * as below? */
>                 transition_ukey(ukey, UKEY_EVICTED);
>             } else if (ukey->state < UKEY_OPERATIONAL) {
>                 transition_ukey(ukey, UKEY_OPERATIONAL);
>             }
>
> 4. Revalidator thread C transitions the ukey state to deleted (by
>    expiration or other handling) and releases the ukey lock.
> 5. handlerA acquires the ukey lock (step 3), finds the previous state is
>    deleted, and aborts.
>
> The stack is:
>
> #0  0x00007fba3737f417 in raise () from /usr/lib64/libc.so.6
> #1  0x00007fba37380b08 in abort () from /usr/lib64/libc.so.6
> #2  0x00000000005746be in ovs_abort_valist (err_no=err_no@entry=0,
>     format=format@entry=0x6b0028 "Invalid ukey transition %d->%d (last transitioned from thread %u at %s)",
>     args=args@entry=0x7fb995628ea0) at lib/util.c:341
> #3  0x000000000057b8b0 in vlog_abort_valist (function=<optimized out>,
>     line=<optimized out>, module_=<optimized out>,
>     message=0x6b0028 "Invalid ukey transition %d->%d (last transitioned from thread %u at %s)",
>     args=args@entry=0x7fb995628ea0) at lib/vlog.c:1229
> #4  0x000000000057b93a in vlog_abort (function=function@entry=0x6b0d90 <__func__.28159> "transition_ukey_at",
>     line=line@entry=1741, module=module@entry=0x9ac220 <this_module>,
>     message=message@entry=0x6b0028 "Invalid ukey transition %d->%d (last transitioned from thread %u at %s)")
>     at lib/vlog.c:1243
> #5  0x000000000049a85d in transition_ukey_at (ukey=ukey@entry=0x7fb97c005120,
>     dst=dst@entry=UKEY_EVICTED,
>     where=where@entry=0x6b0648 "ofproto/ofproto_dpif_upcall.c:1467")
>     at ofproto/ofproto_dpif_upcall.c:1739
> #6  0x000000000049da45 in handle_upcalls (n_upcalls=64,
>     upcalls=0x7fb99564dc70, udpif=<optimized out>)
>     at ofproto/ofproto_dpif_upcall.c:1467
> #7  recv_upcalls (handler=0x3339f90, handler=0x3339f90)
>     at ofproto/ofproto_dpif_upcall.c:887
> #8  0x000000000049dc7a in udpif_upcall_handler (arg=0x3339f90)
>     at ofproto/ofproto_dpif_upcall.c:783
> #9  0x0000000000546e48 in ovsthread_wrapper (aux_=<optimized out>)
>     at lib/ovs_thread.c:682
> #10 0x00007fba38f97e45 in start_thread () from /usr/lib64/libpthread.so.0
> #11 0x00007fba37442afd in clone () from /usr/lib64/libc.so.6
>
> and the relevant log:
>
> 2018-11-10T16:03:10.796061+08:00|warning|ovs-vswitchd[18219]|xlate_report_error[647]|00001|ofproto_dpif_xlate(handler16)|: received packet on unknown port 1 while processing udp,in_port=1,vlan_tci=0x0000,dl_src=38:4c:4f:cb:62:5f,dl_dst=38:4c:4f:cb:62:53,nw_src=199.168.1.106,nw_dst=199.168.1.32,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=53744,tp_dst=4789 on bridge br-1
> 2018-11-10T16:03:10.796497+08:00|warning|ovs-vswitchd[18219]|xlate_report_error[647]|00002|ofproto_dpif_xlate(handler16)|: received packet on unknown port 1 while processing udp,in_port=1,vlan_tci=0x0000,dl_src=38:4c:4f:cb:62:5f,dl_dst=38:4c:4f:cb:62:53,nw_src=199.168.1.106,nw_dst=199.168.1.32,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=53744,tp_dst=4789 on bridge br-1
> 2018-11-10T16:03:10.796871+08:00|warning|ovs-vswitchd[18219]|xlate_report_error[647]|00001|ofproto_dpif_xlate(handler19)|: received packet on unknown port 1 while processing udp,in_port=1,vlan_tci=0x0000,dl_src=38:4c:4f:cb:62:5f,dl_dst=38:4c:4f:cb:62:53,nw_src=199.168.1.106,nw_dst=199.168.1.32,nw_tos=0,nw_ecn=0,nw_ttl=64,tp_src=41088,tp_dst=4789 on bridge br-1
> 2018-11-10T16:03:10.797261+08:00|info|ovs-vswitchd[18219]|recv_upcalls[848]|00002|ofproto_dpif_upcall(handler19)|: received packet on unassociated datapath port 5
> 2018-11-10T16:03:10.797616+08:00|info|ovs-vswitchd[18219]|recv_upcalls[848]|00003|ofproto_dpif_upcall(handler19)|: received packet on unassociated datapath port 5
> 2018-11-10T16:03:10.797833+08:00|info|ovs-vswitchd[18219]|recv_upcalls[848]|00004|ofproto_dpif_upcall(handler19)|: received packet on unassociated datapath port 5
> 2018-11-10T16:03:10.798202+08:00|info|ovs-vswitchd[18219]|recv_upcalls[848]|00005|ofproto_dpif_upcall(handler19)|: received packet on unassociated datapath port 5
> 2018-11-10T16:03:10.798434+08:00|info|ovs-vswitchd[18219]|recv_upcalls[848]|00006|ofproto_dpif_upcall(handler19)|: received packet on unassociated datapath port 5
> 2018-11-10T16:03:10.801329+08:00|info|ovs-vswitchd[18219]|revalidate[2473]|00001|ofproto_dpif_upcall(revalidator20)|: Unexpected ukey transition from state 4 (last transitioned from thread 19 at ofproto/ofproto_dpif_upcall.c:1467)
> 2018-11-10T16:03:10.813474+08:00|alert|ovs-vswitchd[18219]|transition_ukey_at[1741]|00003|ofproto_dpif_upcall(handler16)|: Invalid ukey transition 5->4 (last transitioned from thread 21 at ofproto/ofproto_dpif_upcall.c:1870)
>
> _______________________________________________
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
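
For reference, the state condition asked about in step 3 of the quoted report could look roughly like the sketch below. It is a simplified standalone model and not the OVS source: `struct ukey`, `try_transition()` and `handler_finish_op()` are hypothetical names, the first two enum values are placeholders, the numbering only mirrors the "5->4" (deleted to evicted) seen in the log, and treating every forward move as valid is an assumption made for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the ukey lifecycle. Only the ordering matters here;
 * UKEY_EVICTED == 4 and UKEY_DELETED == 5 match the "5->4" in the log.
 * The first two states are placeholders assumed for this sketch. */
enum ukey_state {
    UKEY_CREATED = 0,
    UKEY_VISIBLE,
    UKEY_OPERATIONAL,
    UKEY_EVICTING,
    UKEY_EVICTED,
    UKEY_DELETED
};

/* Hypothetical stand-in for the real ukey; the mutex is omitted because
 * the caller is assumed to already hold it. */
struct ukey {
    enum ukey_state state;
};

/* Moves 'ukey' to 'dst' only if that is a forward move. Returns true if
 * the state changed, false if another thread already took the ukey to
 * 'dst' or beyond (in which case nothing is modified). */
bool
try_transition(struct ukey *ukey, enum ukey_state dst)
{
    assert(ukey != NULL);
    if (ukey->state >= dst) {
        return false;
    }
    ukey->state = dst;
    return true;
}

/* Models what a handler does for one op after the flow put returns:
 * evict on error, otherwise mark the ukey operational. Both branches go
 * through the same guard. */
bool
handler_finish_op(struct ukey *ukey, int error)
{
    return try_transition(ukey, error ? UKEY_EVICTED : UKEY_OPERATIONAL);
}
```

With such a guard, a handler that reaches the error branch after a revalidator has already moved the ukey to UKEY_DELETED skips the transition instead of requesting 5->4. Whether silently skipping is the right behaviour for OVS is a separate question that this sketch does not answer.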