Hi Ilya,

Sorry for late reply and thanks for the review comments.
Yes, the issue happens only in the userspace datapath and not in kernel 
datapath. I have tested it on kernel datapath and could not see this issue for 
the same build.

Regarding the suggested solution, I still feel like going with recursive mutex 
approach is more cleaner then releasing the lock for 
ofproto_packet_out_finish().  We have the ofproto_dpif_xcache_execute() which 
may lead to race when a flow mod learn is called from another thread. May be 
it could also be possible that in certain combination we may result the crash 
in ofproto_packet_out_start().  May be Ben can share his inputs as well.

Best Regards,Anil 
Kumar

-----Original Message-----
From: Ilya Maximets <i.maxim...@samsung.com>
Sent: Thursday, 29 August, 2019 05:10 PM
To: Anil Kumar Koli <anilkuma...@altencalsoftlabs.com>; 
ovs-dev@openvswitch.org
Cc: 'Ben Pfaff' <b...@ovn.org>
Subject: Re: [ovs-dev] [PATCH v1] ofproto: Fix OVS crash when packets hit 
Openflow rules with certain combinations of nested actions

On 29.08.2019 12:52, Anil Kumar Koli wrote:
> Hi Ilya,
>
> Thanks for the review.
> Please find below the stack trace of the crash
>
>  (gdb) bt
> #0  0x00007f0a3da46c37 in __GI_raise (sig=sig@entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x00007f0a3da4a028 in __GI_abort () at abort.c:89
> #2  0x000000000094262e in ovs_abort_valist (err_no=<optimized out>,
> format=<optimized out>, args=args@entry=0x7fffaf59e308) at
> lib/util.c:419
> #3  0x00000000009426b7 in ovs_abort (err_no=<optimized out>,
> format=format@entry=0xb0289f "%s: pthread_%s_%s failed") at
> lib/util.c:411
> #4  0x00000000009104f9 in ovs_mutex_lock_at (l_=l_@entry=0xe47cc0
> <ofproto_mutex>, where=where@entry=0xad4199 "ofproto/ofproto.c:5391")
> at lib/ovs-thread.c:75
> #5  0x000000000083fb1f in ofproto_flow_mod_learn
> (ofm=ofm@entry=0x7fffaf59e620, keep_ref=<optimized out>,
> limit=<optimized out>, below_limitp=below_limitp@entry=0x7fffaf59e540)
> at ofproto/ofproto.c:5391
> #6  0x000000000085e5d0 in xlate_learn_action
> (ctx=ctx@entry=0x7fffaf5a02e0, learn=learn@entry=0x2b18308) at
> ofproto/ofproto-dpif-xlate.c:5378
> #7  0x00000000008615c3 in do_xlate_actions (ofpacts=<optimized out>,
> ofpacts_len=<optimized out>, ctx=<optimized out>,
> is_last_action=<optimized out>, group_bucket_action=<optimized out>)
> at ofproto/ofproto-dpif-xlate.c:6893
> #8  0x000000000085edba in xlate_recursively (actions_xlator=0x860730
> <do_xlate_actions>, is_last_action=false, deepens=<optimized out>,
> rule=0x2b8e440, ctx=0x7fffaf5a02e0) at
> ofproto/ofproto-dpif-xlate.c:4233
> #9  xlate_table_action (ctx=0x7fffaf5a02e0, in_port=<optimized out>, 
> table_id=<optimized out>, may_packet_in=<optimized out>, 
> honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, 
> is_last_action=false,
>     xlator=0x860730 <do_xlate_actions>) at
> ofproto/ofproto-dpif-xlate.c:4361
> #10 0x0000000000861aaa in xlate_ofpact_resubmit (resubmit=<optimized
> out>, resubmit=<optimized out>, resubmit=<optimized out>,
> is_last_action=<optimized out>, ctx=0x7fffaf5a02e0) at
> ofproto/ofproto-dpif-xlate.c:4672
> #11 do_xlate_actions (ofpacts=ofpacts@entry=0x2b654e8, 
> ofpacts_len=ofpacts_len@entry=48, ctx=ctx@entry=0x7fffaf5a02e0, 
> is_last_action=is_last_action@entry=true, 
> group_bucket_action=group_bucket_action@entry=false)
>     at ofproto/ofproto-dpif-xlate.c:6773
> #12 0x00000000008696d6 in xlate_actions (xin=xin@entry=0x7fffaf5a0da0,
> xout=xout@entry=0x7fffaf5a11b0) at ofproto/ofproto-dpif-xlate.c:7570
> #13 0x0000000000859b56 in upcall_xlate (wc=0x7fffaf5a23f0,
> odp_actions=0x7fffaf5a1bc0, upcall=0x7fffaf5a1150, udpif=0x2b10670) at
> ofproto/ofproto-dpif-upcall.c:1197
> #14 process_upcall (udpif=udpif@entry=0x2b10670,
> upcall=upcall@entry=0x7fffaf5a1150,
> odp_actions=odp_actions@entry=0x7fffaf5a1bc0,
> wc=wc@entry=0x7fffaf5a23f0) at ofproto/ofproto-dpif-upcall.c:1413
> #15 0x000000000085a9eb in upcall_cb (packet=<optimized out>, 
> flow=0x7fffaf5a2150, ufid=<optimized out>, pmd_id=<optimized out>, 
> type=<optimized out>, userdata=<optimized out>, actions=0x7fffaf5a1bc0, 
> wc=0x7fffaf5a23f0,
>     put_actions=0x7fffaf5a1c00, aux=0x2b10670) at
> ofproto/ofproto-dpif-upcall.c:1315
> #16 0x000000000088419c in dp_netdev_upcall (pmd=pmd@entry=0x7f0a35f34010, 
> packet_=packet_@entry=0x2b68930, flow=flow@entry=0x7fffaf5a2150, 
> wc=wc@entry=0x7fffaf5a23f0, ufid=ufid@entry=0x7fffaf5a1ba0, 
> type=type@entry=DPIF_UC_MISS,
>     userdata=userdata@entry=0x0, actions=actions@entry=0x7fffaf5a1bc0,
> put_actions=put_actions@entry=0x7fffaf5a1c00) at
> lib/dpif-netdev.c:6236
> #17 0x000000000088d053 in handle_packet_upcall
> (put_actions=0x7fffaf5a1c00, actions=0x7fffaf5a1bc0,
> key=0x7fffaf5a3080, packet=0x2b68930, pmd=0x7f0a35f34010) at
> lib/dpif-netdev.c:6591
> #18 fast_path_processing (pmd=pmd@entry=0x7f0a35f34010, 
> packets_=packets_@entry=0x7fffaf5a35f0, keys=keys@entry=0x7fffaf5a3040, 
> flow_map=flow_map@entry=0x7fffaf5a2ef0, 
> index_map=index_map@entry=0x7fffaf5a2ee0 "",
>     in_port=<optimized out>) at lib/dpif-netdev.c:6709
> #19 0x000000000088e9ce in dp_netdev_input__
> (pmd=pmd@entry=0x7f0a35f34010, packets=packets@entry=0x7fffaf5a35f0,
> md_is_valid=md_is_valid@entry=true, port_no=port_no@entry=0) at
> lib/dpif-netdev.c:6797
> #20 0x0000000000890710 in dp_netdev_recirculate
> (packets=0x7fffaf5a35f0, pmd=0x7f0a35f34010) at lib/dpif-netdev.c:6842
> #21 dp_execute_cb (aux_=aux_@entry=0x7fffaf5a3ea0,
> packets_=0x7fffaf5a35f0, packets_@entry=0x7fffaf5a3eb0,
> a=a@entry=0x2b1c7d4, should_steal=<optimized out>) at
> lib/dpif-netdev.c:7158
> #22 0x00000000008c3048 in odp_execute_actions (dp=dp@entry=0x7fffaf5a3ea0, 
> batch=batch@entry=0x7fffaf5a3eb0, steal=steal@entry=false, 
> actions=<optimized out>, actions_len=<optimized out>,
>     dp_execute_action=dp_execute_action@entry=0x8904f0
> <dp_execute_cb>) at lib/odp-execute.c:794
> #23 0x0000000000887bbd in dp_netdev_execute_actions
> (actions_len=<optimized out>, actions=<optimized out>, flow=<optimized
> out>, should_steal=false, packets=0x7fffaf5a3eb0, pmd=0x7f0a35f34010)
> at lib/dpif-netdev.c:7332
> #24 dpif_netdev_execute (dpif=dpif@entry=0x2b105d0,
> execute=execute@entry=0x7fffaf5a40f8) at lib/dpif-netdev.c:3725
> #25 0x000000000088dccc in dpif_netdev_operate (dpif=0x2b105d0,
> ops=0x7fffaf5a40e8, n_ops=1, offload_type=<optimized out>) at
> lib/dpif-netdev.c:3756
> #26 0x0000000000895b07 in dpif_operate (dpif=0x2b105d0,
> ops=ops@entry=0x7fffaf5a40e8, n_ops=n_ops@entry=1,
> offload_type=offload_type@entry=DPIF_OFFLOAD_AUTO) at lib/dpif.c:1367
> #27 0x0000000000896233 in dpif_execute (dpif=<optimized out>,
> execute=execute@entry=0x7fffaf5a4180) at lib/dpif.c:1321
> #28 0x00000000008459eb in packet_execute (ofproto_=0x2b3ec60,
> opo=0x7fffaf5a4200) at ofproto/ofproto-dpif.c:4760
> #29 0x000000000083a9bb in ofproto_packet_out_finish
> (opo=0x7fffaf5a4200, ofproto=0x2b3ec60) at ofproto/ofproto.c:3594
> #30 handle_packet_out (ofconn=ofconn@entry=0x2b23d50,
> oh=oh@entry=0x2b89470) at ofproto/ofproto.c:3635
> #31 0x000000000083f06f in handle_single_part_openflow
> (type=OFPTYPE_PACKET_OUT, oh=0x2b89470, ofconn=0x2b23d50) at
> ofproto/ofproto.c:8400
> #32 handle_openflow (ofconn=ofconn@entry=0x2b23d50,
> msgs=msgs@entry=0x7fffaf5a5600) at ofproto/ofproto.c:8587
> #33 0x000000000086ff6b in ofconn_run (handle_openflow=0x83e1f0
> <handle_openflow>, ofconn=0x2b23d50) at ofproto/connmgr.c:1318
> #34 connmgr_run (mgr=0x2a26820,
> handle_openflow=handle_openflow@entry=0x83e1f0 <handle_openflow>) at
> ofproto/connmgr.c:355
> #35 0x0000000000838706 in ofproto_run (p=0x2b3ec60) at
> ofproto/ofproto.c:1826
> #36 0x0000000000826234 in bridge_run__ () at vswitchd/bridge.c:2965
> #37 0x000000000082bae8 in bridge_run () at vswitchd/bridge.c:3023
> #38 0x0000000000422a05 in main (argc=11, argv=0x7fffaf5a5a28) at
> vswitchd/ovs-vswitchd.c:127
> (gdb) f 5
> #5  0x000000000083fb1f in ofproto_flow_mod_learn 
> (ofm=ofm@entry=0x7fffaf59e620, keep_ref=<optimized out>, limit=<optimized 
> out>, below_limitp=below_limitp@entry=0x7fffaf59e540) at 
> ofproto/ofproto.c:5391
> 5391    ofproto/ofproto.c: No such file or directory.
> (gdb) p ofproto_mutex
> $2 = {lock = {__data = {__lock = 1, __count = 0, __owner = 67357, __nusers = 
> 1, __kind = 2, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 
> 0x0}},
>     __size =
> "\001\000\000\000\000\000\000\000\035\a\001\000\001\000\000\000\002",
> '\000' <repeats 22 times>, __align = 1}, where = 0xad3dd6
> "ofproto/ofproto.c:3631"}
>
> Please find the discussion link regarding this crash with community.
> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-April/048494.h
> tml

Thanks for the trace and link. It's very useful.
You may, probably, add above link to the commit message via 'Reported-at' tag.

Regarding the issue.
Is it correct that this issue only happens for userspace datapath?
In case of userspace datapath execution of the packet actions happens in the 
same thread that handles the ofproto packet-out request. This is seen in above 
stack trace.  But in case of kernel datapath,
packet_execute() sends packet to the kernel via netlink and further processing 
will happen in a separate thread that handles upcall (if any).

If above statements are correct, there could be another way to fix the issue.

Ben, you said previously:

> How do we fix this?
>
> * We can't get rid of ofproto_mutex in do_bundle_commit(), or drop it
>   temporarily around flow translation (i.e. the call to
>   ofproto_packet_out_start()), because it might need to revert all the
>   flow table changes and dropping the mutex would allow other threads to
>   race in and make conflicting changes

But it seems that the issue happens on ofproto_packet_out_finish() and not on 
ofproto_packet_out_start(). So, the flow translation should be OK under the 
ofproto_mutex and revert is still possible under the mutex.
The only thing we need to take out of the mutex is real action execution by 
the datapath triggered by ofproto_packet_out_finish(). Callers never check the 
status of this function so it should be not so hard.

So, possible solution:
* Move ofproto_packet_out_finish() out of ofproto_mutex in all the callers:
  * It's easy for handle_packet_out()
  * For do_bundle_commit() we could temporary store all the ofproto_packet_out
    entities and finish them later.

Am I missing something?  Is there reason for ofproto_packet_out_finish() to 
require ofproto_mutex?

Ben, Anil, what do you think?

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to