Hi, I've sent a RFC patch trying to resolve this use-after-free issue. just to collect the comments.
Here is some thought on this issues: first, we cannot wait until ofproto ref reduces to 1 to do the real destruct, as this leads to the "egg and chicken" situation: if we do not destroy ofproto we cannot decrease ref, if not decrease ref, ofproto not being destroyed. Second, we cannot put close_dpif_backer into the deferred part of ofproto desruction, as it will call udpif_destroy which stop revalidator threads that might hold some ref to ofproto in its xlate cache. So I am guessing maybe we can introduce a rcu barrier just like kernels and wait for all the meter were destroyed and perform the close-dpif-backer. NOTE: we cannot do this for rule destruction, because it's difficult to find out the precision timing that all the rules have been destroyed, as rule destruction can endure for 2 grace periods. Peng He <xnhp0...@gmail.com> 于2022年3月10日周四 08:57写道: > or what we really need here is not ofproto but backer refcount inc. > I see backer itself has refcount. > > Ilya Maximets <i.maxim...@ovn.org> 于2022年3月10日周四 01:44写道: > >> On 3/9/22 18:23, David Marchand wrote: >> > On Tue, Mar 8, 2022 at 8:44 PM Ilya Maximets <i.maxim...@ovn.org> >> wrote: >> >> >> >> On 3/8/22 17:39, David Marchand wrote: >> >>> On Mon, Mar 7, 2022 at 11:28 PM Ilya Maximets <i.maxim...@ovn.org> >> wrote: >> >>>> >> >>>> 'ofproto' should still be in place while freeing meter id, but this >> is >> >>>> not ensured in any way leading to the use-after-free: >> >>>> >> >>>> ==13702==ERROR: AddressSanitizer: heap-use-after-free >> >>>> READ of size 8 at 0x614000000c50 thread T3 (urcu2) >> >>>> 0 0x536656 in free_meter_id ofproto/ofproto-dpif.c:6780:37 >> >>>> 1 0x7356d0 in ovsrcu_call_postponed lib/ovs-rcu.c:346:13 >> >>>> 2 0x735b21 in ovsrcu_postpone_thread lib/ovs-rcu.c:362:14 >> >>>> 3 0x73a30c in ovsthread_wrapper lib/ovs-thread.c:422:12 >> >>>> 4 0x7f7b1a7876da in start_thread >> >>>> 5 0x7f7b19d0671e in clone >> >>>> >> >>>> 0x614000000c50 is located 16 bytes inside of 400-byte region >> >>>> freed by thread T0 here: >> >>>> 0 0x49640d in free (vswitchd/ovs-vswitchd+0x49640d) >> >>>> 1 0x517e38 in destruct ofproto/ofproto-dpif.c:1851:5 >> >>>> 2 0x4f0bb0 in ofproto_destroy ofproto/ofproto.c:1773:5 >> >>>> 3 0x4c71e4 in bridge_destroy vswitchd/bridge.c:3608:9 >> >>>> 4 0x4c6f4a in bridge_exit vswitchd/bridge.c:553:9 >> >>>> 5 0x4e109a in main vswitchd/ovs-vswitchd.c:146:5 >> >>>> 6 0x7f7b19c06bf6 in __libc_start_main >> >>>> >> >>>> Since introduction of the ofproto refcount this issue can be fixed >> >>>> by taking the 'ofproto' reference before postponing the operation. >> >>>> This will guarantee that 'ofproto' and 'baker' are not destroyed >> until >> >>>> all ids are freed. >> >>> >> >>> We prevent ofproto from getting shot thanks to refcount, but I don't >> >>> understand what prevents backer content from being uninitialised (like >> >>> destroying the meter id pool) in close_dpif_backer(). >> >> >> >> Successfully opened backer is closed only as part of the destruction >> >> of the last ofproto instance that uses it. So, if we're holding the >> > >> > Yes, I think that's what happens here. >> > We are closing the last ofproto instance: backer gets shot in >> > close_dpif_backer()/destroy(). >> > And there is a rcu postponed free_meter_id() with reference to >> ofproto->backer. >> >> Ugh. You're right. ofproto_class->destruct() is called unconditionally >> regardless of ofproto still being referenced. That doesn't sound right. >> _______________________________________________ >> dev mailing list >> d...@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> > > > -- > hepeng > -- hepeng _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev