On Thu, Oct 26, 2017 at 9:24 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
> Recently, the RCU callbacks used in TC filters and TC actions keep
> drawing my attention, they introduce at least 4 race condition bugs:

<snip>

> As suggested by Paul, we could defer the work to a workqueue and
> gain the permission of holding RTNL again without any performance
> impact, however, in tcf_block_put() we could have a deadlock when
> flushing workqueue while holding RTNL lock, the trick here is to
> defer the work itself in workqueue and make it queued after all
> other works so that we keep the same ordering to avoid any
> use-after-free. Please see the first patch for details.

Cong,

I don't believe the problem's been resolved just yet. I have a new kernel, compiled just today, and I'm still tripping over a kernel bug in this scenario when I run Chris' new test case.

I don't have a spare device to use for the run on this machine, so instead I created a veth pair that will have one end migrated into the container.

The bug isn't consistent: I hit it after anywhere between one and four runs of the d052 test case.

Steps to reproduce:

  sudo ip li add foo type veth
  sudo ./tdc.py -d foo -c flower
  [repeat until kernel bug encountered]
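
For anyone who wants to automate the "repeat" step, here is a rough sketch of a retry loop. The helper name `repeat_until_failure` and the run limit are my own inventions, not part of tdc.py. It also assumes that a bad run shows up as a non-zero exit status, which may not hold when the kernel oopses, so checking dmesg by hand is probably still needed.

```shell
#!/bin/sh
# Hypothetical helper (not part of tdc.py): run a command repeatedly until it
# fails or the run limit is reached.
# Usage: repeat_until_failure MAX_RUNS COMMAND [ARGS...]
# Prints the number of the first failing run, or 0 if every run succeeded.
repeat_until_failure() {
    max_runs=$1
    shift
    run=1
    while [ "$run" -le "$max_runs" ]; do
        # Assumption: a failing run is reported through the exit status.
        if ! "$@"; then
            echo "$run"
            return 1
        fi
        run=$((run + 1))
    done
    echo 0
    return 0
}

# Example invocation, left commented out because it needs root and the veth
# device "foo" created in the steps above:
#   repeat_until_failure 10 sudo ./tdc.py -d foo -c flower
```

The limit of 10 in the commented example is arbitrary; given that the bug has appeared within one to four runs so far, a small limit seemed sufficient.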