On Fri, Oct 20, 2017 at 9:56 AM, Paul E. McKenney
<paul...@linux.vnet.ibm.com> wrote:
> On Thu, Oct 19, 2017 at 08:26:01PM -0700, Cong Wang wrote:
>> On Wed, Oct 18, 2017 at 12:35 PM, Paul E. McKenney
>> <paul...@linux.vnet.ibm.com> wrote:
>> > 5) Keep call_rcu(), but have the RCU callback schedule a workqueue.
>> > The workqueue could then use blocking primitives, for example,
>> > acquiring RTNL.
>>
>> Yeah, this could work too, but we would get one more async step...
>>
>> filter delete -> call_rcu() -> schedule_work() -> action destroy
>
> True, but on the other hand you get to hold RTNL.
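For concreteness, option 5 could look roughly like the sketch below. This is
only illustrative kernel-style code; the names (tcf_filter,
tcf_filter_destroy_work, tcf_filter_rcu_cb) are made up for this example and
do not correspond to the actual tc code:

```c
/* Hypothetical sketch of option 5: the RCU callback runs in softirq
 * context and must not sleep, so it only queues a work item; the work
 * function then runs in process context and may take RTNL.
 */
struct tcf_filter {
	struct rcu_head rcu;
	struct work_struct work;
	/* ... filter state ... */
};

static void tcf_filter_destroy_work(struct work_struct *work)
{
	struct tcf_filter *f = container_of(work, struct tcf_filter, work);

	rtnl_lock();		/* blocking primitives are fine here */
	/* ... destroy actions, free f ... */
	rtnl_unlock();
}

static void tcf_filter_rcu_cb(struct rcu_head *head)
{
	struct tcf_filter *f = container_of(head, struct tcf_filter, rcu);

	/* No sleeping in an RCU callback: defer to process context. */
	INIT_WORK(&f->work, tcf_filter_destroy_work);
	schedule_work(&f->work);
}

/* On filter delete:  call_rcu(&f->rcu, tcf_filter_rcu_cb); */
```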
I can get RTNL too by converting call_rcu() to synchronize_rcu(). ;)
So this turns into the same question again: do we mind synchronize_rcu()
on slow paths or not?

Actually, I just tried this approach. It makes it harder for the core tc
filter code to wait for in-flight callbacks: currently rcu_barrier() is
enough, but with one more schedule_work() added we probably need
flush_workqueue() too... This also means I can't use the global
workqueue, so I should add a dedicated workqueue for tc filters. The
good news is I seem to be able to make it work without adding much code.
Stay tuned. ;)

>
>> > 6) As with #5, have the RCU callback schedule a workqueue, but aggregate
>> > workqueue scheduling using a timer.  This would reduce the number of
>> > RTNL acquisitions.
>>
>> Ouch, that sounds like even one more async step:
>>
>> filter delete -> call_rcu() -> schedule_work() -> timer -> flush_work()
>> -> action destroy
>>
>> :-(
>
> Indeed, the price of scalability and performance is often added
> asynchronous action at a distance.  But sometimes you can have
> scalability, performance, -and- synchronous action.  Not sure that this
> is one of those cases, but perhaps someone will come up with some trick
> that we are not yet seeing.
>
> And again, one benefit you get from the added asynchrony is the ability
> to acquire RTNL.  Another is increased batching, allowing the overhead
> of acquiring RTNL to be amortized over a larger number of updates.

Understood. My point is that it might not be worthwhile to optimize a
slow path which already holds the RTNL lock...

>
>> > 7) As with #5, have the RCU callback schedule a workqueue, but have each
>> > iterator accumulate a list of things removed and do call_rcu() on the
>> > list.  This is an alternative way of aggregating to reduce the number
>> > of RTNL acquisitions.
>>
>> Yeah, this seems to work too.
>>
>> > There are many other ways to skin this cat.
>>
>> We still have to pick one. :)  Any preference?
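The "dedicated workqueue plus two-stage barrier" idea mentioned above could
be sketched as follows. Again this is illustrative only; tc_filter_wq and
the helper names are hypothetical, not the real tc code:

```c
/* Hypothetical sketch: a dedicated workqueue for tc filter destruction,
 * so flushing it does not depend on unrelated users of the global
 * system workqueue.
 */
static struct workqueue_struct *tc_filter_wq;

static int __init tcf_wq_init(void)
{
	tc_filter_wq = alloc_workqueue("tc_filter_workqueue", 0, 0);
	return tc_filter_wq ? 0 : -ENOMEM;
}

/* In the RCU callback, instead of schedule_work(&f->work):
 *	queue_work(tc_filter_wq, &f->work);
 */

static void tcf_wait_for_destruction(void)
{
	rcu_barrier();			/* 1) wait for pending RCU callbacks,
					 *    which queue the work items */
	flush_workqueue(tc_filter_wq);	/* 2) then wait for the queued work */
}
```

The ordering matters: rcu_barrier() must come first, since the RCU callbacks
are what queue the work items, and only after they have all run can
flush_workqueue() observe every item that will ever be queued.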
>> I want to keep it as simple as possible, otherwise some day I won't
>> understand it either.
>
> I must defer to the people who actually fully understand this code.

I understand that code, I'm just not sure which approach to pick. I will
keep you Cc'ed on any further changes I make. Thanks!