On Fri, Oct 20, 2017 at 9:56 AM, Paul E. McKenney
<paul...@linux.vnet.ibm.com> wrote:
> On Thu, Oct 19, 2017 at 08:26:01PM -0700, Cong Wang wrote:
>> On Wed, Oct 18, 2017 at 12:35 PM, Paul E. McKenney
>> <paul...@linux.vnet.ibm.com> wrote:
>> > 5) Keep call_rcu(), but have the RCU callback schedule a workqueue.
>> > The workqueue could then use blocking primitives, for example, acquiring
>> > RTNL.
>>
>> Yeah, this could work too but we would get one more async...
>>
>> filter delete -> call_rcu() -> schedule_work() -> action destroy
>
> True, but on the other hand you get to hold RTNL.

I can hold RTNL too by converting call_rcu() to synchronize_rcu().
;)

So this again turns into the question: do we mind synchronize_rcu()
on slow paths or not?
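
Just to be concrete, the synchronize_rcu() variant would look roughly
like this. It is only a minimal sketch; the my_filter_* struct and
function names are made up for illustration, not the real tc code:

/* Sketch only: the slow-path delete already runs under RTNL, so instead
 * of deferring the destroy to an RCU callback we can wait for readers
 * here and free synchronously.
 */
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/rtnetlink.h>
#include <linux/slab.h>

struct my_filter {
	struct list_head	list;	/* update side protected by RTNL */
	struct rcu_head		rcu;
	/* ... match data, actions, etc. ... */
};

static void my_filter_delete(struct my_filter *f)
{
	ASSERT_RTNL();			/* slow path, RTNL already held */

	list_del_rcu(&f->list);		/* unpublish from RCU readers */
	synchronize_rcu();		/* wait for readers to drain */

	/* Safe now: no reader can still see @f, and we hold RTNL,
	 * so the action destroy code can rely on it.
	 */
	kfree(f);
}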

Actually, I just tried the workqueue approach. It makes it harder for
the core tc filter code to wait for in-flight callbacks: currently
rcu_barrier() is enough, but with one more schedule_work() added we
probably need flush_workqueue() as well... This also means I can't use
the global workqueue, so I should add a dedicated workqueue for tc
filters.
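
Roughly what I am experimenting with, again only a sketch with
made-up names (tc_filter_wq and my_filter_* are placeholders):

/* call_rcu() -> schedule_work() variant.  A dedicated workqueue lets
 * the core code flush only tc work, and waiting for everything in
 * flight becomes rcu_barrier() followed by flush_workqueue().
 */
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/rculist.h>
#include <linux/rtnetlink.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

static struct workqueue_struct *tc_filter_wq;	/* hypothetical */

struct my_filter {
	struct list_head	list;
	struct rcu_head		rcu;
	struct work_struct	work;
};

static void my_filter_destroy_work(struct work_struct *work)
{
	struct my_filter *f = container_of(work, struct my_filter, work);

	rtnl_lock();		/* blocking primitives are fine here */
	/* destroy actions, etc. */
	rtnl_unlock();
	kfree(f);
}

static void my_filter_destroy_rcu(struct rcu_head *head)
{
	struct my_filter *f = container_of(head, struct my_filter, rcu);

	INIT_WORK(&f->work, my_filter_destroy_work);
	queue_work(tc_filter_wq, &f->work);
}

static void my_filter_delete(struct my_filter *f)
{
	list_del_rcu(&f->list);
	call_rcu(&f->rcu, my_filter_destroy_rcu);
}

/* Waiting for all pending destroys now takes two steps. */
static void my_filter_flush_all(void)
{
	rcu_barrier();			/* wait for pending RCU callbacks */
	flush_workqueue(tc_filter_wq);	/* then the work they queued */
}

static int __init my_filter_init(void)
{
	tc_filter_wq = alloc_workqueue("tc_filter_wq", 0, 0);
	return tc_filter_wq ? 0 : -ENOMEM;
}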

The good news is I seem to be able to make it work without adding much
code. Stay tuned. ;)

>
>> > 6) As with #5, have the RCU callback schedule a workqueue, but aggregate
>> > workqueue scheduling using a timer.  This would reduce the number of
>> > RTNL acquisitions.
>>
>> Ouch, sounds like even one more async:
>>
>> filter delete -> call_rcu() -> schedule_work() -> timer -> flush_work()
>> -> action destroy
>>
>> :-(
>
> Indeed, the price of scalability and performance is often added
> asynchronous action at a distance.  But sometimes you can have
> scalability, performance, -and- synchronous action.  Not sure that this
> is one of those cases, but perhaps someone will come up with some trick
> that we are not yet seeing.
>
> And again, one benefit you get from the added asynchrony is the ability
> to acquire RTNL.  Another is increased batching, allowing the overhead
> of acquiring RTNL to be amortized over a larger number of updates.

Understood, my point is it might not be worthwhile to optimize a slow
path which already holds the RTNL lock...


>
>> > 7) As with #5, have the RCU callback schedule a workqueue, but have each
>> > iterator accumulate a list of things removed and do call_rcu() on the
>> > list.  This is an alternative way of aggregating to reduce the number
>> > of RTNL acquisitions.
>>
>> Yeah, this seems to work too.
>>
>> > There are many other ways to skin this cat.
>>
>> We still have to pick one. :) Any preference? I want to keep it as simple
>> as possible, otherwise some day I would not understand it either.
>
> I must defer to the people who actually fully understand this code.

I understand that code; I am just not sure which approach I should
pick.

I will keep you Cc'ed on any further changes I make.

Thanks!
