Hey, Oleg. On Thu, Mar 22, 2018 at 12:24:12PM +0100, Oleg Nesterov wrote: > OK. And I agree that the usage of WORK_STRUCT_PENDING_BIT in queue_rcu_work() > is > fine. If nothing else the kernel won't crash if you call queue_rcu_work() > twice. > > But why flush_rcu_work() can't simply do flush_work() ? > > If WORK_STRUCT_PENDING_BIT was set by us (rcu_work_rcufn() succeeded) we do > not > need rcu_barrier(). Why should we care about other rcu callbacks? > > If rcu_work_rcufn() fails and someone else sets PENDING, how this > rcu_barrier() > can help? We didn't even do call_rcu() in this case. > > In short. Once flush_work() returns we know that rcu callback which queued > this > work is finished. It doesn't matter if it was fired by us or not. And if it > was > not fired by us, then rcu_barrier() doesn't imply a grace period anyway.
flush_*work() guarantees to wait for the completion of the latest instance of the work item which was visible to the caller. We can't guarantee that w/o rcu_barrier(). > > We can try to > > make it more specialized but then flush_rcu_work()'s behavior would > > have to different too and it gets confusing quickly. Unless there are > > overriding reasons to deviate, I'd like to keep it consistent. > > Perhaps this all is consistent, but I fail to understand this API :/ > > And again, at least for fs/aio.c it doesn't offer too much but sub-optimal > compared to call_rcu() + schedule_work() by hand. Sure, this isn't about performance. It's about making code less painful on the eyes. If performance matters, we sure can hand-craft things, which doesn't seem to be the case, right? Thanks. -- tejun