On 2017-09-22 11:43:14 [-0700], Paul E. McKenney wrote: > On Fri, Sep 22, 2017 at 05:28:04PM +0200, Sebastian Andrzej Siewior wrote: > > The current check via srcu_online is slightly racy because after looking > > at srcu_online there could be an interrupt that interrupted us long > > enough until the CPU we checked against went offline. > > But in that case, wouldn't the interrupt block the synchronize_sched() > later in the offline sequence?
What I meant is: CPU0 CPU1 preempt_disable(); if (READ_ONCE(per_cpu(srcu_online, 1))) *interrupt* WRITE_ONCE(per_cpu(srcu_online, cpu), false); and CPU is offnline ret = queue_delayed_work_on(1, wq, dwork, delay); is this possible or are there a safety belt for this? > More to the point, are you actually seeing this failure, or is this > a theoretical bug? I need to get rid of the preempt_disable() section in which queue_delayed_work*() is invoked for RT. > > An alternative would be to hold the hotplug rwsem (so the CPUs don't > > change their state) and then check based on cpu_online() if we queue it > > on a specific CPU or not. queue_work_on() itself can handle if something > > is enqueued on an offline CPU but a timer which is enqueued on an offline > > CPU won't fire until the CPU is back online. > > > > I am not sure if the removal in rcu_init() is okay or not. I assume that > > SRCU won't enqueue a work item before SRCU is up and ready. > > Another alternative would be to disable preemption across the check and > the call to queue_delayed_work_on(). you need to ensure the *other* CPU won't in the middle of checking its status. preempt_disable() won't do this on the other CPU. > Yet another alternative would be to have an SRCU-specific per-CPU lock > that is acquired across the setting and clearing of srcu_online, > and also across the check and the call to queue_delayed_work_on(). > This last would be more consistent with a desire to remove the > synchronize_sched() from the offline sequence. > > Or am I missing something here? The perCPU lock should work. And cpus_read_lock() is basically that except that srcu_online_cpu() is not holding it but the CPU-HP code. So you want keep things as-is or do you prefer a per-CPU rwsem instead? Sebastian