Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
On Thu, Oct 15, 2020 at 12:12:35PM +0200, Peter Zijlstra wrote: > On Thu, Oct 15, 2020 at 01:40:53AM +0200, Frederic Weisbecker wrote: > > > re tick_nohz_task_switch() being placed wrong, it should probably be > > > placed before finish_lock_switch(). Something like so. > > > > > > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > > index cf044580683c..5c92c959824f 100644 > > > --- a/kernel/sched/core.c > > > +++ b/kernel/sched/core.c > > > @@ -4084,6 +4084,7 @@ static struct rq *finish_task_switch(struct > > > task_struct *prev) > > > vtime_task_switch(prev); > > > perf_event_task_sched_in(prev, current); > > > finish_task(prev); > > > + tick_nohz_task_switch(); > > > finish_lock_switch(rq); > > > finish_arch_post_lock_switch(); > > > kcov_finish_switch(current); > > > @@ -4121,7 +4122,6 @@ static struct rq *finish_task_switch(struct > > > task_struct *prev) > > > put_task_struct_rcu_user(prev); > > > } > > > > > > - tick_nohz_task_switch(); > > > > IIRC, we wanted to keep it outside rq lock because it shouldn't it... > > But now you've created a window with IRQs on and cause additional IRQ > state changes. > > If you're really worried about rq->lock, I suppose we can do: > > rq_unlock(rq->lock); > tick_nohz_task_switch(); > local_irq_enable(); > > (much like we do at the beginning of __schedule for RCU) Right. Well I'm not that worried about rq->lock though. If you're ok with it I can just move it before finish_lock_switch(). Thanks.
Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
On Tue, Oct 20, 2020 at 03:52:45PM -0300, Marcelo Tosatti wrote: > On Thu, Oct 15, 2020 at 01:40:53AM +0200, Frederic Weisbecker wrote: > > Alternatively, we could rely on p->on_rq which is set to TASK_ON_RQ_QUEUED > > at wake up time, prior to the schedule() full barrier. Of course that > > doesn't > > mean that the task is actually the one running on the CPU but it's a good > > sign, > > considering that we are running in nohz_full mode and it's usually optimized > > for single task mode. > > Unfortunately that would require exporting p->on_rq which is internal to > the scheduler, locklessly. > > (can surely do that if you prefer!) May be: bool task_on_rq(struct task_struct *p) ? Thanks.
Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
On Thu, Oct 15, 2020 at 01:40:53AM +0200, Frederic Weisbecker wrote: > On Wed, Oct 14, 2020 at 10:33:21AM +0200, Peter Zijlstra wrote: > > On Tue, Oct 13, 2020 at 02:13:28PM -0300, Marcelo Tosatti wrote: > > > > > > Yes but if the task isn't running, run_posix_cpu_timers() doesn't have > > > > anything to elapse. So indeed we can spare the IPI if the task is not > > > > running. Provided ordering makes sure that the task sees the new > > > > dependency > > > > when it schedules in of course. > > > > > > True. > > > > > > * p->on_cpu <- { 0, 1 }: > > > * > > > * is set by prepare_task() and cleared by finish_task() such that it > > > will be > > > * set before p is scheduled-in and cleared after p is scheduled-out, > > > both > > > * under rq->lock. Non-zero indicates the task is running on its CPU. > > > > > > > > > CPU-0 (tick_set_dep)CPU-1 (task switch) > > > > > > STORE p->tick_dep_mask > > > smp_mb() (atomic_fetch_or()) > > > LOAD p->on_cpu > > > > > > > > > context_switch(prev, next) > > > STORE next->on_cpu = 1 > > > ... [*] > > > > > > LOAD current->tick_dep_mask > > > > > > > That load is in tick_nohz_task_switch() right? (which BTW is placed > > completely wrong) You could easily do something like the below I > > suppose. > > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > > index 81632cd5e3b7..2a5fafe66bb0 100644 > > --- a/kernel/time/tick-sched.c > > +++ b/kernel/time/tick-sched.c > > @@ -410,6 +410,14 @@ void __tick_nohz_task_switch(void) > > ts = this_cpu_ptr(&tick_cpu_sched); > > > > if (ts->tick_stopped) { > > + /* > > +* tick_set_dep() (this) > > +* > > +* STORE p->tick_dep_mask STORE p->on_cpu > > +* smp_mb() smp_mb() > > +* LOAD p->on_cpu LOAD p->tick_dep_mask > > +*/ > > + smp_mb(); > > if (atomic_read(¤t->tick_dep_mask) || > > atomic_read(¤t->signal->tick_dep_mask)) > > tick_nohz_full_kick(); > > It would then need to be unconditional (whatever value of ts->tick_stopped). > Assuming the tick isn't stopped, we may well have an interrupt firing right > after schedule() which doesn't see the new value of tick_dep_map. > > Alternatively, we could rely on p->on_rq which is set to TASK_ON_RQ_QUEUED > at wake up time, prior to the schedule() full barrier. Of course that doesn't > mean that the task is actually the one running on the CPU but it's a good > sign, > considering that we are running in nohz_full mode and it's usually optimized > for single task mode. Unfortunately that would require exporting p->on_rq which is internal to the scheduler, locklessly. (can surely do that if you prefer!) > > Also setting a remote task's tick dependency is only used by posix cpu timer > in case the user has the bad taste to enqueue on a task running in nohz_full > mode. It shouldn't deserve an unconditional full barrier in the schedule path. > > If the target is current, as is used by RCU, I guess we can keep a special > treatment. To answer PeterZ's original question: "So we need to kick the CPU unconditionally, or only when the task is actually running? AFAICT we only care about current->tick_dep_mask." If there is a task sharing signals, executing on a remote CPU, yes that remote CPU should be awakened. Could skip the IPI if the remote process is not running, however for the purposes of low latency isolated processes, this optimization is not necessary. So the last posted patchset is good enough for isolated low latency processes. Do you guys want me to do something or can you take the series as it is? > > re tick_nohz_task_switch() being placed wrong, it should probably be > > placed before finish_lock_switch(). Something like so. > > > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index cf044580683c..5c92c959824f 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -4084,6 +4084,7 @@ static struct rq *finish_task_switch(struct > > task_struct *prev) > > vtime_task_switch(prev); > > perf_event_task_sched_in(prev, current); > > finish_task(prev); > > + tick_nohz_task_switch(); > > finish_lock_switch(rq); > > finish_arch_post_lock_switch(); > > kcov_finish_switch(current); > > @@ -4121,7 +4122,6 @@ static struct rq *finish_task_switch(struct > > task_struct *prev) > > put_task_struct_rcu_user(prev); > > } > > > > - tick_nohz_task_switch(); > > IIRC, we wanted to keep it outside rq lock because it shouldn't it...
Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
On Thu, Oct 15, 2020 at 01:40:53AM +0200, Frederic Weisbecker wrote: > > re tick_nohz_task_switch() being placed wrong, it should probably be > > placed before finish_lock_switch(). Something like so. > > > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index cf044580683c..5c92c959824f 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -4084,6 +4084,7 @@ static struct rq *finish_task_switch(struct > > task_struct *prev) > > vtime_task_switch(prev); > > perf_event_task_sched_in(prev, current); > > finish_task(prev); > > + tick_nohz_task_switch(); > > finish_lock_switch(rq); > > finish_arch_post_lock_switch(); > > kcov_finish_switch(current); > > @@ -4121,7 +4122,6 @@ static struct rq *finish_task_switch(struct > > task_struct *prev) > > put_task_struct_rcu_user(prev); > > } > > > > - tick_nohz_task_switch(); > > IIRC, we wanted to keep it outside rq lock because it shouldn't it... But now you've created a window with IRQs on and cause additional IRQ state changes. If you're really worried about rq->lock, I suppose we can do: rq_unlock(rq->lock); tick_nohz_task_switch(); local_irq_enable(); (much like we do at the beginning of __schedule for RCU)
Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
On Wed, Oct 14, 2020 at 10:33:21AM +0200, Peter Zijlstra wrote: > On Tue, Oct 13, 2020 at 02:13:28PM -0300, Marcelo Tosatti wrote: > > > > Yes but if the task isn't running, run_posix_cpu_timers() doesn't have > > > anything to elapse. So indeed we can spare the IPI if the task is not > > > running. Provided ordering makes sure that the task sees the new > > > dependency > > > when it schedules in of course. > > > > True. > > > > * p->on_cpu <- { 0, 1 }: > > * > > * is set by prepare_task() and cleared by finish_task() such that it > > will be > > * set before p is scheduled-in and cleared after p is scheduled-out, both > > * under rq->lock. Non-zero indicates the task is running on its CPU. > > > > > > CPU-0 (tick_set_dep)CPU-1 (task switch) > > > > STORE p->tick_dep_mask > > smp_mb() (atomic_fetch_or()) > > LOAD p->on_cpu > > > > > > context_switch(prev, next) > > STORE next->on_cpu = 1 > > ... [*] > > > > LOAD current->tick_dep_mask > > > > That load is in tick_nohz_task_switch() right? (which BTW is placed > completely wrong) You could easily do something like the below I > suppose. > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > index 81632cd5e3b7..2a5fafe66bb0 100644 > --- a/kernel/time/tick-sched.c > +++ b/kernel/time/tick-sched.c > @@ -410,6 +410,14 @@ void __tick_nohz_task_switch(void) > ts = this_cpu_ptr(&tick_cpu_sched); > > if (ts->tick_stopped) { > + /* > + * tick_set_dep() (this) > + * > + * STORE p->tick_dep_mask STORE p->on_cpu > + * smp_mb() smp_mb() > + * LOAD p->on_cpu LOAD p->tick_dep_mask > + */ > + smp_mb(); > if (atomic_read(¤t->tick_dep_mask) || > atomic_read(¤t->signal->tick_dep_mask)) > tick_nohz_full_kick(); It would then need to be unconditional (whatever value of ts->tick_stopped). Assuming the tick isn't stopped, we may well have an interrupt firing right after schedule() which doesn't see the new value of tick_dep_map. Alternatively, we could rely on p->on_rq which is set to TASK_ON_RQ_QUEUED at wake up time, prior to the schedule() full barrier. Of course that doesn't mean that the task is actually the one running on the CPU but it's a good sign, considering that we are running in nohz_full mode and it's usually optimized for single task mode. Also setting a remote task's tick dependency is only used by posix cpu timer in case the user has the bad taste to enqueue on a task running in nohz_full mode. It shouldn't deserve an unconditional full barrier in the schedule path. If the target is current, as is used by RCU, I guess we can keep a special treatment. > re tick_nohz_task_switch() being placed wrong, it should probably be > placed before finish_lock_switch(). Something like so. > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index cf044580683c..5c92c959824f 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -4084,6 +4084,7 @@ static struct rq *finish_task_switch(struct task_struct > *prev) > vtime_task_switch(prev); > perf_event_task_sched_in(prev, current); > finish_task(prev); > + tick_nohz_task_switch(); > finish_lock_switch(rq); > finish_arch_post_lock_switch(); > kcov_finish_switch(current); > @@ -4121,7 +4122,6 @@ static struct rq *finish_task_switch(struct task_struct > *prev) > put_task_struct_rcu_user(prev); > } > > - tick_nohz_task_switch(); IIRC, we wanted to keep it outside rq lock because it shouldn't it...
Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
On Tue, Oct 13, 2020 at 02:13:28PM -0300, Marcelo Tosatti wrote: > > Yes but if the task isn't running, run_posix_cpu_timers() doesn't have > > anything to elapse. So indeed we can spare the IPI if the task is not > > running. Provided ordering makes sure that the task sees the new dependency > > when it schedules in of course. > > True. > > * p->on_cpu <- { 0, 1 }: > * > * is set by prepare_task() and cleared by finish_task() such that it will > be > * set before p is scheduled-in and cleared after p is scheduled-out, both > * under rq->lock. Non-zero indicates the task is running on its CPU. > > > CPU-0 (tick_set_dep)CPU-1 (task switch) > > STORE p->tick_dep_mask > smp_mb() (atomic_fetch_or()) > LOAD p->on_cpu > > > context_switch(prev, next) > STORE next->on_cpu = 1 > ... [*] > > LOAD current->tick_dep_mask > That load is in tick_nohz_task_switch() right? (which BTW is placed completely wrong) You could easily do something like the below I suppose. diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 81632cd5e3b7..2a5fafe66bb0 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -410,6 +410,14 @@ void __tick_nohz_task_switch(void) ts = this_cpu_ptr(&tick_cpu_sched); if (ts->tick_stopped) { + /* +* tick_set_dep() (this) +* +* STORE p->tick_dep_mask STORE p->on_cpu +* smp_mb() smp_mb() +* LOAD p->on_cpu LOAD p->tick_dep_mask +*/ + smp_mb(); if (atomic_read(¤t->tick_dep_mask) || atomic_read(¤t->signal->tick_dep_mask)) tick_nohz_full_kick(); re tick_nohz_task_switch() being placed wrong, it should probably be placed before finish_lock_switch(). Something like so. diff --git a/kernel/sched/core.c b/kernel/sched/core.c index cf044580683c..5c92c959824f 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4084,6 +4084,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) vtime_task_switch(prev); perf_event_task_sched_in(prev, current); finish_task(prev); + tick_nohz_task_switch(); finish_lock_switch(rq); finish_arch_post_lock_switch(); kcov_finish_switch(current); @@ -4121,7 +4122,6 @@ static struct rq *finish_task_switch(struct task_struct *prev) put_task_struct_rcu_user(prev); } - tick_nohz_task_switch(); return rq; } diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 81632cd5e3b7..c8bddc9fb792 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -402,10 +402,8 @@ void __tick_nohz_task_switch(void) unsigned long flags; struct tick_sched *ts; - local_irq_save(flags); - if (!tick_nohz_full_cpu(smp_processor_id())) - goto out; + return; ts = this_cpu_ptr(&tick_cpu_sched); @@ -414,8 +412,6 @@ void __tick_nohz_task_switch(void) atomic_read(¤t->signal->tick_dep_mask)) tick_nohz_full_kick(); } -out: - local_irq_restore(flags); } /* Get the boot-time nohz CPU list from the kernel parameters. */
Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
On Thu, Oct 08, 2020 at 09:54:44PM +0200, Frederic Weisbecker wrote: > On Thu, Oct 08, 2020 at 02:54:09PM -0300, Marcelo Tosatti wrote: > > On Thu, Oct 08, 2020 at 02:22:56PM +0200, Peter Zijlstra wrote: > > > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote: > > > > When adding a tick dependency to a task, its necessary to > > > > wakeup the CPU where the task resides to reevaluate tick > > > > dependencies on that CPU. > > > > > > > > However the current code wakes up all nohz_full CPUs, which > > > > is unnecessary. > > > > > > > > Switch to waking up a single CPU, by using ordering of writes > > > > to task->cpu and task->tick_dep_mask. > > > > > > > > From: Frederic Weisbecker > > > > Suggested-by: Peter Zijlstra > > > > Signed-off-by: Frederic Weisbecker > > > > Signed-off-by: Marcelo Tosatti > > > > > > > > Index: linux-2.6/kernel/time/tick-sched.c > > > > === > > > > --- linux-2.6.orig/kernel/time/tick-sched.c > > > > +++ linux-2.6/kernel/time/tick-sched.c > > > > @@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu) > > > > irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu); > > > > } > > > > > > > > +static void tick_nohz_kick_task(struct task_struct *tsk) > > > > +{ > > > > + int cpu = task_cpu(tsk); > > > > + > > > > + /* > > > > +* If the task concurrently migrates to another cpu, > > > > +* we guarantee it sees the new tick dependency upon > > > > +* schedule. > > > > +* > > > > +* > > > > +* set_task_cpu(p, cpu); > > > > +* STORE p->cpu = @cpu > > > > +* __schedule() (switch to task 'p') > > > > +* LOCK rq->lock > > > > +* smp_mb__after_spin_lock() STORE p->tick_dep_mask > > > > +* tick_nohz_task_switch()smp_mb() > > > > (atomic_fetch_or()) > > > > +* LOAD p->tick_dep_mask LOAD p->cpu > > > > +*/ > > > > + > > > > + preempt_disable(); > > > > + if (cpu_online(cpu)) > > > > + tick_nohz_full_kick_cpu(cpu); > > > > + preempt_enable(); > > > > +} > > > > > > So we need to kick the CPU unconditionally, or only when the task is > > > actually running? AFAICT we only care about current->tick_dep_mask. > > > > tick is necessary to execute run_posix_cpu_timers, from tick interrupt, > > even if task is not running. > > Yes but if the task isn't running, run_posix_cpu_timers() doesn't have > anything to elapse. So indeed we can spare the IPI if the task is not > running. Provided ordering makes sure that the task sees the new dependency > when it schedules in of course. True. * p->on_cpu <- { 0, 1 }: * * is set by prepare_task() and cleared by finish_task() such that it will be * set before p is scheduled-in and cleared after p is scheduled-out, both * under rq->lock. Non-zero indicates the task is running on its CPU. CPU-0 (tick_set_dep)CPU-1 (task switch) STORE p->tick_dep_mask smp_mb() (atomic_fetch_or()) LOAD p->on_cpu context_switch(prev, next) STORE next->on_cpu = 1 ... [*] LOAD current->tick_dep_mask Don't see any explicit memory barrier in the [*] section?
Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
On Thu, Oct 08, 2020 at 02:54:09PM -0300, Marcelo Tosatti wrote: > On Thu, Oct 08, 2020 at 02:22:56PM +0200, Peter Zijlstra wrote: > > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote: > > > When adding a tick dependency to a task, its necessary to > > > wakeup the CPU where the task resides to reevaluate tick > > > dependencies on that CPU. > > > > > > However the current code wakes up all nohz_full CPUs, which > > > is unnecessary. > > > > > > Switch to waking up a single CPU, by using ordering of writes > > > to task->cpu and task->tick_dep_mask. > > > > > > From: Frederic Weisbecker > > > Suggested-by: Peter Zijlstra > > > Signed-off-by: Frederic Weisbecker > > > Signed-off-by: Marcelo Tosatti > > > > > > Index: linux-2.6/kernel/time/tick-sched.c > > > === > > > --- linux-2.6.orig/kernel/time/tick-sched.c > > > +++ linux-2.6/kernel/time/tick-sched.c > > > @@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu) > > > irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu); > > > } > > > > > > +static void tick_nohz_kick_task(struct task_struct *tsk) > > > +{ > > > + int cpu = task_cpu(tsk); > > > + > > > + /* > > > + * If the task concurrently migrates to another cpu, > > > + * we guarantee it sees the new tick dependency upon > > > + * schedule. > > > + * > > > + * > > > + * set_task_cpu(p, cpu); > > > + * STORE p->cpu = @cpu > > > + * __schedule() (switch to task 'p') > > > + * LOCK rq->lock > > > + * smp_mb__after_spin_lock() STORE p->tick_dep_mask > > > + * tick_nohz_task_switch()smp_mb() (atomic_fetch_or()) > > > + * LOAD p->tick_dep_mask LOAD p->cpu > > > + */ > > > + > > > + preempt_disable(); > > > + if (cpu_online(cpu)) > > > + tick_nohz_full_kick_cpu(cpu); > > > + preempt_enable(); > > > +} > > > > So we need to kick the CPU unconditionally, or only when the task is > > actually running? AFAICT we only care about current->tick_dep_mask. > > tick is necessary to execute run_posix_cpu_timers, from tick interrupt, > even if task is not running. Yes but if the task isn't running, run_posix_cpu_timers() doesn't have anything to elapse. So indeed we can spare the IPI if the task is not running. Provided ordering makes sure that the task sees the new dependency when it schedules in of course.
Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
On Thu, Oct 08, 2020 at 05:28:44PM +0200, Peter Zijlstra wrote: > On Thu, Oct 08, 2020 at 10:59:40AM -0400, Peter Xu wrote: > > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote: > > > +static void tick_nohz_kick_task(struct task_struct *tsk) > > > +{ > > > + int cpu = task_cpu(tsk); > > > + > > > + /* > > > + * If the task concurrently migrates to another cpu, > > > + * we guarantee it sees the new tick dependency upon > > > + * schedule. > > > + * > > > + * > > > + * set_task_cpu(p, cpu); > > > + * STORE p->cpu = @cpu > > > + * __schedule() (switch to task 'p') > > > + * LOCK rq->lock > > > + * smp_mb__after_spin_lock() STORE p->tick_dep_mask > > > + * tick_nohz_task_switch()smp_mb() (atomic_fetch_or()) > > > + * LOAD p->tick_dep_mask LOAD p->cpu > > > + */ > > > + > > > + preempt_disable(); > > > > Pure question: is preempt_disable() required here? Same question to > > tick_nohz_full_kick_all(). > > I think it serializes against hotplug. Exactly! Thanks.
[patch 1/2] nohz: only wakeup a single target cpu when kicking a task
When adding a tick dependency to a task, its necessary to wakeup the CPU where the task resides to reevaluate tick dependencies on that CPU. However the current code wakes up all nohz_full CPUs, which is unnecessary. Switch to waking up a single CPU, by using ordering of writes to task->cpu and task->tick_dep_mask. From: Frederic Weisbecker Suggested-by: Peter Zijlstra Signed-off-by: Frederic Weisbecker Signed-off-by: Marcelo Tosatti Index: linux-2.6/kernel/time/tick-sched.c === --- linux-2.6.orig/kernel/time/tick-sched.c +++ linux-2.6/kernel/time/tick-sched.c @@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu) irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu); } +static void tick_nohz_kick_task(struct task_struct *tsk) +{ + int cpu = task_cpu(tsk); + + /* +* If the task concurrently migrates to another cpu, +* we guarantee it sees the new tick dependency upon +* schedule. +* +* +* set_task_cpu(p, cpu); +* STORE p->cpu = @cpu +* __schedule() (switch to task 'p') +* LOCK rq->lock +* smp_mb__after_spin_lock() STORE p->tick_dep_mask +* tick_nohz_task_switch()smp_mb() (atomic_fetch_or()) +* LOAD p->tick_dep_mask LOAD p->cpu +*/ + + preempt_disable(); + if (cpu_online(cpu)) + tick_nohz_full_kick_cpu(cpu); + preempt_enable(); +} + /* * Kick all full dynticks CPUs in order to force these to re-evaluate * their dependency on the tick and restart it if necessary. @@ -356,19 +381,8 @@ EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cp */ void tick_nohz_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit) { - if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask)) { - if (tsk == current) { - preempt_disable(); - tick_nohz_full_kick(); - preempt_enable(); - } else { - /* -* Some future tick_nohz_full_kick_task() -* should optimize this. -*/ - tick_nohz_full_kick_all(); - } - } + if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask)) + tick_nohz_kick_task(tsk); } EXPORT_SYMBOL_GPL(tick_nohz_dep_set_task);
Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
On Thu, Oct 08, 2020 at 05:28:44PM +0200, Peter Zijlstra wrote: > On Thu, Oct 08, 2020 at 10:59:40AM -0400, Peter Xu wrote: > > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote: > > > +static void tick_nohz_kick_task(struct task_struct *tsk) > > > +{ > > > + int cpu = task_cpu(tsk); > > > + > > > + /* > > > + * If the task concurrently migrates to another cpu, > > > + * we guarantee it sees the new tick dependency upon > > > + * schedule. > > > + * > > > + * > > > + * set_task_cpu(p, cpu); > > > + * STORE p->cpu = @cpu > > > + * __schedule() (switch to task 'p') > > > + * LOCK rq->lock > > > + * smp_mb__after_spin_lock() STORE p->tick_dep_mask > > > + * tick_nohz_task_switch()smp_mb() (atomic_fetch_or()) > > > + * LOAD p->tick_dep_mask LOAD p->cpu > > > + */ > > > + > > > + preempt_disable(); > > > > Pure question: is preempt_disable() required here? Same question to > > tick_nohz_full_kick_all(). > > I think it serializes against hotplug. Thanks Peter. So is that a lighter but trickier version of get_online_cpus() which is even allowed with spinlock? I noticed that this method was actually mentioned in the old cpu-hotplug.txt, but it was removed during the convertion to rst: ff58fa7f556c ("Documentation: Update CPU hotplug and move it to core-api") Not sure whether it's intended, just to raise this up. If it's still valid, maybe still worth to document it somewhere. -- Peter Xu
Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
On Thu, Oct 08, 2020 at 02:22:56PM +0200, Peter Zijlstra wrote: > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote: > > When adding a tick dependency to a task, its necessary to > > wakeup the CPU where the task resides to reevaluate tick > > dependencies on that CPU. > > > > However the current code wakes up all nohz_full CPUs, which > > is unnecessary. > > > > Switch to waking up a single CPU, by using ordering of writes > > to task->cpu and task->tick_dep_mask. > > > > From: Frederic Weisbecker > > Suggested-by: Peter Zijlstra > > Signed-off-by: Frederic Weisbecker > > Signed-off-by: Marcelo Tosatti > > > > Index: linux-2.6/kernel/time/tick-sched.c > > === > > --- linux-2.6.orig/kernel/time/tick-sched.c > > +++ linux-2.6/kernel/time/tick-sched.c > > @@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu) > > irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu); > > } > > > > +static void tick_nohz_kick_task(struct task_struct *tsk) > > +{ > > + int cpu = task_cpu(tsk); > > + > > + /* > > +* If the task concurrently migrates to another cpu, > > +* we guarantee it sees the new tick dependency upon > > +* schedule. > > +* > > +* > > +* set_task_cpu(p, cpu); > > +* STORE p->cpu = @cpu > > +* __schedule() (switch to task 'p') > > +* LOCK rq->lock > > +* smp_mb__after_spin_lock() STORE p->tick_dep_mask > > +* tick_nohz_task_switch()smp_mb() (atomic_fetch_or()) > > +* LOAD p->tick_dep_mask LOAD p->cpu > > +*/ > > + > > + preempt_disable(); > > + if (cpu_online(cpu)) > > + tick_nohz_full_kick_cpu(cpu); > > + preempt_enable(); > > +} > > So we need to kick the CPU unconditionally, or only when the task is > actually running? AFAICT we only care about current->tick_dep_mask. tick is necessary to execute run_posix_cpu_timers, from tick interrupt, even if task is not running.
Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
On Thu, Oct 08, 2020 at 10:59:40AM -0400, Peter Xu wrote: > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote: > > +static void tick_nohz_kick_task(struct task_struct *tsk) > > +{ > > + int cpu = task_cpu(tsk); > > + > > + /* > > +* If the task concurrently migrates to another cpu, > > +* we guarantee it sees the new tick dependency upon > > +* schedule. > > +* > > +* > > +* set_task_cpu(p, cpu); > > +* STORE p->cpu = @cpu > > +* __schedule() (switch to task 'p') > > +* LOCK rq->lock > > +* smp_mb__after_spin_lock() STORE p->tick_dep_mask > > +* tick_nohz_task_switch()smp_mb() (atomic_fetch_or()) > > +* LOAD p->tick_dep_mask LOAD p->cpu > > +*/ > > + > > + preempt_disable(); > > Pure question: is preempt_disable() required here? Same question to > tick_nohz_full_kick_all(). Hi Peter, Don't see why: irq_queue_work_on() disables preemption if necessary. > > > + if (cpu_online(cpu)) > > + tick_nohz_full_kick_cpu(cpu); > > + preempt_enable(); > > +} > > -- > Peter Xu
Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
On Thu, Oct 08, 2020 at 10:59:40AM -0400, Peter Xu wrote: > On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote: > > +static void tick_nohz_kick_task(struct task_struct *tsk) > > +{ > > + int cpu = task_cpu(tsk); > > + > > + /* > > +* If the task concurrently migrates to another cpu, > > +* we guarantee it sees the new tick dependency upon > > +* schedule. > > +* > > +* > > +* set_task_cpu(p, cpu); > > +* STORE p->cpu = @cpu > > +* __schedule() (switch to task 'p') > > +* LOCK rq->lock > > +* smp_mb__after_spin_lock() STORE p->tick_dep_mask > > +* tick_nohz_task_switch()smp_mb() (atomic_fetch_or()) > > +* LOAD p->tick_dep_mask LOAD p->cpu > > +*/ > > + > > + preempt_disable(); > > Pure question: is preempt_disable() required here? Same question to > tick_nohz_full_kick_all(). I think it serializes against hotplug. > > > + if (cpu_online(cpu)) > > + tick_nohz_full_kick_cpu(cpu); > > + preempt_enable(); > > +}
Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote: > +static void tick_nohz_kick_task(struct task_struct *tsk) > +{ > + int cpu = task_cpu(tsk); > + > + /* > + * If the task concurrently migrates to another cpu, > + * we guarantee it sees the new tick dependency upon > + * schedule. > + * > + * > + * set_task_cpu(p, cpu); > + * STORE p->cpu = @cpu > + * __schedule() (switch to task 'p') > + * LOCK rq->lock > + * smp_mb__after_spin_lock() STORE p->tick_dep_mask > + * tick_nohz_task_switch()smp_mb() (atomic_fetch_or()) > + * LOAD p->tick_dep_mask LOAD p->cpu > + */ > + > + preempt_disable(); Pure question: is preempt_disable() required here? Same question to tick_nohz_full_kick_all(). > + if (cpu_online(cpu)) > + tick_nohz_full_kick_cpu(cpu); > + preempt_enable(); > +} -- Peter Xu
Re: [patch 1/2] nohz: only wakeup a single target cpu when kicking a task
On Wed, Oct 07, 2020 at 03:01:52PM -0300, Marcelo Tosatti wrote: > When adding a tick dependency to a task, its necessary to > wakeup the CPU where the task resides to reevaluate tick > dependencies on that CPU. > > However the current code wakes up all nohz_full CPUs, which > is unnecessary. > > Switch to waking up a single CPU, by using ordering of writes > to task->cpu and task->tick_dep_mask. > > From: Frederic Weisbecker > Suggested-by: Peter Zijlstra > Signed-off-by: Frederic Weisbecker > Signed-off-by: Marcelo Tosatti > > Index: linux-2.6/kernel/time/tick-sched.c > === > --- linux-2.6.orig/kernel/time/tick-sched.c > +++ linux-2.6/kernel/time/tick-sched.c > @@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu) > irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu); > } > > +static void tick_nohz_kick_task(struct task_struct *tsk) > +{ > + int cpu = task_cpu(tsk); > + > + /* > + * If the task concurrently migrates to another cpu, > + * we guarantee it sees the new tick dependency upon > + * schedule. > + * > + * > + * set_task_cpu(p, cpu); > + * STORE p->cpu = @cpu > + * __schedule() (switch to task 'p') > + * LOCK rq->lock > + * smp_mb__after_spin_lock() STORE p->tick_dep_mask > + * tick_nohz_task_switch()smp_mb() (atomic_fetch_or()) > + * LOAD p->tick_dep_mask LOAD p->cpu > + */ > + > + preempt_disable(); > + if (cpu_online(cpu)) > + tick_nohz_full_kick_cpu(cpu); > + preempt_enable(); > +} So we need to kick the CPU unconditionally, or only when the task is actually running? AFAICT we only care about current->tick_dep_mask.
[patch 1/2] nohz: only wakeup a single target cpu when kicking a task
When adding a tick dependency to a task, its necessary to wakeup the CPU where the task resides to reevaluate tick dependencies on that CPU. However the current code wakes up all nohz_full CPUs, which is unnecessary. Switch to waking up a single CPU, by using ordering of writes to task->cpu and task->tick_dep_mask. From: Frederic Weisbecker Suggested-by: Peter Zijlstra Signed-off-by: Frederic Weisbecker Signed-off-by: Marcelo Tosatti Index: linux-2.6/kernel/time/tick-sched.c === --- linux-2.6.orig/kernel/time/tick-sched.c +++ linux-2.6/kernel/time/tick-sched.c @@ -274,6 +274,31 @@ void tick_nohz_full_kick_cpu(int cpu) irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu); } +static void tick_nohz_kick_task(struct task_struct *tsk) +{ + int cpu = task_cpu(tsk); + + /* +* If the task concurrently migrates to another cpu, +* we guarantee it sees the new tick dependency upon +* schedule. +* +* +* set_task_cpu(p, cpu); +* STORE p->cpu = @cpu +* __schedule() (switch to task 'p') +* LOCK rq->lock +* smp_mb__after_spin_lock() STORE p->tick_dep_mask +* tick_nohz_task_switch()smp_mb() (atomic_fetch_or()) +* LOAD p->tick_dep_mask LOAD p->cpu +*/ + + preempt_disable(); + if (cpu_online(cpu)) + tick_nohz_full_kick_cpu(cpu); + preempt_enable(); +} + /* * Kick all full dynticks CPUs in order to force these to re-evaluate * their dependency on the tick and restart it if necessary. @@ -356,19 +381,8 @@ EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cp */ void tick_nohz_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit) { - if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask)) { - if (tsk == current) { - preempt_disable(); - tick_nohz_full_kick(); - preempt_enable(); - } else { - /* -* Some future tick_nohz_full_kick_task() -* should optimize this. -*/ - tick_nohz_full_kick_all(); - } - } + if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask)) + tick_nohz_kick_task(tsk); } EXPORT_SYMBOL_GPL(tick_nohz_dep_set_task);