Re: [PATCH] hrtimer: avoid retrigger_next_event IPI
On Fri, Apr 09 2021 at 13:51, Marcelo Tosatti wrote: > On Fri, Apr 09, 2021 at 04:15:13PM +0200, Thomas Gleixner wrote: >> On Wed, Apr 07 2021 at 10:53, Marcelo Tosatti wrote: >> ---> fail because that newly started timer is on the old offset. > > CPU0 CPU1 > > > clock_was_set() > Case-1: CPU-1 grabs > base->lock before CPU-0: > CPU-0 sees > active_mask[CPU1] and IPIs. > > base = > lock_hrtimer_base(timer, &flags); > if > (__hrtimer_start_range_ns(timer, tim, ... > > hrtimer_reprogram(timer, true); > > > unlock_hrtimer_base(timer, &flags); > > > raw_spin_lock_irqsave(&cpu_base->lock, flags); > if (need_reprogram_timer(cpu_base)) > cpumask_set_cpu(cpu, mask); > else > hrtimer_update_base(cpu_base); > raw_spin_unlock_irqrestore(&cpu_base->lock, flags); > > Case-2: CPU-1 grabs > base->lock after CPU-0: > CPU-0 will have updated > the offsets remotely. > > base = > lock_hrtimer_base(timer, &flags); > if > (__hrtimer_start_range_ns(timer, tim, ... > > hrtimer_reprogram(timer, true); > > > unlock_hrtimer_base(timer, &flags); > > > No? Yeah, you're right. I misread the loop logic. Can we please make that unconditional independent of nohz full. There is no reason to special case it. Thanks, tglx
Re: [PATCH] hrtimer: avoid retrigger_next_event IPI
+CC Anna-Maria. On Fri, Apr 09, 2021 at 04:15:13PM +0200, Thomas Gleixner wrote: > On Wed, Apr 07 2021 at 10:53, Marcelo Tosatti wrote: > > Setting the realtime clock triggers an IPI to all CPUs to reprogram > > hrtimers. > > > > However, only base, boottime and tai clocks have their offsets updated > > base clock? Heh... > And why boottime? Boottime is not affected by a clock > realtime set. It's clock REALTIME and TAI, nothing else. OK! > > +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME)|\ > > +(1U << HRTIMER_BASE_REALTIME_SOFT)|\ > > +(1U << HRTIMER_BASE_BOOTTIME)| \ > > +(1U << HRTIMER_BASE_BOOTTIME_SOFT)|\ > > +(1U << HRTIMER_BASE_TAI)| \ > > +(1U << HRTIMER_BASE_TAI_SOFT)) > > + > > +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base) > > +{ > > + unsigned int active = 0; > > + > > + if (!cpu_base->softirq_activated) > > + active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT; Again, if (cpu_base->softirq_activated), need to IPI (will resend). > > + active = active | (cpu_base->active_bases & HRTIMER_ACTIVE_HARD); > > + > > + if ((active & CLOCK_SET_BASES) == 0) > > + return false; > > + > > + return true; > > +} > > Errm. What? > > + /* Avoid interrupting nohz_full CPUs if possible */ > > + preempt_disable(); > > + for_each_online_cpu(cpu) { > > + if (tick_nohz_full_cpu(cpu)) { > > + unsigned long flags; > > + struct hrtimer_cpu_base *cpu_base = > > &per_cpu(hrtimer_bases, cpu); > > + > > + raw_spin_lock_irqsave(&cpu_base->lock, flags); > > + if (need_reprogram_timer(cpu_base)) > > + cpumask_set_cpu(cpu, mask); > > + else > > + hrtimer_update_base(cpu_base); > > + raw_spin_unlock_irqrestore(&cpu_base->lock, flags); > > + } > > + } > > How is that supposed to be correct? > > CPU0 CPU1 > > clock_was_set() hrtimer_start(CLOCK_REALTIME) > > if (!active_mask[CPU1] & XXX) > continue; > active_mask |= REALTIME; > > ---> fail because that newly started timer is on the old offset. CPU0CPU1 clock_was_set() Case-1: CPU-1 grabs base->lock before CPU-0: CPU-0 sees active_mask[CPU1] and IPIs. base = lock_hrtimer_base(timer, &flags); if (__hrtimer_start_range_ns(timer, tim, ... hrtimer_reprogram(timer, true); unlock_hrtimer_base(timer, &flags); raw_spin_lock_irqsave(&cpu_base->lock, flags); if (need_reprogram_timer(cpu_base)) cpumask_set_cpu(cpu, mask); else hrtimer_update_base(cpu_base); raw_spin_unlock_irqrestore(&cpu_base->lock, flags); Case-2: CPU-1 grabs base->lock after CPU-0: CPU-0 will have updated the offsets remotely. base = lock_hrtimer_base(timer, &flags); if (__hrtimer_start_range_ns(timer, tim, ... hrtimer_reprogram(timer, true); unlock_hrtimer_base(timer, &flags); No?
Re: [PATCH] hrtimer: avoid retrigger_next_event IPI
On Wed, Apr 07 2021 at 10:53, Marcelo Tosatti wrote: > Setting the realtime clock triggers an IPI to all CPUs to reprogram > hrtimers. > > However, only base, boottime and tai clocks have their offsets updated base clock? And why boottime? Boottime is not affected by a clock realtime set. It's clock REALTIME and TAI, nothing else. > +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME)| \ > + (1U << HRTIMER_BASE_REALTIME_SOFT)|\ > + (1U << HRTIMER_BASE_BOOTTIME)| \ > + (1U << HRTIMER_BASE_BOOTTIME_SOFT)|\ > + (1U << HRTIMER_BASE_TAI)| \ > + (1U << HRTIMER_BASE_TAI_SOFT)) > + > +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base) > +{ > + unsigned int active = 0; > + > + if (!cpu_base->softirq_activated) > + active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT; > + > + active = active | (cpu_base->active_bases & HRTIMER_ACTIVE_HARD); > + > + if ((active & CLOCK_SET_BASES) == 0) > + return false; > + > + return true; > +} Errm. > + /* Avoid interrupting nohz_full CPUs if possible */ > + preempt_disable(); > + for_each_online_cpu(cpu) { > + if (tick_nohz_full_cpu(cpu)) { > + unsigned long flags; > + struct hrtimer_cpu_base *cpu_base = > &per_cpu(hrtimer_bases, cpu); > + > + raw_spin_lock_irqsave(&cpu_base->lock, flags); > + if (need_reprogram_timer(cpu_base)) > + cpumask_set_cpu(cpu, mask); > + else > + hrtimer_update_base(cpu_base); > + raw_spin_unlock_irqrestore(&cpu_base->lock, flags); > + } > + } How is that supposed to be correct? CPU0CPU1 clock_was_set() hrtimer_start(CLOCK_REALTIME) if (!active_mask[CPU1] & XXX) continue; active_mask |= REALTIME; ---> fail because that newly started timer is on the old offset. Thanks, tglx
Re: [PATCH] hrtimer: avoid retrigger_next_event IPI
On Thu, Apr 08, 2021 at 12:14:57AM +0200, Frederic Weisbecker wrote: > On Wed, Apr 07, 2021 at 10:53:01AM -0300, Marcelo Tosatti wrote: > > > > Setting the realtime clock triggers an IPI to all CPUs to reprogram > > hrtimers. > > > > However, only base, boottime and tai clocks have their offsets updated > > (and therefore potentially require a reprogram). > > > > If the CPU is a nohz_full one, check if it only has > > monotonic active timers, and in that case update the > > realtime base offsets, skipping the IPI. > > > > This reduces interruptions to nohz_full CPUs. > > > > Signed-off-by: Marcelo Tosatti > > > > diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c > > index 743c852e10f2..b42b1a434b22 100644 > > --- a/kernel/time/hrtimer.c > > +++ b/kernel/time/hrtimer.c > > @@ -853,6 +853,28 @@ static void hrtimer_reprogram(struct hrtimer *timer, > > bool reprogram) > > tick_program_event(expires, 1); > > } > > > > +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME)|\ > > +(1U << HRTIMER_BASE_REALTIME_SOFT)|\ > > +(1U << HRTIMER_BASE_BOOTTIME)| \ > > +(1U << HRTIMER_BASE_BOOTTIME_SOFT)|\ > > +(1U << HRTIMER_BASE_TAI)| \ > > +(1U << HRTIMER_BASE_TAI_SOFT)) > > + > > +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base) > > +{ > > + unsigned int active = 0; > > + > > + if (!cpu_base->softirq_activated) > > + active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT; If cpu_base->softirq_activated == 1, should IPI as well. > > + active = active | (cpu_base->active_bases & HRTIMER_ACTIVE_HARD); > > + > > + if ((active & CLOCK_SET_BASES) == 0) > > + return false; > > + > > + return true; > > +} > > + > > /* > > * Clock realtime was set > > * > > @@ -867,9 +889,41 @@ static void hrtimer_reprogram(struct hrtimer *timer, > > bool reprogram) > > void clock_was_set(void) > > { > > #ifdef CONFIG_HIGH_RES_TIMERS > > - /* Retrigger the CPU local events everywhere */ > > - on_each_cpu(retrigger_next_event, NULL, 1); > > + cpumask_var_t mask; > > + int cpu; > > + > > + if (!tick_nohz_full_enabled()) { > > + /* Retrigger the CPU local events everywhere */ > > + on_each_cpu(retrigger_next_event, NULL, 1); > > + goto set_timerfd; > > + } > > + > > + if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) { > > + on_each_cpu(retrigger_next_event, NULL, 1); > > + goto set_timerfd; > > + } > > + > > + /* Avoid interrupting nohz_full CPUs if possible */ > > + preempt_disable(); > > + for_each_online_cpu(cpu) { > > + if (tick_nohz_full_cpu(cpu)) { > > + unsigned long flags; > > + struct hrtimer_cpu_base *cpu_base = > > &per_cpu(hrtimer_bases, cpu); > > + > > + raw_spin_lock_irqsave(&cpu_base->lock, flags); > > + if (need_reprogram_timer(cpu_base)) > > + cpumask_set_cpu(cpu, mask); > > + else > > + hrtimer_update_base(cpu_base); > > + raw_spin_unlock_irqrestore(&cpu_base->lock, flags); > > + } > > You forgot to add the housekeeping CPUs to the mask. So people are using: console=tty0 console=ttyS0,115200n8 skew_tick=1 nohz=on rcu_nocbs=8-31 tuned.non_isolcpus=00ff intel_pstate=disable nosoftlockup tsc=nowatchdog intel_iommu=on iommu=pt isolcpus=managed_irq,8-31 systemd.cpu_affinity=0,1,2,3,4,5,6,7 default_hugepagesz=1G hugepagesz=2M hugepages=128 nohz_full=8-31 And using the nohz_full= CPUs (or subsets of nohz_full= CPUs) in two modes: Either "generic non-isolated applications" (with load-balancing enabled for those CPUs), or for latency sensitive applications. And switching between the modes. In this case, it would only be possible to check for housekeeping CPUs of type MANAGED_IRQ, which would be strange. > As for the need_reprogram_timer() trick, I'll rather defer to Thomas review... > > Thanks. Thanks! > > > + } > > + > > + smp_call_function_many(mask, retrigger_next_event, NULL, 1); > > + preempt_enable(); > > + free_cpumask_var(mask); > > #endif > > +set_timerfd: > > timerfd_clock_was_set(); > > } > > > >
Re: [PATCH] hrtimer: avoid retrigger_next_event IPI
On Wed, Apr 07, 2021 at 10:53:01AM -0300, Marcelo Tosatti wrote: > > Setting the realtime clock triggers an IPI to all CPUs to reprogram > hrtimers. > > However, only base, boottime and tai clocks have their offsets updated > (and therefore potentially require a reprogram). > > If the CPU is a nohz_full one, check if it only has > monotonic active timers, and in that case update the > realtime base offsets, skipping the IPI. > > This reduces interruptions to nohz_full CPUs. > > Signed-off-by: Marcelo Tosatti > > diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c > index 743c852e10f2..b42b1a434b22 100644 > --- a/kernel/time/hrtimer.c > +++ b/kernel/time/hrtimer.c > @@ -853,6 +853,28 @@ static void hrtimer_reprogram(struct hrtimer *timer, > bool reprogram) > tick_program_event(expires, 1); > } > > +#define CLOCK_SET_BASES ((1U << HRTIMER_BASE_REALTIME)| \ > + (1U << HRTIMER_BASE_REALTIME_SOFT)|\ > + (1U << HRTIMER_BASE_BOOTTIME)| \ > + (1U << HRTIMER_BASE_BOOTTIME_SOFT)|\ > + (1U << HRTIMER_BASE_TAI)| \ > + (1U << HRTIMER_BASE_TAI_SOFT)) > + > +static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base) > +{ > + unsigned int active = 0; > + > + if (!cpu_base->softirq_activated) > + active = cpu_base->active_bases & HRTIMER_ACTIVE_SOFT; > + > + active = active | (cpu_base->active_bases & HRTIMER_ACTIVE_HARD); > + > + if ((active & CLOCK_SET_BASES) == 0) > + return false; > + > + return true; > +} > + > /* > * Clock realtime was set > * > @@ -867,9 +889,41 @@ static void hrtimer_reprogram(struct hrtimer *timer, > bool reprogram) > void clock_was_set(void) > { > #ifdef CONFIG_HIGH_RES_TIMERS > - /* Retrigger the CPU local events everywhere */ > - on_each_cpu(retrigger_next_event, NULL, 1); > + cpumask_var_t mask; > + int cpu; > + > + if (!tick_nohz_full_enabled()) { > + /* Retrigger the CPU local events everywhere */ > + on_each_cpu(retrigger_next_event, NULL, 1); > + goto set_timerfd; > + } > + > + if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) { > + on_each_cpu(retrigger_next_event, NULL, 1); > + goto set_timerfd; > + } > + > + /* Avoid interrupting nohz_full CPUs if possible */ > + preempt_disable(); > + for_each_online_cpu(cpu) { > + if (tick_nohz_full_cpu(cpu)) { > + unsigned long flags; > + struct hrtimer_cpu_base *cpu_base = > &per_cpu(hrtimer_bases, cpu); > + > + raw_spin_lock_irqsave(&cpu_base->lock, flags); > + if (need_reprogram_timer(cpu_base)) > + cpumask_set_cpu(cpu, mask); > + else > + hrtimer_update_base(cpu_base); > + raw_spin_unlock_irqrestore(&cpu_base->lock, flags); > + } You forgot to add the housekeeping CPUs to the mask. As for the need_reprogram_timer() trick, I'll rather defer to Thomas review... Thanks. > + } > + > + smp_call_function_many(mask, retrigger_next_event, NULL, 1); > + preempt_enable(); > + free_cpumask_var(mask); > #endif > +set_timerfd: > timerfd_clock_was_set(); > } > >
Re: [PATCH] hrtimer: avoid retrigger_next_event IPI
Hi Marcelo, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on tip/timers/core] [also build test WARNING on linux/master linus/master v5.12-rc6 next-20210407] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Marcelo-Tosatti/hrtimer-avoid-retrigger_next_event-IPI/20210407-233005 base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git d4c7c28806616809e3baa0b7cd8c665516b2726d config: arm64-randconfig-r032-20210407 (attached as .config) compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project c060945b23a1c54d4b2a053ff4b093a2277b303d) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install arm64 cross compiling tool for clang build # apt-get install binutils-aarch64-linux-gnu # https://github.com/0day-ci/linux/commit/defd4db9d63d1f16e3e21862bd9c9105a9f6a7e9 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Marcelo-Tosatti/hrtimer-avoid-retrigger_next_event-IPI/20210407-233005 git checkout defd4db9d63d1f16e3e21862bd9c9105a9f6a7e9 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All warnings (new ones prefixed by >>): kernel/time/hrtimer.c:120:21: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides] [CLOCK_REALTIME]= HRTIMER_BASE_REALTIME, ^ kernel/time/hrtimer.c:118:27: note: previous initialization is here [0 ... MAX_CLOCKS - 1] = HRTIMER_MAX_CLOCK_BASES, ^~~ kernel/time/hrtimer.c:121:22: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides] [CLOCK_MONOTONIC] = HRTIMER_BASE_MONOTONIC, ^~ kernel/time/hrtimer.c:118:27: note: previous initialization is here [0 ... MAX_CLOCKS - 1] = HRTIMER_MAX_CLOCK_BASES, ^~~ kernel/time/hrtimer.c:122:21: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides] [CLOCK_BOOTTIME]= HRTIMER_BASE_BOOTTIME, ^ kernel/time/hrtimer.c:118:27: note: previous initialization is here [0 ... MAX_CLOCKS - 1] = HRTIMER_MAX_CLOCK_BASES, ^~~ kernel/time/hrtimer.c:123:17: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides] [CLOCK_TAI] = HRTIMER_BASE_TAI, ^~~~ kernel/time/hrtimer.c:118:27: note: previous initialization is here [0 ... MAX_CLOCKS - 1] = HRTIMER_MAX_CLOCK_BASES, ^~~ >> kernel/time/hrtimer.c:944:1: warning: unused label 'set_timerfd' >> [-Wunused-label] set_timerfd: ^~~~ kernel/time/hrtimer.c:147:20: warning: unused function 'is_migration_base' [-Wunused-function] static inline bool is_migration_base(struct hrtimer_clock_base *base) ^ kernel/time/hrtimer.c:650:19: warning: unused function 'hrtimer_hres_active' [-Wunused-function] static inline int hrtimer_hres_active(void) ^ kernel/time/hrtimer.c:881:13: warning: unused function 'need_reprogram_timer' [-Wunused-function] static bool need_reprogram_timer(struct hrtimer_cpu_base *cpu_base) ^ kernel/time/hrtimer.c:1793:20: warning: unused function '__hrtimer_peek_ahead_timers' [-Wunused-function] static inline void __hrtimer_peek_ahead_timers(void) { } ^ 9 warnings generated. vim +/set_timerfd +944 kernel/time/hrtimer.c 895 896 /* 897 * Clock realtime was set 898 * 899 * Change the offset of the realtime clock vs. the monotonic 900 * clock. 901 * 902 * We might have to reprogram the high resolution timer interrupt. On 903 * SMP we call the architecture specific code to retrigger _all_ high 904 * resolution timer interrupts. On UP we just disable interrupts and 905 * call the high resolution interrupt code. 906 */ 907 void clock_was_set(void) 908 { 909 #ifdef CONFIG_HIGH_RES_TIMERS 910 cpumask_var_t mask; 911 int cpu; 912