Re: [RFC PATCH v4 12/21] watchdog/hardlockup/hpet: Adjust timer expiration on the number of monitored CPUs

2019-06-18 Thread Ricardo Neri
On Tue, Jun 11, 2019 at 10:11:04PM +0200, Thomas Gleixner wrote:
> On Thu, 23 May 2019, Ricardo Neri wrote:
> > @@ -52,10 +59,10 @@ static void kick_timer(struct hpet_hld_data *hdata, 
> > bool force)
> > return;
> >  
> > if (hdata->has_periodic)
> > -   period = watchdog_thresh * hdata->ticks_per_second;
> > +   period = watchdog_thresh * hdata->ticks_per_cpu;
> >  
> > count = hpet_readl(HPET_COUNTER);
> > -   new_compare = count + watchdog_thresh * hdata->ticks_per_second;
> > +   new_compare = count + watchdog_thresh * hdata->ticks_per_cpu;
> > hpet_set_comparator(hdata->num, (u32)new_compare, (u32)period);
> 
> So with this you might get close to the point where you trip over the SMI
> induced madness where CPUs vanish for several milliseconds in some value
> add code. You really want to do a read back of the hpet to detect that. See
> the comment in the hpet code. RHEL 7/8 allow up to 768 logical CPUs

Do you mean adding a readback to check if the new compare value is
greater than the current count? Similar to the check at the end of
hpet_next_event():

return res < HPET_MIN_CYCLES ? -ETIME : 0;

In such a case, should it try to set the comparator again? I think it
should, as otherwise the hardlockup detector would stop working.

Thanks and BR,
Ricardo
> 
> Thanks,
> 
>   tglx
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v4 12/21] watchdog/hardlockup/hpet: Adjust timer expiration on the number of monitored CPUs

2019-06-11 Thread Thomas Gleixner
On Thu, 23 May 2019, Ricardo Neri wrote:
> @@ -52,10 +59,10 @@ static void kick_timer(struct hpet_hld_data *hdata, bool 
> force)
>   return;
>  
>   if (hdata->has_periodic)
> - period = watchdog_thresh * hdata->ticks_per_second;
> + period = watchdog_thresh * hdata->ticks_per_cpu;
>  
>   count = hpet_readl(HPET_COUNTER);
> - new_compare = count + watchdog_thresh * hdata->ticks_per_second;
> + new_compare = count + watchdog_thresh * hdata->ticks_per_cpu;
>   hpet_set_comparator(hdata->num, (u32)new_compare, (u32)period);

So with this you might get close to the point where you trip over the SMI
induced madness where CPUs vanish for several milliseconds in some value
add code. You really want to do a read back of the hpet to detect that. See
the comment in the hpet code. RHEL 7/8 allow up to 768 logical CPUs

Thanks,

tglx


[RFC PATCH v4 12/21] watchdog/hardlockup/hpet: Adjust timer expiration on the number of monitored CPUs

2019-05-23 Thread Ricardo Neri
Each CPU should be monitored for hardlockups every watchdog_thresh seconds.
Since all the CPUs in the system are monitored by the same timer and the
timer interrupt is rotated among the monitored CPUs, the timer must expire
every watchdog_thresh/N seconds; where N is the number of monitored CPUs.
Use the new member of struct hld_data, ticks_per_cpu, to store the
aforementioned quantity.

The ticks-per-CPU quantity is updated every time the number of monitored
CPUs changes: when the watchdog is enabled or disabled for a specific CPU.
If the timer is used in periodic mode, it needs to be adjusted to reflect
the new expected expiration.

Cc: Ashok Raj 
Cc: Andi Kleen 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: Jacob Pan 
Cc: "Rafael J. Wysocki" 
Cc: Don Zickus 
Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Cc: Frederic Weisbecker 
Cc: Alexei Starovoitov 
Cc: Babu Moger 
Cc: Mathieu Desnoyers 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Andrew Morton 
Cc: Philippe Ombredanne 
Cc: Colin Ian King 
Cc: Byungchul Park 
Cc: "Paul E. McKenney" 
Cc: "Luis R. Rodriguez" 
Cc: Waiman Long 
Cc: Josh Poimboeuf 
Cc: Randy Dunlap 
Cc: Davidlohr Bueso 
Cc: Marc Zyngier 
Cc: Kai-Heng Feng 
Cc: Konrad Rzeszutek Wilk 
Cc: David Rientjes 
Cc: Stephane Eranian 
Cc: Suravee Suthikulpanit 
Cc: "Ravi V. Shankar" 
Cc: x...@kernel.org
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Ricardo Neri 
---
 arch/x86/include/asm/hpet.h |  1 +
 arch/x86/kernel/watchdog_hld_hpet.c | 46 +++--
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/hpet.h b/arch/x86/include/asm/hpet.h
index 31fc27508cf3..64acacce095d 100644
--- a/arch/x86/include/asm/hpet.h
+++ b/arch/x86/include/asm/hpet.h
@@ -114,6 +114,7 @@ struct hpet_hld_data {
boolhas_periodic;
u32 num;
u64 ticks_per_second;
+   u64 ticks_per_cpu;
u32 handling_cpu;
u32 enabled_cpus;
struct msi_msg  msi_msg;
diff --git a/arch/x86/kernel/watchdog_hld_hpet.c 
b/arch/x86/kernel/watchdog_hld_hpet.c
index dff4dadabd4c..74aeb0535d08 100644
--- a/arch/x86/kernel/watchdog_hld_hpet.c
+++ b/arch/x86/kernel/watchdog_hld_hpet.c
@@ -45,6 +45,13 @@ static void kick_timer(struct hpet_hld_data *hdata, bool 
force)
 * are able to update the comparator before the counter reaches such new
 * value.
 *
+* Each CPU must be monitored every watch_thresh seconds. Since the
+* timer targets one CPU at a time, it must expire every
+*
+*ticks_per_cpu = watch_thresh * ticks_per_second /enabled_cpus
+*
+* as computed in update_ticks_per_cpu().
+*
 * Let it wrap around if needed.
 */
 
@@ -52,10 +59,10 @@ static void kick_timer(struct hpet_hld_data *hdata, bool 
force)
return;
 
if (hdata->has_periodic)
-   period = watchdog_thresh * hdata->ticks_per_second;
+   period = watchdog_thresh * hdata->ticks_per_cpu;
 
count = hpet_readl(HPET_COUNTER);
-   new_compare = count + watchdog_thresh * hdata->ticks_per_second;
+   new_compare = count + watchdog_thresh * hdata->ticks_per_cpu;
hpet_set_comparator(hdata->num, (u32)new_compare, (u32)period);
 }
 
@@ -234,6 +241,27 @@ static int setup_hpet_irq(struct hpet_hld_data *hdata)
return ret;
 }
 
+/**
+ * update_ticks_per_cpu() - Update the number of HPET ticks per CPU
+ * @hdata: struct with the timer's the ticks-per-second and CPU mask
+ *
+ * From the overall ticks-per-second of the timer, compute the number of ticks
+ * after which the timer should expire to monitor each CPU every watch_thresh
+ * seconds. The ticks-per-cpu quantity is computed using the number of CPUs 
that
+ * the watchdog currently monitors.
+ */
+static void update_ticks_per_cpu(struct hpet_hld_data *hdata)
+{
+   u64 temp = hdata->ticks_per_second;
+
+   /* Only update if there are monitored CPUs. */
+   if (!hdata->enabled_cpus)
+   return;
+
+   do_div(temp, hdata->enabled_cpus);
+   hdata->ticks_per_cpu = temp;
+}
+
 /**
  * hardlockup_detector_hpet_enable() - Enable the hardlockup detector
  * @cpu:   CPU Index in which the watchdog will be enabled.
@@ -246,13 +274,23 @@ void hardlockup_detector_hpet_enable(unsigned int cpu)
 {
cpumask_set_cpu(cpu, to_cpumask(hld_data->cpu_monitored_mask));
 
-   if (!hld_data->enabled_cpus++) {
+   hld_data->enabled_cpus++;
+   update_ticks_per_cpu(hld_data);
+
+   if (hld_data->enabled_cpus == 1) {
hld_data->handling_cpu = cpu;
update_msi_destid(hld_data);
/* Force timer kick when detector is just enabled */
kick_timer(hld_data, true);
enable_timer(hld_data);
}
+
+   /*
+* When in periodic mode, we only kick the timer here. Hence,
+*