Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-22 Thread Colin Cross
On Mon, Jan 14, 2013 at 4:30 PM, Colin Cross  wrote:
> On Mon, Jan 14, 2013 at 4:25 PM, Andrew Morton
>  wrote:
>> On Mon, 14 Jan 2013 16:19:23 -0800
>> Colin Cross  wrote:
>>
>>> >> +static void watchdog_check_hardlockup_other_cpu(void)
>>> >> +{
>>> >> + unsigned int next_cpu;
>>> >> +
>>> >> + /*
>>> >> +  * Test for hardlockups every 3 samples.  The sample period is
>>> >> +  *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
>>> >> +  *  watchdog_thresh (over by 20%).
>>> >> +  */
>>> >> + if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
>>> >> + return;
>>> >
>>> > The hardwired interval Seems Wrong.  watchdog_thresh is tunable at 
>>> > runtime.
>>> >
>>> > The comment could do with some fleshing out.  *why* do we want to test
>>> > at an interval "slightly over watchdog_thresh"?  What's going on here?
>>>
>>> I'll reword it.  We don't want to be slightly over watchdog_thresh,
>>> ideally we would be exactly at watchdog_thresh.  However, since this
>>> relies on the hrtimer interrupts that are scheduled at watchdog_thresh
>>> * 2 / 5, there is no multiple of hrtimer_interrupts that will result
>>> in watchdog_thresh.  watchdog_thresh * 2 / 5 * 3 (watchdog_thresh *
>>> 1.2) is the closest I can get to testing for a hardlockup once every
>>> watchdog_thresh seconds.
>>
>> It needs more than rewording, doesn't it?  What happens if watchdog_thresh is
>> altered at runtime?
>
> I'm not sure what you mean.  If watchdog_thresh changes, the next
> hrtimer interrupt on each cpu will move the following hrtimer
> interrupt forward by the new watchdog_thresh * 2 / 5.  There may be a
> single cycle of watchdog checks at an intermediate period, but nothing
> bad should happen.
>
> This code doesn't ever deal with watchdog_thresh directly; it is only
> counting hrtimer interrupts.  3 hrtimer interrupts is always a
> reasonable approximation of watchdog_thresh.

I think I see the race you were referring to.  When the hrtimer
interrupt fires on the first cpu after changing watchdog_thresh,
it will see the new value and start firing the timer at the new
rate.  The other cpus may not see the new value for as long as one old
sample period.  If the new threshold is less than a third of the old
value, one cpu could easily get 3 hrtimer interrupts while another
cpu doesn't get any, triggering the lockup detector.

The exact same issue is already present in the existing NMI detector,
which is conceptually very similar to this one.  It has two
asynchronous timers, the NMI perf counter and the hrtimer.  If one
timer sees the new threshold before the other, their relative rates
will be wrong and the lockup detector may fire.  Setting
watchdog_nmi_touch is not good enough, as one timer could fire enough
times to trigger the hardlockup detector twice.  The only fix I can
think of is to set a timestamp far enough in the future to catch all
the ordering problems, and to ignore lockups until sched_clock() is
past the timestamp.  I think it would need to be a delay of twice the
larger of the old and new watchdog_thresh values.
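
A minimal sketch of that timestamp idea, with hypothetical names (not
from any posted patch):

/*
 * Sketch only: suppress hardlockup checks while a new sample period
 * propagates to all cpus.  Called from the proc handler that updates
 * watchdog_thresh.
 */
static u64 watchdog_settle_until;	/* ns; 0 means not settling */

static void watchdog_thresh_updated(int old_thresh, int new_thresh)
{
	/* twice the larger threshold covers both raising and lowering */
	u64 delay = 2ULL * max(old_thresh, new_thresh) * NSEC_PER_SEC;

	watchdog_settle_until = sched_clock() + delay;
}

/* ...and at the top of watchdog_check_hardlockup_other_cpu(): */
if (watchdog_settle_until && sched_clock() < watchdog_settle_until)
	return;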

But it's worse than that: the NMI perf counter is never updated when
watchdog_thresh changes, so raising watchdog_thresh by more than 2.5x
may trigger the NMI detector.

I'll send a separate patch for the first problem, but I don't have
anything to test resetting the NMI counter on, so I'll leave that
issue for someone else.


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-15 Thread Paul E. McKenney
On Tue, Jan 15, 2013 at 01:13:10AM +0100, Frederic Weisbecker wrote:
> 2013/1/11 Colin Cross :
> > Emulate NMIs on systems where they are not available by using timer
> > interrupts on other cpus.  Each cpu will use its softlockup hrtimer
> > to check that the next cpu is processing hrtimer interrupts by
> > verifying that a counter is increasing.
> >
> > This patch is useful on systems where the hardlockup detector is not
> > available due to a lack of NMIs, for example most ARM SoCs.
> > Without this patch any cpu stuck with interrupts disabled can
> > cause a hardware watchdog reset with no debugging information,
> > but with this patch the kernel can detect the lockup and panic,
> > which can result in useful debugging info.
> >
> > Signed-off-by: Colin Cross 
> 
> I believe this is pretty much what the RCU stall detector does
> already: checks for other CPUs being responsive. The only difference
> is in how it checks that. For RCU it's about checking for CPUs
> reporting quiescent states when requested to do so. In your case it's
> about ensuring the hrtimer interrupt is well handled.
> 
> One thing you can do is to enqueue an RCU callback (call_rcu()) every
> minute so you can force other CPUs to report quiescent states
> periodically and thus check for lockups.

This would work in all but one case, and that is where RCU believes
that the non-responsive CPU is in dyntick-idle mode.  In that case,
RCU would not be expecting it to respond and would therefore ignore
any non-responsiveness.

> Now you'll face the same problem in the end: if you don't have NMIs,
> you won't have a very useful report.

Indeed, I must confess that I have thus far chickened out on solving
the general NMI problem.  The RCU stall detector does try to look at
stacks remotely in some cases, but this is often unreliable, and some
architectures seem to refuse to produce a remote stack trace.

Thanx, Paul


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Colin Cross
On Mon, Jan 14, 2013 at 6:48 PM, Frederic Weisbecker  wrote:
> 2013/1/15 Colin Cross :
>> On Mon, Jan 14, 2013 at 4:25 PM, Frederic Weisbecker  
>> wrote:
>>> 2013/1/15 Colin Cross :
>>>> On Mon, Jan 14, 2013 at 4:13 PM, Frederic Weisbecker
>>>> wrote:
>>>>> I believe this is pretty much what the RCU stall detector does
>>>>> already: checks for other CPUs being responsive. The only difference
>>>>> is in how it checks that. For RCU it's about checking for CPUs
>>>>> reporting quiescent states when requested to do so. In your case it's
>>>>> about ensuring the hrtimer interrupt is well handled.
>>>>>
>>>>> One thing you can do is to enqueue an RCU callback (call_rcu()) every
>>>>> minute so you can force other CPUs to report quiescent states
>>>>> periodically and thus check for lockups.
>>>>
>>>> That's a good point, I'll take a look at using that.  A minute is too
>>>> long, some SoCs have maximum HW watchdog periods of under 30 seconds,
>>>> but a call_rcu every 10-20 seconds might be sufficient.
>>>
>>> Sure. And you can tune CONFIG_RCU_CPU_STALL_TIMEOUT accordingly.
>>
>> After considering this, I think the hrtimer watchdog is more useful.
>> RCU stalls are not usually panic events, and I wouldn't want to add a
>> panic on every RCU stall.  The lack of stack traces on the affected
>> cpu makes a panic important.  I'm planning to add an ARM DBGPCSR panic
>> handler, which will be able to dump the PC of a stuck cpu even if it
>> is not responding to interrupts.  kexec or kgdb on panic might also
>> allow some inspection of the stack on the stuck cpu.
>>
>> Failing to process interrupts is a much more serious event than an RCU
>> stall, and being able to detect them separately may be very valuable
>> for debugging.
>
> RCU stalls can happen for different reasons: softlockup (failure to
> schedule another task), hardlockup (failure to process interrupts), or
> a bug in RCU itself. But if you have a hardlockup, it will report it.

It will report it, but in the same way that it reports a less serious
issue, and in this case with zero debugging information, since the
affected cpu won't dump its backtrace.  Better than nothing, but not
as useful as a panic can be.

> Now why do you need a panic in any case? I don't know DBGPCSR; is this
> a breakpoint register? How do you plan to use it remotely from the CPU
> that detects the lockup?

Panics can trigger extra debugging tools, like my previous examples
kexec and kgdb.

DBGPCSR is the "DeBuG Program Counter Sampling Register".  It is a
memory-mapped register available on many (all?) ARM Cortex cpus that
returns a recently sampled PC value for the cpu.  I have used it along
with this patch, and it produces very useful information.
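
For illustration, a rough sketch of such a panic handler; the per-cpu
debug base mappings and the DBGPCSR offset are SoC-specific, and the
names and values here are only placeholders:

#include <linux/cpumask.h>
#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/notifier.h>

#define DBGPCSR_OFFSET	0x084	/* assumption; check the SoC's TRM */

/* one mapping per cpu's debug registers, ioremap()ed at SoC init */
static void __iomem *cpu_dbg_base[NR_CPUS];

static int dbgpcsr_panic_notify(struct notifier_block *nb,
				unsigned long event, void *unused)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		if (!cpu_dbg_base[cpu])
			continue;
		/*
		 * DBGPCSR returns a recently sampled PC even if the
		 * cpu is stuck with interrupts disabled.
		 */
		pr_emerg("CPU%d recent PC: 0x%08x\n", cpu,
			 readl_relaxed(cpu_dbg_base[cpu] + DBGPCSR_OFFSET));
	}
	return NOTIFY_DONE;
}

static struct notifier_block dbgpcsr_panic_nb = {
	.notifier_call = dbgpcsr_panic_notify,
};

/*
 * At init:
 * atomic_notifier_chain_register(&panic_notifier_list, &dbgpcsr_panic_nb);
 */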


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Frederic Weisbecker
2013/1/15 Colin Cross :
> On Mon, Jan 14, 2013 at 4:25 PM, Frederic Weisbecker  
> wrote:
>> 2013/1/15 Colin Cross :
>>> On Mon, Jan 14, 2013 at 4:13 PM, Frederic Weisbecker  
>>> wrote:
>>>> I believe this is pretty much what the RCU stall detector does
>>>> already: checks for other CPUs being responsive. The only difference
>>>> is in how it checks that. For RCU it's about checking for CPUs
>>>> reporting quiescent states when requested to do so. In your case it's
>>>> about ensuring the hrtimer interrupt is well handled.
>>>>
>>>> One thing you can do is to enqueue an RCU callback (call_rcu()) every
>>>> minute so you can force other CPUs to report quiescent states
>>>> periodically and thus check for lockups.
>>>
>>> That's a good point, I'll take a look at using that.  A minute is too
>>> long, some SoCs have maximum HW watchdog periods of under 30 seconds,
>>> but a call_rcu every 10-20 seconds might be sufficient.
>>
>> Sure. And you can tune CONFIG_RCU_CPU_STALL_TIMEOUT accordingly.
>
> After considering this, I think the hrtimer watchdog is more useful.
> RCU stalls are not usually panic events, and I wouldn't want to add a
> panic on every RCU stall.  The lack of stack traces on the affected
> cpu makes a panic important.  I'm planning to add an ARM DBGPCSR panic
> handler, which will be able to dump the PC of a stuck cpu even if it
> is not responding to interrupts.  kexec or kgdb on panic might also
> allow some inspection of the stack on the stuck cpu.
>
> Failing to process interrupts is a much more serious event than an RCU
> stall, and being able to detect them separately may be very valuable
> for debugging.

RCU stalls can happen for different reasons: softlockup (failure to
schedule another task), hardlockup (failure to process interrupts), or
a bug in RCU itself. But if you have a hardlockup, it will report it.

Now why do you need a panic in any case? I don't know DBGPCSR; is this
a breakpoint register? How do you plan to use it remotely from the CPU
that detects the lockup?


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Colin Cross
On Mon, Jan 14, 2013 at 4:25 PM, Frederic Weisbecker  wrote:
> 2013/1/15 Colin Cross :
>> On Mon, Jan 14, 2013 at 4:13 PM, Frederic Weisbecker  
>> wrote:
>>> I believe this is pretty much what the RCU stall detector does
>>> already: checks for other CPUs being responsive. The only difference
>>> is in how it checks that. For RCU it's about checking for CPUs
>>> reporting quiescent states when requested to do so. In your case it's
>>> about ensuring the hrtimer interrupt is well handled.
>>>
>>> One thing you can do is to enqueue an RCU callback (call_rcu()) every
>>> minute so you can force other CPUs to report quiescent states
>>> periodically and thus check for lockups.
>>
>> That's a good point, I'll take a look at using that.  A minute is too
>> long, some SoCs have maximum HW watchdog periods of under 30 seconds,
>> but a call_rcu every 10-20 seconds might be sufficient.
>
> Sure. And you can tune CONFIG_RCU_CPU_STALL_TIMEOUT accordingly.

After considering this, I think the hrtimer watchdog is more useful.
RCU stalls are not usually panic events, and I wouldn't want to add a
panic on every RCU stall.  The lack of stack traces on the affected
cpu makes a panic important.  I'm planning to add an ARM DBGPCSR panic
handler, which will be able to dump the PC of a stuck cpu even if it
is not responding to interrupts.  kexec or kgdb on panic might also
allow some inspection of the stack on the stuck cpu.

Failing to process interrupts is a much more serious event than an RCU
stall, and being able to detect them separately may be very valuable
for debugging.


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Colin Cross
On Mon, Jan 14, 2013 at 4:19 PM, Colin Cross  wrote:
> On Mon, Jan 14, 2013 at 3:49 PM, Andrew Morton
>  wrote:
>> On Fri, 11 Jan 2013 13:51:48 -0800
>> Colin Cross  wrote:
>>
>>> Emulate NMIs on systems where they are not available by using timer
>>> interrupts on other cpus.  Each cpu will use its softlockup hrtimer
>>> to check that the next cpu is processing hrtimer interrupts by
>>> verifying that a counter is increasing.
>>
>> Seems sensible.
>>
>>> This patch is useful on systems where the hardlockup detector is not
>>> available due to a lack of NMIs, for example most ARM SoCs.
>>> Without this patch any cpu stuck with interrupts disabled can
>>> cause a hardware watchdog reset with no debugging information,
>>> but with this patch the kernel can detect the lockup and panic,
>>> which can result in useful debugging info.
>>
>> But we don't get the target cpu's stack, yes?  That's a pretty big loss.
>
> It's a huge loss, but it's still useful.  For one, it can separate
> "linux locked up one cpu" bugs from "the whole cpu complex stopped
> responding" bugs, which are much more common than you would hope on
> ARM cpus.  Also, as a separate patch I'm hoping to add reading the
> DBGPCSR register of both cpus during panic, which will at least give
> you the PC of the cpu that is stuck.
>
>>>
>>> ...
>>>
>>> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
>>> +static unsigned int watchdog_next_cpu(unsigned int cpu)
>>> +{
>>> + cpumask_t cpus = watchdog_cpus;
>>
>> cpumask_t can be tremendously huge and putting one on the stack is
>> risky.  Can we use watchdog_cpus directly here?  Perhaps with a lock?
>> or take a copy into a static local, with a lock?
>
> Sure, I can use a lock around it.  I'm used to very small numbers of cpus.
>

On second thought, I'm just going to remove the local copy and read
the global directly.  watchdog_cpus is updated with atomic bitmask
operations, so any value a direct read of the global could return
could equally have been observed with a slightly different order of
updates.  The local copy is also not completely atomic, since the
bitmask could span multiple words.  All intermediate values during
multiple sequential updates should already be handled by setting
watchdog_nmi_touch on the appropriate cpus during watchdog_nmi_enable
and watchdog_nmi_disable.
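
With the copy removed, the helper quoted above would presumably end up
looking something like this (a sketch, not the final patch):

static unsigned int watchdog_next_cpu(unsigned int cpu)
{
	unsigned int next_cpu;

	/*
	 * watchdog_cpus is only modified with atomic bitops from the
	 * hotplug callbacks, so an in-place read can at worst see a
	 * cpu appear or disappear slightly early or late; those
	 * windows are covered by watchdog_nmi_touch.
	 */
	next_cpu = cpumask_next(cpu, &watchdog_cpus);
	if (next_cpu >= nr_cpu_ids)
		next_cpu = cpumask_first(&watchdog_cpus);

	if (next_cpu == cpu)
		return nr_cpu_ids;

	return next_cpu;
}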


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Colin Cross
On Mon, Jan 14, 2013 at 4:25 PM, Andrew Morton
 wrote:
> On Mon, 14 Jan 2013 16:19:23 -0800
> Colin Cross  wrote:
>
>> >> +static void watchdog_check_hardlockup_other_cpu(void)
>> >> +{
>> >> + unsigned int next_cpu;
>> >> +
>> >> + /*
>> >> +  * Test for hardlockups every 3 samples.  The sample period is
>> >> +  *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
>> >> +  *  watchdog_thresh (over by 20%).
>> >> +  */
>> >> + if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
>> >> + return;
>> >
>> > The hardwired interval Seems Wrong.  watchdog_thresh is tunable at runtime.
>> >
>> > The comment could do with some fleshing out.  *why* do we want to test
>> > at an interval "slightly over watchdog_thresh"?  What's going on here?
>>
>> I'll reword it.  We don't want to be slightly over watchdog_thresh,
>> ideally we would be exactly at watchdog_thresh.  However, since this
>> relies on the hrtimer interrupts that are scheduled at watchdog_thresh
>> * 2 / 5, there is no multiple of hrtimer_interrupts that will result
>> in watchdog_thresh.  watchdog_thresh * 2 / 5 * 3 (watchdog_thresh *
>> 1.2) is the closest I can get to testing for a hardlockup once every
>> watchdog_thresh seconds.
>
> It needs more than rewording, doesn't it?  What happens if watchdog_thresh is
> altered at runtime?

I'm not sure what you mean.  If watchdog_thresh changes, the next
hrtimer interrupt on each cpu will move the following hrtimer
interrupt forward by the new watchdog_thresh * 2 / 5.  There may be a
single cycle of watchdog checks at an intermediate period, but nothing
bad should happen.

This code doesn't ever deal with watchdog_thresh directly; it is only
counting hrtimer interrupts.  3 hrtimer interrupts is always a
reasonable approximation of watchdog_thresh.


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Frederic Weisbecker
2013/1/15 Colin Cross :
> On Mon, Jan 14, 2013 at 4:13 PM, Frederic Weisbecker  
> wrote:
>> I believe this is pretty much what the RCU stall detector does
>> already: checks for other CPUs being responsive. The only difference
>> is in how it checks that. For RCU it's about checking for CPUs
>> reporting quiescent states when requested to do so. In your case it's
>> about ensuring the hrtimer interrupt is well handled.
>>
>> One thing you can do is to enqueue an RCU callback (call_rcu()) every
>> minute so you can force other CPUs to report quiescent states
>> periodically and thus check for lockups.
>
> That's a good point, I'll take a look at using that.  A minute is too
> long, some SoCs have maximum HW watchdog periods of under 30 seconds,
> but a call_rcu every 10-20 seconds might be sufficient.

Sure. And you can tune CONFIG_RCU_CPU_STALL_TIMEOUT accordingly.


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Andrew Morton
On Mon, 14 Jan 2013 16:19:23 -0800
Colin Cross  wrote:

> >> +static void watchdog_check_hardlockup_other_cpu(void)
> >> +{
> >> + unsigned int next_cpu;
> >> +
> >> + /*
> >> +  * Test for hardlockups every 3 samples.  The sample period is
> >> +  *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
> >> +  *  watchdog_thresh (over by 20%).
> >> +  */
> >> + if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
> >> + return;
> >
> > The hardwired interval Seems Wrong.  watchdog_thresh is tunable at runtime.
> >
> > The comment could do with some fleshing out.  *why* do we want to test
> > at an interval "slightly over watchdog_thresh"?  What's going on here?
> 
> I'll reword it.  We don't want to be slightly over watchdog_thresh,
> ideally we would be exactly at watchdog_thresh.  However, since this
> relies on the hrtimer interrupts that are scheduled at watchdog_thresh
> * 2 / 5, there is no multiple of hrtimer_interrupts that will result
> in watchdog_thresh.  watchdog_thresh * 2 / 5 * 3 (watchdog_thresh *
> 1.2) is the closest I can get to testing for a hardlockup once every
> watchdog_thresh seconds.

It needs more than rewording, doesn't it?  What happens if watchdog_thresh is
altered at runtime?


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Colin Cross
On Mon, Jan 14, 2013 at 4:13 PM, Frederic Weisbecker  wrote:
> 2013/1/11 Colin Cross :
>> Emulate NMIs on systems where they are not available by using timer
>> interrupts on other cpus.  Each cpu will use its softlockup hrtimer
>> to check that the next cpu is processing hrtimer interrupts by
>> verifying that a counter is increasing.
>>
>> This patch is useful on systems where the hardlockup detector is not
>> available due to a lack of NMIs, for example most ARM SoCs.
>> Without this patch any cpu stuck with interrupts disabled can
>> cause a hardware watchdog reset with no debugging information,
>> but with this patch the kernel can detect the lockup and panic,
>> which can result in useful debugging info.
>>
>> Signed-off-by: Colin Cross 
>
> I believe this is pretty much what the RCU stall detector does
> already: checks for other CPUs being responsive. The only difference
> is in how it checks that. For RCU it's about checking for CPUs
> reporting quiescent states when requested to do so. In your case it's
> about ensuring the hrtimer interrupt is well handled.
>
> One thing you can do is to enqueue an RCU callback (call_rcu()) every
> minute so you can force other CPUs to report quiescent states
> periodically and thus check for lockups.

That's a good point, I'll take a look at using that.  A minute is too
long, some SoCs have maximum HW watchdog periods of under 30 seconds,
but a call_rcu every 10-20 seconds might be sufficient.

> Now you'll face the same problem in the end: if you don't have NMIs,
> you won't have a very useful report.

Yes, but it's still better than a silent reset.


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Colin Cross
On Mon, Jan 14, 2013 at 3:49 PM, Andrew Morton
 wrote:
> On Fri, 11 Jan 2013 13:51:48 -0800
> Colin Cross  wrote:
>
>> Emulate NMIs on systems where they are not available by using timer
>> interrupts on other cpus.  Each cpu will use its softlockup hrtimer
>> to check that the next cpu is processing hrtimer interrupts by
>> verifying that a counter is increasing.
>
> Seems sensible.
>
>> This patch is useful on systems where the hardlockup detector is not
>> available due to a lack of NMIs, for example most ARM SoCs.
>> Without this patch any cpu stuck with interrupts disabled can
>> cause a hardware watchdog reset with no debugging information,
>> but with this patch the kernel can detect the lockup and panic,
>> which can result in useful debugging info.
>
> But we don't get the target cpu's stack, yes?  That's a pretty big loss.

It's a huge loss, but it's still useful.  For one, it can separate
"linux locked up one cpu" bugs from "the whole cpu complex stopped
responding" bugs, which are much more common than you would hope on
ARM cpus.  Also, as a separate patch I'm hoping to add reading the
DBGPCSR register of both cpus during panic, which will at least give
you the PC of the cpu that is stuck.

>>
>> ...
>>
>> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
>> +static unsigned int watchdog_next_cpu(unsigned int cpu)
>> +{
>> + cpumask_t cpus = watchdog_cpus;
>
> cpumask_t can be tremendously huge and putting one on the stack is
> risky.  Can we use watchdog_cpus directly here?  Perhaps with a lock?
> or take a copy into a static local, with a lock?

Sure, I can use a lock around it.  I'm used to very small numbers of cpus.

>> + unsigned int next_cpu;
>> +
>> + next_cpu = cpumask_next(cpu, &cpus);
>> + if (next_cpu >= nr_cpu_ids)
>> + next_cpu = cpumask_first(&cpus);
>> +
>> + if (next_cpu == cpu)
>> + return nr_cpu_ids;
>> +
>> + return next_cpu;
>> +}
>> +
>> +static int is_hardlockup_other_cpu(unsigned int cpu)
>> +{
>> + unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
>> +
>> + if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
>> + return 1;
>> +
>> + per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
>> + return 0;
>> +}
>
> This could return a bool type.

I based it on is_hardlockup, but I can convert it to a bool.

>> +static void watchdog_check_hardlockup_other_cpu(void)
>> +{
>> + unsigned int next_cpu;
>> +
>> + /*
>> +  * Test for hardlockups every 3 samples.  The sample period is
>> +  *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
>> +  *  watchdog_thresh (over by 20%).
>> +  */
>> + if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
>> + return;
>
> The hardwired interval Seems Wrong.  watchdog_thresh is tunable at runtime.
>
> The comment could do with some fleshing out.  *why* do we want to test
> at an interval "slightly over watchdog_thresh"?  What's going on here?

I'll reword it.  We don't want to be slightly over watchdog_thresh,
ideally we would be exactly at watchdog_thresh.  However, since this
relies on the hrtimer interrupts that are scheduled at watchdog_thresh
* 2 / 5, there is no multiple of hrtimer_interrupts that will result
in watchdog_thresh.  watchdog_thresh * 2 / 5 * 3 (watchdog_thresh *
1.2) is the closest I can get to testing for a hardlockup once every
watchdog_thresh seconds.

>> + /* check for a hardlockup on the next cpu */
>> + next_cpu = watchdog_next_cpu(smp_processor_id());
>> + if (next_cpu >= nr_cpu_ids)
>> + return;
>> +
>> + smp_rmb();
>
> Mystery barrier (always) needs an explanatory comment, please.

OK.  Will add:  smp_rmb matches smp_wmb in watchdog_nmi_enable and
watchdog_nmi_disable to ensure watchdog_nmi_touch is updated for a cpu
before any other cpu sees it is online.

>> + if (per_cpu(watchdog_nmi_touch, next_cpu) == true) {
>> + per_cpu(watchdog_nmi_touch, next_cpu) = false;
>> + return;
>> + }
>
> I wonder if a well-timed CPU plug/unplug could result in two CPUs
> simultaneously checking one other CPU's state.

It can, which is why I set watchdog_nmi_touch for a cpu when it comes
online or when the previous cpu goes offline.  That should prevent a
false positive by preventing the first cpu that checks from updating
hrtimer_interrupts_saved.
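
Pieced together from that description, the hotplug paths would look
roughly like this (a sketch based on this thread, not the actual
patch):

static int watchdog_nmi_enable(unsigned int cpu)
{
	/*
	 * The new cpu has no hrtimer history yet, so make whichever
	 * cpu watches it skip its first check.
	 */
	per_cpu(watchdog_nmi_touch, cpu) = true;
	/*
	 * Pairs with the smp_rmb() in watchdog_check_hardlockup_other_cpu():
	 * the touch must be visible before the cpu shows up as watched.
	 */
	smp_wmb();
	cpumask_set_cpu(cpu, &watchdog_cpus);
	return 0;
}

static void watchdog_nmi_disable(unsigned int cpu)
{
	unsigned int next_cpu = watchdog_next_cpu(cpu);

	/*
	 * Whoever was watching this cpu takes over next_cpu, and its
	 * saved hrtimer_interrupts count for next_cpu is stale, so
	 * give the new pairing one free pass too.
	 */
	if (next_cpu < nr_cpu_ids)
		per_cpu(watchdog_nmi_touch, next_cpu) = true;
	smp_wmb();
	cpumask_clear_cpu(cpu, &watchdog_cpus);
}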

>> + if (is_hardlockup_other_cpu(next_cpu)) {
>> + /* only warn once */
>> + if (per_cpu(hard_watchdog_warn, next_cpu) == true)
>> + return;
>> +
>> + if (hardlockup_panic)
>> + panic("Watchdog detected hard LOCKUP on cpu %u", next_cpu);
>> + else
>> + WARN(1, "Watchdog detected hard LOCKUP on cpu %u", next_cpu);
>
> I suggest we use messages here which make it clear to people who read
> kernel output that this was triggered by hrtimers, not by NMI.  Most
> importantly because people will need to know that the CPU which locked
> up is *not this CPU* and that any backtrace from the reporting CPU is
> misleading.
>
> Also, there was never any sense in making the LOCKUP all-caps ;)

Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Frederic Weisbecker
2013/1/11 Colin Cross :
> Emulate NMIs on systems where they are not available by using timer
> interrupts on other cpus.  Each cpu will use its softlockup hrtimer
> to check that the next cpu is processing hrtimer interrupts by
> verifying that a counter is increasing.
>
> This patch is useful on systems where the hardlockup detector is not
> available due to a lack of NMIs, for example most ARM SoCs.
> Without this patch any cpu stuck with interrupts disabled can
> cause a hardware watchdog reset with no debugging information,
> but with this patch the kernel can detect the lockup and panic,
> which can result in useful debugging info.
>
> Signed-off-by: Colin Cross 

I believe this is pretty much what the RCU stall detector does
already: checks for other CPUs being responsive. The only difference
is in how it checks that. For RCU it's about checking for CPUs
reporting quiescent states when requested to do so. In your case it's
about ensuring the hrtimer interrupt is well handled.

One thing you can do is to enqueue an RCU callback (call_rcu()) every
minute so you can force other CPUs to report quiescent states
periodically and thus check for lockups.
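
A rough sketch of that heartbeat (hypothetical names, 2013-era timer
API; not from any posted patch):

#include <linux/jiffies.h>
#include <linux/rcupdate.h>
#include <linux/timer.h>

#define RCU_HEARTBEAT_SECS	15	/* keep well under the HW watchdog period */

static struct rcu_head rcu_heartbeat_head;
static struct timer_list rcu_heartbeat_timer;

static void rcu_heartbeat_done(struct rcu_head *unused)
{
	/* the grace period completed on all cpus; schedule the next one */
	mod_timer(&rcu_heartbeat_timer, jiffies + RCU_HEARTBEAT_SECS * HZ);
}

static void rcu_heartbeat_fire(unsigned long unused)
{
	/*
	 * Starting a grace period forces every cpu to report a
	 * quiescent state, so the RCU stall detector ends up checking
	 * all of them periodically.
	 */
	call_rcu(&rcu_heartbeat_head, rcu_heartbeat_done);
}

/*
 * At init: setup_timer(&rcu_heartbeat_timer, rcu_heartbeat_fire, 0);
 * then call rcu_heartbeat_fire(0) once to start the cycle.
 */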

Now you'll face the same problem in the end: if you don't have NMIs,
you won't have a very useful report.


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Andrew Morton
On Fri, 11 Jan 2013 13:51:48 -0800
Colin Cross  wrote:

> Emulate NMIs on systems where they are not available by using timer
> interrupts on other cpus.  Each cpu will use its softlockup hrtimer
> to check that the next cpu is processing hrtimer interrupts by
> verifying that a counter is increasing.

Seems sensible.

> This patch is useful on systems where the hardlockup detector is not
> available due to a lack of NMIs, for example most ARM SoCs.
> Without this patch any cpu stuck with interrupts disabled can
> cause a hardware watchdog reset with no debugging information,
> but with this patch the kernel can detect the lockup and panic,
> which can result in useful debugging info.

But we don't get the target cpu's stack, yes?  That's a pretty big loss.

>
> ...
>
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
> +static unsigned int watchdog_next_cpu(unsigned int cpu)
> +{
> + cpumask_t cpus = watchdog_cpus;

cpumask_t can be tremendously huge and putting one on the stack is
risky.  Can we use watchdog_cpus directly here?  Perhaps with a lock? 
or take a copy into a static local, with a lock?

> + unsigned int next_cpu;
> +
> + next_cpu = cpumask_next(cpu, &cpus);
> + if (next_cpu >= nr_cpu_ids)
> + next_cpu = cpumask_first(&cpus);
> +
> + if (next_cpu == cpu)
> + return nr_cpu_ids;
> +
> + return next_cpu;
> +}
> +
> +static int is_hardlockup_other_cpu(unsigned int cpu)
> +{
> + unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
> +
> + if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> + return 1;
> +
> + per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> + return 0;
> +}

This could return a bool type.

> +static void watchdog_check_hardlockup_other_cpu(void)
> +{
> + unsigned int next_cpu;
> +
> + /*
> +  * Test for hardlockups every 3 samples.  The sample period is
> +  *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
> +  *  watchdog_thresh (over by 20%).
> +  */
> + if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
> + return;

The hardwired interval Seems Wrong.  watchdog_thresh is tunable at runtime.

The comment could do with some fleshing out.  *why* do we want to test
at an interval "slightly over watchdog_thresh"?  What's going on here?

> + /* check for a hardlockup on the next cpu */
> + next_cpu = watchdog_next_cpu(smp_processor_id());
> + if (next_cpu >= nr_cpu_ids)
> + return;
> +
> + smp_rmb();

Mystery barrier (always) needs an explanatory comment, please.

> + if (per_cpu(watchdog_nmi_touch, next_cpu) == true) {
> + per_cpu(watchdog_nmi_touch, next_cpu) = false;
> + return;
> + }

I wonder if a well-timed CPU plug/unplug could result in two CPUs
simultaneously checking one other CPU's state.

> + if (is_hardlockup_other_cpu(next_cpu)) {
> + /* only warn once */
> + if (per_cpu(hard_watchdog_warn, next_cpu) == true)
> + return;
> +
> + if (hardlockup_panic)
> + panic("Watchdog detected hard LOCKUP on cpu %u", next_cpu);
> + else
> + WARN(1, "Watchdog detected hard LOCKUP on cpu %u", next_cpu);

I suggest we use messages here which make it clear to people who read
kernel output that this was triggered by hrtimers, not by NMI.  Most
importantly because people will need to know that the CPU which locked
up is *not this CPU* and that any backtrace from the reporting CPU is
misleading.

Also, there was never any sense in making the LOCKUP all-caps ;)

> + per_cpu(hard_watchdog_warn, next_cpu) = true;
> + } else {
> + per_cpu(hard_watchdog_warn, next_cpu) = false;
> + }
> +}
> +#else
> +static inline void watchdog_check_hardlockup_other_cpu(void) { return; }
> +#endif
> +
>
> ...
>
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -191,15 +191,27 @@ config LOCKUP_DETECTOR
> The overhead should be minimal.  A periodic hrtimer runs to
> generate interrupts and kick the watchdog task every 4 seconds.
> An NMI is generated every 10 seconds or so to check for hardlockups.
> +   If NMIs are not available on the platform, every 12 seconds the

hm.  Is the old "4 seconds" still true/accurate/complete?

> +   hrtimer interrupt on one cpu will be used to check for hardlockups
> +   on the next cpu.
>  
> The frequency of hrtimer and NMI events and the soft and hard lockup
> thresholds can be controlled through the sysctl watchdog_thresh.
>  
> -config HARDLOCKUP_DETECTOR
> +config HARDLOCKUP_DETECTOR_NMI
>   def_bool y
>   depends on LOCKUP_DETECTOR && !HAVE_NMI_WATCHDOG
>   depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI

Confused.  I'd have expected this to depend on HAVE_NMI_WATCHDOG,
rather than on !HAVE_NMI_WATCHDOG.  What does "HAVE_NMI_WATCHDOG"
actually mean and what's happening here?

>
> ...

Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Andrew Morton
On Fri, 11 Jan 2013 13:51:48 -0800
Colin Cross ccr...@android.com wrote:

 Emulate NMIs on systems where they are not available by using timer
 interrupts on other cpus.  Each cpu will use its softlockup hrtimer
 to check that the next cpu is processing hrtimer interrupts by
 verifying that a counter is increasing.

Seems sensible.

 This patch is useful on systems where the hardlockup detector is not
 available due to a lack of NMIs, for example most ARM SoCs.
 Without this patch any cpu stuck with interrupts disabled can
 cause a hardware watchdog reset with no debugging information,
 but with this patch the kernel can detect the lockup and panic,
 which can result in useful debugging info.

But we don't get the target cpu's stack, yes?  That's a pretty big loss.


 ...

 +#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
 +static unsigned int watchdog_next_cpu(unsigned int cpu)
 +{
 + cpumask_t cpus = watchdog_cpus;

cpumask_t can be tremendously huge and putting one on the stack is
risky.  Can we use watchdog_cpus directly here?  Perhaps with a lock? 
or take a copy into a static local, with a lock?

 + unsigned int next_cpu;
 +
 + next_cpu = cpumask_next(cpu, cpus);
 + if (next_cpu = nr_cpu_ids)
 + next_cpu = cpumask_first(cpus);
 +
 + if (next_cpu == cpu)
 + return nr_cpu_ids;
 +
 + return next_cpu;
 +}
 +
 +static int is_hardlockup_other_cpu(unsigned int cpu)
 +{
 + unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
 +
 + if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
 + return 1;
 +
 + per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
 + return 0;
 +}

This could return a bool type.

 +static void watchdog_check_hardlockup_other_cpu(void)
 +{
 + unsigned int next_cpu;
 +
 + /*
 +  * Test for hardlockups every 3 samples.  The sample period is
 +  *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
 +  *  watchdog_thresh (over by 20%).
 +  */
 + if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
 + return;

The hardwired interval Seems Wrong.  watchdog_thresh is tunable at runtime.

The comment could do with some fleshing out.  *why* do we want to test
at an interval slightly over watchdog_thresh?  What's going on here?

 + /* check for a hardlockup on the next cpu */
 + next_cpu = watchdog_next_cpu(smp_processor_id());
 + if (next_cpu = nr_cpu_ids)
 + return;
 +
 + smp_rmb();

Mystery barrier (always) needs an explanatory comment, please.

 + if (per_cpu(watchdog_nmi_touch, next_cpu) == true) {
 + per_cpu(watchdog_nmi_touch, next_cpu) = false;
 + return;
 + }

I wonder if a well-timed CPU plug/unplug could result in two CPUs
simultaneously checking one other CPU's state.

 + if (is_hardlockup_other_cpu(next_cpu)) {
 + /* only warn once */
 + if (per_cpu(hard_watchdog_warn, next_cpu) == true)
 + return;
 +
 + if (hardlockup_panic)
 + panic(Watchdog detected hard LOCKUP on cpu %u, 
 next_cpu);
 + else
 + WARN(1, Watchdog detected hard LOCKUP on cpu %u, 
 next_cpu);

I suggest we use messages here which make it clear to people who read
kernel output that this was triggered by hrtimers, not by NMI.  Most
importantly because people will need to know that the CPU which locked
up is *not this CPU* and that any backtrace from the reporting CPU is
misleading.

Also, there was never any sense in making the LOCKUP all-caps ;)

 + per_cpu(hard_watchdog_warn, next_cpu) = true;
 + } else {
 + per_cpu(hard_watchdog_warn, next_cpu) = false;
 + }
 +}
 +#else
 +static inline void watchdog_check_hardlockup_other_cpu(void) { return; }
 +#endif
 +

 ...

 --- a/lib/Kconfig.debug
 +++ b/lib/Kconfig.debug
 @@ -191,15 +191,27 @@ config LOCKUP_DETECTOR
 The overhead should be minimal.  A periodic hrtimer runs to
 generate interrupts and kick the watchdog task every 4 seconds.
 An NMI is generated every 10 seconds or so to check for hardlockups.
 +   If NMIs are not available on the platform, every 12 seconds the

hm.  Is the old 4 seconds still true/accurate/complete?

 +   hrtimer interrupt on one cpu will be used to check for hardlockups
 +   on the next cpu.
  
 The frequency of hrtimer and NMI events and the soft and hard lockup
 thresholds can be controlled through the sysctl watchdog_thresh.
  
 -config HARDLOCKUP_DETECTOR
 +config HARDLOCKUP_DETECTOR_NMI
   def_bool y
   depends on LOCKUP_DETECTOR  !HAVE_NMI_WATCHDOG
   depends on PERF_EVENTS  HAVE_PERF_EVENTS_NMI

Confused.  I'd have expected this to depend on HAVE_NMI_WATCHDOG,
rather than -no-that.  What does HAVE_NMI_WATCHDOG actually mean and
what's happening here?


 ...


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a 

Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Frederic Weisbecker
2013/1/11 Colin Cross ccr...@android.com:
 Emulate NMIs on systems where they are not available by using timer
 interrupts on other cpus.  Each cpu will use its softlockup hrtimer
 to check that the next cpu is processing hrtimer interrupts by
 verifying that a counter is increasing.

 This patch is useful on systems where the hardlockup detector is not
 available due to a lack of NMIs, for example most ARM SoCs.
 Without this patch any cpu stuck with interrupts disabled can
 cause a hardware watchdog reset with no debugging information,
 but with this patch the kernel can detect the lockup and panic,
 which can result in useful debugging info.

 Signed-off-by: Colin Cross ccr...@android.com

I believe this is pretty much what the RCU stall detector does
already: checks for other CPUs being responsive. The only difference
is on how it checks that. For RCU it's about checking for CPUs
reporting quiescent states when requested to do so. In your case it's
about ensuring the hrtimer interrupt is well handled.

One thing you can do is to enqueue an RCU callback (cal_rcu()) every
minute so you can force other CPUs to report quiescent states
periodically and thus check for lockups.

Now you'll face the same problem in the end: if you don't have NMIs,
you won't have a very useful report.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Colin Cross
On Mon, Jan 14, 2013 at 3:49 PM, Andrew Morton
a...@linux-foundation.org wrote:
 On Fri, 11 Jan 2013 13:51:48 -0800
 Colin Cross ccr...@android.com wrote:

 Emulate NMIs on systems where they are not available by using timer
 interrupts on other cpus.  Each cpu will use its softlockup hrtimer
 to check that the next cpu is processing hrtimer interrupts by
 verifying that a counter is increasing.

 Seems sensible.

 This patch is useful on systems where the hardlockup detector is not
 available due to a lack of NMIs, for example most ARM SoCs.
 Without this patch any cpu stuck with interrupts disabled can
 cause a hardware watchdog reset with no debugging information,
 but with this patch the kernel can detect the lockup and panic,
 which can result in useful debugging info.

 But we don't get the target cpu's stack, yes?  That's a pretty big loss.

It's a huge loss, but its still useful.  For one, it can separate
linux locked up one cpu bugs from the whole cpu complex stopped
responding bugs, which are much more common than you would hope on
ARM cpus.  Also, as a separate patch I'm hoping to add reading the
DBGPCSR register of both cpus during panic, which will at least give
you the PC of the cpu that is stuck.


 ...

 +#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
 +static unsigned int watchdog_next_cpu(unsigned int cpu)
 +{
 + cpumask_t cpus = watchdog_cpus;

 cpumask_t can be tremendously huge and putting one on the stack is
 risky.  Can we use watchdog_cpus directly here?  Perhaps with a lock?
 or take a copy into a static local, with a lock?

Sure, I can use a lock around it.  I'm used to very small numbers of cpus.

 + unsigned int next_cpu;
 +
 + next_cpu = cpumask_next(cpu, cpus);
 + if (next_cpu = nr_cpu_ids)
 + next_cpu = cpumask_first(cpus);
 +
 + if (next_cpu == cpu)
 + return nr_cpu_ids;
 +
 + return next_cpu;
 +}
 +
 +static int is_hardlockup_other_cpu(unsigned int cpu)
 +{
 + unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
 +
 + if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
 + return 1;
 +
 + per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
 + return 0;
 +}

 This could return a bool type.

I based it on is_hardlockup, but I can convert it to a bool.

 +static void watchdog_check_hardlockup_other_cpu(void)
 +{
 + unsigned int next_cpu;
 +
 + /*
 +  * Test for hardlockups every 3 samples.  The sample period is
 +  *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
 +  *  watchdog_thresh (over by 20%).
 +  */
 + if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
 + return;

 The hardwired interval Seems Wrong.  watchdog_thresh is tunable at runtime.

 The comment could do with some fleshing out.  *why* do we want to test
 at an interval slightly over watchdog_thresh?  What's going on here?

I'll reword it.  We don't want to be slightly over watchdog_thresh,
ideally we would be exactly at watchdog_thresh.  However, since this
relies on the hrtimer interrupts that are scheduled at watchdog_thresh
* 2 / 5, there is no multiple of hrtimer_interrupts that will result
in watchdog_thresh.  watchdog_thresh * 2 / 5 * 3 (watchdog_thresh *
1.2) is the closest I can get to testing for a hardlockup once every
watchdog_thresh seconds.

 + /* check for a hardlockup on the next cpu */
 + next_cpu = watchdog_next_cpu(smp_processor_id());
 + if (next_cpu = nr_cpu_ids)
 + return;
 +
 + smp_rmb();

 Mystery barrier (always) needs an explanatory comment, please.

OK.  Will add:  smp_rmb matches smp_wmb in watchdog_nmi_enable and
watchdog_nmi_disable to ensure watchdog_nmi_touch is updated for a cpu
before any other cpu sees it is online.

 + if (per_cpu(watchdog_nmi_touch, next_cpu) == true) {
 + per_cpu(watchdog_nmi_touch, next_cpu) = false;
 + return;
 + }

 I wonder if a well-timed CPU plug/unplug could result in two CPUs
 simultaneously checking one other CPU's state.

It can, which is why I set watchdog_nmi_touch for a cpu when it comes
online or when the previous cpu goes offline.  That should prevent a
false positive by preventing the first cpu that checks from updating
hrtimer_interrupts_saved.

 + if (is_hardlockup_other_cpu(next_cpu)) {
 + /* only warn once */
 + if (per_cpu(hard_watchdog_warn, next_cpu) == true)
 + return;
 +
 + if (hardlockup_panic)
 + panic(Watchdog detected hard LOCKUP on cpu %u, 
 next_cpu);
 + else
 + WARN(1, Watchdog detected hard LOCKUP on cpu %u, 
 next_cpu);

 I suggest we use messages here which make it clear to people who read
 kernel output that this was triggered by hrtimers, not by NMI.  Most
 importantly because people will need to know that the CPU which locked
 up is *not this CPU* and that any backtrace from the reporting CPU is
 misleading.

 Also, 

Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Colin Cross
On Mon, Jan 14, 2013 at 4:13 PM, Frederic Weisbecker fweis...@gmail.com wrote:
 2013/1/11 Colin Cross ccr...@android.com:
 Emulate NMIs on systems where they are not available by using timer
 interrupts on other cpus.  Each cpu will use its softlockup hrtimer
 to check that the next cpu is processing hrtimer interrupts by
 verifying that a counter is increasing.

 This patch is useful on systems where the hardlockup detector is not
 available due to a lack of NMIs, for example most ARM SoCs.
 Without this patch any cpu stuck with interrupts disabled can
 cause a hardware watchdog reset with no debugging information,
 but with this patch the kernel can detect the lockup and panic,
 which can result in useful debugging info.

 Signed-off-by: Colin Cross ccr...@android.com

 I believe this is pretty much what the RCU stall detector does
 already: checks for other CPUs being responsive. The only difference
 is on how it checks that. For RCU it's about checking for CPUs
 reporting quiescent states when requested to do so. In your case it's
 about ensuring the hrtimer interrupt is well handled.

 One thing you can do is to enqueue an RCU callback (cal_rcu()) every
 minute so you can force other CPUs to report quiescent states
 periodically and thus check for lockups.

That's a good point, I'll take a look at using that.  A minute is too
long, some SoCs have maximum HW watchdog periods of under 30 seconds,
but a call_rcu every 10-20 seconds might be sufficient.

 Now you'll face the same problem in the end: if you don't have NMIs,
 you won't have a very useful report.

Yes, but its still better than a silent reset.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Andrew Morton
On Mon, 14 Jan 2013 16:19:23 -0800
Colin Cross ccr...@android.com wrote:

  +static void watchdog_check_hardlockup_other_cpu(void)
  +{
  + unsigned int next_cpu;
  +
  + /*
  +  * Test for hardlockups every 3 samples.  The sample period is
  +  *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly 
  over
  +  *  watchdog_thresh (over by 20%).
  +  */
  + if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
  + return;
 
  The hardwired interval Seems Wrong.  watchdog_thresh is tunable at runtime.
 
  The comment could do with some fleshing out.  *why* do we want to test
  at an interval slightly over watchdog_thresh?  What's going on here?
 
 I'll reword it.  We don't want to be slightly over watchdog_thresh,
 ideally we would be exactly at watchdog_thresh.  However, since this
 relies on the hrtimer interrupts that are scheduled at watchdog_thresh
 * 2 / 5, there is no multiple of hrtimer_interrupts that will result
 in watchdog_thresh.  watchdog_thresh * 2 / 5 * 3 (watchdog_thresh *
 1.2) is the closest I can get to testing for a hardlockup once every
 watchdog_thresh seconds.

It needs more than rewording, doesn't it?  What happens if watchdog_thresh is
altered at runtime?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Frederic Weisbecker
2013/1/15 Colin Cross ccr...@android.com:
 On Mon, Jan 14, 2013 at 4:13 PM, Frederic Weisbecker fweis...@gmail.com 
 wrote:
 I believe this is pretty much what the RCU stall detector does
 already: checks for other CPUs being responsive. The only difference
 is on how it checks that. For RCU it's about checking for CPUs
 reporting quiescent states when requested to do so. In your case it's
 about ensuring the hrtimer interrupt is well handled.

 One thing you can do is to enqueue an RCU callback (cal_rcu()) every
 minute so you can force other CPUs to report quiescent states
 periodically and thus check for lockups.

 That's a good point, I'll take a look at using that.  A minute is too
 long, some SoCs have maximum HW watchdog periods of under 30 seconds,
 but a call_rcu every 10-20 seconds might be sufficient.

Sure. And you can tune CONFIG_RCU_CPU_STALL_TIMEOUT accordingly.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Colin Cross
On Mon, Jan 14, 2013 at 4:25 PM, Andrew Morton
a...@linux-foundation.org wrote:
 On Mon, 14 Jan 2013 16:19:23 -0800
 Colin Cross ccr...@android.com wrote:

  +static void watchdog_check_hardlockup_other_cpu(void)
  +{
  + unsigned int next_cpu;
  +
  + /*
  +  * Test for hardlockups every 3 samples.  The sample period is
  +  *  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly 
  over
  +  *  watchdog_thresh (over by 20%).
  +  */
  + if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
  + return;
 
  The hardwired interval Seems Wrong.  watchdog_thresh is tunable at runtime.
 
  The comment could do with some fleshing out.  *why* do we want to test
  at an interval "slightly over watchdog_thresh"?  What's going on here?

 I'll reword it.  We don't want to be slightly over watchdog_thresh,
 ideally we would be exactly at watchdog_thresh.  However, since this
 relies on the hrtimer interrupts that are scheduled at watchdog_thresh
 * 2 / 5, there is no multiple of hrtimer_interrupts that will result
 in watchdog_thresh.  watchdog_thresh * 2 / 5 * 3 (watchdog_thresh *
 1.2) is the closest I can get to testing for a hardlockup once every
 watchdog_thresh seconds.

 It needs more than rewording, doesn't it?  What happens if watchdog_thresh is
 altered at runtime?

I'm not sure what you mean.  If watchdog_thresh changes, the next
hrtimer interrupt on each cpu will move the following hrtimer
interrupt forward by the new watchdog_thresh * 2 / 5.  There may be a
single cycle of watchdog checks at an intermediate period, but nothing
bad should happen.

This code doesn't ever deal with watchdog_thresh directly; it only
counts hrtimer interrupts.  3 hrtimer interrupts is always a
reasonable approximation of watchdog_thresh.


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Colin Cross
On Mon, Jan 14, 2013 at 4:19 PM, Colin Cross ccr...@android.com wrote:
 On Mon, Jan 14, 2013 at 3:49 PM, Andrew Morton
 a...@linux-foundation.org wrote:
 On Fri, 11 Jan 2013 13:51:48 -0800
 Colin Cross ccr...@android.com wrote:

 Emulate NMIs on systems where they are not available by using timer
 interrupts on other cpus.  Each cpu will use its softlockup hrtimer
 to check that the next cpu is processing hrtimer interrupts by
 verifying that a counter is increasing.

 Seems sensible.

 This patch is useful on systems where the hardlockup detector is not
 available due to a lack of NMIs, for example most ARM SoCs.
 Without this patch any cpu stuck with interrupts disabled can
 cause a hardware watchdog reset with no debugging information,
 but with this patch the kernel can detect the lockup and panic,
 which can result in useful debugging info.

 But we don't get the target cpu's stack, yes?  That's a pretty big loss.

 It's a huge loss, but it's still useful.  For one, it can separate
 "linux locked up one cpu" bugs from "the whole cpu complex stopped
 responding" bugs, which are much more common than you would hope on
 ARM cpus.  Also, as a separate patch I'm hoping to add reading the
 DBGPCSR register of both cpus during panic, which will at least give
 you the PC of the cpu that is stuck.


 ...

 +#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
 +static unsigned int watchdog_next_cpu(unsigned int cpu)
 +{
 + cpumask_t cpus = watchdog_cpus;

 cpumask_t can be tremendously huge and putting one on the stack is
 risky.  Can we use watchdog_cpus directly here?  Perhaps with a lock?
 or take a copy into a static local, with a lock?

 Sure, I can use a lock around it.  I'm used to very small numbers of cpus.


On second thought, I'm just going to remove the local copy and read
the global directly.  watchdog_cpus is updated with atomic bitmask
operations, so there is no erroneous value that could be returned when
referencing the global directly that couldn't also occur with a
slightly different order of updates.  The local copy is also not
completely atomic, since the bitmask could span multiple words.  All
intermediate values during multiple sequential updates should already
be handled by setting watchdog_nmi_touch on the appropriate cpus
during watchdog_nmi_enable and watchdog_nmi_disable.


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Colin Cross
On Mon, Jan 14, 2013 at 4:25 PM, Frederic Weisbecker fweis...@gmail.com wrote:
 2013/1/15 Colin Cross ccr...@android.com:
 On Mon, Jan 14, 2013 at 4:13 PM, Frederic Weisbecker fweis...@gmail.com 
 wrote:
 I believe this is pretty much what the RCU stall detector does
 already: checks for other CPUs being responsive. The only difference
 is in how it checks that. For RCU it's about checking for CPUs
 reporting quiescent states when requested to do so. In your case it's
 about ensuring the hrtimer interrupt is well handled.

 One thing you can do is to enqueue an RCU callback (call_rcu()) every
 minute so you can force other CPUs to report quiescent states
 periodically and thus check for lockups.

 That's a good point, I'll take a look at using that.  A minute is too
 long, some SoCs have maximum HW watchdog periods of under 30 seconds,
 but a call_rcu every 10-20 seconds might be sufficient.

 Sure. And you can tune CONFIG_RCU_CPU_STALL_TIMEOUT accordingly.

After considering this, I think the hrtimer watchdog is more useful.
RCU stalls are not usually panic events, and I wouldn't want to add a
panic on every RCU stall.  The lack of stack traces on the affected
cpu makes a panic important.  I'm planning to add an ARM DBGPCSR panic
handler, which will be able to dump the PC of a stuck cpu even if it
is not responding to interrupts.  kexec or kgdb on panic might also
allow some inspection of the stack on the stuck cpu.

Failing to process interrupts is a much more serious event than an RCU
stall, and being able to detect them separately may be very valuable
for debugging.


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Frederic Weisbecker
2013/1/15 Colin Cross ccr...@android.com:
 On Mon, Jan 14, 2013 at 4:25 PM, Frederic Weisbecker fweis...@gmail.com 
 wrote:
 2013/1/15 Colin Cross ccr...@android.com:
 On Mon, Jan 14, 2013 at 4:13 PM, Frederic Weisbecker fweis...@gmail.com 
 wrote:
 I believe this is pretty much what the RCU stall detector does
 already: checks for other CPUs being responsive. The only difference
 is in how it checks that. For RCU it's about checking for CPUs
 reporting quiescent states when requested to do so. In your case it's
 about ensuring the hrtimer interrupt is well handled.

 One thing you can do is to enqueue an RCU callback (call_rcu()) every
 minute so you can force other CPUs to report quiescent states
 periodically and thus check for lockups.

 That's a good point, I'll take a look at using that.  A minute is too
 long, some SoCs have maximum HW watchdog periods of under 30 seconds,
 but a call_rcu every 10-20 seconds might be sufficient.

 Sure. And you can tune CONFIG_RCU_CPU_STALL_TIMEOUT accordingly.

 After considering this, I think the hrtimer watchdog is more useful.
 RCU stalls are not usually panic events, and I wouldn't want to add a
 panic on every RCU stall.  The lack of stack traces on the affected
 cpu makes a panic important.  I'm planning to add an ARM DBGPCSR panic
 handler, which will be able to dump the PC of a stuck cpu even if it
 is not responding to interrupts.  kexec or kgdb on panic might also
 allow some inspection of the stack on the stuck cpu.

 Failing to process interrupts is a much more serious event than an RCU
 stall, and being able to detect them separately may be very valuable
 for debugging.

RCU stalls can happen for different reasons: softlockup (failure to
schedule another task), hardlockup (failure to process interrupts), or
a bug in RCU itself. But if you have a hardlockup, it will report it.

Now why do you need a panic in any case? I don't know DBGPCSR; is this
a breakpoint register? How do you plan to use it remotely from the CPU
that detects the lockup?


Re: [PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-14 Thread Colin Cross
On Mon, Jan 14, 2013 at 6:48 PM, Frederic Weisbecker fweis...@gmail.com wrote:
 2013/1/15 Colin Cross ccr...@android.com:
 On Mon, Jan 14, 2013 at 4:25 PM, Frederic Weisbecker fweis...@gmail.com 
 wrote:
 2013/1/15 Colin Cross ccr...@android.com:
 On Mon, Jan 14, 2013 at 4:13 PM, Frederic Weisbecker fweis...@gmail.com 
 wrote:
 I believe this is pretty much what the RCU stall detector does
 already: checks for other CPUs being responsive. The only difference
 is in how it checks that. For RCU it's about checking for CPUs
 reporting quiescent states when requested to do so. In your case it's
 about ensuring the hrtimer interrupt is well handled.

 One thing you can do is to enqueue an RCU callback (call_rcu()) every
 minute so you can force other CPUs to report quiescent states
 periodically and thus check for lockups.

 That's a good point, I'll take a look at using that.  A minute is too
 long, some SoCs have maximum HW watchdog periods of under 30 seconds,
 but a call_rcu every 10-20 seconds might be sufficient.

 Sure. And you can tune CONFIG_RCU_CPU_STALL_TIMEOUT accordingly.

 After considering this, I think the hrtimer watchdog is more useful.
 RCU stalls are not usually panic events, and I wouldn't want to add a
 panic on every RCU stall.  The lack of stack traces on the affected
 cpu makes a panic important.  I'm planning to add an ARM DBGPCSR panic
 handler, which will be able to dump the PC of a stuck cpu even if it
 is not responding to interrupts.  kexec or kgdb on panic might also
 allow some inspection of the stack on the stuck cpu.

 Failing to process interrupts is a much more serious event than an RCU
 stall, and being able to detect them separately may be very valuable
 for debugging.

 RCU stalls can happen for different reasons: softlockup (failure to
 schedule another task), hardlockup (failure to process interrupts), or
 a bug in RCU itself. But if you have a hardlockup, it will report it.

It will report it, but in the same way that it reports a less serious
issue, and in this case with zero debugging information since the
affected cpu won't dump its backtrace.  Better than nothing, but not
as useful as a panic can be.

 Now why do you need a panic in any case? I don't know DBGPCSR; is this
 a breakpoint register? How do you plan to use it remotely from the CPU
 that detects the lockup?

Panics can trigger extra debugging tools, like my previous examples
of kexec and kgdb.

DBGPCSR is the DeBuG Program Counter Sampling Register.  It is a
memory-mapped register available on many (all?) ARM Cortex cpus that
returns a recent PC value for the cpu.  I have used it along with this
patch, and it produces very useful information.


[PATCH v2] hardlockup: detect hard lockups without NMIs using secondary cpus

2013-01-11 Thread Colin Cross
Emulate NMIs on systems where they are not available by using timer
interrupts on other cpus.  Each cpu will use its softlockup hrtimer
to check that the next cpu is processing hrtimer interrupts by
verifying that a counter is increasing.

This patch is useful on systems where the hardlockup detector is not
available due to a lack of NMIs, for example most ARM SoCs.
Without this patch any cpu stuck with interrupts disabled can
cause a hardware watchdog reset with no debugging information,
but with this patch the kernel can detect the lockup and panic,
which can result in useful debugging info.

Signed-off-by: Colin Cross 
---
 include/linux/nmi.h |5 ++-
 kernel/watchdog.c   |  123 --
 lib/Kconfig.debug   |   14 +-
 3 files changed, 135 insertions(+), 7 deletions(-)

Changes since v1:
renamed variables to clarify when referring to the next cpu
factored out function to get next cpu
fixed int vs unsigned int mismatches
fixed race condition when offlining cpus

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index db50840..c8f8aa0 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -14,8 +14,11 @@
  * may be used to reset the timeout - for code which intentionally
  * disables interrupts for a long time. This call is stateless.
  */
-#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
+#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR_NMI)
 #include <asm/nmi.h>
+#endif
+
+#if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
 extern void touch_nmi_watchdog(void);
 #else
 static inline void touch_nmi_watchdog(void)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 75a2ab3..61a0595 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -44,6 +44,11 @@
 static DEFINE_PER_CPU(bool, hard_watchdog_warn);
 static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
+#endif
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
+static cpumask_t __read_mostly watchdog_cpus;
+#endif
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_NMI
 static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
 #endif
 
@@ -179,7 +184,7 @@ void touch_softlockup_watchdog_sync(void)
__raw_get_cpu_var(watchdog_touch_ts) = 0;
 }
 
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_NMI
 /* watchdog detector functions */
 static int is_hardlockup(void)
 {
@@ -193,6 +198,76 @@ static int is_hardlockup(void)
 }
 #endif
 
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU
+static unsigned int watchdog_next_cpu(unsigned int cpu)
+{
+   cpumask_t cpus = watchdog_cpus;
+   unsigned int next_cpu;
+
+   next_cpu = cpumask_next(cpu, &cpus);
+   if (next_cpu >= nr_cpu_ids)
+   next_cpu = cpumask_first(&cpus);
+
+   if (next_cpu == cpu)
+   return nr_cpu_ids;
+
+   return next_cpu;
+}
+
+static int is_hardlockup_other_cpu(unsigned int cpu)
+{
+   unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
+
+   if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
+   return 1;
+
+   per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
+   return 0;
+}
+
+static void watchdog_check_hardlockup_other_cpu(void)
+{
+   unsigned int next_cpu;
+
+   /*
+* Test for hardlockups every 3 samples.  The sample period is
+*  watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
+*  watchdog_thresh (over by 20%).
+*/
+   if (__this_cpu_read(hrtimer_interrupts) % 3 != 0)
+   return;
+
+   /* check for a hardlockup on the next cpu */
+   next_cpu = watchdog_next_cpu(smp_processor_id());
+   if (next_cpu >= nr_cpu_ids)
+   return;
+
+   smp_rmb();
+
+   if (per_cpu(watchdog_nmi_touch, next_cpu) == true) {
+   per_cpu(watchdog_nmi_touch, next_cpu) = false;
+   return;
+   }
+
+   if (is_hardlockup_other_cpu(next_cpu)) {
+   /* only warn once */
+   if (per_cpu(hard_watchdog_warn, next_cpu) == true)
+   return;
+
+   if (hardlockup_panic)
+   panic("Watchdog detected hard LOCKUP on cpu %u", next_cpu);
+   else
+   WARN(1, "Watchdog detected hard LOCKUP on cpu %u", next_cpu);
+
+   per_cpu(hard_watchdog_warn, next_cpu) = true;
+   } else {
+   per_cpu(hard_watchdog_warn, next_cpu) = false;
+   }
+}
+#else
+static inline void watchdog_check_hardlockup_other_cpu(void) { return; }
+#endif
+
 static int is_softlockup(unsigned long touch_ts)
 {
unsigned long now = get_timestamp(smp_processor_id());
@@ -204,7 +279,7 @@ static int is_softlockup(unsigned long touch_ts)
return 0;
 }
 
-#ifdef CONFIG_HARDLOCKUP_DETECTOR
+#ifdef CONFIG_HARDLOCKUP_DETECTOR_NMI
 
 static struct perf_event_attr wd_hw_attr = {
.type   = 
