On Sat, 2012-04-07 at 13:40 -0400, David Miller wrote:
From: Ben Hutchings b...@decadent.org.uk
Date: Sat, 07 Apr 2012 18:21:38 +0100
cheetah_xcall_deliver() does appear to be relevant to the problem and it
looks like it could loop indefinitely - though presumably only if a
processor is
From: Ben Hutchings b...@decadent.org.uk
Date: Sun, 08 Apr 2012 22:12:06 +0100
Will the recipient NACK if the cross-call interrupt is disabled, or do
the processors have a buffer/FIFO for such IRQs?
Recipients NACK when their incoming cross-call queue is
full. A cpu hung with PSTATE_IE clear
Kieron Gillespie wrote:
With a completely clean install of Debian and every major version of
the Linux kernel I haven't run into this error again.
To be clear: are the kernels you are testing now upstream kernels or
pre-compiled Debian ones?
If I understood correctly before, Tibor experienced
They are compiled from the kernel source tars from snapshot.debian.org.
I was experiencing the problem with the upstream kernels from kernel.org
and the pre-compiled kernels from Debian. I am going to try to compile
an upstream kernel again and see if it happens. But before that I am
going to
Summary for the SPARC maintainers:
The NMI watchdog is firing on Sunfire 280R and Sun Blade2500 systems
with one or both processors in cheetah_xcall_deliver(). This has been
seen under 3.0, 3.2 and 3.3 and seems to be associated with disk I/O.
Full bug log is at: http://bugs.debian.org/648766
From: Ben Hutchings b...@decadent.org.uk
Date: Sat, 07 Apr 2012 18:21:38 +0100
cheetah_xcall_deliver() does appear to be relevant to the problem and it
looks like it could loop indefinitely - though presumably only if a
processor is behaving strangely?
It can only loop indefinitely if one of
Kieron Gillespie wrote:
I am right now testing one major kernel version at a time, and on
the 3.0.0-1 I got
Just to be clear, if each time you test the version halfway between
the newest known-good and oldest known-bad kernel then you only have
to test log(n) kernels instead of n. :)
That's
That's what I would have done except I ran into a problem.
With a completely clean install of Debian and every major version of the
Linux kernel I haven't run into this error again. Of course, I was
running a bare base install of Debian with only the ssh server installed. This
error has yet to come up
I am right now testing one major kernel version at a time, and on the
3.0.0-1 I got an interesting error when I ran my brutality test on the
system.
sd 0:0:0:0: ABORT operation complete.
I wonder if this is some symptom of the problem as well. It canceled the
cat /dev/sda /dev/null process
So what have I learned after lots of test cases?
With SMP on or off, and with the nouveau driver loaded or not, I see the same
unstable behavior and crashing on Linux kernels 3.2.13, 3.2.14 and 3.3.1.
All tests were run with only one CPU plugged in, with both CPUs plugged in,
with SMP on and off, with the
found 648766 linux-2.6/3.2.13-1
found 648766 linux-2.6/3.2.14-1
# 3.3.1
found 648766 linux-2.6/3.3-1~experimental.1
tags 648766 + upstream
quit
Kieron Gillespie wrote:
Now with that said I can't seem to crash the 2.6.32 kernel in the
same way with SMP off, haven't tried with SMP on yet, but I
severity 648766 important
quit
Kieron Gillespie wrote:
I've attached some images to this bug message, not sure if they will
appear
Received; thanks much.
What version of the kernel are you using? Full dmesg output from a
normal boot would be useful as well, so we can get to know your
Here is the dmesg output from the current system running Linux 3.2.13
with SMP enabled and tickless disabled.
[0.00] PROMLIB: Sun IEEE Boot Prom 'OBP 4.9.7 2004/05/27 07:31'
[0.00] PROMLIB: Root node compatible:
[0.00] Initializing cgroup subsys cpuset
[0.00]
Kieron Gillespie wrote:
Here is the dmesg output from the current system running Linux
3.2.13 with SMP enabled and tickless disabled.
Great.
Is this reproducible without nouveau? It might be possible to test
by putting
blacklist nouveau
in /etc/modprobe.d/kg-disable-nouveau.conf
I have also noticed that, if I am reading the traces correctly, in
both of my cases, in the original bug submitter's case, and in a bug posted
on old.nabble.com, the crash always seems to happen when one CPU is
in cheetah_xcall_deliver and the other CPU is at the same
instruction in