Re: OOPS in 2.6.19.1, connected to nfs4 and autofs4

2007-06-23 Thread Malte Schröder
On Fri, 22 Jun 2007 17:42:36 -0700
Stuart Anderson <[EMAIL PROTECTED]> wrote:

(I CC: this to the lkml)

> Did you find a resolution to your posting regarding,
> 
> "OOPS in 2.6.19.1, connected to nfs4 and autofs4"
> 
> We just had a 2.6.20.11 kernel crash with a similar stack trace.

No, it still happens (2.6.21.5) once in a while (about once a week).
I don't know of a way to reproduce it reliably.

> 
> 
> Thanks.
> 


-- 
---
Malte Schröder
[EMAIL PROTECTED]
ICQ# 68121508
---





APIC Oops on 2.6.19.1

2007-02-03 Thread Antoine Martin

As Matt Mackall said:
"So yes, if a user reports a bug that's attributable to a single bit 
memory error that's otherwise unreproduced and unexplained, it's totally 
reasonable to chalk it up to cosmic rays until some sort of pattern of 
reports emerges."


So I guess that the only way to figure out if this is indeed a one-off 
cosmic ray is to post it somewhere public in case someone else sees it?
As there is no APIC mailing list, I am posting to LKML - sorry for the 
line noise, feel free to tell me to post elsewhere (/dev/null?).



Here it is: on a Dual-Opteron Tyan board which is rebooting every hour 
to run some unit tests, I caught this -only once- at boot (partially 
copied by hand):


Pid: 1, comm: swapper Not tainted 2.6.19.1 #1
RIP: 0010:[80272dba] [80272dba] setup_APIC_timer+0x1e/0xba

RSP: :81007ffa7ec0  EFLAGS: 0002
RAX: 


CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 000 CR3: 00201000 CR4: 6e0
Process swapper: (pid: 1, threadinfo f81007ffa6000, task 8100023937a0)

Stack: 0be41ca0 ff806491bc  40b40009
 0008e000 009 8e000 80267297
  fff80546280 f80261a61 
Call Trace:
 [806491bc] setup_boot_APIC_clock+0x115/0x11d
 [80267297] init+0x48/0x306
 [8025bed8] child_rip+0xa/0x12


Code: 8b 04 25 f0 e0 5f ff 39 d0 73 f5 8b 04 25 f0 e0 5f ff 39 d0
 <0>Kernel panic - not syncing: Attempting to kill init!

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
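
For anyone comparing reports like the one above, the "symbol+0xOFFSET/0xSIZE" entries in the trace can be split mechanically. What follows is only a minimal sketch and not part of the original post: the function name parse_trace_entry and the regular expression are assumptions made for illustration, and the input lines are the trace entries as they appear above (with their truncated addresses left as-is).

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE" form used in oops call traces.
# The pattern is an illustrative assumption, not taken from any kernel tool.
SYM_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][A-Za-z0-9_.]*)"
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_trace_entry(line):
    """Return (symbol, offset, size) for one trace line, or None if no match."""
    m = SYM_RE.search(line)
    if m is None:
        return None
    return m.group("symbol"), int(m.group("offset"), 16), int(m.group("size"), 16)


# Entries copied from the report above.
trace = [
    "[80272dba] setup_APIC_timer+0x1e/0xba",
    "[806491bc] setup_boot_APIC_clock+0x115/0x11d",
    "[80267297] init+0x48/0x306",
    "[8025bed8] child_rip+0xa/0x12",
]

for entry in trace:
    symbol, offset, size = parse_trace_entry(entry)
    print(f"{symbol}: {offset} bytes into a {size}-byte function")
```

The offset tells you how far into the function the address lies, which is what makes two reports comparable even when the absolute addresses differ between builds.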


Re: "kernel + gcc 4.1 = several problems" / "Oops in 2.6.19.1"

2007-01-03 Thread Adrian Bunk
On Wed, Jan 03, 2007 at 04:25:09PM +0100, Udo van den Heuvel wrote:
> Hello,
> 
> I just read about the subjects.
> I have a firewall which has some issues.
> First it was a VIA CL6000 (c3).
> Now it is a EK8000 (c3-2) with different power supply, RAM and board of
> course. Still I see strange things sometimes. Crashes, hangs, etc. Now
> and then. Not too often.
> 
> I have in .config:
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> CONFIG_MVIAC3_2=y
> 
> Does this mean the issue applies to my own kernels?

It could be.
Or it could be something completely different.

If the same kernel compiled with gcc 3.4.6 works fine, you might be 
running into one of the mysterious problems with gcc 4.1.

It could also be hardware problems (e.g. try running memtest86 for a 
longer time).

Does the machine hang completely, or is any useful information like e.g. 
an oops available?

> Udo

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



"kernel + gcc 4.1 = several problems" / "Oops in 2.6.19.1"

2007-01-03 Thread Udo van den Heuvel
Hello,

I just read about the subjects.
I have a firewall which has some issues.
First it was a VIA CL6000 (c3).
Now it is a EK8000 (c3-2) with different power supply, RAM and board of
course. Still I see strange things sometimes. Crashes, hangs, etc. Now
and then. Not too often.

I have in .config:
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_MVIAC3_2=y

Does this mean the issue applies to my own kernels?

Udo
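
A quick way to check whether a given .config has the options mentioned in this thread is to parse it directly. This is a minimal sketch under stated assumptions: the helper name parse_kconfig is made up for illustration, and the sample text combines the two options quoted above with a "# ... is not set" line for CONFIG_4KSTACKS purely to show how disabled options appear. It only reports what is set; it does not decide whether the gcc 4.1 issue applies.

```python
def parse_kconfig(text):
    """Map option names to values from .config text.

    Lines of the form '# CONFIG_FOO is not set' are recorded as 'n'.
    """
    suffix = " is not set"
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


# Illustrative sample; the third line is an assumption added for the demo.
sample = """\
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_MVIAC3_2=y
# CONFIG_4KSTACKS is not set
"""

config = parse_kconfig(sample)
for name in ("CONFIG_CC_OPTIMIZE_FOR_SIZE", "CONFIG_MVIAC3_2", "CONFIG_4KSTACKS"):
    print(name, config.get(name, "absent"))
```

To run it against a real tree, read the file first, for example parse_kconfig(open(".config").read()).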



Re: Oops in 2.6.19.1

2007-01-02 Thread Adrian Bunk
On Sun, Dec 31, 2006 at 04:48:43PM +, Alistair John Strachan wrote:
> On Sunday 31 December 2006 16:28, Adrian Bunk wrote:
> > On Sat, Dec 30, 2006 at 06:29:15PM +, Alistair John Strachan wrote:
> > > On Saturday 30 December 2006 17:21, Chuck Ebbert wrote:
> > > > In-Reply-To: <[EMAIL PROTECTED]>
> > > >
> > > > On Sat, 30 Dec 2006 16:59:35 +, Alistair John Strachan wrote:
> > > > > I've eliminated 2.6.19.1 as the culprit, and also tried toggling
> > > > > "optimize for size", various debug options. 2.6.19 compiled with GCC
> > > > > 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably,
> > > > > within approximately 12 hours.
> > > >
> > > > Which CPU are you compiling for?  You should try different options.
> > >
> > > I should, I haven't thought of that. Currently it's compiling for
> > > CONFIG_MVIAC3_2, but I could try i686 for example.
> > >
> > > > Can you post disassembly of pipe_poll() for both the one that crashes
> > > > and the one that doesn't?  Use 'objdump -D -r fs/pipe.o' so we get the
> > > > relocation info and post just the one function from each for now.
> > >
> > > Sure, no problem:
> > >
> > > http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
> > >
> > > Both use identical configs, neither are optimised for size. The config is
> > > available from the same location.
> >
> > Can you try enabling as many debug options as possible?
> 
> Specifically what? I've already had:
> 
> CONFIG_DETECT_SOFTLOCKUP
> CONFIG_FRAME_POINTER
> CONFIG_UNWIND_INFO
> 
> Enabled. CONFIG_4KSTACKS is disabled. Are there any debugging features 
> actually pertinent to this bug?

No, that's only an "enable as much as possible and hope one helps" shot 
in the dark.

> Cheers,
> Alistair.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed




Re: Oops in 2.6.19.1

2006-12-31 Thread Alistair John Strachan
On Sunday 31 December 2006 21:43, Chuck Ebbert wrote:
> In-Reply-To: <[EMAIL PROTECTED]>
>
> On Sat, 30 Dec 2006 18:29:15 +, Alistair John Strachan wrote:
> > > Can you post disassembly of pipe_poll() for both the one that crashes
> > > and the one that doesn't?  Use 'objdump -D -r fs/pipe.o' so we get the
> > > relocation info and post just the one function from each for now.
> >
> > Sure, no problem:
> >
> > http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
> >
> > Both use identical configs, neither are optimised for size. The config is
> > available from the same location.
>
> Those were compiled without frame pointers.  Can you post them compiled
> with frame pointers so they match your original bug report? And confirm
> that pipe_poll() is still at 0xc0156ec0 in vmlinux?

c0156ec0 <pipe_poll>:

I used the config I originally sent you to rebuild it again. This time I've put 
up the whole vmlinux for both kernels, replaced the config, and redone the 
disassembly. I've confirmed the offset in the GCC 4.1.1 kernel is identical. 
Sorry for the confusion.

The reason I changed the configs was to experiment with enabling and disabling 
debugging (and other such) options that might have shaken out compiler bugs.

However, none of these kernels has crashed gracefully again; most of them 
hang the machine (no NMI watchdog, though), so I've not been able to look at 
the oops. It's the same root cause, however, as GCC 3.4.6 kernels do not 
crash.

http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/

Happy new year, btw.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
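
The address confirmation above can also be scripted from a symbol listing. This is a rough sketch, not something used in the thread: it assumes `nm`-style text with three whitespace-separated columns (address, type, name), the helper name symbol_address is hypothetical, and the neighbouring entries in the sample are invented for illustration. Only the pipe_poll address comes from the messages above.

```python
def symbol_address(nm_output, name):
    """Return the address of `name` from nm-style text, or None if absent."""
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2] == name:
            return int(parts[0], 16)
    return None


# Sample listing: only the pipe_poll line reflects the thread; the other
# two entries are made-up neighbours so the lookup has something to skip.
sample_nm = """\
c0156e40 t pipe_ioctl
c0156ec0 t pipe_poll
c0156f90 t pipe_release
"""

addr = symbol_address(sample_nm, "pipe_poll")
print(hex(addr), addr == 0xC0156EC0)
```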


Re: Oops in 2.6.19.1

2006-12-31 Thread Chuck Ebbert
In-Reply-To: <[EMAIL PROTECTED]>

On Sat, 30 Dec 2006 18:29:15 +, Alistair John Strachan wrote:

> > Can you post disassembly of pipe_poll() for both the one that crashes
> > and the one that doesn't?  Use 'objdump -D -r fs/pipe.o' so we get the
> > relocation info and post just the one function from each for now.
> 
> Sure, no problem:
> 
> http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
> 
> Both use identical configs, neither are optimised for size. The config is 
> available from the same location.

Those were compiled without frame pointers.  Can you post them compiled
with frame pointers so they match your original bug report? And confirm
that pipe_poll() is still at 0xc0156ec0 in vmlinux?

-- 
MBTI: IXTP
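
Posting "just the one function" from the objdump output requested above can be done by cutting the text at the symbol header. The following is a minimal sketch under assumptions: it relies on objdump printing a header line of the form "ADDRESS <name>:" and a blank line between symbols, the helper names extract_function and disassemble are invented here, and the sample disassembly is made up rather than taken from the real fs/pipe.o.

```python
import re
import subprocess


def extract_function(disassembly, name):
    """Return one function's block (header plus body) from objdump text."""
    header = re.compile(r"^[0-9a-f]+ <" + re.escape(name) + r">:$")
    block = []
    inside = False
    for line in disassembly.splitlines():
        if inside:
            if not line.strip():  # a blank line ends the current symbol
                break
            block.append(line)
        elif header.match(line):
            inside = True
            block.append(line)
    return "\n".join(block)


def disassemble(object_path):
    """Run 'objdump -D -r' on object_path and return its text output."""
    result = subprocess.run(
        ["objdump", "-D", "-r", object_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Made-up sample in objdump's layout, used so the demo needs no object file.
sample = """\
00000000 <pipe_poll>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp

00000010 <pipe_release>:
  10:   c3                      ret
"""

print(extract_function(sample, "pipe_poll"))
```

On a real build tree the two helpers would be combined as extract_function(disassemble("fs/pipe.o"), "pipe_poll").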



Re: Oops in 2.6.19.1

2006-12-31 Thread Alistair John Strachan
On Sunday 31 December 2006 16:27, Adrian Bunk wrote:
> On Sat, Dec 30, 2006 at 04:59:35PM +, Alistair John Strachan wrote:
> > On Thursday 28 December 2006 04:14, Alistair John Strachan wrote:
> > > On Thursday 28 December 2006 04:02, Alistair John Strachan wrote:
> > > > On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
> > > > [snip]
> > > >
> > > > > > Here's a current decompilation of vmlinux/pipe_poll() from the
> > > > > > running kernel, the addresses have changed slightly. There's no
> > > > > > xchg there either:
> > > > >
> > > > > Could you reproduce the bug by the new kernel, so we could get the
> > > > > exact address and instruction of the bug?
> > > >
> > > > It crashed again, but this time with no output (machine locked
> > > > solid). To be honest, the disassembly looks right (it's like Chuck
> > > > said, it's jumping back half way through an instruction):
> > > >
> > > > c0156f5f:   3b 87 68 01 00 00   cmp    0x168(%edi),%eax
> > > >
> > > > So c0156f60 is 87 68 01 00 00..
> > > >
> > > > This is with the GCC recompile, so it's not a distro problem. It
> > > > could still either be GCC 4.x, or a 2.6.19.1 specific bug, but it's
> > > > serious. 2.6.19 with GCC 3.4.3 is 100% stable.
> > >
> > > Looks like a similar crash here:
> > >
> > > http://ubuntuforums.org/showthread.php?p=1803389
> >
> > I've eliminated 2.6.19.1 as the culprit, and also tried toggling
> > "optimize for size", various debug options. 2.6.19 compiled with GCC
> > 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably,
> > within approximately 12 hours.
> >
> > The machine passes 6 hours of Prime95 (a CPU stability tester), four
> > memtest86 passes, and there are no heat problems.
> >
> > I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config
> > using this compiler (but the same binutils), and will report back if it
> > crashes. My bet is that it won't, however.
>
> There are occasional reports of problems with kernels compiled with
> gcc 4.1 that vanish when using older versions of gcc.
>
> AFAIK, until now no one has ever debugged whether that's a gcc bug,
> gcc exposing a kernel bug or gcc exposing a hardware bug.
>
> Comparing your report and [1], it seems that if these are the same
> problem, it's not a hardware bug but a gcc or kernel bug.

That bug report specifically indicates some kind of miscompilation in a driver, 
causing boot-time hangs. My problem is quite different, and more subtle. The 
crash happens in the same place every time, which does suggest determinism 
(even with various options toggled on and off, and a 300K smaller kernel 
image), but it takes 8-12 hours to manifest and only happens with GCC 4.1.1.

Unless we can start narrowing this down, it would be a mammoth task to seek 
out either the kernel or GCC change that first exhibited this bug, due to the 
non-immediate reproducibility of the bug, the lack of clues, and this 
machine's role as a stable, high-availability server.

(If I had another Epia M1 or another computer I could reproduce the bug 
on, I would be only too happy to boot as many kernels as required to fix it; 
however I cannot spare this machine).

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
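
The "jumping back half way through an instruction" observation quoted above is simple address arithmetic, sketched here for readers following along. This is an illustration only: the helper name bytes_from is an assumption, and the input is the single disassembly line quoted in the message (address c0156f5f, bytes 3b 87 68 01 00 00).

```python
def bytes_from(start_addr, code, target_addr):
    """Return the bytes the CPU would fetch if execution began at target_addr."""
    offset = target_addr - start_addr
    if not 0 <= offset < len(code):
        raise ValueError("target address is outside this instruction")
    return code[offset:]


insn_addr = 0xC0156F5F
insn = bytes.fromhex("3b 87 68 01 00 00")  # cmp 0x168(%edi),%eax

# One byte into the six-byte cmp, the remaining bytes start with 0x87.
tail = bytes_from(insn_addr, insn, 0xC0156F60)
print(tail.hex(" "))  # 87 68 01 00 00
```

Since 0x87 is an xchg opcode on x86, decoding from c0156f60 would plausibly show an xchg that the normal disassembly does not contain, which fits the "no xchg there" remark in the quoted text; that reading is an inference, not something the thread states outright.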


Re: Oops in 2.6.19.1

2006-12-31 Thread Alistair John Strachan
On Sunday 31 December 2006 16:28, Adrian Bunk wrote:
> On Sat, Dec 30, 2006 at 06:29:15PM +, Alistair John Strachan wrote:
> > On Saturday 30 December 2006 17:21, Chuck Ebbert wrote:
> > > In-Reply-To: <[EMAIL PROTECTED]>
> > >
> > > On Sat, 30 Dec 2006 16:59:35 +, Alistair John Strachan wrote:
> > > > I've eliminated 2.6.19.1 as the culprit, and also tried toggling
> > > > "optimize for size", various debug options. 2.6.19 compiled with GCC
> > > > 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably,
> > > > within approximately 12 hours.
> > >
> > > Which CPU are you compiling for?  You should try different options.
> >
> > I should, I haven't thought of that. Currently it's compiling for
> > CONFIG_MVIAC3_2, but I could try i686 for example.
> >
> > > Can you post disassembly of pipe_poll() for both the one that crashes
> > > and the one that doesn't?  Use 'objdump -D -r fs/pipe.o' so we get the
> > > relocation info and post just the one function from each for now.
> >
> > Sure, no problem:
> >
> > http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
> >
> > Both use identical configs, neither are optimised for size. The config is
> > available from the same location.
>
> Can you try enabling as many debug options as possible?

Specifically what? I've already had:

CONFIG_DETECT_SOFTLOCKUP
CONFIG_FRAME_POINTER
CONFIG_UNWIND_INFO

Enabled. CONFIG_4KSTACKS is disabled. Are there any debugging features 
actually pertinent to this bug?

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-31 Thread Adrian Bunk
On Sat, Dec 30, 2006 at 06:29:15PM +, Alistair John Strachan wrote:
> On Saturday 30 December 2006 17:21, Chuck Ebbert wrote:
> > In-Reply-To: <[EMAIL PROTECTED]>
> >
> > On Sat, 30 Dec 2006 16:59:35 +, Alistair John Strachan wrote:
> > > I've eliminated 2.6.19.1 as the culprit, and also tried toggling
> > > "optimize for size", various debug options. 2.6.19 compiled with GCC
> > > 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably,
> > > within approximately 12 hours.
> >
> > Which CPU are you compiling for?  You should try different options.
> 
> I should, I haven't thought of that. Currently it's compiling for 
> CONFIG_MVIAC3_2, but I could try i686 for example.
> 
> > Can you post disassembly of pipe_poll() for both the one that crashes
> > and the one that doesn't?  Use 'objdump -D -r fs/pipe.o' so we get the
> > relocation info and post just the one function from each for now.
> 
> Sure, no problem:
> 
> http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
> 
> Both use identical configs, neither are optimised for size. The config is 
> available from the same location.

Can you try enabling as many debug options as possible?

> Cheers,
> Alistair.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: Oops in 2.6.19.1

2006-12-31 Thread Adrian Bunk
On Sat, Dec 30, 2006 at 04:59:35PM +, Alistair John Strachan wrote:
> On Thursday 28 December 2006 04:14, Alistair John Strachan wrote:
> > On Thursday 28 December 2006 04:02, Alistair John Strachan wrote:
> > > On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
> > > [snip]
> > >
> > > > > Here's a current decompilation of vmlinux/pipe_poll() from the
> > > > > running kernel, the addresses have changed slightly. There's no xchg
> > > > > there either:
> > > >
> > > > Could you reproduce the bug by the new kernel, so we could get the
> > > > exact address and instruction of the bug?
> > >
> > > It crashed again, but this time with no output (machine locked solid). To
> > > be honest, the disassembly looks right (it's like Chuck said, it's
> > > jumping back half way through an instruction):
> > >
> > > c0156f5f:   3b 87 68 01 00 00   cmp    0x168(%edi),%eax
> > >
> > > So c0156f60 is 87 68 01 00 00..
> > >
> > > This is with the GCC recompile, so it's not a distro problem. It could
> > > still either be GCC 4.x, or a 2.6.19.1 specific bug, but it's serious.
> > > 2.6.19 with GCC 3.4.3 is 100% stable.
> >
> > Looks like a similar crash here:
> >
> > http://ubuntuforums.org/showthread.php?p=1803389
> 
> I've eliminated 2.6.19.1 as the culprit, and also tried toggling
> "optimize for size", various debug options. 2.6.19 compiled with GCC
> 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably,
> within approximately 12 hours.
> 
> The machine passes 6 hours of Prime95 (a CPU stability tester), four
> memtest86 passes, and there are no heat problems.
> 
> I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config
> using this compiler (but the same binutils), and will report back if it
> crashes. My bet is that it won't, however.

There are occasional reports of problems with kernels compiled with 
gcc 4.1 that vanish when using older versions of gcc.

AFAIK, until now no one has ever debugged whether that's a gcc bug, 
gcc exposing a kernel bug or gcc exposing a hardware bug.

Comparing your report and [1], it seems that if these are the same 
problem, it's not a hardware bug but a gcc or kernel bug.

> Cheers,
> Alistair.

cu
Adrian

[1] http://bugzilla.kernel.org/show_bug.cgi?id=7176

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: Oops in 2.6.19.1

2006-12-31 Thread Alistair John Strachan
On Saturday 30 December 2006 16:59, Alistair John Strachan wrote:
> I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config
> using this compiler (but the same binutils), and will report back if it
> crashes. My bet is that it won't, however.

Still fine after >24 hours. Linux 2.6.19, GCC 3.4.6, Binutils 2.17.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-31 Thread Alistair John Strachan
On Sunday 31 December 2006 16:28, Adrian Bunk wrote:
> On Sat, Dec 30, 2006 at 06:29:15PM +, Alistair John Strachan wrote:
> > On Saturday 30 December 2006 17:21, Chuck Ebbert wrote:
> > > In-Reply-To: <[EMAIL PROTECTED]>
> > >
> > > On Sat, 30 Dec 2006 16:59:35 +, Alistair John Strachan wrote:
> > > > I've eliminated 2.6.19.1 as the culprit, and also tried toggling
> > > > "optimize for size", various debug options. 2.6.19 compiled with GCC
> > > > 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably,
> > > > within approximately 12 hours.
> > >
> > > Which CPU are you compiling for?  You should try different options.
> >
> > I should, I haven't thought of that. Currently it's compiling for
> > CONFIG_MVIAC3_2, but I could try i686 for example.
> >
> > > Can you post disassembly of pipe_poll() for both the one that crashes
> > > and the one that doesn't?  Use 'objdump -D -r fs/pipe.o' so we get the
> > > relocation info and post just the one function from each for now.
> >
> > Sure, no problem:
> >
> > http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
> >
> > Both use identical configs, neither are optimised for size. The config is
> > available from the same location.
>
> Can you try enabling as many debug options as possible?

Specifically what? I've already had:

CONFIG_DETECT_SOFTLOCKUP
CONFIG_FRAME_POINTER
CONFIG_UNWIND_INFO

Enabled. CONFIG_4KSTACKS is disabled. Are there any debugging features 
actually pertinent to this bug?
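
As a rough way to double-check which of these options a given build really has, the sketch below parses .config-style text and reports each option's state. The sample fragment and the helper names (parse_kconfig, report) are made up for illustration; only the option names come from this thread.

```python
import re

def parse_kconfig(text):
    """Map option name -> value for kernel .config-style text."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        # Disabled options are recorded as comments of this exact shape.
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            options[m.group(1)] = "n"
    return options

def report(text, wanted):
    """Return {option: value}, using "missing" for options not mentioned at all."""
    opts = parse_kconfig(text)
    return {name: opts.get(name, "missing") for name in wanted}

# Hypothetical fragment, not the real config from the thread.
SAMPLE = """\
CONFIG_MVIAC3_2=y
CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_FRAME_POINTER=y
CONFIG_UNWIND_INFO=y
# CONFIG_4KSTACKS is not set
"""

WANTED = [
    "CONFIG_DETECT_SOFTLOCKUP",
    "CONFIG_FRAME_POINTER",
    "CONFIG_UNWIND_INFO",
    "CONFIG_4KSTACKS",
]

if __name__ == "__main__":
    for name, state in report(SAMPLE, WANTED).items():
        print(f"{name}: {state}")
```

Against a real tree, the text would be read from the .config file in the build directory instead of SAMPLE.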

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-31 Thread Alistair John Strachan
On Sunday 31 December 2006 16:27, Adrian Bunk wrote:
> On Sat, Dec 30, 2006 at 04:59:35PM +, Alistair John Strachan wrote:
> > On Thursday 28 December 2006 04:14, Alistair John Strachan wrote:
> > > On Thursday 28 December 2006 04:02, Alistair John Strachan wrote:
> > > > On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
> > > > [snip]
> > > >
> > > > > > Here's a current decompilation of vmlinux/pipe_poll() from the
> > > > > > running kernel, the addresses have changed slightly. There's no
> > > > > > xchg there either:
> > > > >
> > > > > Could you reproduce the bug by the new kernel, so we could get the
> > > > > exact address and instruction of the bug?
> > > >
> > > > It crashed again, but this time with no output (machine locked
> > > > solid). To be honest, the disassembly looks right (it's like Chuck
> > > > said, it's jumping back half way through an instruction):
> > > >
> > > > c0156f5f:   3b 87 68 01 00 00   cmp    0x168(%edi),%eax
> > > >
> > > > So c0156f60 is 87 68 01 00 00..
> > > >
> > > > This is with the GCC recompile, so it's not a distro problem. It
> > > > could still either be GCC 4.x, or a 2.6.19.1 specific bug, but it's
> > > > serious. 2.6.19 with GCC 3.4.3 is 100% stable.
> > >
> > > Looks like a similar crash here:
> > >
> > > http://ubuntuforums.org/showthread.php?p=1803389
> >
> > I've eliminated 2.6.19.1 as the culprit, and also tried toggling
> > "optimize for size", various debug options. 2.6.19 compiled with GCC
> > 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably,
> > within approximately 12 hours.
> >
> > The machine passes 6 hours of Prime95 (a CPU stability tester), four
> > memtest86 passes, and there are no heat problems.
> >
> > I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config
> > using this compiler (but the same binutils), and will report back if it
> > crashes. My bet is that it won't, however.
>
> There are occasional reports of problems with kernels compiled with
> gcc 4.1 that vanish when using older versions of gcc.
>
> AFAIK, until now noone has ever debugged whether that's a gcc bug,
> gcc exposing a kernel bug or gcc exposing a hardware bug.
>
> Comparing your report and [1], it seems that if these are the same
> problem, it's not a hardware bug but a gcc or kernel bug.

This bug specifically indicates some kind of miscompilation in a driver, 
causing boot time hangs. My problem is quite different, and more subtle. The 
crash happens in the same place every time, which does suggest determinism 
(even with various options toggled on and off, and a 300K smaller kernel 
image), but it takes 8-12 hours to manifest and only happens with GCC 4.1.1.

Unless we can start narrowing this down, it would be a mammoth task to seek 
out either the kernel or GCC change that first exhibited this bug, due to the 
non-immediate reproducibility of the bug, the lack of clues, and this 
machine's role as a stable, high-availability server.
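
To put a rough number on "mammoth": bisecting N candidate changes takes about log2(N) builds, and with a crash that needs 8-12 hours to show up, each build has to run for at least that long before it can even tentatively be called good. A back-of-the-envelope sketch, where the candidate counts are purely hypothetical and the 12 hours per step is taken from the figure above:

```python
def bisect_estimate(candidates, hours_per_test):
    """Worst-case bisection steps and total test hours for `candidates` changes."""
    # ceil(log2(n)) computed with integers to avoid floating-point edge cases.
    steps = (candidates - 1).bit_length() if candidates > 1 else 0
    return steps, steps * hours_per_test

if __name__ == "__main__":
    for n in (100, 1000, 5000):  # hypothetical numbers of candidate changes
        steps, hours = bisect_estimate(n, 12)
        print(f"{n} candidates: {steps} steps, about {hours} hours ({hours / 24:.1f} days)")
```

This is optimistic: a kernel that survives 12 hours is only probably good, so a careful bisection would run each step longer or repeat it.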

(If I had another Epia M1 or another computer I could reproduce the bug 
on, I would be only too happy to boot as many kernels as required to fix it; 
however I cannot spare this machine).

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-31 Thread Chuck Ebbert
In-Reply-To: <[EMAIL PROTECTED]>

On Sat, 30 Dec 2006 18:29:15 +, Alistair John Strachan wrote:

> > Can you post disassembly of pipe_poll() for both the one that crashes
> > and the one that doesn't?  Use 'objdump -D -r fs/pipe.o' so we get the
> > relocation info and post just the one function from each for now.
>
> Sure, no problem:
>
> http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
>
> Both use identical configs, neither are optimised for size. The config is
> available from the same location.

Those were compiled without frame pointers.  Can you post them compiled
with frame pointers so they match your original bug report? And confirm
that pipe_poll() is still at 0xc0156ec0 in vmlinux?
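
One way to confirm the address is to look the symbol up in System.map (or in `nm vmlinux` output, which uses the same three-column "address type name" layout). A minimal sketch: the neighbouring entries and the type letters in the sample are invented, and only the pipe_poll address is taken from this thread.

```python
def symbol_address(map_text, name):
    """Return the address of `name` from System.map-style text, or None."""
    for line in map_text.splitlines():
        parts = line.split()
        # Expected layout per line: <hex address> <type letter> <symbol name>
        if len(parts) == 3 and parts[2] == name:
            return int(parts[0], 16)
    return None

# Hypothetical excerpt; example_before/example_after are placeholders.
SAMPLE_MAP = """\
c0156e00 t example_before
c0156ec0 t pipe_poll
c0156f70 t example_after
"""

if __name__ == "__main__":
    addr = symbol_address(SAMPLE_MAP, "pipe_poll")
    print(f"pipe_poll at {addr:#x}")
    print("matches 0xc0156ec0:", addr == 0xc0156ec0)
```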

-- 
MBTI: IXTP



Re: Oops in 2.6.19.1

2006-12-31 Thread Alistair John Strachan
On Sunday 31 December 2006 21:43, Chuck Ebbert wrote:
> In-Reply-To: <[EMAIL PROTECTED]>
>
> On Sat, 30 Dec 2006 18:29:15 +, Alistair John Strachan wrote:
> > > Can you post disassembly of pipe_poll() for both the one that crashes
> > > and the one that doesn't?  Use 'objdump -D -r fs/pipe.o' so we get the
> > > relocation info and post just the one function from each for now.
> >
> > Sure, no problem:
> >
> > http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
> >
> > Both use identical configs, neither are optimised for size. The config is
> > available from the same location.
>
> Those were compiled without frame pointers.  Can you post them compiled
> with frame pointers so they match your original bug report? And confirm
> that pipe_poll() is still at 0xc0156ec0 in vmlinux?

c0156ec0 <pipe_poll>:

I used the config I originally sent you to rebuild it again. This time I've put
up the whole vmlinux for both kernels, the config is replaced, the 
decompilation is re-done, I've confirmed the offset in the GCC 4.1.1 kernel 
is identical. Sorry for the confusion.

The reason I changed the configs was to experiment with enabling and disabling 
debugging (and other such) options that might have shaken out compiler bugs.

However none of these kernels have ever crashed gracefully again, most of them 
hang the machine (no nmi watchdog though) so I've not been able to look at 
the oops. It's the same root cause, however, as GCC 3.4.6 kernels do not 
crash.

http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/

Happy new year, btw.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-30 Thread Alistair John Strachan
On Saturday 30 December 2006 18:06, James Courtier-Dutton wrote:
> > I'd guess you have some kind of hardware problem.  It could also be
> > a kernel problem where the saved address was corrupted during an
> > interrupt, but that's not likely.
>
> This looks rather strange.
[snip]

> 2) Kernel modules compiled with different gcc than rest of kernel.

Previously there was only one GCC version (4.1.1 totally replaced 3.4.3, and 
is the system wide GCC), now I have installed 3.4.6 into /opt/gcc-3.4.6 and 
it is only PATH'ed explicitly by me when I wish to compile a kernel using it:

export PATH=/opt/gcc-3.4.6/bin:$PATH
cp /boot/config-2.6.19-test .config
make oldconfig
make

> 3) kernel headers do not match the kernel being used.

The tree is a pristine 2.6.19.

> One way to start tracking this down would be to run it with the fewest
> amount of kernel modules loaded as one can, but still reproduce the
> problem.

Crippling the machine, though. Impractical for something that isn't 
immediately reproducible.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-30 Thread Alistair John Strachan
On Saturday 30 December 2006 17:21, Chuck Ebbert wrote:
> In-Reply-To: <[EMAIL PROTECTED]>
>
> On Sat, 30 Dec 2006 16:59:35 +, Alistair John Strachan wrote:
> > I've eliminated 2.6.19.1 as the culprit, and also tried toggling
> > "optimize for size", various debug options. 2.6.19 compiled with GCC
> > 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably,
> > within approximately 12 hours.
>
> Which CPU are you compiling for?  You should try different options.

I should, I haven't thought of that. Currently it's compiling for 
CONFIG_MVIAC3_2, but I could try i686 for example.

> Can you post disassembly of pipe_poll() for both the one that crashes
> and the one that doesn't?  Use 'objdump -D -r fs/pipe.o' so we get the
> relocation info and post just the one function from each for now.

Sure, no problem:

http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/

Both use identical configs, neither are optimised for size. The config is 
available from the same location.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-30 Thread James Courtier-Dutton

Chuck Ebbert wrote:
> In-Reply-To: <[EMAIL PROTECTED]>
>
> On Wed, 20 Dec 2006 14:21:03 +, Alistair John Strachan wrote:
>
> > Any ideas?
> >
> > BUG: unable to handle kernel NULL pointer dereference at virtual address
> > 0009
>
>     83 ca 10                or     $0x10,%edx
>     3b                      .byte  0x3b
>     87 68 01                xchg   %ebp,0x1(%eax)   <=
>     00 00                   add    %al,(%eax)
>
> Somehow it is trying to execute code in the middle of an instruction.
> That almost never works, even when the resulting fragment is a legal
> opcode. :)
>
> The real instruction is:
>
>     3b 87 68 01 00 00       cmp    0x168(%edi),%eax
>
> I'd guess you have some kind of hardware problem.  It could also be
> a kernel problem where the saved address was corrupted during an
> interrupt, but that's not likely.


This looks rather strange.
The times I have seen this sort of problem is:
1) when one bit of the kernel is corrupting another part of it.
2) Kernel modules compiled with different gcc than rest of kernel.
3) kernel headers do not match the kernel being used.

One way to start tracking this down would be to run it with the fewest 
amount of kernel modules loaded as one can, but still reproduce the problem.


James


Re: Oops in 2.6.19.1

2006-12-30 Thread Chuck Ebbert
In-Reply-To: <[EMAIL PROTECTED]>

On Sat, 30 Dec 2006 16:59:35 +, Alistair John Strachan wrote:

> I've eliminated 2.6.19.1 as the culprit, and also tried toggling "optimize 
> for 
> size", various debug options. 2.6.19 compiled with GCC 4.1.1 on an Via 
> Nehemiah C3-2 seems to crash in pipe_poll reliably, within approximately 12 
> hours.

Which CPU are you compiling for?  You should try different options.

Can you post disassembly of pipe_poll() for both the one that crashes
and the one that doesn't?  Use 'objdump -D -r fs/pipe.o' so we get the
relocation info and post just the one function from each for now.
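
For posting just the one function, something like the following can cut a single `<name>:` block out of objdump output. This is only a sketch: extract_function is a made-up helper, the sample listing is hypothetical, and it assumes the usual layout where each function starts with an "address <symbol>:" header and blocks are separated by blank lines.

```python
import re

def extract_function(objdump_text, name):
    """Return the lines of the `<name>:` block from objdump -D style output."""
    header = re.compile(r"^[0-9a-fA-F]+ <%s>:$" % re.escape(name))
    block, inside = [], False
    for line in objdump_text.splitlines():
        if header.match(line.strip()):
            inside = True
        elif inside and not line.strip():
            break  # a blank line ends the function's block
        if inside:
            block.append(line)
    return block

# Hypothetical listing; offsets and neighbouring symbols are placeholders.
SAMPLE = """\
00000000 <example_prev>:
   0:   55                      push   %ebp
   1:   c3                      ret

00000010 <pipe_poll>:
  10:   55                      push   %ebp
  11:   89 e5                   mov    %esp,%ebp
  13:   c3                      ret

00000020 <example_next>:
  20:   c3                      ret
"""

if __name__ == "__main__":
    print("\n".join(extract_function(SAMPLE, "pipe_poll")))
```

In practice the text would be the captured output of 'objdump -D -r fs/pipe.o'; relocation lines printed by -r are not blank, so they stay inside the block.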

-- 
MBTI: IXTP



Re: Oops in 2.6.19.1

2006-12-30 Thread Alistair John Strachan
On Thursday 28 December 2006 04:14, Alistair John Strachan wrote:
> On Thursday 28 December 2006 04:02, Alistair John Strachan wrote:
> > On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
> > [snip]
> >
> > > > Here's a current decompilation of vmlinux/pipe_poll() from the
> > > > running kernel, the addresses have changed slightly. There's no xchg
> > > > there either:
> > >
> > > Could you reproduce the bug by the new kernel, so we could get the
> > > exact address and instruction of the bug?
> >
> > It crashed again, but this time with no output (machine locked solid). To
> > be honest, the disassembly looks right (it's like Chuck said, it's
> > jumping back half way through an instruction):
> >
> > c0156f5f:   3b 87 68 01 00 00   cmp    0x168(%edi),%eax
> >
> > So c0156f60 is 87 68 01 00 00..
> >
> > This is with the GCC recompile, so it's not a distro problem. It could
> > still either be GCC 4.x, or a 2.6.19.1 specific bug, but it's serious.
> > 2.6.19 with GCC 3.4.3 is 100% stable.
>
> Looks like a similar crash here:
>
> http://ubuntuforums.org/showthread.php?p=1803389

I've eliminated 2.6.19.1 as the culprit, and also tried toggling "optimize for 
size", various debug options. 2.6.19 compiled with GCC 4.1.1 on an Via 
Nehemiah C3-2 seems to crash in pipe_poll reliably, within approximately 12 
hours.

The machine passes 6 hours of Prime95 (a CPU stability tester), four memtest86 
passes, and there are no heat problems.

I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config using 
this compiler (but the same binutils), and will report back if it crashes. My 
bet is that it won't, however.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-27 Thread Alistair John Strachan
On Thursday 28 December 2006 04:02, Alistair John Strachan wrote:
> On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
> [snip]
>
> > > Here's a current decompilation of vmlinux/pipe_poll() from the running
> > > kernel, the addresses have changed slightly. There's no xchg there
> > > either:
> >
> > Could you reproduce the bug by the new kernel, so we could get the exact
> > address and instruction of the bug?
>
> It crashed again, but this time with no output (machine locked solid). To
> be honest, the disassembly looks right (it's like Chuck said, it's jumping
> back half way through an instruction):
>
> c0156f5f:   3b 87 68 01 00 00   cmp    0x168(%edi),%eax
>
> So c0156f60 is 87 68 01 00 00..
>
> This is with the GCC recompile, so it's not a distro problem. It could
> still either be GCC 4.x, or a 2.6.19.1 specific bug, but it's serious.
> 2.6.19 with GCC 3.4.3 is 100% stable.

Looks like a similar crash here:

http://ubuntuforums.org/showthread.php?p=1803389

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-27 Thread Alistair John Strachan
On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
[snip]
> > Here's a current decompilation of vmlinux/pipe_poll() from the running
> > kernel, the addresses have changed slightly. There's no xchg there
> > either:
>
> Could you reproduce the bug by the new kernel, so we could get the exact
> address and instruction of the bug?

It crashed again, but this time with no output (machine locked solid). To be 
honest, the disassembly looks right (it's like Chuck said, it's jumping back 
half way through an instruction):

c0156f5f:   3b 87 68 01 00 00   cmp    0x168(%edi),%eax

So c0156f60 is 87 68 01 00 00..
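
The arithmetic behind that can be spelled out. The sketch below takes the function start and the "pipe_poll+0xa0" offset from the oops, then finds which listed instruction covers that address and how far into it EIP points. The three listing entries are copied from the disassembly posted in this thread; locate() is just an illustrative helper.

```python
# (start address, raw bytes, text) for the instructions around the fault.
LISTING = [
    (0xc0156f5c, bytes.fromhex("83ca10"), "or $0x10,%edx"),
    (0xc0156f5f, bytes.fromhex("3b8768010000"), "cmp 0x168(%edi),%eax"),
    (0xc0156f65, bytes.fromhex("0f45ca"), "cmovne %edx,%ecx"),
]

def locate(eip, listing):
    """Return (start, offset_into_instruction, text) for the instruction covering eip."""
    for start, raw, text in listing:
        if start <= eip < start + len(raw):
            return start, eip - start, text
    return None

PIPE_POLL = 0xc0156ec0      # start of pipe_poll in vmlinux
eip = PIPE_POLL + 0xa0      # "pipe_poll+0xa0" from the oops

if __name__ == "__main__":
    start, offset, text = locate(eip, LISTING)
    print(f"EIP {eip:#x} is {offset} byte(s) into '{text}' at {start:#x}")
    raw = dict((s, r) for s, r, _ in LISTING)[start]
    print("bytes seen from EIP:", raw[offset:].hex(" "))
```

An offset of 0 would mean EIP sits on an instruction boundary; anything else means execution resumed inside an instruction, which is what the oops suggests here.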

This is with the GCC recompile, so it's not a distro problem. It could still 
either be GCC 4.x, or a 2.6.19.1 specific bug, but it's serious. 2.6.19 with 
GCC 3.4.3 is 100% stable.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-27 Thread Zhang, Yanmin
On Wed, 2006-12-27 at 12:35 +, Alistair John Strachan wrote:
> On Wednesday 27 December 2006 02:07, Zhang, Yanmin wrote:
> [snip]
> > >  Call Trace:
> > >  [<c015d7f3>] do_sys_poll+0x253/0x480
> > >  [<c015da53>] sys_poll+0x33/0x50
> > >  [<c0102c97>] syscall_call+0x7/0xb
> > >  [<b7f26402>] 0xb7f26402
> > >  ===
> > > Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4
> > > 89 c8 8b 75
> > > f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 45
> > > ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
> >
> > Above codes look weird. Could you disassemble kernel image and post
> > the part around address 0xc0156f60?
> >
> > "87 68 01 00 00" is instruction xchg, but if I disassemble from the
> > begining, I couldn't see instruct xchg.
> >
> > > EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:ee1b9c0c
> 
> Unfortunately, after suspecting the toolchain, I did a manual rebuild of 
> binutils, gcc and glibc from the official sites, and then rebuilt 2.6.19.1. 
> This might upset the decompile below, versus the original report.
> 
> Assuming it's NOT a bug in my distro's toolchain (because I am now running 
> the 
> GNU stuff), it'll crash again, so this is still useful.
> 
> Here's a current decompilation of vmlinux/pipe_poll() from the running 
> kernel, 
> the addresses have changed slightly. There's no xchg there either:
Could you reproduce the bug by the new kernel, so we could get the exact address
and instruction of the bug?


Re: Oops in 2.6.19.1

2006-12-27 Thread Alistair John Strachan
On Wednesday 27 December 2006 02:07, Zhang, Yanmin wrote:
[snip]
> >  Call Trace:
> >  [<c015d7f3>] do_sys_poll+0x253/0x480
> >  [<c015da53>] sys_poll+0x33/0x50
> >  [<c0102c97>] syscall_call+0x7/0xb
> >  [<b7f26402>] 0xb7f26402
> >  ===
> > Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4
> > 89 c8 8b 75
> > f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 45
> > ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
>
> Above codes look weird. Could you disassemble kernel image and post
> the part around address 0xc0156f60?
>
> "87 68 01 00 00" is instruction xchg, but if I disassemble from the
> begining, I couldn't see instruct xchg.
>
> > EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:ee1b9c0c
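
The oops "Code:" line marks the byte at EIP with angle brackets, so the faulting position can be read straight out of it. A small sketch, assuming that convention; marked_byte is a made-up helper and the byte string is the one from the oops in this thread:

```python
import re

def marked_byte(code_line):
    """Return (index, values): position of the <xx> byte and all bytes as ints."""
    tokens = code_line.split()
    if tokens and tokens[0] == "Code:":
        tokens = tokens[1:]
    index = None
    values = []
    for i, tok in enumerate(tokens):
        m = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if m:
            index = i          # this is the byte EIP points at
            tok = m.group(1)
        values.append(int(tok, 16))
    return index, values

CODE_LINE = (
    "Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 "
    "89 c8 8b 75 "
    "f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 45 "
    "ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00"
)

if __name__ == "__main__":
    index, values = marked_byte(CODE_LINE)
    print("bytes dumped before EIP:", index)
    print("byte just before EIP: %02x" % values[index - 1])
    print("bytes from EIP:", " ".join("%02x" % b for b in values[index:index + 5]))
```

The byte just before the marker is 3b, the opcode of the cmp that the disassembly shows one byte earlier, which fits the "jumped into the middle of an instruction" reading.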

Unfortunately, after suspecting the toolchain, I did a manual rebuild of 
binutils, gcc and glibc from the official sites, and then rebuilt 2.6.19.1. 
This might upset the decompile below, versus the original report.

Assuming it's NOT a bug in my distro's toolchain (because I am now running the 
GNU stuff), it'll crash again, so this is still useful.

Here's a current decompilation of vmlinux/pipe_poll() from the running kernel, 
the addresses have changed slightly. There's no xchg there either:

c0156ec0 <pipe_poll>:
c0156ec0:   55                      push   %ebp
c0156ec1:   89 e5                   mov    %esp,%ebp
c0156ec3:   83 ec 10                sub    $0x10,%esp
c0156ec6:   89 5d f4                mov    %ebx,0xfffffff4(%ebp)
c0156ec9:   85 d2                   test   %edx,%edx
c0156ecb:   89 d3                   mov    %edx,%ebx
c0156ecd:   89 75 f8                mov    %esi,0xfffffff8(%ebp)
c0156ed0:   89 c6                   mov    %eax,%esi
c0156ed2:   89 7d fc                mov    %edi,0xfffffffc(%ebp)
c0156ed5:   8b 40 08                mov    0x8(%eax),%eax
c0156ed8:   8b 40 08                mov    0x8(%eax),%eax
c0156edb:   8b b8 f0 00 00 00       mov    0xf0(%eax),%edi
c0156ee1:   74 0c                   je     c0156eef <pipe_poll+0x2f>
c0156ee3:   85 ff                   test   %edi,%edi
c0156ee5:   74 08                   je     c0156eef <pipe_poll+0x2f>
c0156ee7:   89 d1                   mov    %edx,%ecx
c0156ee9:   89 f0                   mov    %esi,%eax
c0156eeb:   89 fa                   mov    %edi,%edx
c0156eed:   ff 13                   call   *(%ebx)
c0156eef:   0f b7 5e 1c             movzwl 0x1c(%esi),%ebx
c0156ef3:   31 c9                   xor    %ecx,%ecx
c0156ef5:   8b 47 08                mov    0x8(%edi),%eax
c0156ef8:   f6 c3 01                test   $0x1,%bl
c0156efb:   89 45 f0                mov    %eax,0xfffffff0(%ebp)
c0156efe:   74 20                   je     c0156f20 <pipe_poll+0x60>
c0156f00:   85 c0                   test   %eax,%eax
c0156f02:   b8 41 00 00 00          mov    $0x41,%eax
c0156f07:   0f 4f c8                cmovg  %eax,%ecx
c0156f0a:   8b 87 5c 01 00 00       mov    0x15c(%edi),%eax
c0156f10:   85 c0                   test   %eax,%eax
c0156f12:   74 43                   je     c0156f57 <pipe_poll+0x97>
c0156f14:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
c0156f1a:   8d bf 00 00 00 00       lea    0x0(%edi),%edi
c0156f20:   f6 c3 02                test   $0x2,%bl
c0156f23:   74 23                   je     c0156f48 <pipe_poll+0x88>
c0156f25:   83 7d f0 0f             cmpl   $0xf,0xfffffff0(%ebp)
c0156f29:   b8 04 01 00 00          mov    $0x104,%eax
c0156f2e:   ba 00 00 00 00          mov    $0x0,%edx
c0156f33:   8b 9f 58 01 00 00       mov    0x158(%edi),%ebx
c0156f39:   0f 4f c2                cmovg  %edx,%eax
c0156f3c:   09 c1                   or     %eax,%ecx
c0156f3e:   89 c8                   mov    %ecx,%eax
c0156f40:   83 c8 08                or     $0x8,%eax
c0156f43:   85 db                   test   %ebx,%ebx
c0156f45:   0f 44 c8                cmove  %eax,%ecx
c0156f48:   8b 5d f4                mov    0xfffffff4(%ebp),%ebx
c0156f4b:   89 c8                   mov    %ecx,%eax
c0156f4d:   8b 75 f8                mov    0xfffffff8(%ebp),%esi
c0156f50:   8b 7d fc                mov    0xfffffffc(%ebp),%edi
c0156f53:   89 ec                   mov    %ebp,%esp
c0156f55:   5d                      pop    %ebp
c0156f56:   c3                      ret
c0156f57:   89 ca                   mov    %ecx,%edx
c0156f59:   8b 46 6c                mov    0x6c(%esi),%eax
c0156f5c:   83 ca 10                or     $0x10,%edx
c0156f5f:   3b 87 68 01 00 00       cmp    0x168(%edi),%eax
c0156f65:   0f 45 ca                cmovne %edx,%ecx
c0156f68:   eb b6                   jmp    c0156f20 <pipe_poll+0x60>
c0156f6a:   8d b6 00 00 00 00       lea    0x0(%esi),%esi
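
One way to check where the reported EIP lands in a listing like this is to resolve symbol+offset to an absolute address and look for the instruction that contains it. This is only a sketch: the helper name locate() is made up for illustration, and it assumes the code around this spot has the same layout in the rebuilt kernel as in the one that crashed.

```python
import bisect

# Instruction start addresses copied from the tail of the listing above.
INSN_STARTS = [
    0xc0156f57, 0xc0156f59, 0xc0156f5c, 0xc0156f5f,
    0xc0156f65, 0xc0156f68, 0xc0156f6a,
]

def locate(addr, starts):
    """Return (start of the containing instruction, byte offset into it)."""
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        raise ValueError("address is before the first known instruction")
    return starts[i], addr - starts[i]

PIPE_POLL = 0xc0156ec0           # symbol address from the listing
eip = PIPE_POLL + 0xa0           # "EIP is at pipe_poll+0xa0/0xb0"
start, offset = locate(eip, INSN_STARTS)
print(f"eip={eip:#x} is {offset} byte(s) into the instruction at {start:#x}")
```

A non-zero offset means the address is not an instruction boundary in this build.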

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
-
To unsubscribe from this list: send the line 

Re: Oops in 2.6.19.1

2006-12-27 Thread Zhang, Yanmin
On Wed, 2006-12-27 at 12:35 +0000, Alistair John Strachan wrote:
 On Wednesday 27 December 2006 02:07, Zhang, Yanmin wrote:
 [snip]
    Call Trace:
[c015d7f3] do_sys_poll+0x253/0x480
[c015da53] sys_poll+0x33/0x50
[c0102c97] syscall_call+0x7/0xb
[b7f26402] 0xb7f26402
===
   Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4
   89 c8 8b 75
   f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b 87 68 01 00 00 0f 45
   ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
 
  The code above looks weird. Could you disassemble the kernel image and post
  the part around address 0xc0156f60?
 
  87 68 01 00 00 is an xchg instruction, but if I disassemble from the
  beginning, I don't see any xchg instruction.
 
   EIP: [c0156f60] pipe_poll+0xa0/0xb0 SS:ESP 0068:ee1b9c0c
 
 Unfortunately, after suspecting the toolchain, I did a manual rebuild of 
 binutils, gcc and glibc from the official sites, and then rebuilt 2.6.19.1. 
 This might upset the decompile below, versus the original report.
 
 Assuming it's NOT a bug in my distro's toolchain (because I am now running
 the GNU stuff), it'll crash again, so this is still useful.
 
 Here's a current decompilation of vmlinux/pipe_poll() from the running
 kernel, the addresses have changed slightly. There's no xchg there either:
Could you reproduce the bug with the new kernel, so we can get the exact address
and instruction of the bug?
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Oops in 2.6.19.1

2006-12-27 Thread Alistair John Strachan
On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
[snip]
  Here's a current decompilation of vmlinux/pipe_poll() from the running
  kernel, the addresses have changed slightly. There's no xchg there
  either:

 Could you reproduce the bug by the new kernel, so we could get the exact
 address and instruction of the bug?

It crashed again, but this time with no output (machine locked solid). To be 
honest, the disassembly looks right (it's like Chuck said, it's jumping back 
half way through an instruction):

c0156f5f:   3b 87 68 01 00 00       cmp    0x168(%edi),%eax

So c0156f60 is 87 68 01 00 00..

This is with the GCC recompile, so it's not a distro problem. It could still 
either be GCC 4.x, or a 2.6.19.1 specific bug, but it's serious. 2.6.19 with 
GCC 3.4.3 is 100% stable.
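
The point about landing halfway through an instruction can be sketched by decoding the same bytes from two starting offsets. The decoder below is a deliberately tiny illustration: it handles only the two opcodes involved (0x3b and 0x87) with a disp8 or disp32 ModRM operand, and the eax value of 8 is taken from the register dump in the original report on the assumption that it is 0x00000008.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode(code):
    """Decode one 'opcode ModRM disp' instruction (AT&T syntax).

    Returns (text, length, base_register, displacement).
    """
    opcode, modrm = code[0], code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if rm == 4 or mod not in (1, 2):
        raise NotImplementedError("only disp8/disp32 forms without SIB")
    size = 1 if mod == 1 else 4
    disp = int.from_bytes(code[2:2 + size], "little", signed=True)
    mem = f"{disp:#x}(%{REGS[rm]})"
    if opcode == 0x3B:        # CMP r32, r/m32
        text = f"cmp {mem},%{REGS[reg]}"
    elif opcode == 0x87:      # XCHG r/m32, r32
        text = f"xchg %{REGS[reg]},{mem}"
    else:
        raise NotImplementedError(hex(opcode))
    return text, 2 + size, REGS[rm], disp

code = bytes.fromhex("3b 87 68 01 00 00")   # bytes at c0156f5f

intended = decode(code)        # decoding from the real instruction boundary
stray = decode(code[1:])       # decoding from c0156f60, one byte later

eax = 0x8                      # assumed from the oops register dump
fault_addr = eax + stray[3]    # memory operand the stray xchg would touch

print(intended[0])
print(stray[0])
print(f"stray access at {fault_addr:#x}")
```

Under these assumptions the stray instruction writes to eax+1, which would line up with the faulting address and the write fault reported in the oops.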

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-27 Thread Alistair John Strachan
On Thursday 28 December 2006 04:02, Alistair John Strachan wrote:
 On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
 [snip]

   Here's a current decompilation of vmlinux/pipe_poll() from the running
   kernel, the addresses have changed slightly. There's no xchg there
   either:
 
  Could you reproduce the bug by the new kernel, so we could get the exact
  address and instruction of the bug?

 It crashed again, but this time with no output (machine locked solid). To
 be honest, the disassembly looks right (it's like Chuck said, it's jumping
 back half way through an instruction):

 c0156f5f:   3b 87 68 01 00 00       cmp    0x168(%edi),%eax

 So c0156f60 is 87 68 01 00 00..

 This is with the GCC recompile, so it's not a distro problem. It could
 still either be GCC 4.x, or a 2.6.19.1 specific bug, but it's serious.
 2.6.19 with GCC 3.4.3 is 100% stable.

Looks like a similar crash here:

http://ubuntuforums.org/showthread.php?p=1803389

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-26 Thread Zhang, Yanmin
On Sat, 2006-12-23 at 15:40 +0000, Alistair John Strachan wrote:
> On Wednesday 20 December 2006 14:21, Alistair John Strachan wrote:
> > Hi,
> >
> > Any ideas?
> 
> Pretty much like clockwork, it happened again. I think it's time to take this 
> seriously as a software bug, and not some hardware problem. I've run kernels 
> since 2.6.0 on this machine without such crashes, and now two of the same in 
> 2.6.19.1? Pretty unlikely!
> 
> BUG: unable to handle kernel NULL pointer dereference at virtual address 
> 0009
>  printing eip:
> c0156f60
> *pde = 
> Oops: 0002 [#1]
> Modules linked in: ipt_recent ipt_REJECT xt_tcpudp ipt_MASQUERADE iptable_nat
> xt_state iptable_filter ip_tables x_tables prism54 yenta_socket rsrc_nonstatic
> pcmcia_core snd_via82xx snd_ac97_codec snd_ac97_bus snd_pcm snd_timer
> snd_page_alloc snd_mpu401_uart snd_rawmidi snd soundcore usblp ehci_hcd
> eth1394 uhci_hcd usbcore ohci1394 ieee1394 via_agp agpgart vt1211 hwmon_vid
> hwmon ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack
> CPU:    0
> EIP:    0060:[<c0156f60>]    Not tainted VLI
> EFLAGS: 00010246   (2.6.19.1 #1)
> EIP is at pipe_poll+0xa0/0xb0
> eax: 0008   ebx:    ecx: 0008   edx: 
> esi: ee1b9e9c   edi: f4d80a00   ebp: ee1b9c1c   esp: ee1b9c0c
> ds: 007b   es: 007b   ss: 0068
> Process java (pid: 5374, ti=ee1b8000 task=f7117560 task.ti=ee1b8000)
> Stack:   ee1b9e9c f6c17160 ee1b9fa4 c015d7f3 ee1b9c54 ee1b9fac
>082dff90 0010 082dffa0  ee1b9e94 ee1b9e94 0002 ee1b9eac
> ee1b9e94 c015e580   0002 f6c17160 
> Call Trace:
>  [<c015d7f3>] do_sys_poll+0x253/0x480
>  [<c015da53>] sys_poll+0x33/0x50
>  [<c0102c97>] syscall_call+0x7/0xb
>  [<b7f26402>] 0xb7f26402
>  ===
> Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8 
> 8b 75
> f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 45 ca 
> eb b6 8d b6 00 00 00 00 55 b8 01 00 00
The code above looks weird. Could you disassemble the kernel image and post
the part around address 0xc0156f60?

"87 68 01 00 00" is an xchg instruction, but if I disassemble from the
beginning, I don't see any xchg instruction.


> EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:ee1b9c0c
> 


Re: Oops in 2.6.19.1

2006-12-24 Thread Alistair John Strachan
On Sunday 24 December 2006 04:23, Chuck Ebbert wrote:
[snip]
> Anyway, post your complete .config.

Config attached.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.19.1
# Sat Dec 16 19:30:00 2006
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
# CONFIG_IKCONFIG is not set
# CONFIG_RELAY is not set
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y

#
# Block layer
#
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
# CONFIG_IOSCHED_DEADLINE is not set
# CONFIG_IOSCHED_CFQ is not set
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"

#
# Processor type and features
#
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
CONFIG_MVIAC3_2=y
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_MCE is not set
CONFIG_VM86=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
# CONFIG_X86_REBOOTFIXUPS is not set
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set

#
# Firmware Drivers
#
# CONFIG_EDD is not set
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_PAGE_OFFSET=0xC0000000
CONFIG_HIGHMEM=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_SPARSEMEM_STATIC=y
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_RESOURCES_64BIT is not set
# CONFIG_HIGHPTE is not set
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
# CONFIG_EFI is not set
CONFIG_REGPARM=y
# CONFIG_SECCOMP is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
# CONFIG_KEXEC is not set
# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x100000
# CONFIG_COMPAT_VDSO is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y

#
# Power management options (ACPI, APM)

Re: Oops in 2.6.19.1

2006-12-24 Thread Alistair John Strachan
On Sunday 24 December 2006 04:23, Chuck Ebbert wrote:
> In-Reply-To: <[EMAIL PROTECTED]>
>
> On Sat, 23 Dec 2006 15:40:46 +0000, Alistair John Strachan wrote:
> > Pretty much like clockwork, it happened again. I think it's time to take
> > this seriously as a software bug, and not some hardware problem. I've run
> > kernels since 2.6.0 on this machine without such crashes, and now two of
> > the same in 2.6.19.1? Pretty unlikely!
>
> Stranger things have happened, e.g. your system might have started
> to overheat just recently.

True, I've considered it, I'll replace the CPU fan.

> Anyway, post your complete .config.  And exactly which one of the
> many Via cpus are you using?  Are you using the Padlock unit?

No, much older than that:

[alistair] 14:38 [~] cat /proc/cpuinfo
processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 9
model name      : VIA Nehemiah
stepping        : 1
cpu MHz         : 999.569
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu de tsc msr cx8 mtrr pge cmov mmx fxsr sse fxsr_opt
bogomips        : 2000.02

> What do those java/python programs do that are running?  What pipe
> are they polling?
>
> You could try going back to 2.6.18.x for a while in the meantime.

Well, I have had a thought. I recently upgraded the toolchain on the machine 
from binutils 2.16.x and GCC 3.4.3 (2.6.19 was built with this) to binutils 
2.17 and GCC 4.1.1. It's conceivable that this is some sort of compiler bug.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-23 Thread Alistair John Strachan
On Wednesday 20 December 2006 14:21, Alistair John Strachan wrote:
> Hi,
>
> Any ideas?

Pretty much like clockwork, it happened again. I think it's time to take this 
seriously as a software bug, and not some hardware problem. I've run kernels 
since 2.6.0 on this machine without such crashes, and now two of the same in 
2.6.19.1? Pretty unlikely!

BUG: unable to handle kernel NULL pointer dereference at virtual address 
0009
 printing eip:
c0156f60
*pde = 
Oops: 0002 [#1]
Modules linked in: ipt_recent ipt_REJECT xt_tcpudp ipt_MASQUERADE iptable_nat
xt_state iptable_filter ip_tables x_tables prism54 yenta_socket rsrc_nonstatic
pcmcia_core snd_via82xx snd_ac97_codec snd_ac97_bus snd_pcm snd_timer
snd_page_alloc snd_mpu401_uart snd_rawmidi snd soundcore usblp ehci_hcd
eth1394 uhci_hcd usbcore ohci1394 ieee1394 via_agp agpgart vt1211 hwmon_vid
hwmon ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack
CPU:    0
EIP:    0060:[<c0156f60>]    Not tainted VLI
EFLAGS: 00010246   (2.6.19.1 #1)
EIP is at pipe_poll+0xa0/0xb0
eax: 0008   ebx:    ecx: 0008   edx: 
esi: ee1b9e9c   edi: f4d80a00   ebp: ee1b9c1c   esp: ee1b9c0c
ds: 007b   es: 007b   ss: 0068
Process java (pid: 5374, ti=ee1b8000 task=f7117560 task.ti=ee1b8000)
Stack:   ee1b9e9c f6c17160 ee1b9fa4 c015d7f3 ee1b9c54 ee1b9fac
   082dff90 0010 082dffa0  ee1b9e94 ee1b9e94 0002 ee1b9eac
    ee1b9e94 c015e580   0002 f6c17160 
Call Trace:
 [<c015d7f3>] do_sys_poll+0x253/0x480
 [<c015da53>] sys_poll+0x33/0x50
 [<c0102c97>] syscall_call+0x7/0xb
 [<b7f26402>] 0xb7f26402
 ===
Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8 
8b 75
f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 45 ca 
eb b6 8d b6 00 00 00 00 55 b8 01 00 00
EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:ee1b9c0c
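
For reading the "Oops: 0002" line above: on i386 the number is the page-fault error code pushed by the CPU, whose low three bits encode the cause, the access type and the privilege level. A minimal sketch of decoding it follows; the function name is illustrative only and is not a kernel interface.

```python
def decode_oops_code(code):
    """Split the low three bits of an i386 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }

info = decode_oops_code(0x0002)
print(info)
```

For this report that gives a kernel-mode write to a page that is not present.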

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-21 Thread Valdis . Kletnieks
On Wed, 20 Dec 2006 22:15:50 GMT, Alistair John Strachan said:
> Seems pretty unlikely on a 4 year old Via Epia. Never had any problems with it
> before now.
> 
> Maybe a cosmic ray event? ;-)

More likely a stray alpha particle from a radioactive decay in the actual chip
casing - I saw some research a while back that said that the average commodity
system should *expect* to see 1 or 2 alpha-induced single-bit errors per year,
and the chance that *you* saw the event was directly related to whether the
memory had ECC, and how much of the other circuitry had ECC on it.




Re: Oops in 2.6.19.1

2006-12-21 Thread Alistair John Strachan
On Thursday 21 December 2006 08:05, Chuck Ebbert wrote:
> In-Reply-To: <[EMAIL PROTECTED]>
>
> On Wed, 20 Dec 2006 22:15:50 +0000, Alistair John Strachan wrote:
> > > I'd guess you have some kind of hardware problem.  It could also be
> > > a kernel problem where the saved address was corrupted during an
> > > interrupt, but that's not likely.
> >
> > Seems pretty unlikely on a 4 year old Via Epia. Never had any problems
> > with it before now.
> >
> > Maybe a cosmic ray event? ;-)
>
> The low byte of eip should be 5f and it changed to 60, so that's
> probably not it.  And the oops report is consistent with that being
> the instruction that was really executed, so it's not the kernel
> misreporting the address after it happened.
>
> You weren't trying kprobes or something, were you? Have you ever
> had another unexplained oops with this machine?

Nope, it's a stock kernel and it's running on a server, kprobes isn't in use.

And no, to my knowledge there's not been another "unexplained" oops. I've had 
crashes, but they've always been known issues or BIOS trouble.

The machine was recently tampered with to install additional HDDs, but the 
memory was memtest'ed when it was installed and passed several times without 
issue. I'm rather puzzled.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-21 Thread Chuck Ebbert
In-Reply-To: <[EMAIL PROTECTED]>

On Wed, 20 Dec 2006 22:15:50 +0000, Alistair John Strachan wrote:

> > I'd guess you have some kind of hardware problem.  It could also be
> > a kernel problem where the saved address was corrupted during an
> > interrupt, but that's not likely.
> 
> Seems pretty unlikely on a 4 year old Via Epia. Never had any problems with
> it before now.
> 
> Maybe a cosmic ray event? ;-)

The low byte of eip should be 5f and it changed to 60, so that's
probably not it.  And the oops report is consistent with that being
the instruction that was really executed, so it's not the kernel
misreporting the address after it happened.

You weren't trying kprobes or something, were you? Have you ever
had another unexplained oops with this machine?
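
The byte arithmetic behind that remark can be checked directly. A minimal
sketch in Python, assuming the eip value c0156f60 printed in the oops and
taking the intended instruction start to be one byte earlier, at the 3b
opcode byte. The variable names are illustrative only.

```python
# eip as reported in the oops, and the address the CPU should have fetched
# from (assumption: the intended instruction starts one byte earlier, at 3b).
reported_eip = 0xC0156F60
intended_eip = reported_eip - 1

# Count the bits that differ between the two addresses.
diff = reported_eip ^ intended_eip
flipped_bits = bin(diff).count("1")

# Every low byte reachable from the intended one by flipping exactly one bit.
low = intended_eip & 0xFF
single_flip_neighbours = {low ^ (1 << bit) for bit in range(8)}

print(f"low byte {low:#04x} -> {reported_eip & 0xFF:#04x}, "
      f"{flipped_bits} bits differ")
print("explained by one flipped bit:",
      (reported_eip & 0xFF) in single_flip_neighbours)
```

Going from 5f to 60 changes six bits at once, so a single flipped bit cannot
account for it; an address that is simply off by one fits the numbers better.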

-- 
MBTI: IXTP



Re: Oops in 2.6.19.1

2006-12-21 Thread Valdis . Kletnieks
On Wed, 20 Dec 2006 22:15:50 GMT, Alistair John Strachan said:
> Seems pretty unlikely on a 4 year old Via Epia. Never had any problems with it
> before now.
>
> Maybe a cosmic ray event? ;-)

More likely a stray alpha particle from a radioactive decay in the actual chip
casing. I saw some research a while back that said that the average commodity
system should *expect* to see 1 or 2 alpha-induced single-bit errors per year,
and that the chance that *you* saw the event was directly related to whether
the memory had ECC, and how much of the other circuitry had ECC on it.
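
For a rough feel of what "1 or 2 per year" means, here is a back-of-the-envelope
sketch in Python. It assumes the events are independent and arrive at a
constant average rate (a Poisson model); that is an assumption made here for
illustration, not something the research is quoted as saying, and the function
name is made up.

```python
import math


def prob_at_least_one(rate_per_year: float, days: float) -> float:
    """P(at least one event within `days`) under an assumed Poisson process."""
    expected = rate_per_year * days / 365.0
    return 1.0 - math.exp(-expected)


for rate in (1.0, 2.0):
    year = prob_at_least_one(rate, 365)
    two_days = prob_at_least_one(rate, 2)
    print(f"{rate:.0f}/year: P(>=1 in a year) = {year:.3f}, "
          f"P(>=1 in 2 days) = {two_days:.4f}")
```

The 2-day window is the uptime reported elsewhere in this thread before
2.6.19.1 oopsed. Under this model an event somewhere in a year is likely, but
one landing in any particular 2-day window is rare.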




Re: Oops in 2.6.19.1

2006-12-20 Thread Alistair John Strachan
On Wednesday 20 December 2006 20:48, Chuck Ebbert wrote:
[snip]
> I'd guess you have some kind of hardware problem.  It could also be
> a kernel problem where the saved address was corrupted during an
> interrupt, but that's not likely.

Seems pretty unlikely on a 4 year old Via Epia. Never had any problems with it 
before now.

Maybe a cosmic ray event? ;-)

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-20 Thread Chuck Ebbert
In-Reply-To: <[EMAIL PROTECTED]>

On Wed, 20 Dec 2006 14:21:03 +0000, Alistair John Strachan wrote:

> Any ideas?
> 
> BUG: unable to handle kernel NULL pointer dereference at virtual address 
> 0009

83 ca 10          or     $0x10,%edx
3b                .byte  0x3b
87 68 01          xchg   %ebp,0x1(%eax)   <=
00 00             add    %al,(%eax)

Somehow it is trying to execute code in the middle of an instruction.
That almost never works, even when the resulting fragment is a legal
opcode. :)

The real instruction is:

3b 87 68 01 00 00 cmp    0x168(%edi),%eax

I'd guess you have some kind of hardware problem.  It could also be
a kernel problem where the saved address was corrupted during an
interrupt, but that's not likely.
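
To see how the same bytes give two different instructions, here is a minimal
sketch in Python that decodes only the two opcodes involved (3b and 87) in
their register-base-plus-displacement form. It is an illustration, not a
disassembler: the helper names are hypothetical, and eax = 8 is taken from the
register dump in the oops.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Only the two opcodes that matter here; both are followed by a ModRM byte.
# The format strings give the AT&T operand order for each.
OPCODES = {
    0x3B: ("cmp", "{mem},%{reg}"),
    0x87: ("xchg", "%{reg},{mem}"),
}


def decode(code: bytes):
    """Decode one `op` + ModRM + displacement instruction (no SIB support)."""
    mnemonic, operand_fmt = OPCODES[code[0]]
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if rm == 4 or mod not in (1, 2):
        raise ValueError("addressing form not covered by this sketch")
    disp_len = 1 if mod == 1 else 4          # mod=01: disp8, mod=10: disp32
    disp = int.from_bytes(code[2:2 + disp_len], "little", signed=True)
    mem = f"{disp:#x}(%{REGS[rm]})"
    text = f"{mnemonic} " + operand_fmt.format(mem=mem, reg=REGS[reg])
    return text, REGS[rm], disp, 2 + disp_len


# Bytes from the Code: line of the oops, starting at the 3b opcode.
code = bytes.fromhex("3b 87 68 01 00 00 0f 45 ca")

intended = decode(code)        # decoding from the 3b byte
actual = decode(code[1:])      # decoding one byte later, where eip pointed

print("intended:", intended[0], f"[{intended[3]} bytes]")
print("actual:  ", actual[0], f"[{actual[3]} bytes]")

# xchg writes to base + displacement; eax = 8 in the register dump.
eax = 0x8
fault_address = eax + actual[2]
print(f"xchg writes to {fault_address:#x}")
```

The computed write target, 0x9, lines up with the faulting virtual address in
the oops, and a write to a not-present page in kernel mode is what the error
code 0002 indicates.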
-- 
MBTI: IXTP


Re: Oops in 2.6.19.1

2006-12-20 Thread Alistair John Strachan
On Wednesday 20 December 2006 16:30, Greg KH wrote:
> On Wed, Dec 20, 2006 at 02:21:03PM +0000, Alistair John Strachan wrote:
> > Hi,
> >
> > Any ideas?
>
> Does the problem also happen in 2.6.19?

No idea. I ran 2.6.19 for a couple of weeks without problems. It took 2 days
to oops 2.6.19.1, so if it happens again within that time period I guess that
might point to a regression from a -stable patch.

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


Re: Oops in 2.6.19.1

2006-12-20 Thread Greg KH
On Wed, Dec 20, 2006 at 02:21:03PM +0000, Alistair John Strachan wrote:
> Hi,
> 
> Any ideas?

Does the problem also happen in 2.6.19?

thanks,

greg k-h


Oops in 2.6.19.1

2006-12-20 Thread Alistair John Strachan
Hi,

Any ideas?

BUG: unable to handle kernel NULL pointer dereference at virtual address
00000009
 printing eip:
c0156f60
*pde = 00000000
Oops: 0002 [#1]
Modules linked in: ipt_recent ipt_REJECT xt_tcpudp ipt_MASQUERADE iptable_nat
xt_state iptable_filter ip_tables x_tables prism54 yenta_socket
rsrc_nonstatic pcmcia_core snd_via82xx snd_ac97_codec snd_ac97_bus snd_pcm
snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd soundcore ehci_hcd
usblp eth1394 uhci_hcd usbcore ohci1394 ieee1394 via_agp agpgart vt1211
hwmon_vid hwmon ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack
CPU:    0
EIP:    0060:[<c0156f60>]    Not tainted VLI
EFLAGS: 00010246   (2.6.19.1 #1)
EIP is at pipe_poll+0xa0/0xb0
eax: 00000008   ebx: 00000000   ecx: 00000008   edx: 00000000
esi: f70f3e9c   edi: f7017c00   ebp: f70f3c1c   esp: f70f3c0c
ds: 007b   es: 007b   ss: 0068
Process python (pid: 4178, ti=f70f2000 task=f70c4a90 task.ti=f70f2000)
Stack: 00000000 00000000 f70f3e9c f6e111c0 f70f3fa4 c015d7f3 f70f3c54 f70f3fac
       084c44a0 00000030 084c44d0 00000000 f70f3e94 f70f3e94 00000006 f70f3ecc
       00000000 f70f3e94 c015e580 00000000 00000000 00000006 f6e111c0 00000000
Call Trace:
 [<c015d7f3>] do_sys_poll+0x253/0x480
 [<c015da53>] sys_poll+0x33/0x50
 [<c0102c97>] syscall_call+0x7/0xb
 [<b7f6b402>] 0xb7f6b402
 ===
Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8
8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f
45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:f70f3c0c

-- 
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

