Re: OOPS in 2.6.19.1, connected to nfs4 and autofs4
On Fri, 22 Jun 2007 17:42:36 -0700 Stuart Anderson <[EMAIL PROTECTED]> wrote:

(I CC: this to the lkml)

> Did you find a resolution to your posting regarding,
>
> "OOPS in 2.6.19.1, connected to nfs4 and autofs4"
>
> We just had a 2.6.20.11 kernel crash with a similar stack trace.

No, it still happens (2.6.21.5) once in a while (roughly once a week). I don't know of a way to reproduce this reliably.

>
>
> Thanks.

--
--- Malte Schröder [EMAIL PROTECTED] ICQ# 68121508 ---
APIC Oops on 2.6.19.1
As Matt Mackall said:

"So yes, if a user reports a bug that's attributable to a single bit memory error that's otherwise unreproduced and unexplained, it's totally reasonable to chalk it up to cosmic rays until some sort of pattern of reports emerges."

So I guess the only way to figure out whether this is indeed a one-off cosmic ray is to post it somewhere public in case someone else sees it? As there is no APIC mailing list, I am posting to LKML. Sorry for the line noise; feel free to tell me to post elsewhere (/dev/null?).

Here it is: on a dual-Opteron Tyan board that reboots every hour to run some unit tests, I caught this only once, at boot (partially copied by hand):

Pid: 1, comm: swapper Not tainted 2.6.19.1 #1
RIP: 0010:[<80272dba>] [<80272dba>] setup_APIC_timer+0x1e/0xba
RSP: :81007ffa7ec0 EFLAGS: 0002
RAX:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 000 CR3: 00201000 CR4: 6e0
Process swapper: (pid: 1, threadinfo f81007ffa6000, task 8100023937a0)
Stack: 0be41ca0 ff806491bc 40b40009 0008e000 009 8e000 80267297 fff80546280 f80261a61
Call Trace:
[<806491bc>] setup_boot_APIC_clock+0x115/0x11d
[<80267297>] init+0x48/0x306
[<8025bed8>] child_rip+0xa/0x12
Code: 8b 04 25 f0 e0 5f ff 39 d0 73 f5 8b 04 25 f0 e0 5f ff 39 d0
<0>Kernel panic - not syncing: Attempting to kill init!

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: "kernel + gcc 4.1 = several problems" / "Oops in 2.6.19.1"
On Wed, Jan 03, 2007 at 04:25:09PM +0100, Udo van den Heuvel wrote:
> Hello,
>
> I just read about the subjects.
> I have a firewall which has some issues.
> First it was a VIA CL6000 (c3).
> Now it is a EK8000 (c3-2) with different power supply, RAM and board of
> course. Still I see strange things sometimes. Crashes, hangs, etc. Now
> and then. Not too often.
>
> I have in .config:
> CONFIG_CC_OPTIMIZE_FOR_SIZE=y
> CONFIG_MVIAC3_2=y
>
> Does this mean the issue applies to my own kernels?

It could be. Or it could be something completely different.

If the same kernel compiled with gcc 3.4.6 works fine, you might be running into one of the mysterious problems with gcc 4.1.

It could also be a hardware problem (e.g. try running memtest86 for a longer time).

Does the machine hang completely, or is any useful information, e.g. an oops, available?

> Udo

cu
Adrian

--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
"kernel + gcc 4.1 = several problems" / "Oops in 2.6.19.1"
Hello,

I just read about the subjects.
I have a firewall which has some issues.
First it was a VIA CL6000 (c3).
Now it is an EK8000 (c3-2) with a different power supply, RAM and board, of course. Still I see strange things sometimes: crashes, hangs, etc. Now and then, not too often.

I have in .config:
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_MVIAC3_2=y

Does this mean the issue applies to my own kernels?

Udo
Re: Oops in 2.6.19.1
On Sun, Dec 31, 2006 at 04:48:43PM +, Alistair John Strachan wrote:
> On Sunday 31 December 2006 16:28, Adrian Bunk wrote:
> > On Sat, Dec 30, 2006 at 06:29:15PM +, Alistair John Strachan wrote:
> > > On Saturday 30 December 2006 17:21, Chuck Ebbert wrote:
> > > > In-Reply-To: <[EMAIL PROTECTED]>
> > > >
> > > > On Sat, 30 Dec 2006 16:59:35 +, Alistair John Strachan wrote:
> > > > > I've eliminated 2.6.19.1 as the culprit, and also tried toggling
> > > > > "optimize for size", various debug options. 2.6.19 compiled with GCC
> > > > > 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably,
> > > > > within approximately 12 hours.
> > > >
> > > > Which CPU are you compiling for? You should try different options.
> > >
> > > I should, I haven't thought of that. Currently it's compiling for
> > > CONFIG_MVIAC3_2, but I could try i686 for example.
> > >
> > > > Can you post disassembly of pipe_poll() for both the one that crashes
> > > > and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the
> > > > relocation info and post just the one function from each for now.
> > >
> > > Sure, no problem:
> > >
> > > http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
> > >
> > > Both use identical configs, neither are optimised for size. The config is
> > > available from the same location.
> >
> > Can you try enabling as many debug options as possible?
>
> Specifically what? I've already had:
>
> CONFIG_DETECT_SOFTLOCKUP
> CONFIG_FRAME_POINTER
> CONFIG_UNWIND_INFO
>
> Enabled. CONFIG_4KSTACKS is disabled. Are there any debugging features
> actually pertinent to this bug?

No, that's only an "enable as much as possible and hope one helps" shot in the dark.

> Cheers,
> Alistair.

cu
Adrian
Re: Oops in 2.6.19.1
On Sunday 31 December 2006 21:43, Chuck Ebbert wrote:
> In-Reply-To: <[EMAIL PROTECTED]>
>
> On Sat, 30 Dec 2006 18:29:15 +, Alistair John Strachan wrote:
> > > Can you post disassembly of pipe_poll() for both the one that crashes
> > > and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the
> > > relocation info and post just the one function from each for now.
> >
> > Sure, no problem:
> >
> > http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
> >
> > Both use identical configs, neither are optimised for size. The config is
> > available from the same location.
>
> Those were compiled without frame pointers. Can you post them compiled
> with frame pointers so they match your original bug report? And confirm
> that pipe_poll() is still at 0xc0156ec0 in vmlinux?

c0156ec0 <pipe_poll>:

I used the config I originally sent you to rebuild it again. This time I've put up the whole vmlinux for both kernels; the config is replaced, the decompilation is redone, and I've confirmed the offset in the GCC 4.1.1 kernel is identical. Sorry for the confusion.

The reason I changed the configs was to experiment with enabling and disabling debugging (and other such) options that might have shaken out compiler bugs.

However, none of these kernels has ever crashed gracefully again; most of them hang the machine (no NMI watchdog, though), so I've not been able to look at the oops. It's the same root cause, however, as GCC 3.4.6 kernels do not crash.

http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/

Happy new year, btw.

--
Cheers,
Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
Re: Oops in 2.6.19.1
In-Reply-To: <[EMAIL PROTECTED]>

On Sat, 30 Dec 2006 18:29:15 +, Alistair John Strachan wrote:
> > Can you post disassembly of pipe_poll() for both the one that crashes
> > and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the
> > relocation info and post just the one function from each for now.
>
> Sure, no problem:
>
> http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
>
> Both use identical configs, neither are optimised for size. The config is
> available from the same location.

Those were compiled without frame pointers. Can you post them compiled with frame pointers so they match your original bug report? And confirm that pipe_poll() is still at 0xc0156ec0 in vmlinux?

--
MBTI: IXTP
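Confirming where pipe_poll() sits comes down to reading a symbol table (System.map, or `nm vmlinux`, both of which print "address type name" lines) and expressing a faulting address as function+offset, the form oops traces use. The following is a minimal Python sketch of that lookup, assuming that input format. The pipe_poll address is the one confirmed in this thread. The second map entry is a hypothetical placeholder, and so are all the function names. The sketch also assumes that the mid-instruction address c0156f60 discussed in the thread comes from the same build.

```python
import bisect

def parse_system_map(text):
    """Parse System.map / nm style lines ("address type name") into a sorted list."""
    syms = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _sym_type, name = parts
        syms.append((int(addr, 16), name))
    syms.sort()
    return syms

def symbolize(syms, addr):
    """Return "name+0xoffset" for the closest symbol at or below addr."""
    addrs = [a for a, _ in syms]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        raise LookupError(f"no symbol at or below 0x{addr:x}")
    base, name = syms[i]
    return f"{name}+0x{addr - base:x}"

# pipe_poll's address is the one confirmed in the thread; the second entry is a
# hypothetical placeholder so the lookup has an upper neighbour.
SAMPLE_MAP = """\
c0156ec0 t pipe_poll
c0157000 t hypothetical_next_symbol
"""

syms = parse_system_map(SAMPLE_MAP)
print(symbolize(syms, 0xC0156F60))  # prints pipe_poll+0xa0
```

Under these assumptions, c0156f60 would be reported as pipe_poll+0xa0. The symbol type letter is parsed but ignored.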
Re: Oops in 2.6.19.1
On Sunday 31 December 2006 16:27, Adrian Bunk wrote:
> On Sat, Dec 30, 2006 at 04:59:35PM +, Alistair John Strachan wrote:
> > On Thursday 28 December 2006 04:14, Alistair John Strachan wrote:
> > > On Thursday 28 December 2006 04:02, Alistair John Strachan wrote:
> > > > On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
> > > > [snip]
> > > > > >
> > > > > > Here's a current decompilation of vmlinux/pipe_poll() from the
> > > > > > running kernel, the addresses have changed slightly. There's no
> > > > > > xchg there either:
> > > > >
> > > > > Could you reproduce the bug by the new kernel, so we could get the
> > > > > exact address and instruction of the bug?
> > > >
> > > > It crashed again, but this time with no output (machine locked
> > > > solid). To be honest, the disassembly looks right (it's like Chuck
> > > > said, it's jumping back half way through an instruction):
> > > >
> > > > c0156f5f: 3b 87 68 01 00 00    cmp    0x168(%edi),%eax
> > > >
> > > > So c0156f60 is 87 68 01 00 00..
> > > >
> > > > This is with the GCC recompile, so it's not a distro problem. It
> > > > could still either be GCC 4.x, or a 2.6.19.1 specific bug, but it's
> > > > serious. 2.6.19 with GCC 3.4.3 is 100% stable.
> > >
> > > Looks like a similar crash here:
> > >
> > > http://ubuntuforums.org/showthread.php?p=1803389
> >
> > I've eliminated 2.6.19.1 as the culprit, and also tried toggling
> > "optimize for size", various debug options. 2.6.19 compiled with GCC
> > 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably,
> > within approximately 12 hours.
> >
> > The machine passes 6 hours of Prime95 (a CPU stability tester), four
> > memtest86 passes, and there are no heat problems.
> >
> > I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config
> > using this compiler (but the same binutils), and will report back if it
> > crashes. My bet is that it won't, however.
>
> There are occasional reports of problems with kernels compiled with
> gcc 4.1 that vanish when using older versions of gcc.
>
> AFAIK, until now noone has ever debugged whether that's a gcc bug,
> gcc exposing a kernel bug or gcc exposing a hardware bug.
>
> Comparing your report and [1], it seems that if these are the same
> problem, it's not a hardware bug but a gcc or kernel bug.

This bug specifically indicates some kind of miscompilation in a driver, causing boot-time hangs. My problem is quite different, and more subtle.

The crash happens in the same place every time, which does suggest determinism (even with various options toggled on and off, and a 300K smaller kernel image), but it takes 8-12 hours to manifest and only happens with GCC 4.1.1.

Unless we can start narrowing this down, it would be a mammoth task to seek out either the kernel or GCC change that first exhibited this bug. The reasons are that the bug does not reproduce immediately, there are few clues, and this machine serves as a stable, high-availability server.

(If I had another EPIA M1, or another computer I could reproduce the bug on, I would be only too happy to boot as many kernels as required to fix it; however, I cannot spare this machine.)

--
Cheers,
Alistair.
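To see why searching for "the kernel or GCC change that first exhibited this bug" is a mammoth task, consider a rough cost estimate. Bisection needs about ceil(log2(n)) test builds to isolate one change among n candidates, and a build can only be called good after it survives well past the 8-12 hours the crash takes to show up. The following is a minimal Python sketch of that arithmetic. The 10,000-change count is purely hypothetical. The 24-hour test window is an assumption that borrows the ">24 hours" stability check reported later in this thread.

```python
def bisect_steps(candidates):
    """Test builds needed to isolate one change among `candidates`: ceil(log2(n))."""
    if candidates < 1:
        raise ValueError("need at least one candidate change")
    # (n - 1).bit_length() == ceil(log2(n)) for n >= 1, without float rounding.
    return (candidates - 1).bit_length()

def bisect_hours(candidates, hours_per_test):
    """Wall-clock hours if every build must survive `hours_per_test` to be called good."""
    return bisect_steps(candidates) * hours_per_test

# Hypothetical numbers: 10,000 candidate changes, 24 hours per build
# (a crash-free day, since the bug takes 8-12 hours to show up).
steps = bisect_steps(10_000)
hours = bisect_hours(10_000, 24)
print(steps, hours, hours / 24)  # 14 336 14.0
```

Even under these assumptions, that is two weeks of dedicated uptime on the affected box. A crash-free day is also only probabilistic evidence of a "good" build, so a single misjudged step would send the search down the wrong half.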
Re: Oops in 2.6.19.1
On Sunday 31 December 2006 16:28, Adrian Bunk wrote:
> On Sat, Dec 30, 2006 at 06:29:15PM +, Alistair John Strachan wrote:
> > On Saturday 30 December 2006 17:21, Chuck Ebbert wrote:
> > > In-Reply-To: <[EMAIL PROTECTED]>
> > >
> > > On Sat, 30 Dec 2006 16:59:35 +, Alistair John Strachan wrote:
> > > > I've eliminated 2.6.19.1 as the culprit, and also tried toggling
> > > > "optimize for size", various debug options. 2.6.19 compiled with GCC
> > > > 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably,
> > > > within approximately 12 hours.
> > >
> > > Which CPU are you compiling for? You should try different options.
> >
> > I should, I haven't thought of that. Currently it's compiling for
> > CONFIG_MVIAC3_2, but I could try i686 for example.
> >
> > > Can you post disassembly of pipe_poll() for both the one that crashes
> > > and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the
> > > relocation info and post just the one function from each for now.
> >
> > Sure, no problem:
> >
> > http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
> >
> > Both use identical configs, neither are optimised for size. The config is
> > available from the same location.
>
> Can you try enabling as many debug options as possible?

Specifically what? I've already had:

CONFIG_DETECT_SOFTLOCKUP
CONFIG_FRAME_POINTER
CONFIG_UNWIND_INFO

enabled. CONFIG_4KSTACKS is disabled. Are there any debugging features actually pertinent to this bug?

--
Cheers,
Alistair.
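When comparing kernels built with different debug options, it helps to check which of the options above a given .config actually sets. The following is a minimal Python sketch, assuming the usual Kconfig .config format: "CONFIG_FOO=y" lines for enabled options and "# CONFIG_FOO is not set" comments for disabled ones. The option list is the one from this message. SAMPLE is a hypothetical excerpt matching what Alistair describes, not his actual config. All function names are hypothetical.

```python
import re

# Options listed in this message; extend as needed.
DEBUG_OPTIONS = [
    "CONFIG_DETECT_SOFTLOCKUP",
    "CONFIG_FRAME_POINTER",
    "CONFIG_UNWIND_INFO",
    "CONFIG_4KSTACKS",
]

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")            # e.g. CONFIG_FOO=y
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")  # Kconfig's "off" marker

def parse_config(text):
    """Map each CONFIG_ name to its value ('y', 'm', a string...) or None if unset."""
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = SET_RE.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            opts[m.group(1)] = None
    return opts

def report(text, wanted=DEBUG_OPTIONS):
    """Value of each wanted option; None means unset or absent from the file."""
    opts = parse_config(text)
    return {name: opts.get(name) for name in wanted}

# Hypothetical .config excerpt mirroring the state described above.
SAMPLE = """\
CONFIG_DETECT_SOFTLOCKUP=y
CONFIG_FRAME_POINTER=y
CONFIG_UNWIND_INFO=y
# CONFIG_4KSTACKS is not set
"""

for name, value in report(SAMPLE).items():
    print(f"{name}: {value if value is not None else 'not set'}")
```

The sketch deliberately does not distinguish "explicitly not set" from "missing from the file". Both mean the option is off in that build.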
Re: Oops in 2.6.19.1
On Sat, Dec 30, 2006 at 06:29:15PM +, Alistair John Strachan wrote:
> On Saturday 30 December 2006 17:21, Chuck Ebbert wrote:
> > In-Reply-To: <[EMAIL PROTECTED]>
> >
> > On Sat, 30 Dec 2006 16:59:35 +, Alistair John Strachan wrote:
> > > I've eliminated 2.6.19.1 as the culprit, and also tried toggling
> > > "optimize for size", various debug options. 2.6.19 compiled with GCC
> > > 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably,
> > > within approximately 12 hours.
> >
> > Which CPU are you compiling for? You should try different options.
>
> I should, I haven't thought of that. Currently it's compiling for
> CONFIG_MVIAC3_2, but I could try i686 for example.
>
> > Can you post disassembly of pipe_poll() for both the one that crashes
> > and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the
> > relocation info and post just the one function from each for now.
>
> Sure, no problem:
>
> http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
>
> Both use identical configs, neither are optimised for size. The config is
> available from the same location.

Can you try enabling as many debug options as possible?

> Cheers,
> Alistair.

cu
Adrian
Re: Oops in 2.6.19.1
On Sat, Dec 30, 2006 at 04:59:35PM +, Alistair John Strachan wrote:
> On Thursday 28 December 2006 04:14, Alistair John Strachan wrote:
> > On Thursday 28 December 2006 04:02, Alistair John Strachan wrote:
> > > On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
> > > [snip]
> > > >
> > > > > Here's a current decompilation of vmlinux/pipe_poll() from the
> > > > > running kernel, the addresses have changed slightly. There's no xchg
> > > > > there either:
> > > >
> > > > Could you reproduce the bug by the new kernel, so we could get the
> > > > exact address and instruction of the bug?
> > >
> > > It crashed again, but this time with no output (machine locked solid). To
> > > be honest, the disassembly looks right (it's like Chuck said, it's
> > > jumping back half way through an instruction):
> > >
> > > c0156f5f: 3b 87 68 01 00 00    cmp    0x168(%edi),%eax
> > >
> > > So c0156f60 is 87 68 01 00 00..
> > >
> > > This is with the GCC recompile, so it's not a distro problem. It could
> > > still either be GCC 4.x, or a 2.6.19.1 specific bug, but it's serious.
> > > 2.6.19 with GCC 3.4.3 is 100% stable.
> >
> > Looks like a similar crash here:
> >
> > http://ubuntuforums.org/showthread.php?p=1803389
>
> I've eliminated 2.6.19.1 as the culprit, and also tried toggling "optimize for
> size", various debug options. 2.6.19 compiled with GCC 4.1.1 on an Via
> Nehemiah C3-2 seems to crash in pipe_poll reliably, within approximately 12
> hours.
>
> The machine passes 6 hours of Prime95 (a CPU stability tester), four memtest86
> passes, and there are no heat problems.
>
> I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config using
> this compiler (but the same binutils), and will report back if it crashes. My
> bet is that it won't, however.

There are occasional reports of problems with kernels compiled with gcc 4.1 that vanish when using older versions of gcc.

AFAIK, until now no one has ever debugged whether that's a gcc bug, gcc exposing a kernel bug, or gcc exposing a hardware bug.

Comparing your report and [1], it seems that if these are the same problem, it's not a hardware bug but a gcc or kernel bug.

> Cheers,
> Alistair.

cu
Adrian

[1] http://bugzilla.kernel.org/show_bug.cgi?id=7176
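Chuck's observation that execution is "jumping back half way through an instruction" can be checked by hand. The cmp at c0156f5f is six bytes long. If the CPU starts decoding at c0156f60, one byte in, it reads the ModRM byte 0x87 as an opcode, and 0x87 is XCHG. That would be consistent with an xchg turning up in the oops even though, as noted above, there is no xchg in the disassembly. The following is a minimal Python sketch that decodes only the two opcodes involved (0x3b CMP and 0x87 XCHG) with a register+displacement operand. It is not a general x86 decoder, and all names in it are hypothetical.

```python
# Instruction quoted in the thread (AT&T syntax, as printed by objdump):
#   c0156f5f: 3b 87 68 01 00 00    cmp    0x168(%edi),%eax
INSN_ADDR = 0xC0156F5F
INSN_BYTES = bytes([0x3B, 0x87, 0x68, 0x01, 0x00, 0x00])
JUMP_TARGET = 0xC0156F60  # one byte into that instruction

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def split_modrm(b):
    """Split an x86 ModRM byte into its (mod, reg, rm) fields."""
    return (b >> 6) & 0b11, (b >> 3) & 0b111, b & 0b111

def fmt_disp(d):
    """Format a signed displacement the way objdump does (0x.. or -0x..)."""
    return f"-0x{-d:x}" if d < 0 else f"0x{d:x}"

def decode(code):
    """Decode 0x3b (cmp r32, r/m32) or 0x87 (xchg r/m32, r32) with a
    [reg+disp8] or [reg+disp32] operand. Anything else is rejected."""
    opcode = code[0]
    mod, reg, rm = split_modrm(code[1])
    if rm == 4 or mod not in (1, 2):
        raise NotImplementedError("only [reg+disp8]/[reg+disp32] without SIB")
    if mod == 1:
        disp = int.from_bytes(code[2:3], "little", signed=True)  # disp8
    else:
        disp = int.from_bytes(code[2:6], "little", signed=True)  # disp32
    mem = f"{fmt_disp(disp)}(%{REG32[rm]})"
    if opcode == 0x3B:
        return f"cmp {mem},%{REG32[reg]}"
    if opcode == 0x87:
        return f"xchg %{REG32[reg]},{mem}"
    raise NotImplementedError(f"opcode 0x{opcode:02x}")

offset = JUMP_TARGET - INSN_ADDR
print(decode(INSN_BYTES))           # cmp 0x168(%edi),%eax
print(decode(INSN_BYTES[offset:]))  # xchg %ebp,0x1(%eax)
```

This only shows what the bytes mean when entered one byte late. Why execution lands at c0156f60 in the first place is still the open question in this thread.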
Re: Oops in 2.6.19.1
On Saturday 30 December 2006 16:59, Alistair John Strachan wrote:
> I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config
> using this compiler (but the same binutils), and will report back if it
> crashes. My bet is that it won't, however.

Still fine after >24 hours. Linux 2.6.19, GCC 3.4.6, Binutils 2.17.

--
Cheers,
Alistair.
Re: Oops in 2.6.19.1
On Saturday 30 December 2006 16:59, Alistair John Strachan wrote: I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config using this compiler (but the same binutils), and will report back if it crashes. My bet is that it won't, however. Still fine after 24 hours. Linux 2.6.19, GCC 3.4.6, Binutils 2.17. -- Cheers, Alistair. Final year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in 2.6.19.1
On Sat, Dec 30, 2006 at 04:59:35PM +, Alistair John Strachan wrote: On Thursday 28 December 2006 04:14, Alistair John Strachan wrote: On Thursday 28 December 2006 04:02, Alistair John Strachan wrote: On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote: [snip] Here's a current decompilation of vmlinux/pipe_poll() from the running kernel, the addresses have changed slightly. There's no xchg there either: Could you reproduce the bug by the new kernel, so we could get the exact address and instruction of the bug? It crashed again, but this time with no output (machine locked solid). To be honest, the disassembly looks right (it's like Chuck said, it's jumping back half way through an instruction): c0156f5f: 3b 87 68 01 00 00 cmp0x168(%edi),%eax So c0156f60 is 87 68 01 00 00.. This is with the GCC recompile, so it's not a distro problem. It could still either be GCC 4.x, or a 2.6.19.1 specific bug, but it's serious. 2.6.19 with GCC 3.4.3 is 100% stable. Looks like a similar crash here: http://ubuntuforums.org/showthread.php?p=1803389 I've eliminated 2.6.19.1 as the culprit, and also tried toggling optimize for size, various debug options. 2.6.19 compiled with GCC 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably, within approximately 12 hours. The machine passes 6 hours of Prime95 (a CPU stability tester), four memtest86 passes, and there are no heat problems. I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config using this compiler (but the same binutils), and will report back if it crashes. My bet is that it won't, however. There are occasional reports of problems with kernels compiled with gcc 4.1 that vanish when using older versions of gcc. AFAIK, until now noone has ever debugged whether that's a gcc bug, gcc exposing a kernel bug or gcc exposing a hardware bug. Comparing your report and [1], it seems that if these are the same problem, it's not a hardware bug but a gcc or kernel bug. Cheers, Alistair. 
cu Adrian [1] http://bugzilla.kernel.org/show_bug.cgi?id=7176 -- Is there not promise of rain? Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. Only a promise, Lao Er said. Pearl S. Buck - Dragon Seed - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in 2.6.19.1
On Sat, Dec 30, 2006 at 06:29:15PM +, Alistair John Strachan wrote: On Saturday 30 December 2006 17:21, Chuck Ebbert wrote: In-Reply-To: [EMAIL PROTECTED] On Sat, 30 Dec 2006 16:59:35 +, Alistair John Strachan wrote: I've eliminated 2.6.19.1 as the culprit, and also tried toggling optimize for size, various debug options. 2.6.19 compiled with GCC 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably, within approximately 12 hours. Which CPU are you compiling for? You should try different options. I should, I haven't thought of that. Currently it's compiling for CONFIG_MVIAC3_2, but I could try i686 for example. Can you post disassembly of pipe_poll() for both the one that crashes and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the relocation info and post just the one function from each for now. Sure, no problem: http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/ Both use identical configs, neither are optimised for size. The config is available from the same location. Can you try enabling as many debug options as possible? Cheers, Alistair. cu Adrian -- Is there not promise of rain? Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. Only a promise, Lao Er said. Pearl S. Buck - Dragon Seed - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in 2.6.19.1
On Sunday 31 December 2006 16:28, Adrian Bunk wrote: On Sat, Dec 30, 2006 at 06:29:15PM +, Alistair John Strachan wrote: On Saturday 30 December 2006 17:21, Chuck Ebbert wrote: In-Reply-To: [EMAIL PROTECTED] On Sat, 30 Dec 2006 16:59:35 +, Alistair John Strachan wrote: I've eliminated 2.6.19.1 as the culprit, and also tried toggling optimize for size, various debug options. 2.6.19 compiled with GCC 4.1.1 on an Via Nehemiah C3-2 seems to crash in pipe_poll reliably, within approximately 12 hours. Which CPU are you compiling for? You should try different options. I should, I haven't thought of that. Currently it's compiling for CONFIG_MVIAC3_2, but I could try i686 for example. Can you post disassembly of pipe_poll() for both the one that crashes and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the relocation info and post just the one function from each for now. Sure, no problem: http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/ Both use identical configs, neither are optimised for size. The config is available from the same location. Can you try enabling as many debug options as possible? Specifically what? I've already had: CONFIG_DETECT_SOFTLOCKUP CONFIG_FRAME_POINTER CONFIG_UNWIND_INFO Enabled. CONFIG_4KSTACKS is disabled. Are there any debugging features actually pertinent to this bug? -- Cheers, Alistair. Final year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Oops in 2.6.19.1
On Sunday 31 December 2006 16:27, Adrian Bunk wrote:
> On Sat, Dec 30, 2006 at 04:59:35PM +, Alistair John Strachan wrote:
> > On Thursday 28 December 2006 04:14, Alistair John Strachan wrote:
> > > On Thursday 28 December 2006 04:02, Alistair John Strachan wrote:
> > > > On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
> > > > [snip]
> > > > > > Here's a current decompilation of vmlinux/pipe_poll() from the running kernel; the addresses have changed slightly. There's no xchg there either:
> > > > >
> > > > > Could you reproduce the bug with the new kernel, so we could get the exact address and instruction of the bug?
> > > >
> > > > It crashed again, but this time with no output (machine locked solid). To be honest, the disassembly looks right (it's like Chuck said, it's jumping back halfway through an instruction):
> > > >
> > > > c0156f5f: 3b 87 68 01 00 00    cmp    0x168(%edi),%eax
> > > >
> > > > So c0156f60 is 87 68 01 00 00...
> > > >
> > > > This is with the GCC recompile, so it's not a distro problem. It could still be either GCC 4.x or a 2.6.19.1-specific bug, but it's serious. 2.6.19 with GCC 3.4.3 is 100% stable.
> > >
> > > Looks like a similar crash here:
> > >
> > > http://ubuntuforums.org/showthread.php?p=1803389
> >
> > I've eliminated 2.6.19.1 as the culprit, and also tried toggling "optimize for size", various debug options. 2.6.19 compiled with GCC 4.1.1 on a Via Nehemiah C3-2 seems to crash in pipe_poll reliably, within approximately 12 hours.
> >
> > The machine passes 6 hours of Prime95 (a CPU stability tester), four memtest86 passes, and there are no heat problems.
> >
> > I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config using this compiler (but the same binutils), and will report back if it crashes. My bet is that it won't, however.
>
> There are occasional reports of problems with kernels compiled with gcc 4.1 that vanish when using older versions of gcc.
>
> AFAIK, until now no one has ever debugged whether that's a gcc bug, gcc exposing a kernel bug or gcc exposing a hardware bug.
>
> Comparing your report and [1], it seems that if these are the same problem, it's not a hardware bug but a gcc or kernel bug.

This bug specifically indicates some kind of miscompilation in a driver, causing boot-time hangs. My problem is quite different, and more subtle.

The crash happens in the same place every time, which does suggest determinism (even with various options toggled on and off, and a 300K smaller kernel image). However, it takes 8-12 hours to manifest and only happens with GCC 4.1.1.

Unless we can start narrowing this down, it would be a mammoth task to seek out either the kernel or GCC change that first exhibited this bug. The bug is not immediately reproducible, there are few clues, and this machine serves as a stable, high-availability server.

(If I had another Epia M1, or another computer I could reproduce the bug on, I would be only too happy to boot as many kernels as required to fix it; however, I cannot spare this machine.)

--
Cheers,
Alistair.
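To put rough numbers on "mammoth": bisecting over N candidate changes needs about ceil(log2 N) test builds. A "good" verdict can only be trusted after a build has run longer than the ~8-12 hours the crash takes to appear. The sketch below is back-of-the-envelope arithmetic only. Both the number of candidate changes and the per-build soak time are hypothetical parameters, not figures from the thread.

```python
import math


def bisect_steps(n_candidates):
    """Test builds a binary search needs to isolate one change among n."""
    if n_candidates < 1:
        raise ValueError("need at least one candidate change")
    return math.ceil(math.log2(n_candidates))


def bisect_hours(n_candidates, soak_hours):
    """Worst case (every build 'good'): each one must soak for soak_hours."""
    return bisect_steps(n_candidates) * soak_hours


if __name__ == "__main__":
    # Hypothetical: 1000 candidate changes, 24 h soak per build
    # (roughly double the ~12 h window within which the crash has appeared).
    steps = bisect_steps(1000)
    hours = bisect_hours(1000, 24)
    print(f"{steps} builds, about {hours} hours ({hours // 24} days)")
```

A "bad" build ends early, when it crashes, so this is an upper bound. However, a soak that is too short can mislabel a bad build as good and send the whole bisection down the wrong half.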
Re: Oops in 2.6.19.1
In-Reply-To: [EMAIL PROTECTED]

On Sat, 30 Dec 2006 18:29:15 +, Alistair John Strachan wrote:
> > Can you post disassembly of pipe_poll() for both the one that crashes and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the relocation info and post just the one function from each for now.
>
> Sure, no problem:
>
> http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
>
> Both use identical configs; neither is optimised for size. The config is available from the same location.

Those were compiled without frame pointers. Can you post them compiled with frame pointers so they match your original bug report? And confirm that pipe_poll() is still at 0xc0156ec0 in vmlinux?

--
MBTI: IXTP
Re: Oops in 2.6.19.1
On Sunday 31 December 2006 21:43, Chuck Ebbert wrote:
> In-Reply-To: [EMAIL PROTECTED]
>
> On Sat, 30 Dec 2006 18:29:15 +, Alistair John Strachan wrote:
> > > Can you post disassembly of pipe_poll() for both the one that crashes and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the relocation info and post just the one function from each for now.
> >
> > Sure, no problem:
> >
> > http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/
> >
> > Both use identical configs; neither is optimised for size. The config is available from the same location.
>
> Those were compiled without frame pointers. Can you post them compiled with frame pointers so they match your original bug report? And confirm that pipe_poll() is still at 0xc0156ec0 in vmlinux?

c0156ec0 <pipe_poll>:

I used the config I originally sent you to rebuild it again. This time I've put up the whole vmlinux for both kernels, replaced the config and redone the decompilation. I've also confirmed that the offset in the GCC 4.1.1 kernel is identical. Sorry for the confusion.

The reason I changed the configs was to experiment with enabling and disabling debugging (and other such) options that might have shaken out compiler bugs. However, none of these kernels has crashed gracefully since. Most of them hang the machine (no NMI watchdog, though), so I've not been able to look at the oops. It's the same root cause, however, as GCC 3.4.6 kernels do not crash.

http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/

Happy new year, btw.

--
Cheers,
Alistair.
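The `pipe_poll+0xa0/0xb0` notation in the oops means symbol+offset/size. The offset is the faulting EIP minus the symbol's start address, and the size is the distance to the next symbol. The sketch below reproduces that arithmetic from a tiny table in the style of System.map. The `pipe_poll` address 0xc0156ec0 comes from the thread. The following symbol at 0xc0156f70 is inferred from the 0xb0 size, and its name is a hypothetical placeholder.

```python
import bisect

# (address, name) pairs sorted by address, like a fragment of System.map.
# pipe_poll's address is from the thread; "next_symbol" is a hypothetical
# placeholder for whatever function starts 0xb0 bytes later.
SYMBOLS = [
    (0xC0156EC0, "pipe_poll"),
    (0xC0156F70, "next_symbol"),
]


def resolve(addr, symbols=SYMBOLS):
    """Format addr as 'name+0xoff/0xsize' in the style of a kernel oops."""
    starts = [a for a, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0 or i + 1 >= len(symbols):
        return hex(addr)  # outside the table, or no next symbol to size it
    start, name = symbols[i]
    size = symbols[i + 1][0] - start
    return f"{name}+{addr - start:#x}/{size:#x}"


if __name__ == "__main__":
    print(resolve(0xC0156F60))  # EIP from the oops
```

This is roughly how the oops line is derived: the kernel's symbol lookup sizes a function by the start of the next symbol. Confirming that `pipe_poll` has not moved between builds is therefore what makes EIP values from different crashes comparable.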
Re: Oops in 2.6.19.1
On Saturday 30 December 2006 18:06, James Courtier-Dutton wrote:
> > I'd guess you have some kind of hardware problem. It could also be a kernel problem where the saved address was corrupted during an interrupt, but that's not likely.
>
> This looks rather strange.
[snip]
> 2) Kernel modules compiled with a different gcc than the rest of the kernel.

Previously there was only one GCC version (4.1.1 totally replaced 3.4.3, and is the system-wide GCC). Now I have installed 3.4.6 into /opt/gcc-3.4.6, and it is only put on the PATH explicitly by me when I wish to compile a kernel using it:

export PATH=/opt/gcc-3.4.6/bin:$PATH
cp /boot/config-2.6.19-test .config
make oldconfig
make

> 3) Kernel headers do not match the kernel being used.

The tree is a pristine 2.6.19.

> One way to start tracking this down would be to run it with as few kernel modules loaded as possible while still reproducing the problem.

That would cripple the machine, though. It's impractical for something that isn't immediately reproducible.

--
Cheers,
Alistair.
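With two compilers installed, it helps to confirm which one built the kernel that is actually running. Here is a minimal sketch. It assumes the usual `/proc/version` format, in which the kernel embeds a `(gcc version X.Y.Z ...)` string at build time. The sample strings are illustrative and not taken from Alistair's machine.

```python
import re

_GCC = re.compile(r"\(gcc version (\d+(?:\.\d+)*)")


def build_compiler(proc_version):
    """Return the gcc version embedded in a /proc/version string, or None."""
    m = _GCC.search(proc_version)
    return m.group(1) if m else None


def running_kernel_compiler(path="/proc/version"):
    """Read /proc/version on a live Linux system and extract the gcc version."""
    with open(path) as f:
        return build_compiler(f.read())


if __name__ == "__main__":
    sample = "Linux version 2.6.19 (user@host) (gcc version 3.4.6) #1"
    print(build_compiler(sample))
```

If the version reported after booting a "3.4.6" kernel does not match, the `export PATH=...` step did not take effect for that build.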
Re: Oops in 2.6.19.1
On Saturday 30 December 2006 17:21, Chuck Ebbert wrote:
> In-Reply-To: <[EMAIL PROTECTED]>
>
> On Sat, 30 Dec 2006 16:59:35 +, Alistair John Strachan wrote:
> > I've eliminated 2.6.19.1 as the culprit, and also tried toggling "optimize for size", various debug options. 2.6.19 compiled with GCC 4.1.1 on a Via Nehemiah C3-2 seems to crash in pipe_poll reliably, within approximately 12 hours.
>
> Which CPU are you compiling for? You should try different options.

I should; I hadn't thought of that. Currently it's compiling for CONFIG_MVIAC3_2, but I could try i686, for example.

> Can you post disassembly of pipe_poll() for both the one that crashes and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the relocation info and post just the one function from each for now.

Sure, no problem:

http://devzero.co.uk/~alistair/2.6.19-via-c3-pipe_poll/

Both use identical configs; neither is optimised for size. The config is available from the same location.

--
Cheers,
Alistair.
Re: Oops in 2.6.19.1
Chuck Ebbert wrote:
> In-Reply-To: <[EMAIL PROTECTED]>
>
> On Wed, 20 Dec 2006 14:21:03 +, Alistair John Strachan wrote:
> > Any ideas?
>
> > BUG: unable to handle kernel NULL pointer dereference at virtual address 0009
>
>   83 ca 10                or     $0x10,%edx
>   3b                      .byte 0x3b
>   87 68 01                xchg   %ebp,0x1(%eax)     <=
>   00 00                   add    %al,(%eax)
>
> Somehow it is trying to execute code in the middle of an instruction. That almost never works, even when the resulting fragment is a legal opcode. :)
>
> The real instruction is:
>
>   3b 87 68 01 00 00       cmp    0x168(%edi),%eax
>
> I'd guess you have some kind of hardware problem. It could also be a kernel problem where the saved address was corrupted during an interrupt, but that's not likely.

This looks rather strange. The times I have seen this sort of problem are:
1) When one bit of the kernel is corrupting another part of it.
2) Kernel modules compiled with a different gcc than the rest of the kernel.
3) Kernel headers do not match the kernel being used.

One way to start tracking this down would be to run it with as few kernel modules loaded as possible while still reproducing the problem.

James
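Chuck's point can be checked by hand. Decoding `3b 87 68 01 00 00` from the first byte gives opcode 0x3b (cmp) with ModRM byte 0x87: mod=2 (32-bit displacement), reg=eax, rm=edi. Starting one byte later makes 0x87 the opcode (xchg) and 0x68 the ModRM byte: mod=1 (8-bit displacement), reg=ebp, rm=eax. The sketch below decodes only these two opcodes and only the plain disp(%reg) ModRM forms. It is a teaching aid under those narrow assumptions, not a general x86 disassembler.

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Only the two opcodes that matter here. Both take a ModRM byte; in AT&T
# syntax (as objdump prints it) their operand orders differ.
OPCODES = {
    0x3B: ("cmp", "rm_first"),    # CMP r32, r/m32  -> AT&T: cmp r/m,reg
    0x87: ("xchg", "reg_first"),  # XCHG r/m32, r32 -> AT&T: xchg reg,r/m
}


def decode(code, start=0):
    """Decode one instruction at code[start]; return (text, length)."""
    op = code[start]
    if op not in OPCODES:
        raise NotImplementedError(f"opcode {op:#04x} not handled")
    mnemonic, order = OPCODES[op]
    modrm = code[start + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if rm == 4 or mod == 3 or (mod == 0 and rm == 5):
        raise NotImplementedError("SIB, register or absolute forms not handled")
    pos = start + 2
    if mod == 1:    # 8-bit signed displacement follows ModRM
        disp = int.from_bytes(code[pos:pos + 1], "little", signed=True)
        pos += 1
    elif mod == 2:  # 32-bit displacement follows ModRM
        disp = int.from_bytes(code[pos:pos + 4], "little", signed=True)
        pos += 4
    else:           # mod == 0: no displacement
        disp = 0
    mem = f"{disp:#x}(%{REGS32[rm]})" if disp else f"(%{REGS32[rm]})"
    r = f"%{REGS32[reg]}"
    operands = f"{mem},{r}" if order == "rm_first" else f"{r},{mem}"
    return f"{mnemonic} {operands}", pos - start


if __name__ == "__main__":
    code = bytes.fromhex("3b 87 68 01 00 00")
    print(decode(code, 0))  # -> ('cmp 0x168(%edi),%eax', 6)
    print(decode(code, 1))  # -> ('xchg %ebp,0x1(%eax)', 3)
```

The same six bytes yield a 6-byte `cmp` from the real boundary but a 3-byte `xchg` from one byte in. The trailing `00 00` is then decoded as a separate `add`, exactly as in Chuck's fragment.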
Re: Oops in 2.6.19.1
In-Reply-To: <[EMAIL PROTECTED]>

On Sat, 30 Dec 2006 16:59:35 +, Alistair John Strachan wrote:
> I've eliminated 2.6.19.1 as the culprit, and also tried toggling "optimize for size", various debug options. 2.6.19 compiled with GCC 4.1.1 on a Via Nehemiah C3-2 seems to crash in pipe_poll reliably, within approximately 12 hours.

Which CPU are you compiling for? You should try different options.

Can you post disassembly of pipe_poll() for both the one that crashes and the one that doesn't? Use 'objdump -D -r fs/pipe.o' so we get the relocation info and post just the one function from each for now.
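Posting "just the one function" from `objdump -D -r` output can be scripted. The sketch assumes GNU objdump's usual layout: each symbol's block starts with a header line such as `000000a0 <pipe_poll>:`, and blocks are separated by a blank line. Relocation lines printed by `-r` sit inside a block and are kept. The object path and function name in the usage line are the ones from Chuck's request.

```python
import subprocess


def extract_function(listing, name):
    """Return the lines of one '<name>:' block from an objdump listing."""
    out, inside = [], False
    header = f"<{name}>:"
    for line in listing.splitlines():
        if not inside:
            if line.rstrip().endswith(header):
                inside = True
                out.append(line)
        elif line.strip() == "":
            break  # objdump separates symbols with a blank line
        else:
            out.append(line)
    return "\n".join(out)


def objdump_function(obj_path, name):
    """Run 'objdump -D -r obj_path' and return just the named function."""
    listing = subprocess.run(
        ["objdump", "-D", "-r", obj_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return extract_function(listing, name)


if __name__ == "__main__":
    print(objdump_function("fs/pipe.o", "pipe_poll"))
```

Running it once in each build tree gives two listings of the same function, which can be compared side by side.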
Re: Oops in 2.6.19.1
On Thursday 28 December 2006 04:14, Alistair John Strachan wrote:
> On Thursday 28 December 2006 04:02, Alistair John Strachan wrote:
> > On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
> > [snip]
> > > > Here's a current decompilation of vmlinux/pipe_poll() from the running kernel; the addresses have changed slightly. There's no xchg there either:
> > >
> > > Could you reproduce the bug with the new kernel, so we could get the exact address and instruction of the bug?
> >
> > It crashed again, but this time with no output (machine locked solid). To be honest, the disassembly looks right (it's like Chuck said, it's jumping back halfway through an instruction):
> >
> > c0156f5f: 3b 87 68 01 00 00    cmp    0x168(%edi),%eax
> >
> > So c0156f60 is 87 68 01 00 00...
> >
> > This is with the GCC recompile, so it's not a distro problem. It could still be either GCC 4.x or a 2.6.19.1-specific bug, but it's serious. 2.6.19 with GCC 3.4.3 is 100% stable.
>
> Looks like a similar crash here:
>
> http://ubuntuforums.org/showthread.php?p=1803389

I've eliminated 2.6.19.1 as the culprit, and also tried toggling "optimize for size", various debug options. 2.6.19 compiled with GCC 4.1.1 on a Via Nehemiah C3-2 seems to crash in pipe_poll reliably, within approximately 12 hours.

The machine passes 6 hours of Prime95 (a CPU stability tester), four memtest86 passes, and there are no heat problems.

I have compiled GCC 3.4.6 and compiled 2.6.19 with an identical config using this compiler (but the same binutils), and will report back if it crashes. My bet is that it won't, however.

--
Cheers,
Alistair.
Re: Oops in 2.6.19.1
On Thursday 28 December 2006 04:02, Alistair John Strachan wrote:
> On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
> [snip]
> > > Here's a current decompilation of vmlinux/pipe_poll() from the running kernel; the addresses have changed slightly. There's no xchg there either:
> >
> > Could you reproduce the bug with the new kernel, so we could get the exact address and instruction of the bug?
>
> It crashed again, but this time with no output (machine locked solid). To be honest, the disassembly looks right (it's like Chuck said, it's jumping back halfway through an instruction):
>
> c0156f5f: 3b 87 68 01 00 00    cmp    0x168(%edi),%eax
>
> So c0156f60 is 87 68 01 00 00...
>
> This is with the GCC recompile, so it's not a distro problem. It could still be either GCC 4.x or a 2.6.19.1-specific bug, but it's serious. 2.6.19 with GCC 3.4.3 is 100% stable.

Looks like a similar crash here:

http://ubuntuforums.org/showthread.php?p=1803389

--
Cheers,
Alistair.
Re: Oops in 2.6.19.1
On Thursday 28 December 2006 02:41, Zhang, Yanmin wrote:
[snip]
> > Here's a current decompilation of vmlinux/pipe_poll() from the running kernel; the addresses have changed slightly. There's no xchg there either:
>
> Could you reproduce the bug with the new kernel, so we could get the exact address and instruction of the bug?

It crashed again, but this time with no output (machine locked solid). To be honest, the disassembly looks right (it's like Chuck said, it's jumping back halfway through an instruction):

c0156f5f: 3b 87 68 01 00 00    cmp    0x168(%edi),%eax

So c0156f60 is 87 68 01 00 00...

This is with the GCC recompile, so it's not a distro problem. It could still be either GCC 4.x or a 2.6.19.1-specific bug, but it's serious. 2.6.19 with GCC 3.4.3 is 100% stable.

--
Cheers,
Alistair.
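Whether an address such as the oops EIP falls on an instruction boundary is simple arithmetic over a listing: each line's start address plus its byte count gives the next boundary. The sketch parses lines of the form `addr: bytes mnemonic operands` used in this thread. It assumes the byte column consists of two-digit hex tokens that end at the mnemonic. The sample is the tail of the pipe_poll listing posted in this thread.

```python
import re

_LINE = re.compile(r"^\s*([0-9a-f]+):\s+(.*)$")
_BYTE = re.compile(r"^[0-9a-f]{2}$")


def parse_listing(text):
    """Yield (address, length, instruction_text) for each listing line."""
    for line in text.splitlines():
        m = _LINE.match(line)
        if not m:
            continue
        tokens = m.group(2).split()
        n = 0
        while n < len(tokens) and _BYTE.match(tokens[n]):
            n += 1  # count the encoded bytes before the mnemonic
        if n:
            yield int(m.group(1), 16), n, " ".join(tokens[n:])


def locate(text, addr):
    """Return (insn_start, offset_into_insn, insn_text) covering addr, or None."""
    for start, length, insn in parse_listing(text):
        if start <= addr < start + length:
            return start, addr - start, insn
    return None


TAIL = """\
c0156f5c: 83 ca 10             or     $0x10,%edx
c0156f5f: 3b 87 68 01 00 00    cmp    0x168(%edi),%eax
c0156f65: 0f 45 ca             cmovne %edx,%ecx
"""

if __name__ == "__main__":
    start, off, insn = locate(TAIL, 0xC0156F60)
    print(f"{start:#x} +{off}: {insn}")  # -> 0xc0156f5f +1: cmp 0x168(%edi),%eax
```

An offset other than 0 means the saved EIP points into the middle of an instruction. That is the case here: EIP is one byte into the `cmp`.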
Re: Oops in 2.6.19.1
On Wed, 2006-12-27 at 12:35 +, Alistair John Strachan wrote:
> On Wednesday 27 December 2006 02:07, Zhang, Yanmin wrote:
> [snip]
> > > Call Trace:
> > >  [<c015d7f3>] do_sys_poll+0x253/0x480
> > >  [<c015da53>] sys_poll+0x33/0x50
> > >  [<c0102c97>] syscall_call+0x7/0xb
> > >  [<b7f26402>] 0xb7f26402
> > >  ===
> > > Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
> >
> > The code above looks weird. Could you disassemble the kernel image and post the part around address 0xc0156f60?
> >
> > "87 68 01 00 00" is an xchg instruction, but if I disassemble from the beginning, I can't see any xchg instruction.
> >
> > > EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:ee1b9c0c
>
> Unfortunately, after suspecting the toolchain, I did a manual rebuild of binutils, gcc and glibc from the official sites, and then rebuilt 2.6.19.1. This might upset the decompile below, versus the original report.
>
> Assuming it's NOT a bug in my distro's toolchain (because I am now running the GNU stuff), it'll crash again, so this is still useful.
>
> Here's a current decompilation of vmlinux/pipe_poll() from the running kernel; the addresses have changed slightly. There's no xchg there either:

Could you reproduce the bug with the new kernel, so we could get the exact address and instruction of the bug?
Re: Oops in 2.6.19.1
On Wednesday 27 December 2006 02:07, Zhang, Yanmin wrote:
[snip]
> > Call Trace:
> >  [<c015d7f3>] do_sys_poll+0x253/0x480
> >  [<c015da53>] sys_poll+0x33/0x50
> >  [<c0102c97>] syscall_call+0x7/0xb
> >  [<b7f26402>] 0xb7f26402
> >  ===
> > Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
>
> The code above looks weird. Could you disassemble the kernel image and post the part around address 0xc0156f60?
>
> "87 68 01 00 00" is an xchg instruction, but if I disassemble from the beginning, I can't see any xchg instruction.
>
> > EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:ee1b9c0c

Unfortunately, after suspecting the toolchain, I did a manual rebuild of binutils, gcc and glibc from the official sites, and then rebuilt 2.6.19.1. This might upset the decompile below, versus the original report.

Assuming it's NOT a bug in my distro's toolchain (because I am now running the GNU stuff), it'll crash again, so this is still useful.

Here's a current decompilation of vmlinux/pipe_poll() from the running kernel; the addresses have changed slightly. There's no xchg there either:

c0156ec0 <pipe_poll>:
c0156ec0: 55                   push   %ebp
c0156ec1: 89 e5                mov    %esp,%ebp
c0156ec3: 83 ec 10             sub    $0x10,%esp
c0156ec6: 89 5d f4             mov    %ebx,0xfff4(%ebp)
c0156ec9: 85 d2                test   %edx,%edx
c0156ecb: 89 d3                mov    %edx,%ebx
c0156ecd: 89 75 f8             mov    %esi,0xfff8(%ebp)
c0156ed0: 89 c6                mov    %eax,%esi
c0156ed2: 89 7d fc             mov    %edi,0xfffc(%ebp)
c0156ed5: 8b 40 08             mov    0x8(%eax),%eax
c0156ed8: 8b 40 08             mov    0x8(%eax),%eax
c0156edb: 8b b8 f0 00 00 00    mov    0xf0(%eax),%edi
c0156ee1: 74 0c                je     c0156eef <pipe_poll+0x2f>
c0156ee3: 85 ff                test   %edi,%edi
c0156ee5: 74 08                je     c0156eef <pipe_poll+0x2f>
c0156ee7: 89 d1                mov    %edx,%ecx
c0156ee9: 89 f0                mov    %esi,%eax
c0156eeb: 89 fa                mov    %edi,%edx
c0156eed: ff 13                call   *(%ebx)
c0156eef: 0f b7 5e 1c          movzwl 0x1c(%esi),%ebx
c0156ef3: 31 c9                xor    %ecx,%ecx
c0156ef5: 8b 47 08             mov    0x8(%edi),%eax
c0156ef8: f6 c3 01             test   $0x1,%bl
c0156efb: 89 45 f0             mov    %eax,0xfff0(%ebp)
c0156efe: 74 20                je     c0156f20 <pipe_poll+0x60>
c0156f00: 85 c0                test   %eax,%eax
c0156f02: b8 41 00 00 00       mov    $0x41,%eax
c0156f07: 0f 4f c8             cmovg  %eax,%ecx
c0156f0a: 8b 87 5c 01 00 00    mov    0x15c(%edi),%eax
c0156f10: 85 c0                test   %eax,%eax
c0156f12: 74 43                je     c0156f57 <pipe_poll+0x97>
c0156f14: 8d b6 00 00 00 00    lea    0x0(%esi),%esi
c0156f1a: 8d bf 00 00 00 00    lea    0x0(%edi),%edi
c0156f20: f6 c3 02             test   $0x2,%bl
c0156f23: 74 23                je     c0156f48 <pipe_poll+0x88>
c0156f25: 83 7d f0 0f          cmpl   $0xf,0xfff0(%ebp)
c0156f29: b8 04 01 00 00       mov    $0x104,%eax
c0156f2e: ba 00 00 00 00       mov    $0x0,%edx
c0156f33: 8b 9f 58 01 00 00    mov    0x158(%edi),%ebx
c0156f39: 0f 4f c2             cmovg  %edx,%eax
c0156f3c: 09 c1                or     %eax,%ecx
c0156f3e: 89 c8                mov    %ecx,%eax
c0156f40: 83 c8 08             or     $0x8,%eax
c0156f43: 85 db                test   %ebx,%ebx
c0156f45: 0f 44 c8             cmove  %eax,%ecx
c0156f48: 8b 5d f4             mov    0xfff4(%ebp),%ebx
c0156f4b: 89 c8                mov    %ecx,%eax
c0156f4d: 8b 75 f8             mov    0xfff8(%ebp),%esi
c0156f50: 8b 7d fc             mov    0xfffc(%ebp),%edi
c0156f53: 89 ec                mov    %ebp,%esp
c0156f55: 5d                   pop    %ebp
c0156f56: c3                   ret
c0156f57: 89 ca                mov    %ecx,%edx
c0156f59: 8b 46 6c             mov    0x6c(%esi),%eax
c0156f5c: 83 ca 10             or     $0x10,%edx
c0156f5f: 3b 87 68 01 00 00    cmp    0x168(%edi),%eax
c0156f65: 0f 45 ca             cmovne %edx,%ecx
c0156f68: eb b6                jmp    c0156f20 <pipe_poll+0x60>
c0156f6a: 8d b6 00 00 00 00    lea    0x0(%esi),%esi

--
Cheers,
Alistair.
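The oops `Code:` line can be tied back to this listing. The oops brackets the byte at EIP (here `<87>`), so counting the bytes before the marker gives the address where the dump starts. The sketch assumes only that the marker is written as `<xx>`. The byte string is copied from the oops in this thread.

```python
import re

CODE_LINE = (
    "58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 "
    "89 c8 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> "
    "68 01 00 00 0f 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00"
)


def parse_code_line(line, eip):
    """Map an oops 'Code:' byte dump to {address: byte}, anchored at eip."""
    tokens = line.split()
    marked = [i for i, t in enumerate(tokens)
              if re.fullmatch(r"<[0-9a-f]{2}>", t)]
    if len(marked) != 1:
        raise ValueError("expected exactly one <xx> marker for the EIP byte")
    base = eip - marked[0]  # address of the first byte in the dump
    return {base + i: int(t.strip("<>"), 16) for i, t in enumerate(tokens)}


if __name__ == "__main__":
    eip = 0xC0156F60
    dump = parse_code_line(CODE_LINE, eip)
    start = min(dump)
    print(f"dump starts at {start:#x}, {eip - start} bytes before EIP")
    # -> dump starts at 0xc0156f35, 43 bytes before EIP
    # Bytes at the real instruction boundary, per the listing above:
    print(" ".join(f"{dump[a]:02x}" for a in range(0xC0156F5F, 0xC0156F65)))
    # -> 3b 87 68 01 00 00
```

With this mapping the dump lines up byte for byte with the listing. It starts inside `mov 0x158(%edi),%ebx` at c0156f33, and 0x55 at c0156f70 is the `push %ebp` opening the next function. The bytes in memory therefore matched the compiled code. It was the saved EIP that landed one byte past the start of the `cmp`.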
Re: Oops in 2.6.19.1
On Sat, 2006-12-23 at 15:40 +0000, Alistair John Strachan wrote:
> On Wednesday 20 December 2006 14:21, Alistair John Strachan wrote:
> > Hi,
> >
> > Any ideas?
>
> Pretty much like clockwork, it happened again. I think it's time to take this
> seriously as a software bug, and not some hardware problem. I've run kernels
> since 2.6.0 on this machine without such crashes, and now two of the same in
> 2.6.19.1? Pretty unlikely!
>
> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000009
>  printing eip:
> c0156f60
> *pde = 00000000
> Oops: 0002 [#1]
> Modules linked in: ipt_recent ipt_REJECT xt_tcpudp ipt_MASQUERADE iptable_nat xt_state iptable_filter ip_tables x_tables prism54 yenta_socket rsrc_nonstatic pcmcia_core snd_via82xx snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd soundcore usblp ehci_hcd eth1394 uhci_hcd usbcore ohci1394 ieee1394 via_agp agpgart vt1211 hwmon_vid hwmon ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack
> CPU:    0
> EIP:    0060:[<c0156f60>]    Not tainted VLI
> EFLAGS: 00010246   (2.6.19.1 #1)
> EIP is at pipe_poll+0xa0/0xb0
> eax: 00000008   ebx: 00000000   ecx: 00000008   edx: 00000000
> esi: ee1b9e9c   edi: f4d80a00   ebp: ee1b9c1c   esp: ee1b9c0c
> ds: 007b   es: 007b   ss: 0068
> Process java (pid: 5374, ti=ee1b8000 task=f7117560 task.ti=ee1b8000)
> Stack: ee1b9e9c f6c17160 ee1b9fa4 c015d7f3 ee1b9c54 ee1b9fac
>        082dff90 0010 082dffa0 ee1b9e94 ee1b9e94 0002 ee1b9eac
>        ee1b9e94 c015e580 0002 f6c17160
> Call Trace:
>  [<c015d7f3>] do_sys_poll+0x253/0x480
>  [<c015da53>] sys_poll+0x33/0x50
>  [<c0102c97>] syscall_call+0x7/0xb
>  [<b7f26402>] 0xb7f26402
>  ===
> Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00

The code above looks weird. Could you disassemble the kernel image and post the part around address 0xc0156f60? "87 68 01 00 00" begins with an xchg instruction, but when I disassemble from the beginning, I can't see any xchg instruction.

> EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:ee1b9c0c
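Zhang's request to look at the code around 0xc0156f60 starts from the symbol information in the oops: each address is printed as name+offset/size, so subtracting the offset recovers the start of the function. The Python sketch below reproduces that formatting. Its three-entry symbol table is an assumption reconstructed from this oops (start = printed address minus printed offset, size = printed size), not read from the real System.map.

```python
# A minimal sketch of kallsyms-style "name+offset/size" formatting.
# SYMBOLS is hypothetical: it is rebuilt from the oops itself
# (start = printed address - printed offset, size = printed size),
# not taken from this kernel's System.map.
SYMBOLS = [
    # (start address, size in bytes, name)
    (0xC0156EC0, 0xB0, "pipe_poll"),
    (0xC015D5A0, 0x480, "do_sys_poll"),
    (0xC015DA20, 0x50, "sys_poll"),
]


def symbolize(addr, symbols=SYMBOLS):
    """Return 'name+0xoff/0xsize' for a kernel address, or the bare hex
    address when no symbol covers it (as for the user-space return address)."""
    for start, size, name in symbols:
        if start <= addr < start + size:
            return f"{name}+{addr - start:#x}/{size:#x}"
    return f"{addr:#x}"


if __name__ == "__main__":
    for addr in (0xC0156F60, 0xC015D7F3, 0xC015DA53, 0xB7F26402):
        print(f"[<{addr:08x}>] {symbolize(addr)}")
```

With pipe_poll assumed to start at 0xc0156ec0 and span 0xb0 bytes, a command such as objdump -d --start-address=0xc0156ec0 --stop-address=0xc0156f70 vmlinux should dump just that function from an uncompressed vmlinux built from the same tree and configuration.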
Re: Oops in 2.6.19.1
On Sunday 24 December 2006 04:23, Chuck Ebbert wrote:
[snip]
> Anyway, post your complete .config.

Config attached.

--
Cheers, Alistair. Final year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK.

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.19.1
# Sat Dec 16 19:30:00 2006
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
# CONFIG_IKCONFIG is not set
# CONFIG_RELAY is not set
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y

#
# Block layer
#
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
# CONFIG_IOSCHED_DEADLINE is not set
# CONFIG_IOSCHED_CFQ is not set
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"

#
# Processor type and features
#
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
CONFIG_MVIAC3_2=y
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_MCE is not set
CONFIG_VM86=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
# CONFIG_X86_REBOOTFIXUPS is not set
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set

#
# Firmware Drivers
#
# CONFIG_EDD is not set
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_PAGE_OFFSET=0xC0000000
CONFIG_HIGHMEM=y
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_SPARSEMEM_STATIC=y
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_RESOURCES_64BIT is not set
# CONFIG_HIGHPTE is not set
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
# CONFIG_EFI is not set
CONFIG_REGPARM=y
# CONFIG_SECCOMP is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
# CONFIG_KEXEC is not set
# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x100000
# CONFIG_COMPAT_VDSO is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y

#
# Power management options (ACPI, APM)
#
Re: Oops in 2.6.19.1
On Sunday 24 December 2006 04:23, Chuck Ebbert wrote:
> In-Reply-To: <[EMAIL PROTECTED]>
>
> On Sat, 23 Dec 2006 15:40:46 +0000, Alistair John Strachan wrote:
> > Pretty much like clockwork, it happened again. I think it's time to take
> > this seriously as a software bug, and not some hardware problem. I've run
> > kernels since 2.6.0 on this machine without such crashes, and now two of
> > the same in 2.6.19.1? Pretty unlikely!
>
> Stranger things have happened, e.g. your system might have started
> to overheat just recently.

True, I've considered it; I'll replace the CPU fan.

> Anyway, post your complete .config. And exactly which one of the
> many Via cpus are you using? Are you using the Padlock unit?

No, much older than that:

[alistair] 14:38 [~] cat /proc/cpuinfo
processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 9
model name      : VIA Nehemiah
stepping        : 1
cpu MHz         : 999.569
cache size      : 64 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu de tsc msr cx8 mtrr pge cmov mmx fxsr sse fxsr_opt
bogomips        : 2000.02

> What do those java/python programs do that are running? What pipe
> are they polling?
>
> You could try going back to 2.6.18.x for a while in the meantime.

Well, I have had a thought. I recently upgraded the toolchain on the machine from binutils 2.16.x and GCC 3.4.3 (2.6.19 was built with this) to binutils 2.17 and GCC 4.1.1. It's conceivable that this is some sort of compiler bug.

--
Cheers, Alistair. Final year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK.
Re: Oops in 2.6.19.1
On Wednesday 20 December 2006 14:21, Alistair John Strachan wrote:
> Hi,
>
> Any ideas?

Pretty much like clockwork, it happened again. I think it's time to take this seriously as a software bug, and not some hardware problem. I've run kernels since 2.6.0 on this machine without such crashes, and now two of the same in 2.6.19.1? Pretty unlikely!

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000009
 printing eip:
c0156f60
*pde = 00000000
Oops: 0002 [#1]
Modules linked in: ipt_recent ipt_REJECT xt_tcpudp ipt_MASQUERADE iptable_nat xt_state iptable_filter ip_tables x_tables prism54 yenta_socket rsrc_nonstatic pcmcia_core snd_via82xx snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd soundcore usblp ehci_hcd eth1394 uhci_hcd usbcore ohci1394 ieee1394 via_agp agpgart vt1211 hwmon_vid hwmon ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack
CPU:    0
EIP:    0060:[<c0156f60>]    Not tainted VLI
EFLAGS: 00010246   (2.6.19.1 #1)
EIP is at pipe_poll+0xa0/0xb0
eax: 00000008   ebx: 00000000   ecx: 00000008   edx: 00000000
esi: ee1b9e9c   edi: f4d80a00   ebp: ee1b9c1c   esp: ee1b9c0c
ds: 007b   es: 007b   ss: 0068
Process java (pid: 5374, ti=ee1b8000 task=f7117560 task.ti=ee1b8000)
Stack: ee1b9e9c f6c17160 ee1b9fa4 c015d7f3 ee1b9c54 ee1b9fac
       082dff90 0010 082dffa0 ee1b9e94 ee1b9e94 0002 ee1b9eac
       ee1b9e94 c015e580 0002 f6c17160
Call Trace:
 [<c015d7f3>] do_sys_poll+0x253/0x480
 [<c015da53>] sys_poll+0x33/0x50
 [<c0102c97>] syscall_call+0x7/0xb
 [<b7f26402>] 0xb7f26402
 ===
Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:ee1b9c0c

--
Cheers, Alistair. Final year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK.
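In these i386 reports the Code: line dumps the bytes around the faulting instruction and wraps the byte at EIP in angle brackets, so the address of every other byte follows by counting. The Python sketch below applies that to the Code: line of this report; parse_code_line is a hypothetical helper, not a kernel tool. It shows that the byte just before EIP, 3b at c0156f5f, is where the real six-byte instruction starts.

```python
def parse_code_line(line, eip):
    """Map each byte of an oops 'Code:' line to its address.

    The byte wrapped in <...> is the one at EIP; every other byte's
    address is EIP plus its distance from that marked byte.
    """
    tokens = line.split()
    if tokens and tokens[0] == "Code:":
        tokens = tokens[1:]
    marked = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [(eip + i - marked, int(tok.strip("<>"), 16))
            for i, tok in enumerate(tokens)]


# Code: line and EIP copied from the oops above.
CODE = ("Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b "
        "5d f4 89 c8 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 "
        "3b <87> 68 01 00 00 0f 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00")
EIP = 0xC0156F60

if __name__ == "__main__":
    # Print a small window around EIP.
    for addr, byte in parse_code_line(CODE, EIP):
        if EIP - 4 <= addr <= EIP + 5:
            print(f"{addr:08x}: {byte:02x}")
```

The window shows 3b 87 68 01 00 00 occupying c0156f5f..c0156f64, with the next instruction bytes (0f 45 ca) starting at c0156f65.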
Re: Oops in 2.6.19.1
On Wed, 20 Dec 2006 22:15:50 GMT, Alistair John Strachan said:
> Seems pretty unlikely on a 4 year old Via Epia. Never had any problems with it
> before now.
>
> Maybe a cosmic ray event? ;-)

More likely a stray alpha particle from a radioactive decay in the actual chip casing. I saw some research a while back that said that the average commodity system should *expect* to see 1 or 2 alpha-induced single-bit errors per year, and the chance that *you* saw the event was directly related to whether the memory had ECC, and how much of the other circuitry had ECC on it.
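The point about ECC is that a single flipped bit can be located and repaired from the check bits before software uses the data. As a rough illustration only, here is the textbook Hamming(7,4) single-error-correcting code in Python; real ECC DIMMs use a wider SECDED code over 64-bit words (72 bits stored), not this exact layout, and the function names are illustrative.

```python
def hamming74_encode(d):
    """Encode data bits [d1, d2, d3, d4] as positions 1..7: p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(c):
    """Return (corrected codeword, 1-based position of the flipped bit or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 | (s2 << 1) | (s3 << 2)   # the syndrome names the bad position
    if pos:
        c[pos - 1] ^= 1
    return c, pos


def hamming74_decode(c):
    """Correct up to one flipped bit, then return the four data bits."""
    fixed, _ = hamming74_correct(c)
    return [fixed[2], fixed[4], fixed[5], fixed[6]]


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    word = hamming74_encode(data)
    hit = list(word)
    hit[4] ^= 1                      # simulate one upset bit (position 5)
    print(hamming74_correct(hit)[1], hamming74_decode(hit) == data)
    # prints: 5 True
```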
Re: Oops in 2.6.19.1
On Thursday 21 December 2006 08:05, Chuck Ebbert wrote:
> In-Reply-To: <[EMAIL PROTECTED]>
>
> On Wed, 20 Dec 2006 22:15:50 +0000, Alistair John Strachan wrote:
> > > I'd guess you have some kind of hardware problem. It could also be
> > > a kernel problem where the saved address was corrupted during an
> > > interrupt, but that's not likely.
> >
> > Seems pretty unlikely on a 4 year old Via Epia. Never had any problems
> > with it before now.
> >
> > Maybe a cosmic ray event? ;-)
>
> The low byte of eip should be 5f and it changed to 60, so that's
> probably not it. And the oops report is consistent with that being
> the instruction that was really executed, so it's not the kernel
> misreporting the address after it happened.
>
> You weren't trying kprobes or something, were you? Have you ever
> had another unexplained oops with this machine?

Nope, it's a stock kernel running on a server; kprobes isn't in use. And no, to my knowledge there's not been another "unexplained" oops. I've had crashes, but they've always been known issues or BIOS trouble.

The machine was recently tampered with to install additional HDDs, but the memory was memtest'ed when it was installed and passed several times without issue. I'm rather puzzled.

--
Cheers, Alistair. Final year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK.
Re: Oops in 2.6.19.1
In-Reply-To: <[EMAIL PROTECTED]>

On Wed, 20 Dec 2006 22:15:50 +0000, Alistair John Strachan wrote:
> > I'd guess you have some kind of hardware problem. It could also be
> > a kernel problem where the saved address was corrupted during an
> > interrupt, but that's not likely.
>
> Seems pretty unlikely on a 4 year old Via Epia. Never had any problems with
> it before now.
>
> Maybe a cosmic ray event? ;-)

The low byte of eip should be 5f and it changed to 60, so that's probably not it. And the oops report is consistent with that being the instruction that was really executed, so it's not the kernel misreporting the address after it happened.

You weren't trying kprobes or something, were you? Have you ever had another unexplained oops with this machine?

--
MBTI: IXTP
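The cosmic-ray argument can be checked with a little arithmetic: a single-event upset flips one bit, so the corrupted value should differ from the expected one in exactly one bit position. The Python sketch below (helper names are illustrative) shows that 0x5f and 0x60 differ in six bits, and that 0x60 is not among the eight values one flip away from 0x5f.

```python
def bit_distance(a, b):
    """Hamming distance: number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def single_bit_neighbours(value, width=8):
    """All values reachable from `value` by flipping exactly one of `width` bits."""
    return sorted(value ^ (1 << k) for k in range(width))


expected, observed = 0x5F, 0x60
print(f"{expected:08b} -> {observed:08b}: {bit_distance(expected, observed)} bits differ")
# prints: 01011111 -> 01100000: 6 bits differ
print(observed in single_bit_neighbours(expected))
# prints: False
```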
Re: Oops in 2.6.19.1
On Wednesday 20 December 2006 20:48, Chuck Ebbert wrote:
[snip]
> I'd guess you have some kind of hardware problem. It could also be
> a kernel problem where the saved address was corrupted during an
> interrupt, but that's not likely.

Seems pretty unlikely on a 4 year old Via Epia. Never had any problems with it before now.

Maybe a cosmic ray event? ;-)

--
Cheers, Alistair. Final year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK.
Re: Oops in 2.6.19.1
In-Reply-To: <[EMAIL PROTECTED]>

On Wed, 20 Dec 2006 14:21:03 +0000, Alistair John Strachan wrote:
> Any ideas?
>
> BUG: unable to handle kernel NULL pointer dereference at virtual address
> 00000009

   83 ca 10                or     $0x10,%edx
   3b                      .byte 0x3b
   87 68 01                xchg   %ebp,0x1(%eax)     <=
   00 00                   add    %al,(%eax)

Somehow it is trying to execute code in the middle of an instruction. That almost never works, even when the resulting fragment is a legal opcode. :)

The real instruction is:

   3b 87 68 01 00 00       cmp    0x168(%edi),%eax

I'd guess you have some kind of hardware problem. It could also be a kernel problem where the saved address was corrupted during an interrupt, but that's not likely.

--
MBTI: IXTP
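The two listings above are the same six bytes decoded from two different starting points. The Python sketch below is a deliberately tiny x86-32 decoder that handles only the three opcodes involved (0x3b cmp, 0x87 xchg, 0x00 add) and only ModRM forms without a SIB byte; it is an illustration of the ModRM arithmetic, not a replacement for objdump, and all names in it are illustrative.

```python
REG32 = ["%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi"]
REG8 = ["%al", "%cl", "%dl", "%bl", "%ah", "%ch", "%dh", "%bh"]

# opcode -> (mnemonic, operand width, whether the ModRM r/m operand is printed
# second in AT&T syntax). Only the three opcodes in this oops are covered.
OPCODES = {
    0x00: ("add", 8, True),     # ADD r/m8, r8    -> add %reg8,r/m8
    0x3B: ("cmp", 32, False),   # CMP r32, r/m32  -> cmp r/m32,%reg32
    0x87: ("xchg", 32, True),   # XCHG r/m32, r32 -> xchg %reg32,r/m32
}


def _signed(value, bits):
    """Interpret an unsigned integer as two's complement of the given width."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def _mem(disp, base):
    """Format an AT&T memory operand such as 0x168(%edi) or (%eax)."""
    if disp is None:
        return f"({base})"
    text = f"-{-disp:#x}" if disp < 0 else f"{disp:#x}"
    return f"{text}({base})"


def decode_one(code, pos):
    """Decode the instruction at code[pos]; return (length, AT&T text)."""
    op = code[pos]
    if op not in OPCODES:
        raise NotImplementedError(f"opcode {op:#04x} is outside this sketch")
    name, width, rm_second = OPCODES[op]
    modrm = code[pos + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    regs = REG8 if width == 8 else REG32
    length = 2
    if mod == 3:
        rm_text = regs[rm]                                      # register operand
    elif rm == 4 or (mod == 0 and rm == 5):
        raise NotImplementedError("SIB and absolute disp32 forms not handled")
    elif mod == 0:
        rm_text = _mem(None, REG32[rm])                         # (%reg)
    elif mod == 1:
        rm_text = _mem(_signed(code[pos + 2], 8), REG32[rm])    # disp8(%reg)
        length += 1
    else:
        disp = int.from_bytes(code[pos + 2:pos + 6], "little")
        rm_text = _mem(_signed(disp, 32), REG32[rm])            # disp32(%reg)
        length += 4
    first, second = (regs[reg], rm_text) if rm_second else (rm_text, regs[reg])
    return length, f"{name} {first},{second}"


def disassemble(code, start, base):
    """Linear sweep from code[start]; base is the address of code[0]."""
    out, pos = [], start
    while pos < len(code):
        length, text = decode_one(code, pos)
        out.append((base + pos, code[pos:pos + length].hex(" "), text))
        pos += length
    return out


if __name__ == "__main__":
    CODE = bytes.fromhex("3b 87 68 01 00 00")   # bytes at c0156f5f
    for start in (0, 1):                        # real start, then EIP
        for addr, raw, text in disassemble(CODE, start, 0xC0156F5F):
            print(f"{addr:08x}: {raw:<18} {text}")
        print()
```

Starting at c0156f5f gives the single cmp; starting one byte later at c0156f60 gives xchg followed by add. With eax = 00000008 in the register dump, the xchg memory operand 0x1(%eax) is address 00000009, exactly the faulting virtual address in the BUG line, and xchg writes to memory, which fits the write fault reported as Oops: 0002.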
Re: Oops in 2.6.19.1
On Wednesday 20 December 2006 16:30, Greg KH wrote:
> On Wed, Dec 20, 2006 at 02:21:03PM +0000, Alistair John Strachan wrote:
> > Hi,
> >
> > Any ideas?
>
> Does the problem also happen in 2.6.19?

No idea. I ran 2.6.19 for a couple of weeks without problems. It took 2 days to oops 2.6.19.1, so if it happens again within that time period, I guess that might point to one of the -stable patches.

--
Cheers, Alistair. Final year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK.
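The reasoning about 2 days versus a couple of weeks can be put in rough numbers. Assuming, purely for illustration, that crashes arrive independently at a constant rate (a Poisson process) and that the first 2.6.19.1 oops after about 2 days reflects its average rate, the chance of running about 14 days cleanly at that same rate is e^(-7). A single crash is a very weak rate estimate, so this Python sketch is a sanity check under stated assumptions, not evidence.

```python
import math


def p_no_crash(days, mean_days_between_crashes):
    """P(zero crashes in `days`) for a Poisson process with the given mean gap."""
    rate = 1.0 / mean_days_between_crashes
    return math.exp(-rate * days)


# Illustrative inputs loosely taken from the thread (assumptions, not measurements):
# 2.6.19.1 oopsed after ~2 days; 2.6.19 ran "a couple of weeks" (~14 days) cleanly.
p = p_no_crash(14, 2)
print(f"{p:.4f}")   # prints: 0.0009
```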
Re: Oops in 2.6.19.1
On Wed, Dec 20, 2006 at 02:21:03PM +0000, Alistair John Strachan wrote:
> Hi,
>
> Any ideas?

Does the problem also happen in 2.6.19?

thanks,

greg k-h
Oops in 2.6.19.1
Hi,

Any ideas?

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000009
 printing eip:
c0156f60
*pde = 00000000
Oops: 0002 [#1]
Modules linked in: ipt_recent ipt_REJECT xt_tcpudp ipt_MASQUERADE iptable_nat xt_state iptable_filter ip_tables x_tables prism54 yenta_socket rsrc_nonstatic pcmcia_core snd_via82xx snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd soundcore ehci_hcd usblp eth1394 uhci_hcd usbcore ohci1394 ieee1394 via_agp agpgart vt1211 hwmon_vid hwmon ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack
CPU:    0
EIP:    0060:[<c0156f60>]    Not tainted VLI
EFLAGS: 00010246   (2.6.19.1 #1)
EIP is at pipe_poll+0xa0/0xb0
eax: 00000008   ebx: 00000000   ecx: 00000008   edx: 00000000
esi: f70f3e9c   edi: f7017c00   ebp: f70f3c1c   esp: f70f3c0c
ds: 007b   es: 007b   ss: 0068
Process python (pid: 4178, ti=f70f2000 task=f70c4a90 task.ti=f70f2000)
Stack: f70f3e9c f6e111c0 f70f3fa4 c015d7f3 f70f3c54 f70f3fac 084c44a0 0030 084c44d0 f70f3e94 f70f3e94 0006 f70f3ecc f70f3e94 c015e580 0006 f6e111c0
Call Trace:
 [<c015d7f3>] do_sys_poll+0x253/0x480
 [<c015da53>] sys_poll+0x33/0x50
 [<c0102c97>] syscall_call+0x7/0xb
 [<b7f6b402>] 0xb7f6b402
 ===
Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
EIP: [<c0156f60>] pipe_poll+0xa0/0xb0 SS:ESP 0068:f70f3c0c

--
Cheers, Alistair. Final year Computer Science undergraduate. 1F2 55 South Clerk Street, Edinburgh, UK.
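The "Oops: 0002" line carries the x86 page-fault error code in hex. The Python sketch below decodes its low bits per the architectural definition (bit 0: protection violation vs. not-present page, bit 1: write vs. read, bit 2: user vs. kernel mode, bit 3: reserved-bit violation, bit 4: instruction fetch); the function name and dictionary keys are illustrative. Decoded this way, 0002 is a kernel-mode write to a page that is not present, which matches a store to virtual address 00000009.

```python
def decode_pf_error(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 0x01 else "page not present",
        "access": "write" if code & 0x02 else "read",
        "mode": "user" if code & 0x04 else "kernel",
        "reserved_bit_set": bool(code & 0x08),
        "instruction_fetch": bool(code & 0x10),
    }


if __name__ == "__main__":
    info = decode_pf_error(0x0002)   # from "Oops: 0002 [#1]"
    print(info["mode"], info["access"], info["cause"])
    # prints: kernel write page not present
```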
Oops in 2.6.19.1
Hi,

Any ideas?

BUG: unable to handle kernel NULL pointer dereference at virtual address 0009
 printing eip:
c0156f60
*pde = 
Oops: 0002 [#1]
Modules linked in: ipt_recent ipt_REJECT xt_tcpudp ipt_MASQUERADE iptable_nat xt_state iptable_filter ip_tables x_tables prism54 yenta_socket rsrc_nonstatic pcmcia_core snd_via82xx snd_ac97_codec snd_ac97_bus snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd soundcore ehci_hcd usblp eth1394 uhci_hcd usbcore ohci1394 ieee1394 via_agp agpgart vt1211 hwmon_vid hwmon ip_nat_ftp ip_nat ip_conntrack_ftp ip_conntrack
CPU:    0
EIP:    0060:[c0156f60]    Not tainted VLI
EFLAGS: 00010246   (2.6.19.1 #1)
EIP is at pipe_poll+0xa0/0xb0
eax: 0008   ebx:    ecx: 0008   edx: 
esi: f70f3e9c   edi: f7017c00   ebp: f70f3c1c   esp: f70f3c0c
ds: 007b   es: 007b   ss: 0068
Process python (pid: 4178, ti=f70f2000 task=f70c4a90 task.ti=f70f2000)
Stack: f70f3e9c f6e111c0 f70f3fa4 c015d7f3 f70f3c54 f70f3fac 084c44a0 0030 084c44d0 f70f3e94 f70f3e94 0006 f70f3ecc f70f3e94 c015e580 0006 f6e111c0
Call Trace:
 [c015d7f3] do_sys_poll+0x253/0x480
 [c015da53] sys_poll+0x33/0x50
 [c0102c97] syscall_call+0x7/0xb
 [b7f6b402] 0xb7f6b402
 ===
Code: 58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 89 c8 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b <87> 68 01 00 00 0f 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00
EIP: [c0156f60] pipe_poll+0xa0/0xb0 SS:ESP 0068:f70f3c0c

-- 
Cheers,

Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
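A note for readers new to oops reports: the "Code:" line is a raw byte dump around the faulting instruction, and the kernel puts angle brackets around the byte located at EIP. The minimal Python sketch below (the helper name parse_code_line is hypothetical, not a kernel tool) pulls the bytes out of the line above and works out where the dump starts. It assumes, as the i386 oops format does, that the bracketed byte `<87>` sits exactly at EIP (c0156f60).

```python
import re
from typing import Tuple

# "Code:" line from the oops above; the byte at EIP is marked <87>.
CODE_LINE = (
    "58 01 00 00 0f 4f c2 09 c1 89 c8 83 c8 08 85 db 0f 44 c8 8b 5d f4 "
    "89 c8 8b 75 f8 8b 7d fc 89 ec 5d c3 89 ca 8b 46 6c 83 ca 10 3b "
    "<87> 68 01 00 00 0f 45 ca eb b6 8d b6 00 00 00 00 55 b8 01 00 00"
)
EIP = 0xC0156F60  # from "EIP: [c0156f60] pipe_poll+0xa0/0xb0"

MARKED = re.compile(r"<([0-9a-fA-F]{2})>")


def parse_code_line(line: str) -> Tuple[bytes, int]:
    """Return (raw bytes, index of the <..>-marked byte)."""
    data = bytearray()
    marker = -1
    for tok in line.split():
        m = MARKED.fullmatch(tok)
        if m:
            marker = len(data)  # this byte is the one at EIP
            tok = m.group(1)
        data.append(int(tok, 16))
    if marker < 0:
        raise ValueError("no <..> marker in Code: line")
    return bytes(data), marker


if __name__ == "__main__":
    code, idx = parse_code_line(CODE_LINE)
    start = EIP - idx  # address of the first dumped byte
    print(f"{len(code)} bytes, {idx} of them before EIP")
    print(f"dump starts at {start:#010x}")
    # The four bytes just before EIP and the six starting at EIP.
    print("before EIP:", code[idx - 4:idx].hex(" "))
    print("from EIP:  ", code[idx:idx + 6].hex(" "))
```

For this oops the marked byte is preceded by 43 bytes, so the dump starts at 0xc0156f35, and the four bytes just before EIP are 83 ca 10 3b. Real decoding is better done with objdump on the extracted bytes (later kernel trees ship a scripts/decodecode helper for this); the sketch only shows how the marker relates the dump to EIP.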
Re: Oops in 2.6.19.1
On Wed, Dec 20, 2006 at 02:21:03PM +, Alistair John Strachan wrote:
> Hi,
>
> Any ideas?

Does the problem also happen in 2.6.19?

thanks,

greg k-h
Re: Oops in 2.6.19.1
On Wednesday 20 December 2006 16:30, Greg KH wrote:
> On Wed, Dec 20, 2006 at 02:21:03PM +, Alistair John Strachan wrote:
> > Hi,
> >
> > Any ideas?
>
> Does the problem also happen in 2.6.19?

No idea. I ran 2.6.19 for a couple of weeks without problems. It took 2 days to oops 2.6.19.1, so if it happens again within that time period, I guess that might point to one of the -stable patches.

-- 
Cheers,

Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.
Re: Oops in 2.6.19.1
In-Reply-To: [EMAIL PROTECTED]

On Wed, 20 Dec 2006 14:21:03 +, Alistair John Strachan wrote:

> Any ideas?
>
> BUG: unable to handle kernel NULL pointer dereference at virtual address 0009

  83 ca 10                or     $0x10,%edx
  3b                      .byte 0x3b
  87 68 01                xchg   %ebp,0x1(%eax)     <=
  00 00                   add    %al,(%eax)

Somehow it is trying to execute code in the middle of an instruction. That almost never works, even when the resulting fragment is a legal opcode. :)

The real instruction is:

  3b 87 68 01 00 00       cmp    0x168(%edi),%eax

I'd guess you have some kind of hardware problem. It could also be a kernel problem where the saved address was corrupted during an interrupt, but that's not likely.

-- 
MBTI: IXTP
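Chuck's point is easiest to see by decoding the same bytes from two different starting offsets. The Python sketch below is a toy decoder, written only for illustration: it understands just the four opcodes in this fragment (83 /1 ib, 3b /r, 87 /r, 00 /r) with 32-bit ModRM addressing and no SIB bytes or prefixes, and the function names are made up. It is not a general x86 disassembler. It also makes explicit one step that the message leaves implicit: the register dump shows eax as 0008, so the memory operand 0x1(%eax) of the bogus xchg is address 0x9, which is exactly the faulting virtual address in the BUG line.

```python
from typing import List, Tuple

REG32 = ["%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi"]
REG8 = ["%al", "%cl", "%dl", "%bl", "%ah", "%ch", "%dh", "%bh"]


def take(code: bytes, i: int, n: int, signed: bool) -> int:
    """Read an n-byte little-endian value at offset i, or fail if truncated."""
    if i + n > len(code):
        raise IndexError("truncated instruction")
    return int.from_bytes(code[i:i + n], "little", signed=signed)


def modrm(code: bytes, i: int, regs: List[str]) -> Tuple[int, str, int]:
    """Decode a 32-bit ModRM operand at i: (reg field, r/m text, next offset)."""
    b = take(code, i, 1, False)
    mod, reg, rm = b >> 6, (b >> 3) & 7, b & 7
    i += 1
    if mod == 3:  # register operand
        return reg, regs[rm], i
    if rm == 4:
        raise NotImplementedError("SIB addressing not handled in this sketch")
    if mod == 0 and rm == 5:  # absolute disp32, no base register
        return reg, f"{take(code, i, 4, False):#x}", i + 4
    base = REG32[rm]  # memory bases are always 32-bit registers
    if mod == 0:
        return reg, f"({base})", i
    size = 1 if mod == 1 else 4  # disp8 or disp32
    disp = take(code, i, size, True)
    return reg, f"{disp:#x}({base})", i + size


def decode_one(code: bytes, i: int) -> Tuple[str, int]:
    """Decode one instruction at i (four opcodes only); return (text, length)."""
    op = code[i]
    if op == 0x00:  # add r/m8, r8
        reg, rm, j = modrm(code, i + 1, REG8)
        return f"add {REG8[reg]},{rm}", j - i
    if op == 0x3B:  # cmp r32, r/m32
        reg, rm, j = modrm(code, i + 1, REG32)
        return f"cmp {rm},{REG32[reg]}", j - i
    if op == 0x87:  # xchg r/m32, r32
        reg, rm, j = modrm(code, i + 1, REG32)
        return f"xchg {REG32[reg]},{rm}", j - i
    if op == 0x83:  # group 1 r/m32, imm8; only /1 (or) is handled
        reg, rm, j = modrm(code, i + 1, REG32)
        if reg != 1:
            raise NotImplementedError("only 83 /1 (or) handled")
        imm = take(code, j, 1, True) & 0xFFFFFFFF  # imm8 is sign-extended
        return f"or ${imm:#x},{rm}", j + 1 - i
    raise NotImplementedError(f"opcode {op:#x} not handled")


def decode(code: bytes) -> List[Tuple[int, str, str]]:
    """Linear sweep; unknown or truncated bytes are shown as .byte."""
    out = []
    i = 0
    while i < len(code):
        try:
            text, n = decode_one(code, i)
        except (IndexError, NotImplementedError):
            text, n = f".byte {code[i]:#x}", 1
        out.append((i, code[i:i + n].hex(" "), text))
        i += n
    return out


if __name__ == "__main__":
    views = {
        "split at EIP, as in the message": ["83 ca 10 3b", "87 68 01 00 00"],
        "aligned on the real instruction": ["3b 87 68 01 00 00"],
    }
    for label, chunks in views.items():
        print(label)
        for chunk in chunks:
            for off, raw, text in decode(bytes.fromhex(chunk)):
                print(f"  {off}: {raw:<20} {text}")

    eax = 0x8  # "eax: 0008" in the register dump
    print(f"xchg memory operand 0x1(%eax) = {eax + 0x1:#x}")
```

Splitting the bytes at EIP reproduces the or / .byte / xchg / add sequence quoted above, while starting one byte earlier yields the single six-byte cmp. In other words, the bytes themselves look intact; it is EIP that points one byte into an instruction.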
Re: Oops in 2.6.19.1
On Wednesday 20 December 2006 20:48, Chuck Ebbert wrote:
[snip]
> I'd guess you have some kind of hardware problem. It could also be a
> kernel problem where the saved address was corrupted during an
> interrupt, but that's not likely.

Seems pretty unlikely on a 4-year-old VIA EPIA. I've never had any problems with it before now. Maybe a cosmic ray event? ;-)

-- 
Cheers,

Alistair.

Final year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.