Re: Panic caused by bad memory?
On Wed, 25 Oct 2006, John Baldwin wrote:

> On Wednesday 25 October 2006 02:28, Charles Sprickman wrote:
>> On Tue, 24 Oct 2006 [EMAIL PROTECTED] wrote:
>>>> I can't get a kernel dump since it fails like this each time:
>>>>
>>>> dumping to dev #da/0x20001, offset 2097152
>>>> dump 1024 1023 1022 1021 Aborting dump due to I/O error.
>>>> status == 0xb, scsi status == 0x0
>>>> failed, reason: i/o error
>>>
>>> Bad memory seems unlikely to cause an I/O error trying to write the dump to the swap partition. I'd guess a dicey drive -- and bad swap space could also account for the original crash. You might be able to get a backup by booting single user, provided nothing activates the (presumably bad) swap partition.
>>
>> Just for the record, this box is running an Adaptec raid controller (2005S - ZCR card) and swap is coming off a mirrored array.
>>
>> Coincidentally, I have a utility box where it had bad blocks on the swap partition (but no others) - what I saw there is that the box would just hang and spit out a bunch of swap_pager timeout messages to the console. Quick and dirty remote fix while waiting for a drive? Run file-backed swap on /usr. :)
>>
>> Let's pretend for a minute it's not the drive that's the root cause... Not saying it isn't - we're none too thrilled with these Adaptec RAID controllers... Do those memory addresses in the panic message point towards bad memory if they are always the same?
>
> No, they are virtual addresses. Having the same EIP means you are crashing in the same place. Did you recently kldunload a module before it crashed?

Same place == same code?

The only change on this box was a massive portupgrade which included apache, php, mysql, postgres and most of the additional gnu tools. There is one module that someone set to load on boot, and that's the linuxolator. I have disabled that in rc.conf for now and we'll see what happens after the next panic.

We also have a few sticks of RAM on order now...

Thanks,

Charles

> --
> John Baldwin

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Panic caused by bad memory?
On Tue, 24 Oct 2006 [EMAIL PROTECTED] wrote:

>> I can't get a kernel dump since it fails like this each time:
>>
>> dumping to dev #da/0x20001, offset 2097152
>> dump 1024 1023 1022 1021 Aborting dump due to I/O error.
>> status == 0xb, scsi status == 0x0
>> failed, reason: i/o error
>
> Bad memory seems unlikely to cause an I/O error trying to write the dump to the swap partition. I'd guess a dicey drive -- and bad swap space could also account for the original crash. You might be able to get a backup by booting single user, provided nothing activates the (presumably bad) swap partition.

Just for the record, this box is running an Adaptec raid controller (2005S - ZCR card) and swap is coming off a mirrored array.

Coincidentally, I have a utility box where it had bad blocks on the swap partition (but no others) - what I saw there is that the box would just hang and spit out a bunch of swap_pager timeout messages to the console. Quick and dirty remote fix while waiting for a drive? Run file-backed swap on /usr. :)

Let's pretend for a minute it's not the drive that's the root cause... Not saying it isn't - we're none too thrilled with these Adaptec RAID controllers... Do those memory addresses in the panic message point towards bad memory if they are always the same?

Thanks,

Charles
Re: Panic caused by bad memory?
On Wednesday 25 October 2006 02:28, Charles Sprickman wrote:

> On Tue, 24 Oct 2006 [EMAIL PROTECTED] wrote:
>>> I can't get a kernel dump since it fails like this each time:
>>>
>>> dumping to dev #da/0x20001, offset 2097152
>>> dump 1024 1023 1022 1021 Aborting dump due to I/O error.
>>> status == 0xb, scsi status == 0x0
>>> failed, reason: i/o error
>>
>> Bad memory seems unlikely to cause an I/O error trying to write the dump to the swap partition. I'd guess a dicey drive -- and bad swap space could also account for the original crash. You might be able to get a backup by booting single user, provided nothing activates the (presumably bad) swap partition.
>
> Just for the record, this box is running an Adaptec raid controller (2005S - ZCR card) and swap is coming off a mirrored array.
>
> Coincidentally, I have a utility box where it had bad blocks on the swap partition (but no others) - what I saw there is that the box would just hang and spit out a bunch of swap_pager timeout messages to the console. Quick and dirty remote fix while waiting for a drive? Run file-backed swap on /usr. :)
>
> Let's pretend for a minute it's not the drive that's the root cause... Not saying it isn't - we're none too thrilled with these Adaptec RAID controllers... Do those memory addresses in the panic message point towards bad memory if they are always the same?

No, they are virtual addresses. Having the same EIP means you are crashing in the same place. Did you recently kldunload a module before it crashed?

--
John Baldwin
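For anyone following along: the point that an identical EIP means the same code can be taken one step further by mapping the address to the nearest preceding kernel symbol. Below is a minimal Python sketch of that lookup. It assumes you already have an address-sorted symbol list (the kind of listing `nm -n` prints for a kernel image). The entries in `SYMBOLS` are hypothetical placeholders, not the real symbol table of this box, and `KERNBASE = 0xc0000000` is the usual i386 value, treated here as an assumption.

```python
import bisect

# Assumption: start of the kernel's virtual address space on i386.
KERNBASE = 0xC0000000

# Hypothetical (address, name) pairs sorted by address. On a real system
# these would come from the kernel's own symbol table, not from this list.
SYMBOLS = [
    (0xC028A000, "hypothetical_func_a"),
    (0xC028B000, "hypothetical_func_b"),
    (0xC028B400, "hypothetical_func_c"),
]


def resolve(eip, symbols):
    """Return (symbol_name, offset) for the last symbol at or below eip.

    Returns None when eip lies below the first symbol in the table.
    """
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, eip) - 1
    if index < 0:
        return None
    addr, name = symbols[index]
    return name, eip - addr


# The instruction pointer reported by both panics in this thread.
eip = 0xC028B053
is_kernel_virtual = eip >= KERNBASE
result = resolve(eip, SYMBOLS)

if __name__ == "__main__":
    print("kernel virtual address:", is_kernel_virtual)
    if result is not None:
        print("%s+0x%x" % result)
```

With a real symbol table the output would name the kernel function both panics died in, which says where the fault happens but not by itself whether hardware or software is to blame.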
Panic caused by bad memory?
Hello all,

Without a full dump are there any telltale signs from the panic message that can give me some sign of whether I'm dealing with a hardware or software issue?

I have a box that has been running 4.11-p10 for quite some time with no problems. I upgraded a number of ports (apache/php/mysql) and since then I've had two panics. Of course userland apps shouldn't cause this, but that's the only change I see.

I can't get a kernel dump since it fails like this each time:

dumping to dev #da/0x20001, offset 2097152
dump 1024 1023 1022 1021 Aborting dump due to I/O error.
status == 0xb, scsi status == 0x0
failed, reason: i/o error

The meat of my question though, what are these lines telling me:

(panic 1)
instruction pointer = 0x8:0xc028b053
stack pointer = 0x10:0xe138eefc
frame pointer = 0x10:0xe138ef2c

(panic 2)
instruction pointer = 0x8:0xc028b053
stack pointer = 0x10:0xe138eefc
frame pointer = 0x10:0xe138ef2c

Are those physical memory addresses where the code that caused the panic resides? If so, does that point to bad RAM?

Thanks,

Charles

Here's more info if anyone is curious:

[-- MARK -- Mon Oct 23 06:00:00 2006]
Fatal trap 12: page fault while in kernel mode
mp_lock = 0002; cpuid = 0; lapic.id =
fault virtual address = 0xc327c614
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc028b053
stack pointer = 0x10:0xe138eefc
frame pointer = 0x10:0xe138ef2c
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 8 (syncer)
interrupt mask = none - SMP: XXX
trap number = 12
panic: page fault
mp_lock = 0002; cpuid = 0; lapic.id =
boot() called on cpu#0

syncing disks... panic: rslock: cpu: 0, addr: 0xc0391ccc, lock: 0x0001
mp_lock = 0002; cpuid = 0; lapic.id =
boot() called on cpu#0
Uptime: 441d9h31m5s

dumping to dev #da/0x20001, offset 2097152
dump 1024 1023 1022 1021 Aborting dump due to I/O error.
status == 0xb, scsi status == 0x0
failed, reason: i/o error
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
cpu_reset called on cpu#0
cpu_reset: Stopping other CPUs

[-- MARK -- Tue Oct 24 09:00:00 2006]
Fatal trap 12: page fault while in kernel mode
mp_lock = 0102; cpuid = 1; lapic.id = 0100
fault virtual address = 0xc29d2b94
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc028b053
stack pointer = 0x10:0xe138eefc
frame pointer = 0x10:0xe138ef2c
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 8 (syncer)
interrupt mask = none - SMP: XXX
trap number = 12
panic: page fault
mp_lock = 0102; cpuid = 1; lapic.id = 0100
boot() called on cpu#1

syncing disks... panic: rslock: cpu: 1, addr: 0xc0391ccc, lock: 0x0101
mp_lock = 0102; cpuid = 1; lapic.id = 0100
boot() called on cpu#1
Uptime: 1d2h55m38s

dumping to dev #da/0x20001, offset 2097152
dump 1024 1023 1022 1021 Aborting dump due to I/O error.
status == 0xb, scsi status == 0x0
failed, reason: i/o error
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
cpu_reset called on cpu#1
cpu_reset: Stopping other CPUs
cpu_reset: Restarting BSP
cpu_reset_proxy: Grabbed mp locckp uf_re sBeStP: BSP did not grab mp lock
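For readers who want to compare the two trap reports mechanically rather than by eye, here is a minimal Python sketch. It only parses the four address fields copied from the console output above and reports which of them match between the two panics. The regular expression and the function names are my own illustration, not part of any FreeBSD tool.

```python
import re

# Address lines copied from the two trap reports in the console log above.
PANIC_1 = """\
fault virtual address = 0xc327c614
instruction pointer = 0x8:0xc028b053
stack pointer = 0x10:0xe138eefc
frame pointer = 0x10:0xe138ef2c
"""

PANIC_2 = """\
fault virtual address = 0xc29d2b94
instruction pointer = 0x8:0xc028b053
stack pointer = 0x10:0xe138eefc
frame pointer = 0x10:0xe138ef2c
"""

# Matches "name = [selector:]offset" and captures only the name and offset.
FIELD_RE = re.compile(
    r"^(fault virtual address|instruction pointer|stack pointer|frame pointer)"
    r"\s*=\s*(?:0x[0-9a-f]+:)?(0x[0-9a-f]+)\s*$",
    re.MULTILINE | re.IGNORECASE,
)


def parse_trap(text):
    """Return a dict mapping each field name to its address as an int."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(text)}


def compare(first, second):
    """Return a dict mapping each shared field to True if the values match."""
    return {key: first[key] == second[key] for key in first if key in second}


p1 = parse_trap(PANIC_1)
p2 = parse_trap(PANIC_2)
same = compare(p1, p2)

if __name__ == "__main__":
    for field, matches in same.items():
        print("%-22s %s" % (field, "same" if matches else "differs"))
```

In these two reports the instruction, stack and frame pointers are identical while the faulting address differs, which is the pattern the question above is asking about.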
Re: Panic caused by bad memory?
> I can't get a kernel dump since it fails like this each time:
>
> dumping to dev #da/0x20001, offset 2097152
> dump 1024 1023 1022 1021 Aborting dump due to I/O error.
> status == 0xb, scsi status == 0x0
> failed, reason: i/o error

Bad memory seems unlikely to cause an I/O error trying to write the dump to the swap partition. I'd guess a dicey drive -- and bad swap space could also account for the original crash. You might be able to get a backup by booting single user, provided nothing activates the (presumably bad) swap partition.