Re: How to diagnose kernel panic?
Thank you for your suggestions on how to chase down this intermittent panic.

The panic did occur from the beginning with this machine (Supermicro P4SCi MB, Ablecom 420 W power supply, Seagate Barracuda SATA HDs, Crucial RAM, Debian testing with a 2.6.15 kernel, software RAID1).

All HD tests from the manufacturer passed (thanks for the idea). A search for others reporting the same panic message was interesting but turned up no exact matches, which seems to indicate the problem is somewhere in the hardware. The data center guy suspects a static zap during assembly.

I can't imagine what conditions from being on-line at the data center I am unable to replicate... I guess I'll just wait, run [EMAIL PROTECTED] and occasionally bombard the machine with http requests. It's gotta happen again sometime, especially if it's the power supply.

Any further test suggestions would be welcome.

Mark

--
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Re: How to diagnose kernel panic?
Hi Mark,

On Sun, 2006-07-09 at 15:56 -0400, Mark Copper wrote:
> I have a server that is brought down by a kernel panic every two weeks on average.

Did it do that right from the first installation, or did it run for some time without problems?

> Nothing untoward gets in the logs and the on-screen panic message starts with something like
>
> Kernel panic - not syncing: Fatal exception in interrupt
> Call trace: [c026bc42] scsi_request_fn+0xf610x294
>
> I wasn't able to get any more at the data center...

Well, the first thing I'd suspect is some problem with the hardware (most of the kernel panics I have seen were related to broken hardware). And it looks like something is wrong with the hard drive containing the root filesystem.

The steps I would take:

1.) Search Google for the name of your hard drive together with scsi_request_fn and kernel panic
2.) Get the hard drive checking tools from your manufacturer and test that drive
3.) If you have two machines with the same drive, swap drives between them (and check where the problem occurs then)
4.) Get a newer kernel

If none of the things above helped:

5.) Write a mail to the kernel guys, tell them about the problem and what you did to find it.

Hth, Lothar
Re: How to diagnose kernel panic?
On Sun, Jul 09, 2006 at 03:56:12PM -0400, Mark Copper wrote:
> I have a server that is brought down by a kernel panic every two weeks on average. Nothing untoward gets in the logs and the on-screen panic message starts with something like
>
> Kernel panic - not syncing: Fatal exception in interrupt
> Call trace: [c026bc42] scsi_request_fn+0xf610x294
>
> I wasn't able to get any more at the data center...
>
> So I brought the machine home and am running [EMAIL PROTECTED] on it and so far I have not been able to induce the panic.

There was something unique about what the machine was doing previously that caused this. You are not doing whatever that is now and thus not inducing the error.

> The replacement machine is similar, but not identical. The main difference being a switch from software to hardware RAID1. Also, the new machine, except for the hardware driver, uses stable while the problematic machine uses testing. And the replacement has run so far without problem.

Well, the call trace above points to a disk problem, and you've changed the disk setup in the new machine by putting a piece of hardware between the disks and the motherboard, so your problem may be gone because of that. It's unclear what exactly you've done here. Does the new machine use the old disks through the hardware RAID, or are you dealing with all new disks? Either way, you've changed a lot from the old machine to the new one, and it's not surprising that you've eliminated the problem as a result.

> The only other thing I can add is that the bad machine would seem to start getting sluggish before it froze, but for the life of me, I couldn't see why.

Maybe the kernel was repeatedly trying to do some disk operation that failed, which used up CPU time and caused the sluggish behaviour?

> I am posting because I'm hopeful that list participants might have suggestions how I might start to chase down or, better yet, eliminate this problem.

Can you reproduce the exact setup that was causing problems before, including the usage levels?

A
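One way to approximate the earlier usage levels on the bench is to keep the disks busy, since the call trace points into the block layer. The following is only a rough sketch and not something anyone in this thread ran: the function name stress_disk and all of its parameters are made up for illustration. It writes random data, forces it to disk with fsync, reads it back and compares checksums. Note that the read-back may be served from the page cache unless the files are larger than RAM, so a clean result does not prove the drive is healthy.

```python
import hashlib
import os
import tempfile


def stress_disk(directory, rounds=3, file_size=1 << 20, chunk_size=64 * 1024):
    """Write random data, fsync it, read it back and compare checksums.

    Hypothetical helper for illustration. Returns the number of rounds
    whose read-back checksum did not match what was written.
    """
    mismatches = 0
    for i in range(rounds):
        path = os.path.join(directory, "stress_%d.bin" % i)
        written = hashlib.sha256()
        with open(path, "wb") as f:
            remaining = file_size
            while remaining > 0:
                chunk = os.urandom(min(chunk_size, remaining))
                written.update(chunk)
                f.write(chunk)
                remaining -= len(chunk)
            f.flush()
            os.fsync(f.fileno())  # push the data through the block layer
        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                read_back.update(chunk)
        if read_back.digest() != written.digest():
            mismatches += 1
        os.remove(path)  # clean up so repeated runs do not fill the disk
    return mismatches


if __name__ == "__main__":
    # Assumption: point this at a directory on the suspect array instead
    # of a temporary directory when testing the real machine.
    with tempfile.TemporaryDirectory() as d:
        print("mismatched rounds:", stress_disk(d))
```

Running it in a loop alongside the http requests, with a much larger file_size, would get closer to the mixed load the machine saw at the data center.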
How to diagnose kernel panic?
I have a server that is brought down by a kernel panic every two weeks on average. Nothing untoward gets in the logs and the on-screen panic message starts with something like

Kernel panic - not syncing: Fatal exception in interrupt
Call trace: [c026bc42] scsi_request_fn+0xf610x294

I wasn't able to get any more at the data center...

So I brought the machine home and am running [EMAIL PROTECTED] on it, and so far I have not been able to induce the panic.

The replacement machine is similar, but not identical. The main difference is a switch from software to hardware RAID1. Also, the new machine, except for the hardware driver, uses stable while the problematic machine uses testing. And the replacement has run so far without a problem.

The only other thing I can add is that the bad machine would seem to start getting sluggish before it froze, but for the life of me, I couldn't see why.

I am posting because I'm hopeful that list participants might have suggestions on how I might start to chase down or, better yet, eliminate this problem. Is there a way, perhaps, to manufacture the possible interrupts that occur?

Thanks.

Mark
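On the question of interrupts: the sketch below does not manufacture interrupts, but it can show which interrupt lines are actually firing while a test load runs, by sampling /proc/interrupts twice and printing the counters that rose. It is an illustration only; the function names parse_interrupts and interrupt_deltas are invented here, and the parser assumes the usual Linux layout (a header row of CPU columns, then one "label: counts description" row per interrupt).

```python
import time


def parse_interrupts(text):
    """Return {irq_label: (total_count, description)} from /proc/interrupts text."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "label: ..." row
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # rows such as ERR carry fewer counters than CPUs
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def interrupt_deltas(before, after):
    """Return {irq_label: increase} for IRQs whose count rose between samples."""
    deltas = {}
    for label, (count, _) in after.items():
        previous = before.get(label, (0, ""))[0]
        if count > previous:
            deltas[label] = count - previous
    return deltas


if __name__ == "__main__":
    # Take two samples five seconds apart and print the busiest lines first.
    with open("/proc/interrupts") as f:
        first = parse_interrupts(f.read())
    time.sleep(5)
    with open("/proc/interrupts") as f:
        second = parse_interrupts(f.read())
    changes = interrupt_deltas(first, second)
    for label in sorted(changes, key=changes.get, reverse=True):
        print(label, changes[label], second[label][1])
```

Comparing the output on the problem machine and on the replacement under the same load would at least show whether the disk controller's interrupt line behaves differently between the two.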
Re: How to diagnose kernel panic?
Hi Mark.

I don't know if this kind of information will help or not, but what are the specs of your machine? Specifically, do you have a quality power supply? How about your hard drive and your motherboard?

As I said, I don't know if answering these questions will reveal anything important, but it always helps to verify that you are using quality parts in your machine. After all, a software program is just a collection of assembly instructions to your CPU (usually compiled from a high-level language, such as C++). If a piece of software executes an assembly instruction that addresses a hard disk for information, and the motherboard and/or the hard disk are cheapies that fail to properly return whatever data the instruction was expecting, that can certainly cause software bugs ranging from incorrect display of data to kernel panics, depending on the program that gets lucky. (Cheap motherboards and hard disks are cheap because they have less redundancy and fault tolerance, and use components more likely to fail to begin with.)

Also, if your power supply is a cheap one, it might not be supplying enough power to your computer, and if that happens, your computer just won't work correctly, because both your software and hardware expect full power.

Hope all that helps.

On Sunday 09 July 2006 12:56, Mark Copper wrote:
> I have a server that is brought down by a kernel panic every two weeks on average. Nothing untoward gets in the logs and the on-screen panic message starts with something like
>
> Kernel panic - not syncing: Fatal exception in interrupt
> Call trace: [c026bc42] scsi_request_fn+0xf610x294
>
> I wasn't able to get any more at the data center...
>
> So I brought the machine home and am running [EMAIL PROTECTED] on it and so far I have not been able to induce the panic.
>
> The replacement machine is similar, but not identical. The main difference being a switch from software to hardware RAID1. Also, the new machine, except for the hardware driver, uses stable while the problematic machine uses testing. And the replacement has run so far without problem.
>
> The only other thing I can add is that the bad machine would seem to start getting sluggish before it froze, but for the life of me, I couldn't see why.
>
> I am posting because I'm hopeful that list participants might have suggestions how I might start to chase down or, better yet, eliminate this problem. Is there a way, perhaps, to manufacture the possible interrupts that occur?
>
> Thanks.
>
> Mark