Re: [CentOS] Kernel panic - not syncing: CPU context corrupt
William L. Maltby wrote:
> On Fri, 2008-06-20 at 15:57 -0500, Lanny Marcus wrote:
>> On 6/20/08, Alwin Roosen <[EMAIL PROTECTED]> wrote:
>>> ws174 login: CPU 1: Machine Check Exception: 0005
>>> CPU 0: Machine Check Exception: 0004
>>> Bank 3: f6220002010a at 32c93500
>>> Bank 5: f2300c000e0f
>>> Kernel panic - not syncing: CPU context corrupt
>>> Bank 3: f6220002010a
>>
>> Two banks of Memory (3 and 5) have problems?
>>
>> If the RAM tests OK, suggest you swap the motherboard
>
> IIRC, you have memory interleaved? I've had problems with that, in the
> past, on ... an acer? Anyway, if so, try turning it off in the BIOS
> setup. Also, make sure you have the latest BIOS for the mainboard.

I'm pretty sure those 'banks' mentioned in that error relate to the on-CPU cache, and not to motherboard main RAM. Any ECC in a MACHINE CHECK is likely CACHE ECC, not main memory ECC.

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] Kernel panic - not syncing: CPU context corrupt
On 6/20/08, Alwin Roosen <[EMAIL PROTECTED]> wrote:
> This is a brand new server, which has been tested for days with FreeBSD
> in our office, and a few days with Windows on the site of our hardware
> distributor. Now customer wants CentOS, which we installed, but after
> few days we get a kernel panic. Last night at 2:08 it gave the same
> kernel panic.

Have you checked to verify that the fans are spinning? Since it is a new system, I think you should take it back to your HW distributor and have them run cerberus (ctcs) on it, as Richard Karhuse wrote. If it takes a few days for it to get the Kernel Panic, I doubt that is related to the OS. Let your HW distributor do the work of troubleshooting and replacing whatever component(s) are faulty. They can get a CentOS Live CD and run that on it.
Re: [CentOS] Kernel panic - not syncing: CPU context corrupt
On Fri, 2008-06-20 at 15:57 -0500, Lanny Marcus wrote:
> On 6/20/08, Alwin Roosen <[EMAIL PROTECTED]> wrote:
> > ws174 login: CPU 1: Machine Check Exception: 0005
> > CPU 0: Machine Check Exception: 0004
> > Bank 3: f6220002010a at 32c93500
> > Bank 5: f2300c000e0f
> > Kernel panic - not syncing: CPU context corrupt
> > Bank 3: f6220002010a
>
> Two banks of Memory (3 and 5) have problems?
>
> If the RAM tests OK, suggest you swap the motherboard

IIRC, you have memory interleaved? I've had problems with that, in the past, on ... an Acer? Anyway, if so, try turning it off in the BIOS setup. Also, make sure you have the latest BIOS for the mainboard.

HTH
--
Bill
Re: [CentOS] Kernel panic - not syncing: CPU context corrupt
On 6/20/08, Alwin Roosen <[EMAIL PROTECTED]> wrote:
> This is a brand new server, which has been tested for days with FreeBSD
> in our office, and a few days with Windows on the site of our hardware
> distributor. Now customer wants CentOS, which we installed, but after
> few days we get a kernel panic. Last night at 2:08 it gave the same
> kernel panic.

The fact that it worked OK for the first few days with FreeBSD and Windows may have amounted to a burn-in test, and now something in the HW has failed or is failing. Or, possibly CentOS is exercising the HW more heavily than the other 2 OSes did? I would suggest that you get a Knoppix Live CD, or, preferably, a CentOS Live CD, and let it roll.

And you get a Kernel Panic _after a few days_ on CentOS. That might indicate a Memory problem? Or a Cooling problem?

> ws174 login: CPU 1: Machine Check Exception: 0005
> CPU 0: Machine Check Exception: 0004
> Bank 3: f6220002010a at 32c93500
> Bank 5: f2300c000e0f
> Kernel panic - not syncing: CPU context corrupt
> Bank 3: f6220002010a

Two banks of Memory (3 and 5) have problems?

If the RAM tests OK, suggest you swap the motherboard
Re: [CentOS] Kernel panic - not syncing: CPU context corrupt
Richard Karhuse wrote:
> Dag's Repo has the new memtest86+ 2.01 RPM. I'd pull it and
> let it run overnight. While memtest86+ is good, I've recently had
> cases where it didn't find (obvious) memory errors.

My favorite test is cerberus (ctcs). Quite a few OEMs out there use it to burn in their systems. For me it can typically find a problem within a few hours, whereas I've let memtest run for a week without it finding anything useful.

The results of cerberus sometimes won't help you pinpoint the problem, though (often the result is just a machine crash). But at least you know there is an issue and can start swapping hardware until it's fixed (or just replace the whole system).

http://sourceforge.net/projects/va-ctcs/

nate
Re: [CentOS] Kernel panic - not syncing: CPU context corrupt
On 6/20/08, Alwin Roosen <[EMAIL PROTECTED]> wrote:
> Hi,
>
> CentOS release 5 (Final)
> Kernel 2.6.18-53.1.21.el5 on an i686
>
> ws174 login: CPU 1: Machine Check Exception: 0005
> CPU 0: Machine Check Exception: 0004
> Bank 3: f6220002010a at 32c93500
> Bank 5: f2300c000e0f
> Kernel panic - not syncing: CPU context corrupt
> Bank 3: f6220002010a

Alwin --> I would be very, very "surprised" *IF* this wasn't hardware related.

Dave Jones wrote a nice little program to help decode this:

    $ parsemce -b 3 -s f6220002010a -e 5 -a 32c93500
    Status: (5) Machine Check in progress. Restart IP valid.
    parsebank(3): f6220002010a @ 32c93500
        External tag parity error
        CPU state corrupt. Restart not possible
        Address in addr register valid
        Error enabled in control register
        Error not corrected.
        Error overflow
        Memory hierarchy error
        Request: Generic error
        Transaction type : Generic
        Memory/IO : I/O

and:

    $ parsemce -b 5 -s f2300c000e0f -e 4 -a 0
    Status: (4) Machine Check in progress. Restart IP invalid.
    parsebank(5): f2300c000e0f @ 0
        External tag parity error
        CPU state corrupt. Restart not possible
        Error enabled in control register
        Error not corrected.
        Error overflow
        Bus and interconnect error
        Participation: Generic
        Timeout: Request did not timeout
        Request: Generic error
        Transaction type : Invalid
        Memory/IO : Other

Dag's Repo has the new memtest86+ 2.01 RPM. I'd pull it and let it run overnight. While memtest86+ is good, I've recently had cases where it didn't find (obvious) memory errors.

I've also seen things like SATA disk drives cause MCEs. This one looks like you're taking memory parity errors somewhere in the path to the CPU.

In your BIOS, check your Event log for any "interesting" entries, too.

Hope this helps ...

-rak-
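The flag lines in the parsemce output above (overflow, not corrected, enabled, address valid, CPU state corrupt) line up with the top bits of the IA32_MCi_STATUS register. A minimal Python sketch of that part of the decoding follows. It assumes the status strings posted in this thread begin with the most significant byte of the 64-bit register; the posted strings are only 12 hex digits long, so the sketch looks at that leading byte only and leaves the error-code fields to parsemce. The names FLAGS and decode_top_byte are made up for illustration and are not part of any tool.

```python
# Flag bits 63..57 of IA32_MCi_STATUS, expressed as masks on the top byte.
FLAGS = [
    (0x80, "VAL: status register valid"),
    (0x40, "OVER: error overflow"),
    (0x20, "UC: error not corrected"),
    (0x10, "EN: error enabled in control register"),
    (0x08, "MISCV: misc register valid"),
    (0x04, "ADDRV: address register valid"),
    (0x02, "PCC: processor context corrupt"),
]


def decode_top_byte(status_hex):
    """Return the flag names set in the leading byte of a status string.

    Assumption: status_hex starts with the most significant byte of the
    64-bit status value, as in the console lines quoted in this thread.
    """
    top = int(status_hex[:2], 16)
    return [name for mask, name in FLAGS if top & mask]


if __name__ == "__main__":
    # Status strings copied from the console output in this thread.
    for bank, status in [(3, "f6220002010a"), (5, "f2300c000e0f")]:
        print("Bank", bank)
        for name in decode_top_byte(status):
            print("   ", name)
```

Both banks show the processor-context-corrupt flag, which is consistent with the "CPU context corrupt" panic message; only bank 3 has the address-valid flag, matching the "at 32c93500" on its console line.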
Re: [CentOS] Kernel panic - not syncing: CPU context corrupt
On 6/20/08, nate <[EMAIL PROTECTED]> wrote:
> Easiest is to buy from a vendor that can test on your OS of choice,
> there are lots of vendors out there that can do it.
>
> Two such companies I have bought from that do this include
> http://www.siliconmechanics.com/ (HQ in Seattle, WA area)
> http://www.asaservers.com/ (HQ in San Francisco, CA area)
>
> Both specialize in Supermicro/Tyan-based systems (as do most other
> "whitebox" vendors).

That, IMHO, is the best way to go. Another way, if the HW is available, is to test it with a Live CD for CentOS, before purchasing, to see if CentOS will run properly on the HW.
Re: [CentOS] Kernel panic - not syncing: CPU context corrupt
Michael wrote:
> But I don't want to get into the situation above, where I purchase NEW
> hardware, and CentOS doesn't like it, and furthermore the resolution is
> elusive.
>
> What is the best HW environment for CentOS?
> Brand, MFG, chipset rev, and so on

Easiest is to buy from a vendor that can test on your OS of choice; there are lots of vendors out there that can do it.

Two such companies I have bought from that do this include
http://www.siliconmechanics.com/ (HQ in Seattle, WA area)
http://www.asaservers.com/ (HQ in San Francisco, CA area)

Both specialize in Supermicro/Tyan-based systems (as do most other "whitebox" vendors).

nate
Re: [CentOS] Kernel panic - not syncing: CPU context corrupt
Lanny Marcus wrote:
> On 6/20/08, Alwin Roosen <[EMAIL PROTECTED]> wrote:
>> CentOS release 5 (Final)
>> Kernel 2.6.18-53.1.21.el5 on an i686
>>
>> ws174 login: CPU 1: Machine Check Exception: 0005
>> CPU 0: Machine Check Exception: 0004
>> Bank 3: f6220002010a at 32c93500
>> Bank 5: f2300c000e0f
>> Kernel panic - not syncing: CPU context corrupt
>> Bank 3: f6220002010a
>
> Phil or someone else: Do the three (3) "Bank" lines above indicate RAM
> problems? If not, what do they refer to? Alwin wrote that this is brand
> new HW, so he suspects that it is OK, but it doesn't seem to be OK?
> Lanny

I have the same issue, unresolved. However, I am using old desktop hardware (a Compaq Presario, and an HP something or another). Maybe it is memory, or CPU, or some kind of incompatibility with something.

I was just making a list of the hardware that should be purchased to run a low-end SME server using CentOS:

    Rack mountable case, with Power Supply and fans included.
    MotherBoard, mid-range processor.
    2 GB RAM
    USB Drive 1 TB
    Two 500 GB or four 300 GB internal hard drives (HW RAID would be nice)
    CD/DVD R/W drive
    and so on..

But I don't want to get into the situation above, where I purchase NEW hardware, and CentOS doesn't like it, and furthermore the resolution is elusive.

What is the best HW environment for CentOS?
Brand, MFG, chipset rev, and so on

--
Michael Anderson, J3k Solutions
Sr.Systems Programmer/Analyst
832.515.3868
Re: [CentOS] Kernel panic - not syncing: CPU context corrupt
On 6/20/08, Alwin Roosen <[EMAIL PROTECTED]> wrote:
> CentOS release 5 (Final)
> Kernel 2.6.18-53.1.21.el5 on an i686
>
> ws174 login: CPU 1: Machine Check Exception: 0005
> CPU 0: Machine Check Exception: 0004
> Bank 3: f6220002010a at 32c93500
> Bank 5: f2300c000e0f
> Kernel panic - not syncing: CPU context corrupt
> Bank 3: f6220002010a

Phil or someone else: Do the three (3) "Bank" lines above indicate RAM problems? If not, what do they refer to? Alwin wrote that this is brand new HW, so he suspects that it is OK, but it doesn't seem to be OK?

Lanny
Re: [CentOS] Kernel panic - not syncing: CPU context corrupt
2008/6/20 Alwin Roosen <[EMAIL PROTECTED]>:
> Hi,
>
> Is there someone on this mailing list who could help me figure out
> this issue? We do not know where to look to solve this.

If your installation is standard CentOS with no third-party software or configurations, I would first run the vendor hardware checks several times, as they are usually not good with intermittent or hard-to-find problems. Run an extensive memtest as well, if possible.

regards

Walid
Re: [CentOS] Kernel panic - not syncing: CPU context corrupt
On Fri, 2008-06-20 at 14:40 +0200, Alwin Roosen wrote:
> Hi,
>
> Is there someone on this mailing list who could help me figure out
> this issue? We do not know where to look to solve this.
...
> I would be very surprised if this is hardware related.

A Google search on "Machine Check Exception" "Kernel panic - not syncing: CPU context corrupt" turns up 50 results (including your CentOS BZ request referring you to this list), many of which point to hardware problems. CPU, MB (bad caps), and chipset are all listed as possible causes. I'd go back to the hardware vendor if still under warranty.

Phil
[CentOS] Kernel panic - not syncing: CPU context corrupt
Hi,

Is there someone on this mailing list who could help me figure out this issue? We do not know where to look to solve this.

--- Description ---

This is a brand new server, which has been tested for days with FreeBSD in our office, and a few days with Windows on the site of our hardware distributor. Now the customer wants CentOS, which we installed, but after a few days we get a kernel panic. Last night at 2:08 it gave the same kernel panic.

Please tell me what information I should give you and, most important, how to get it from the system, because we do not have experience with CentOS (only FreeBSD).

I would be very surprised if this is hardware related. We have used the same hardware for several years, and run FreeBSD on it very successfully. It is a SuperMicro PDSMI+ motherboard with a 3ware RAID controller (8006-2LP). CPU is a Xeon 3040 1.8 GHz EM64T 2MB 1066FSB (65W). Memory is DDR2 Transcend 2048MB ECC Unbuffered 800.

The error message on the console is in "Additional Information". I am hoping that I should switch off some setting in CentOS to fix this, but I cannot find much useful information about this issue on Google.

--- Additional Information ---

CentOS release 5 (Final)
Kernel 2.6.18-53.1.21.el5 on an i686

ws174 login: CPU 1: Machine Check Exception: 0005
CPU 0: Machine Check Exception: 0004
Bank 3: f6220002010a at 32c93500
Bank 5: f2300c000e0f
Kernel panic - not syncing: CPU context corrupt
Bank 3: f6220002010a

--- Attachments ---

19-06-2008 16-03-31.png (Screenshot of console)

With kind regards,
Alwin Roosen
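For anyone who wants to feed console text like the above into a decoder, here is a rough Python sketch that pulls the CPU and bank fields out of it. The regular expressions are guesses based only on the six console lines in this post, and the names CONSOLE, CPU_RE, BANK_RE and parse_mce are invented for this example; other kernels may print machine checks in a different layout.

```python
import re

# Console text copied from this post.
CONSOLE = """\
ws174 login: CPU 1: Machine Check Exception: 0005
CPU 0: Machine Check Exception: 0004
Bank 3: f6220002010a at 32c93500
Bank 5: f2300c000e0f
Kernel panic - not syncing: CPU context corrupt
Bank 3: f6220002010a
"""

# Assumed line layouts, inferred from the text above only.
CPU_RE = re.compile(r"CPU (\d+): Machine Check Exception: ([0-9a-fA-F]+)")
BANK_RE = re.compile(r"Bank (\d+): ([0-9a-fA-F]+)(?: at ([0-9a-fA-F]+))?")


def parse_mce(text):
    """Return (cpus, banks) found in the console text.

    cpus  is a list of (cpu_number, exception_status_hex)
    banks is a list of (bank_number, bank_status_hex, address_hex_or_None)
    """
    cpus = [(int(cpu), status) for cpu, status in CPU_RE.findall(text)]
    banks = [
        (int(bank), status, addr or None)
        for bank, status, addr in BANK_RE.findall(text)
    ]
    return cpus, banks


if __name__ == "__main__":
    cpus, banks = parse_mce(CONSOLE)
    for cpu, status in cpus:
        print("CPU", cpu, "exception status", status)
    for bank, status, addr in banks:
        print("Bank", bank, "status", status, "address", addr)
```

The sketch keeps every bank line it finds, including the repeated bank 3 line after the panic message, and does not try to guess which CPU each bank line belongs to, since the two CPUs' output is interleaved on the console.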