Thanks fooler and Edwin. I ran memtest and memtester on the server for several days each, and neither found any problem with the installed memory modules.
--- mike t.

From: fooler mail <fooler.m...@gmail.com>
To: Michael Tinsay <tinsa...@yahoo.com>; Philippine Linux Users' Group (PLUG) Technical Discussion List <plug@lists.linux.org.ph>
Sent: Wednesday, 26 October 2016, 9:05
Subject: Re: [plug] decoding further a Machine Check Excepton

it looks like a memory error to me... can you remove the memory at bank 8 if that solves the problem?

fooler.

On Mon, Oct 24, 2016 at 9:31 PM, Michael Tinsay <tinsa...@yahoo.com> wrote:
> Hi!
>
> Yesterday one of our servers had this on the console:
>
> [ 1184.087973] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 8: ba000000000000b2
> [ 1184.087973] mce: [Hardware Error]: TSC 3a3965b65c0 MISC 80000
> [ 1184.087973] mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1477301538 SOCKET 0 APIC 0 microcode 2
> [ 1184.087973] mce: [Hardware Error]: Machine check: Processor context corrupt
>
> So I did some research and found out that I can use an app named mcelog to
> decode this. This was the output from it:
>
> Hardware event. This is not a software error.
> CPU 0 BANK 8 TSC 3a3965b65c0
> MISC 80000
> TIME 1477301538 Mon Oct 24 17:32:18 2016
> MCG status:MCIP
> MCi status:
> Uncorrected error
> Error enabled
> MCi_MISC register valid
> Processor context corrupt
> MCA: MEMORY CONTROLLER AC_CHANNEL2_ERR
> Transaction: Address/Command error
> Memory corrected error count (CORE_ERR_CNT): 0
> Memory transaction Tracker ID (RTId): 0
> Memory DIMM ID of error: 0
> Memory channel ID of error: 2
> Memory ECC syndrome: 0
> STATUS ba000000000000b2 MCGSTATUS 4
> CPUID Vendor Intel Family 6 Model 44
> SOCKET 0 APIC 0 microcode 2
> tinsaymc@IT-046641:~$ cat mce.txt
> CPU 0: Machine Check Exception: 4 Bank 8: ba000000000000b2
> TSC 3a3965b65c0 MISC 80000
> PROCESSOR 0:206c2 TIME 1477301538 SOCKET 0 APIC 0 microcode 2
>
> So my question now, for those who know more about this area than I, is: Is
> the exception due to a problem in the CPU itself or somewhere on the
> motherboard?
>
> Regards.
>
>
> --- mike t.
>
> _________________________________________________
> Philippine Linux Users' Group (PLUG) Mailing List
> http://lists.linux.org.ph/mailman/listinfo/plug
> Searchable Archives: http://archives.free.net.ph
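For anyone who wants to check the mcelog decoding by hand, here is a minimal sketch that unpacks the STATUS value quoted above (ba000000000000b2). It assumes the IA32_MCi_STATUS bit layout and the memory controller error code format (000F 0000 1MMM CCCC) described in the Intel SDM. The function and table names are hypothetical and are not part of mcelog. Note that "Bank 8" in the kernel line is the index of a machine-check register bank, which mcelog attributes to the memory controller here, so it does not by itself identify a DIMM slot.

```python
# Sketch only: names are hypothetical; bit positions assume the Intel SDM
# layout of IA32_MCi_STATUS.
STATUS_FLAGS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error enabled)",
    59: "MISCV (MCi_MISC register valid)",
    58: "ADDRV (MCi_ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

# MMM field of a memory controller error code (assumed from the SDM table).
MEM_TRANSACTIONS = {
    0b000: "Generic undefined request",
    0b001: "Memory read error",
    0b010: "Memory write error",
    0b011: "Address/Command error",
    0b100: "Memory scrubbing error",
}


def decode_status(status):
    """Decode the flag bits and the MCA error code of one MCi_STATUS value."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF  # low 16 bits hold the MCA error code
    result = {"flags": flags, "mca_code": mca_code}

    # Memory controller errors match 000F 0000 1MMM CCCC (F is a don't-care).
    if (mca_code & 0xEF80) == 0x0080:
        mmm = (mca_code >> 4) & 0x7
        cccc = mca_code & 0xF
        result["transaction"] = MEM_TRANSACTIONS.get(mmm, "Reserved")
        # CCCC == 1111 means the channel is not specified.
        result["channel"] = None if cccc == 0xF else cccc
    return result


if __name__ == "__main__":
    decoded = decode_status(0xBA000000000000B2)
    for flag in decoded["flags"]:
        print(flag)
    print("MCA code: 0x%04x" % decoded["mca_code"])
    print("Transaction:", decoded.get("transaction"))
    print("Channel:", decoded.get("channel"))
```

For this STATUS value the sketch reports an uncorrected, enabled error with MISC valid and processor context corrupt, an Address/Command transaction, and channel 2, which agrees with the mcelog output in the quoted message.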