OK, that was probably a short blip on your power supply (e.g. low voltage), which can cause data corruption. Does the problem still persist?
fooler.

On Thu, Oct 27, 2016 at 3:45 AM, Michael Tinsay <[email protected]> wrote:
> Thanks fooler and Edwin,
>
> I ran memtest and mestester on the server for several days each and both
> didn't find any problem with the memory modules installed.
>
>
> --- mike t.
>
> ________________________________
> From: fooler mail <[email protected]>
> To: Michael Tinsay <[email protected]>; Philippine Linux Users' Group
> (PLUG) Technical Discussion List <[email protected]>
> Sent: Wednesday, 26 October 2016, 9:05
> Subject: Re: [plug] decoding further a Machine Check Excepton
>
> it looks like a memory error to me... can you remove the memory at
> bank 8 if that solves the problem?
>
> fooler.
>
> On Mon, Oct 24, 2016 at 9:31 PM, Michael Tinsay <[email protected]> wrote:
>> Hi!
>>
>> Yesterday one of our servers had this on the console:
>>
>> [ 1184.087973] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 8: ba000000000000b2
>> [ 1184.087973] mce: [Hardware Error]: TSC 3a3965b65c0 MISC 80000
>> [ 1184.087973] mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1477301538 SOCKET 0 APIC 0 microcode 2
>> [ 1184.087973] mce: [Hardware Error]: Machine check: Processor context corrupt
>>
>> So I did some research and found out that I can use an app named mcelog to
>> decode this. This was the output from it:
>>
>> Hardware event. This is not a software error.
>> CPU 0 BANK 8 TSC 3a3965b65c0
>> MISC 80000
>> TIME 1477301538 Mon Oct 24 17:32:18 2016
>> MCG status:MCIP
>> MCi status:
>> Uncorrected error
>> Error enabled
>> MCi_MISC register valid
>> Processor context corrupt
>> MCA: MEMORY CONTROLLER AC_CHANNEL2_ERR
>> Transaction: Address/Command error
>> Memory corrected error count (CORE_ERR_CNT): 0
>> Memory transaction Tracker ID (RTId): 0
>> Memory DIMM ID of error: 0
>> Memory channel ID of error: 2
>> Memory ECC syndrome: 0
>> STATUS ba000000000000b2 MCGSTATUS 4
>> CPUID Vendor Intel Family 6 Model 44
>> SOCKET 0 APIC 0 microcode 2
>> tinsaymc@IT-046641:~$ cat mce.txt
>> CPU 0: Machine Check Exception: 4 Bank 8: ba000000000000b2
>> TSC 3a3965b65c0 MISC 80000
>> PROCESSOR 0:206c2 TIME 1477301538 SOCKET 0 APIC 0 microcode 2
>>
>> So my question now, for those who know more about this area than I, is:
>> Is the exception due to a problem in the CPU itself or somewhere on the
>> motherboard?
>>
>> Regards.
>>
>>
>> --- mike t.
>>
>> _________________________________________________
>> Philippine Linux Users' Group (PLUG) Mailing List
>> http://lists.linux.org.ph/mailman/listinfo/plug
>> Searchable Archives: http://archives.free.net.ph
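
For anyone who wants to cross-check the mcelog output by hand: the STATUS value in the quoted report (ba000000000000b2) can be split into its architectural fields. Below is a minimal sketch in Python, assuming the generic IA32_MCi_STATUS layout from the Intel SDM (flag bits 63..57, MCA error code in bits 15..0, memory controller codes of the form 0000 0000 1MMM CCCC). The function name and the dictionary keys are made up for illustration, and the model-specific bits (31..16 and the "other information" field) are deliberately not decoded, so treat it as a sanity check and not as a replacement for mcelog.

```python
# Architectural flag bits of IA32_MCi_STATUS (generic layout, not model-specific).
STATUS_FLAGS = [
    (63, "VAL"),    # register contents are valid
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting was enabled
    (59, "MISCV"),  # MCi_MISC register valid
    (58, "ADDRV"),  # MCi_ADDR register valid
    (57, "PCC"),    # processor context corrupt
]

# MMM field of a memory controller error code (0000 0000 1MMM CCCC).
MEM_TRANSACTIONS = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_mci_status(status):
    """Hypothetical helper: decode the architectural part of an MCi_STATUS value."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    mca_code = status & 0xFFFF
    result = {"flags": flags, "mca_code": mca_code}

    # Memory controller errors have bit 7 set and bits 15..8 clear.
    if (mca_code & 0xFF80) == 0x0080:
        mmm = (mca_code >> 4) & 0x7
        channel = mca_code & 0xF
        result["transaction"] = MEM_TRANSACTIONS.get(mmm, "reserved")
        # Channel 1111b means "channel not specified".
        result["channel"] = None if channel == 0xF else channel
    return result


if __name__ == "__main__":
    print(decode_mci_status(0xBA000000000000B2))
```

For the value from the console this gives the flags VAL, UC, EN, MISCV and PCC, an address/command error, and channel 2, which agrees with the "Uncorrected error", "Processor context corrupt" and "AC_CHANNEL2_ERR" lines mcelog printed.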

