Hi John,

Thanks for the explanation. I ran some tests, and the cause ended up being a power savings mode (aka unstable mode?). Disabling this feature put an end to the freezes. I came to this conclusion by stress testing the box for 3 days, during which there were no issues at all. Then I stopped the stress test, and about 15-30 minutes later the box froze. The freezes seem to occur only during periods of low load. I have not received any of these errors since turning off this power savings mode.

On Wed, Feb 24, 2016 at 3:14 PM, John Baldwin <j...@freebsd.org> wrote:

> On Friday, February 12, 2016 08:11:37 PM Ultima wrote:
> > Recently installed some cpus and received two MCA errors. Using mcelog, I
> > found that the version in ports is about 5 years out of dated and didn't
> > support my cpu. Decided to update it to the newest version (Will post on
> > bugzilla shortly) to pull some more info. Going to post orig and decoded
> > mcelog.
> >
> >
> > Raw:
> > MCA: Bank 20, Status 0xc800084000310e0f
> > MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
> > MCA: Vendor "GenuineIntel", ID 0x306f1, APIC ID 0
> > MCA: CPU 0 COR (33) OVER BUSLG ??? ERR Other
> > MCA: Misc 0x1df87b000d9eff
> > MCA: Bank 5, Status 0xc800008000310e0f
> > MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
> > MCA: Vendor "GenuineIntel", ID 0x306f1, APIC ID 42
> > MCA: CPU 34 COR (2) OVER BUSLG ??? ERR Other
> > MCA: Misc 0xdf87b008d9eff
> >
> > mcelog v131:
> > Hardware event. This is not a software error.
> > CPU 0 BANK 20
> > MISC 1df87b000d9eff
> > MCG status:
> > QPI: Rx detected CRC error - successful LLR wihout Phy re-init
> > STATUS c800084000310e0f MCGSTATUS 0
> > MCGCAP 7000c16 APICID 0 SOCKETID 0
> > CPUID Vendor Intel Family 6 Model 63
> > Hardware event. This is not a software error.
> > CPU 34 BANK 5
> > MISC df87b008d9eff
> > MCG status:
> > QPI: Rx detected CRC error - successful LLR wihout Phy re-init
> > STATUS c800008000310e0f MCGSTATUS 0
> > MCGCAP 7000c16 APICID 2a SOCKETID 0
> > CPUID Vendor Intel Family 6 Model 63
> >
> > After receiving this error, the system was in a frozen state. Any ideas
> > what may cause this?
>
> Well, hardware causes it. QPI is the interconnect bus between your
> CPUs and RAM. "Rx detected CRC error" implies that a CPU detected a
> corrupted message on that bus, but when it requested a resend the
> resent message was ok. Normally corrected errors shouldn't hang your
> machine, but perhaps your machine had another hardware error after this
> that broke it too badly to report and/or log the subsequent error.
>
> --
> John Baldwin
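
For anyone reading this thread later: the raw Status values quoted in the thread can be checked by hand, which helps confirm that both events were corrected errors rather than uncorrected ones. Below is a minimal sketch in Python, assuming the architectural IA32_MCi_STATUS bit layout described in the Intel SDM. The function name decode_mci_status and the dictionary keys are made up for illustration and are not part of mcelog or FreeBSD. The corrected-count field is only meaningful when the CPU advertises it in the global capability register, which is assumed to be the case here.

```python
def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCA bank status value into its architectural fields.

    Hypothetical helper for illustration; bit positions are assumed to
    follow the architectural IA32_MCi_STATUS layout.
    """
    return {
        "valid": bool((status >> 63) & 1),            # VAL: register holds an error
        "overflow": bool((status >> 62) & 1),         # OVER: an earlier error was overwritten
        "uncorrected": bool((status >> 61) & 1),      # UC: hardware could not correct it
        "enabled": bool((status >> 60) & 1),          # EN: error reporting enabled
        "misc_valid": bool((status >> 59) & 1),       # MISCV: Misc register is valid
        "addr_valid": bool((status >> 58) & 1),       # ADDRV: Address register is valid
        "context_corrupt": bool((status >> 57) & 1),  # PCC: processor context corrupt
        "corrected_count": (status >> 38) & 0x7FFF,   # bits 52:38, corrected error count
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


# The two Status values from the raw MCA lines quoted in this thread.
for bank, raw in ((20, 0xC800084000310E0F), (5, 0xC800008000310E0F)):
    fields = decode_mci_status(raw)
    print(
        f"Bank {bank}: corrected_count={fields['corrected_count']} "
        f"uncorrected={fields['uncorrected']} overflow={fields['overflow']} "
        f"mca_error_code={fields['mca_error_code']:#06x}"
    )
```

With these inputs the corrected counts come out as 33 and 2, matching the "COR (33)" and "COR (2)" fields in the raw output, and the uncorrected bit is clear in both, which is consistent with John's reading that these were corrected errors.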