On 01/31/2014 11:22 AM, John Baldwin wrote:
On Wednesday, January 29, 2014 6:49:21 pm Tim Daneliuk wrote:
Resending in hopes that people on one of the other lists will have some
insight here:

On 01/27/2014 10:50 PM, Tim Daneliuk wrote:
I am running 9.2 stable i386 r261207.  As noted earlier:

I just replaced the mobo/CPU on a FBSD server (Gigabyte Z-87-D3HP with
an Intel i3-4130).  I am not overclocking ...  but I continue to see this
sort of thing:

MCA: CPU 0 COR (1) internal parity error

Dmesg shows:

MCA: Vendor "GenuineIntel", ID 0x306c3, APIC ID 0
MCA: CPU 0 COR (1) internal parity error
MCA: Bank 0, Status 0x90000040000f0005
MCA: Global Cap 0x0000000000000c07, Status 0x0000000000000000

I've swapped CPUs (i5). I've fiddled with an endless supply of
mobo settings. I've switched power supplies.  I've moved mem
sticks around ....   No joy.

So, I dug through the sources and found this:



static void
mca_log(const struct mca_record *rec)
{
          uint16_t mca_error;

          printf("MCA: Bank %d, Status 0x%016llx\n", rec->mr_bank,
              (long long)rec->mr_status);
          printf("MCA: Global Cap 0x%016llx, Status 0x%016llx\n",
              (long long)rec->mr_mcg_cap, (long long)rec->mr_mcg_status);
          printf("MCA: Vendor \"%s\", ID 0x%x, APIC ID %d\n", cpu_vendor,
              rec->mr_cpu_id, rec->mr_apic_id);
          printf("MCA: CPU %d ", rec->mr_cpu);
          if (rec->mr_status & MC_STATUS_UC)
                  printf("UNCOR ");
          else {
                  printf("COR ");
                  if (rec->mr_mcg_cap & MCG_CAP_CMCI_P)
                          printf("(%lld) ", ((long long)rec->mr_status &
                              MC_STATUS_COR_COUNT) >> 38);
          }


It looks like the trailing else clause is printing the error, but I am
unclear what the error means, beyond the fact that it appears to be a parity
error somewhere within the CPU's internal memory (cache?).  Is this error
getting corrected?  Is this benign?  Should I get a different mobo?

Um .... Haaaaalp :)


I have now tried different motherboards, CPUs, memory, and power supplies and
this error is still showing up now and then.

This points strongly to either bogus reporting by FreeBSD, or these errors
being benign.  It's hard to believe that the exact same error might occur with
completely different hardware ... unless it's being caused by the case.

Are they all the same model CPU?  Since it is a corrected error you can
probably ignore it, but it is not bogus reporting.  FreeBSD only reports
these errors because they show up in registers on your CPU.
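
For reference, the Status value from the dmesg output above can be decoded
by hand. The following is a minimal sketch, not the kernel's code: the mask
names mirror the ones used in the mca_log() excerpt, but their values are
redefined here from the architectural IA32_MCi_STATUS layout so the example
builds on its own, and struct mc_decoded, decode_mc_status() and
print_mc_status() are hypothetical helpers made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Assumed bit layout of IA32_MCi_STATUS (architectural fields only).
 * Redefined locally so this sketch does not depend on kernel headers.
 */
#define MC_STATUS_VAL        0x8000000000000000ULL /* bit 63: record valid */
#define MC_STATUS_OVER       0x4000000000000000ULL /* bit 62: overflow */
#define MC_STATUS_UC         0x2000000000000000ULL /* bit 61: uncorrected */
#define MC_STATUS_EN         0x1000000000000000ULL /* bit 60: reporting enabled */
#define MC_STATUS_COR_COUNT  0x001fffc000000000ULL /* bits 52:38 */
#define MC_STATUS_MCA_ERROR  0x000000000000ffffULL /* bits 15:0 */

/* Hypothetical container for the decoded fields. */
struct mc_decoded {
	int      valid;
	int      overflow;
	int      uncorrected;
	int      enabled;
	unsigned cor_count;   /* corrected error count */
	unsigned model_code;  /* model-specific error code, bits 31:16 */
	unsigned mca_code;    /* architectural MCA error code, bits 15:0 */
};

/* Split a raw bank status value into its fields. */
struct mc_decoded
decode_mc_status(uint64_t status)
{
	struct mc_decoded d;

	d.valid       = (status & MC_STATUS_VAL) != 0;
	d.overflow    = (status & MC_STATUS_OVER) != 0;
	d.uncorrected = (status & MC_STATUS_UC) != 0;
	d.enabled     = (status & MC_STATUS_EN) != 0;
	d.cor_count   = (unsigned)((status & MC_STATUS_COR_COUNT) >> 38);
	d.model_code  = (unsigned)((status >> 16) & 0xffff);
	d.mca_code    = (unsigned)(status & MC_STATUS_MCA_ERROR);
	return (d);
}

/* Print the decoded fields, one per line. */
void
print_mc_status(uint64_t status)
{
	struct mc_decoded d = decode_mc_status(status);

	printf("status      0x%016llx\n", (unsigned long long)status);
	printf("valid       %d\n", d.valid);
	printf("overflow    %d\n", d.overflow);
	printf("uncorrected %d\n", d.uncorrected);
	printf("enabled     %d\n", d.enabled);
	printf("cor_count   %u\n", d.cor_count);
	printf("model_code  0x%04x\n", d.model_code);
	printf("mca_code    0x%04x\n", d.mca_code);
}
```

Calling print_mc_status(0x90000040000f0005ULL) on the value reported for
Bank 0 shows the valid and enabled bits set, the uncorrected bit clear, a
corrected count of 1 (the "(1)" in the log line), and MCA error code 0x0005,
which is the code reported as "internal parity error".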


It's looking like this is an artifact of running 9.2-STABLE i386 on that
hardware.  I just installed 10-STABLE x64 and am beating the hardware to
death and have yet to see an MCA check.

It *is* possible the 9.2 install is boogered up (I went to grad school to
learn how to say that), so I am pursuing a full rebuild of the server.
While painful, this will also finally move this machine to x64, which is
long overdue.



--
----------------------------------------------------------------------------
Tim Daneliuk     tun...@tundraware.com
PGP Key:         http://www.tundraware.com/PGP/

_______________________________________________
freebsd-hardware@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hardware
To unsubscribe, send any mail to "freebsd-hardware-unsubscr...@freebsd.org"
