Re: MCA messages in /var/log/message?

2010-04-23 Thread John Baldwin
On Thursday 22 April 2010 6:28:34 pm Steve Kargl wrote:
> How does one interpret the following MCA message?
> 
> MCA: Bank 4, Status 0x945a4000d6080a13
> MCA: Global Cap 0x0105, Status 0x
> MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 0
> MCA: CPU 0 COR BUSLG Responder RD Memory
> MCA: Address 0x70c42280
> MCA: Bank 4, Status 0x942140012a080813
> MCA: Global Cap 0x0105, Status 0x
> MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 1
> MCA: CPU 1 COR BUSLG Source RD Memory
> MCA: Address 0x1b97ca578
> 
> It appears that these messages coincide with a 15 to 30
> second period where my USB mouse inexplicably loses a
> large number of button clicks (which is quite noticeable
> with firefox3).

If you have access to p4 (FreeBSD's Perforce repository), you can download a
patched version of mcelog from //depot/projects/mcelog/... which will parse
these for you (you have to build it with 'make FREEBSD=yes').

Hmm, I ran it and here is what it said:

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 4 northbridge
ADDR 70c42280
  Northbridge RAM Chipkill ECC error
  Chipkill ECC syndrome = d6b4
   bit46 = corrected ecc error
  bus error 'local node response, request didn't time out
 generic read mem transaction
 memory access, level generic'
STATUS 945a4000d6080a13 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 15 Model 5
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 4 northbridge
ADDR 1b97ca578
  Northbridge RAM Chipkill ECC error
  Chipkill ECC syndrome = 2a42
   bit32 = err cpu0
   bit46 = corrected ecc error
  bus error 'local node origin, request didn't time out
 generic read mem transaction
 memory access, level generic'
STATUS 942140012a080813 MCGSTATUS 0
MCGCAP 105 APICID 1 SOCKETID 0
CPUID Vendor AMD Family 15 Model 5
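For anyone wondering where wording like "BUSLG Responder RD Memory" or "local node response ... generic read mem transaction" comes from, here is a minimal Python sketch (not mcelog's code) that decodes the low 16 bits of the status word. It assumes the architectural bus-error layout 0000 1PPT RRRR IILL described in the Intel SDM and AMD MCA documentation; the function and table names are hypothetical and the labels are paraphrased, not copied from any tool.

```python
# Minimal sketch: decode the 16-bit MCA error code (status bits 15:0) when it
# has the bus-error form 0000 1PPT RRRR IILL. All names are hypothetical.
from typing import Optional

PARTICIPATION = {0: "local node origin (Source)", 1: "local node response (Responder)",
                 2: "local node observed (Observer)", 3: "generic"}
REQUEST = {0: "generic error", 1: "generic read (RD)", 2: "generic write (WR)",
           3: "data read", 4: "data write", 5: "instruction fetch",
           6: "prefetch", 7: "evict", 8: "snoop"}
MEM_IO = {0: "memory", 1: "reserved", 2: "I/O", 3: "generic"}
LEVEL = {0: "level 0", 1: "level 1", 2: "level 2", 3: "level generic (LG)"}


def decode_bus_error(status: int) -> Optional[dict]:
    """Return the bus-error sub-fields of an MCi_STATUS value, or None."""
    code = status & 0xFFFF
    if (code & 0xF800) != 0x0800:  # bits 15:12 must be 0 and bit 11 must be 1
        return None
    return {
        "participation": PARTICIPATION[(code >> 9) & 0x3],     # PP, bits 10:9
        "timed_out": bool((code >> 8) & 0x1),                   # T, bit 8
        "request": REQUEST.get((code >> 4) & 0xF, "reserved"),  # RRRR, bits 7:4
        "mem_io": MEM_IO[(code >> 2) & 0x3],                    # II, bits 3:2
        "level": LEVEL[code & 0x3],                             # LL, bits 1:0
    }


for status in (0x945A4000D6080A13, 0x942140012A080813):
    print(hex(status), decode_bus_error(status))
```

Applied to the two logged status words, this gives a responder-side generic read for CPU 0 and a source-side generic read for CPU 1, neither timed out, which lines up with both the FreeBSD one-liners and mcelog's text.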

Note that these are corrected errors, so the RAM may not actually be bad; they
may just be transient failures.
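The "corrected" classification comes from the architectural flag bits at the top of MCi_STATUS. The minimal Python sketch below (helper names are hypothetical) extracts those flags, assuming the bit positions defined in the Intel SDM and AMD APM: VAL=63, OVER=62, UC=61, EN=60, MISCV=59, ADDRV=58, PCC=57.

```python
# Minimal sketch: read the architectural flag bits of an MCi_STATUS value.
# Helper names are hypothetical; only bits 63..57 are examined.

FLAG_BITS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
             59: "MISCV", 58: "ADDRV", 57: "PCC"}


def status_flags(status):
    """Names of the architectural flag bits set in status, highest bit first."""
    return [name for bit, name in FLAG_BITS.items() if (status >> bit) & 1]


def is_corrected(status):
    """True for a valid error that hardware reports as corrected (UC clear)."""
    valid = (status >> 63) & 1
    uncorrected = (status >> 61) & 1
    return bool(valid) and not uncorrected


for status in (0x945A4000D6080A13, 0x942140012A080813):
    state = "corrected" if is_corrected(status) else "uncorrected"
    print(hex(status), status_flags(status), state)
```

For both log entries this reports VAL, EN and ADDRV with UC clear: a valid error, corrected by hardware, with a logged address (the "MCA: Address" lines).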

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


2010-04-22 Thread Steve Kargl
On Fri, Apr 23, 2010 at 02:24:03AM +0300, Andriy Gapon wrote:
> This very much looks like a DRAM ECC error.
> You seem to have a family Fh AMD processor, so I am not entirely sure.
> But for 10h processors, BKDG table 80 (NB error signatures) definitely specifies
> that an extended error code of 8 (in bits 20:16) means an ECC error.
> 

Thanks for the information.  The system that generates these
messages is getting long in the tooth.  Guess it's time to
reboot and run memtest86+ on the system.

-- 
Steve


2010-04-22 Thread Andriy Gapon
on 23/04/2010 01:28 Steve Kargl said the following:
> How does one interpret the following MCA message?
> 
> MCA: Bank 4, Status 0x945a4000d6080a13
> MCA: Global Cap 0x0105, Status 0x
> MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 0
> MCA: CPU 0 COR BUSLG Responder RD Memory
> MCA: Address 0x70c42280
> MCA: Bank 4, Status 0x942140012a080813
> MCA: Global Cap 0x0105, Status 0x
> MCA: Vendor "AuthenticAMD", ID 0xf5a, APIC ID 1
> MCA: CPU 1 COR BUSLG Source RD Memory
> MCA: Address 0x1b97ca578
> 
> It appears that these messages coincide with a 15 to 30
> second period where my USB mouse inexplicably loses a
> large number of button clicks (which is quite noticeable
> with firefox3).

This very much looks like a DRAM ECC error.
You seem to have a family Fh AMD processor, so I am not entirely sure.
But for 10h processors, BKDG (BIOS and Kernel Developer's Guide) table 80
(NB error signatures) definitely specifies that an extended error code of 8
(in bits 20:16) means an ECC error.
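To check this against the two logged status words, a minimal Python sketch (hypothetical helper name) can pull out bits 20:16. The meaning of code 8 is taken from the family 10h BKDG table cited above and is only assumed to carry over to family Fh parts.

```python
# Minimal sketch: extract the extended error code (status bits 20:16).
# The "ECC error" meaning of code 8 comes from the family 10h BKDG
# northbridge error-signature table; treat it as an assumption for family Fh.

NB_EXT_ERR = {8: "ECC error"}  # hypothetical, deliberately partial table


def extended_error_code(status):
    return (status >> 16) & 0x1F  # 5-bit field, bits 20:16


for status in (0x945A4000D6080A13, 0x942140012A080813):
    code = extended_error_code(status)
    print(hex(status), code, NB_EXT_ERR.get(code, "see BKDG"))
```

Both status words yield 8 in that field, consistent with a DRAM ECC error.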


-- 
Andriy Gapon