L2 cache errors???
Hi,

Are these what I think they are? Errors in the CPU L2 cache?

/var/log/messages:Jul 24 13:14:40 box kernel: MCA: Bank 3, Status 0x902000120120100e
/var/log/messages:Jul 24 13:14:40 box kernel: MCA: Global Cap 0x0806, Status 0x
/var/log/messages:Jul 24 13:14:40 box kernel: MCA: Vendor GenuineIntel, ID 0x10676, APIC ID 2
/var/log/messages:Jul 24 13:14:40 box kernel: MCA: CPU 2 COR L2 memory error
/var/log/messages:Jul 28 19:12:42 box kernel: MCA: Bank 3, Status 0x90270220100e
/var/log/messages:Jul 28 19:12:42 box kernel: MCA: Global Cap 0x0806, Status 0x
/var/log/messages:Jul 28 19:12:42 box kernel: MCA: Vendor GenuineIntel, ID 0x10676, APIC ID 0
/var/log/messages:Jul 28 19:12:42 box kernel: MCA: CPU 0 COR L2 memory error

Are they ECC corrected? Or is the data really kaput?

--WjW
___
freebsd-hardware@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hardware
To unsubscribe, send any mail to freebsd-hardware-unsubscr...@freebsd.org
Re: L2 cache errors???
On 07/28/2015 13:40, Willem Jan Withagen wrote:
> On 28/07/2015 19:48, Mike Tancsa wrote:
>> On 7/28/2015 1:16 PM, Willem Jan Withagen wrote:
>>> Hi,
>>>
>>> Are these what I think they are? Errors in the CPU L2 cache?
>>> Are they ECC corrected? Or is the data really kaput?
>>
>> Could be. There is also an erratum issue that triggers these errors on
>> certain CPUs when running software like virtualbox. It was fixed in
>> RELENG_10 some time ago. What are you running?
>>
>> https://svnweb.freebsd.org/base?view=revision&revision=269052
>> has some details.
>
> 'mmm, Not running Haswell stuff, but rather older hardware.
> Looked in older logfiles, and there are a few more...
> All with the same data, except that it is detected on different CPUs
>
> And it occurs when running:
>     mbuffer -4 -m 1000M -I | \
>         zfs receive -F -d -v zfs
> to receive a full backup from my fileserver.
>
> --WjW

You can tell ECC corrected the error because on FreeBSD, if ECC can't fix the error, the system will panic. Other systems (Solaris and HP-UX being the two I have direct experience with) can detach subsystems that have sustained uncorrectable errors in some cases. (Yes, even CPUs!)

If a system is generating hundreds or thousands of MCAs a minute, you are dealing with a hardware issue. If you are getting spurious MCAs to the tune of a few a day, there's nothing abnormal or broken there; it's just the system doing what it's supposed to.

Given the amount of data that flies around inside modern computers, I'm surprised there aren't more MCAs than there are in most systems.

--
FreeBSD - The Power To Serve.
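For reference, the corrected/uncorrected distinction described above can also be read straight from the bank status word. What follows is only a minimal Python sketch, assuming the standard Intel IA32_MCi_STATUS layout (VAL in bit 63, UC in bit 61, MCA error code in bits 15:0). The names decode_mci_status, FLAG_BITS and CACHE_LEVELS are made up for illustration and are not part of FreeBSD. It decodes the first status value from the log; the second status value in the log looks truncated, so it is not decoded here.

```python
# Sketch only: assumes the architectural IA32_MCi_STATUS bit layout.
# All names here are illustrative, not a FreeBSD or Intel API.
FLAG_BITS = {
    "VAL": 63,    # register contains a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was NOT corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # MISC register holds extra information
    "ADDRV": 58,  # ADDR register holds an address
    "PCC": 57,    # processor context may be corrupt
}

# LL field of the MCA error code: which cache level reported the error.
CACHE_LEVELS = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def decode_mci_status(status):
    """Split a 64-bit bank status word into flags and MCA error code."""
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    code = status & 0xFFFF  # MCA error code lives in bits 15:0
    result = {
        "flags": flags,
        "mca_error_code": code,
        "corrected": flags["VAL"] and not flags["UC"],
    }
    # Generic cache hierarchy errors are encoded as 000F 0000 0000 11LL;
    # mask off the F bit (bit 12) and the LL bits before comparing.
    if code & 0xEFFC == 0x000C:
        result["cache_level"] = CACHE_LEVELS[code & 0x3]
    return result


# First status value from the log in this thread.
info = decode_mci_status(0x902000120120100E)
print(info["corrected"], info.get("cache_level"), hex(info["mca_error_code"]))
```

For the value above, this reports a valid error with UC clear (so corrected) and a generic cache hierarchy error at L2, which agrees with the kernel's "COR L2 memory error" line.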
Re: L2 cache errors???
On 7/28/2015 1:16 PM, Willem Jan Withagen wrote:
> Hi,
>
> Are these what I think they are? Errors in the CPU L2 cache?
> Are they ECC corrected? Or is the data really kaput?

Could be. There is also an erratum issue that triggers these errors on certain CPUs when running software like virtualbox. It was fixed in RELENG_10 some time ago. What are you running?

https://svnweb.freebsd.org/base?view=revision&revision=269052
has some details.

---Mike

--
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, m...@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada http://www.tancsa.com/
Re: L2 cache errors???
On 28/07/2015 19:48, Mike Tancsa wrote:
> On 7/28/2015 1:16 PM, Willem Jan Withagen wrote:
>> Hi,
>>
>> Are these what I think they are? Errors in the CPU L2 cache?
>> Are they ECC corrected? Or is the data really kaput?
>
> Could be. There is also an erratum issue that triggers these errors on
> certain CPUs when running software like virtualbox. It was fixed in
> RELENG_10 some time ago. What are you running?
>
> https://svnweb.freebsd.org/base?view=revision&revision=269052
> has some details.

'mmm, Not running Haswell stuff, but rather older hardware.
Looked in older logfiles, and there are a few more...
All with the same data, except that it is detected on different CPUs.

And it occurs when running:
    mbuffer -4 -m 1000M -I | \
        zfs receive -F -d -v zfs
to receive a full backup from my fileserver.

--WjW

No tweaked settings, and the CPU is not overheating.
The system consumes about 200W and has a Supermicro 450W power supply.

Running 10.2-BETA2 on:

CPU: Intel(R) Core(TM)2 Extreme CPU X9650 @ 3.00GHz (3005.62-MHz K8-class CPU)
  Origin=GenuineIntel Id=0x10676 Family=0x6 Model=0x17 Stepping=6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x8e3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  VT-x: Basic Features=0x5a0800<SMM,INS/OUTS>
        Pin-Based Controls=0x3f<ExtINT,NMI,VNMI>
        Primary Processor Controls=0xf7f9fffe<INTWIN,TSCOff,HLT,INVLPG,MWAIT,RDPMC,RDTSC,CR3-LD,CR3-ST,CR8-LD,CR8-ST,TPR,NMIWIN,MOV-DR,IO,IOmap,MSRmap,MONITOR,PAUSE>
        Secondary Processor Controls=0x41<APIC,WBINVD>
        Exit Controls=0x5a0800<PAT-LD,EFER-SV,PTMR-SV>
        Entry Controls=0x5a0800
  TSC: P-state invariant, performance statistics
Instruction TLB: 2M pages, 4-way, 8 entries or 4M pages, 4-way, 4 entries
Instruction TLB: 4 KB Pages, 4-way set associative, 128 entries
64-Byte prefetching
Data TLB0: 4 KByte pages, 4-way associative, 16 entries
Data TLB0: 4 MByte pages, 4-way set associative, 16 entries
2nd-level cache: 6MByte, 24-way set associative, 64 byte line size
1st-level instruction cache: 32 KB, 8-way set associative, 64 byte line size
Data TLB1: 4 KByte pages, 4-way associative, 256 entries
1st-level data cache: 32 KB, 8-way set associative, 64 byte line size
L2 cache: 6144 kbytes, 16-way associative, 64 bytes/line
real memory = 7516192768 (7168 MB)

Motherboard:
Base Board Information
  Manufacturer: ASUSTeK Computer INC.
  Product Name: P5Q-E
  Version: Rev 1.xx
  Serial Number: MS1C87B16302305
Re: L2 cache errors???
On 28/07/2015 21:04, Josh Paetzel wrote:
> On 07/28/2015 13:40, Willem Jan Withagen wrote:
>> On 28/07/2015 19:48, Mike Tancsa wrote:
>>> On 7/28/2015 1:16 PM, Willem Jan Withagen wrote:
>>>> Hi,
>>>>
>>>> Are these what I think they are? Errors in the CPU L2 cache?
>>>> Are they ECC corrected? Or is the data really kaput?
>>>
>>> Could be. There is also an erratum issue that triggers these errors on
>>> certain CPUs when running software like virtualbox. It was fixed in
>>> RELENG_10 some time ago. What are you running?
>>>
>>> https://svnweb.freebsd.org/base?view=revision&revision=269052
>>> has some details.
>>
>> 'mmm, Not running Haswell stuff, but rather older hardware.
>> Looked in older logfiles, and there are a few more...
>> All with the same data, except that it is detected on different CPUs
>>
>> And it occurs when running:
>>     mbuffer -4 -m 1000M -I | \
>>         zfs receive -F -d -v zfs
>> to receive a full backup from my fileserver.
>>
>> --WjW
>
> You can tell ECC corrected the error because on FreeBSD, if ECC can't
> fix the error, the system will panic. Other systems (Solaris and HP-UX
> being the two I have direct experience with) can detach subsystems that
> have sustained uncorrectable errors in some cases. (Yes, even CPUs!)

Offlining CPUs, cool.
No, the system does not panic, but I do get reports from 'zfs receive' that the data stream is invalid, and it then aborts. So I'll have to do more digging to see what is up.

> If a system is generating hundreds or thousands of MCAs a minute, you
> are dealing with a hardware issue. If you are getting spurious MCAs to
> the tune of a few a day, there's nothing abnormal or broken there; it's
> just the system doing what it's supposed to.

Never had them before, and now about 6 this week. Let alone in L2 cache. So it got me worried.

> Given the amount of data that flies around inside modern computers, I'm
> surprised there aren't more MCAs than there are in most systems.

Perhaps not enough alpha particles hitting the cells. :)

Thanx,
--WjW
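To put a number on the "few a day" versus "hundreds a minute" rule of thumb quoted above, the reported errors can be tallied per day from syslog. This is only a rough Python sketch: the regular expression assumes the "MCA: CPU n ..." line format shown in the log excerpt at the top of this thread, and the names EVENT and mca_events_per_day are made up for illustration. In practice the lines would be read from /var/log/messages instead of the inline sample.

```python
import re
from collections import Counter

# Assumed format, taken from the log excerpt in this thread:
#   Jul 24 13:14:40 box kernel: MCA: CPU 2 COR L2 memory error
# Only the per-error summary line ("MCA: CPU n ...") is counted, so the
# Bank / Global Cap / Vendor lines of the same event are not double counted.
EVENT = re.compile(
    r"(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2}) \d\d:\d\d:\d\d "
    r"\S+ kernel: MCA: CPU (?P<cpu>\d+) (?P<kind>\S+)"
)


def mca_events_per_day(lines):
    """Return a Counter mapping (month, day) to the number of MCA reports."""
    counts = Counter()
    for line in lines:
        match = EVENT.search(line)
        if match:
            counts[(match["month"], int(match["day"]))] += 1
    return counts


# Sample lines copied from the original report (grep prefix included).
sample = [
    "/var/log/messages:Jul 24 13:14:40 box kernel: MCA: Bank 3, Status 0x902000120120100e",
    "/var/log/messages:Jul 24 13:14:40 box kernel: MCA: CPU 2 COR L2 memory error",
    "/var/log/messages:Jul 28 19:12:42 box kernel: MCA: CPU 0 COR L2 memory error",
]
counts = mca_events_per_day(sample)
for (month, day), n in sorted(counts.items()):
    print(month, day, n)
```

A handful of entries spread over a week would fall on the "few a day" side of the rule of thumb; a sustained count in the hundreds per minute would not.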
Disk Controllers - ciss driver mismatch of supported devices
Hello!

I found a mismatch between the lists of controllers supported by the ciss driver. The developers' website states that ciss supports the new disk controllers in HP Gen 9 servers (H240ar, P440ar, etc.), but the FreeBSD 10.1 documentation does not mention them.

Does FreeBSD 10 support the new disk controllers (H240ar, P440ar) in HP Gen 9 servers?

http://cciss.sourceforge.net/
https://www.freebsd.org/releases/10.1R/hardware.html

I also found that the list of supported HP network controllers does not fully match, but that can partly be worked out by matching the chips.

Best regards,
Dmitriy