Re: [CentOS] Machine check events

2013-11-28 Thread Glenn Eychaner
m.roth writes:
> Is the system still under warranty? How 'bout the memory, if you've replaced it? You *should* replace it. It's not going to get better
This is brand-new Kingston 1600MHz ECC memory on a workstation/server running at high altitude in a relatively open environment; I am loath

Re: [CentOS] Machine check events

2013-11-28 Thread Devin Reade
Quoting Glenn Eychaner geycha...@mac.com:
> This is brand-new Kingston 1600MHz ECC memory on a workstation/server running at high altitude [snip]
Cosmic rays? Do you have a Poisson distribution for those machine check events? :)
Devin

Re: [CentOS] Machine check events

2013-11-28 Thread Cliff Pratt
He's not running the Poisson distro, he's using CentOS! 8-)
On Fri, Nov 29, 2013 at 11:57 AM, Devin Reade g...@gno.org wrote:
> Quoting Glenn Eychaner geycha...@mac.com:
>> This is brand-new Kingston 1600MHz ECC memory on a workstation/server running at high altitude [snip]
> Cosmic rays? Do

Re: [CentOS] Machine check events

2013-11-27 Thread Glenn Eychaner
On further, further, further toying, I now have mcelog running on my 32-bit CentOS 6 systems! I admit to doing it the dumb way: I grabbed the source from the git repository, compiled and installed it, and THEN discovered that the init.d file supplied with the source was not CentOS compatible, so I

Re: [CentOS] Machine check events

2013-11-27 Thread Glenn Eychaner
And all that work was done to get this, output of a corrected memory parity error. I get about one of these per workstation per 3 days, more or less; is this a surprising number? (The workstation under the heaviest load gets more, while the idle spare gets none at all; no surprise there!) MCE 6
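One rough way to judge whether "about one per workstation per 3 days" is surprising is to treat the corrected errors as a Poisson process. That is an assumption, not something the thread establishes: it requires a constant, independent event rate, and the post itself notes that the rate varies with load. The function names below are hypothetical, and the sketch is illustrative only.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events when the expected count is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def prob_at_least(k, mean):
    """Probability of k or more events when the expected count is `mean`."""
    return 1.0 - sum(poisson_pmf(i, mean) for i in range(k))


# Assumed rate taken from the post: roughly one corrected error per 3 days.
RATE_PER_DAY = 1.0 / 3.0


def expected_events(days, rate_per_day=RATE_PER_DAY):
    """Expected number of events in a window of `days` days."""
    return days * rate_per_day
```

Under that assumption, a 3-day window has an expected count of 1, so `poisson_pmf(0, expected_events(3))` (no events at all) is about 0.37, and `prob_at_least(3, expected_events(3))` is about 0.08. Comparing the observed counts per window against these figures would be one way to see whether the events look random or clustered.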

Re: [CentOS] Machine check events

2013-11-27 Thread m . roth
Glenn Eychaner wrote:
> And all that work was done to get this, output of a corrected memory parity error. I get about one of these per workstation per 3 days, more or less; is this a surprising number? (The workstation under the heaviest load gets more, while the idle spare gets none at all; no

Re: [CentOS] Machine check events

2013-11-26 Thread Glenn Eychaner
Further investigation seems to indicate that these events should be handled by mcelog or mced. However, there is no /var/log/mcelog, nor do I have a mcelog or mced binary, nor does yum seem to contain anything related (based on yum whatprovides '*/mcelog' and similar queries). Thus, I still don't

Re: [CentOS] Machine check events

2013-11-26 Thread Stephen Harris
On Tue, Nov 26, 2013 at 09:25:55AM -0300, Glenn Eychaner wrote:
> Further investigation seems to indicate that these events should be handled by mcelog or mced. However, there is no /var/log/mcelog, nor do I have a mcelog or mced binary, nor does yum seem to contain anything related (based on

Re: [CentOS] Machine check events

2013-11-26 Thread John Doe
From: Glenn Eychaner geycha...@mac.com
> Further investigation seems to indicate that these events should be handled by mcelog or mced. However, there is no /var/log/mcelog, nor do I have a mcelog or mced binary, nor does yum seem to contain anything related (based on yum whatprovides

Re: [CentOS] Machine check events

2013-11-26 Thread Glenn Eychaner
On further, further investigation, it looks like, according to the mcelog install guide at http://www.mcelog.org/installation.html, I could roll my own for 32-bit CentOS 6:
> For bad page offlining you will need a 2.6.33+ kernel or a 2.6.32 kernel with the soft offlining capability backported (like

Re: [CentOS] Machine check events

2013-11-26 Thread Patrick Lists
On 11/26/2013 03:11 PM, Glenn Eychaner wrote:
> [snip] The current kernel I am running is 2.6.32-358.23.2, but I can't tell whether it has CONFIG_X86_MCE enabled. How can I find this out?
$ grep CONFIG_X86_MCE /boot/config-2.6.32-358.23.2.el6.x86_64
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
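The same check can be scripted so that it follows whichever kernel is currently running, rather than a hard-coded file name. The following is a minimal sketch, assuming the distribution installs its build configuration as /boot/config-<kernel release> (as in the grep above). The helper names are hypothetical. On a 32-bit install, the release string, and therefore the file name, would not end in x86_64.

```python
import platform


def mce_options(config_lines):
    """Return the CONFIG_X86_MCE* options that are set in a kernel config.

    Lines such as '# CONFIG_X86_MCE_AMD is not set' begin with '#',
    so they are skipped and the option is simply absent from the result.
    """
    opts = {}
    for line in config_lines:
        line = line.strip()
        if line.startswith("CONFIG_X86_MCE") and "=" in line:
            key, _, value = line.partition("=")
            opts[key] = value
    return opts


def running_kernel_config_path():
    """Assumed location of the running kernel's config file."""
    return "/boot/config-" + platform.release()
```

A typical use would be `with open(running_kernel_config_path()) as fh: print(mce_options(fh))`. If `CONFIG_X86_MCE` is missing from the result, machine check support was not built into that kernel.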

[CentOS] Machine check events

2013-11-25 Thread Glenn Eychaner
On my new Haswell-based machines, I am occasionally seeing entries like the following in /var/log/messages:
kernel: [Hardware Error]: Machine check events logged
(I would not have even noticed them, except that they get flagged by logwatch.) These messages always occur alone, and don't
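To see how often these entries appear without waiting for logwatch, the log can be tallied directly. This is a minimal sketch, assuming the traditional syslog line layout in which each line begins with a "Mon DD" date prefix. That layout and the function name are assumptions, not details taken from the post.

```python
from collections import Counter

# Text of the kernel message quoted in the post.
MARKER = "Machine check events logged"


def count_mce_lines_by_day(lines):
    """Count lines containing MARKER, keyed by their leading 'Mon DD' fields."""
    counts = Counter()
    for line in lines:
        if MARKER in line:
            parts = line.split()
            # Assumed layout: month and day are the first two fields.
            day = " ".join(parts[:2]) if len(parts) >= 2 else "unknown"
            counts[day] += 1
    return counts
```

It could be run as `with open("/var/log/messages", errors="replace") as fh: print(count_mce_lines_by_day(fh))`, which usually needs root privileges to read the file. The tally only shows that events were logged; decoding them still needs a tool such as mcelog.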