-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 11/20/2014 08:24 PM, Rasmus Liland wrote:
> On 2014-11-19 22:53, Rasmus Liland wrote:
>> On 2014-11-19 21:41, Mark Lee wrote:
>>>
>>> To Rasmus,
>>>
>>> Can you run the parts where it says "run the abvoe through
>>> mcelog --ascii" and post the contents?
>>>
>>> Regards, Mark
>>>
>>
>> I'm attaching the output of mcelog to this message. However, I'm
>> unsure of the usefulness of the output.
>>
>
> I checked dmesg now after having uptime of ...
>> rasmus@angrist ~ % uptime
>> 02:04:01 up 1 day, 7:35, 1 user, load average: 0.04, 0.15, 0.40
>> rasmus@angrist ~ % uname -a
>> Linux angrist 3.11.5-1-ARCH #1 SMP PREEMPT Mon Oct 14 08:31:43 CEST 2013 x86_64 GNU/Linux
>
> ... about 26 hours. It seems after about 19 hours some (possibly)
> temperature related were causing mce hardware errors over a ten
> minute interval:
>> [70133.209654] mce: [Hardware Error]: Machine check events logged
>> [70376.833053] CPU2: Core temperature above threshold, cpu clock throttled (total events = 30628)
>> [70376.833056] CPU3: Core temperature above threshold, cpu clock throttled (total events = 30628)
>> [70376.833061] CPU3: Package temperature above threshold, cpu clock throttled (total events = 174126)
>> [70376.833070] CPU2: Package temperature above threshold, cpu clock throttled (total events = 174126)
>> [70376.833074] CPU1: Package temperature above threshold, cpu clock throttled (total events = 174126)
>> [70376.833077] CPU0: Package temperature above threshold, cpu clock throttled (total events = 174124)
>> [70376.835060] CPU3: Core temperature/speed normal
>> [70376.835064] CPU2: Core temperature/speed normal
>> [70376.835070] CPU2: Package temperature/speed normal
>> [70376.835074] CPU3: Package temperature/speed normal
>> [70376.835087] CPU1: Package temperature/speed normal
>> [70376.835090] CPU0: Package temperature/speed normal
>> [70433.353800] mce: [Hardware Error]: Machine check events logged
>> [70676.969501] CPU2: Core temperature/speed normal
>> [70676.969505] CPU3: Core temperature/speed normal
>> [70676.969511] CPU0: Package temperature above threshold, cpu clock throttled (total events = 198545)
>> [70676.969516] CPU1: Package temperature above threshold, cpu clock throttled (total events = 198547)
>> [70676.969522] CPU3: Package temperature above threshold, cpu clock throttled (total events = 198547)
>> [70676.969545] CPU2: Package temperature above threshold, cpu clock throttled (total events = 198547)
>> [70676.970519] CPU0: Package temperature/speed normal
>> [70676.970522] CPU2: Package temperature/speed normal
>> [70676.970524] CPU3: Package temperature/speed normal
>> [70676.970526] CPU1: Package temperature/speed normal
>> [70733.497978] mce: [Hardware Error]: Machine check events logged
>
> As the system did not reboot, it were able to self heal.
>

To Rasmus,

Can you run a logger to find out which programs are causing your CPU
temperatures to rise?

Regards,
Mark
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iF4EAREIAAYFAlRumB8ACgkQZ/Z80n6+J/YI8gD/bN3dHoENwzLxK33lS0GCF2zs
cn+8X3TDDqIMWSe8lEQBAJLcUwazQrJS7R4qTOZo8gbk2NE9wSoAo1t1jaeoolCB
=mirr
-----END PGP SIGNATURE-----
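
One possible way to do the logging asked for above, as a minimal sketch and not a tested recipe for this machine: sample the CPU temperature and the busiest processes every few seconds and print one line per sample, so the timestamps can be compared with the throttling messages in dmesg. Everything named here is an assumption for illustration: the sysfs path `/sys/class/thermal/thermal_zone0/temp` (the zone number differs between systems and may not be the CPU package sensor), the 10-second interval, and the helper names `read_temp_celsius`, `top_cpu_processes` and `format_sample`. Note also that the `pcpu` column of `ps` is CPU time divided by the time the process has been running, so it is a lifetime average and not an instantaneous load figure.

```python
import subprocess
import time
from pathlib import Path

# Assumption: this thermal zone exists and reflects the CPU; check
# /sys/class/thermal/thermal_zone*/type on the actual machine.
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")
INTERVAL_SECONDS = 10  # hypothetical sampling interval


def read_temp_celsius(path=THERMAL_ZONE):
    """Return the zone temperature in degrees Celsius (sysfs stores millidegrees)."""
    return int(Path(path).read_text().strip()) / 1000.0


def top_cpu_processes(ps_output, count=3):
    """Parse '<pcpu> <comm>' lines and return the `count` busiest (pcpu, comm) pairs."""
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        try:
            rows.append((float(parts[0]), parts[1].strip()))
        except ValueError:
            continue  # skip lines whose first field is not a number
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:count]


def format_sample(timestamp, temp_c, processes):
    """Build one log line: timestamp, temperature, then the busiest processes."""
    busiest = ", ".join(f"{name} {pcpu:.1f}%" for pcpu, name in processes)
    return f"{timestamp} {temp_c:.1f}C {busiest}"


def main():
    while True:
        # An empty header ("pcpu=", "comm=") makes ps print no header line.
        ps_output = subprocess.run(
            ["ps", "-e", "-o", "pcpu=", "-o", "comm="],
            capture_output=True, text=True, check=True,
        ).stdout
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        line = format_sample(stamp, read_temp_celsius(), top_cpu_processes(ps_output))
        print(line, flush=True)
        time.sleep(INTERVAL_SECONDS)


if __name__ == "__main__":
    main()
```

Redirecting the output to a file and leaving the script running until the next "Core temperature above threshold" message appears should show which processes were busy around that time.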