krys...@ibse.cz wrote: 
> Dear Debian community,
> we recently started using AMD Ryzen CPUs, ASRock Rack motherboards and
> Kingston unbuffered ECC DIMMs for our small business servers. All the
> servers run ZFS, for which ECC memory is recommended, so I naively tried
> to verify that ECC actually works. I read EVERY discussion on EVERY forum I
> was able to find (and there are a lot of them, believe me), but I did not
> find a satisfying answer. According to the legendary tweet from AMD (which
> is linked in every discussion), Ryzen CPUs should support ECC memory, but
> it is not a tested feature since they are consumer CPUs. The funny thing
> is that, according to their spec sheets, even EPYC-class CPUs do not
> support it (the only CPUs with stated ECC support I found are the Ryzen
> Embedded ones, for example the V1605B in the UDOO Bolt). Nevertheless, the
> system reports that ECC works: dmidecode and lshw show it, the kernel loads
> the EDAC driver, an EDAC memory controller is present in
> /sys/devices/system/edac/mc, and even memtest86+ v6.0 and above reports ECC
> memory. In forum discussions, Intel users say that correctable ECC errors
> are relatively common. The stated counts vary, but I got the impression
> that at least one a week should appear. Yet our hypervisor, running for
> over half a year at more than 80% memory utilization, has not logged a
> single one, neither in sysfs nor in the UEFI event log. I understand that
> the error rate rises with altitude due to cosmic radiation and we are at
> 246 m above sea level, but at least one error would be nice.
> The only thing I had success with was memory overclocking: I tightened the
> timings as far as possible while the system would still POST, and with
> Debian running it reported correctable errors from different memory
> regions (13 in 30 minutes). Raising the memory frequency did not work.
> However, all of this was done on an Asus motherboard (with the same memory
> and CPU). When I change any memory-related setting on the ASRock Rack
> motherboard, it will not POST.
> The kernel documentation describes how Intel CPUs can inject errors for
> driver testing, but I did not find anything similar for AMD. Does anyone
> know a way to test that ECC works without first breaking the system? Thank
> you for your answers.
> 
> PS: Some commercial memtests are allegedly able to inject ECC errors
> (for example the one from PassMark); has anyone tried those?
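The post checks the EDAC counters in sysfs. Here is a minimal Python sketch for reading them. It assumes the standard Linux EDAC layout, in which each /sys/devices/system/edac/mc/mcN directory exposes a ce_count file (correctable errors) and a ue_count file (uncorrectable errors). The function name read_edac_counts is hypothetical. Note that a zero count does not, by itself, prove that error reporting works.

```python
from pathlib import Path

# Root of the EDAC memory-controller tree (path taken from the post).
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root: Path = EDAC_MC_ROOT) -> dict[str, dict[str, int]]:
    """Hypothetical helper: return {"mc0": {"ce_count": n, "ue_count": m}, ...}.

    Assumes each mcN directory holds ce_count/ue_count files containing a
    single integer, as in the standard EDAC sysfs layout.
    """
    counts: dict[str, dict[str, int]] = {}
    if not root.is_dir():
        # No EDAC driver loaded (or not a Linux system): nothing to report.
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        entry: dict[str, int] = {}
        for name in ("ce_count", "ue_count"):
            counter = mc / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    for controller, values in read_edac_counts().items():
        print(controller, values)
```

Running this periodically (e.g. from cron) and logging the output makes it easy to see whether any counter ever moves.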


We see ECC errors irregularly and infrequently on both Intel and
AMD CPUs. One a week would be very concerning if we're talking
about one system, but not too concerning if we are discussing a
thousand systems.
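A quick back-of-the-envelope calculation makes the fleet-size point concrete. This sketch assumes that errors arrive independently, at a constant rate, spread evenly across systems. That is a simplification, since real error rates vary a lot between DIMMs. The function name is hypothetical.

```python
# Average weeks per year (365.2425 days / 7).
WEEKS_PER_YEAR = 52.1775


def mean_years_between_errors(errors_per_week: float, systems: int) -> float:
    """Mean interval, in years, between errors on ONE system.

    Hypothetical helper assuming the fleet-wide rate is shared evenly.
    """
    per_system_per_week = errors_per_week / systems
    return (1.0 / per_system_per_week) / WEEKS_PER_YEAR


# One error a week across a thousand systems:
print(f"{mean_years_between_errors(1, 1000):.1f} years")  # prints "19.2 years"
```

Under these assumptions, a single system in such a fleet would go roughly two decades between errors. So half a year with no errors on one machine is not surprising.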

-dsr-
