On 4/18/12 12:35 PM, Jeroen van Aart wrote:
Laurent GUERBY wrote:
> Do you have reference to recent papers with experimental data about
> non ECC memory errors? It should be fairly easy to do
Maybe this provides some information:
http://en.wikipedia.org/wiki/ECC_memory#Problem_background
"Work published between 2007 and 2009 showed widely varying error
rates with over 7 orders of magnitude difference, ranging from
10^-10 to 10^-17 error/bit·h, roughly one bit error, per hour, per
gigabyte of memory to one bit error, per century, per gigabyte of
memory.[2][4][5] A very large-scale study based on Google's very
large number of servers was presented at the
SIGMETRICS/Performance’09 conference.[4] The actual error rate found
was several orders of magnitude higher than previous small-scale or
laboratory studies, with 25,000 to 70,000 errors per billion device
hours per megabit (about 3–10×10^-9 error/bit·h), and more than 8% of
DIMM memory modules affected by errors per year."
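
As a sanity check on the units in the quoted passage, here is a minimal Python sketch of the conversion from errors per billion device-hours per megabit to errors per bit-hour. It assumes decimal units (1 megabit = 10^6 bits, 1 gigabyte = 8×10^9 bits); the constant and function names are illustrative and not taken from the cited study.

```python
# Assumptions (not stated in the quoted text): decimal megabits and gigabytes.
BITS_PER_MEGABIT = 1e6
BITS_PER_GIGABYTE = 8e9
BILLION_DEVICE_HOURS = 1e9


def errors_per_bit_hour(errors_per_billion_hours_per_mbit: float) -> float:
    """Convert 'errors per 10^9 device-hours per megabit' to 'errors per bit-hour'."""
    return errors_per_billion_hours_per_mbit / (BILLION_DEVICE_HOURS * BITS_PER_MEGABIT)


def errors_per_gigabyte_hour(rate_per_bit_hour: float) -> float:
    """Scale a per-bit-hour rate up to one gigabyte of memory."""
    return rate_per_bit_hour * BITS_PER_GIGABYTE


low = errors_per_bit_hour(25_000)
high = errors_per_bit_hour(70_000)
print(f"large-scale study range: {low:.1e} to {high:.1e} error/bit·h")

# The upper end of the 2007-2009 range quoted above, expressed per gigabyte.
per_gb_hour = errors_per_gigabyte_hour(1e-10)
print(f"1e-10 error/bit·h is about {per_gb_hour:.2f} errors per hour per gigabyte")
```

Under these assumptions the 25,000 to 70,000 figure works out to roughly 2.5 to 7×10^-11 error/bit·h. That does not match the parenthetical in the quoted text, so the units there may differ from the ones assumed here. The 10^-10 rate comes to a little under one error per hour per gigabyte, consistent with the quote's "roughly one".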
Dear Jeroen,
In the work that led up to RFC 3309, many of the errors found on the
Internet pertained to single interface bits, and not single data bits.
A large chip manufacturer I worked at foolishly removed internal memory
error detection to save space, which cost them dearly because they then
needed far more exhaustive four-corner testing. Checksums used by TCP and
UDP are able to detect single-bit data errors, but may miss as many as
2% of single interface bit errors. It would be surprising to find
memory designs lacking internal error detection logic.
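
To make the checksum point concrete, here is a minimal Python sketch of a 16-bit ones' complement checksum in the style of RFC 1071, the kind used by TCP and UDP. It checks that every single-bit flip in a sample payload changes the checksum. It then shows one way correlated errors can slip through: two flips at the same bit position in different 16-bit words, one 0 to 1 and one 1 to 0, cancel in the sum. The helper names and payloads are hypothetical. The second case is only an illustration of a known weakness of additive checksums, not a reproduction of the 2% figure or of the specific interface faults described above.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:  # fold carries back in (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (bit 0 = MSB of byte 0)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 0x80 >> (bit_index % 8)
    return bytes(buf)


# Hypothetical sample payload, chosen only for illustration.
payload = b"non-ECC memory test payload!"
reference = internet_checksum(payload)

# Try every single-bit flip and record any that leave the checksum unchanged.
undetected_single = [
    i for i in range(len(payload) * 8)
    if internet_checksum(flip_bit(payload, i)) == reference
]
print("undetected single-bit flips:", len(undetected_single))

# Two words with the lowest bit flipped in each: 0x0001,0x0000 -> 0x0000,0x0001.
original = bytes([0x00, 0x01, 0x00, 0x00])
corrupted = bytes([0x00, 0x00, 0x00, 0x01])
same = internet_checksum(original) == internet_checksum(corrupted)
print("offsetting flips undetected:", same)
```

Because the sum is order-independent across 16-bit words, any pair of offsetting changes at the same bit position goes unnoticed, which is one reason a CRC is a stronger check against such patterns.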
Regards,
Douglas Otis