On Apr 18, 2012, at 5:55:32 PM, Douglas Otis wrote:

> On 4/18/12 12:35 PM, Jeroen van Aart wrote:
>> Laurent GUERBY wrote:
>> > Do you have reference to recent papers with experimental data about
>> > non ECC memory errors? It should be fairly easy to do
>> Maybe this provides some information:
>> 
>> http://en.wikipedia.org/wiki/ECC_memory#Problem_background
>> 
>> "Work published between 2007 and 2009 showed widely varying error
>> rates with over 7 orders of magnitude difference, ranging from
>> 10^-10 to 10^-17 error/bit·h, roughly one bit error, per hour, per
>> gigabyte of memory to one bit error, per century, per gigabyte of
>> memory.[2][4][5] A very large-scale study based on Google's very
>> large number of servers was presented at the
>> SIGMETRICS/Performance’09 conference.[4] The actual error rate found
>> was several orders of magnitude higher than previous small-scale or
>> laboratory studies, with 25,000 to 70,000 errors per billion device
>> hours per megabit (about 3–10×10^-9 error/bit·h), and more than 8% of
>> DIMM memory modules affected by errors per year."
> Dear Jeroen,
> 
> In the work that led up to RFC3309, many of the errors found on the Internet 
> pertained to single interface bits, and not single data bits.  I worked at a
> large chip manufacturer that foolishly removed internal memory error
> detection to save space, which cost them dearly because they then needed far
> more exhaustive four-corner testing.  Checksums used by TCP and UDP are able
> to detect single-bit data errors, but may miss as much as 2% of single
> interface bit errors.  It would be surprising to find memory designs lacking
> internal error detection logic.
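
The claim that the TCP/UDP checksum catches every single-bit data error can
be checked with a minimal sketch.  It assumes the 16-bit one's-complement sum
described in RFC 1071; the function names and the sample payload are
illustrative only, and it does not attempt to model the interface-bit errors
mentioned above.

```python
def internet_checksum(data: bytes) -> int:
    """Complement of the 16-bit one's-complement sum (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def undetected_single_bit_flips(data: bytes) -> int:
    """Count single-bit flips in `data` that leave the checksum unchanged."""
    original = internet_checksum(data)
    missed = 0
    for bit in range(len(data) * 8):
        corrupted = bytearray(data)
        corrupted[bit // 8] ^= 1 << (bit % 8)  # flip exactly one bit
        if internet_checksum(bytes(corrupted)) == original:
            missed += 1
    return missed


if __name__ == "__main__":
    payload = b"example payload"  # hypothetical test data
    print(undetected_single_bit_flips(payload))
```

Flipping one bit changes a 16-bit word by a power of two, which is never a
multiple of 65535, so the one's-complement sum always changes.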


mallet:~ smb$ head -14 doc/ietf/rfc/rfc3309.txt | sed 1,7d | sed 2,5d; date
Request for Comments: 3309                                      Stanford
                                                          September 2002

Wed Apr 18 23:07:53 EDT 2012


We are not in a static field...  3309 is one of my favorite RFCs -- but
the specific findings (errors happen more often than you think), as
opposed to the general lesson (understand your threat model), may be OBE.


                --Steve Bellovin, https://www.cs.columbia.edu/~smb





