As this topic comes up reasonably often on the list, I thought others might be interested in this:
http://arstechnica.com/business/news/2009/10/dram-study-turns-assumptions-about-errors-upside-down.ars?utm_source=rss&utm_medium=rss&utm_campaign=rss

Basically, the takeaway is that RAM errors are pretty common, especially on machines under high load (like Hadoop clusters). ECC RAM is important to catch them, and the error rate gets way worse at 20 months.

-Todd