> In thinking about this, I began to wonder about the following. Suppose
> that a (possibly RAID) disk controller correctly reads data from disk and
> has correct data in the controller memory and buffers. However, when that
> data is DMA'd into system memory some errors occur (cosmic rays,
> electrical noise, etc.). Am I correct that these errors would NOT be
> detected, even on a 'reliable' server with ECC memory? In other words,
> the ECC bits would be calculated in server memory based on incorrect data
> from the disk.
Architecture specific.

> The alternative is that disk controllers (or at least ones that are meant
> to be reliable) DMA both the data AND the ECC byte into system memory,
> so that if an error occurs in this transfer, it would most likely be
> picked up and corrected by the ECC mechanism. But I don't think that
> 'this is how it works'. Could someone knowledgeable please confirm or
> contradict?

It's almost entirely device specific at every level. Some general
information and comment, however:

- Drives normally do error correction and shouldn't be fooled very often
  by bad bits.
- The ECC level on the drive processors and memory cache varies by vendor.
  Good luck getting any information on this, although maybe if you are
  CERN sized they will talk.

After the drive we cross the cable. For SATA this is pretty good, and UDMA
data transfer is CRC protected. For PATA the data is, but the command
block is not, so on PATA there is a minute chance you send the CRC
protected block to the wrong place.

Once it's crossing the PCI bus, main memory and CPU cache, what is
protected and how much is entirely down to the system you are running.
Note that a lot of systems won't report ECC errors unless you ask. If you
have hardware RAID controllers it's all vendor specific, including the CPU
cache etc. on the card.

The next usual mess is network transfers. The TCP checksum strength is
questionable for such workloads, but the Ethernet one is pretty good.
Unfortunately, lots of high performance people use checksum offload, which
removes much of the end to end protection and leads to problems with iffy
cards and the like. This is well studied and known to be very problematic,
but in the market speed sells, not correctness.
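To make the TCP checksum weakness concrete: it is a ones'-complement sum
over 16-bit words (RFC 1071), so any reordering of those words slips
through undetected, while the Ethernet CRC32 catches it. A minimal sketch
in Python (illustrative only; real stacks do this in C, and the byte
values below are arbitrary test data):

```python
import zlib

def inet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement sum over 16-bit words (as used by TCP)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input
    s = sum(int.from_bytes(data[i:i + 2], "big")
            for i in range(0, len(data), 2))
    while s >> 16:  # fold carry bits back into the low 16 bits
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF

good = b"\xde\xad\xbe\xef"
bad = b"\xbe\xef\xde\xad"  # same 16-bit words, swapped

# The sum is order-insensitive, so TCP's checksum misses the swap...
assert inet_checksum(good) == inet_checksum(bad)
# ...while an Ethernet-style CRC32 detects it.
assert zlib.crc32(good) != zlib.crc32(bad)
```

This is also why checksum offload hurts: once the checksum is computed on
the card rather than over the data the application actually handed over,
even this weak protection no longer covers the bus and memory in between.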
From the paper, type II sounds like slab might be a candidate kernel side,
but also CPU bugs: near OOM we will be paging hard, and any L2 cache page
out/page table race from software or hardware would fit what it describes,
especially the transient nature.

Type III, wrong block on PATA, fits with the fact the block number isn't
protected, and also with the limits on the cache quality of drives and
drive firmware bugs. For drivers/ide there are *lots* of problems with
error handling, so that might be implicated (would want to do old v new
ide tests on the same h/w, which would be very intriguing). Stale data
from disk cache I've seen reported, also offsets from FIFO hardware bugs
(the LOTR render farm hit the latter and had to avoid UDMA to dodge a
hardware bug).

Chunks of zero sounds like caches again; it would be interesting to know
what hardware and software changes occurred at the point they began to
pop up.

We also see chipset bugs under high contention, some of which are
explained and worked around (VIA ones in the past); others we see are
clear correlations - e.g. between Nvidia chipsets and Silicon Image SATA
controllers.
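Since every hop in the chain above can corrupt silently, the usual answer
is end-to-end verification in the application itself: store a strong
checksum alongside the data and check it on every read-back. A minimal
sketch (illustrative only; the function names and on-disk layout are made
up for the example, and CRC32 stands in for whatever checksum you trust):

```python
import struct
import zlib

def write_with_crc(path: str, payload: bytes) -> None:
    # Prepend a CRC32 of the payload so corruption anywhere in the
    # path (controller, DMA, bus, cable) shows up on read-back.
    with open(path, "wb") as f:
        f.write(struct.pack(">I", zlib.crc32(payload)))
        f.write(payload)

def read_with_crc(path: str) -> bytes:
    with open(path, "rb") as f:
        blob = f.read()
    (stored,) = struct.unpack(">I", blob[:4])
    payload = blob[4:]
    if zlib.crc32(payload) != stored:
        raise IOError("end-to-end checksum mismatch: data corrupted")
    return payload
```

Because the checksum is computed and verified by the same code that
produces and consumes the data, it covers every layer in between, which
is exactly what offloading the check to the card or the drive gives up.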