On 4/30/2019 09:12, Alan Somers wrote:
> On Tue, Apr 30, 2019 at 8:05 AM Michelle Sullivan <miche...@sorbs.net> wrote:
> .
>> I know this... unless I misread Karl’s message he implied the ECC would have
>> saved the corruption in the crash... which is patently false... I think
>> you’ll agree..
> I don't think that's what Karl meant. I think he meant that the
> non-ECC RAM could've caused latent corruption that was only detected
> when the crash forced a reboot and resilver.
Exactly. Non-ECC memory means you can potentially write a block to *all* copies (and its parity, in the case of RAID-Z) with data that does not match its checksum, and there is no way for the code to know it happened or defend against it. Unfortunately, since the checksum is very small compared to the data size, the odds are that IF that happens it's the *data* and not the checksum that's bad, and there are *no* good copies.

Contrary to popular belief, the "power good" signal on your PSU and motherboard does not provide 100% protection against transient power problems causing this to occur with non-ECC memory either.

IMHO non-ECC memory systems are OK for personal desktop and laptop machines where loss of stored data requiring a restore is acceptable (assuming you have a reasonable backup paradigm for same), but not for servers and *especially* not for ZFS storage. I don't like the price of ECC memory, and I really don't like Intel's practice of only enabling ECC RAM on their "server"-class line of CPUs either, but it is what it is. Pay up for the machines where it matters.

One of the ironies is that there's better data *integrity* with ZFS than with other filesystems in this circumstance; you're much more likely to *know* you're hosed, even if the situation is unrecoverable and requires a restore. With UFS and other filesystems you can quite easily wind up with silent corruption that goes undetected; the filesystem "works" just fine but the data is garbage. From my point of view that's *much* worse.

In addition, IMHO consumer drives are not exactly safe for online ZFS storage. Ironically, they're *safer* for archival use, because when not actively in use they're dismounted and thus not subject to "you're silently hosed" sorts of failures. What sort of "you're hosed" failures? Oh, for example, claiming to have flushed their cache buffers before returning "complete" on that request when they really did not!
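To make the cache-flush "cheat" concrete, here is a toy Python model. `ToyDrive` and `simulate` are hypothetical names made up for illustration; the model simply assumes the drive holds writes in a volatile cache and that a cheating drive acknowledges a flush without persisting anything. Real drive firmware is far more complex and is not observable this simply.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDrive:
    """Hypothetical drive with a volatile write cache (illustration only)."""
    honest_flush: bool
    platter: dict = field(default_factory=dict)  # durable media: survives power loss
    cache: dict = field(default_factory=dict)    # volatile cache: lost on power loss

    def write(self, lba: int, data: bytes) -> None:
        # Writes land in the volatile cache first.
        self.cache[lba] = data

    def flush(self) -> str:
        # An honest drive persists the cache before acknowledging the flush.
        # A "cheating" drive acknowledges immediately and leaves data cached.
        if self.honest_flush:
            self.platter.update(self.cache)
            self.cache.clear()
        return "complete"

    def power_loss(self) -> None:
        # Anything still sitting in the volatile cache is gone.
        self.cache.clear()


def simulate(honest_flush: bool) -> tuple:
    """Write one block, flush, cut power; report what actually survived."""
    drive = ToyDrive(honest_flush=honest_flush)
    drive.write(100, b"new data block")
    status = drive.flush()  # the filesystem now believes LBA 100 is durable
    drive.power_loss()
    return status, drive.platter.get(100)


for honest in (True, False):
    status, on_disk = simulate(honest)
    print(f"honest_flush={honest}: flush returned {status!r}, on disk: {on_disk!r}")
```

Both runs get the same "complete" status back, so nothing in the reply lets the filesystem tell the honest drive from the cheating one; only the power loss reveals the difference.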
Combined with write reordering, a drive that lies about flushing can *really* screw you, and there's nothing any filesystem can do to defend against it either. This sort of "cheat" is much more likely to be present in consumer drives than in drives sold for enterprise or NAS use, and it's quite difficult to test for it accurately on an individual drive, too.

-- 
Karl Denninger
k...@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/