On 8/10/21 7:51 PM, Celejar wrote:
On Tue, 10 Aug 2021 17:35:32 -0700
David Christensen <dpchr...@holgerdanske.com> wrote:

On 8/10/21 12:56 PM, Dan Ritter wrote:
David Christensen wrote:
On 8/10/21 8:04 AM, Leandro Noferini wrote:

https://wiki.debian.org/ZFS

...

- ECC memory is safer than non-ECC memory.

This is true, but there is nothing that makes ZFS more dangerous
than another filesystem using non-ECC memory.


I think the amount of danger depends upon how you do your risk
assessment math.  I find used entry-level server hardware with ECC
memory to be desirable for additional reasons.

Dan's point is that while ECC memory is indeed safer than non-ECC
memory, this is true whether one is using ZFS or some other filesystem;
furthermore, with or without ECC memory, there's no reason to believe
that ZFS is less safe than the alternative.

See:

https://arstechnica.com/information-technology/2020/05/zfs-101-understanding-zfs-storage-and-performance/?comments=1&post=38877683
https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data/

So while ECC memory is always good, it's not a consideration when
trying to choose between ZFS and other filesystems.


I see two sets of choices:

1.  Memory integrity:

    a.  No error checking or correcting -- non-ECC.

    b.  Error checking and correcting -- ECC.

2.  Operating system storage stack data integrity:

    a.  No data integrity -- md, LVM, ext*, FAT, NTFS.

    b.  Data integrity -- dm-integrity, btrfs, ZFS.
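
For illustration, here is a minimal Python sketch of what choice 2b adds
over choice 2a: a checksum stored with every block and verified on every
read. It is a toy model only -- dm-integrity, btrfs, and ZFS do this far
more robustly, and the in-memory dicts and function names are invented
for the example:

    import hashlib

    blocks = {}     # block number -> data (stands in for the disk)
    checksums = {}  # block number -> SHA-256 digest of that data

    def write_block(n, data):
        # Store the data together with a checksum of it.
        blocks[n] = data
        checksums[n] = hashlib.sha256(data).digest()

    def read_block(n):
        # Verify the checksum on every read; a mismatch means the data
        # was corrupted somewhere between the write and this read.
        data = blocks[n]
        if hashlib.sha256(data).digest() != checksums[n]:
            raise IOError("checksum mismatch on block %d" % n)
        return data

A 2a stack returns whatever bytes the drive hands back; a 2b stack
detects the mismatch and, given redundancy (mirror, RAID-Z), can rewrite
the block from a good copy.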


There are four combinations of the above. I order them from highest risk (A) to lowest risk (D) as follows:

A.  Non-ECC memory (1a) and data integrity (2b)

B.  Non-ECC memory (1a) and no data integrity (2a)

C.  ECC memory (1b) and no data integrity (2a)

D.  ECC memory (1b) and data integrity (2b)


I have seen a few computers with failing non-ECC memory and no OS storage stack data integrity (case B). It might take weeks or months to identify the problem. If those computers had had OS storage stack data integrity with automatic correction (case A), the "scrub of death" would have been the logical outcome (failure modes and effects analysis); it would just have been a question of time. Given the eventual catastrophic outcome (fault hazard analysis), I see a significant difference in risk between A and B.
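
To make the "scrub of death" argument concrete, here is a hedged Python
sketch of the failure mode -- not of ZFS's actual repair logic. The
one-block "pool" and the naive trust-the-RAM repair policy are invented
for illustration; see the jrs-s.net link above for why real ZFS is more
careful than this:

    import hashlib
    import random

    def flip_bit(data):
        # Simulate a failing memory cell: flip one bit in the
        # in-memory copy of the data.
        i = random.randrange(len(data))
        return data[:i] + bytes([data[i] ^ 0x01]) + data[i + 1:]

    # A one-block "pool": data and its checksum, both good on disk.
    disk_data = b"important payload"
    disk_sum = hashlib.sha256(disk_data).digest()

    def scrub(memory_is_failing):
        global disk_data, disk_sum
        buf = disk_data                 # read the block into memory
        if memory_is_failing:
            buf = flip_bit(buf)         # corruption happens in RAM
        if hashlib.sha256(buf).digest() != disk_sum:
            # Naive "repair": trust the in-memory copy and rewrite the
            # block and its checksum.  Good data on disk has now been
            # overwritten with corrupted data.
            disk_data = buf
            disk_sum = hashlib.sha256(buf).digest()

With failing memory, every scrub is another chance to "repair" good data
into garbage, which is why I call the outcome a question of time; ECC
memory (cases C and D) corrects, or at least reports, the bit flip
before any write-back.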


I started buying ECC machines specifically for ZFS a few years ago (case D), and suffered through a rash of drive, rack, cable, and/or HBA failures. Given RAID, ZFS snapshots, backups, etc., I replaced bad drives, fixed connections, resilvered, restored, verified, etc., with minimal loss. If I had chosen md, LVM, and ext4 instead (case C), there would still have been hardware checksums inside the drives, hardware checksums on the connections, and memory checksums. So the risk difference between C and D is less pronounced than between A and B.


Holding the data integrity choice constant and comparing memory choices (A vs. D, and B vs. C), I see more risk with non-ECC memory and less risk with ECC memory for both data integrity choices.


So, I do consider memory when choosing the storage stack. Furthermore, my OS storage stack data integrity choice with non-ECC memory is the opposite of my choice with ECC memory: my desktops and laptops have non-ECC memory and ext4 (case B), while my servers have ECC memory and ZFS (case D).


Therefore, my suggestion of ZFS on RPi contradicts my own practice.  :-/


David
