On 15 Feb 2024 10:41 -0500, from wande...@fastmail.fm (The Wanderer):
>> 65,000 hard links seems to be an ext4 limit:
>>
>> https://www.linuxquestions.org/questions/linux-kernel-70/max-hard-link-per-file-on-ext4-4175454538/#post4914624
>
> That sounds right.
>
>> I believe ZFS can do more hard links. (Much more? Limited by
>> available storage space?)
>
> I'm not sure, but I'll have to look into that, when I get to the point
> of trying to set up that tiered backup.
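One way to probe a filesystem's hard-link limit empirically is a small loop along these lines (a sketch only; the paths are hypothetical, and the bound is deliberately tiny here — raise it into the tens of thousands to actually hit ext4's limit):

```shell
#!/bin/sh
# Sketch: count how many hard links a filesystem allows on one file.
# On ext4, expect link creation to start failing around 65,000 links;
# on ZFS it keeps going far beyond that. Paths here are hypothetical.
set -eu

dir=$(mktemp -d)            # run this on the filesystem of interest
touch "$dir/target"

i=0
while [ "$i" -lt 100 ]; do  # raise this bound to probe the real limit
    if ! ln "$dir/target" "$dir/link.$i" 2>/dev/null; then
        echo "link creation failed after $i extra links"
        break
    fi
    i=$((i + 1))
done

# stat -c %h prints the file's link count (GNU coreutils)
echo "link count: $(stat -c %h "$dir/target")"
rm -rf "$dir"
```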
ZFS can definitely do more; I ran a background loop hardlinking a single
file on a new pool while typing up this email, and toward the end it's at
>75K and still going strong. That consumed about 5 MB of storage.

>> Data integrity validation is tough without a mechanism. Adding an
>> rsnapshot(1) postexec MD5SUMS, etc., file into the root of each
>> backup tree could solve this need, but could waste a lot of time and
>> energy checksumming files that have not changed.
>
> AFAIK, all such things require you to be starting from a point with a
> known-good copy of the data, which is a luxury I don't currently have
> (as far as validating my current data goes). It's something to keep in
> mind when planning a more proper backup system, however.

What you do have is a _current_ state. Being able to detect unintended
changes from that state may be beneficial even if the current state
isn't known-perfect.

>> One of the reasons I switched to ZFS was because ZFS has built-in
>> data and metadata integrity checking (and repair; depending upon
>> redundancy).
>
> I'm not sure I understand how this would be useful in the case I have
> at hand; that probably means that I'm not understanding the picture
> properly.

Unless you go out of your way to turn off checksumming beforehand, ZFS
will refuse to let you read a block where the checksum doesn't match the
block's payload data. Meaning that if you're able to _read_ a block of
data by normal means, you can be certain that the probability of it not
matching what was originally written to disk is _very_ low. ZFS will
also automatically repair any repairable error it detects. In a
redundant setup, this is almost everything; in a non-redundant setup,
it's rather less, but still more than nothing.

>> rsync(1) should be able to copy backups onto an external HDD.
>
> Yeah, but that only provides one tier of backup; the advantage of
> rsnapshot (or similar) is the multiple deduplicated tiers, which gives
> you options if it turns out the latest backup already included the
> damage you're trying to recover from.

rsnapshot is largely a front-end for rsync --link-dest=<something>. It
does make a few things easier, but there isn't much you can do with
rsnapshot that you can't do with rsync and a little shell scripting, if
you're willing to live with a tool specialized for your purposes where
rsnapshot is generic.

> (USB-3 will almost certainly not be a viable option for an automatic
> scheduled backup of the sort rsnapshot's documentation suggests,
> because the *fastest* backup cycle I saw from my working with the data
> I had was over three hours, and the initial pass to copy the data out
> to the drive in the first place took nearly *20* hours. A cron job to
> run even an incremental backup even once a day, much less the several
> times a day suggested for the deeper rsnapshot tiers, would not be
> *remotely* workable in that sort of environment. Though on the flip
> side, that's not just a USB-3 bottleneck, but also the bottleneck of
> the spinning mechanical hard drive inside the external case...)

I think rsnapshot's suggested backup schedule is excessively frequent
for pretty much anything more than a relatively small home directory. In
my case rsnapshot runs for several hours, much of which is likely spent
checking file metadata for updates; I run backups once a day, and there
is no realistic way that enough data is modified each day to take that
long to copy.

I recently wrote a script to take advantage of ZFS snapshots to get a
basically point-in-time atomic snapshot of the data onto the backup
drive, even in the presence of live changes while the backup is running.
(It's not necessarily _quite_ point-in-time atomic, because I have two
ZFS pools plus an ext4 file system; but it's close enough to be a
workable approximation.)
-- 
Michael Kjörling 🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”