Re: SMART Uncorrectable_Error_Cnt rising - should I be worried?

Michael Kjörling Thu, 15 Feb 2024 09:39:49 -0800

On 15 Feb 2024 10:41 -0500, from wande...@fastmail.fm (The Wanderer):
>> 65,000 hard links seems to be an ext4 limit:
>> 
>> https://www.linuxquestions.org/questions/linux-kernel-70/max-hard-link-per-file-on-ext4-4175454538/#post4914624
> 
> That sounds right.
> 
>> I believe ZFS can do more hard links. (Much more?  Limited by
>> available storage space?)
> 
> I'm not sure, but I'll have to look into that, when I get to the point
> of trying to set up that tiered backup.


ZFS can definitely do more; I ran a background loop hardlinking a
single file on a new pool while typing up this email, and toward the
end, it's at >75K and still going strong. That consumed about 5 MB of
storage.
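
For anyone who wants to repeat that experiment, here is a rough sketch in Python of the same idea. It is not the loop I actually ran; the function name count_hard_links and the file names are made up for illustration. It keeps adding hard links to one file until the file system refuses, then reports the link count.

```python
import os
import tempfile


def count_hard_links(directory, limit):
    """Create up to `limit` extra hard links to one file; return the final link count."""
    target = os.path.join(directory, "original")
    with open(target, "w") as handle:
        handle.write("payload\n")

    for index in range(limit):
        try:
            os.link(target, os.path.join(directory, f"link-{index}"))
        except OSError:
            # The file system refused another link (for example, it hit its
            # per-file link limit), so stop here.
            break

    return os.stat(target).st_nlink


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        # The count includes the original name, so 1000 extra links gives 1001.
        print(count_hard_links(scratch, 1000))
```

Point the directory at the file system you want to probe and raise the limit well past 65,000 to see where (or whether) it stops.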


>> Data integrity validation is tough without a mechanism.  Adding an
>> rsnapshot(1) postexec MD5SUMS, etc., file into the root of each
>> backup tree could solve this need, but could waste a lot of time and
>> energy checksumming files that have not changed.
> 
> AFAIK, all such things require you to be starting from a point with a
> known-good copy of the data, which is a luxury I don't currently have
> (as far as validating my current data goes). It's something to keep in
> mind when planning a more proper backup system, however.

What you do have is a _current_ state. Being able to detect unintended
changes from that state may be beneficial even if the current state
isn't known-perfect.
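
As a minimal sketch of what I mean, assuming SHA-256 and an in-memory manifest (the names build_manifest and compare_manifests are hypothetical, and this is not what rsnapshot or any other tool does internally): record a checksum per file now, and compare against a fresh pass later.

```python
import hashlib
import os


def build_manifest(root):
    """Map each regular file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large files don't need to fit in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(path, root)] = digest.hexdigest()
    return manifest


def compare_manifests(old, new):
    """Return (changed, missing, added) path lists between two manifests."""
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    missing = sorted(p for p in old if p not in new)
    added = sorted(p for p in new if p not in old)
    return changed, missing, added
```

This can't tell an intended edit from silent corruption; it only narrows down which files differ from the recorded state, which you then judge for yourself.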


>> One of the reasons I switched to ZFS was because ZFS has built-in
>> data and metadata integrity checking (and repair; depending upon
>> redundancy).
> 
> I'm not sure I understand how this would be useful in the case I have at
> hand; that probably means that I'm not understanding the picture properly.

Unless you go out of your way to turn off checksumming beforehand, ZFS
will refuse to let you read a block where the checksum doesn't match
the block's payload data. Meaning that if you're able to _read_ a
block of data by normal means, you can be confident that the
probability of it not matching what was originally written to disk is
_very_ low.

ZFS will also automatically repair any repairable error it detects. In
a redundant setup, this is almost everything; in a non-redundant
setup, it's rather less, but still more than nothing.
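
To illustrate the principle only, here is a toy model in Python. It is emphatically not how ZFS is implemented; the class ToyMirror, the use of SHA-256 and the dict-per-disk layout are all assumptions made for the sake of a short example. A read returns data only if it matches the stored checksum, and a good copy is used to overwrite a bad one.

```python
import hashlib


class ChecksumError(Exception):
    """Raised when no stored copy of a block matches its checksum."""


class ToyMirror:
    """Toy model: each block is stored on every 'disk' alongside one checksum."""

    def __init__(self, copies=2):
        self.disks = [dict() for _ in range(copies)]
        self.checksums = {}

    def write(self, block_id, data):
        self.checksums[block_id] = hashlib.sha256(data).digest()
        for disk in self.disks:
            disk[block_id] = data

    def read(self, block_id):
        expected = self.checksums[block_id]
        for disk in self.disks:
            data = disk[block_id]
            if hashlib.sha256(data).digest() == expected:
                # Found a good copy: overwrite any copy that fails verification.
                for other in self.disks:
                    if hashlib.sha256(other[block_id]).digest() != expected:
                        other[block_id] = data
                return data
        # Every copy is damaged: refuse to return data rather than hand back garbage.
        raise ChecksumError(f"block {block_id!r} is damaged on every copy")
```

With copies=2, a damaged copy is quietly repaired on read; with copies=1, the same damage turns into an error instead of bad data, which is the difference between the redundant and non-redundant cases.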


>> rsync(1) should be able to copy backups onto an external HDD.
> 
> Yeah, but that only provides one tier of backup; the advantage of
> rsnapshot (or similar) is the multiple deduplicated tiers, which gives
> you options if it turns out the latest backup already included the
> damage you're trying to recover from.

rsnapshot is largely a front-end for rsync --link-dest=<something>. It
does make a few things easier, but there isn't much you can do with
rsnapshot that you can't do with rsync and a little shell scripting,
if you're willing to live with a tool specialized for your purposes;
rsnapshot, by contrast, is generic.
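
A minimal sketch of that kind of scripting, in Python rather than shell. The function name build_rsync_command and the directory layout (one dated directory per snapshot under a common root) are assumptions for illustration, not rsnapshot's actual layout; the rsync options -a and --link-dest are the real ones.

```python
import os


def build_rsync_command(source, snapshot_root, new_name, previous_name=None):
    """Return the argv for one snapshot-style rsync run."""
    command = ["rsync", "-a"]
    if previous_name is not None:
        # Files unchanged since the previous snapshot become hard links into it
        # instead of fresh copies. An absolute path avoids rsync interpreting
        # the directory relative to the destination.
        previous = os.path.abspath(os.path.join(snapshot_root, previous_name))
        command.append(f"--link-dest={previous}")
    # A trailing slash on the source copies its contents, not the directory itself.
    command.append(os.path.join(source, ""))
    command.append(os.path.join(snapshot_root, new_name))
    return command
```

Passing the result to subprocess.run(command, check=True) would perform the copy; rotating and expiring old snapshot directories is the part left to the "little shell scripting".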


> (USB-3 will almost certainly not be a viable option for an automatic
> scheduled backup of the sort rsnapshot's documentation suggests, because
> the *fastest* backup cycle I saw from my working with the data I had was
> over three hours, and the initial pass to copy the data out to the drive
> in the first place took nearly *20* hours. A cron job to run even an
> incremental backup even once a day, much less the several times a day
> suggested for the deeper rsnapshot tiers, would not be *remotely*
> workable in that sort of environment. Though on the flip side, that's
> not just a USB-3 bottleneck, but also the bottleneck of the spinning
> mechanical hard drive inside the external case...)

I think rsnapshot's suggested backup schedule is excessively frequent
for pretty much anything more than a relatively small home directory.
In my case rsnapshot runs for several hours, much of which is likely
spent checking file metadata for changes; I run backups once a day,
and there is no realistic way that enough data is modified each day
to take that long to copy.

I recently wrote a script to take advantage of ZFS snapshots to get a
basically point-in-time atomic snapshot of the data onto the backup
drive, even in the presence of live changes while the backup is
running. (It's not necessarily _quite_ point-in-time atomic because I
have two ZFS pools plus an ext4 file system; but it's close enough to
be a workable approximation.)
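
The general shape of such a script, as a sketch only: this is not my actual script, and the function name plan_snapshot_backup plus the dataset, mount point and destination in the usage note are placeholders. It relies on ZFS exposing each snapshot read-only under .zfs/snapshot/<name> at the dataset's mount point, so the copy reads a frozen view while the live file system keeps changing.

```python
import os


def plan_snapshot_backup(dataset, mountpoint, snapshot_name, destination):
    """Return the commands for: snapshot, copy from the snapshot, drop the snapshot."""
    snapshot = f"{dataset}@{snapshot_name}"
    # Read-only view of the dataset as it was when the snapshot was taken.
    # The trailing slash makes rsync copy the contents of that directory.
    frozen_view = os.path.join(mountpoint, ".zfs", "snapshot", snapshot_name, "")
    return [
        ["zfs", "snapshot", snapshot],
        ["rsync", "-a", frozen_view, destination],
        ["zfs", "destroy", snapshot],
    ]
```

For example, plan_snapshot_backup("tank/home", "/home", "backup", "/mnt/backup/home") yields three commands that could each be run with subprocess.run(..., check=True). Each pool gets its own snapshot, taken one after the other, which is why the result across several pools is only approximately point-in-time.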

-- 
Michael Kjörling                     🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”
