On 9/5/23 07:34, Michael Kjörling wrote:
On 4 Sep 2023 13:57 -0700, from dpchr...@holgerdanske.com (David Christensen):
* I am using zfs-auto-snapshot(8) for snapshots. Are you using rsnapshot(1)
for snapshots?
No. I'm using ZFS snapshots on the source, but not for backup
purposes. (I have contemplated doing that, but it would increase
complexity a fair bit.) The backup target is not snapshotted at the block
storage or file system level; however, rsync --link-dest uses
hardlinks to deduplicate whole files.
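For readers unfamiliar with rsync --link-dest, the Python sketch below
models the idea: each backup run produces a complete directory tree, but
files that look unchanged since the previous backup are hardlinked to it
rather than copied, so unchanged files share storage. This is an
illustration under stated assumptions, not rsync's actual algorithm:
"unchanged" is approximated by matching size and mtime (similar to
rsync's default quick check), ownership and permissions are ignored, and
the function name backup_with_link_dest is hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import Dict, Optional


def backup_with_link_dest(source: Path, dest: Path,
                          link_dest: Optional[Path]) -> Dict[str, int]:
    """Copy `source` into a fresh `dest`, hardlinking unchanged files from `link_dest`.

    Hypothetical model of rsync --link-dest: a file counts as unchanged when
    its size and mtime match the copy in the previous backup.
    """
    counts = {"linked": 0, "copied": 0}
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        prev = link_dest / rel if link_dest is not None else None
        st = src.stat()
        if prev is not None and prev.is_file():
            pst = prev.stat()
            if pst.st_size == st.st_size and int(pst.st_mtime) == int(st.st_mtime):
                # Share the existing inode: no new data blocks are written.
                os.link(prev, target)
                counts["linked"] += 1
                continue
        # Changed or new file: full copy; copy2 preserves mtime for the next run.
        shutil.copy2(src, target)
        counts["copied"] += 1
    return counts
```

Running it twice, with the second run pointed at the first backup as
link_dest, copies only the files whose size or mtime changed.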
+1 on the complexity of ZFS backups via snapshots and replication.
My question was ambiguous, as "snapshot" has different meanings for
ZFS and rsnapshot(1):
* https://docs.oracle.com/cd/E18752_01/html/819-5461/ftyue.html
snapshot
A read-only copy of a file system or volume at a given point in
time.
* https://rsnapshot.org/rsnapshot/docs/docbook/rest.html
Using rsnapshot, it is possible to take snapshots of your
filesystems at different points in time.
As I understand your network topology and backup strategy, it appears
that you are using rsnapshot(1) for snapshots (in the rsnapshot(1) sense
of the term).
* du(1) of the backup file system matches ZFS properties 'referenced' and
'usedbydataset'.
This would be expected, depending on exact specifics (what data du
traverses over and what your ZFS dataset layout is). To more closely
match the _apparent_ size of the files, you'd look at e.g.
logicalreferenced or logicalused.
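To illustrate the two sizes du can report, here is a minimal Python
sketch that walks a tree and totals both the apparent size (st_size,
roughly what `du --apparent-size` reports) and the allocated size
(st_blocks * 512, roughly what plain `du` reports, which on a compressed
ZFS dataset reflects the on-disk size). Like du, it counts each
hardlinked inode only once, which matters for --link-dest backup trees.
The function name tree_usage is hypothetical, and the 512-byte unit for
st_blocks is an assumption that holds on Linux but is not guaranteed by
POSIX.

```python
import os
from pathlib import Path
from typing import Tuple


def tree_usage(root: Path) -> Tuple[int, int]:
    """Return (apparent_bytes, allocated_bytes) for non-directory entries under root.

    apparent  ~ sum of st_size        (cf. du --apparent-size)
    allocated ~ sum of st_blocks*512  (cf. plain du; assumes Linux 512-byte units)
    Each (device, inode) pair is counted once, as du does for hardlinks.
    """
    seen = set()
    apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hardlink to an inode already counted
            seen.add(key)
            apparent += st.st_size
            allocated += st.st_blocks * 512
    return apparent, allocated
```

On a compressed dataset the allocated total will typically be smaller
than the apparent total; note that ZFS may briefly under-report
st_blocks for freshly written data.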
* I am unable to correlate du(1) of the snapshots to any ZFS properties --
du(1) reports much more storage than ZFS 'usedbysnapshots', even when scaled
by 'compressratio'.
This would also be expected. ZFS snapshots are copy-on-write and thus,
in effect, only keep track of a delta, whereas du adds up the size of
every file reachable under a path, and a ZFS snapshot gives access to
every file in the file system as it appeared at the moment the snapshot
was created. There are nuances and caveats but, as a first
approximation, immediately after taking a ZFS snapshot the size of the
snapshot is zero (plus a small amount of metadata overhead for the
snapshot itself) regardless of the size of the underlying dataset. The
space attributed to the snapshot then grows as changes to the
underlying dataset leave some data referenced only by the snapshot.
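The accounting described above can be made concrete with a toy model.
The Python sketch below is a deliberate oversimplification and not how
ZFS is implemented: blocks are integer ids, every write allocates a new
block instead of overwriting in place (copy-on-write), a snapshot
records the set of blocks the live dataset references, and "used by
snapshots" counts blocks referenced by some snapshot but no longer by
the live dataset. The class ToyDataset and its methods are
hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ToyDataset:
    """Hypothetical, highly simplified model of ZFS copy-on-write accounting."""
    live: Dict[str, int] = field(default_factory=dict)          # file -> block id
    snapshots: Dict[str, Set[int]] = field(default_factory=dict)
    next_block: int = 0

    def write(self, name: str) -> None:
        # Copy-on-write: allocate a new block; the old one is left untouched.
        self.live[name] = self.next_block
        self.next_block += 1

    def delete(self, name: str) -> None:
        del self.live[name]

    def snapshot(self, snap: str) -> None:
        # Record which blocks are referenced right now (real ZFS copies no data).
        self.snapshots[snap] = set(self.live.values())

    def used_by_snapshots(self) -> int:
        """Blocks held by at least one snapshot but not by the live dataset."""
        held = set().union(*self.snapshots.values())
        return len(held - set(self.live.values()))

    def snapshot_referenced(self, snap: str) -> int:
        """Blocks reachable in the snapshot, i.e. roughly what du sees there."""
        return len(self.snapshots[snap])
```

Immediately after snapshot() the snapshot costs nothing, yet browsing it
reaches every block; only after files are rewritten or deleted do
blocks become held solely by the snapshot, which is why du of a
snapshot directory and 'usedbysnapshots' measure different things.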
In general, ZFS disk space usage accounting for snapshots is really
rather non-intuitive, but it does make more sense when you consider
that ZFS is a copy-on-write file system and that snapshots largely
boil down to an atomic point-in-time marker for dataset state.
Okay. My server contains one backup ZFS file system for each host on my
network. So, the 'logicalreferenced', 'logicalused', and
'usedbysnapshots' properties I posted for one host's backup file system
are affected by the ZFS pool's aggregate COW, compression, and/or
deduplication features.
(In ZFS, a dataset can be either a file system optionally exposed at a
directory mountpoint or a volume exposed as a block device.)
I try to use ZFS vocabulary per the current Oracle web documentation
(but have found discrepancies). I wonder if ZFS-on-Linux and/or OpenZFS
(e.g. 'man zfs' on Debian) have diverged. For example, Oracle defines
"dataset" more broadly:
* https://docs.oracle.com/cd/E18752_01/html/819-5461/ftyue.html
dataset
"A generic name for the following ZFS components: clones, file
systems, snapshots, and volumes."
David