Hi,

On Sun, Jun 30, 2024 at 07:36:58AM -0700, pe...@easthope.ca wrote:
> >From https://rsnapshot.org/
>
>     rsnapshot is a filesystem snapshot utility ...
>
> Rather than a snapshot of the extant file system, I want to keep a
> history of the files in the file system.
You should read more than one line of a page. That is exactly what it
is intended for: snapshots become history when you keep several of
them.

I have used rsnapshot a lot (decades' worth of use) and it's good, but
it is not perfect (nothing is). It is probably a much better backup
system than anything one can typically come up with by hand in short
order, but here are some of its downsides:

- No built-in compression or encryption. You can implement these
  yourself using filesystem features.

- Since it uses hardlinks for deduplication, this brings with it some
  inherent limitations:

  - The filesystem you use must support hardlinks.

  - All versions of a file will have the same metadata (mtime,
    permissions, ownership, etc.) because hardlinks must share the
    same metadata. As a consequence, any change of metadata will
    result in two separate files being stored (not hardlinked
    together) in order to represent that change, even if the files
    have identical content.

  - Changing one byte of a file results in the storage of two
    separate full copies of the two versions of the file: with
    hardlinks, either the file is entirely the same or it can't be a
    hardlink at all. This makes rsnapshot and things like it
    particularly bad at backing up large append-only files such as
    log files.

  - rsnapshot only compares versions of a file at the same path and
    point in time. So, for example, /path/to/foo is only ever
    compared against /path/to/foo *from the previous backup run*.
    Other copies of foo anywhere else on the system being backed up,
    or from other systems being backed up, or from a backup run
    before the most recent one, will not be considered, so will not
    be hardlinked together. A typical system has a lot of duplicate
    files, and once you start backing up multiple systems there tends
    to be an explosion of duplicate data. rsnapshot will not handle
    any of this specially and will just store it all.
It is possible to improve this by, for example, running an external
deduplication tool over the backups, or by using the deduplication
facilities of a filesystem like zfs¹. This must be done carefully,
otherwise the workings of rsnapshot can be disrupted.

- rsnapshot must walk through the entire previous backup to compare
  the content of all the existing files against the content of the
  new files. This is quite expensive and involves tons of random
  seeks, which is a killer for rotational storage media. Once you get
  to several million inodes in a backup run, you may find a run of
  rsnapshot taking several hours.

On the other hand, rsnapshot's huge plus point is that everything is
stored in a tree of files and hardlinks, so it can just be explored
and restored with normal filesystem tools. You don't need any part of
rsnapshot to access and restore your content. That is such a good
feature that many people feel able to overlook the negatives.

More featureful backup systems chunk backup content up and store it
by a hash of its content, which tends to bring advantages like:

- Never needing to store the same chunk twice, no matter where (or
  when) it came from.

- Being easy to compress and encrypt.

- Locating which data is in which chunk is done by a database, not by
  random access to a filesystem, so it's much faster. When you say "I
  want /path/to/foo from a week ago, but also show me every copy you
  have going back 3 years", that is a database query, not a walk of a
  filesystem with potentially several million inodes in it.

But, by doing that, you lose the ability to just cp a file from your
backups.

Thanks,
Andy

¹ Though someone heavily into an advanced filesystem like zfs may be
  more inclined to take advantage of zfs's proper snapshot
  capabilities (and zfs send to move them off-site) than to use
  rsnapshot on it.

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting