Re: Re (3): Backup.

Andy Smith Sun, 30 Jun 2024 08:38:03 -0700

Hi,

On Sun, Jun 30, 2024 at 07:36:58AM -0700, pe...@easthope.ca wrote:
> From https://rsnapshot.org/
> > rsnapshot is a filesystem snapshot utility ...
> 
> Rather than a snapshot of the extant file system, I want to keep a 
> history of the files in the file system.


You should read more than one line of a page. That is exactly what
it is intended for. Snapshots become history when you keep multiple
of them.

I have used rsnapshot a lot (decades' worth of use) and it's good, but
it is not perfect (nothing is). It is probably a much better backup
system than anything one can typically come up with by hand at short
notice, but here are some of its downsides:

- No built in compression or encryption. You can implement these
  yourself using filesystem features.

- Since it uses hardlinks for deduplication, this brings with it
  some inherent limitations:

  - The filesystem you use must support hardlinks

  - All versions of a file that are hardlinked together will have the
    same metadata (mtime, permissions, ownership, etc.) because
    hardlinks must have the same metadata. As a consequence, any
    change of metadata will result in two separate files being stored
    (not hardlinked together) in order to represent that change, even
    if the files have identical content.

  - Changing one byte of a file results in the storage of two
    separate full copies of the two versions of the file. With
    hardlinks, either the file is entirely the same or it cannot be
    a hardlink. This makes rsnapshot and things like it particularly
    bad for backing up large append-only files like log files.

- rsnapshot only compares versions of a file at the same path and
  point in time. So for example /path/to/foo is only ever compared
  against /path/to/foo *from the previous backup run*. Other copies
  of foo anywhere else on the system being backed up, or from other
  systems being backed up, or from a backup run previous to the most
  recent, will not be considered, so will not be hardlinked together.

  A typical system has a lot of duplicate files and once you start
  backing up multiple systems there tends to be an explosion of
  duplicate data. rsnapshot will not handle any of this specially
  and will just store it all.

  It is possible to improve this by, for example, running an external
  deduplication tool over the backups, or using the deduplication
  facilities of a filesystem like zfs¹. This must be done carefully,
  otherwise the workings of rsnapshot can be disrupted.

- rsnapshot must walk through the entire previous backup to compare
  each file with its new counterpart. This is quite expensive and
  will involve tons of random seeks, which is a killer for rotational
  storage media. Once you get to several million inodes in a backup
  run, you may find a run of rsnapshot taking several hours.
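
The hardlink limitations above can be seen directly with a few
system calls. What follows is a minimal Python sketch, not
rsnapshot's own code: the snap.0/snap.1 directory names and the
demo() helper are made up for illustration, and it assumes a POSIX
filesystem that supports hardlinks.

```python
import os
import tempfile


def demo():
    """Show hardlink behaviour between two pretend snapshot directories."""
    results = {}
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "snap.1", "foo")  # previous backup run
        new = os.path.join(root, "snap.0", "foo")  # current backup run
        os.makedirs(os.path.dirname(old))
        os.makedirs(os.path.dirname(new))

        with open(old, "wb") as f:
            f.write(b"x" * 1_000_000)

        # Unchanged file: the new snapshot is just another name for the
        # same inode, so no extra file data is stored.
        os.link(old, new)
        results["same_inode"] = os.stat(old).st_ino == os.stat(new).st_ino

        # Metadata lives in the inode, so a chmod through one name is
        # visible through the other. Two names cannot differ in mode.
        os.chmod(new, 0o600)
        results["old_mode"] = os.stat(old).st_mode & 0o777

        # One byte appended: the link has to be broken and a second
        # full copy written alongside the first.
        os.unlink(new)
        with open(old, "rb") as src, open(new, "wb") as dst:
            dst.write(src.read() + b"y")
        results["separate"] = os.stat(old).st_ino != os.stat(new).st_ino
        results["bytes_stored"] = (
            os.stat(old).st_size + os.stat(new).st_size
        )
    return results
```

Calling demo() returns a dict; bytes_stored comes out at roughly
twice the file size even though only one byte differs between the
two versions.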

On the other hand, rsnapshot's huge plus point is that everything is
stored in a tree of files and hardlinks so it can just be explored
and restored with normal filesystem tools. You don't need any part
of rsnapshot to access and restore your content. That is such a good
feature that many people feel able to overlook the negatives.
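
As an illustration of that point, the history of a single file can
be read with nothing more than directory listings and stat(). This
is a rough Python sketch under assumptions: the function names are
hypothetical, and it assumes each snapshot is a direct subdirectory
of one root directory. It does not use or need rsnapshot.

```python
import os


def file_history(snapshot_root, rel_path):
    """Return (snapshot_name, inode, size) for rel_path in each snapshot.

    Snapshot names are sorted as plain strings, so "snap.10" would
    sort before "snap.2"; good enough for a sketch.
    """
    history = []
    for name in sorted(os.listdir(snapshot_root)):
        candidate = os.path.join(snapshot_root, name, rel_path)
        if os.path.isfile(candidate):
            st = os.stat(candidate)
            history.append((name, st.st_ino, st.st_size))
    return history


def distinct_versions(history):
    """Group snapshot names by inode: one group per stored version."""
    groups = {}
    for name, inode, _size in history:
        groups.setdefault(inode, []).append(name)
    return list(groups.values())
```

Snapshots that land in the same group hold the file as hardlinks to
one inode, so each group is one version that actually takes up
space.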

More featureful backup systems chunk backup content up and store it
by a hash of its content, which tends to bring advantages like:

- Never needing to store the same chunk twice no matter where (or
  when) it came from

- Easy to compress and encrypt

- Locating which data is in which chunk gets done by a database,
  not by random access to a filesystem, so it's much faster. When
  you say "I want /path/to/foo from a week ago, but also show me
  every copy you have going back 3 years", that is a database query,
  not a walk of a filesystem with potentially several million inodes
  in it.

But, by doing that you lose the ability to just cp a file from your
backups.
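
To make the trade-off concrete, here is a toy Python sketch of the
chunk-and-hash idea. It is not how any particular backup tool is
implemented: the ChunkStore class, its method names, the in-memory
dicts standing in for the database, and the tiny fixed chunk size
are all assumptions chosen for illustration (real tools often use
content-defined chunking and persistent storage).

```python
import hashlib

CHUNK_SIZE = 4  # absurdly small, so the example data splits visibly


class ChunkStore:
    def __init__(self):
        self.chunks = {}  # sha256 hex digest -> chunk bytes
        self.index = {}   # (snapshot, path) -> ordered list of digests

    def put(self, snapshot, path, data):
        """Split data into chunks; store only chunks not seen before."""
        digests = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.index[(snapshot, path)] = digests

    def get(self, snapshot, path):
        """Reassemble a file; this is the step plain cp cannot do."""
        return b"".join(self.chunks[d] for d in self.index[(snapshot, path)])

    def versions(self, path):
        """Every snapshot holding path: an index lookup, no tree walk."""
        return sorted(s for (s, p) in self.index if p == path)


store = ChunkStore()
store.put("day1", "/var/log/app.log", b"aaaabbbb")
store.put("day2", "/var/log/app.log", b"aaaabbbbcccc")  # log grew
store.put("day2", "/home/user/copy", b"aaaabbbb")       # duplicate elsewhere
```

After these three puts the store holds only three chunks: the
appended log costs one new chunk, and the duplicate at a different
path costs none. Getting a file back, though, means going through
get() rather than copying it out of a directory tree.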

Thanks,
Andy

¹ Though someone heavily into an advanced filesystem like zfs may
  be more inclined to take advantage of zfs's proper snapshot
  capabilities (and zfs send to move them off-site) than use
  rsnapshot on it.

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting
