-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sun, May 07, 2017 at 10:53:51AM +1200, Ben Caradoc-Davies wrote:

[also a reply to Henrique, elsewhere in this thread]

> If a file is updated while it is being copied, it may contain only
> half a change set and be in an internally inconsistent state,
> perhaps making it unusable as a backup. Writes are typically not
> atomic. The same problem applies to collections of files that
> reference each other.
>
> Kind regards,

Ben, Henrique -- no questions. This is what I subsumed under "skew":
application state may be dispersed across different places in a file,
across different files or even partly not in files at all (e.g. in
RAM: imagine a BTree with just parts of its pointer structure not yet
committed to disk).

Of course you can't ever win unless you collaborate with the
application in those cases (even with magic file systems like ZFS or
btrfs).

Then there is this subtle "file data" and "file metadata" thing, which
is an issue even with carefully designed applications and file
systems. It's even difficult to reach a consensus on what is "right",
remember the ext3/ext4 data loss episode[1]?

This is where snapshotting magic, be it built-in (zfs, btrfs) or
bolted-on (overlayfs, lvm) might help a bit: freeze a snapshot, back
up that (in the first case, the file systems provide a native way to
do that, in the second case, rsync is a pretty viable way of doing
things).

I said "might help a bit" because the ultimate consistency criterion
is the application! A consistent file system view might just be this
truncated-to-zero file, only the application "knows" at that point
(e.g. by keeping its data in an already unlinked file which is still
open, or somewhere in RAM, or...).

So your choices are

  - for the applications you really care about, look into what they
    are doing. Grown-up apps will support you in that (I gave the
    PostgreSQL example above). Typically you can wrap the backup
    process in guards like ("keep your on-disk state consistent"[1]
    ..."now you can relax"). Note that to avoid races this structure
    is more or less necessary.
    The only real difference to the "magic snapshot" thing is that the
    latter happens very quickly.

  - for all the others... just relax.

Otherwise, "on line" backup is simply not an option.

cheers

[1] That doesn't mean necessarily frozen. PostgreSQL, for example,
    continues writing to the WAL, it just eats through its storage at
    a higher pace.

- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iEYEARECAAYFAlkO7SgACgkQBcgs9XrR2kYf9ACeM2njgrSttOUPRk4D6fJqJtjQ
qmkAn38VbkKiOlADe+33teN8uzcbLa2C
=uKNk
-----END PGP SIGNATURE-----