Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Fri, 17 Nov 2017 06:51:52 +0300, Andrei Borzenkov wrote:

> On 16.11.2017 19:13, Kai Krakow wrote:
> ...
> > BTW: From a user API perspective, btrfs snapshots do not guarantee
> > perfectly granular, consistent backups.
>
> Is it documented somewhere? I was relying on crash-consistent,
> write-order-preserving snapshots in NetApp for as long as I can
> remember. And I was sure btrfs offers it, as it is something obvious
> for the redirect-on-write idea.

I think it has ordering guarantees, but it is not as atomic in time as
one might think. That's the point. But the devs can tell better.

> > A user-level file transaction may
> > still end up only partially in the snapshot. If you are running
> > transaction sensitive applications, those usually do provide some
> > means of preparing a freeze and a thaw of transactions.
>
> Is snapshot creation synchronous, so one knows when to thaw?

I think you could do "btrfs snap create", then "btrfs fs sync", and
everything should be fine.

> > I think the user transactions API which could've been used for this
> > will even be removed during the next kernel cycles. I remember
> > reiserfs4 tried to deploy something similar. But there's no
> > consistent layer in the VFS for subscribing applications to
> > filesystem snapshots so they could prepare and notify the kernel
> > when they are ready.
>
> I do not see what the VFS has to do with it. NetApp works by simply
> preserving the previous consistency point instead of throwing it away.
> I.e. the snapshot is always the last committed image on stable storage.
> Would something like this be possible at the btrfs level by duplicating
> the current on-disk root (sorry if I use the wrong term)?

I think btrfs gives the same consistency. But the snapshot may be created
slightly later than the moment you issue "btrfs snap create". So if your
application relies on exact point-in-time snapshots, you need to
synchronize your application with the filesystem. I think the same is
true for NetApp. I just wanted to point that out because it may not be
obvious, given that btrfs snapshot creation is built right into the
toolchain of the filesystem itself, unlike e.g. NetApp or LVM or other
storage layers.

Background: A good while back I was told that btrfs snapshots during
ongoing IO may result in some of the later IO being carried over to
before the snapshot. Transactional ordering of IO operations is still
guaranteed, but it may overlap with snapshot creation. So you can still
lose a transaction you didn't expect to lose at that point in time.

So I understood this as: If you just want to ensure transactional
integrity of your database, you are all fine with btrfs snapshots. But if
you want to ensure that a just-finished transaction makes it into the
snapshot completely, you have to sync the processes.

However, things may have changed since then.

--
Regards,
Kai

Replies to list-only preferred.
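Spelled out with the full btrfs subcommand names, the "snapshot, then sync"
sequence mentioned above would look roughly like this (a sketch only; the
paths are placeholders, and /mnt/snapshots is assumed to be on the same
btrfs filesystem as /mnt/data):

  # quiesce or flush the application first, if it supports it
  sync                                  # flush dirty data to disk
  btrfs subvolume snapshot -r /mnt/data "/mnt/snapshots/data-$(date +%Y-%m-%dT%H:%M)"
  btrfs filesystem sync /mnt/data       # returns once the snapshot's transaction is committed
  # now it is safe to thaw the application again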
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On 16.11.2017 19:13, Kai Krakow wrote:
...
> BTW: From a user API perspective, btrfs snapshots do not guarantee
> perfectly granular, consistent backups.

Is it documented somewhere? I was relying on crash-consistent,
write-order-preserving snapshots in NetApp for as long as I can remember.
And I was sure btrfs offers it, as it is something obvious for the
redirect-on-write idea.

> A user-level file transaction may
> still end up only partially in the snapshot. If you are running
> transaction sensitive applications, those usually do provide some means
> of preparing a freeze and a thaw of transactions.

Is snapshot creation synchronous, so one knows when to thaw?

> I think the user transactions API which could've been used for this
> will even be removed during the next kernel cycles. I remember
> reiserfs4 tried to deploy something similar. But there's no consistent
> layer in the VFS for subscribing applications to filesystem snapshots
> so they could prepare and notify the kernel when they are ready.

I do not see what the VFS has to do with it. NetApp works by simply
preserving the previous consistency point instead of throwing it away.
I.e. the snapshot is always the last committed image on stable storage.
Would something like this be possible at the btrfs level by duplicating
the current on-disk root (sorry if I use the wrong term)?

...
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Tue, 14 Nov 2017 15:51:57 -0500, Dave wrote:

> On Tue, Nov 14, 2017 at 3:50 AM, Roman Mamedov wrote:
> >
> > On Mon, 13 Nov 2017 22:39:44 -0500
> > Dave wrote:
> >
> > > I have my live system on one block device and a backup snapshot
> > > of it on another block device. I am keeping them in sync with
> > > hourly rsync transfers.
> > >
> > > [...]
> >
> > Sounds a bit complex, I still don't get why you need all these
> > snapshot creations and deletions, and even still using btrfs
> > send-receive.
>
> Hopefully, my comments below will explain my reasons.
>
> > Here is my scheme:
> >
> > [...]
> >
> > No need for btrfs send-receive, only plain rsync is used, directly
> > from hostX:/ to /mnt/dst/backup/host1/;
>
> I prefer to start with a BTRFS snapshot at the backup destination. I
> think that's the most "accurate" starting point.

No, you should finish with a snapshot. Use the rsync destination as a
"dirty" scratch area, and let rsync also delete files which are no longer
in the source. After successfully running rsync, make a snapshot of that
directory and make it RO, and leave the scratch in place (even if rsync
dies or is killed).

I once made some scripts[2] following those rules, you may want to adapt
them.

> > No need to create or delete snapshots during the actual backup
> > process;
>
> Then you can't guarantee consistency of the backed up information.

Take a temporary snapshot of the source, rsync it to the scratch
destination, take a RO snapshot of that destination, then remove the
temporary snapshot.

BTW: From a user API perspective, btrfs snapshots do not guarantee
perfectly granular, consistent backups. A user-level file transaction may
still end up only partially in the snapshot. If you are running
transaction sensitive applications, those usually do provide some means
of preparing a freeze and a thaw of transactions.

I think the user transactions API which could've been used for this will
even be removed during the next kernel cycles. I remember reiserfs4 tried
to deploy something similar. But there's no consistent layer in the VFS
for subscribing applications to filesystem snapshots so they could
prepare and notify the kernel when they are ready.

> > A single common timeline is kept for all hosts to be backed up,
> > snapshot count not multiplied by the number of hosts (in my case
> > the backup location is multi-purpose, so I somewhat care about
> > total number of snapshots there as well);
> >
> > Also, all of this works even with source hosts which do not use
> > Btrfs.
>
> That's not a concern for me because I prefer to use BTRFS everywhere.

At least I suggest looking into bees[1] to deduplicate the backup
destination. Rsync does not work very efficiently with btrfs snapshots:
it will often break reflinks and write inefficiently sized blocks, even
with the inplace option. Also, rsync won't efficiently catch files
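A rough sketch of that sequence (untested; the paths, names and rsync
flags are only placeholders, and it assumes both source and destination
are btrfs):

  ts=$(date +%Y-%m-%dT%H:%M)
  live=/mnt/live                      # top of the live btrfs filesystem
  src=$live/home                      # subvolume to back up
  tmp=$live/home.backup-tmp           # temporary snapshot, outside of $src
  scratch=/mnt/backup/scratch/home    # "dirty" rsync destination, a subvolume
  snaps=/mnt/backup/snaps             # read-only history lives here

  # 1. freeze the source so rsync sees one consistent view
  btrfs subvolume snapshot -r "$src" "$tmp"

  # 2. mirror the frozen view into the scratch area; only on success,
  # 3. freeze the scratch area as a read-only snapshot
  if rsync -aHAX --inplace --no-whole-file --delete "$tmp/" "$scratch/"; then
      btrfs subvolume snapshot -r "$scratch" "$snaps/home-$ts"
  fi

  # 4. drop the temporary source snapshot; the scratch stays in place either way
  btrfs subvolume delete "$tmp"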
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Tue, Nov 14, 2017 at 3:50 AM, Roman Mamedovwrote: > > On Mon, 13 Nov 2017 22:39:44 -0500 > Dave wrote: > > > I have my live system on one block device and a backup snapshot of it > > on another block device. I am keeping them in sync with hourly rsync > > transfers. > > > > Here's how this system works in a little more detail: > > > > 1. I establish the baseline by sending a full snapshot to the backup > > block device using btrfs send-receive. > > 2. Next, on the backup device I immediately create a rw copy of that > > baseline snapshot. > > 3. I delete the source snapshot to keep the live filesystem free of > > all snapshots (so it can be optimally defragmented, etc.) > > 4. hourly, I take a snapshot of the live system, rsync all changes to > > the backup block device, and then delete the source snapshot. This > > hourly process takes less than a minute currently. (My test system has > > only moderate usage.) > > 5. hourly, following the above step, I use snapper to take a snapshot > > of the backup subvolume to create/preserve a history of changes. For > > example, I can find the version of a file 30 hours prior. > > Sounds a bit complex, I still don't get why you need all these snapshot > creations and deletions, and even still using btrfs send-receive. Hopefully, my comments below will explain my reasons. > > Here is my scheme: > > /mnt/dst <- mounted backup storage volume > /mnt/dst/backup <- a subvolume > /mnt/dst/backup/host1/ <- rsync destination for host1, regular directory > /mnt/dst/backup/host2/ <- rsync destination for host2, regular directory > /mnt/dst/backup/host3/ <- rsync destination for host3, regular directory > etc. > > /mnt/dst/backup/host1/bin/ > /mnt/dst/backup/host1/etc/ > /mnt/dst/backup/host1/home/ > ... > Self explanatory. All regular directories, not subvolumes. > > Snapshots: > /mnt/dst/snaps/backup <- a regular directory > /mnt/dst/snaps/backup/2017-11-14T12:00/ <- snapshot 1 of /mnt/dst/backup > /mnt/dst/snaps/backup/2017-11-14T13:00/ <- snapshot 2 of /mnt/dst/backup > /mnt/dst/snaps/backup/2017-11-14T14:00/ <- snapshot 3 of /mnt/dst/backup > > Accessing historic data: > /mnt/dst/snaps/backup/2017-11-14T12:00/host1/bin/bash > ... > /bin/bash for host1 as of 2017-11-14 12:00 (time on the backup system). > > > No need for btrfs send-receive, only plain rsync is used, directly from > hostX:/ to /mnt/dst/backup/host1/; I prefer to start with a BTRFS snapshot at the backup destination. I think that's the most "accurate" starting point. > > No need to create or delete snapshots during the actual backup process; Then you can't guarantee consistency of the backed up information. > > A single common timeline is kept for all hosts to be backed up, snapshot count > not multiplied by the number of hosts (in my case the backup location is > multi-purpose, so I somewhat care about total number of snapshots there as > well); > > Also, all of this works even with source hosts which do not use Btrfs. That's not a concern for me because I prefer to use BTRFS everywhere. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Mon, 13 Nov 2017 22:39:44 -0500
Dave wrote:

> I have my live system on one block device and a backup snapshot of it
> on another block device. I am keeping them in sync with hourly rsync
> transfers.
>
> Here's how this system works in a little more detail:
>
> 1. I establish the baseline by sending a full snapshot to the backup
> block device using btrfs send-receive.
> 2. Next, on the backup device I immediately create a rw copy of that
> baseline snapshot.
> 3. I delete the source snapshot to keep the live filesystem free of
> all snapshots (so it can be optimally defragmented, etc.)
> 4. hourly, I take a snapshot of the live system, rsync all changes to
> the backup block device, and then delete the source snapshot. This
> hourly process takes less than a minute currently. (My test system has
> only moderate usage.)
> 5. hourly, following the above step, I use snapper to take a snapshot
> of the backup subvolume to create/preserve a history of changes. For
> example, I can find the version of a file 30 hours prior.

Sounds a bit complex, I still don't get why you need all these snapshot
creations and deletions, and even still using btrfs send-receive.

Here is my scheme:

/mnt/dst                 <- mounted backup storage volume
/mnt/dst/backup          <- a subvolume
/mnt/dst/backup/host1/   <- rsync destination for host1, regular directory
/mnt/dst/backup/host2/   <- rsync destination for host2, regular directory
/mnt/dst/backup/host3/   <- rsync destination for host3, regular directory
etc.

/mnt/dst/backup/host1/bin/
/mnt/dst/backup/host1/etc/
/mnt/dst/backup/host1/home/
...
Self explanatory. All regular directories, not subvolumes.

Snapshots:
/mnt/dst/snaps/backup                    <- a regular directory
/mnt/dst/snaps/backup/2017-11-14T12:00/  <- snapshot 1 of /mnt/dst/backup
/mnt/dst/snaps/backup/2017-11-14T13:00/  <- snapshot 2 of /mnt/dst/backup
/mnt/dst/snaps/backup/2017-11-14T14:00/  <- snapshot 3 of /mnt/dst/backup

Accessing historic data:
/mnt/dst/snaps/backup/2017-11-14T12:00/host1/bin/bash
...
/bin/bash for host1 as of 2017-11-14 12:00 (time on the backup system).


No need for btrfs send-receive, only plain rsync is used, directly from
hostX:/ to /mnt/dst/backup/host1/;

No need to create or delete snapshots during the actual backup process;

A single common timeline is kept for all hosts to be backed up, snapshot
count not multiplied by the number of hosts (in my case the backup
location is multi-purpose, so I somewhat care about total number of
snapshots there as well);

Also, all of this works even with source hosts which do not use Btrfs.

--
With respect,
Roman
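A minimal sketch of what one backup run in this scheme could look like
(the host names and paths are just the examples from the layout above;
the exact rsync options are a matter of taste and are discussed elsewhere
in the thread):

  # one rsync per host, straight into plain directories under the backup subvolume
  rsync -aHAX -x --inplace --delete root@host1:/ /mnt/dst/backup/host1/
  rsync -aHAX -x --inplace --delete root@host2:/ /mnt/dst/backup/host2/
  # (-x stays on the source root filesystem; other mounts/subvolumes need their own runs)

  # afterwards, freeze the whole backup subvolume once, read-only,
  # so all hosts share a single snapshot timeline
  btrfs subvolume snapshot -r /mnt/dst/backup \
      "/mnt/dst/snaps/backup/$(date +%Y-%m-%dT%H:%M)"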
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Tue, 14 Nov 2017 10:14:55 +0300
Marat Khalili wrote:

> Don't keep snapshots under the rsync target, place them under
> ../snapshots (if snapper supports this):
> Or, specify them in --exclude and avoid using --delete-excluded.

Both are good suggestions. In my case each system does have its own
snapshots as well, but they are retained for a much shorter time. So I
both use --exclude to avoid fetching the entire /snaps tree from the
source system, and store snapshots of the destination system outside of
the rsync target dirs.

> Or keep using -x if it works, why not?

-x will exclude the content of all subvolumes down the tree on the source
side -- not only the time-based ones. If you take care never to casually
create any subvolumes whose content you'd still want backed up, then I
guess it can work.

--
With respect,
Roman
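To make the difference concrete, a rough sketch of the two approaches
(the /snaps path and hosts are only examples from this thread):

  # explicit exclude: only the named snapshot tree is skipped,
  # everything else on the source is still copied
  rsync -aHAX --inplace --delete --exclude=/snaps root@host1:/ /mnt/dst/backup/host1/

  # -x / --one-file-system: every subvolume below / is skipped, because
  # btrfs subvolumes look like separate filesystems to rsync
  rsync -aHAX -x --inplace --delete root@host1:/ /mnt/dst/backup/host1/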
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On 14/11/17 06:39, Dave wrote:

> My rsync command currently looks like this:
>
> rsync -axAHv --inplace --delete-delay --exclude-from="/some/file"
> "$source_snapshop/" "$backup_location"

As I learned from Kai Krakow on this mailing list, you should also add
--no-whole-file if both sides are local. Otherwise target space usage can
be much worse (but fragmentation much better). I wonder what your
justification for --delete-delay is; I just use --delete.

Here's what I use: --verbose --archive --hard-links --acls --xattrs
--numeric-ids --inplace --delete --delete-excluded --stats. Since in my
case the source is always remote, there's no --no-whole-file, but there
is --numeric-ids.

> In particular, I want to know if I should or should not be using these
> options:
>
> -H, --hard-links        preserve hard links
> -A, --acls              preserve ACLs (implies -p)
> -X, --xattrs            preserve extended attributes
> -x, --one-file-system   don't cross filesystem boundaries

I don't know any semantic use of hard links in modern systems. There are
ACLs on some files in /var/log/journal on systems with systemd. Synology
actively uses ACLs, but its implementation is sadly incompatible with
rsync. There can always be some ACLs or xattrs set manually by a
sysadmin. End result, I always specify the first three options where
possible, just in case (even though the man page says that --hard-links
may affect performance).

> I had to use the "x" option to prevent rsync from deleting files in
> snapshots in the backup location (as the source location does not
> retain any snapshots). Is there a better way?

Don't keep snapshots under the rsync target, place them under
../snapshots (if snapper supports this):

# find . -maxdepth 2
.
./snapshots
./snapshots/2017-11-08T13:18:20+00:00
./snapshots/2017-11-08T15:10:03+00:00
./snapshots/2017-11-08T23:28:44+00:00
./snapshots/2017-11-09T23:41:30+00:00
./snapshots/2017-11-10T22:44:36+00:00
./snapshots/2017-11-11T21:48:19+00:00
./snapshots/2017-11-12T21:27:41+00:00
./snapshots/2017-11-13T23:29:49+00:00
./rsync

Or, specify them in --exclude and avoid using --delete-excluded. Or keep
using -x if it works, why not?

--
With Best Regards,
Marat Khalili
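Spelled out as a single command line, the option set above would look
roughly like this (source host and destination path are placeholders):

  rsync --verbose --archive --hard-links --acls --xattrs --numeric-ids \
        --inplace --delete --delete-excluded --stats \
        root@host1:/ /mnt/dst/backup/host1/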
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Wed, Nov 1, 2017 at 1:15 AM, Roman Mamedov wrote:

> On Wed, 1 Nov 2017 01:00:08 -0400
> Dave wrote:
>
> > To reconcile those conflicting goals, the only idea I have come up
> > with so far is to use btrfs send-receive to perform incremental
> > backups as described here:
> > https://btrfs.wiki.kernel.org/index.php/Incremental_Backup .
>
> Another option is to just use the regular rsync to a designated
> destination subvolume on the backup host, AND snapshot that subvolume
> on that host from time to time (or on backup completions, if you can
> synchronize that).
>
> rsync --inplace will keep space usage low as it will not reupload
> entire files in case of changes/additions to them.
>
> Yes rsync has to traverse both directory trees to find changes, but
> that's pretty fast (couple of minutes at most, for a typical root
> filesystem), especially if you use SSD or SSD caching.

Hello. I am implementing this suggestion. So far, so good. However, I
need some further recommendations on rsync options to use for this
purpose.

My rsync command currently looks like this:

rsync -axAHv --inplace --delete-delay --exclude-from="/some/file"
"$source_snapshop/" "$backup_location"

In particular, I want to know if I should or should not be using these
options:

-H, --hard-links        preserve hard links
-A, --acls              preserve ACLs (implies -p)
-X, --xattrs            preserve extended attributes
-x, --one-file-system   don't cross filesystem boundaries

I had to use the "x" option to prevent rsync from deleting files in
snapshots in the backup location (as the source location does not retain
any snapshots). Is there a better way?

I have my live system on one block device and a backup snapshot of it on
another block device. I am keeping them in sync with hourly rsync
transfers.

Here's how this system works in a little more detail:

1. I establish the baseline by sending a full snapshot to the backup
block device using btrfs send-receive.
2. Next, on the backup device I immediately create a rw copy of that
baseline snapshot.
3. I delete the source snapshot to keep the live filesystem free of all
snapshots (so it can be optimally defragmented, etc.)
4. hourly, I take a snapshot of the live system, rsync all changes to
the backup block device, and then delete the source snapshot. This
hourly process takes less than a minute currently. (My test system has
only moderate usage.)
5. hourly, following the above step, I use snapper to take a snapshot of
the backup subvolume to create/preserve a history of changes. For
example, I can find the version of a file 30 hours prior.

The backup volume contains up to 100 snapshots while the live volume has
no snapshots. Best of both worlds? I guess I'll find out over time.
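For reference, hourly steps 4 and 5 above could be scripted roughly like
this (a sketch only; the subvolume paths and the snapper config name are
placeholders, and the snapper config for the backup volume is assumed to
already exist):

  live=/mnt/live/home               # live subvolume, kept free of snapshots
  tmp=/mnt/live/home.hourly         # short-lived snapshot used only as rsync source
  backup=/mnt/backup/home           # rw subvolume on the backup block device

  btrfs subvolume snapshot -r "$live" "$tmp"        # step 4: freeze the live data
  rsync -aHAX --inplace --no-whole-file --delete "$tmp/" "$backup/"
  btrfs subvolume delete "$tmp"                     # live volume is snapshot-free again

  snapper -c backup-home create -d "hourly backup"  # step 5: preserve history on the backup side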
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
Am Thu, 2 Nov 2017 23:24:29 -0400 schrieb Dave: > On Thu, Nov 2, 2017 at 4:46 PM, Kai Krakow > wrote: > > Am Wed, 1 Nov 2017 02:51:58 -0400 > > schrieb Dave : > > > [...] > [...] > [...] > >> > >> Thanks for confirming. I must have missed those reports. I had > >> never considered this idea until now -- but I like it. > >> > >> Are there any blogs or wikis where people have done something > >> similar to what we are discussing here? > > > > I used rsync before, backup source and destination both were btrfs. > > I was experiencing the same btrfs bug from time to time on both > > devices, luckily not at the same time. > > > > I instead switched to using borgbackup, and xfs as the destination > > (to not fall the same-bug-in-two-devices pitfall). > > I'm going to stick with btrfs everywhere. My reasoning is that my > biggest pitfalls will be related to lack of knowledge. So focusing on > learning one filesystem better (vs poorly learning two) is the better > strategy for me, given my limited time. (I'm not an IT professional of > any sort.) > > Is there any problem with the Borgbackup repository being on btrfs? No. I just wanted to point out that keeping backup and source on different media (which includes different technology, too) is common best practice and adheres to the 3-2-1 backup strategy. > > Borgbackup achieves a > > much higher deduplication density and compression, and as such also > > is able to store much more backup history in the same storage > > space. The first run is much slower than rsync (due to enabled > > compression) but successive runs are much faster (like 20 minutes > > per backup run instead of 4-5 hours). > > > > I'm currently storing 107 TB of backup history in just 2.2 TB backup > > space, which counts a little more than one year of history now, > > containing 56 snapshots. This is my retention policy: > > > > * 5 yearly snapshots > > * 12 monthly snapshots > > * 14 weekly snapshots (worth around 3 months) > > * 30 daily snapshots > > > > Restore is fast enough, and a snapshot can even be fuse-mounted > > (tho, in that case mounted access can be very slow navigating > > directories). > > > > With latest borgbackup version, the backup time increased to around > > 1 hour from 15-20 minutes in the previous version. That is due to > > switching the file cache strategy from mtime to ctime. This can be > > tuned to get back to old performance, but it may miss some files > > during backup if you're doing awkward things to file timestamps. > > > > I'm also backing up some servers with it now, then use rsync to sync > > the borg repository to an offsite location. > > > > Combined with same-fs local btrfs snapshots with short retention > > times, this could be a viable solution for you. > > Yes, I appreciate the idea. I'm going to evaluate both rsync and > Borgbackup. > > The advantage of rsync, I think, is that it will likely run in just a > couple minutes. That will allow me to run it hourly and to keep my > live volume almost entire free of snapshots and fully defragmented. > It's also very simple as I already have rsync. And since I'm going to > run btrfs on the backup volume, I can perform hourly snapshots there > and use Snapper to manage retention. It's all very simple and relies > on tools I already have and know. > > However, the advantages of Borgbackup you mentioned (much higher > deduplication density and compression) make it worth considering. > Maybe Borgbackup won't take long to complete successive (incremental) > backups on my system. 
Once a full backup was taken, incremental backups are extremely fast. At least for me, it works much faster than rsync. And as with btrfs snapshots, each incremental backup is also a full backup. It's not like traditional backup software that needs the backup parent and grand parent to make use of the differential and/or incremental backups. There's one caveat, tho: Only one process can access a repository at a time, that is you need to serialize different backup jobs if you want them to go into the same repository. Deduplication is done only within the same repository. Tho, you might be able to leverage btrfs deduplication (e.g. using bees) across multiple repositories if you're not using encrypted repositories. But since you're currently using send/receive and/or rsync, encrypted storage of the backup doesn't seem to be an important point to you. Burp with its client/server approach may have an advantage here, so its setup seems to be more complicated. Borg is really easy to use. I never tried burp, tho. > I'll have to try it to see. It's a very nice > looking project. I'm surprised I never heard of it before. It seems to follow similar principles as burp (which I never heard of previously). It seems like the really good backup software has some sort of PR problem... ;-) -- Regards, Kai Replies to list-only preferred. --
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Thu, Nov 2, 2017 at 4:46 PM, Kai Krakowwrote: > Am Wed, 1 Nov 2017 02:51:58 -0400 > schrieb Dave : > >> > >> >> To reconcile those conflicting goals, the only idea I have come up >> >> with so far is to use btrfs send-receive to perform incremental >> >> backups >> > >> > As already said by Romain Mamedov, rsync is viable alternative to >> > send-receive with much less hassle. According to some reports it >> > can even be faster. >> >> Thanks for confirming. I must have missed those reports. I had never >> considered this idea until now -- but I like it. >> >> Are there any blogs or wikis where people have done something similar >> to what we are discussing here? > > I used rsync before, backup source and destination both were btrfs. I > was experiencing the same btrfs bug from time to time on both devices, > luckily not at the same time. > > I instead switched to using borgbackup, and xfs as the destination (to > not fall the same-bug-in-two-devices pitfall). I'm going to stick with btrfs everywhere. My reasoning is that my biggest pitfalls will be related to lack of knowledge. So focusing on learning one filesystem better (vs poorly learning two) is the better strategy for me, given my limited time. (I'm not an IT professional of any sort.) Is there any problem with the Borgbackup repository being on btrfs? > Borgbackup achieves a > much higher deduplication density and compression, and as such also is > able to store much more backup history in the same storage space. The > first run is much slower than rsync (due to enabled compression) but > successive runs are much faster (like 20 minutes per backup run instead > of 4-5 hours). > > I'm currently storing 107 TB of backup history in just 2.2 TB backup > space, which counts a little more than one year of history now, > containing 56 snapshots. This is my retention policy: > > * 5 yearly snapshots > * 12 monthly snapshots > * 14 weekly snapshots (worth around 3 months) > * 30 daily snapshots > > Restore is fast enough, and a snapshot can even be fuse-mounted (tho, > in that case mounted access can be very slow navigating directories). > > With latest borgbackup version, the backup time increased to around 1 > hour from 15-20 minutes in the previous version. That is due to > switching the file cache strategy from mtime to ctime. This can be > tuned to get back to old performance, but it may miss some files during > backup if you're doing awkward things to file timestamps. > > I'm also backing up some servers with it now, then use rsync to sync > the borg repository to an offsite location. > > Combined with same-fs local btrfs snapshots with short retention times, > this could be a viable solution for you. Yes, I appreciate the idea. I'm going to evaluate both rsync and Borgbackup. The advantage of rsync, I think, is that it will likely run in just a couple minutes. That will allow me to run it hourly and to keep my live volume almost entire free of snapshots and fully defragmented. It's also very simple as I already have rsync. And since I'm going to run btrfs on the backup volume, I can perform hourly snapshots there and use Snapper to manage retention. It's all very simple and relies on tools I already have and know. However, the advantages of Borgbackup you mentioned (much higher deduplication density and compression) make it worth considering. Maybe Borgbackup won't take long to complete successive (incremental) backups on my system. I'll have to try it to see. It's a very nice looking project. 
I'm surprised I never heard of it before. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Wed, 1 Nov 2017 02:51:58 -0400, Dave wrote:

> >> To reconcile those conflicting goals, the only idea I have come up
> >> with so far is to use btrfs send-receive to perform incremental
> >> backups
> >
> > As already said by Romain Mamedov, rsync is viable alternative to
> > send-receive with much less hassle. According to some reports it
> > can even be faster.
>
> Thanks for confirming. I must have missed those reports. I had never
> considered this idea until now -- but I like it.
>
> Are there any blogs or wikis where people have done something similar
> to what we are discussing here?

I used rsync before, backup source and destination both were btrfs. I
was experiencing the same btrfs bug from time to time on both devices,
luckily not at the same time.

I instead switched to using borgbackup, and xfs as the destination (to
not fall into the same-bug-in-two-devices pitfall). Borgbackup achieves
a much higher deduplication density and compression, and as such is also
able to store much more backup history in the same storage space. The
first run is much slower than rsync (due to enabled compression) but
successive runs are much faster (like 20 minutes per backup run instead
of 4-5 hours).

I'm currently storing 107 TB of backup history in just 2.2 TB of backup
space, which covers a little more than one year of history now,
containing 56 snapshots. This is my retention policy:

* 5 yearly snapshots
* 12 monthly snapshots
* 14 weekly snapshots (worth around 3 months)
* 30 daily snapshots

Restore is fast enough, and a snapshot can even be fuse-mounted (tho, in
that case, access through the mount can be very slow when navigating
directories).

With the latest borgbackup version, the backup time increased to around
1 hour from 15-20 minutes in the previous version. That is due to
switching the file cache strategy from mtime to ctime. This can be tuned
to get back to the old performance, but it may miss some files during
backup if you're doing awkward things to file timestamps.

I'm also backing up some servers with it now, then use rsync to sync the
borg repository to an offsite location.

Combined with same-fs local btrfs snapshots with short retention times,
this could be a viable solution for you.

--
Regards,
Kai

Replies to list-only preferred.
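As a rough illustration of such a setup (a sketch, not the exact commands
used here; the repository path, source paths and compression choice are
placeholders, and the retention numbers are simply the ones listed
above):

  # repository created once beforehand, e.g.: borg init --encryption=none /mnt/backup/borg-repo
  export BORG_REPO=/mnt/backup/borg-repo

  # one archive per run; borg only stores chunks it has not seen before
  borg create --stats --compression lz4 ::'{hostname}-{now:%Y-%m-%dT%H:%M}' /etc /home /srv

  # apply the retention policy quoted above
  borg prune --keep-daily 30 --keep-weekly 14 --keep-monthly 12 --keep-yearly 5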
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
[ ... ]

> The poor performance has existed from the beginning of using
> BTRFS + KDE + Firefox (almost 2 years ago), at a point when
> very few snapshots had yet been created. A comparison system
> running similar hardware as well as KDE + Firefox (and LVM +
> EXT4) did not have the performance problems. The difference
> has been consistent and significant.

That seems rather unlikely to depend on Btrfs, as I use Firefox 56 + KDE4
+ Btrfs without issue on a somewhat old/small desktop and laptop, and it
is implausible on general grounds.

So far you haven't provided any indication or quantification of your
"speed" problem (which may or may not be a "performance" issue). The
things to look at are usually disk IO latency and rates, and system CPU
time, while the bad speed is observable (user CPU time is usually stuck
at 100% on any JS-based site, as written earlier).

To look at IO latency and rates the #1 choice is always:

  iostat -dk -zyx 1

and to look at system CPU (and user CPU) and other interesting details I
suggest using 'htop' with the attached configuration file, written to
"$HOME/.config/htop/htoprc".

> Sometimes I have used Snapper settings like this:
> TIMELINE_MIN_AGE="1800"
> TIMELINE_LIMIT_HOURLY="36"
> TIMELINE_LIMIT_DAILY="30"
> TIMELINE_LIMIT_MONTHLY="12"
> TIMELINE_LIMIT_YEARLY="10"
> However, I also have some computers set like this:
> TIMELINE_MIN_AGE="1800"
> TIMELINE_LIMIT_HOURLY="10"
> TIMELINE_LIMIT_DAILY="10"
> TIMELINE_LIMIT_WEEKLY="0"
> TIMELINE_LIMIT_MONTHLY="0"
> TIMELINE_LIMIT_YEARLY="0"

The first seems a bit "aspirational". IIRC "someone" confessed that the
SUSE default of 'TIMELINE_LIMIT_YEARLY="10"' was imposed by external
forces in the SUSE default configuration:

https://github.com/openSUSE/snapper/blob/master/data/default-config
https://wiki.archlinux.org/index.php/Snapper#Set_snapshot_limits
https://lists.opensuse.org/yast-devel/2014-05/msg00036.html

# Beware! This file is rewritten by htop when settings are changed in the interface.
# The parser is also very primitive, and not human-friendly.
fields=0 48 38 39 40 44 62 63 2 46 13 14 1
sort_key=47
sort_direction=1
hide_threads=1
hide_kernel_threads=1
hide_userland_threads=1
shadow_other_users=0
show_thread_names=1
highlight_base_name=1
highlight_megabytes=1
highlight_threads=1
tree_view=0
header_margin=0
detailed_cpu_time=1
cpu_count_from_zero=1
update_process_names=0
color_scheme=0
delay=15
left_meters=AllCPUs Memory Swap
left_meter_modes=1 1 1
right_meters=Tasks LoadAverage Uptime
right_meter_modes=2 2 2
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Wed, Nov 1, 2017 at 4:34 AM, Marat Khaliliwrote: >> We do experience severe performance problems now, especially with >> Firefox. Part of my experiment is to reduce the number of snapshots on >> the live volumes, hence this question. > > Just for statistics, how many snapshots do you have and how often do you > take them? It's on SSD, right? I don't think the severe performance problems stem solely from the number of snapshots. I think it is also related to Firefox stuff (cache fragmentation, lack of multi-processor mode maybe, etc.) I still have to investigate the Firefox issues, but I'm starting at the foundation by trying to get a basic BTRFS setup that will support better desktop application performance first. The poor performance has existed from the beginning of using BTRFS + KDE + Firefox (almost 2 years ago), at a point when very few snapshots had yet been created. A comparison system running similar hardware as well as KDE + Firefox (and LVM + EXT4) did not have the performance problems. The difference has been consistent and significant. For a while I thought the difference was due to the hardware, as one system used the z170 chipset and the other used the X99 chipset (but were otherwise equivalent). So I repeated the testing on identical hardware and the stark performance difference remained. When I realized that, I began focusing on BTRFS, as it is the only consistent difference I can recognize. Sometimes I have used Snapper settings like this: TIMELINE_MIN_AGE="1800" TIMELINE_LIMIT_HOURLY="36" TIMELINE_LIMIT_DAILY="30" TIMELINE_LIMIT_MONTHLY="12" TIMELINE_LIMIT_YEARLY="10" However, I also have some computers set like this: TIMELINE_MIN_AGE="1800" TIMELINE_LIMIT_HOURLY="10" TIMELINE_LIMIT_DAILY="10" TIMELINE_LIMIT_WEEKLY="0" TIMELINE_LIMIT_MONTHLY="0" TIMELINE_LIMIT_YEARLY="0" > BTW beware of deleting too many snapshots at once with any tool. Delete few > and let filesystem stabilize before proceeding. OK, thanks for the tip. > For deduplication tool to be useful you ought to have some duplicate data on > your live volume. Do you have any (e.g. many LXC containers with the same > distribution)? No, no containers and no duplication to that large extent. > P.S. I still think you need some off-system backup solution too, either > rsync+snapshot-based over ssh or e.g. Burp (shameless advertising: > http://burp.grke.org/ ). I agree, but that's beyond the scope of the current problem I'm trying to solve. However, I'll check out Burp once I have a base configuration that is working satisfactorily. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On 01/11/17 09:51, Dave wrote: As already said by Romain Mamedov, rsync is viable alternative to send-receive with much less hassle. According to some reports it can even be faster. Thanks for confirming. I must have missed those reports. I had never considered this idea until now -- but I like it. Are there any blogs or wikis where people have done something similar to what we are discussing here? I don't know any. Probably someone needs to write it. We will delete most snapshots on the live volume, but retain many (or all) snapshots on the backup block device. Is that a good strategy, given my goals? Depending on the way you use it, retaining even a dozen snapshots on a live volume might hurt performance (for high-performance databases) or be completely transparent (for user folders). You may want to experiment with this number. We do experience severe performance problems now, especially with Firefox. Part of my experiment is to reduce the number of snapshots on the live volumes, hence this question. Just for statistics, how many snapshots do you have and how often do you take them? It's on SSD, right? Thanks. I hope you do find time to publish it. (And what do you mean by portable?) For now, Snapper has a cleanup algorithm that we can use. At least one of the tools listed here has a thinout algorithm too: https://btrfs.wiki.kernel.org/index.php/Incremental_Backup It is currently a small part of yet another home-grown backup tool which is itself fairly big and tuned to particular environment. I thought many times that it would be very nice to have thinning tool separately and with no unnecessary dependencies, but... BTW beware of deleting too many snapshots at once with any tool. Delete few and let filesystem stabilize before proceeding. Should I consider a dedup tool like one of these? Certainly NOT for snapshot-based backups: it is already deduplicated almost as much as possible, dedup tools can only make it *less* deduplicated. The question is whether to use a dedup tool on the live volume which has a few snapshots. Even with the new strategy (based on rsync), the live volume may sometimes have two snapshots (pre- and post- pacman upgrades). For deduplication tool to be useful you ought to have some duplicate data on your live volume. Do you have any (e.g. many LXC containers with the same distribution)? Also still wondering about these options: no-holes, skinny metadata, or extended inode refs? I don't know anything about any of these, sorry. P.S. I still think you need some off-system backup solution too, either rsync+snapshot-based over ssh or e.g. Burp (shameless advertising: http://burp.grke.org/ ). -- With Best Regards, Marat Khalili -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Wed, Nov 1, 2017 at 2:19 AM, Marat Khaliliwrote: > You seem to have two tasks: (1) same-volume snapshots (I would not call them > backups) and (2) updating some backup volume (preferably on a different > box). By solving them separately you can avoid some complexity... Yes, it appears that is a very good strategy -- solve the concerns separately. Make the live volume performant and the backup volume historical. > >> To reconcile those conflicting goals, the only idea I have come up >> with so far is to use btrfs send-receive to perform incremental >> backups > > As already said by Romain Mamedov, rsync is viable alternative to > send-receive with much less hassle. According to some reports it can even be > faster. Thanks for confirming. I must have missed those reports. I had never considered this idea until now -- but I like it. Are there any blogs or wikis where people have done something similar to what we are discussing here? > >> Given the hourly snapshots, incremental backups are the only practical >> option. They take mere moments. Full backups could take an hour or >> more, which won't work with hourly backups. > > I don't see much sense in re-doing full backups to the same physical device. > If you care about backup integrity, it is probably more important to invest > in backups verification. (OTOH, while you didn't reveal data size, if full > backup takes just an hour on your system then why not?) I was saying that a full backup could take an hour or more. That means full backups are not compatible with an hourly backup schedule. And it is certainly not a potential solution to making the system perform better because the system will be spending all its time running backups -- it would be never ending. With hourly backups, they should complete in just a few moments, which is the case with incremental backups. (It sounds like this will be the case with rsync as well.) > >> We will delete most snapshots on the live volume, but retain many (or >> all) snapshots on the backup block device. Is that a good strategy, >> given my goals? > > Depending on the way you use it, retaining even a dozen snapshots on a live > volume might hurt performance (for high-performance databases) or be > completely transparent (for user folders). You may want to experiment with > this number. We do experience severe performance problems now, especially with Firefox. Part of my experiment is to reduce the number of snapshots on the live volumes, hence this question. > > In any case I'd not recommend retaining ALL snapshots on backup device, even > if you have infinite space. Such filesystem would be as dangerous as the > demon core, only good for adding more snapshots (not even deleting them), > and any little mistake will blow everything up. Keep a few dozen, hundred at > most. The intention -- if we were to keep all snapshots on a backup device -- would be to never ever try to delete them. However, with the suggestion to separate the concerns and use rsync, we could also easily run the Snapper timeline cleanup on the backup volume, thereby limiting the retained snapshots to some reasonable number. > Unlike other backup systems, you can fairly easily remove snapshots in the > middle of sequence, use this opportunity. My thinout rule is: remove > snapshot if resulting gap will be less than some fraction (e.g. 1/4) of its > age. One day I'll publish portable solution on github. Thanks. I hope you do find time to publish it. (And what do you mean by portable?) 
For now, Snapper has a cleanup algorithm that we can use. At least one of the tools listed here has a thinout algorithm too: https://btrfs.wiki.kernel.org/index.php/Incremental_Backup >> Given this minimal retention of snapshots on the live volume, should I >> defrag it (assuming there is at least 50% free space available on the >> device)? (BTW, is defrag OK on an NVMe drive? or an SSD?) >> >> In the above procedure, would I perform that defrag before or after >> taking the snapshot? Or should I use autodefrag? > > I ended up using autodefrag, didn't try manual defragmentation. I don't use > SSDs as backup volumes. I don't use SSD's as backup volumes either. I was asking about the live volume. > >> Should I consider a dedup tool like one of these? > > Certainly NOT for snapshot-based backups: it is already deduplicated almost > as much as possible, dedup tools can only make it *less* deduplicated. The question is whether to use a dedup tool on the live volume which has a few snapshots. Even with the new strategy (based on rsync), the live volume may sometimes have two snapshots (pre- and post- pacman upgrades). I still wish to know, in that case, about using both a dedup tool and defragmenting the btrfs filesystem. Also still wondering about these options: no-holes, skinny metadata, or extended inode refs? This is a very helpful discussion. Thank you. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Wed, Nov 1, 2017 at 1:15 AM, Roman Mamedovwrote: > On Wed, 1 Nov 2017 01:00:08 -0400 > Dave wrote: > >> To reconcile those conflicting goals, the only idea I have come up >> with so far is to use btrfs send-receive to perform incremental >> backups as described here: >> https://btrfs.wiki.kernel.org/index.php/Incremental_Backup . > > Another option is to just use the regular rsync to a designated destination > subvolume on the backup host, AND snapshot that subvolume on that host from > time to time (or on backup completions, if you can synchronize that). > > rsync --inplace will keep space usage low as it will not reupload entire files > in case of changes/additions to them. > This seems like a brilliant idea, something that has a lot of potential... On a system where the root filesystem is on an SSD and the backup volume on an HDD, I could rsync hourly, and then run Snapper on the backup volume hourly, as well as using Snapper's timeline cleanup on the backup volume. The live filesystem would have zero snapshots and could be optimized for performance. The backup volume could retain a large number of snapshots (even more than several hundred) because performance would not be very important (as far as I can guess). This seems to resolve our conflict. How about on a system (such as a laptop) with only a single SSD? Would this same idea work where the backup volume is on the same block device? I know that is not technically a backup, but what it does accomplish is separation of the live filesystem from the snapshotted backup volume for performance reasons -- yet the hourly snapshot history is still available. That would seem to meet our use case too. (An external backup disk would be connected to the laptop periodically, of course, too.) Currently, for most btrfs volumes, I have three volumes: the main volume, a snapshot subvolume which contains all the individual snapshots, and a backup volume* (on a different block device but on the same machine). With this new idea, I would have a main volume without any snapshots and a backup volume which contains all the snapshots. It simplifies things on that level and it also simplifies performance tuning on the main volume. In fact it simplifies backup snapshot management too. My initial impression is that this simplifies everything as well as optimizing everything. So surely it must have some disadvantages compared to btrfs send-receive incremental backups (https://btrfs.wiki.kernel.org/index.php/Incremental_Backup). What would those disadvantages be? The first one that comes to mind is that I would lose the functionality of pre- and post- upgrade snapshots on the root filesystem. But I think that's minor. I could either keep those two snapshots for a few hours or days after major upgrades or maybe I could find a pacman hook that uses rsync to make pre- and post- upgrade copies... * Footnote: on some workstation computers, we have 2 or 3 separate backup block devices (e..g, external USB hard drives, etc.). Laptops, however, generally only have a single block device and are not always connected to an external USB hard drive for backup as often as would be ideal. But we also don't keep any critical data on laptops. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
I'm an active user of backups using btrfs snapshots. Generally it works,
with some caveats.

You seem to have two tasks: (1) same-volume snapshots (I would not call
them backups) and (2) updating some backup volume (preferably on a
different box). By solving them separately you can avoid some complexity,
like accidental removal of a snapshot that's still needed for updating
the backup volume.

> To reconcile those conflicting goals, the only idea I have come up
> with so far is to use btrfs send-receive to perform incremental
> backups as described here:
> https://btrfs.wiki.kernel.org/index.php/Incremental_Backup .

As already said by Romain Mamedov, rsync is viable alternative to
send-receive with much less hassle. According to some reports it can
even be faster.

> Given the hourly snapshots, incremental backups are the only practical
> option. They take mere moments. Full backups could take an hour or
> more, which won't work with hourly backups.

I don't see much sense in re-doing full backups to the same physical
device. If you care about backup integrity, it is probably more important
to invest in backups verification. (OTOH, while you didn't reveal data
size, if full backup takes just an hour on your system then why not?)

> We will delete most snapshots on the live volume, but retain many (or
> all) snapshots on the backup block device. Is that a good strategy,
> given my goals?

Depending on the way you use it, retaining even a dozen snapshots on a
live volume might hurt performance (for high-performance databases) or be
completely transparent (for user folders). You may want to experiment
with this number.

In any case I'd not recommend retaining ALL snapshots on the backup
device, even if you have infinite space. Such filesystem would be as
dangerous as the demon core, only good for adding more snapshots (not
even deleting them), and any little mistake will blow everything up. Keep
a few dozen, hundred at most.

Unlike other backup systems, you can fairly easily remove snapshots in
the middle of sequence, use this opportunity. My thinout rule is: remove
snapshot if resulting gap will be less than some fraction (e.g. 1/4) of
its age. One day I'll publish portable solution on github.

> Given this minimal retention of snapshots on the live volume, should I
> defrag it (assuming there is at least 50% free space available on the
> device)? (BTW, is defrag OK on an NVMe drive? or an SSD?)
>
> In the above procedure, would I perform that defrag before or after
> taking the snapshot? Or should I use autodefrag?

I ended up using autodefrag, didn't try manual defragmentation. I don't
use SSDs as backup volumes.

> Should I consider a dedup tool like one of these?

Certainly NOT for snapshot-based backups: it is already deduplicated
almost as much as possible, dedup tools can only make it *less*
deduplicated.

> * Footnote: On the backup device, maybe we will never delete snapshots.
> In any event, that's not a concern now. We'll retain many, many
> snapshots on the backup device.

Again, DO NOT do this, btrfs in its current state does not support it. A
good rule of thumb for the time of some operations is data size
multiplied by the number of snapshots (raised to some power >= 1) and
divided by IO/CPU speed. By creating snapshots it is very easy to create
petabytes of data for the kernel to process, which it won't be able to do
in many years.

--
With Best Regards,
Marat Khalili
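Until such a tool is published, the thin-out rule above could be sketched
in a few lines of shell (untested; it assumes snapshots are named with
timestamps that `date -d` can parse, like the 2017-11-14T12:00 examples
elsewhere in this thread, and the paths are placeholders):

  #!/bin/bash
  # Thin out timestamped snapshots: drop a snapshot if the gap left behind
  # would still be smaller than 1/4 of that snapshot's age.
  snapdir=/mnt/dst/snaps/backup
  frac=4
  now=$(date +%s)

  mapfile -t snaps < <(ls -1 "$snapdir" | sort)
  (( ${#snaps[@]} < 3 )) && exit 0          # nothing to thin

  prev=$(date -d "${snaps[0]}" +%s)         # the oldest snapshot is always kept
  for ((i = 1; i < ${#snaps[@]} - 1; i++)); do
      cur=$(date -d "${snaps[i]}" +%s)
      next=$(date -d "${snaps[i+1]}" +%s)
      gap=$(( next - prev ))                # gap that would remain without snaps[i]
      age=$(( now - cur ))
      if (( gap < age / frac )); then
          echo btrfs subvolume delete "$snapdir/${snaps[i]}"   # remove the echo once verified
      else
          prev=$cur                         # snapshot kept; it becomes the left neighbour
      fi
  done
  # the newest snapshot (last array element) is never touched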
Re: Need help with incremental backup strategy (snapshots, defragmenting & performance)
On Wed, 1 Nov 2017 01:00:08 -0400 Dave wrote:

> To reconcile those conflicting goals, the only idea I have come up
> with so far is to use btrfs send-receive to perform incremental
> backups as described here:
> https://btrfs.wiki.kernel.org/index.php/Incremental_Backup .

Another option is to just use regular rsync to a designated destination subvolume on the backup host, AND snapshot that subvolume on that host from time to time (or on backup completion, if you can synchronize that). rsync --inplace will keep space usage low, as it will not re-upload entire files when changes or additions are made to them. Yes, rsync has to traverse both directory trees to find changes, but that's pretty fast (a couple of minutes at most for a typical root filesystem), especially if you use an SSD or SSD caching.

--
With respect,
Roman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
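A rough sketch of the rsync-plus-snapshot scheme described above, run on the backup host. The SSH source, destination subvolume, and snapshot naming are assumptions for illustration only:

    #!/bin/bash
    # Pull the live data into a plain subvolume with rsync, then freeze
    # the result as a read-only snapshot on the backup host.
    SRC=root@livehost:/home/         # assumed source, over SSH
    DEST=/backup/home                # assumed destination subvolume
    SNAPDIR=/backup/home-snapshots   # assumed snapshot directory

    # --inplace rewrites changed files in place instead of creating
    # temporary copies, keeping extent sharing with older snapshots.
    if rsync -aHAX --delete --inplace "$SRC" "$DEST/"; then
        # Only snapshot if the transfer finished cleanly.
        btrfs subvolume snapshot -r "$DEST" \
            "$SNAPDIR/$(date +%Y-%m-%dT%H:%M)"
    fi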
Need help with incremental backup strategy (snapshots, defragmenting & performance)
Our use case requires snapshots. btrfs snapshots are the best solution we have found for our requirements, and over the last year snapshots have proven their value to us. (For this discussion I am considering both the "root" volume and the "home" volume on a typical desktop workstation. Also, all btrfs volumes are mounted with the noatime and nodiratime flags.)

For performance reasons, I now wish to minimize the number of snapshots retained on the live btrfs volume. However, for backup purposes, I wish to maximize the number of snapshots retained over time. We'll keep yearly, monthly, weekly, daily and hourly snapshots for as long as possible.

To reconcile those conflicting goals, the only idea I have come up with so far is to use btrfs send-receive to perform incremental backups as described here: https://btrfs.wiki.kernel.org/index.php/Incremental_Backup .

Given the hourly snapshots, incremental backups are the only practical option. They take mere moments. Full backups could take an hour or more, which won't work with hourly backups.

We will delete most snapshots on the live volume, but retain many (or all) snapshots on the backup block device. Is that a good strategy, given my goals?

The steps: I know step one is to do the "bootstrapping", where a full initial copy of the live volume is sent to the backup volume. I also know the steps for doing incremental backups. However, the first problem I see is that performing incremental backups requires both the live volume and the backup volume to have an identical "parent" snapshot before each new incremental can be sent. I have found it easy to accidentally delete that specific required parent snapshot when hourly snapshots are being taken and many snapshots exist.

Given that I want to retain the minimum number of snapshots on the live volume, how do I ensure that a valid "parent" subvolume exists there in order to perform the incremental backup? (Again, I have often run into the error "no valid parent exists" when doing incremental backups.)

I think the rule is like this: Do not delete a snapshot from the live volume until the next snapshot based on it has been sent to the backup volume. In other words, always retain the *exact* snapshot that was the last one sent to the backup volume. Deleting that one and then taking another one does not seem sufficient. btrfs does not seem to recognize parent-child-grandchild relationships of snapshots when doing send-receive incremental backups. However, maybe I'm wrong. Would it be sufficient to first take another snapshot, then delete the prior snapshot? Will the send-receive algorithm be able to infer that a parent exists on the backup volume when it receives an incremental based on a child snapshot? (My experience says "no", but I'd like a more authoritative answer.)

The next step in my proposed procedure is to take a new snapshot, send it to the backup volume, and only then delete the prior snapshot (and only from the live volume*). Using this strategy, the live volume will always have the current state (which I guess should not be called a snapshot -- it's the live volume) plus at least one snapshot. Briefly, during the incremental backup, it will have an additional snapshot until the older one gets deleted.
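A minimal sketch of one incremental send/receive cycle following the rule above (keep the exact last-sent snapshot until the next incremental has been received). All paths and snapshot names are assumptions:

    #!/bin/bash
    set -e                            # stop before deleting anything if a step fails
    LIVE=/mnt/live                    # assumed live subvolume
    SNAPS=$LIVE/.snapshots            # assumed snapshot directory on the live volume
    BACKUP=/mnt/backup/snapshots      # assumed destination on the backup volume

    prev=2017-11-17T05:00             # last snapshot already sent (tracked elsewhere)
    new=$(date +%Y-%m-%dT%H:%M)

    # 1. Take a new read-only snapshot of the live subvolume.
    btrfs subvolume snapshot -r "$LIVE" "$SNAPS/$new"

    # 2. Send only the difference against the previous snapshot. The -p
    #    parent must still exist, unmodified, on BOTH sides, otherwise
    #    receive fails for lack of a valid parent.
    btrfs send -p "$SNAPS/$prev" "$SNAPS/$new" | btrfs receive "$BACKUP"

    # 3. Only after the incremental has been received is it safe to drop
    #    the old parent from the live volume; the backup keeps its copy,
    #    and "$new" becomes the parent for the next cycle.
    btrfs subvolume delete "$SNAPS/$prev"

In practice the name of the last successfully sent snapshot would be recorded somewhere (a symlink or a small state file) so the script knows which -p parent to pass on the next run.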
Given this minimal retention of snapshots on the live volume, should I defrag it (assuming there is at least 50% free space available on the device)? (BTW, is defrag OK on an NVMe drive? or an SSD?) In the above procedure, would I perform that defrag before or after taking the snapshot? Or should I use autodefrag?

Should I consider a dedup tool like one of these?

g2p/bedup: Btrfs deduplication
https://github.com/g2p/bedup

markfasheh/duperemove: Tools for deduping file systems
https://github.com/markfasheh/duperemove

Zygo/bees: Best-Effort Extent-Same, a btrfs dedup agent
https://github.com/Zygo/bees

Does anyone care to elaborate on the relationship between a dedup tool like Bees and defragmenting a btrfs filesystem with snapshots? I understand they do opposing things, but I think it was suggested in another thread on defragmenting that they can be combined to good effect. Should I consider this as a possible solution for my situation?

Should I consider any of these options: no-holes, skinny metadata, or extended inode refs?

Finally, are there any good BTRFS performance wiki articles or blogs I should refer to for my situation?

* Footnote: On the backup device, maybe we will never delete snapshots. In any event, that's not a concern now. We'll retain many, many snapshots on the backup device.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
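For reference, the knobs asked about above are usually driven roughly like this; treat it as a hedged sketch, with the mount point, fstab line, and hashfile path made up for illustration:

    # autodefrag is a mount option, e.g. in /etc/fstab (illustrative entry):
    #   UUID=<your-uuid>  /home  btrfs  defaults,noatime,autodefrag  0 0
    # or switched on at runtime:
    mount -o remount,autodefrag /home

    # One-shot manual defragmentation of a tree (recursive, with a target
    # extent size); note that defragmented extents lose their sharing with
    # existing snapshots, which can increase space usage:
    btrfs filesystem defragment -r -t 32M /home

    # Offline dedup with duperemove, keeping its hash database between
    # runs (the hashfile path is an assumption):
    duperemove -rd --hashfile=/var/tmp/duperemove.db /home

    # Features such as no-holes, skinny-metadata and extref are selected
    # at mkfs time; list what your btrfs-progs supports with:
    mkfs.btrfs -O list-all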