Re: backing up a file server with many subvolumes
On Mon, 27 Mar 2017 08:57:17 +0300, Marat Khalili wrote:

> Just some consideration, since I've faced a similar but not exactly the
> same problem: use rsync, but create snapshots on the target machine.
> Blind rsync will destroy the deduplication of your snapshots and take a
> huge amount of storage, so it's not a solution. But you can rsync
> --inline your snapshots in chronological order to some folder and
> re-take snapshots of that folder, thus recreating your snapshot
> structure on the target. Obviously, it can/should be automated.

I think it's --inplace and --no-whole-file... Apparently, rsync cannot
detect moved files, which was a big deal for me regarding deduplication,
so I found another solution which is even faster. See my other reply.

> On 26/03/17 06:00, J. Hart wrote:
> > I have a Btrfs filesystem on a backup server. This filesystem has a
> > directory to hold backups for filesystems from remote machines. In
> > this directory is a subdirectory for each machine. Under each machine
> > subdirectory is one directory for each filesystem (e.g. /boot, /home,
> > etc.) on that machine. In each filesystem subdirectory are
> > incremental snapshot subvolumes for that filesystem. The scheme is
> > something like this:
> >
> > /backup/<machine>/<filesystem>/
> >
> > I'd like to try to back up (duplicate) the file server filesystem
> > containing these snapshot subvolumes for each remote machine. The
> > problem is that I don't think I can use send/receive to do this.
> > "Btrfs send" requires "read-only" snapshots, and snapshots are not
> > recursive as yet. I think there are too many subvolumes which change
> > too often to make doing this without recursion practical.
> >
> > Any thoughts would be most appreciated.
> >
> > J. Hart

-- 
Regards,
Kai

Replies to list-only preferred.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
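The --inplace/--no-whole-file staging scheme discussed above could be sketched roughly as follows. This is only an illustration: the paths and snapshot naming are invented, and every command is printed with echo rather than executed, so the logic can be inspected without a btrfs mount.

```shell
#!/bin/sh
# Hedged sketch of the rsync-into-staging + snapshot scheme.
# Paths are hypothetical; drop the echos to use it for real.
backup_cycle() {
    src=$1        # e.g. remote:/home/
    staging=$2    # e.g. /backup/host1/home/staging
    stamp=$(date +%Y%m%d-%H%M)

    # --inplace plus --no-whole-file rewrite only the changed blocks of
    # existing files, so unchanged data stays shared with earlier
    # snapshots of the staging directory
    echo rsync -a --inplace --no-whole-file --delete "$src" "$staging/"

    # re-take a read-only snapshot of staging after each run,
    # recreating the snapshot structure on the target
    echo btrfs subvolume snapshot -r "$staging" "$(dirname "$staging")/$stamp"
}

backup_cycle remote:/home/ /backup/host1/home/staging
```

Note that, as pointed out above, rsync still cannot detect moved files, so renames break sharing even with these options.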
Re: backing up a file server with many subvolumes
On Mon, 27 Mar 2017 07:53:17 -0400, "Austin S. Hemmelgarn" wrote:

> > I'd like to try to back up (duplicate) the file server filesystem
> > containing these snapshot subvolumes for each remote machine. The
> > problem is that I don't think I can use send/receive to do this.
> > "Btrfs send" requires "read-only" snapshots, and snapshots are not
> > recursive as yet. I think there are too many subvolumes which change
> > too often to make doing this without recursion practical.
> >
> > Any thoughts would be most appreciated.
>
> In general, I would tend to agree with everyone else so far if you
> have to keep your current setup. Use rsync with the --inplace option
> to send data to a staging location, then snapshot that staging
> location to do the actual backup.
>
> Now, that said, I could probably give some more specific advice if I
> had a bit more info on how you're actually storing the backups. There
> are three general ways you can do this with BTRFS and subvolumes:
> 1. Send/receive of snapshots from the system being backed up.
> 2. Use some other software to transfer the data into a staging
>    location on the backup server, then snapshot that.
> 3. Use some other software to transfer the data, and have it handle
>    snapshots instead of using BTRFS, possibly having it create
>    subvolumes instead of directories at the top level for each system.

If you decide on (3), I can recommend borgbackup. It allows
variable-block-size deduplication across all backup sources, though to
fully realize that potential, your backups can only be done serially,
not in parallel: borgbackup cannot access the same repository with two
processes in parallel, and deduplication is only per repository.

Another recommendation for backups is the 3-2-1 rule:

* have at least 3 different copies of your data (that means your
  original data, the backup copy, and another backup copy, separated in
  a way such that they cannot fail for the same reason)
* use at least 2 different media (that also means: don't back up btrfs
  to btrfs, and/or use 2 different backup techniques)
* keep at least 1 external copy (maybe rsync to a remote location)

The 3-copy rule can be implemented by using different physical
locations, different device types, different media, and/or different
backup programs. So it's kind of entangled with the 2 and 1 rules.

-- 
Regards,
Kai

Replies to list-only preferred.
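The serial-only constraint on borgbackup mentioned above might look like this in practice. A hedged sketch: host names and paths are made up, and the borg commands are printed rather than executed.

```shell
#!/bin/sh
# Hedged sketch of serial borgbackup runs into one shared repository.
# borg cannot open one repository from two processes at once, and
# deduplication is per repository, so sources are processed strictly
# one after another (hypothetical hosts and paths; commands echoed only).
REPO=/backup/borg-repo

plan_backups() {
    for host in "$@"; do
        # one archive per host per day; all archives in the repository
        # deduplicate against each other
        echo borg create "$REPO::$host-$(date +%Y-%m-%d)" "/srv/staging/$host"
    done
}

plan_backups alpha beta gamma
```

Splitting sources across several repositories would allow parallelism, but at the cost of losing cross-source deduplication.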
Re: backing up a file server with many subvolumes
On 27/03/17 13:00, J. Hart wrote:

> That is a very interesting idea. I'll try some experiments with this.

You might want to look into two tools which I have found useful for
similar backups:

1) rsnapshot -- this uses rsync for backing up multiple systems and has
   been stable for quite a long time. If the target disk is btrfs, it is
   fairly easy to configure it so that it uses btrfs snapshots to create
   and remove the snapshot directories, speeding up the process. This
   doesn't really use any complex btrfs features and has been stable for
   me even on my Debian stable (kernel 3.16.39) system.

2) btrbk -- this allows you to create and manage btrfs snapshots on the
   source disk as well as backup snapshots on a separate btrfs disk. You
   can separately control how many snapshots you keep online on both the
   source and the backup disk. This is particularly useful for cases
   where you want to take very frequent snapshots (say hourly), for
   which rsync may be too slow (and rsync does not take a consistent
   snapshot, of course).

There are many other tools, of course (I also take daily backups with
dar to an ext4 system, without using any btrfs features at all, just in
case a new version of btrfs suddenly decided to correct all copies of
IHATEBTRFS on the disk to ILOVEBTRFS, for example :-) ).

Graham

Note to self: re-read this message periodically to check that feature
hasn't appeared yet.
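For btrbk, a minimal configuration for the source-plus-backup-disk setup described above might look roughly like the following. This is a sketch from memory: the retention values are invented and the option names should be checked against btrbk's own documentation (btrbk.conf(5)) before use.

```
# /etc/btrbk/btrbk.conf -- hypothetical example, values are assumptions
snapshot_preserve_min   2d
snapshot_preserve       14d
target_preserve         20d 10w *m

volume /mnt/btr_pool
  snapshot_dir btrbk_snapshots
  target /mnt/backup-disk/btrbk
  subvolume home
```

The separate snapshot_preserve and target_preserve knobs are what give the independent retention control on source and backup disks mentioned above.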
Re: backing up a file server with many subvolumes
That is a very interesting idea. I'll try some experiments with this.

Many thanks for the assistance :-)

J. Hart

On 03/27/2017 01:57 AM, Marat Khalili wrote:

> Just some consideration, since I've faced a similar but not exactly
> the same problem: use rsync, but create snapshots on the target
> machine. Blind rsync will destroy the deduplication of your snapshots
> and take a huge amount of storage, so it's not a solution. But you can
> rsync --inline your snapshots in chronological order to some folder
> and re-take snapshots of that folder, thus recreating your snapshot
> structure on the target. Obviously, it can/should be automated.
Re: backing up a file server with many subvolumes
On 2017-03-25 23:00, J. Hart wrote:

> I have a Btrfs filesystem on a backup server. This filesystem has a
> directory to hold backups for filesystems from remote machines. In
> this directory is a subdirectory for each machine. Under each machine
> subdirectory is one directory for each filesystem (e.g. /boot, /home,
> etc.) on that machine. In each filesystem subdirectory are incremental
> snapshot subvolumes for that filesystem. The scheme is something like
> this:
>
> /backup/<machine>/<filesystem>/
>
> I'd like to try to back up (duplicate) the file server filesystem
> containing these snapshot subvolumes for each remote machine. The
> problem is that I don't think I can use send/receive to do this.
> "Btrfs send" requires "read-only" snapshots, and snapshots are not
> recursive as yet. I think there are too many subvolumes which change
> too often to make doing this without recursion practical.
>
> Any thoughts would be most appreciated.

In general, I would tend to agree with everyone else so far if you have
to keep your current setup. Use rsync with the --inplace option to send
data to a staging location, then snapshot that staging location to do
the actual backup.

Now, that said, I could probably give some more specific advice if I
had a bit more info on how you're actually storing the backups. There
are three general ways you can do this with BTRFS and subvolumes:

1. Send/receive of snapshots from the system being backed up.
2. Use some other software to transfer the data into a staging location
   on the backup server, then snapshot that.
3. Use some other software to transfer the data, and have it handle
   snapshots instead of using BTRFS, possibly having it create
   subvolumes instead of directories at the top level for each system.

Of the three, I would generally recommend method 2, as it doesn't
require the remote system to be using BTRFS, it generally scales pretty
well, and it also amounts to essentially what people are recommending
you do to back up your backup server.

On the note of needing read-only snapshots: in both cases 1 and 2, your
snapshots should be read-only on the server (method 1 mandates it,
method 2 makes it easy). In case 3, the snapshots should ideally be
marked read-only some other way. Having backups be writable is a bad
idea: it leads to too many opportunities for software to screw things
up, and makes it impossible to tell whether you just accidentally
messed things up or something went wrong in your backup system or
hardware.
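For case 3, marking snapshots read-only "some other way" can be done after the fact. A hedged sketch, with a hypothetical path and the command printed rather than run:

```shell
#!/bin/sh
# Sketch: flip an existing snapshot to read-only after a method-3 backup.
# The path is hypothetical; the command is echoed, not executed.
seal_snapshot() {
    # btrfs allows the read-only flag to be set on an existing subvolume
    echo btrfs property set -ts "$1" ro true
}

seal_snapshot /backup/host1/home/2017-03-27
```

Sealing each snapshot right after the transfer finishes gives method 3 roughly the same tamper-evidence that methods 1 and 2 get for free.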
Re: backing up a file server with many subvolumes
Just some consideration, since I've faced a similar but not exactly the
same problem: use rsync, but create snapshots on the target machine.
Blind rsync will destroy the deduplication of your snapshots and take a
huge amount of storage, so it's not a solution. But you can rsync
--inline your snapshots in chronological order to some folder and
re-take snapshots of that folder, thus recreating your snapshot
structure on the target. Obviously, it can/should be automated.

-- 
With Best Regards,
Marat Khalili

On 26/03/17 06:00, J. Hart wrote:

> I have a Btrfs filesystem on a backup server. This filesystem has a
> directory to hold backups for filesystems from remote machines. In
> this directory is a subdirectory for each machine. Under each machine
> subdirectory is one directory for each filesystem (e.g. /boot, /home,
> etc.) on that machine. In each filesystem subdirectory are incremental
> snapshot subvolumes for that filesystem. The scheme is something like
> this:
>
> /backup/<machine>/<filesystem>/
>
> I'd like to try to back up (duplicate) the file server filesystem
> containing these snapshot subvolumes for each remote machine. The
> problem is that I don't think I can use send/receive to do this.
> "Btrfs send" requires "read-only" snapshots, and snapshots are not
> recursive as yet. I think there are too many subvolumes which change
> too often to make doing this without recursion practical.
>
> Any thoughts would be most appreciated.
>
> J. Hart
Re: backing up a file server with many subvolumes
> [ ... ] In each filesystem subdirectory are incremental snapshot
> subvolumes for that filesystem. [ ... ] The scheme is something like
> this:
>
> /backup/<machine>/<filesystem>/

BTW, hopefully this does not amount to too many subvolumes in the
'.../backup/' volume, because that can create complications, where "too
many" IIRC is more than a few dozen (even if a low number of hundreds
is still doable).

> I'd like to try to back up (duplicate) the file server filesystem
> containing these snapshot subvolumes for each remote machine. The
> problem is that I don't think I can use send/receive to do this.
> "Btrfs send" requires "read-only" snapshots, and snapshots are not
> recursive as yet.

Why is that a problem? What is a recursive snapshot?

> I think there are too many subvolumes which change too often to make
> doing this without recursion practical.

It is not clear to me how the «incremental snapshot subvolumes for that
filesystem» are made, whether with RSYNC or with 'send' and 'receive'
itself. It is also not clear to me why those snapshots «change too
often»; why would they change at all? Once a backup is made, in
whichever way, to an «incremental snapshot», why would that
«incremental snapshot» ever change, except by being deleted?

There are some tools that rely on the specific abilities of 'send' with
options '-p' and '-c' to save a lot of network bandwidth and target
storage space; perhaps you might be interested in searching for them.

Anyhow, I'll repeat here part of an answer to a similar message: issues
like yours are usually based on an incomplete understanding of 'send'
and 'receive', which on IRC user "darkling" explained fairly well:

> When you use -c, you're telling the FS that it can expect to find a
> sent copy of that subvol on the receiving side, and that anything
> shared with it can be sent by reference. OK, so with -c on its own,
> you're telling the FS that "all the data in this subvol already
> exists on the remote".
>
> So, when you send your subvol, *all* of the subvol's metadata is
> sent, and where that metadata refers to an extent that's shared with
> the -c subvol, the extent data isn't sent, because it's known to be
> on the other end already, and can be shared directly from there.
>
> OK. So, with -p, there's a "base" subvol. The send subvol and the -p
> reference subvol are both snapshots of that base (at different
> times). The -p reference subvol, as with -c, is assumed to be on the
> remote FS. However, because it's known to be an earlier version of
> the same data, you can be more efficient in the sending by saying
> "start from the earlier version, and modify it in this way to get the
> new version".
>
> So, with -p, not all of the metadata is sent, because you know you've
> already got most of it on the remote in the form of the earlier
> version.
>
> So -p is "take this thing and apply these differences to it" and -c
> is "build this thing from scratch, but you can share some of the data
> with these sources".

Also, some additional details here:
http://logs.tvrrug.org.uk/logs/%23btrfs/2016-06-29.html#2016-06-29T22:39:59

The requirement for read-only is because in that way it is pretty
certain that the same stuff is on both the origin and target volumes.

It may help to compare with RSYNC: it has to scan both the full origin
and target trees, because it cannot be told that there is a parent tree
that is the same on origin and target; but with option '--link-dest' it
can do something similar to 'send -c'.
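The -p / -c distinction described above, and the rsync --link-dest analogue, can be restated as commands. A hedged sketch: snapshot paths and the receive target are made up, and the pipelines are printed as strings rather than run.

```shell
#!/bin/sh
# Hedged sketch of btrfs send -p vs -c, plus rsync's rough analogue.
# Paths are hypothetical; each function only prints the pipeline it
# describes.

send_incremental() {   # -p: "apply these differences to the earlier version"
    echo "btrfs send -p $1 $2 | btrfs receive /mnt/backup"
}

send_clone_source() {  # -c: full metadata is sent, but extents shared
                       # with the clone source go by reference
    echo "btrfs send -c $1 $2 | btrfs receive /mnt/backup"
}

rsync_link_dest() {    # rsync analogue of -c: hardlink files that are
                       # unchanged relative to a previous backup tree
    echo "rsync -a --link-dest=$1 $2 $3"
}

send_incremental  /snap/home.yesterday /snap/home.today
send_clone_source /snap/other-subvol   /snap/home.today
rsync_link_dest   /backup/home.yesterday remote:/home/ /backup/home.today/
```

Note the analogy is loose: --link-dest shares only whole unchanged files, while -c and -p share at extent granularity.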
Re: backing up a file server with many subvolumes
On Sun, Mar 26, 2017 at 02:14:36PM +0500, Roman Mamedov wrote:

> You could have done time-based snapshots on the top level (for
> /backup/), say, every 6 hours, and keep those for e.g. a month. Then
> don't bother with any other kind of subvolumes/snapshots on the backup
> machine, and do backups from remote machines into their respective
> subdirectories using simple 'rsync'.
>
> That's what a sensible scheme looks like IMO, as opposed to a
> Btrfs-induced exercise in futility that you have (there are
> subvolumes? must use them for everything, even the frigging /boot/;
> there is send/receive? absolutely must use it for backing up; etc.)

Using old boring rsync is actually a pretty good idea, with caveats.

I, for one, don't herd server farms, thus the systems I manage tend to
be special snowflakes. Some run modern btrfs, some are on ancient
kernels; usually / is on mdraid with a traditional filesystem; I have a
bunch of ARM SoCs at home -- plus even an ARM hosted server at
Scaleway. Standardizing on rsync lets me make all those snowflakes back
up the same way. Only on the destination do I make full use of btrfs
features.

Another benefit of rsync is that I don't exactly trust that send from
3.13 to receive on 4.9 won't have a data-loss bug, while rsync is
extremely well tested.

On the other hand, rsync is _slow_. Mere stat() calls on a non-trivial
piece of spinning rust can take half an hour. That's fine in a nightly,
but what if you want to back up important stuff every 3 hours?
Especially if those are, say, Maildir mails -- many, many files to
stat, almost all of them cold. Here send/receive shines. And did I say
that's important stuff? So you send/receive to one target every 3
hours, and rsync nightly to another.

-- 
⢀⣴⠾⠻⢶⣦⠀ Meow!
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ Collisions shmolisions, let's see them find a collision or
⠈⠳⣄⠀⠀⠀⠀ second preimage for double rot13!
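The dual scheme in the last paragraph might be wired up with cron roughly like this. A sketch only: the script name, targets, and times are invented, and the send script stands in for a `btrfs send -p <previous> <current> | ssh ... btrfs receive` pipeline.

```
# hypothetical /etc/crontab fragment for the dual scheme
# every 3 hours: fast incremental send/receive of the important subvolume
0 */3 * * *  root  /usr/local/sbin/snap-and-send-mail.sh
# nightly: well-tested rsync to an independent second target
30 3 * * *   root  rsync -a --inplace /srv/mail/ other-target:/backup/mail/
```

Keeping the two targets independent also satisfies part of the 3-2-1 rule mentioned elsewhere in the thread: the fast path and the trusted path cannot fail for the same reason.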
Re: backing up a file server with many subvolumes
On Sat, 25 Mar 2017 23:00:20 -0400, "J. Hart" wrote:

> I have a Btrfs filesystem on a backup server. This filesystem has a
> directory to hold backups for filesystems from remote machines. In
> this directory is a subdirectory for each machine. Under each machine
> subdirectory is one directory for each filesystem (e.g. /boot, /home,
> etc.) on that machine. In each filesystem subdirectory are incremental
> snapshot subvolumes for that filesystem. The scheme is something like
> this:
>
> /backup/<machine>/<filesystem>/
>
> I'd like to try to back up (duplicate) the file server filesystem
> containing these snapshot subvolumes for each remote machine. The
> problem is that I don't think I can use send/receive to do this.
> "Btrfs send" requires "read-only" snapshots, and snapshots are not
> recursive as yet. I think there are too many subvolumes which change
> too often to make doing this without recursion practical.

You could have done time-based snapshots on the top level (for
/backup/), say, every 6 hours, and keep those for e.g. a month. Then
don't bother with any other kind of subvolumes/snapshots on the backup
machine, and do backups from remote machines into their respective
subdirectories using simple 'rsync'.

That's what a sensible scheme looks like IMO, as opposed to the
Btrfs-induced exercise in futility that you have (there are subvolumes?
must use them for everything, even the frigging /boot/; there is
send/receive? absolutely must use it for backing up; etc.)

-- 
With respect,
Roman
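The top-level scheme suggested above could be sketched as follows. A hedged illustration: paths are hypothetical, the 6-hour cadence would come from cron, and the btrfs commands are printed rather than executed.

```shell
#!/bin/sh
# Hedged sketch of the top-level scheme: one read-only snapshot of
# /backup every 6 hours (driven by cron), pruned after about a month.
# Paths are hypothetical; commands are echoed, not executed.
SNAPDIR=${SNAPDIR:-/backup-snapshots}

snapshot_toplevel() {
    # a single snapshot covers every machine's rsync'd subdirectory
    echo btrfs subvolume snapshot -r /backup "$SNAPDIR/$(date +%Y%m%d-%H)"
}

prune_old() {
    # print a delete command for each snapshot older than ~a month
    find "$SNAPDIR" -mindepth 1 -maxdepth 1 -mtime +30 \
        -exec echo btrfs subvolume delete {} \;
}

snapshot_toplevel
```

Because snapshotting happens only at the top level, the remote machines need nothing more exotic than plain rsync.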
backing up a file server with many subvolumes
I have a Btrfs filesystem on a backup server. This filesystem has a
directory to hold backups for filesystems from remote machines. In this
directory is a subdirectory for each machine. Under each machine
subdirectory is one directory for each filesystem (e.g. /boot, /home,
etc.) on that machine. In each filesystem subdirectory are incremental
snapshot subvolumes for that filesystem. The scheme is something like
this:

/backup/<machine>/<filesystem>/

I'd like to try to back up (duplicate) the file server filesystem
containing these snapshot subvolumes for each remote machine. The
problem is that I don't think I can use send/receive to do this. "Btrfs
send" requires "read-only" snapshots, and snapshots are not recursive
as yet. I think there are too many subvolumes which change too often to
make doing this without recursion practical.

Any thoughts would be most appreciated.

J. Hart