Henk Slager posted on Thu, 25 Feb 2016 03:07:12 +0100 as excerpted:

> On Tue, Feb 23, 2016 at 1:19 AM, Duncan <1i5t5.dun...@cox.net> wrote:
>>
>> I've not seen anyone else explicitly list the following as a practical
>> btrfs send/receive backup strategy, but it follows rather directly
>> from the tools' STDOUT/STDIN usage, at least in theory.  My primary
>> worry would be the general one of btrfs maturity: btrfs and its tools,
>> including btrfs send and receive, are still stabilizing and maturing,
>> with occasional bugs being found, and the following strategy won't
>> find the receive bugs until restore time, at which point you might be
>> depending on it working.  So the strategy is really only appropriate
>> once btrfs has settled down and matured somewhat more.
>>
>> So here's the idea.
>>
>> 1) Btrfs send directly to files on some other filesystem, perhaps xfs,
>> intended to be used with larger files.  This can either be non-
>> incremental, or (much like full and incremental tape backups) initial
>> full, plus incremental sends.
> 
> I had not thought of the tape-archive method, interesting :)
> I am using this more or less, although not fully automated. It looks
> like:
> 
> btrfs send -p snap_base snap_last | tee
> /path-on-non-btrfs-fs/snap_base..snap_last.btrfs | btrfs receive
> /path-on-btrfs-fs/
> 
> The key thing is to keep the diffs as small as possible so that I can
> transport them over ~1 Mbps internet.  But sometimes the diff is huge,
> for example when an upgrade of an OS in a VM has been done.  Then I
> carry the snap_last.btrfs 'by hand'.
> 
> If you mean sort of an  xfs receive /path-on-xfs-fs/  for the last step
> in the command-line pipe, then such an 'xfs receive' implementation
> would face quite some challenges I think, but it's not impossible.

No, I meant pretty much what you are doing, except just directing to a 
file, instead of using tee to send it to btrfs receive as well.  The tee 
usage is a variant I hadn't thought of, and it's actually quite a 
creative solution to the problem you describe. =:^)
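
To illustrate, the plain file variant I had in mind would be something 
like this (snapshot and mount paths purely illustrative, of course):

  # initial full send, stored as a file on the archive filesystem
  btrfs send /snaps/snap_base > /mnt/archive/snap_base.btrfs

  # later incremental sends, stored alongside it
  btrfs send -p /snaps/snap_base /snaps/snap_last \
      > /mnt/archive/snap_base..snap_last.btrfs

Nothing gets received anywhere until an actual restore is needed.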

The reason I suggested xfs is that, based on what I know at least, xfs is 
supposed to be really good at handling large files, generally using a 
large block size, etc.  Perfect for long-term storage of likely multi-gig 
serialized backup streams.  But something like fat32, set up with a large 
block size, should work well too, and its lack of ownership metadata 
shouldn't really be an issue when the files are all simply rather large 
stream backups.

And actually, to make the parallel to tape backup even more direct, I 
/believe/ you could potentially use tar or the like for its original 
purpose as a tape-archiver, feeding the streams via tar directly to a raw 
device without a filesystem at all.  Just tar, which I /believe/ would 
provide indexing and let you later write a second (incremental) btrfs 
send file after the first one, and later a third after the second, etc.  
Except I'm not actually familiar with using tar that way, and it's quite 
possible tar doesn't work the way I think it does in that regard and/or 
simply isn't the best tool for the job.  But in theory at least, as long 
as you either manually tracked the blocks used for each send stream, or 
had something like tar doing it automatically, you wouldn't even need a 
proper filesystem and could use a raw device, either a block device like 
a big multi-terabyte disk, or even a char/stream device like an archiving 
tape-drive.
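
Something like the following is what I have in mind, tho again, I've not 
actually tested tar against a raw device this way, so treat it as an 
unverified sketch (device and paths illustrative):

  # stage each send stream as a file, then archive it to the raw device
  btrfs send /snaps/snap_base > /tmp/snap_base.btrfs
  tar -cf /dev/sdx /tmp/snap_base.btrfs     # create the archive

  btrfs send -p /snaps/snap_base /snaps/snap_last > /tmp/snap_last.btrfs
  tar -rf /dev/sdx /tmp/snap_last.btrfs     # append the incremental

  tar -tf /dev/sdx                          # tar's "index": list streams
  tar -xOf /dev/sdx tmp/snap_base.btrfs | btrfs receive /mnt/restore/

(Tar strips the leading / from member names, hence the tmp/... path on 
extraction, and -r append may or may not behave on an actual tape drive.)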

>> 2) Store the backups as those send files, much like tape backup
>> archives.  One option would be to do the initial full send, and then
>> incremental sends as new files, until the multi-TB drive containing the
>> backups is full, at which point replace it and start with a new full
>> send to the fresh xfs or whatever on the new drive.
> 
> The issue here is that at the point you do a new full backup, you will
> need more than double the space of the original in order to still have
> a valid backup at all times.  If it is backing up 'small SSD' to 'big
> HDD', then it's not such an issue.

The idea here would be to rotate backup media.  But you are correct, in 
simplest form you'd need larger backup media than the original being 
backed up, tho that might be small ssd to big hdd, or simply 1 TB hdd to 
say one of those huge 8 TB SMR drives, which I believe are actually 
/intended/ for long-term archiving in this manner.

So taking that 1 TB hdd to 8 TB SMR archiving hdd example, you wouldn't 
let the 1 TB get totally full, so say 700 GB of data in the original full 
send.  Then say incrementals average 50 GB.  (We're using decimal units 
here instead of GiB/TiB just to make the math easier.  After all, this is 
only an illustration.)

8 TB - 700 GB = 7.3 TB = 7300 GB left.  7300 GB / 50 GB = space for 146 
incrementals averaging 50 GB each.  So say that's 50 GB per day average 
with daily incrementals.  That'll fill roughly two and a half 8 TB 
archive drives per year, so to make the numbers nice and round, say five 
drives in rotation, keeping two years' worth of backups.
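
For anyone rerunning the numbers with their own sizes, it's a one-liner 
(decimal units as above):

  awk 'BEGIN { print (8000-700)/50 }'   # -> 146 incrementals per drive
  awk 'BEGIN { print 365/146 }'         # -> ~2.5 drives filled per year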

And each time you switch out archive drives, at least twice a year, you 
start with a full send, so you have it conveniently there on the same 
device as the incrementals and don't have to worry about tracking a 
second drive with the full send before you can replay your incrementals.

Of course if your primary/working and backup media are closer to the same 
size, perhaps a 4-device by 4 TB btrfs raid10, 8 TB usable space, working 
copy, and 8 TB archive backups, with a correspondingly larger average 
incremental send size as well, you'd use pairs of backup devices, one for 
the full send, one for the incrementals, and rotate in a second pair of 
8 TB devices when the first pair got full.  And there's all sorts of 
individual variants on the same theme.

> 
> 
>> 3) When a restore is needed, then and only then, play back those
>> backups to a newly created btrfs using btrfs receive.  If the above
>> initial full plus incrementals until the backup media is full strategy
>> is used, the incrementals can be played back against the initial full,
>> just as the send was originally done.
> 
> Yes indeed.  My motivation for this method was/is that unpacking (so
> doing the  btrfs receive ) takes time if it is a huge number of small
> files on a HDD.

And the advantage: until a restore is actually needed, no playback is 
done.  So in the above five-archive-devices-over-two-years case, if the 
production copy continues working for two years, that's say four full 
sends and 146*4 incrementals that will never need to be played back at 
all, thus reclaiming the time and energy that would have been 
unnecessarily spent maintaining the played-back copy over that period.
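
In command terms, restore day would look something like this (device and 
paths illustrative):

  # create and mount a fresh btrfs, then replay the archived streams
  # in their original order
  mkfs.btrfs /dev/sdy
  mount /dev/sdy /mnt/restore
  btrfs receive -f /mnt/archive/snap_base.btrfs /mnt/restore/
  btrfs receive -f /mnt/archive/snap_base..snap_last.btrfs /mnt/restore/

Each incremental replays against the snapshot the previous receive 
created, just as the sends were chained when they were made.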

>> Seems to me this should work fine, except as I said, that receive
>> errors would only be caught at the time receive is actually run, which
>> would be on restore.  But as most of those errors tend to be due to
>> incremental bugs, doing full sends all the time would eliminate them,
>> at the cost of much higher space usage over time, of course.  And if
>> incrementals /are/ done, with any luck, replay won't happen for quite
>> some time, and will thus use a much newer and hopefully more mature
>> btrfs receive, with fewer bugs thanks to the fixes made in the
>> intervening time.  Additionally, with any luck, several generations of
>> full backup plus incrementals will have been done before the need to
>> replay even one set, thus sparing the need to replay the intervening
>> sets entirely.
> 
> On the other hand, not replaying them means they cannot be used for a
> lower-performance backup or clone server, and there is no way to check
> the actual state.  And there could also be silent send errors.
> If you do playback immediately, creating a writable snapshot on the
> master and clone(s) sides allows online checking of potential diffs
> (rsync -c ) and copying over the differences.

That is the primary disadvantage I suggested, and the reason one probably 
would not want to use this method until btrfs, including send/receive, 
has fully stabilized.  Simply put, at this point it's not yet stable 
enough to trust, without verification, that a stream would actually 
receive properly.

But once btrfs is fully stable and people are routinely using send/
receive without known bugs for quite some time, then this scenario may 
well be quite viable.
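
For reference, the verify-immediately approach Henk describes might look 
something like this sketch (paths and the clone hostname illustrative):

  # replay on the clone immediately, then make writable snapshots on
  # both sides and checksum-compare them, copying any differences
  btrfs subvolume snapshot /snaps/snap_last /snaps/snap_last.rw
  ssh clone btrfs subvolume snapshot /snaps/snap_last /snaps/snap_last.rw
  rsync -avc /snaps/snap_last.rw/ clone:/snaps/snap_last.rw/

The -c forces rsync to do full-content checksums rather than its usual 
size/mtime check, so silent differences actually get caught.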

> Using  btrfs sub find-new , I once discovered some 100 MB of difference
> in a multi-TB data set.  It was only 2 OS/VM image files, on different
> clones.  It probably happened sometime in early 2015, but I'm quite
> unsure, so I'm not sure which kernel/tools were involved.

To my knowledge, there has been exactly one such known bug since the 
initial feature-introduction bugs were worked thru, where a successful 
send and receive didn't produce a valid copy.  And AFAIK, that didn't 
actually turn out to be a send/receive bug, but ultimately traced to a 
more general btrfs bug; it was simply send/receive that happened to 
catch it.

So it'd be interesting to have more information about that event and 
track down what happened.  But it's likely to be way too late, with way 
too little reliable information about it still available, to do anything 
about it now.
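
For anyone wanting to run a similar check, the find-new comparison can 
be sketched like this (paths illustrative):

  # grab the generation marker of the base snapshot...
  gen=$(btrfs subvolume find-new /snaps/snap_base 9999999 \
        | awk '{print $NF}')
  # ...then list files in the newer snapshot changed since that generation
  btrfs subvolume find-new /snaps/snap_last "$gen"

Given an impossibly high generation number, find-new prints only its 
'transid marker' line, which is what the awk pulls out.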


Meanwhile, most of the bugs I'm aware of have been in receive's handling 
of various corner-cases.  And as I pointed out, if you aren't replaying/
receiving immediately, but instead archiving the raw send streams to be 
replayed later, then in the event you /do/ need to replay a stream, 
receive will have matured further in the mean time, compared to playing 
the stream back into a receive of the same version as the send that 
produced it.  If it's two years later, that's two years' worth of 
further bug fixes, so in theory at least, the chances of a successful 
replay and receive should be better after waiting than they would have 
been had the replay been done immediately.

Which /somewhat/ counteracts the problem of btrfs receive in particular 
not yet being totally mature.  However, it doesn't entirely counteract 
the problem, and I'd still consider this solution too dangerous to use 
in practice at the current time.  Tho in say five years, it should be a 
much more viable solution to consider.


But I /really/ like your tee idea in this regard. =:^)  For people doing 
multiple levels of backup, it effectively takes care of two levels of 
backup at the same time.  By teeing and replaying immediately (or simply 
replaying the stored send stream immediately, then keeping it instead of 
deleting it), you test that the stream works, and end up with the working 
btrfs-level backup.  By then continuing to archive the now-tested send 
stream, you have a second level of backup that can be replayed again, 
should something happen to both the production version and the btrfs-
level backup that you were replaying to for testing.
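
In its simplest form that's just your pipeline with the stored stream 
kept afterward, or equivalently, store first and then test-replay from 
the stored file (paths illustrative):

  # one pass, two backup levels: archive the stream AND receive it
  btrfs send -p /snaps/snap_base /snaps/snap_last \
    | tee /mnt/archive/snap_base..snap_last.btrfs \
    | btrfs receive /mnt/backup/

  # or: store first, then replay the stored copy and keep it
  btrfs send -p /snaps/snap_base /snaps/snap_last \
      > /mnt/archive/snap_base..snap_last.btrfs
  btrfs receive -f /mnt/archive/snap_base..snap_last.btrfs /mnt/backup/

Either way, a successful receive demonstrates that the archived stream 
itself is replayable, not just that the send succeeded.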

That effectively gives you the best of both worlds. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
