Robert White posted on Tue, 21 Oct 2014 18:10:27 -0700 as excerpted:

> Each snapshot is effectively stapling down one version of your entire
> metadata tree, right? So imagine leaving tape spikes (little marks on
> the floor to keep track of where something is so you can put it back)
> for the last 150 or 5000 positions of the chair you are sitting in. At
> some point the clarity and purpose of those marks becomes the opposite
> of useful.
> 
> Hourly for a day, daily for a week, weekly for a month, monthly for a
> year. And it's not a "backup" if you haven't moved it to another device.
> If you have 5k snapshots of a file that didn't change, you are still
> just one bad disk sector away from never having that data again because
> there's only one copy of the actual data stapled down in all of those
> snapshots.

Exactly.

I explain the same thing in different words:

(Note: "You" in this post is variously used to indicate the parent 
poster, and a "general you", including but not limited to the grandparent 
poster inquiring about his 5000 hourly snapshots.  As I'm not trying to 
write a book or a term paper I actively suppose it should be clear to 
which "you" I'm referring in each case based on context...)

Say you are taking hourly snapshots of a file, and you mistakenly delete 
it or need a copy from some time earlier.

If you figure that out a day later, yes, the hour the snapshot was taken 
can make a big difference.

If you don't figure it out until a month later, then is it going to be 
REALLY critical which HOUR you pick, or is simply picking one hour in the 
correct day (or possibly half-day) going to be as good, knowing that if 
you guess wrong you can always go back or forward another whole day?

And if it's a year later, is even the particular day going to matter, or 
is going forward or backward a week or a month going to be good enough?

And say it *IS* a year later, and the actual hour *DOES* matter.  A year 
later, exactly how are you planning to remember the EXACT hour you need, 
such that simply randomly picking just one out of the day or week is 
going to make THAT big a difference?

As you said but adjusted slightly to even out the weeks vs months, hourly 
for a day (or two), daily to complete the week (or two), weekly to 
complete the quarter (13 weeks), and if desired, quarterly for a year or 
two.

But as you also rightly pointed out, just as "if it's not tested, it's 
not a backup", if it's not on an entirely separate device and filesystem, 
it's not a backup.

And if you don't have real backups at least every quarter, why on earth 
are you worrying about a year's worth of hourly snapshots?  If disaster 
strikes and the filesystem blows up, without a separate backup, they're 
all gone, so why the trouble to keep them around in the first place?

And once you have that quarterly or whatever backup, then the advantage 
of continuing to lock down those 90-day-stale copies of all those files 
and metadata goes down dramatically, since if worst comes to worst, you 
simply retrieve it from backup, but meanwhile, all that stale locked-down 
data and metadata is eating up room and dramatically complicating the job 
btrfs must do to manage it all!

Yes, there are use-cases and there are use-cases.  But if you aren't 
keeping at least quarterly backups, perhaps you better examine your 
backup plan and see if it really DOES match your use-case, ESPECIALLY if 
you're keeping thousands of snapshots around.  And once you DO have those 
quarterly or whatever backups, then do you REALLY need to keep around 
even quarterly snapshots covering the SAME period?

But let's say you do:

48 hourly snapshots, thinned after that to...

12 daily snapshots (2 weeks = 14, minus the two days of hourly), thinned 
after that to...

11 weekly snapshots (1 quarter = 13 weeks, minus the two weeks of daily), 
thinned after that to...

7 quarterly snapshots (2 years = 8 quarters, minus the quarter of weekly).

48 + 12 + 11 + 7 = ...

78 snapshots, appropriately spaced by age, covering two full years.
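
To make that arithmetic easy to re-run with your own numbers, here's a 
minimal Python sketch.  The tier names and counts are just the schedule 
above expressed as data; it's an illustration, not a snapshot manager:

  # Minimal sketch: total snapshots retained under a thinning schedule.
  # The tiers are the hourly-for-two-days schedule from this post; swap
  # in your own (tier, count) pairs to evaluate other policies.
  schedule = [
      ("hourly",    48),  # 2 days of hourly snapshots
      ("daily",     12),  # rest of 2 weeks: 14 days - 2 days of hourly
      ("weekly",    11),  # rest of a quarter: 13 weeks - 2 weeks of daily
      ("quarterly",  7),  # rest of 2 years: 8 quarters - 1 quarter of weekly
  ]
  total = sum(count for _, count in schedule)
  print(total)  # -> 78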

I've even done the math for the extreme case of per-minute snapshots.  
With reasonable thinning along the lines of the above, even per-minute 
snapshots end up at well under 300 to reasonably manage at any one time.
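
For the curious, one plausible per-minute thinning (my illustrative 
numbers here, nothing canonical) that lands comfortably under that:

  # Per-minute for 2 hours, then the same hourly/daily/weekly/quarterly
  # tiers as before, each minus the span already covered.
  per_minute = 120       # 2 hours of per-minute snapshots
  hourly     = 48 - 2    # rest of 2 days of hourly
  daily      = 14 - 2    # rest of 2 weeks
  weekly     = 13 - 2    # rest of a quarter
  quarterly  =  8 - 1    # rest of 2 years
  print(per_minute + hourly + daily + weekly + quarterly)  # -> 196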

And keeping it under 300 snapshots really DOES help btrfs in terms of 
management task time-scaling.

If you're doing hourly, as I said, 78, tho killing the quarterly 
snapshots entirely because they're backed up reduces that to 71, but 
let's just say, EASILY under 100.

Tho that is of course per subvolume.  If you have multiple subvolumes on 
the same filesystem, that can still end up being a thousand or two 
snapshots per filesystem.  But those are all groups of something under 
300 (under 100 with hourly) that are highly connected to each other, and 
it's the interweaving inside each of those groups that is the real 
complexity in terms of btrfs management.

But 5000 snapshots?

Why?  Are you *TRYING* to test btrfs until it breaks, or TRYING to 
demonstrate a balance taking an entire year?

Do a real backup (or more than one, using those snapshots) if you need 
to, then thin the snapshots to something reasonable.  As the above 
example shows, if it's a single subvolume being snapshotted, with hourly 
snapshots, 100 is /more/ than reasonable.

With some hard questions, keeping in mind the cost in extra maintenance 
time for each additional snapshot, you might even find that a minimum 
spacing of 6 hours between snapshots (four per day) instead of 1 hour 
(24 per day) is fine.  Or you might find that you only need to keep 
hourly snapshots for 12 hours instead of the 48 I assumed above, and 
daily snapshots for a week instead of the two I assumed above.  Throwing 
in the nothing-over-a-quarter-because-it's-backed-up assumption as well, 
that's...

8 4x-daily snapshots (2 days)

5 daily snapshots (a week, minus the two days above)

12 weekly snapshots (a quarter, minus the week above, then it's backed up 
to other storage)

8 + 5 + 12 = ...

25 snapshots total, 6 hours apart (four per day) at maximum frequency aka 
minimum spacing, reasonably spaced by age to no more than a week apart, 
with real backups taking over after a quarter.
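
If you want to automate that sort of rotation, a small cron-driven 
script is the usual approach.  A minimal sketch (the paths, the 
timestamp naming, and the crude oldest-first thinning are my 
assumptions; the btrfs subvolume snapshot/delete commands are the real 
ones):

  import os
  import subprocess
  from datetime import datetime

  SUBVOL   = "/mnt/data"             # subvolume to protect (assumed path)
  SNAP_DIR = "/mnt/data/.snapshots"  # read-only snapshots live here
  KEEP     = 25                      # the snapshot budget worked out above

  def take_snapshot():
      """Create a read-only snapshot named by the current time."""
      name = datetime.now().strftime("%Y%m%d-%H%M")
      subprocess.run(["btrfs", "subvolume", "snapshot", "-r",
                      SUBVOL, f"{SNAP_DIR}/{name}"], check=True)

  def thin_snapshots():
      """Crude thinning: delete the oldest snapshots beyond the budget.
      A real rotation would keep the tiered 6-hourly/daily/weekly
      spacing; this only shows the delete side of the mechanism."""
      snaps = sorted(os.listdir(SNAP_DIR))  # timestamp names sort by age
      for old in snaps[:-KEEP]:
          subprocess.run(["btrfs", "subvolume", "delete",
                          f"{SNAP_DIR}/{old}"], check=True)

  if __name__ == "__main__":
      take_snapshot()
      thin_snapshots()

Run it from cron at your chosen frequency, and adjust KEEP and the 
thinning logic to match whatever tiers you settle on.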

Btrfs should be able to work thru that in something actually approaching 
reasonable time, even if you /are/ dealing with 4 TB of data. =:^)

Bonus hints:

Btrfs quotas significantly complicate management as well.  If you really 
need them, fine, but don't unnecessarily use them just because they are 
there.
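
If you did enable quotas experimentally and don't actually need them, 
they can simply be switched off again.  In sketch form (the mountpoint 
is an assumption; btrfs quota disable is the real command):

  import subprocess
  subprocess.run(["btrfs", "quota", "disable", "/mnt/data"], check=True)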

Look into defrag.

If you don't have any half-gig-plus VMs or databases or similar "internal 
rewrite pattern" files, consider the autodefrag mount option.  Note that 
if you haven't been using it and your files are highly fragmented, it can 
slow things down at first, but a manual defrag, possibly a directory tree 
at a time to split the job into reasonable sizes and timeframes, can 
help.
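
In sketch form, tree-at-a-time (the directory list is illustrative; 
btrfs filesystem defragment -r is the real recursive invocation):

  import subprocess

  # Defrag one directory tree at a time, so each run stays a manageable
  # size and duration.  Replace with your own trees.
  for tree in ["/home/user/docs", "/home/user/media", "/home/user/src"]:
      subprocess.run(["btrfs", "filesystem", "defragment", "-r", tree],
                     check=True)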

If you are running large VMs or databases or other half-gig-plus sized 
internal-rewrite-pattern files, the autodefrag mount option may not 
perform well for you.  There are other options for that, including 
separate subvolumes, setting nocow on those files, and setting up a 
scheduled defrag.  That's out of scope for this post, so do your 
research.  It has certainly been discussed enough on-list.
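
One detail worth knowing on the nocow option: it only takes full effect 
on files created after it's set, so the usual pattern is to set it on an 
empty directory first and let new files inherit it.  A sketch, with the 
path assumed:

  import os
  import subprocess

  VM_DIR = "/mnt/data/vm-images"  # example path for VM images

  # chattr +C on a directory makes new files created inside inherit
  # nocow; it has to happen before the data is written.
  os.makedirs(VM_DIR, exist_ok=True)
  subprocess.run(["chattr", "+C", VM_DIR], check=True)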

Meanwhile, do note that defrag is currently snapshot-aware-disabled, due 
to scaling issues.  IOW, if your files are highly fragmented as they may 
well be if you haven't been regularly defragging them, expect the defrag 
to eat a lot of space since it'll break the sharing with older snapshots 
as anything that defrag moves will be unshared.  However, if you've 
reduced snapshots to the quarter-max before off-filesystem backup as 
recommended above, a quarter from now all the undefragged snapshots will 
be expired and off the system and you'll have reclaimed that extra space. 
Meanwhile, your system should be /much/ easier to manage and will likely 
be snappier in its response as well.  =:^)

With all these points applied, balance performance should improve 
dramatically.  However, with 4 TiB of data the sheer data size will 
remain a factor.  Even in the best case, typical thruput on spinning 
rust won't reach the ideal; 10 MiB/sec is a reasonable guide.  
4 TiB / 10 MiB/sec...

4*1024*1024 (MiB) /  10 MiB / sec = ...

nearly 420 thousand seconds ... / 60 sec/min = ...

7000 minutes ... / 60 min/hour = ...

nearly 120 hours or ...

a bit under 5 days.
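
Or the same back-of-the-envelope figuring in a few lines of Python:

  data_mib = 4 * 1024 * 1024   # 4 TiB expressed in MiB
  rate     = 10                # MiB/sec, rough spinning-rust guide
  seconds  = data_mib / rate   # ~419,430 sec
  hours    = seconds / 3600    # ~116.5 hours
  days     = hours / 24        # ~4.9 days
  print(f"{seconds:,.0f} sec = {hours:.0f} hours = {days:.1f} days")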


So 4 TiB on spinning rust could reasonably take about 5 days to balance 
even under quite good conditions.  That's due to the simple mechanics of 
head seek to read, head seek again to write, on spinning rust, and the 
sheer size of 4 TiB of data and metadata (tho with a bit of luck some of 
that will disappear as you thin out those thousands of snapshots, and 
it'll be more like 3 TiB than 4, or possibly even down to 2 TiB, by the 
time you actually do it).

IOW, it's not going to be instant, by any means.

But the good part of it is that you don't have to do it all at once.  You 
can use balance filters and balance start/pause/resume/cancel as 
necessary, to do only a portion of it at a time, and restart the balance 
with the convert,soft filter options when you have time to let it run, so 
it doesn't redo already-converted chunks.  As long as it completes at 
least one chunk each run, it'll make progress.
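
In command terms, a staged balance might look like the sketch below 
(wrapped in Python to match the other sketches; the usage cutoffs and 
the raid1 target are arbitrary examples, while usage, convert and soft 
are real balance filters):

  import subprocess

  MNT = "/mnt/data"  # assumed mountpoint

  # Work thru the balance in stages with the usage filter: first chunks
  # under 10% full (cheap to consolidate), then fuller ones.
  for cutoff in (10, 25, 50):
      subprocess.run(["btrfs", "balance", "start",
                      f"-dusage={cutoff}", MNT], check=True)

  # For a profile conversion, "soft" skips chunks already in the target
  # profile, so a restarted convert doesn't redo finished chunks, e.g.:
  #   btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt
  # And pause/resume/cancel act on a running balance:
  #   btrfs balance pause /mnt
  #   btrfs balance resume /mnt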

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
