> I have 2 SSDs with a BTRFS filesystem (RAID) on them and several
> subvolumes.  Every 15 minutes I'm creating read-only snapshots of
> subvolumes /root, /home and /web inside /backup.
> After this I'm searching for the latest common snapshot on /backup_hdd,
> then sending the difference between that latest common snapshot and
> simply the latest snapshot to /backup_hdd.
> On top of all the above there is snapshot rotation, so that /backup
> contains far fewer snapshots than /backup_hdd.
One thing that you imply, but don't actually make explicit except
in the btrfs command output and mount options listing, is that /backup_hdd
is a mountpoint for a second entirely independent btrfs (LABEL=Backup),
while /backup is a subvolume on the primary / btrfs.  Knowing that is
quite helpful in figuring out exactly what you're doing. =:^)

Further, implied but not explicit, since some folks use "hdd" when
referring to SSDs as well, is that the /backup_hdd device is spinning
rust, tho you do make it explicit that the primary btrfs is on SSDs.

> I've been using this setup for the last 7 months or so, and this is
> luckily the longest period in which I've had no problems with BTRFS
> at all.
> However, for the last 2+ months the btrfs receive command has been
> loading the HDD so heavily that I can't even get a directory listing
> on it.
> This happens even if the diff between snapshots is really small.
> The HDD contains 2 filesystems - the mentioned BTRFS one and ext4 for
> other files - so I can't even play an mp3 file from the ext4
> filesystem while btrfs receive is running.
> Since I'm running everything every 15 minutes, this is a real headache.

The *big* question is how many snapshots you have on LABEL=Backup.
You mention rotating backups in /backup, but don't mention rotating/
thinning backups on LABEL=Backup, and you do explicitly state that it
has far more snapshots.  At four snapshots an hour per subvolume,
they'll build up rather fast if you aren't thinning them.

The rest of this post assumes that's the issue, since you didn't mention
thinning out the snapshots on LABEL=Backup.  If you're already familiar
with the snapshot scaling issue and snapshot caps and thinning
recommendations regularly posted here, feel free to skip the below as
it'll simply be review. =:^)

Btrfs has scaling issues when there are too many snapshots.  The
recommendation I've been using is a target of no more than 250 snapshots
per subvolume, with no more than eight, and ideally no more than four,
snapshotted subvolumes per filesystem.  Doing the math, that leads to an
overall per-filesystem target snapshot cap of 1000-2000, and definitely
no more than 3000; by that point the scaling issues are beginning to
kick in, and you'll feel it in lost performance, particularly on
spinning rust, when doing btrfs maintenance such as snapshotting,
send/receive, balance, check, etc.
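For concreteness, the arithmetic behind that 1000-2000 figure is just the multiplication spelled out (a tiny sketch; the variable names are mine, not anything btrfs-defined):

```python
# Rule-of-thumb caps from the recommendation above (targets, not hard btrfs limits).
PER_SUBVOLUME_TARGET = 250   # max snapshots per snapshotted subvolume
IDEAL_SUBVOLS = 4            # ideally no more than 4 snapshotted subvolumes
MAX_SUBVOLS = 8              # and no more than 8 in any case

low_cap = PER_SUBVOLUME_TARGET * IDEAL_SUBVOLS   # 1000
high_cap = PER_SUBVOLUME_TARGET * MAX_SUBVOLS    # 2000
print(low_cap, high_cap)  # 1000 2000
```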

Unfortunately, many people post here complaining about performance issues
when they're running 10K+ or even 100K+ snapshots per filesystem and the
various btrfs maintenance commands have almost ground to a halt. =:^(

You say you're snapshotting three subvolumes, /root, /home and /web, at
15-minute intervals.  That's 3*4=12 snapshots per hour, 12*24=288
snapshots per day.  If all those are on LABEL=Backup, you're hitting the
250-snapshots-per-subvolume target in 250/96 = ... just over two and a
half days, and the total per-filesystem snapshot target cap in 2000/288
= ... just under seven days.

If you've been doing that for 7 months with no thinning, that's
7*30*288= ... over 60K snapshots!  No *WONDER* you're seeing performance
issues!
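Those back-of-the-envelope numbers are easy to check (a quick sketch; the 30-day month is the same rough approximation as above):

```python
subvolumes = 3
per_hour = 4   # one snapshot per subvolume every 15 minutes

per_day = subvolumes * per_hour * 24           # total snapshots per day
per_subvol_cap_days = 250 / (per_hour * 24)    # days to hit 250/subvolume
fs_cap_days = 2000 / per_day                   # days to hit the 2000 total cap
seven_months = 7 * 30 * per_day                # ~7 months with no thinning

print(per_day)        # 288
print(seven_months)   # 60480 -- the "over 60K" above
```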

Meanwhile, say you need a file from a snapshot from six months ago.  Are
you *REALLY* going to care, or even _know_, exactly which 15-minute
snapshot it was?  And even if you do... OK, we'll assume you sort them
by snapshotted subvolume, so you only have to dig thru 20K+ snapshots
instead of 60K+... but digging thru 20K snapshots to find the exact
15-minute snapshot you need is still quite a bit of work!

Instead, suppose you have a "reasonable" thinning program.  First, do
you really need _FOUR_ snapshots an hour on LABEL=Backup?  Say you make
it every 20 minutes, three an hour instead of four.  That already kills
a third of them.  Then, say you take them every 15 or 20 minutes, but
only send one per hour to LABEL=Backup.  (Or if you want, take them
every 15 minutes and send only every other one, half-hourly, to
LABEL=Backup.  The point is to keep it both something you're comfortable
with and more reasonable.)

For illustration, I'll say you send once an hour.  That's 3*24=72
snapshots per day, 24/day per subvolume, already a great improvement over
the 96/day/subvolume and 288/day total you're doing now.

If then, once a day, you thin the third day back down to every other
hour, you'll have 2-3 days' worth of hourly snapshots on LABEL=Backup,
so up to 72 hourly snapshots per subvolume.  If on the 8th day you thin
down to six-hourly, 4/day, cutting out 2/3, you'll have five days at
12/day/subvolume, 60 snapshots per subvolume, plus the 72, for 132
snapshots per subvolume total out to 8 days, so you can recover over a
week's worth at at-least-2-hourly granularity, if needed.

If then on the 32nd day (giving you a month's worth of at least 4X/day),
you cut every other one, giving you twice-a-day snapshots, that's 24
days at 4X/day, or 96 snapshots per subvolume, plus the 132 from before,
228 snapshots per subvolume total, now.

If then on the 92nd day (giving you two more months of 2X/day, a
quarter's worth of at least 2X/day) you again thin every other one, to
one per day, you have 60 days @ 2X/day or 120 snapshots per subvolume,
plus the 228 we had already, 348 snapshots per subvolume, now.

OK, so we're already over our target 250/subvolume, so we could thin a
bit more drastically.  However, we're only snapshotting three
subvolumes, so we can afford a bit of lenience on the per-subvolume cap,
as that cap assumes 4-8 snapshotted subvolumes, and we're still well
under our total filesystem snapshot cap.

If then you keep another quarter's worth of daily snapshots, out to 183
days, that's 91 days of daily snapshots, 91 per subvolume, on top of the
348 we had, so now 439 snapshots per subvolume.

If you then thin to weekly snapshots, cutting 6/7, and keep them around
another 27 weeks (just over half a year, thus over a year total), that's
27 more snapshots per subvolume, plus the 439 we had, 466 snapshots per
subvolume total.

466 snapshots per subvolume total, starting at 3-4X per hour to /backup
and hourly to LABEL=Backup, thinning down gradually to weekly after six
months and keeping those for the rest of the year.  Given that you're
snapshotting three subvolumes, that's 1398 snapshots total, still well
within the 1000-2000 total-snapshots-per-filesystem target cap.
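Per-tier bookkeeping like this is easy to get wrong by hand, so it may be worth scripting.  A minimal sketch, using a hypothetical retention table (band length in days, snapshots kept per day) rather than the exact schedule above:

```python
def snapshots_per_subvolume(bands):
    """Sum snapshot counts over age bands; each band is
    (days_in_band, snapshots_kept_per_day)."""
    return sum(days * per_day for days, per_day in bands)

# Hypothetical example schedule covering one year:
# hourly for 2 days, 4/day out to day 14, daily out to day 90,
# weekly (1/7 per day) out to day 365.
bands = [(2, 24), (12, 4), (76, 1), (275, 1 / 7)]

per_subvol = snapshots_per_subvolume(bands)
total = 3 * per_subvol   # three snapshotted subvolumes, as in this thread
print(round(per_subvol))  # 211 -- under the 250/subvolume target
print(round(total))       # 634 -- well under the per-filesystem cap
```

Adjusting the band boundaries and rates lets you trade recovery granularity against snapshot count while keeping an eye on both caps.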

During that year, if the data is worth it, you should also have made an
offsite, or at least offline, backup; we'll say quarterly.  After that,
keeping the local online backup around is merely for convenience, and
with quarterly backups, after a year you have multiple copies and can
simply delete the year-old snapshots, one a week, probably at the same
time you thin the six-month-old daily snapshots down to weekly.

Compare that total, comfortably under 2000 snapshots, to the 60K+
snapshots you may have now, knowing that scaling beyond 10K snapshots is
an issue particularly on spinning rust, and you should be able to
appreciate the difference it's likely to make. =:^)

But at the same time, in practice it'll probably be much easier to
actually retrieve something from a snapshot a few months old, because you
won't have tens of thousands of effectively useless snapshots to sort
thru as you will be regularly thinning them down! =:^)

> ~> uname -r
> 4.5.0-rc4-haswell
>
> ~> btrfs --version
> btrfs-progs v4.4

You're staying current with your btrfs versions.  Kudos on that! =:^)

And on including btrfs fi show and btrfs fi df, as they were useful, tho
I'm snipping them here.

One more tip.  Btrfs quotas are known to have scaling issues as well.  If
you're using them, they'll exacerbate the problem.  And while I'm not
sure about current 4.4 status, thru 4.3 at least, they were buggy and not
reliable anyway.  So the recommendation is to leave quotas off on btrfs,
and use some other more mature filesystem where they're known to work
reliably if you really need them.

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

First of all, sorry for the delay; for whatever reason I was not
subscribed to the mailing list.

You are right: the RAID is on 2 SSDs, and /backup_hdd (LABEL=Backup) is
a separate, real HDD.

The example was simplified to give an overview without digging too deep
into details.  I actually have correct backup rotation, so we are not
talking about thousands of snapshots :)  Here is the tool I've created
and am using right now:
https://github.com/nazar-pc/just-backup-btrfs
I'm keeping all snapshots for the last day, up to 90 for the last month
and up to 48 throughout the year.
So as a result there are:
* 166 snapshots in /backup_hdd/root
* 166 snapshots in /backup_hdd/home
* 159 snapshots in /backup_hdd/web

I'm not using quotas; there is nothing on this BTRFS partition besides
the mentioned snapshots.
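For what it's worth, those counts sit comfortably inside the targets discussed above (a quick check; the numbers are the per-subvolume counts just listed):

```python
# Snapshot counts per subvolume on /backup_hdd, from the list above.
counts = {"root": 166, "home": 166, "web": 159}

total = sum(counts.values())
print(total)                        # 491
print(max(counts.values()) <= 250)  # True: each subvolume is under the 250 target
print(total <= 2000)                # True: well under the per-filesystem cap
```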

--
Sincerely, Nazar Mokrynskyi
github.com/nazar-pc
Skype: nazar-pc
Diaspora: naza...@diaspora.mokrynskyi.com
Tox: A9D95C9AA5F7A3ED75D83D0292E22ACE84BA40E912185939414475AF28FD2B2A5C8EF5261249

