Austin S. Hemmelgarn posted on Fri, 07 Apr 2017 07:41:22 -0400 as excerpted:
> 2. Results from 'btrfs scrub'. This is somewhat tricky because scrub
> is either asynchronous or blocks for a _long_ time. The simplest
> option I've found is to fire off an asynchronous scrub to run during
> down-time, and then schedule recurring checks with 'btrfs scrub
> status'. On the plus side, 'btrfs scrub status' already returns
> non-zero if the scrub found errors.

This is (one place) where my "keep it small enough to be
in-practice-manageable" comes in.

I always run my scrubs with -B (don't background -- always, because
I've scripted it), and they normally come back within a minute. =:^)

But that's because I'm running multiple btrfs pair-device raid1 on a
pair of partitioned SSDs, with each independent btrfs built on a
partition from each ssd, and with all partitions under 50 GiB. So
scrubs take less than a minute to run (on the under-1-GiB /var/log it
returns effectively immediately, as soon as I hit enter on the
command), but that's not entirely surprising at the sizes of the
ssd-based btrfs' I'm running.

When scrubs (and balances, and checks) come back in a minute or so, it
makes maintenance /so/ much less of a hassle. =:^)

And the generally single-purpose and relatively small size of each
filesystem means I can, for instance, keep / (with all the system
libs, bins, manpages, and the installed-package database, among other
things) mounted read-only by default, and keep the updates partition
(gentoo, so that's the gentoo and overlay trees, the sources and
binpkg cache, ccache cache, etc.) and the (large non-ssd/non-btrfs)
media partitions unmounted by default.

Which in turn means when something /does/ go wrong, as long as it
wasn't a physical device, there's much less data at risk, because most
of it was probably either unmounted or mounted read-only. Which in
turn means I don't have to worry about scrub/check or other repair on
those filesystems at all, only on the ones that were actually mounted
writable.
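For what it's worth, the fire-and-check approach from the quoted text
above could be sketched roughly like this. The mountpoint, the mail
command, and the check_scrub helper name are all placeholders of my
own, and it assumes (as noted above) a btrfs-progs where 'btrfs scrub
status' exits non-zero when the last scrub found errors:

```shell
# check_scrub: hypothetical helper -- returns non-zero (and prints a
# warning) if the last scrub on the given mountpoint reported errors.
# Relies on 'btrfs scrub status' exiting non-zero in that case.
check_scrub() {
    mnt=$1
    if ! btrfs scrub status "$mnt" >/dev/null; then
        echo "scrub reported errors on $mnt" >&2
        return 1
    fi
    return 0
}

# During a maintenance window, start the scrub asynchronously
# ('btrfs scrub start' backgrounds by default; -B would block instead):
#   btrfs scrub start /mnt/data
#
# Then, from a recurring cron or timer job, pick up the result:
#   check_scrub /mnt/data || mail -s "scrub errors" admin@example.com
```

(Obviously in my scripted -B setup the start and the check collapse
into one foreground command, since the scrub completes in under a
minute anyway.)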
And as mentioned, those scrub and check fast enough that I can
literally wait at the terminal for command completion. =:^)

Of course my setup's what most would call partitioned to the extreme,
but it does have its advantages, and it works well for me, which after
all is the important thing for /my/ setup.

But the more generic point remains: if you set up multi-TB filesystems
that take days or weeks for a maintenance command to complete, running
those maintenance commands isn't going to be something done as often
as it arguably should be, and rebuilding after a filesystem or device
failure is going to take far longer than one would like, as well.
We've seen the reports here. If that's what you're doing, strongly
consider breaking your filesystems down to something rather more
manageable, say a couple of TiB each. Broken along natural usage
lines, it can save a lot on the caffeine and headache pills when
something does go wrong.

Unless of course, like one poster here, you're handling
double-digit-TB super-collider data files. Those tend to be a bit
difficult to store on sub-double-digit-TB filesystems. =:^) But that's
the other extreme from what I've done here, and he actually has a good
/reason/ for /his/ double-digit- or even triple-digit-TB filesystems.
There's not much to be done about his use-case, and indeed, AFAIK he
decided btrfs simply isn't stable and mature enough for that use-case
yet, tho I believe he's using it for some other, more minor and less
gargantuan use-cases.

-- 
Duncan - List replies preferred.  No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman