On 2015-11-24 12:23, Christoph Anton Mitterer wrote:
On Tue, 2015-11-24 at 11:14 -0600, Eric Sandeen wrote:
In a nutshell, though, I think a filesystem repair should be an
admin-initiated action, not something that surprises you on a boot, at
least for a journaling filesystem which is designed to maintain its
integrity even in the face of a power loss or crash.

Well I wouldn't agree here... I maintain some >2PiB of storage for a
LHC Tier-2,... right now everything with ext4.
During normal operation we can of course not have any fsck, but every
now and then, when we reboot, it happens automatically,... and
regularly shows at least some (apparently non-serious) glitches.
Yeah, that's pretty normal for any large storage array with high uptime. ext4 also doesn't correct anything on the fly, so when you don't reboot often it's all the more important to run a check on every boot. (That's also why I personally suggest something like GlusterFS or Ceph for large-scale data storage: you can reboot individual nodes one at a time, have zero downtime, and still maintain a high degree of performance and data safety.)

IMHO, either the kernel driver itself already checks "everything", then
we wouldn't need a dedicated check tool.
Or it does not, but in that case, there will be people who want to have
that in-depth checks run regularly (and even if it's just every half a
year).
I'd rather wait half an hour at boot and find such errors than have
them silently pile up until I really run into trouble.
Well, that depends on the type of errors. XFS usually doesn't need an fsck on mount, but there is still an xfs_repair tool for fixing badly damaged filesystems that the kernel can't mount. btrfs check falls into the same general usage as xfs_repair: if the system was shut down cleanly, you're fine barring software bugs, but if it crashed, you should run a check on the FS.

Like I mentioned above, ext4 doesn't correct errors while online; depending on how the FS is configured, it either ignores them, goes read-only, or panics the system. BTRFS, on the other hand, can correct many types of errors while online (that's part of what scrub is for) and is usually pretty resilient when it comes to disk errors. I have a few TB worth of data on assorted BTRFS filesystems. I scrub them weekly, which usually turns up about one block error per month across the whole data set, and I run a check on them monthly, which has never turned up anything unless the system had crashed.
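For what it's worth, a weekly-scrub/monthly-check routine like the one I describe could be driven by something along these lines. This is only a rough sketch: the mount point and device names are hypothetical, the helper names are mine, and it defaults to a dry run that just prints the commands. It assumes the usual btrfs-progs commands `btrfs scrub start -B` (run in the foreground and wait) and `btrfs check` (read-only by default, meant for an unmounted filesystem).

```python
import subprocess


def scrub_command(mountpoint):
    # Scrub works on a mounted filesystem; -B keeps it in the foreground
    # so the caller can wait for the result.
    return ["btrfs", "scrub", "start", "-B", mountpoint]


def check_command(device):
    # `btrfs check` without --repair only reads; it is intended to be run
    # against an unmounted device.
    return ["btrfs", "check", device]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if dry_run:
        return 0
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    run(scrub_command("/mnt/data"))
    run(check_command("/dev/sdb1"))
```

Hooking the two calls up to cron or a systemd timer at whatever interval suits you is left to taste.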

That being said, of course it should be configurable for the admin...
and it is, via fstab.
So apart from that, given the expectation that btrfsck should be rock-
solid as e.g. e2fsck in some future, I wouldn't see why people
shouldn't have the necessary facilities to have it auto-run.
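On the fstab point: the knob in question is the sixth field (fs_passno), where 0 means "never check at boot" and a non-zero value puts the filesystem into the boot-time fsck order. For ext4, the online error behaviour is picked with the errors= mount option (continue, remount-ro or panic). A small sketch that reads both out of an fstab line, with made-up sample entries and helper names of my own:

```python
def parse_fstab_line(line):
    """Return a dict for one fstab entry, or None for blanks and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if len(fields) < 4:
        raise ValueError("fstab entry needs at least four fields: %r" % line)
    # fs_freq and fs_passno are optional and default to 0.
    fields += ["0"] * (6 - len(fields))
    spec, mountpoint, fstype, options, freq, passno = fields[:6]
    return {
        "spec": spec,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": options.split(","),
        "freq": int(freq),
        "passno": int(passno),
    }


def boot_fsck_enabled(entry):
    # passno 0 disables the boot-time check; 1 is for the root filesystem,
    # 2 for everything else.
    return entry["passno"] > 0


def error_policy(entry):
    """Return the value of an errors= mount option, or None if not set."""
    for opt in entry["options"]:
        if opt.startswith("errors="):
            return opt.split("=", 1)[1]
    return None


if __name__ == "__main__":
    # Hypothetical entries, for illustration only.
    for line in [
        "/dev/sdb1 /srv/data    ext4  defaults,errors=remount-ro 0 2",
        "/dev/sdc1 /srv/scratch btrfs defaults                   0 0",
    ]:
        entry = parse_fstab_line(line)
        print(entry["mountpoint"], boot_fsck_enabled(entry), error_policy(entry))
```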
btrfsck has to parse all the data in the FS. Unlike ext4, BTRFS has multiple copies of each metadata block (and on large filesystems is often configured for multiple copies of each data block), and it has checksums on _everything_, all of which need to be validated. Short of running it on faster hardware, there is no way to make this much faster.
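To illustrate why the cost scales with the amount of data: validating a checksum means reading every byte of every copy. The toy sketch below is not BTRFS's actual on-disk logic; it uses zlib.crc32 purely as a stand-in for whatever checksum the filesystem uses, and the function name and sample blocks are invented for the example.

```python
import zlib


def verify_copies(copies, expected):
    """Return the indices of copies whose checksum does not match.

    Every copy has to be read in full to compute its checksum, so the
    work grows with (block size) x (number of copies).
    """
    return [i for i, block in enumerate(copies) if zlib.crc32(block) != expected]


if __name__ == "__main__":
    good = b"metadata block"
    stored = zlib.crc32(good)
    # Second copy has one corrupted byte.
    print(verify_copies([good, b"metadata blocK"], stored))
```

With two copies of a block, the good one can also be used to rewrite the bad one, which is roughly the idea behind correcting errors online.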
