Sebastian Ochmann posted on Sun, 04 Nov 2018 14:15:55 +0100 as excerpted:

> Hello,
> 
> I have a btrfs filesystem on a single encrypted (LUKS) 10 TB drive which
> stopped working correctly.

> Kernel 4.18.16 (Arch Linux)

I see upgrading to 4.19 seems to have solved your problem, but this is 
more about something I saw in the trace that has me wondering...

> [  368.267315]  touch_atime+0xc0/0xe0

Do you have any atime-related mount options set?

FWIW, noatime is strongly recommended on btrfs.

Now I'm not a dev, just a btrfs user and list regular, and I don't know 
if that function is called and just does nothing when noatime is set, so 
you may well already have it set and this is "much ado about nothing", 
but the chance that it's relevant, if not for you, perhaps for others 
that may read it, begs for this post...

The problem with atime, access time, is that it turns most otherwise read-
only operations into read-and-write operations in order to update the 
access time.  And on copy-on-write (COW) based filesystems such as btrfs, 
that can be a big problem, because updating that tiny bit of metadata 
will trigger a rewrite of the entire metadata block containing it, which 
will trigger an update of the metadata for /that/ block in the parent 
metadata tier... all the way up the metadata tree, ultimately to its 
root, the filesystem root and the superblocks, at the next commit 
(normally every 30 seconds or less).
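
FWIW, here's a toy Python sketch of that ripple effect.  It is not btrfs 
code; the Block class, cow_update() and the three-level tree are all made 
up for illustration.  It only shows that changing one leaf in a COW tree 
means writing a new copy of every block on the path up to the root, while 
untouched siblings are shared with the old tree.

```python
class Block:
    """One metadata block in a toy copy-on-write tree (hypothetical model)."""

    def __init__(self, payload, children=()):
        self.payload = payload
        self.children = tuple(children)


def cow_update(root, path, new_payload):
    """Return (new_root, blocks_rewritten) after changing the block at `path`.

    `path` lists child indexes from the root down to the target block.
    Existing blocks are never modified; every block on the path is copied.
    """
    if not path:
        # Reached the target: write a new copy carrying the new payload.
        return Block(new_payload, root.children), 1
    idx = path[0]
    new_child, count = cow_update(root.children[idx], path[1:], new_payload)
    children = list(root.children)
    children[idx] = new_child
    # The parent must be rewritten too, since it now points at a new child.
    return Block(root.payload, children), count + 1


# Three tiers: root -> two internal blocks -> two leaves each.
leaves = [Block(f"inode {i}: atime=0") for i in range(4)]
internal = [Block("node A", leaves[:2]), Block("node B", leaves[2:])]
old_root = Block("root", internal)

# A single atime change on one leaf ("inode 2") ...
new_root, rewritten = cow_update(old_root, [1, 0], "inode 2: atime=1")
print(rewritten)  # 3: the leaf, its parent, and the root
```

A real filesystem batches many such changes into one commit, so the cost 
per read is lower than this suggests, but the path-to-root rewrite is the 
part that a non-COW filesystem simply doesn't have.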

Not only is that a bunch of otherwise unnecessary work for a bit of 
metadata barely anything actually uses, but forcing most read operations 
to read-write obviously compounds the risk for all of those would-be read-
only operations when a filesystem already has problems.

Additionally, if your use-case includes regular snapshotting, then with 
atime on, on mostly-read workloads with few writes (other than atime 
updates), it may well be the case that most of the changes between one 
snapshot and the next are atime updates, making each recurring snapshot 
far larger than it would be otherwise.

Now a few years ago the kernel did change the default to relatime, 
basically updating the atime for any particular file only once a day 
(or when the file has changed since the last recorded access), which 
does help quite a bit, and on traditional filesystems it's arguably a 
reasonably sane default, but COW makes atime tracking expensive enough 
that setting noatime is still strongly recommended on btrfs, 
particularly if you're doing regular snapshotting.
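
For the curious, the relatime rule can be sketched in a few lines of 
Python.  This is my paraphrase of the rule, not the kernel source, and the 
function and parameter names are made up: a read updates atime only if the 
file's data or inode changed since the recorded atime, or if the recorded 
atime is at least a day old.

```python
DAY_SECONDS = 24 * 60 * 60


def relatime_needs_update(atime, mtime, ctime, now):
    """Sketch of the relatime decision; all arguments are epoch seconds."""
    if mtime >= atime:
        return True  # file data modified since the last recorded access
    if ctime >= atime:
        return True  # inode changed since the last recorded access
    if now - atime >= DAY_SECONDS:
        return True  # recorded access time is at least a day old
    return False


def noatime_needs_update(atime, mtime, ctime, now):
    """With noatime, reads never trigger an atime write."""
    return False


# A file written at t=100 and last read at t=200, read again an hour later:
print(relatime_needs_update(atime=200, mtime=100, ctime=100, now=3800))
# The same file read again more than a day after the recorded atime:
print(relatime_needs_update(atime=200, mtime=100, ctime=100, now=200 + DAY_SECONDS))
```

So under relatime a frequently read but never modified file still costs 
roughly one metadata write per day, which is exactly the sort of change 
that ends up in every daily snapshot.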

So do consider adding noatime to your mount options if you haven't done 
so already.  AFAIK, the only /semi-common/ app that actually uses atimes 
these days is mutt (for new-mail detection), and then only for mbox, not 
maildir, so you should be safe to at least test turning it off.
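
If you're not sure what's currently in effect, something like the 
following Python sketch will pull the atime-related options out of 
/proc/mounts-style text.  The helper name, the device name and the 
/mnt/data mount point are hypothetical; substitute your own.

```python
ATIME_OPTIONS = {"noatime", "relatime", "strictatime", "nodiratime"}


def atime_options(mounts_text, mount_point):
    """Return the atime-related options listed for `mount_point`.

    `mounts_text` uses the /proc/mounts layout: device, mount point,
    filesystem type, comma-separated options, then two numeric fields.
    Returns None if the mount point does not appear at all.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        # A later entry for the same mount point overrides an earlier one.
        found = [opt for opt in fields[3].split(",") if opt in ATIME_OPTIONS]
    return found


# Sample line with made-up device and mount point names.
sample = (
    "proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0\n"
    "/dev/mapper/data /mnt/data btrfs rw,relatime,space_cache,subvolid=5,subvol=/ 0 0\n"
)
print(atime_options(sample, "/mnt/data"))  # ['relatime']

# On a live Linux system you could feed it the real table instead:
#   with open("/proc/mounts") as f:
#       print(atime_options(f.read(), "/mnt/data"))
```

If that reports relatime rather than noatime, adding noatime to the 
options field of the filesystem's fstab entry and remounting is the usual 
way to change it.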

And YMMV, but if you do use mutt or something else that uses atimes, I'd 
go so far as to recommend finding an alternative, replacing either btrfs 
(because as I said, relatime is arguably enough on a traditional non-COW 
filesystem) or whatever it is that uses atimes, your call, because IMO it 
really is that big a deal.

Meanwhile, particularly after seeing that in the trace, if the 4.19 
update hadn't already fixed it, I'd have suggested trying a read-only 
mount, both as a test and, assuming it worked, as a way to access the 
data without the lockup, which would then point to the write triggered 
by the atime update, not the actual read, as the cause.

Actually, a read-only mount test is always a good troubleshooting step 
when the trouble is a filesystem that either won't mount normally, or 
will, but then locks up when you try to access something.  It's far less 
risky than a normal writable mount, and at minimum it provides you the 
additional test data of whether it worked or not, plus if it does, a 
chance to access the data and make sure your backups are current, before 
actually trying to do any repairs.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
