On Thu, Sep 14, 2023 at 04:02:25PM +0200, Christian Brauner wrote: > Yes, you're right that making the superblock and not the filesytem type > the bd_holder changes the logic and we are aware of that of course. And > it requires changes such as moving additional block device closing from > where some callers currently do it.
Details, please? > But the filesytem type is not a very useful holder itself and has other > drawbacks. The obvious one being that it requires us to wade through all > superblocks on the system trying to find the superblock associated with > a given block device continously grabbing and dropping sb_lock and > s_umount. None of that is very pleasant nor elegant and it is for sure > not very easy to understand (Plus, it's broken for btrfs freezing and > syncing via block level ioctls.). "Constantly" is a bit of a stretch - IIRC, we grabbed sb_lock once, then went through the list comparing ->s_bdev (without any extra locking), then bumped ->s_count on the found superblock, dropped sb_lock, grabbed ->s_umount on the sucker and verified it's still alive. Repeated grabbing of any lock happened only on a race with fs shutdown; normal case is one spin_lock, one spin_unlock, one down_read(). Oh, well... > Using the superblock as holder makes this go away and is overall a lot > more useful and intuitive and can be extended to filesystems with > multiple devices (Of which we apparently are bound to get more.). > > So I think this change is worth the pain. > > It's a fair point that these lifetime rules should be documented in > Documentation/filesystems/. The old lifetime documentation is too sparse > to be useful though. What *are* these lifetime rules? Seriously, you have 3 chunks of fs-dependent actions at the moment: * the things needed to get rid of internal references pinning inodes/dentries + whatever else we need done before generic_shutdown_super() * the stuff to be done between generic_shutdown_super() and making the sucker invisible to sget()/sget_fc() * the stuff that must be done after we are sure that sget callbacks won't be looking at this instance. Note that Christoph's series has mashed (2) and (3) together, resulting in UAF in a bunch of places. And I'm dead serious about Documentation/filesystems/porting being the right place; any development tree of any filesystem (in-tree one or not) will have to go through the changes and figure out WTF to do with their existing code. We are going to play whack-a-mole for at least several years as development branches get rebased and merged. Incidentally, I'm going to add a (belated by 10 years) chunk in porting.rst re making sure that anything in superblock that might be needed by methods called in RCU mode should *not* be freed without an RCU delay... Should've done that back in 3.12 merge window when RCU'd vfsmounts went in; as it is, today we have several filesystems with exact same kind of breakage. hfsplus and affs breakage had been there in 3.13 (missed those two), exfat and ntfs3 - introduced later, by initial merges of filesystems in question. Missed on review... Hell knows - perhaps Documentation/filesystems/whack-a-mole might be a good idea...