-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Linus Torvalds wrote: > > On Tue, 15 Mar 2005, Jeff Mahoney wrote: > >>This patch is another take at fixing the race between mount and umount >>resetting the blocksize and causing buffer errors, infinite loops in >>__getblk_slow, and possibly other undiscovered effects. > > > Ok. I had to go back and look up the original problem, and having looked > at this a bit more, I wonder whether the real problem is not that we do > that silly "set blocksize back to the original one" at umount time in the > first place. > > (It happens very indirectly, though the "->kill_sb()" fn pointer, which > ends up doing kill_block_super on a regular block device). > > Maybe we should just get rid of it entirely? There's really no point to > it. > > Instead, to make things repeatable, we'd always just set the blocksize to > its default value at the first open. We already do that anyway, don't we? > > Wouldn't that approach also just fix things? And then the fix would > literally be to just remove the set_blocksize() call in kill_block_super. > At that point, we know that the only people who set the block size are > either > - a first opener > - somebody who got exclusive access (ie a filesystem that sets it at > mount-time) > > (Yeah, it's a bit more complex than that one-liner, since somebody would > need to back me up on not being totally tripping on some 'shrooms. But Al > can probably do that trivially) > > Or maybe I misunderstood the problem. Jeff?
*smacks head* Yeah, I think this will fix the problem in a much cleaner way. I wanted to try to preserve the old behavior, but you validly point out that it's silly to do so. I don't believe that any opener of a block device has any expectation that the block size be anything in particular. They could open a block device that is mounted, or one that isn't. If they need it a different size, they can explicitly set it. The set_blocksize in kill_block_super is just being nice and returning it the default. Here's a quick analysis: Previous to the superblock being shut down, sget will return a superblock which will then lead to get_sb_bdev returning -EBUSY as expected. After the superblock being shut down, bd_release releases the holders. This could allow another opener to open it exclusively, but then the new mount would just return -EBUSY, which is safe. Otherwise, we'll take a holder reference under bdev_lock. Then, blkdev_put will sync and kill the blkdev under bdev->bd_sem if it holds the sole opener reference. This is what usually will happen. If it doesn't hold the only reference, blkdev_get in the new mount beat us there and won't need to reset the blocksize. The bdev doesn't get synced and killed either, though. This isn't that big a deal, since generic_shutdown_super->fsync_super takes care of syncing the fs, but the ->write_super call following it could leave the superblock unsynced under certain circumstances[*]. So, rather than completely remove the set_blocksize, I'd prefer to replace it with a sync_blockdev instead. - -Jeff [*]: It's a very specific corner case: Currently, since most filesystems use a block size that is different than the device block size, set_blocksize resetting the block size can be counted on to sync any writes that occured during that final ->write_super call. If we remove set_blocksize, then that final sync may not occur if there is another opener (such as a process reading the block device). Yes, it's a case of shooting-yourself-in-the-foot, but say it's a pen drive and knowing that it's umounted, the user removes the device. The final write is lost - because a read-only operation was in progress. That doesn't seem right to me, and can be fixed pretty easily by substituting a sync_blockdev for that set_blocksize rather than removing it completely. - -- Jeff Mahoney SuSE Labs -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux) iD8DBQFCN0yaLPWxlyuTD7IRAtp0AJ4xQDNjS+B6wq3sKu2t6c4tj6PHowCeJeqR 54cAg0bsuf1pMU0kSDJLy20= =t5Vf -----END PGP SIGNATURE----- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/