> On 3 Dec 2024, at 00:27, Alexander Bluhm <[email protected]> wrote:
> 
> On Tue, Dec 03, 2024 at 12:00:42AM +0300, Vitaliy Makkoveev wrote:
>>> On 2 Dec 2024, at 20:47, Alexander Bluhm <[email protected]> wrote:
>>> 
>>> On Mon, Dec 02, 2024 at 06:30:41PM +0100, Alexander Bluhm wrote:
>>>> On Mon, Dec 02, 2024 at 05:22:23PM +0100, Alexander Bluhm wrote:
>>>>> panic: Data modified on freelist: word 11 of object 0xd7632800 size 0x270 
>>>>> previous type mount (0xdead410f != 0xdead4110)
>>>> 
>>>> It looks like the struct mount mnt_lock rwl_readers was derecemented
>>>> after free.  rwl_readers = 3735896335 = 0xdead410f
>>>> 
>>>> This happens in vfs_busy() with RW_READ.  New rw_do_enter_read()
>>>> function added call to rw_dec(&rwl->rwl_readers) there.
>>>> 
>>>> I have the impression that the reimplementation of rw-lock makes
>>>> the problem visible.  Accessing struct mount after sleep is
>>>> questionalble.  Does anything prevent a file system unount while
>>>> sleeping in vfs_busy()?
>>> 
>>> New implementation of rw-locks is wrong.
>>> 
>>> vfs_lookup() checks the existance of dp->v_mountedhere
>>> each time before vfs_busy() is called.
>>> 
>>>     while (dp->v_type == VDIR && (mp = dp->v_mountedhere) &&
>>>         (cnp->cn_flags & NOCROSSMOUNT) == 0) {
>>>             if (vfs_busy(mp, VB_READ|VB_WAIT))
>>>                     continue;
>>> 
>>> vfs_busy() calls rw_enter() with RW_READ and RW_SLEEPFAIL.
>>> 
>>> Old rw_enter() did 
>>> 
>>>             error = sleep_finish(0, do_sleep);
>>>             ...
>>>             if (flags & RW_SLEEPFAIL)
>>>                     return (EAGAIN);
>>> 
>>> New rw_do_enter_read() does
>>> 
>>>             error = sleep_finish(RW_SLEEP_TMO,
>>>                 atomic_load_int(&rwl->rwl_waiters) > 0 ||
>>>                 ISSET(atomic_load_long(&rwl->rwl_owner), RWLOCK_WRLOCK));
>>>             ...
>>>             if (ISSET(flags, RW_SLEEPFAIL)) {
>>>                     error = EAGAIN;
>>>                     goto fail;
>>>             }
>>>     ...
>>> fail:
>>>       rw_dec(&rwl->rwl_readers);
>>>       return (error);
>>> 
>>> After sleep_finish() and RW_SLEEPFAIL there is no guarantee that
>>> rwl is still valid memory.  This is a use-after-free in rw_do_enter_read().
>>> 
>>> bluhm
>> 
>> Everything fine with rw_do_enter_read(). It seems the problem lays in the
>> vfs_busy(). How can it prevent mp being killed during context switch?
> 
> Semantics of rw_enter() changed.  Before this commit, memory of
> rw-lock had to be valid until sleep if RW_SLEEPFAIL was given.  Now
> it has to be valid after sleep.  That does not work with sleeping
> and kernel lock.  The whole idea of RW_SLEEPFAIL is that anything
> can happen during sleep.  The loop around vfs_busy() in vfs_lookup()
> seems to be built for the old semantics.
> 

Sorry, I disagree. It is obvious that you could sleep for all
except RW_NOSLEEP cases, so the object you operate should be
protected from concurrent destruction.  

It seems the vfs_busy() implementation started to be wrong
from the rev 1.87 of kern/vfs_subr.c while the ltsleep()
for the unmounting filesystem was removed. I mean ltsleep()
never does mp dereference, so it doesn’t matter is the mp
referenced object alive or not.

int
vfs_busy(mp, flags, interlkp, p)
        struct mount *mp;
        int flags;
        struct simplelock *interlkp;
        struct proc *p;
{
        int lkflags;

        if (mp->mnt_flag & MNT_UNMOUNT) {
                if (flags & LK_NOWAIT)
                        return (ENOENT);
                mp->mnt_flag |= MNT_MWAIT;
                /*
                 * Since all busy locks are shared except the exclusive
                 * lock granted when unmounting, the only place that a
                 * wakeup needs to be done is at the release of the
                 * exclusive lock at the end of dounmount.
                 */
                ltsleep(mp, PVFS, "vfs_bsy", 0, interlkp);
                return (ENOENT);
        }
        lkflags = LK_SHARED;
        if (interlkp)
                lkflags |= LK_INTERLOCK;
        if (lockmgr(&mp->mnt_lock, lkflags, interlkp, p))
                panic("vfs_busy: unexpected lock failure");
        return (0);
}

It has worked only because simplelock and following rwlock
implementations allowed that, but such rw_enter(RW_SLEEPFAIL)
usage is wrong.

> I notice the problem with vfs_busy(), but any other lock with
> RW_SLEEPFAIL could be wrong now.


Except ntfs_ntlookup() which is not affected, vfs_busy() is the
only place where RW_SLEEPFAIL flag is used, so the problem is
pretty local. I doubt the quick solution for vfs_busy() exists,
so it’s time to backout.

Reply via email to