> On 3 Dec 2024, at 00:27, Alexander Bluhm <[email protected]> wrote:
>
> On Tue, Dec 03, 2024 at 12:00:42AM +0300, Vitaliy Makkoveev wrote:
>>> On 2 Dec 2024, at 20:47, Alexander Bluhm <[email protected]> wrote:
>>>
>>> On Mon, Dec 02, 2024 at 06:30:41PM +0100, Alexander Bluhm wrote:
>>>> On Mon, Dec 02, 2024 at 05:22:23PM +0100, Alexander Bluhm wrote:
>>>>> panic: Data modified on freelist: word 11 of object 0xd7632800 size 0x270
>>>>> previous type mount (0xdead410f != 0xdead4110)
>>>>
>>>> It looks like the struct mount mnt_lock rwl_readers was decremented
>>>> after free. rwl_readers = 3735896335 = 0xdead410f
>>>>
>>>> This happens in vfs_busy() with RW_READ. The new rw_do_enter_read()
>>>> function added a call to rw_dec(&rwl->rwl_readers) there.
>>>>
>>>> I have the impression that the reimplementation of the rw-lock makes
>>>> the problem visible. Accessing struct mount after sleep is
>>>> questionable. Does anything prevent a file system unmount while
>>>> sleeping in vfs_busy()?
>>>
>>> New implementation of rw-locks is wrong.
>>>
>>> vfs_lookup() checks the existence of dp->v_mountedhere
>>> each time before vfs_busy() is called.
>>>
>>>	while (dp->v_type == VDIR && (mp = dp->v_mountedhere) &&
>>>	    (cnp->cn_flags & NOCROSSMOUNT) == 0) {
>>>		if (vfs_busy(mp, VB_READ|VB_WAIT))
>>>			continue;
>>>
>>> vfs_busy() calls rw_enter() with RW_READ and RW_SLEEPFAIL.
>>>
>>> Old rw_enter() did
>>>
>>>	error = sleep_finish(0, do_sleep);
>>>	...
>>>	if (flags & RW_SLEEPFAIL)
>>>		return (EAGAIN);
>>>
>>> New rw_do_enter_read() does
>>>
>>>	error = sleep_finish(RW_SLEEP_TMO,
>>>	    atomic_load_int(&rwl->rwl_waiters) > 0 ||
>>>	    ISSET(atomic_load_long(&rwl->rwl_owner), RWLOCK_WRLOCK));
>>>	...
>>>	if (ISSET(flags, RW_SLEEPFAIL)) {
>>>		error = EAGAIN;
>>>		goto fail;
>>>	}
>>>	...
>>> fail:
>>>	rw_dec(&rwl->rwl_readers);
>>>	return (error);
>>>
>>> After sleep_finish() with RW_SLEEPFAIL set there is no guarantee that
>>> rwl is still valid memory. This is a use-after-free in rw_do_enter_read().
>>>
>>> bluhm
>>
>> Everything is fine with rw_do_enter_read(). It seems the problem lies in
>> vfs_busy(). How can it prevent mp from being destroyed during the context switch?
>
> The semantics of rw_enter() changed. Before this commit, the memory of
> the rw-lock only had to be valid until the sleep if RW_SLEEPFAIL was
> given. Now it also has to be valid after the sleep. That does not work
> when the object is protected only by the kernel lock and the code
> sleeps. The whole idea of RW_SLEEPFAIL is that anything can happen
> during the sleep. The loop around vfs_busy() in vfs_lookup() seems to
> be built for the old semantics.
>
Sorry, I disagree. It is obvious that rw_enter() could sleep in all
cases except RW_NOSLEEP, so the object you operate on should be
protected from concurrent destruction.

It seems the vfs_busy() implementation started to be wrong with
rev 1.87 of kern/vfs_subr.c, when the ltsleep() on the unmounting
filesystem was removed. I mean that ltsleep() never dereferences mp,
it only uses it as the wait channel, so it does not matter whether
the object mp points to is still alive.
int
vfs_busy(mp, flags, interlkp, p)
	struct mount *mp;
	int flags;
	struct simplelock *interlkp;
	struct proc *p;
{
	int lkflags;

	if (mp->mnt_flag & MNT_UNMOUNT) {
		if (flags & LK_NOWAIT)
			return (ENOENT);
		mp->mnt_flag |= MNT_MWAIT;
		/*
		 * Since all busy locks are shared except the exclusive
		 * lock granted when unmounting, the only place that a
		 * wakeup needs to be done is at the release of the
		 * exclusive lock at the end of dounmount.
		 */
		ltsleep(mp, PVFS, "vfs_bsy", 0, interlkp);
		return (ENOENT);
	}
	lkflags = LK_SHARED;
	if (interlkp)
		lkflags |= LK_INTERLOCK;
	if (lockmgr(&mp->mnt_lock, lkflags, interlkp, p))
		panic("vfs_busy: unexpected lock failure");
	return (0);
}
It has worked only because the simplelock and the subsequent rwlock
implementations happened to allow it, but such rw_enter(RW_SLEEPFAIL)
usage is wrong.
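
To make the pattern concrete, here is a minimal sketch. The names
struct foo, foo_lookup() and foo_destroy() are made up for the
illustration, this is not code from the tree; only the rwlock and
malloc calls are real kernel interfaces. It shows a reader taking a
shared lock with RW_SLEEPFAIL on an object that another thread may
free while the reader sleeps, which is exactly the shape of the
vfs_lookup()/vfs_busy() loop quoted above.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/malloc.h>
#include <sys/rwlock.h>

struct foo {
	struct rwlock	f_lock;		/* embedded lock, like mnt_lock;	*/
					/* rw_init()'d when f is created	*/
	/* ... */
};

int
foo_lookup(struct foo *f)
{
	/*
	 * The caller holds only the kernel lock and has no reference
	 * that keeps f alive across a sleep; on EAGAIN it just
	 * re-checks its pointer and retries.
	 */
	if (rw_enter(&f->f_lock, RW_READ | RW_SLEEPFAIL))
		return (EAGAIN);
	/* ... use f ... */
	rw_exit(&f->f_lock);
	return (0);
}

void
foo_destroy(struct foo *f)
{
	rw_enter_write(&f->f_lock);	/* readers now sleep in rw_enter() */
	/* ... tear everything down ... */
	rw_exit_write(&f->f_lock);	/* wakes the sleeping readers ... */
	/*
	 * ... but nothing makes them run before the memory is gone.
	 * With the old rw_enter() a woken RW_SLEEPFAIL sleeper returned
	 * EAGAIN without touching f_lock again, so freeing here was
	 * tolerated.  The new rw_do_enter_read() still does
	 * rw_dec(&rwl->rwl_readers) on its RW_SLEEPFAIL error path,
	 * which now lands on freed memory.
	 */
	free(f, M_TEMP, sizeof(*f));
}

This is the same ordering the panic above shows: the struct mount is
freed while a vfs_busy() caller is still inside rw_enter() on mnt_lock.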
> I notice the problem with vfs_busy(), but any other lock with
> RW_SLEEPFAIL could be wrong now.
Except for ntfs_ntlookup(), which is not affected, vfs_busy() is the
only place where the RW_SLEEPFAIL flag is used, so the problem is
pretty local. I doubt a quick solution for vfs_busy() exists,
so it's time to back the commit out.