Jamie Lokier <[EMAIL PROTECTED]> wrote:
>
> ...
> 
> > > One attempt to fix this is included below. It works, but I'm not entirely
> > > happy with the fact that it's a bit messy solution. If anyone has a
> > > better idea for how to solve it I'd be all ears.
> > 
> > It's fairly sane.  Style-wise I'd be inclined to turn this:
> > 
> >     down_read(&current->mm->mmap_sem);
> >     while (!check_user_page_readable(current->mm, uaddr1)) {
> >             up_read(&current->mm->mmap_sem);
> >             /* Fault in the page through get_user() but discard result */
> >             if (get_user(curval, (int __user *)uaddr1) != 0)
> >                     return -EFAULT;
> >             down_read(&current->mm->mmap_sem);
> >     }
> 
> That won't work because the vma lock must be help between key
> calculation and get_user() - otherwise futex is not reliable.  It
> would work if the futex key calculation was inside the loop.

All the above is trying to do is to convert the initial down_read(mmap_sem)
into a function which, on exit, guarantees that

a) down_read(mmap_sem) is held and

b) the subsequent get_user() of that address will not generate a pagefault.

So it shouldn't affect the futex code's atomicity at all.

However the pte can get unmapped by memory reclaim so we could still take a
minor fault, and hit the same deadlock, yes?

> A much simpler solution (and sorry for not offering it earlier,
> because Andrew Morton pointed out this bug long ago, but I was busy), is:
> 
> In futex.c:
> 
>       down_read(&current->mm->mmap_sem);
>       get_futex_key(...) etc.
>       queue_me(...) etc.
>       current->flags |= PF_MMAP_SEM;             <- new
>       ret = get_user(...);
>       current->flags &= PF_MMAP_SEM;             <- new
>       /* the rest */
> 
> And in arch/*/mm/fault.c, replace every one of these:
> 
>       down_read(&mm->mmap_sem);
> 
>       up_read(&mm->mmap_sem);
> 
> with these:
> 
>       if (!(current & PF_MMAP_SEM))
>               down_read(&mm->mmap_sem);
> 
>       if (!(current & PF_MMAP_SEM))
>               up_read(&mm->mmap_sem);
> 

Yes, that will work.  However I do feel that it's cleaner to localise this
nastiness into a single function which the futex code calls, rather than
spreading it all around and adding overhead to every pagefault.  If we can
work out how.

wrt this down_read/down_write/down_read deadlock: iirc, the reason why
down_write() takes precedence over down_read() is to avoid the permanent
writer starvation which would occur if there is heavy down_read() traffic.

As Linus points out, an alternative would be to do an inc_preempt_count()
around the offending get_user(), then use __copy_from_user_inatomic(), then
take some sort of remedial action if __copy_from_user_inatomic() returns a
fault.  Something like:

retry:
        if (get_user(uaddr) == -EFAULT)
                return -EFAULT;
        down_read(mmap_sem);
        inc_preempt_count();
        if (__copy_from_user_inatomic(..., uaddr)) {
                up_read(mmap_sem);
                dec_preempt_count();
                goto retry;
        }

        dec_preempt_count();
        up_read(mmap_sem);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to