On 07/15/2011 04:44 PM, Peter Zijlstra wrote:
On Fri, 2011-07-15 at 16:38 +0800, MailingLists wrote:
On 07/15/2011 04:20 PM, Peter Zijlstra wrote:
On Fri, 2011-07-15 at 16:07 +0800, Shan Hai wrote:
The following test case can reveal a bug in futex_lock_pi().

BUG: On FUTEX_LOCK_PI, futex_lock_pi() enters an infinite loop
          on the PowerPC e500 core.
Cause: The Linux kernel on the e500 core has no write permission on
          the COW page; refer to the head comment of the following test code.

ftrace on test case:
[000]   353.990181: futex_lock_pi_atomic<-futex_lock_pi
[000]   353.990185: cmpxchg_futex_value_locked<-futex_lock_pi_atomic
[snip]
[000]   353.990191: do_page_fault<-handle_page_fault
[000]   353.990192: bad_page_fault<-handle_page_fault
[000]   353.990193: search_exception_tables<-bad_page_fault
[snip]
[000]   353.990199: get_user_pages<-fault_in_user_writeable
[snip]
[000]   353.990208: mark_page_accessed<-follow_page
[000]   353.990222: futex_lock_pi_atomic<-futex_lock_pi
[snip]
[000]   353.990230: cmpxchg_futex_value_locked<-futex_lock_pi_atomic
[ a loop occurs here ]

But but but but, that get_user_pages(.write=1, .force=0) should result
in a COW break, getting our own writable page.

What is this e500 thing smoking that this doesn't work?
On the e500, a page can be mapped read-only for the kernel (the
"supervisor" in PowerPC terminology), and that is what the kernel does
here. The SW (supervisor write) bit in the TLB entry must be set to grant
the kernel write permission on a page.

Furthermore, the SW bit is set according to the PTE's DIRTY flag, and
PTE.DIRTY is set in do_page_fault(). Because futex_lock_pi() runs with
page faults disabled, PTE.DIRTY can never be set, and neither can the SW
bit; the result is an unbreakable COW followed by an infinite loop.
I'm fairly sure fault_in_user_writeable() has PF enabled as it takes
mmap_sem, and pagefault_disable() is akin to preempt_disable() on mainline.

Also get_user_pages() fully expects to be able to schedule, and in fact
can call the full pf handler path all by its lonesome self.

The whole scenario is as follows:
- The child process triggers a page fault on its first access to
    the lock and gets its own writable page, but the page is *clean*
    because that access only checked the status of the lock.
    I apologize for the "unbreakable COW" wording above.
- futex_lock_pi() is invoked because of lock contention, and
    futex_atomic_cmpxchg_inatomic() tries to take the lock. It finds
    the lock free and tries to write to it to claim it, but a page
    fault occurs because the page is read-only for the kernel (e500
    specific), and -EFAULT is returned to the caller.
- fault_in_user_writeable() tries to fix the fault, but from
    get_user_pages()' point of view everything is fine because the
    COW was already broken, so futex_lock_pi_atomic() is retried.
- futex_lock_pi_atomic() --> futex_atomic_cmpxchg_inatomic()
    takes another write-protection page fault.
- Infinite loop.

Thanks
Shan Hai


_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
