* Nick Piggin <[EMAIL PROTECTED]> wrote: > Hi, > > Not sure if this should be fixed for 2.6.13. It can result in > pagecache corruption: so I guess that answers my own question. > > This was tested by Robin and appears to solve the problem. Roland had > a quick look and thought the basic idea was sound. I'd like to get a > couple more acks before going forward, and in particular Robin was > contemplating possible efficiency improvements (although efficiency > can wait on correctness). > > Feedback please, anyone.
it looks good to me, but wouldnt it be simpler (in terms of patch and architecture impact) to always retry the follow_page() in get_user_pages(), in case of a minor fault? The sequence of minor faults must always be finite so it's a safe solution, and should solve the race too. The extra overhead of an additional follow_page() is small. Especially with 2.6.13 being close this looks like the safest bet, any improvement to this (i.e. your patch) can then be done in 2.6.14. Ingo When get_user_pages for write access races with another process in the page fault handler that has established the pte for read access, handle_mm_fault in get_user_pages will return VM_FAULT_MINOR even if it hasn't made the page correctly writeable (for example, broken COW). Thus the assumption that get_user_pages has a writeable page at the mapping after handle_mm_fault returns is incorrect. Fix this by retrying the lookup and fault in get_user_pages before making the assumption that we have a writeable page. Great work by Robin Holt <[EMAIL PROTECTED]> to debug the problem. Originally-from: Nick Piggin <[EMAIL PROTECTED]> simplified the patch into a two-liner: Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> mm/memory.c | 7 +++++++ 1 files changed, 7 insertions(+) Index: linux-sched-curr/mm/memory.c =================================================================== --- linux-sched-curr.orig/mm/memory.c +++ linux-sched-curr/mm/memory.c @@ -961,6 +961,13 @@ int get_user_pages(struct task_struct *t switch (handle_mm_fault(mm,vma,start,write)) { case VM_FAULT_MINOR: tsk->min_flt++; + /* + * We might have raced with a readonly + * pagefault, retry to make sure we + * got write access: + */ + if (write) + continue; break; case VM_FAULT_MAJOR: tsk->maj_flt++; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/