On Tue, Nov 11, 2008 at 06:27:09PM -0600, Christoph Lameter wrote:
> Then page migration will not occur because there is an unresolved
> reference.

So are you checking for an unresolved reference only in the very place I just quoted in the previous email? If the answer is yes: what prevents get_user_pages from running in parallel from another thread? get_user_pages will trigger a minor fault and take the elevated reference just after you read page_count. To you it looks like there is no O_DIRECT in progress when you proceed to the core of the migration code, but in effect O_DIRECT started a moment after you read the page count.

What can protect you is the PG lock or mmap_sem in _write_ mode (and they have to be held for the whole duration of the migration). I don't see either of the two being held while you read the page count... You don't seem to be using stop_machine either (stop_machine is pretty expensive on the 4096-way, I guess).

This wasn't reproduced in practice, but it should be possible to reproduce it by writing a testcase with three threads. The first forks in a loop (the child just quits). The second, in a loop, memsets the first 512 bytes of a page to 0, then does an O_DIRECT read into them from a 512-byte region filled with 0xff, and checks that the first 512 bytes are all non-zero. The third writes 1 byte to the last 512 bytes of the page in a loop. Eventually the comparison should show zero data in the page.

To reproduce with migration, just start the thread that memsets to 0, reads a 0xff region with O_DIRECT, and checks that it's all 0xff in a loop, and then migrate the memory of this thread back and forth between two nodes with sys_move_pages (mpol is safe by luck because it surrounds migrate_pages with the mmap_sem in write mode). Eventually you should see zero bytes even though the I/O is complete.

Reproducing this in normal life would take time, and the fork bug may not be reproducible at all depending on what the app is doing. Mixing sys_move_pages with O_DIRECT in the same process on two different threads, instead, should eventually reproduce it.

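For illustration only, the window described above can be modelled with a toy C sketch. Nothing here is kernel code: `toy_page`, `toy_gup`, `toy_migrate` and `toy_demo` are hypothetical names, the `locked` flag is merely a stand-in for the PG lock or mmap_sem held in write mode, and the "other thread" is simulated by a callback invoked between the count check and the decision to migrate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page: only what the argument needs. */
struct toy_page {
    int count;   /* reference count, playing the role of page_count() */
    bool locked; /* stand-in for PG lock / mmap_sem held for write */
};

/* Toy get_user_pages: takes a reference unless the lock excludes it. */
static bool toy_gup(struct toy_page *p)
{
    if (p->locked)
        return false; /* a real caller would block; the model just refuses */
    p->count++;
    return true;
}

typedef void (*toy_racer_fn)(struct toy_page *p);

/* The simulated "other thread": tries to pin the page. */
static void toy_racer_gup(struct toy_page *p)
{
    (void)toy_gup(p);
}

/*
 * Toy migration: read the count, let the racer run, then decide based on
 * the value read earlier. Returns true if migration went ahead although
 * an extra reference had appeared in the meantime.
 */
static bool toy_migrate(struct toy_page *p, bool hold_lock, toy_racer_fn racer)
{
    bool raced;
    int seen;

    if (hold_lock)
        p->locked = true;

    seen = p->count;   /* the check */
    if (racer != NULL)
        racer(p);      /* another thread runs in this window */

    /* Migration proceeds if the stale value said "no extra reference". */
    raced = (seen == 1) && (p->count != 1);

    if (hold_lock)
        p->locked = false;
    return raced;
}

/* Runs both cases of the model; returns 0 if they behave as described. */
static int toy_demo(void)
{
    struct toy_page a = { 1, false };
    struct toy_page b = { 1, false };

    /* No lock: the pin slips in after the check. */
    assert(toy_migrate(&a, false, toy_racer_gup));
    assert(a.count == 2);

    /* Lock held across check and migration: the pin is kept out. */
    assert(!toy_migrate(&b, true, toy_racer_gup));
    assert(b.count == 1);
    return 0;
}
```

Calling `toy_demo()` from a `main()` exercises both cases. The model only shows why the lock has to cover both the count read and the migration itself; it makes no claim about what the actual migration code does.
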
And with gup_fast this is now unfixable until more infrastructure is added to slow down gup_fast a bit (unless Nick finds an RCU way of doing it).