On Tue, 2 Aug 2005, Linus Torvalds wrote: > > Since we will have dropped the page table lock when calling > handle_mm_fault() (which will just re-get the lock and then drop it > again) _and_ since we don't actually mark the page dirty if it was > writable, it's entirely possible that the VM scanner comes in and just > drops the page from the page tables. > > Now, that doesn't sound so bad, but what we have then is a page that is > marked dirty in the "struct page", but hasn't been actually dirtied yet. > It could get written out and marked clean (can anybody say "preemptible > kernel"?) before we ever actually do the write to the page. > > The thing is, we should always set the dirty bit either atomically with > the access (normal "CPU sets the dirty bit on write") _or_ we should set > it after the write (having kept a reference to the page). > > Or does anybody see anything that protects us here? > > Now, I don't think we can fix that race (which is probably pretty much > impossible to hit in practice) in the 2.6.13 timeframe.
I believe this particular race has been recognized since day one of get_user_pages, and we've always demanded that the caller must do a SetPageDirty (I should probably say set_page_dirty) before freeing the pages held for writing. Which is why I was a bit puzzled to see that prior set_page_dirty in __follow_page, which Andrew identified as for s390. > Maybe I'll have to just accept the horrid "VM_FAULT_RACE" patch. I don't > much like it, but.. I've not yet reached a conclusion on that, need to think more about doing mkclean in copy_one_pte. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/