Hi, I had a couple of questions which I'm hoping someone would be kind enough to explain :)
Andrew Morton wrote:
guys, application crashes on million-dollar machines aren't nice. Please review carefully and urgently?

Begin forwarded message:

Date: Wed, 25 Apr 2007 18:16:15 -0600
From: Mike Stroyan <[EMAIL PROTECTED]>
To: "Luck, Tony" <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED], [email protected]
Subject: [PATCH] ia64: race flushing icache in do_no_page path

This is a very similar problem to a copy-on-write cache flushing problem that Tony Luck fixed in July 2006. In this case the do_no_page function handles a fault in an executable or library that is mmapped from an NFS file system. The code is copied into a newly reallocated page. The lazy_mmu_prot_update() function should be used to flush old entries from the icache for that page on ia64 processors. But that call is made after a set_pte_at call that makes the page accessible to other threads executing the same code.

This was seen to cause application crashes when an OpenMP application ran many threads calling the same functions at the same time. The first thread to reach a page starts to fault in the new code. One of the other threads overtakes the first and executes old data from the icache. That could result in bad instructions. It is more obvious when an old cache line contains prefetched non-instruction bits that result in an illegal instruction trap.
I wonder how this is different to all the other code which calls lazy_mmu_prot_update() after set_pte_at(). do_swap_page, for example, _could_ fault in executable code, couldn't it?

Is it because do_swap_page uses flush_icache_page()? So why doesn't the flush_icache_page() work in do_no_page as well? (It looks like a superset of lazy_mmu_prot_update on ia64?!?)

And while we're looking at flush_icache_page, why is there none in do_wp_page? (I admit, I'm not really up to scratch on d/i cache aliasing handling, but cachetlb.txt seems to suggest that cow_user_page fits the description.) That is, if we're already trying to cover our butts wrt SMC, then do_wp_page _could_ be cow'ing executable code, couldn't it?

And for that matter, I admit I don't understand how the icache flushing can be done lazily, only at change-protection time. Why is any flush_dcache_page() site not a problem for an _existing_ executable pte wrt d/i cache aliases?

BTW, while I'm ranting, I hope all this stuff has gone so complex for a reason, that reason being that the alternative simpler approach of more flushes, less lazy, less complex, less buggy was tested and found to be noticeably slower... :)
The problem has only been seen on Montecito processors, which have separate level 2 icache and dcache. This dcache to icache coherency problem is more likely to occur there because of the much larger level 2 icache. I suspect that the non-NFS case is working because direct DMA into the new page is making the instruction cache coherent. Any file system that uses a non-DMA copy into the text page could show the same problem.

Signed-off-by: Mike Stroyan <[EMAIL PROTECTED]>

diff --git a/mm/memory.c b/mm/memory.c
index e7066e7..50c8848 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2291,6 +2291,7 @@ retry:
 		entry = mk_pte(new_page, vma->vm_page_prot);
 		if (write_access)
 			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+		lazy_mmu_prot_update(entry);
 		set_pte_at(mm, address, page_table, entry);
 		if (anon) {
 			inc_mm_counter(mm, anon_rss);
@@ -2312,7 +2313,6 @@ retry:
 
 	/* no need to invalidate: a not-present page shouldn't be cached */
 	update_mmu_cache(vma, address, entry);
-	lazy_mmu_prot_update(entry);
 unlock:
 	pte_unmap_unlock(page_table, ptl);
 	if (dirty_page) {
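For anyone trying to picture the window the patch closes, here is a toy, single-threaded C model of the two orderings. It is only a sketch and not kernel code: toy_page, toy_flush_icache, toy_set_pte, other_thread_runs and fault_in are all hypothetical names made up for illustration, and the "other thread" is simulated by checking the page state after each step.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name in this block is hypothetical. */
struct toy_page {
	bool icache_stale;	/* old instruction-cache lines still cover the page */
	bool pte_visible;	/* mapping has been published to other threads */
};

/* Stand-in for the icache flush done via lazy_mmu_prot_update() on ia64. */
static void toy_flush_icache(struct toy_page *p)
{
	p->icache_stale = false;
}

/* Stand-in for set_pte_at(): from here on other threads can execute the page. */
static void toy_set_pte(struct toy_page *p)
{
	p->pte_visible = true;
}

/* A racing thread executes stale instructions if the page is mapped
 * while old icache lines are still present. */
static bool other_thread_runs(const struct toy_page *p)
{
	return p->pte_visible && p->icache_stale;
}

/* Simulate one fault; return true if a racing thread could ever have
 * executed stale instructions between the two steps or after them. */
bool fault_in(bool flush_first)
{
	struct toy_page p = { .icache_stale = true, .pte_visible = false };
	bool stale_seen = false;

	assert(!p.pte_visible);		/* a fresh page starts unmapped */

	if (flush_first) {		/* ordering after the patch */
		toy_flush_icache(&p);
		stale_seen |= other_thread_runs(&p);
		toy_set_pte(&p);
	} else {			/* ordering before the patch */
		toy_set_pte(&p);
		stale_seen |= other_thread_runs(&p);	/* the race window */
		toy_flush_icache(&p);
	}
	stale_seen |= other_thread_runs(&p);
	return stale_seen;
}
```

In this model fault_in(false) reports that stale execution was possible, while fault_in(true) does not. It says nothing about whether other paths such as do_swap_page or do_wp_page have the same window; it only shows why moving the flush ahead of set_pte_at matters in the path the patch touches.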
-- 
SUSE Labs, Novell Inc.

