Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Thu, Jul 05, 2007 at 10:57:00AM +0200, Zoltan Menyhart wrote: > KAMEZAWA Hiroyuki wrote: > >In our test, we confirmed that this can be fixed by flushing L2I just > >before SetPageUptodate() in NFS. > > I can agree. > We can be more permissive: it can be done anywhere after the new > data is put in place and before nfs_readpage() or nfs_readpages() > returns. > > I saw your patch http://marc.info/?l=linux-mm&m=118352909826277&w=2 > that modifies e.g. mm/memory.c and not the NFS layer. > > Have you proposed a patch against the NFS layer? This really doesn't look like a job for the file system layer. That would require all sorts of file system readpage routines to be modified to handle memory management details that are already handled by the memory.c functions. The do_no_page code is already dealing with the necessary icache flushing operations. It just happens to be doing it with a bad race condition for ia64. -- Mike Stroyan <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [BUGFIX][PATCH] DO flush icache before set_pte() on ia64.
On Wed, Jul 04, 2007 at 03:05:04PM +0900, KAMEZAWA Hiroyuki wrote: > This is a experimental patch for fixing icache flush race of ia64(Montecito). > > Problem Description: > Montecito, new ia64 processor, has separated L2 i-cache and d-cache, > and i-cache and d-cache is not consistent in automatic way. > > L1 cache is also separated but L1 D-cache is write-through. Then, before > Montecito, any changes in L1-dcache is visible in L2-mixed-cache consistently. > > Montecito has separated L2 cache and Mixed L3 cache. But...L2 D-cache is > *write back*. (See http://download.intel.com/design/Itanium2/manuals/ > 30806501.pdf section 2.3.3) > > Assume : valid data is in L2 d-cache and old data in L3 mixed cache. > If write-back L2->L3 is delayed, at L2 i-cache miss cpu will fetch old data > in L3 mixed cache. > By this, L2-icache-miss will read wrong instruction from L3-mixed cache. > (Just I think so, is this correct ?) The L3 cache is involved in the HP-UX defect description because the earlier HP-UX patch PHKL_33781 added flushing of the instruction cache when an executable mapping was removed. Linux never added that unsuccessfull attempt at montecito cache coherency. In the current linux situation it can execute old cache lines straight from L2 icache. > Now, I think icache should be flushed before set_pte(). > This is a patch to try that. > > 1. remove all lazy_mmu_prot_update()...which is used by only ia64. > 2. implements flush_cache_page()/flush_icache_page() for ia64. > > Something unsure > 3. mprotect() flushes cache before removing pte. Is this sane ? >I added flush_icache_range() before set_pte() here. > > Any comments and advices ? I am concerned about performance consequences. With the change from lazy_mmu_prot_update to __flush_icache_page_ia64 you dropped the code that avoids icache flushes for non-executable pages. Section 4.6.2 of David Mosberger and Stephane Eranian's "ia-64 linux kernel design and implementation" goes into some detail about the performance penalties avoided by limiting icache flushes to executable pages and defering flushes until the first fault for execution. Have you done any benchmarking to measure the performance effect of these additional cache flushes? It would be particularly interesting to measure on large systems with many CPUs. The fc.i instruction needs to be broadcast to all CPUs in the system. The only defect that I see in the current implementation of lazy_mmu_prot_update() is that it is called too late in some functions that are already calling it. Are your large changes attempting to correct other defects? Or are you simplifying away potentially valuable code because you don't understand it? > > Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> > > --- > arch/ia64/mm/init.c |7 +-- > include/asm-generic/pgtable.h |4 > include/asm-ia64/cacheflush.h | 24 ++-- > include/asm-ia64/pgtable.h|9 - > mm/fremap.c |1 - > mm/memory.c | 13 ++--- > mm/migrate.c |6 +- > mm/mprotect.c | 10 +- > mm/rmap.c |1 - > 9 files changed, 43 insertions(+), 32 deletions(-) You don't seem to have removed the lazy_mmu_prot_update() calls from mm/hugetlb.c. Will that build with HUGETLBFS configured? -- Mike Stroyan <[EMAIL PROTECTED]> P.S. I am retired from hp. So the [EMAIL PROTECTED] address that this was previously cc'd to no longer reaches me. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Tue, May 01, 2007 at 09:43:29PM +1000, Nick Piggin wrote: > Rohit Seth wrote: ... > >Caches on Itanium are physical. So, it doesn't matter what virtual address > >you use to flush a cache line, cache line containing specific physical > >memory will be flushed. > > Really? I was under the vague impression that L1 i/d caches were virtual > and required this flushing... but I guess so long as the ISA says that > fc/fc.i flushes all caches corresponding to the physical address of the > provided virtual address, then that's what matters. The L1 caches on Itanium have interesting behavior. The cache lines are indexed by virtual address. But those L1 cache lines are invalidated whenever their corresponding L1 TLB entry is evicted or replaced. That means that old L1 icache lines will be invalidated by TLB changes even before a fc.i instruction flushed those icache lines. That would help make old kernels work on pre-montecito processors without correctly ordered fc.i instructions. The L1 icache was flushed by TLB inserts and the L2 icache was unified. fc.i is also defined to make an address coherent between data and instruction caches at all levels of cache. That handles the update of L1 icache lines during a st,fc.i,sync.i,srlz.i sequence. I see these details in section 6.1.1 of "IntelĀ® ItaniumĀ® 2 Processor Reference Manual". But I haven't found them in a general Itanium Architecture reference. -- Mike Stroyan, [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Thu, Apr 26, 2007 at 05:53:49PM +1000, Nick Piggin wrote: > I had a couple of questions which I'm hoping someone would be kind > enough to explain :) ... > I wonder how this is different to all the other code which calls > lazy_mmu_prot_update() after set_pte_at(). do_swap_page, for example, > _could_ fault in executable code, couldn't it? The do_swap_page code does look suspect. It seems to be working on ia64 because a DMA transfer of data from swap to the allocated page is removing old lines from the icache. If code on an anonymous page was swapping in without direct DMA to the page then the same problem could occur. I can't think of a reasonable situation that would cause swapping in to not use DMA. Swapping to/from NFS does not seem reasonable to me anyway. > It is because do_swap_page uses flush_icache_page()? So why doesn't > the flush_icache_page() work in do_no_page as well? (It seems to look > like a superset of lazy_mmu_prot_update on ia64?!?). flush_icache_page() on ia64 is provided by include/asm-ia64/cacheflush.h. It doesn't have any effect at all. #define flush_icache_page(vma,page) do { } while (0) lazy_mmu_prot_update() is supposed to get icache flushes done when they need to be. And it is supposed to avoid unneeded flushes when the icache is known to be clean for a page. -- Mike Stroyan, [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ia64: race flushing icache in do_no_page path
This is a very similar problem to a copy-on-write cache flushing problem that Tony Luck fixed in July 2006. In this case the do_no_page function handles a fault in an executable or library that is mmapped from an NFS file system. The code is copied into a newly reallocated page. The lazy_mmu_prot_update() function should be used to flush old entries from the icache for that page on ia64 processors. But that call is made after a set_pte_at call that makes the page accessible to other threads executing the same code. This was seen to cause application crashes when an OpenMP application ran many threads calling same functions at the same time. The first thread to reach a page starts to fault in the new code. One of the other threads overtakes the first and executes old data from the icache. That could result in bad instructions. It is more obvious when an old cache line contains prefetched non-instruction bits that result in an illegal instruction trap. The problem has only been seen on montecito processors which have separate level 2 icache and dcache. This dcache to icache coherency problem is more likely to occur there because of the much larger level 2 icache. I suspect that the non-NFS case is working because direct DMA into the new page is making the instruction cache coherent. Any file system that uses a non-DMA copy into the text page could show the same problem. Signed-off-by: Mike Stroyan <[EMAIL PROTECTED]> diff --git a/mm/memory.c b/mm/memory.c index e7066e7..50c8848 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2291,6 +2291,7 @@ retry: entry = mk_pte(new_page, vma->vm_page_prot); if (write_access) entry = maybe_mkwrite(pte_mkdirty(entry), vma); + lazy_mmu_prot_update(entry); set_pte_at(mm, address, page_table, entry); if (anon) { inc_mm_counter(mm, anon_rss); @@ -2312,7 +2313,6 @@ retry: /* no need to invalidate: a not-present page shouldn't be cached */ update_mmu_cache(vma, address, entry); - lazy_mmu_prot_update(entry); unlock: pte_unmap_unlock(page_table, ptl); if (dirty_page) { -- Mike Stroyan, [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/