Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-07-05 Thread Mike Stroyan
On Thu, Jul 05, 2007 at 10:57:00AM +0200, Zoltan Menyhart wrote:
> KAMEZAWA Hiroyuki wrote:
> >In our test, we confirmed that this can be fixed by flushing L2I just 
> >before SetPageUptodate() in NFS.
> 
> I can agree.
> We can be more permissive: it can be done anywhere after the new
> data is put in place and before nfs_readpage() or nfs_readpages()
> returns.
> 
> I saw your patch http://marc.info/?l=linux-mm&m=118352909826277&w=2
> that modifies e.g. mm/memory.c and not the NFS layer.
> 
> Have you proposed a patch against the NFS layer?

  This really doesn't look like a job for the file system layer.
That would require all sorts of file system readpage routines to
be modified to handle memory management details that are already
handled by the memory.c functions.  The do_no_page code is already
dealing with the necessary icache flushing operations.  It just
happens to be doing it with a bad race condition for ia64.

-- 
Mike Stroyan <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [BUGFIX][PATCH] DO flush icache before set_pte() on ia64.

2007-07-05 Thread Mike Stroyan
On Wed, Jul 04, 2007 at 03:05:04PM +0900, KAMEZAWA Hiroyuki wrote:
> This is a experimental patch for fixing icache flush race of ia64(Montecito).
> 
> Problem Description:
> Montecito, new ia64 processor, has separated L2 i-cache and d-cache,
> and i-cache and d-cache is not consistent in automatic way.
> 
> L1 cache is also separated but L1 D-cache is write-through. Then, before
> Montecito, any changes in L1-dcache is visible in L2-mixed-cache consistently.
> 
> Montecito has separated L2 cache and Mixed L3 cache. But...L2 D-cache is
> *write back*. (See http://download.intel.com/design/Itanium2/manuals/
> 30806501.pdf section 2.3.3)
> 
> Assume : valid data is in L2 d-cache and old data in L3 mixed cache.
> If write-back L2->L3 is delayed, at L2 i-cache miss cpu will fetch old data
> in L3 mixed cache. 
> By this, L2-icache-miss will read wrong instruction from L3-mixed cache.
> (Just I think so, is this correct ?)

  The L3 cache is involved in the HP-UX defect description because the
earlier HP-UX patch PHKL_33781 added flushing of the instruction cache
when an executable mapping was removed.  Linux never added that
unsuccessfull attempt at montecito cache coherency.  In the current
linux situation it can execute old cache lines straight from L2 icache.

> Now, I think icache should be flushed before set_pte().
> This is a patch to try that.
> 
> 1. remove all lazy_mmu_prot_update()...which is used by only ia64.
> 2. implements flush_cache_page()/flush_icache_page() for ia64.
> 
> Something unsure
> 3. mprotect() flushes cache before removing pte. Is this sane ?
>I added flush_icache_range() before set_pte() here.
> 
> Any comments and advices ?

  I am concerned about performance consequences.  With the change
from lazy_mmu_prot_update to __flush_icache_page_ia64 you dropped
the code that avoids icache flushes for non-executable pages.
Section 4.6.2 of David Mosberger and Stephane Eranian's
"ia-64 linux kernel design and implementation" goes into some
detail about the performance penalties avoided by limiting icache
flushes to executable pages and defering flushes until the first
fault for execution.

  Have you done any benchmarking to measure the performance
effect of these additional cache flushes?  It would be particularly
interesting to measure on large systems with many CPUs.  The fc.i
instruction needs to be broadcast to all CPUs in the system.

  The only defect that I see in the current implementation of
lazy_mmu_prot_update() is that it is called too late in some
functions that are already calling it.  Are your large changes
attempting to correct other defects?  Or are you simplifying
away potentially valuable code because you don't understand it?

>  
> Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>
> 
> ---
>  arch/ia64/mm/init.c   |7 +--
>  include/asm-generic/pgtable.h |4 
>  include/asm-ia64/cacheflush.h |   24 ++--
>  include/asm-ia64/pgtable.h|9 -
>  mm/fremap.c   |1 -
>  mm/memory.c   |   13 ++---
>  mm/migrate.c  |6 +-
>  mm/mprotect.c |   10 +-
>  mm/rmap.c |1 -
>  9 files changed, 43 insertions(+), 32 deletions(-)

You don't seem to have removed the lazy_mmu_prot_update() calls from
mm/hugetlb.c.  Will that build with HUGETLBFS configured?

-- 
Mike Stroyan <[EMAIL PROTECTED]>

P.S.  I am retired from hp.  So the [EMAIL PROTECTED] address that
  this was previously cc'd to no longer reaches me.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-05-04 Thread Mike Stroyan
On Tue, May 01, 2007 at 09:43:29PM +1000, Nick Piggin wrote:
> Rohit Seth wrote:
...
> >Caches on Itanium are physical.  So, it doesn't matter what virtual address
> >you use to flush a cache line, cache line containing specific physical
> >memory will be flushed. 
> 
> Really? I was under the vague impression that L1 i/d caches were virtual
> and required this flushing... but I guess so long as the ISA says that
> fc/fc.i flushes all caches corresponding to the physical address of the
> provided virtual address, then that's what matters.

  The L1 caches on Itanium have interesting behavior.  The cache lines
are indexed by virtual address.  But those L1 cache lines are invalidated
whenever their corresponding L1 TLB entry is evicted or replaced.
That means that old L1 icache lines will be invalidated by TLB changes
even before a fc.i instruction flushed those icache lines.  That would
help make old kernels work on pre-montecito processors without correctly
ordered fc.i instructions.  The L1 icache was flushed by TLB inserts
and the L2 icache was unified.

  fc.i is also defined to make an address coherent between data and
instruction caches at all levels of cache.  That handles the update of
L1 icache lines during a st,fc.i,sync.i,srlz.i sequence.

  I see these details in section 6.1.1 of "IntelĀ® ItaniumĀ® 2 Processor
Reference Manual".  But I haven't found them in a general Itanium
Architecture reference.

-- 
Mike Stroyan, [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-26 Thread Mike Stroyan
On Thu, Apr 26, 2007 at 05:53:49PM +1000, Nick Piggin wrote:
> I had a couple of questions which I'm hoping someone would be kind
> enough to explain :)
...
> I wonder how this is different to all the other code which calls
> lazy_mmu_prot_update() after set_pte_at(). do_swap_page, for example,
> _could_ fault in executable code, couldn't it?

  The do_swap_page code does look suspect.  It seems to be working on
ia64 because a DMA transfer of data from swap to the allocated page
is removing old lines from the icache.  If code on an anonymous page
was swapping in without direct DMA to the page then the same problem
could occur.  I can't think of a reasonable situation that would cause
swapping in to not use DMA.  Swapping to/from NFS does not seem reasonable
to me anyway.

> It is because do_swap_page uses flush_icache_page()? So why doesn't
> the flush_icache_page() work in do_no_page as well? (It seems to look
> like a superset of lazy_mmu_prot_update on ia64?!?).

flush_icache_page() on ia64 is provided by include/asm-ia64/cacheflush.h.
It doesn't have any effect at all.

#define flush_icache_page(vma,page) do { } while (0)

  lazy_mmu_prot_update() is supposed to get icache flushes done when they
need to be.  And it is supposed to avoid unneeded flushes when the icache
is known to be clean for a page.

-- 
Mike Stroyan, [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] ia64: race flushing icache in do_no_page path

2007-04-25 Thread Mike Stroyan
  This is a very similar problem to a copy-on-write cache flushing problem
that Tony Luck fixed in July 2006.  In this case the do_no_page function
handles a fault in an executable or library that is mmapped from an
NFS file system.  The code is copied into a newly reallocated page.
The lazy_mmu_prot_update() function should be used to flush old entries
from the icache for that page on ia64 processors.  But that call is made
after a set_pte_at call that makes the page accessible to other threads
executing the same code.  This was seen to cause application crashes
when an OpenMP application ran many threads calling same functions at
the same time.  The first thread to reach a page starts to fault in the
new code.  One of the other threads overtakes the first and executes old
data from the icache.  That could result in bad instructions.  It is more
obvious when an old cache line contains prefetched non-instruction bits
that result in an illegal instruction trap.

  The problem has only been seen on montecito processors which have
separate level 2 icache and dcache.  This dcache to icache coherency
problem is more likely to occur there because of the much larger level
2 icache.  I suspect that the non-NFS case is working because direct
DMA into the new page is making the instruction cache coherent.  Any
file system that uses a non-DMA copy into the text page could show the
same problem.

Signed-off-by: Mike Stroyan <[EMAIL PROTECTED]>

diff --git a/mm/memory.c b/mm/memory.c
index e7066e7..50c8848 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2291,6 +2291,7 @@ retry:
entry = mk_pte(new_page, vma->vm_page_prot);
if (write_access)
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+   lazy_mmu_prot_update(entry);
set_pte_at(mm, address, page_table, entry);
if (anon) {
inc_mm_counter(mm, anon_rss);
@@ -2312,7 +2313,6 @@ retry:
 
/* no need to invalidate: a not-present page shouldn't be cached */
update_mmu_cache(vma, address, entry);
-   lazy_mmu_prot_update(entry);
 unlock:
pte_unmap_unlock(page_table, ptl);
if (dirty_page) {

-- 
Mike Stroyan, [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/