Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-07-05 Thread Mike Stroyan
On Thu, Jul 05, 2007 at 10:57:00AM +0200, Zoltan Menyhart wrote:
> KAMEZAWA Hiroyuki wrote:
> >In our test, we confirmed that this can be fixed by flushing L2I just 
> >before SetPageUptodate() in NFS.
> 
> I can agree.
> We can be more permissive: it can be done anywhere after the new
> data is put in place and before nfs_readpage() or nfs_readpages()
> returns.
> 
> I saw your patch http://marc.info/?l=linux-mm&m=118352909826277&w=2
> that modifies e.g. mm/memory.c and not the NFS layer.
> 
> Have you proposed a patch against the NFS layer?

  This really doesn't look like a job for the file system layer.
That would require all sorts of file system readpage routines to
be modified to handle memory management details that are already
handled by the memory.c functions.  The do_no_page code is already
dealing with the necessary icache flushing operations.  It just
happens to be doing it with a bad race condition for ia64.

-- 
Mike Stroyan <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-07-05 Thread Zoltan Menyhart

KAMEZAWA Hiroyuki wrote:

On Wed, 04 Jul 2007 16:24:38 +0200
Zoltan Menyhart <[EMAIL PROTECTED]> wrote:


Machines star up whit bit 5 = 0, reading instruction pages via
NFS has to flush them from L2I.



In our test, we confirmed that this can be fixed by flushing L2I just before 
SetPageUptodate() in NFS.


I can agree.
We can be more permissive: it can be done anywhere after the new
data is put in place and before nfs_readpage() or nfs_readpages()
returns.

I saw your patch http://marc.info/?l=linux-mm&m=118352909826277&w=2
that modifies e.g. mm/memory.c and not the NFS layer.

Have you proposed a patch against the NFS layer?


I was wondering if instead of modifying do_no_page() and Co., should
not we make nfs_readpage() be DMA-like?
(No possible regression for most of the page I/O-s.)
I.e. it should be the responsibility of a file system to make sure it
supports instruction pages correctly. The base kernel should provide
such file systems with an architecture dependent macro...



IMHO, for example, race in cooy-on-write  (was fixed by Tony Luck) has to be
fixed by MemoryManagement layer.


I can agree.
Note that it has not got much performance impact from our point
of view because there are not too many COW paves with executable code...


And only a race in do_no_page() seems to be able to be fixed by FS layer.


If it were do_no_page() that would fix the problem, then it should know
where the page comes from in order not to flush the I-cache in vain.
do_no_page() just calls vma->vm_ops->nopage() that goes down to
fs_op->readpage() / fs_op->readpages().

On the other hand, these latter routines do not know why they fetch
the page(s). Note that a page can be mapped more than one times, with
different permission bits.
As a consequence, these routines will flush the I-cache in vain in
most of the cases.

I prefer a performance impact to some non DMA based file systems
and adding nothing to the path for the majority of the cases.


BTW, can we know whether a page is filled by DMA or not  ?


Well, the file systems based on block devices use DMA transfer
(with the exception of using bounce buffers, in that case it is the
responsibility of the bounce buffering layer to flush the I-cache).

Network based file systems require revision and update...

Note that only the processors with separate L2I require
attention, the L1I is guaranteed to be flushed when you change
the corresponding TBL entry.

The base kernel should provide a macro / service (that cares for the
processor model) for the file systems.

Thanks,

Zoltan Menyhart
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-07-04 Thread KAMEZAWA Hiroyuki
On Wed, 04 Jul 2007 16:24:38 +0200
Zoltan Menyhart <[EMAIL PROTECTED]> wrote:
> Machines star up whit bit 5 = 0, reading instruction pages via
> NFS has to flush them from L2I.
> 
In our test, we confirmed that this can be fixed by flushing L2I just before 
SetPageUptodate() in NFS.

> 
> I was wondering if instead of modifying do_no_page() and Co., should
> not we make nfs_readpage() be DMA-like?
> (No possible regression for most of the page I/O-s.)
> I.e. it should be the responsibility of a file system to make sure it
> supports instruction pages correctly. The base kernel should provide
> such file systems with an architecture dependent macro...
> 
IMHO, for example, race in cooy-on-write  (was fixed by Tony Luck) has to be
fixed by MemoryManagement layer.
And only a race in do_no_page() seems to be able to be fixed by FS layer.

BTW, can we know whether a page is filled by DMA or not  ?

Thanks,
-Kame

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-07-04 Thread Zoltan Menyhart

Could you please confirm that I understand correctly what is in the:

Dual-Core Update to the Intel Itanium 2 Processor Reference Manual...

"2.3.3.2 L2 Caches
...
Any coherence request to identify whether a cache line is in the processor
will invalidate that line from the L2I cache."

This makes sure that the DMAs invalidate the L2L cache.

"2.7.4 Instruction Cache Coherence Optimization
Coherence requests of the L1I and L2I caches will invalidate the line if
it is in the cache. Montecito allows instruction requests on the system
interface to be filtered such that they will not initiate coherence
requests of the L1I and L2I caches. This will allow instructions to be
cached at the L1I and L2I levels across multiple processors in a coherent
domain. This optimization is enabled by default, but may be disabled by
PAL_SET_PROC_FEATURES bit 5 of the Montecito feature_set (18)."

Machines star up whit bit 5 = 0, reading instruction pages via
NFS has to flush them from L2I.


I was wondering if instead of modifying do_no_page() and Co., should
not we make nfs_readpage() be DMA-like?
(No possible regression for most of the page I/O-s.)
I.e. it should be the responsibility of a file system to make sure it
supports instruction pages correctly. The base kernel should provide
such file systems with an architecture dependent macro...

Thanks,

Zoltan Menyhart
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-05-04 Thread Mike Stroyan
On Tue, May 01, 2007 at 09:43:29PM +1000, Nick Piggin wrote:
> Rohit Seth wrote:
...
> >Caches on Itanium are physical.  So, it doesn't matter what virtual address
> >you use to flush a cache line, cache line containing specific physical
> >memory will be flushed. 
> 
> Really? I was under the vague impression that L1 i/d caches were virtual
> and required this flushing... but I guess so long as the ISA says that
> fc/fc.i flushes all caches corresponding to the physical address of the
> provided virtual address, then that's what matters.

  The L1 caches on Itanium have interesting behavior.  The cache lines
are indexed by virtual address.  But those L1 cache lines are invalidated
whenever their corresponding L1 TLB entry is evicted or replaced.
That means that old L1 icache lines will be invalidated by TLB changes
even before a fc.i instruction flushed those icache lines.  That would
help make old kernels work on pre-montecito processors without correctly
ordered fc.i instructions.  The L1 icache was flushed by TLB inserts
and the L2 icache was unified.

  fc.i is also defined to make an address coherent between data and
instruction caches at all levels of cache.  That handles the update of
L1 icache lines during a st,fc.i,sync.i,srlz.i sequence.

  I see these details in section 6.1.1 of "IntelĀ® ItaniumĀ® 2 Processor
Reference Manual".  But I haven't found them in a general Itanium
Architecture reference.

-- 
Mike Stroyan, [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-05-01 Thread Nick Piggin

Rohit Seth wrote:

On Tue, 2007-05-01 at 21:52 +1000, Nick Piggin wrote:


Rohit Seth wrote:


and 
it's only interested when it's executable i.e. "lazy_mmu_prot_update"

is a name concealing some overdesign.



You are right that ia64 is only interested in whne the execute permissions
kick in (and FWIW ia64 used to use update_mmu_cache API to do what it is now
doing lazy_mmu_prot_update).  Though the idea was to design an API that any
arch can use to know when ever there is change in protections on a mapping.


What I think what we should do is audit flush_icache_page coverage, and
convert ia64 to use that (because it needs this to happen _before_ the
pte is set).




That doesn't address the underlying requirement that arch specific code
should be told of change in protections. 


But you would add the flush_icache_page_chprot to address that.



For ia64, you are right that
it equates to flushing icache in some cases, but this API is more
generic.


And it is also broken, because it needs to be done before the set_pte
for ia64. It needs to be done where flush_icache_page is done. And on
ia64 it flushes the icache. So I don't understand why ia64 would not
use flush_icache_page but add something completely new "because it is
more generic".



All we should need to do is add a pte argument to flush_icache,



I'm sure this is doable.  Though it is more of design issue of whether
that is the right way to do it.  I understand this extra API is
difficult at this time because of one single consumer.


There is just no point in adding something else if you already have
a hook that seems to do what ia64 wants (or could, with a small amount
of work).

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-05-01 Thread Nick Piggin

Rohit Seth wrote:

On Tue, 2007-05-01 at 21:39 +1000, Nick Piggin wrote:


Rohit Seth wrote:




If a user is requesting kernel to do (for example) write on a page that is
already mapped with execute and write permissions then it should be treated
as if the user space is doing modifications to that page.  There is no
change in protections so lazy_prot_mmu_update shouldn't be called even
though PG_arch_1 is (I think) set.  Does it answer your concern?


I'm not sure that I would agree. For direct modifications of memory via
a passed in user virtual address, perhaps. For operations on pagecache,
we may not even have a handle to issue the flush cache instruction on (ie.
a user virtual address), let alone know whether anyone else is mapping
the page.




Can you please describe the page cache scenario in more detail?  IMO, if
a page is user mapped with at least one execute and write permission
then the responsibility of update caches lies with user.


What if a different user write(2)s the underlying page?


What if you were to say remove all the PG_arch_1 code, and do 
something really simple like flush icache in 
flush_dcache_page? Would performance suffer horribly?



On Itanium, I think it will have some performance penalty (horrible or not I
don't know) as you will be invalidating the caches more often.  And they
alsways look for last 0.1% performance that they can get.


Sure, but if we _only_ flushed when page_mapcount was raised,



You will need this every time there is change in protection (e.g.
mprotect) not only when page_mapcount is raised.


Yeah, you would retain the flush on fault, I meant you would
introduce a flush in flush_dcache_page for when mapcount is raised.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-05-01 Thread Rohit Seth
On Tue, 2007-05-01 at 21:47 +1000, Nick Piggin wrote:
> Rohit Seth wrote:
> >
> > 
> > It is invalidating any entries (containing same physical address) in both I
> > and D caches.  Any dirty lines in D cache are written back to memory before
> > getting invalidated (ofcourse).
> 
> OK. (should it be issuing both fc and fc.i to be robust in case a
> new implementation doesn't flush the dcache with fc.i?)
> 

For the Itanium case specifically, you only want to invalidate a stale
icache line.  Once that is done, next time icache will pick the correct
updated contents.

> 
> >>There are supposedly no icache lines at that point[**]:
> > 
> > 
> > For this bug to trigger there has to be a (stale) entry in icache containing
> > the old contents of a page that just got updated by kernel as explicit
> > copying of data (DMAs are coherent on ia64, meaning if a device were to
> > write to memory then architecture guarnatees that both I and D caches are
> > invalidated).
> 
> So if we have a dirty dcache line for a given physical address,
> it will _always_ be the case that a subsequent icache load will
> find that dirty data?

yes for ia64.

-rohit

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-05-01 Thread Rohit Seth
On Tue, 2007-05-01 at 21:52 +1000, Nick Piggin wrote:
> Rohit Seth wrote:
> 
> >>and 
> >>it's only interested when it's executable i.e. "lazy_mmu_prot_update"
> >>is a name concealing some overdesign.
> > 
> > 
> > You are right that ia64 is only interested in whne the execute permissions
> > kick in (and FWIW ia64 used to use update_mmu_cache API to do what it is now
> > doing lazy_mmu_prot_update).  Though the idea was to design an API that any
> > arch can use to know when ever there is change in protections on a mapping.
> 
> What I think what we should do is audit flush_icache_page coverage, and
> convert ia64 to use that (because it needs this to happen _before_ the
> pte is set).
> 

That doesn't address the underlying requirement that arch specific code
should be told of change in protections.  For ia64, you are right that
it equates to flushing icache in some cases, but this API is more
generic.

> All we should need to do is add a pte argument to flush_icache,

I'm sure this is doable.  Though it is more of design issue of whether
that is the right way to do it.  I understand this extra API is
difficult at this time because of one single consumer.

-rohit


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-05-01 Thread Rohit Seth
On Tue, 2007-05-01 at 21:39 +1000, Nick Piggin wrote:
> Rohit Seth wrote:

> > 
> > If a user is requesting kernel to do (for example) write on a page that is
> > already mapped with execute and write permissions then it should be treated
> > as if the user space is doing modifications to that page.  There is no
> > change in protections so lazy_prot_mmu_update shouldn't be called even
> > though PG_arch_1 is (I think) set.  Does it answer your concern?
> 
> I'm not sure that I would agree. For direct modifications of memory via
> a passed in user virtual address, perhaps. For operations on pagecache,
> we may not even have a handle to issue the flush cache instruction on (ie.
> a user virtual address), let alone know whether anyone else is mapping
> the page.
> 

Can you please describe the page cache scenario in more detail?  IMO, if
a page is user mapped with at least one execute and write permission
then the responsibility of update caches lies with user.

> >>What if you were to say remove all the PG_arch_1 code, and do 
> >>something really simple like flush icache in 
> >>flush_dcache_page? Would performance suffer horribly?
> > 
> > 
> > On Itanium, I think it will have some performance penalty (horrible or not I
> > don't know) as you will be invalidating the caches more often.  And they
> > alsways look for last 0.1% performance that they can get.
> 
> Sure, but if we _only_ flushed when page_mapcount was raised,

You will need this every time there is change in protection (e.g.
mprotect) not only when page_mapcount is raised.

-rohit


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-05-01 Thread Nick Piggin

Rohit Seth wrote:

and 
it's only interested when it's executable i.e. "lazy_mmu_prot_update"

is a name concealing some overdesign.



You are right that ia64 is only interested in whne the execute permissions
kick in (and FWIW ia64 used to use update_mmu_cache API to do what it is now
doing lazy_mmu_prot_update).  Though the idea was to design an API that any
arch can use to know when ever there is change in protections on a mapping.


What I think what we should do is audit flush_icache_page coverage, and
convert ia64 to use that (because it needs this to happen _before_ the
pte is set).

All we should need to do is add a pte argument to flush_icache, and it
should be possible to do what ia64 wants, and we can remove
lazy_mmu_prot_update (or at least rename it to something like
flush_icache_page_chprot and move it to the normal flush_icache_page
position above set_pte if not all architectures want their
flush_icache_page called at protection change time).

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-05-01 Thread Nick Piggin

Rohit Seth wrote:
Hi Nick, 


-Original Message-
From: Nick Piggin [mailto:[EMAIL PROTECTED] 
Sent: Friday, April 27, 2007 11:03 PM

To: Hugh Dickins
Cc: [EMAIL PROTECTED]; Mike Stroyan; Andrew Morton; Luck, Tony;
[EMAIL PROTECTED]; linux-kernel@vger.kernel.org
Subject: Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

Hugh Dickins wrote:


On Sat, 28 Apr 2007, Nick Piggin wrote:


OIC, you need a virtual address to evict the icache, so you can't 
flush at flush_dcache time? Or does ia64 have an instruction to flush 
the whole icache? (it would be worth testing, to see how much 
performance suffers).



I'm puzzled by that remark: the ia64 flush_icache_range always has a 
virtual address, it uses the kernel virtual address; it takes no 
interest in whether there's a user virtual address.



I _think_ what it is doing is actually flushing dcache lines dirtied 
via the kernel virtual address (yes, I think flush_icache
in lazy_mmu_prot_update is actually just flushing the dcache, but 
I could be wrong? [*]).



It is invalidating any entries (containing same physical address) in both I
and D caches.  Any dirty lines in D cache are written back to memory before
getting invalidated (ofcourse).


OK. (should it be issuing both fc and fc.i to be robust in case a
new implementation doesn't flush the dcache with fc.i?)



There are supposedly no icache lines at that point[**]:



For this bug to trigger there has to be a (stale) entry in icache containing
the old contents of a page that just got updated by kernel as explicit
copying of data (DMAs are coherent on ia64, meaning if a device were to
write to memory then architecture guarnatees that both I and D caches are
invalidated).


So if we have a dirty dcache line for a given physical address,
it will _always_ be the case that a subsequent icache load will
find that dirty data?

... thanks for bearing with me ;)

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-05-01 Thread Nick Piggin

Rohit Seth wrote:
 


-Original Message-
From: Hugh Dickins [mailto:[EMAIL PROTECTED] 
Sent: Friday, April 27, 2007 10:20 PM

To: Nick Piggin
Cc: [EMAIL PROTECTED]; Mike Stroyan; Andrew Morton; Luck, Tony;
[EMAIL PROTECTED]; linux-kernel@vger.kernel.org
Subject: Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

On Sat, 28 Apr 2007, Nick Piggin wrote:

OIC, you need a virtual address to evict the icache, so you can't 
flush at flush_dcache time? Or does ia64 have an instruction to flush 
the whole icache? (it would be worth testing, to see how much 
performance suffers).



IIRC, there is a PAL call to flush the whole cache (but that is quite a
heavy call).  Though you really don't need to be doing this.


I'm puzzled by that remark: the ia64 flush_icache_range always has a 
virtual address, it uses the kernel virtual address; it takes 
no interest in whether there's a user virtual address.



Caches on Itanium are physical.  So, it doesn't matter what virtual address
you use to flush a cache line, cache line containing specific physical
memory will be flushed. 


Really? I was under the vague impression that L1 i/d caches were virtual
and required this flushing... but I guess so long as the ISA says that
fc/fc.i flushes all caches corresponding to the physical address of the
provided virtual address, then that's what matters.


For the cases where you have virtual caches,
update_mmu_cache is the API to use.


But it happens after the pte is installed.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-05-01 Thread Nick Piggin

Rohit Seth wrote:
 


-Original Message-
From: Nick Piggin [mailto:[EMAIL PROTECTED] 
Sent: Friday, April 27, 2007 7:00 PM

To: [EMAIL PROTECTED]
Cc: Mike Stroyan; Andrew Morton; Hugh Dickins; Luck, Tony;
[EMAIL PROTECTED]; linux-kernel@vger.kernel.org
Subject: Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

Rohit Seth wrote:



You mean by user space? If so, then it is user space responsibility to 
do the appropriate operations (like flush icache in this case).



No, I mean places that set PG_arch_1. flush_dcache_page. This can happen 
for mapped pages in write, splice, install_arg_page looks questionable,


direct IO...


If a user is requesting kernel to do (for example) write on a page that is
already mapped with execute and write permissions then it should be treated
as if the user space is doing modifications to that page.  There is no
change in protections so lazy_prot_mmu_update shouldn't be called even
though PG_arch_1 is (I think) set.  Does it answer your concern?


I'm not sure that I would agree. For direct modifications of memory via
a passed in user virtual address, perhaps. For operations on pagecache,
we may not even have a handle to issue the flush cache instruction on (ie.
a user virtual address), let alone know whether anyone else is mapping
the page.

What if you were to say remove all the PG_arch_1 code, and do 
something really simple like flush icache in 
flush_dcache_page? Would performance suffer horribly?



On Itanium, I think it will have some performance penalty (horrible or not I
don't know) as you will be invalidating the caches more often.  And they
alsways look for last 0.1% performance that they can get.


Sure, but if we _only_ flushed when page_mapcount was raised, then we
should have most of the benefits of lazy flushing, and the main places
were we do extra flushes are those where aliases could potentially occur
under the old scheme.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-28 Thread Rohit Seth
Hi Nick, 

-Original Message-
From: Nick Piggin [mailto:[EMAIL PROTECTED] 
Sent: Friday, April 27, 2007 11:03 PM
To: Hugh Dickins
Cc: [EMAIL PROTECTED]; Mike Stroyan; Andrew Morton; Luck, Tony;
[EMAIL PROTECTED]; linux-kernel@vger.kernel.org
Subject: Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

Hugh Dickins wrote:
> On Sat, 28 Apr 2007, Nick Piggin wrote:
> 
>>OIC, you need a virtual address to evict the icache, so you can't 
>>flush at flush_dcache time? Or does ia64 have an instruction to flush 
>>the whole icache? (it would be worth testing, to see how much 
>>performance suffers).
> 
> 
> I'm puzzled by that remark: the ia64 flush_icache_range always has a 
> virtual address, it uses the kernel virtual address; it takes no 
> interest in whether there's a user virtual address.

>I _think_ what it is doing is actually flushing dcache lines dirtied 
>via the kernel virtual address (yes, I think flush_icache
> in lazy_mmu_prot_update is actually just flushing the dcache, but 
>I could be wrong? [*]).

It is invalidating any entries (containing same physical address) in both I
and D caches.  Any dirty lines in D cache are written back to memory before
getting invalidated (ofcourse).

> There are supposedly no icache lines at that point[**]:

For this bug to trigger there has to be a (stale) entry in icache containing
the old contents of a page that just got updated by kernel as explicit
copying of data (DMAs are coherent on ia64, meaning if a device were to
write to memory then architecture guarnatees that both I and D caches are
invalidated).

> the bug fix at the start of this thread 
>is due to icache lines becoming present before the flush.

For this mail thread, the old contents were not flushed out of icache when
the old mapping was removed.  And they were still present when the free page
got mapped again.

Cheers,
-rohit

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-28 Thread Rohit Seth
 Hi Hugh,

-Original Message-
From: Hugh Dickins [mailto:[EMAIL PROTECTED] 
Sent: Friday, April 27, 2007 10:34 PM
To: Rohit Seth
Cc: Nick Piggin; Mike Stroyan; Andrew Morton; Luck, Tony;
[EMAIL PROTECTED]; linux-kernel@vger.kernel.org
Subject: Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

On Fri, 27 Apr 2007, Rohit Seth wrote:

> lazy_mmu_prot_update was added specifically for notifying change in 
> protection.  So, in a way it is closer to update_mmu_cache (Which is 
> for change in mappings itself).  Though for ia64 implementation, this 
> ends up flushing the icaches when needed.

>The ia64 implementation is the only one which has any use for it, 

Even Itanium didn't need it for almost 5 years :)  Though I think archs that
have incoherent I & D caches could be (theoritically) exposed to same
(original) mprotect code path bug that triggered this API.

>and 
>it's only interested when it's executable i.e. "lazy_mmu_prot_update"
>is a name concealing some overdesign.

You are right that ia64 is only interested in whne the execute permissions
kick in (and FWIW ia64 used to use update_mmu_cache API to do what it is now
doing lazy_mmu_prot_update).  Though the idea was to design an API that any
arch can use to know when ever there is change in protections on a mapping.

Cheers,
-rohit


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-28 Thread Rohit Seth
 

-Original Message-
From: Hugh Dickins [mailto:[EMAIL PROTECTED] 
Sent: Friday, April 27, 2007 10:20 PM
To: Nick Piggin
Cc: [EMAIL PROTECTED]; Mike Stroyan; Andrew Morton; Luck, Tony;
[EMAIL PROTECTED]; linux-kernel@vger.kernel.org
Subject: Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

On Sat, 28 Apr 2007, Nick Piggin wrote:
> 
> OIC, you need a virtual address to evict the icache, so you can't 
> flush at flush_dcache time? Or does ia64 have an instruction to flush 
> the whole icache? (it would be worth testing, to see how much 
> performance suffers).

IIRC, there is a PAL call to flush the whole cache (but that is quite a
heavy call).  Though you really don't need to be doing this.

>I'm puzzled by that remark: the ia64 flush_icache_range always has a 
>virtual address, it uses the kernel virtual address; it takes 
>no interest in whether there's a user virtual address.

Caches on Itanium are physical.  So, it doesn't matter what virtual address
you use to flush a cache line, cache line containing specific physical
memory will be flushed.  For the cases where you have virtual caches,
update_mmu_cache is the API to use.

-rohit

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-28 Thread Rohit Seth
 

-Original Message-
From: Nick Piggin [mailto:[EMAIL PROTECTED] 
Sent: Friday, April 27, 2007 7:00 PM
To: [EMAIL PROTECTED]
Cc: Mike Stroyan; Andrew Morton; Hugh Dickins; Luck, Tony;
[EMAIL PROTECTED]; linux-kernel@vger.kernel.org
Subject: Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

Rohit Seth wrote:
> 
> 
>> You mean by user space? If so, then it is user space responsibility to 
>> do the appropriate operations (like flush icache in this case).

>No, I mean places that set PG_arch_1. flush_dcache_page. This can happen 
>for mapped pages in write, splice, install_arg_page looks questionable,
direct IO...


If a user is requesting kernel to do (for example) write on a page that is
already mapped with execute and write permissions then it should be treated
as if the user space is doing modifications to that page.  There is no
change in protections so lazy_prot_mmu_update shouldn't be called even
though PG_arch_1 is (I think) set.  Does it answer your concern?

>What if you were to say remove all the PG_arch_1 code, and do 
>something really simple like flush icache in 
>flush_dcache_page? Would performance suffer horribly?

On Itanium, I think it will have some performance penalty (horrible or not I
don't know) as you will be invalidating the caches more often.  And they
alsways look for last 0.1% performance that they can get.

-rohit

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-27 Thread Nick Piggin

Hugh Dickins wrote:

On Sat, 28 Apr 2007, Nick Piggin wrote:


OIC, you need a virtual address to evict the icache, so you can't
flush at flush_dcache time? Or does ia64 have an instruction to
flush the whole icache? (it would be worth testing, to see how much
performance suffers).



I'm puzzled by that remark: the ia64 flush_icache_range always has
a virtual address, it uses the kernel virtual address; it takes no
interest in whether there's a user virtual address.


I _think_ what it is doing is actually flushing dcache lines dirtied
via the kernel virtual address (yes, I think flush_icache in
lazy_mmu_prot_update is actually just flushing the dcache, but I could
be wrong? [*]). There are supposedly no icache lines at that point[**]:
the bug fix at the start of this thread is due to icache lines
becoming present before the flush.

[*] and if I'm right, that means that icache fills don't see dirty
dcache, but rather always load from memory? I hope someone can
answer these questions?

[**] except due to _existing_ executable mappings, of course, but they
seem to have been comprehensively disregarded...

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-27 Thread Hugh Dickins
On Fri, 27 Apr 2007, Rohit Seth wrote:
> On Fri, 2007-04-27 at 15:18 +0100, Hugh Dickins wrote:
> 
> Right.  Extra flush_icache_page routines will add cost to archs that
> have non-null definition of this routine.  BTW, isn't flush_icache_page
> marked for deprecation?

Yes, flush_icache_page is marked for deprecation: but that's hardly
a reason to add another under a different name!  (Not quite what you
did, but...)

> lazy_mmu_prot_update was added specifically for notifying change in
> protection.  So, in a way it is closer to update_mmu_cache (Which is for
> change in mappings itself).  Though for ia64 implementation, this ends
> up flushing the icaches when needed.

The ia64 implementation is the only one which has any use for it, and
it's only interested when it's executable i.e. "lazy_mmu_prot_update"
is a name concealing some overdesign.

> Hopefully my reply is useful.

Yes, thanks Rohit, and I'll want to read through it again later.
In particular, I've now a better idea what's "lazy" about it.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-27 Thread Hugh Dickins
On Sat, 28 Apr 2007, Nick Piggin wrote:
> 
> OIC, you need a virtual address to evict the icache, so you can't
> flush at flush_dcache time? Or does ia64 have an instruction to
> flush the whole icache? (it would be worth testing, to see how much
> performance suffers).

I'm puzzled by that remark: the ia64 flush_icache_range always has
a virtual address, it uses the kernel virtual address; it takes no
interest in whether there's a user virtual address.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-27 Thread Nick Piggin

Nick Piggin wrote:

Rohit Seth wrote:



You mean by user space? If so, then it is user space responsibility to
do the appropriate operations (like flush icache in this case).



No, I mean places that set PG_arch_1. flush_dcache_page. This can
happen for mapped pages in write, splice, install_arg_page looks
questionable, direct IO...


Oh, and also ptrace! I think I was almost fooled by that attempt to flush
the cache in copy_to_user_page.

But that also fails if you map the underlying page with multiple virtual
addresses (or processes, if the icache is not flushed on ctxsw), because
those others won't have their caches flushed, right?

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-27 Thread Nick Piggin

Nick Piggin wrote:


What if you were to say remove all the PG_arch_1 code, and do something
really simple like flush icache in flush_dcache_page? Would performance
suffer horribly?


OIC, you need a virtual address to evict the icache, so you can't
flush at flush_dcache time? Or does ia64 have an instruction to
flush the whole icache? (it would be worth testing, to see how much
performance suffers).

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-27 Thread Nick Piggin

Hugh Dickins wrote:

On Fri, 27 Apr 2007, Nick Piggin wrote:


But that's because of ia64's cache coherency implementation. I don't really
follow the documentation to know whether it should be one way or the other,
but surely it should be done either before or after the set_pte_at, not both.

Anyway, how about fremap or mprotect, for example?
... 


OK, I'm still not sure that I understand why lazy_mmu_prot_update should be
used rather than flush_icache_page (in concept, not ia64 implementation).
Sure, flush_icache_page isn't given the pte, but let's assume we can change
that.



You're asking lots of good questions.  I wish the ia64 people would
know the answers, but from the length of time the "lazy_mmu_prot_update"
stuff took to get into the tree, and the length of time it's taken to be
found defective, I suspect they don't, and we'll have to guess for them.

Some guesses I'm working with...

I presume Mike and Anil are correct, that it needs to be done before
putting pte into page table, not left until after: but as you've
guessed, that needs to be done everywhere, not just in the two
places so far identified.

When it was discussed last year (in connection with Peter's page
cleaning patches) it was thought to be a variant of update_mmu_cache()
(after setting pte), and we added the fremap one to accompany it;
but now it looks to be a variant of flush_icache_page() (before
setting pte).


Right. I think.



I believe lazy_mmu_prot_update(pteval) came into existence primarily
for mprotect's change_pte_range() case.  If ia64 filled in its
flush_icache_page(vma, page), that could have been used there
(checking 'vm_flags & VM_EXEC' instead of pte_exec): but that would
involve a relatively expensive(?) pte_page() in a place which doesn't
need to know the struct page for other cases.


Well, I think we could always add a pte argument to flush_icache_page...
Then, there might be logic to have a flush_lazy_icache_page when
changing protections, but that operation (currently called
lazy_mmu_prot_update) really doesn't seem like it should be called in
all the other places that it is, flush_icache_page should be used for
that.

But AFAIKS, if we really want correctness, flush_icache_page should go
away and be implemented in flush_dcache_page.



Well, not pte_page(), it needs to be vm_normal_page() doesn't it?
and ia64's current lazy_mmu_prot_update is unsafe when !pfn_valid.

Some flush_icache_pages are already in place, others are not: do
we need to add some?  But those architectures which have a non-empty
flush_icache_page seem to have survived without the additional calls
- so they might be unnecessarily slowed down by additional calls.


Well flush_icache seems to be intended solely to bring icache in sync
with dcache modifications, but they try to skimp out on most of the
flushes required to handle dcache aliases... but really, I don't think
that is possible to do 100% correctly.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-27 Thread Nick Piggin

Rohit Seth wrote:

On Fri, 2007-04-27 at 21:55 +1000, Nick Piggin wrote:



That's the theory. However, I'd still like to know how the arch code can
make the assertion that icache is known to be at all times other than at
the time of a fault?




Kernel needs to only worry about the updates that it does.  So, if
kernel is writing into a page that is getting marked with execute
permission then it will need to make sure that caches are coherent.
ia64 Kernel keeps track of whether it has done any write operation on a
page or not using PG_arch_1.  And accordingly flushes icaches.


It flushes icache at fault time, I know. What I don't know is why we
leave them to drift out of sync afterwards.



Ie. what if an operation which causes incoherency is carried out _after_
an executable mapping is installed for that page.




You mean by user space? If so, then it is user space responsibility to
do the appropriate operations (like flush icache in this case).


No, I mean places that set PG_arch_1. flush_dcache_page. This can
happen for mapped pages in write, splice, install_arg_page looks
questionable, direct IO...

Actually there are various windows where mapped pages can be !uptodate,
so there is technically most of the filesystem code as well, but I'm
trying to stamp those out, so let's ignore that for now.

What if you were to say remove all the PG_arch_1 code, and do something
really simple like flush icache in flush_dcache_page? Would performance
suffer horribly?

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-27 Thread Rohit Seth
On Fri, 2007-04-27 at 15:18 +0100, Hugh Dickins wrote:
> I presume Mike and Anil are correct, that it needs to be done before
> putting pte into page table, not left until after: but as you've
> guessed, that needs to be done everywhere, not just in the two
> places so far identified.
> 

That sounds about right.  Before installing new mapping, kernel should
ensure there are no stale contents in caches or TLB.
lazy_mmu_prot_update needs to be called whenever the permissions on pte
(about to) change.  So if remapping is causing change in protection then
lazy_mmu_prot_update needs to be called.  

> When it was discussed last year (in connection with Peter's page
> cleaning patches) it was thought to be a variant of update_mmu_cache()
> (after setting pte), and we added the fremap one to accompany it;
> but now it looks to be a variant of flush_icache_page() (before
> setting pte).
> 
> I believe lazy_mmu_prot_update(pteval) came into existence primarily
> for mprotect's change_pte_range() case.

Yup.

>   If ia64 filled in its
> flush_icache_page(vma, page), that could have been used there
> (checking 'vm_flags & VM_EXEC' instead of pte_exec): but that would
> involve a relatively expensive(?) pte_page() in a place which doesn't
> need to know the struct page for other cases.
> 
> Well, not pte_page(), it needs to be vm_normal_page() doesn't it?
> and ia64's current lazy_mmu_prot_update is unsafe when !pfn_valid.
> 
> Some flush_icache_pages are already in place, others are not: do
> we need to add some?  But those architectures which have a non-empty
> flush_icache_page seem to have survived without the additional calls
> - so they might be unnecessarily slowed down by additional calls.
> 

Right.  Extra flush_icache_page routines will add cost to archs that
have non-null definition of this routine.  BTW, isn't flush_icache_page
marked for deprecation?

> I believe that was the secondary reason for lazy_mmu_prot_update(),
> perhaps better called ia64_flush_icache_page(): to allow calls to
> be added where ia64 was (mistakenly) thought to want them, without
> needing a protracted audit of how other architectures might be
> impacted.
> 

lazy_mmu_prot_update was added specifically for notifying change in
protection.  So, in a way it is closer to update_mmu_cache (Which is for
change in mappings itself).  Though for ia64 implementation, this ends
up flushing the icaches when needed.

Hopefully my reply is useful.

-rohit

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-27 Thread Rohit Seth
On Fri, 2007-04-27 at 21:55 +1000, Nick Piggin wrote:

> That's the theory. However, I'd still like to know how the arch code can
> make the assertion that icache is known to be at all times other than at
> the time of a fault?
> 

Kernel needs to only worry about the updates that it does.  So, if
kernel is writing into a page that is getting marked with execute
permission then it will need to make sure that caches are coherent.
ia64 Kernel keeps track of whether it has done any write operation on a
page or not using PG_arch_1.  And accordingly flushes icaches.

> Ie. what if an operation which causes incoherency is carried out _after_
> an executable mapping is installed for that page.
> 

You mean by user space? If so, then it is user space responsibility to
do the appropriate operations (like flush icache in this case).

-rohit

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-27 Thread David Mosberger-Tang

My book has a fairly detailed discussion of how these operations were
supposed to work and what the reasoning behind them was.
Unfortunately, I don't have time to really participate this discussion
at the moment, but I hope somebody else has access to the book and
would (re-)read it for some background (not to claim that it got
everything right 100% but to ensure that earlier mistakes are not
repeated...).

 --david

On 4/27/07, Hugh Dickins <[EMAIL PROTECTED]> wrote:

On Fri, 27 Apr 2007, Nick Piggin wrote:
>
> But that's because of ia64's cache coherency implementation. I don't really
> follow the documentation to know whether it should be one way or the other,
> but surely it should be done either before or after the set_pte_at, not both.
>
> Anyway, how about fremap or mprotect, for example?
> ...
>
> OK, I'm still not sure that I understand why lazy_mmu_prot_update should be
> used rather than flush_icache_page (in concept, not ia64 implementation).
> Sure, flush_icache_page isn't given the pte, but let's assume we can change
> that.

You're asking lots of good questions.  I wish the ia64 people would
know the answers, but from the length of time the "lazy_mmu_prot_update"
stuff took to get into the tree, and the length of time it's taken to be
found defective, I suspect they don't, and we'll have to guess for them.

Some guesses I'm working with...

I presume Mike and Anil are correct, that it needs to be done before
putting pte into page table, not left until after: but as you've
guessed, that needs to be done everywhere, not just in the two
places so far identified.

When it was discussed last year (in connection with Peter's page
cleaning patches) it was thought to be a variant of update_mmu_cache()
(after setting pte), and we added the fremap one to accompany it;
but now it looks to be a variant of flush_icache_page() (before
setting pte).

I believe lazy_mmu_prot_update(pteval) came into existence primarily
for mprotect's change_pte_range() case.  If ia64 filled in its
flush_icache_page(vma, page), that could have been used there
(checking 'vm_flags & VM_EXEC' instead of pte_exec): but that would
involve a relatively expensive(?) pte_page() in a place which doesn't
need to know the struct page for other cases.

Well, not pte_page(), it needs to be vm_normal_page() doesn't it?
and ia64's current lazy_mmu_prot_update is unsafe when !pfn_valid.

Some flush_icache_pages are already in place, others are not: do
we need to add some?  But those architectures which have a non-empty
flush_icache_page seem to have survived without the additional calls
- so they might be unnecessarily slowed down by additional calls.

I believe that was the secondary reason for lazy_mmu_prot_update(),
perhaps better called ia64_flush_icache_page(): to allow calls to
be added where ia64 was (mistakenly) thought to want them, without
needing a protracted audit of how other architectures might be
impacted.

But I'm still trying to make sense of it.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html




--
Mosberger Consulting LLC, http://www.mosberger-consulting.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-27 Thread Hugh Dickins
On Fri, 27 Apr 2007, Nick Piggin wrote:
> 
> But that's because of ia64's cache coherency implementation. I don't really
> follow the documentation to know whether it should be one way or the other,
> but surely it should be done either before or after the set_pte_at, not both.
> 
> Anyway, how about fremap or mprotect, for example?
> ... 
> 
> OK, I'm still not sure that I understand why lazy_mmu_prot_update should be
> used rather than flush_icache_page (in concept, not ia64 implementation).
> Sure, flush_icache_page isn't given the pte, but let's assume we can change
> that.

You're asking lots of good questions.  I wish the ia64 people would
know the answers, but from the length of time the "lazy_mmu_prot_update"
stuff took to get into the tree, and the length of time it's taken to be
found defective, I suspect they don't, and we'll have to guess for them.

Some guesses I'm working with...

I presume Mike and Anil are correct, that it needs to be done before
putting pte into page table, not left until after: but as you've
guessed, that needs to be done everywhere, not just in the two
places so far identified.

When it was discussed last year (in connection with Peter's page
cleaning patches) it was thought to be a variant of update_mmu_cache()
(after setting pte), and we added the fremap one to accompany it;
but now it looks to be a variant of flush_icache_page() (before
setting pte).

I believe lazy_mmu_prot_update(pteval) came into existence primarily
for mprotect's change_pte_range() case.  If ia64 filled in its
flush_icache_page(vma, page), that could have been used there
(checking 'vm_flags & VM_EXEC' instead of pte_exec): but that would
involve a relatively expensive(?) pte_page() in a place which doesn't
need to know the struct page for other cases.

Well, not pte_page(), it needs to be vm_normal_page() doesn't it?
and ia64's current lazy_mmu_prot_update is unsafe when !pfn_valid.

Some flush_icache_pages are already in place, others are not: do
we need to add some?  But those architectures which have a non-empty
flush_icache_page seem to have survived without the additional calls
- so they might be unnecessarily slowed down by additional calls.

I believe that was the secondary reason for lazy_mmu_prot_update(),
perhaps better called ia64_flush_icache_page(): to allow calls to
be added where ia64 was (mistakenly) thought to want them, without
needing a protracted audit of how other architectures might be
impacted.

But I'm still trying to make sense of it.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-27 Thread Nick Piggin

Mike Stroyan wrote:

On Thu, Apr 26, 2007 at 05:53:49PM +1000, Nick Piggin wrote:


I had a couple of questions which I'm hoping someone would be kind
enough to explain :)


...


I wonder how this is different to all the other code which calls
lazy_mmu_prot_update() after set_pte_at(). do_swap_page, for example,
_could_ fault in executable code, couldn't it?



  The do_swap_page code does look suspect.  It seems to be working on
ia64 because a DMA transfer of data from swap to the allocated page
is removing old lines from the icache.  If code on an anonymous page
was swapping in without direct DMA to the page then the same problem
could occur.  I can't think of a reasonable situation that would cause
swapping in to not use DMA.  Swapping to/from NFS does not seem reasonable
to me anyway.


But that's because of ia64's cache coherency implementation. I don't really
follow the documentation to know whether it should be one way or the other,
but surely it should be done either before or after the set_pte_at, not both.

Anyway, how about fremap or mprotect, for example?


It is because do_swap_page uses flush_icache_page()? So why doesn't
the flush_icache_page() work in do_no_page as well? (It seems to look
like a superset of lazy_mmu_prot_update on ia64?!?).



flush_icache_page() on ia64 is provided by include/asm-ia64/cacheflush.h.
It doesn't have any effect at all.

#define flush_icache_page(vma,page) do { } while (0)


You're right, sorry I was mistakenly looking at the flush_range. That was
my only immediate concern with your patch... So, for finding and fixing
that bug you have my admiration :)

OK, I'm still not sure that I understand why lazy_mmu_prot_update should be
used rather than flush_icache_page (in concept, not ia64 implementation).
Sure, flush_icache_page isn't given the pte, but let's assume we can change
that.


  lazy_mmu_prot_update() is supposed to get icache flushes done when they
need to be.  And it is supposed to avoid unneeded flushes when the icache
is known to be clean for a page.


That's the theory. However, I'd still like to know how the arch code can
make the assertion that icache is known to be at all times other than at
the time of a fault?

Ie. what if an operation which causes incoherency is carried out _after_
an executable mapping is installed for that page.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-26 Thread Mike Stroyan
On Thu, Apr 26, 2007 at 05:53:49PM +1000, Nick Piggin wrote:
> I had a couple of questions which I'm hoping someone would be kind
> enough to explain :)
...
> I wonder how this is different to all the other code which calls
> lazy_mmu_prot_update() after set_pte_at(). do_swap_page, for example,
> _could_ fault in executable code, couldn't it?

  The do_swap_page code does look suspect.  It seems to be working on
ia64 because a DMA transfer of data from swap to the allocated page
is removing old lines from the icache.  If code on an anonymous page
was swapping in without direct DMA to the page then the same problem
could occur.  I can't think of a reasonable situation that would cause
swapping in to not use DMA.  Swapping to/from NFS does not seem reasonable
to me anyway.

> It is because do_swap_page uses flush_icache_page()? So why doesn't
> the flush_icache_page() work in do_no_page as well? (It seems to look
> like a superset of lazy_mmu_prot_update on ia64?!?).

flush_icache_page() on ia64 is provided by include/asm-ia64/cacheflush.h.
It doesn't have any effect at all.

#define flush_icache_page(vma,page) do { } while (0)

  lazy_mmu_prot_update() is supposed to get icache flushes done when they
need to be.  And it is supposed to avoid unneeded flushes when the icache
is known to be clean for a page.

-- 
Mike Stroyan, [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path

2007-04-26 Thread Nick Piggin

Hi,

I had a couple of questions which I'm hoping someone would be kind
enough to explain :)

Andrew Morton wrote:

guys, aplication crashes on million-dollar machines aren't nice.  Please review 
carefully
and urgently?


Begin forwarded message:

Date: Wed, 25 Apr 2007 18:16:15 -0600
From: Mike Stroyan <[EMAIL PROTECTED]>
To: "Luck, Tony" <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED], linux-kernel@vger.kernel.org
Subject: [PATCH] ia64: race flushing icache in do_no_page path


  This is a very similar problem to a copy-on-write cache flushing problem
that Tony Luck fixed in July 2006.  In this case the do_no_page function
handles a fault in an executable or library that is mmapped from an
NFS file system.  The code is copied into a newly reallocated page.
The lazy_mmu_prot_update() function should be used to flush old entries
from the icache for that page on ia64 processors.  But that call is made
after a set_pte_at call that makes the page accessible to other threads
executing the same code.  This was seen to cause application crashes
when an OpenMP application ran many threads calling same functions at
the same time.  The first thread to reach a page starts to fault in the
new code.  One of the other threads overtakes the first and executes old
data from the icache.  That could result in bad instructions.  It is more
obvious when an old cache line contains prefetched non-instruction bits
that result in an illegal instruction trap.


I wonder how this is different to all the other code which calls
lazy_mmu_prot_update() after set_pte_at(). do_swap_page, for example,
_could_ fault in executable code, couldn't it?

It is because do_swap_page uses flush_icache_page()? So why doesn't
the flush_icache_page() work in do_no_page as well? (It seems to look
like a superset of lazy_mmu_prot_update on ia64?!?).

And while we're looking at flush_icache_page, why is there none in
do_wp_page (I admit, I'm not really up to scratch on d/i cache aliasing
handling, but cachetlb.txt seems to suggest that cow_user_page fits the
description). That is, if we're already trying to cover our butts wrt
SMC, then do_wp_page _could_ be cow'ing executable code, couldn't it?

And for that matter, I admit I don't understand how the icache flushing
can be done lazily, only at change-protection time. Why is any
flush_dcache_page() site not a problem for an _existing_ executable pte
wrt d/i cache aliases?

BTW. while I'm ranting, I hope all this stuff has gone so complex for a
reason, and that being that the alternative simpler approach of more
flushes, less lazy, less complex, less buggy was tested and found to be
noticably slower... :)





  The problem has only been seen on montecito processors which have
separate level 2 icache and dcache.  This dcache to icache coherency
problem is more likely to occur there because of the much larger level
2 icache.  I suspect that the non-NFS case is working because direct
DMA into the new page is making the instruction cache coherent.  Any
file system that uses a non-DMA copy into the text page could show the
same problem.

Signed-off-by: Mike Stroyan <[EMAIL PROTECTED]>

diff --git a/mm/memory.c b/mm/memory.c
index e7066e7..50c8848 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2291,6 +2291,7 @@ retry:
entry = mk_pte(new_page, vma->vm_page_prot);
if (write_access)
entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+   lazy_mmu_prot_update(entry);
set_pte_at(mm, address, page_table, entry);
if (anon) {
inc_mm_counter(mm, anon_rss);
@@ -2312,7 +2313,6 @@ retry:
 
 	/* no need to invalidate: a not-present page shouldn't be cached */

update_mmu_cache(vma, address, entry);
-   lazy_mmu_prot_update(entry);
 unlock:
pte_unmap_unlock(page_table, ptl);
if (dirty_page) {




--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/