Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Thu, Jul 05, 2007 at 10:57:00AM +0200, Zoltan Menyhart wrote: > KAMEZAWA Hiroyuki wrote: > >In our test, we confirmed that this can be fixed by flushing L2I just > >before SetPageUptodate() in NFS. > > I can agree. > We can be more permissive: it can be done anywhere after the new > data is put in place and before nfs_readpage() or nfs_readpages() > returns. > > I saw your patch http://marc.info/?l=linux-mm&m=118352909826277&w=2 > that modifies e.g. mm/memory.c and not the NFS layer. > > Have you proposed a patch against the NFS layer? This really doesn't look like a job for the file system layer. That would require all sorts of file system readpage routines to be modified to handle memory management details that are already handled by the memory.c functions. The do_no_page code is already dealing with the necessary icache flushing operations. It just happens to be doing it with a bad race condition for ia64. -- Mike Stroyan <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
KAMEZAWA Hiroyuki wrote: On Wed, 04 Jul 2007 16:24:38 +0200 Zoltan Menyhart <[EMAIL PROTECTED]> wrote: Machines star up whit bit 5 = 0, reading instruction pages via NFS has to flush them from L2I. In our test, we confirmed that this can be fixed by flushing L2I just before SetPageUptodate() in NFS. I can agree. We can be more permissive: it can be done anywhere after the new data is put in place and before nfs_readpage() or nfs_readpages() returns. I saw your patch http://marc.info/?l=linux-mm&m=118352909826277&w=2 that modifies e.g. mm/memory.c and not the NFS layer. Have you proposed a patch against the NFS layer? I was wondering if instead of modifying do_no_page() and Co., should not we make nfs_readpage() be DMA-like? (No possible regression for most of the page I/O-s.) I.e. it should be the responsibility of a file system to make sure it supports instruction pages correctly. The base kernel should provide such file systems with an architecture dependent macro... IMHO, for example, race in cooy-on-write (was fixed by Tony Luck) has to be fixed by MemoryManagement layer. I can agree. Note that it has not got much performance impact from our point of view because there are not too many COW paves with executable code... And only a race in do_no_page() seems to be able to be fixed by FS layer. If it were do_no_page() that would fix the problem, then it should know where the page comes from in order not to flush the I-cache in vain. do_no_page() just calls vma->vm_ops->nopage() that goes down to fs_op->readpage() / fs_op->readpages(). On the other hand, these latter routines do not know why they fetch the page(s). Note that a page can be mapped more than one times, with different permission bits. As a consequence, these routines will flush the I-cache in vain in most of the cases. I prefer a performance impact to some non DMA based file systems and adding nothing to the path for the majority of the cases. BTW, can we know whether a page is filled by DMA or not ? Well, the file systems based on block devices use DMA transfer (with the exception of using bounce buffers, in that case it is the responsibility of the bounce buffering layer to flush the I-cache). Network based file systems require revision and update... Note that only the processors with separate L2I require attention, the L1I is guaranteed to be flushed when you change the corresponding TBL entry. The base kernel should provide a macro / service (that cares for the processor model) for the file systems. Thanks, Zoltan Menyhart - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Wed, 04 Jul 2007 16:24:38 +0200 Zoltan Menyhart <[EMAIL PROTECTED]> wrote: > Machines star up whit bit 5 = 0, reading instruction pages via > NFS has to flush them from L2I. > In our test, we confirmed that this can be fixed by flushing L2I just before SetPageUptodate() in NFS. > > I was wondering if instead of modifying do_no_page() and Co., should > not we make nfs_readpage() be DMA-like? > (No possible regression for most of the page I/O-s.) > I.e. it should be the responsibility of a file system to make sure it > supports instruction pages correctly. The base kernel should provide > such file systems with an architecture dependent macro... > IMHO, for example, race in cooy-on-write (was fixed by Tony Luck) has to be fixed by MemoryManagement layer. And only a race in do_no_page() seems to be able to be fixed by FS layer. BTW, can we know whether a page is filled by DMA or not ? Thanks, -Kame - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Could you please confirm that I understand correctly what is in the: Dual-Core Update to the Intel Itanium 2 Processor Reference Manual... "2.3.3.2 L2 Caches ... Any coherence request to identify whether a cache line is in the processor will invalidate that line from the L2I cache." This makes sure that the DMAs invalidate the L2L cache. "2.7.4 Instruction Cache Coherence Optimization Coherence requests of the L1I and L2I caches will invalidate the line if it is in the cache. Montecito allows instruction requests on the system interface to be filtered such that they will not initiate coherence requests of the L1I and L2I caches. This will allow instructions to be cached at the L1I and L2I levels across multiple processors in a coherent domain. This optimization is enabled by default, but may be disabled by PAL_SET_PROC_FEATURES bit 5 of the Montecito feature_set (18)." Machines star up whit bit 5 = 0, reading instruction pages via NFS has to flush them from L2I. I was wondering if instead of modifying do_no_page() and Co., should not we make nfs_readpage() be DMA-like? (No possible regression for most of the page I/O-s.) I.e. it should be the responsibility of a file system to make sure it supports instruction pages correctly. The base kernel should provide such file systems with an architecture dependent macro... Thanks, Zoltan Menyhart - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Tue, May 01, 2007 at 09:43:29PM +1000, Nick Piggin wrote: > Rohit Seth wrote: ... > >Caches on Itanium are physical. So, it doesn't matter what virtual address > >you use to flush a cache line, cache line containing specific physical > >memory will be flushed. > > Really? I was under the vague impression that L1 i/d caches were virtual > and required this flushing... but I guess so long as the ISA says that > fc/fc.i flushes all caches corresponding to the physical address of the > provided virtual address, then that's what matters. The L1 caches on Itanium have interesting behavior. The cache lines are indexed by virtual address. But those L1 cache lines are invalidated whenever their corresponding L1 TLB entry is evicted or replaced. That means that old L1 icache lines will be invalidated by TLB changes even before a fc.i instruction flushed those icache lines. That would help make old kernels work on pre-montecito processors without correctly ordered fc.i instructions. The L1 icache was flushed by TLB inserts and the L2 icache was unified. fc.i is also defined to make an address coherent between data and instruction caches at all levels of cache. That handles the update of L1 icache lines during a st,fc.i,sync.i,srlz.i sequence. I see these details in section 6.1.1 of "IntelĀ® ItaniumĀ® 2 Processor Reference Manual". But I haven't found them in a general Itanium Architecture reference. -- Mike Stroyan, [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Rohit Seth wrote: On Tue, 2007-05-01 at 21:52 +1000, Nick Piggin wrote: Rohit Seth wrote: and it's only interested when it's executable i.e. "lazy_mmu_prot_update" is a name concealing some overdesign. You are right that ia64 is only interested in whne the execute permissions kick in (and FWIW ia64 used to use update_mmu_cache API to do what it is now doing lazy_mmu_prot_update). Though the idea was to design an API that any arch can use to know when ever there is change in protections on a mapping. What I think what we should do is audit flush_icache_page coverage, and convert ia64 to use that (because it needs this to happen _before_ the pte is set). That doesn't address the underlying requirement that arch specific code should be told of change in protections. But you would add the flush_icache_page_chprot to address that. For ia64, you are right that it equates to flushing icache in some cases, but this API is more generic. And it is also broken, because it needs to be done before the set_pte for ia64. It needs to be done where flush_icache_page is done. And on ia64 it flushes the icache. So I don't understand why ia64 would not use flush_icache_page but add something completely new "because it is more generic". All we should need to do is add a pte argument to flush_icache, I'm sure this is doable. Though it is more of design issue of whether that is the right way to do it. I understand this extra API is difficult at this time because of one single consumer. There is just no point in adding something else if you already have a hook that seems to do what ia64 wants (or could, with a small amount of work). -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Rohit Seth wrote: On Tue, 2007-05-01 at 21:39 +1000, Nick Piggin wrote: Rohit Seth wrote: If a user is requesting kernel to do (for example) write on a page that is already mapped with execute and write permissions then it should be treated as if the user space is doing modifications to that page. There is no change in protections so lazy_prot_mmu_update shouldn't be called even though PG_arch_1 is (I think) set. Does it answer your concern? I'm not sure that I would agree. For direct modifications of memory via a passed in user virtual address, perhaps. For operations on pagecache, we may not even have a handle to issue the flush cache instruction on (ie. a user virtual address), let alone know whether anyone else is mapping the page. Can you please describe the page cache scenario in more detail? IMO, if a page is user mapped with at least one execute and write permission then the responsibility of update caches lies with user. What if a different user write(2)s the underlying page? What if you were to say remove all the PG_arch_1 code, and do something really simple like flush icache in flush_dcache_page? Would performance suffer horribly? On Itanium, I think it will have some performance penalty (horrible or not I don't know) as you will be invalidating the caches more often. And they alsways look for last 0.1% performance that they can get. Sure, but if we _only_ flushed when page_mapcount was raised, You will need this every time there is change in protection (e.g. mprotect) not only when page_mapcount is raised. Yeah, you would retain the flush on fault, I meant you would introduce a flush in flush_dcache_page for when mapcount is raised. -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Tue, 2007-05-01 at 21:47 +1000, Nick Piggin wrote: > Rohit Seth wrote: > > > > > > It is invalidating any entries (containing same physical address) in both I > > and D caches. Any dirty lines in D cache are written back to memory before > > getting invalidated (ofcourse). > > OK. (should it be issuing both fc and fc.i to be robust in case a > new implementation doesn't flush the dcache with fc.i?) > For the Itanium case specifically, you only want to invalidate a stale icache line. Once that is done, next time icache will pick the correct updated contents. > > >>There are supposedly no icache lines at that point[**]: > > > > > > For this bug to trigger there has to be a (stale) entry in icache containing > > the old contents of a page that just got updated by kernel as explicit > > copying of data (DMAs are coherent on ia64, meaning if a device were to > > write to memory then architecture guarnatees that both I and D caches are > > invalidated). > > So if we have a dirty dcache line for a given physical address, > it will _always_ be the case that a subsequent icache load will > find that dirty data? yes for ia64. -rohit - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Tue, 2007-05-01 at 21:52 +1000, Nick Piggin wrote: > Rohit Seth wrote: > > >>and > >>it's only interested when it's executable i.e. "lazy_mmu_prot_update" > >>is a name concealing some overdesign. > > > > > > You are right that ia64 is only interested in whne the execute permissions > > kick in (and FWIW ia64 used to use update_mmu_cache API to do what it is now > > doing lazy_mmu_prot_update). Though the idea was to design an API that any > > arch can use to know when ever there is change in protections on a mapping. > > What I think what we should do is audit flush_icache_page coverage, and > convert ia64 to use that (because it needs this to happen _before_ the > pte is set). > That doesn't address the underlying requirement that arch specific code should be told of change in protections. For ia64, you are right that it equates to flushing icache in some cases, but this API is more generic. > All we should need to do is add a pte argument to flush_icache, I'm sure this is doable. Though it is more of design issue of whether that is the right way to do it. I understand this extra API is difficult at this time because of one single consumer. -rohit - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Tue, 2007-05-01 at 21:39 +1000, Nick Piggin wrote: > Rohit Seth wrote: > > > > If a user is requesting kernel to do (for example) write on a page that is > > already mapped with execute and write permissions then it should be treated > > as if the user space is doing modifications to that page. There is no > > change in protections so lazy_prot_mmu_update shouldn't be called even > > though PG_arch_1 is (I think) set. Does it answer your concern? > > I'm not sure that I would agree. For direct modifications of memory via > a passed in user virtual address, perhaps. For operations on pagecache, > we may not even have a handle to issue the flush cache instruction on (ie. > a user virtual address), let alone know whether anyone else is mapping > the page. > Can you please describe the page cache scenario in more detail? IMO, if a page is user mapped with at least one execute and write permission then the responsibility of update caches lies with user. > >>What if you were to say remove all the PG_arch_1 code, and do > >>something really simple like flush icache in > >>flush_dcache_page? Would performance suffer horribly? > > > > > > On Itanium, I think it will have some performance penalty (horrible or not I > > don't know) as you will be invalidating the caches more often. And they > > alsways look for last 0.1% performance that they can get. > > Sure, but if we _only_ flushed when page_mapcount was raised, You will need this every time there is change in protection (e.g. mprotect) not only when page_mapcount is raised. -rohit - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Rohit Seth wrote: and it's only interested when it's executable i.e. "lazy_mmu_prot_update" is a name concealing some overdesign. You are right that ia64 is only interested in whne the execute permissions kick in (and FWIW ia64 used to use update_mmu_cache API to do what it is now doing lazy_mmu_prot_update). Though the idea was to design an API that any arch can use to know when ever there is change in protections on a mapping. What I think what we should do is audit flush_icache_page coverage, and convert ia64 to use that (because it needs this to happen _before_ the pte is set). All we should need to do is add a pte argument to flush_icache, and it should be possible to do what ia64 wants, and we can remove lazy_mmu_prot_update (or at least rename it to something like flush_icache_page_chprot and move it to the normal flush_icache_page position above set_pte if not all architectures want their flush_icache_page called at protection change time). -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Rohit Seth wrote: Hi Nick, -Original Message- From: Nick Piggin [mailto:[EMAIL PROTECTED] Sent: Friday, April 27, 2007 11:03 PM To: Hugh Dickins Cc: [EMAIL PROTECTED]; Mike Stroyan; Andrew Morton; Luck, Tony; [EMAIL PROTECTED]; linux-kernel@vger.kernel.org Subject: Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path Hugh Dickins wrote: On Sat, 28 Apr 2007, Nick Piggin wrote: OIC, you need a virtual address to evict the icache, so you can't flush at flush_dcache time? Or does ia64 have an instruction to flush the whole icache? (it would be worth testing, to see how much performance suffers). I'm puzzled by that remark: the ia64 flush_icache_range always has a virtual address, it uses the kernel virtual address; it takes no interest in whether there's a user virtual address. I _think_ what it is doing is actually flushing dcache lines dirtied via the kernel virtual address (yes, I think flush_icache in lazy_mmu_prot_update is actually just flushing the dcache, but I could be wrong? [*]). It is invalidating any entries (containing same physical address) in both I and D caches. Any dirty lines in D cache are written back to memory before getting invalidated (ofcourse). OK. (should it be issuing both fc and fc.i to be robust in case a new implementation doesn't flush the dcache with fc.i?) There are supposedly no icache lines at that point[**]: For this bug to trigger there has to be a (stale) entry in icache containing the old contents of a page that just got updated by kernel as explicit copying of data (DMAs are coherent on ia64, meaning if a device were to write to memory then architecture guarnatees that both I and D caches are invalidated). So if we have a dirty dcache line for a given physical address, it will _always_ be the case that a subsequent icache load will find that dirty data? ... thanks for bearing with me ;) -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Rohit Seth wrote: -Original Message- From: Hugh Dickins [mailto:[EMAIL PROTECTED] Sent: Friday, April 27, 2007 10:20 PM To: Nick Piggin Cc: [EMAIL PROTECTED]; Mike Stroyan; Andrew Morton; Luck, Tony; [EMAIL PROTECTED]; linux-kernel@vger.kernel.org Subject: Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path On Sat, 28 Apr 2007, Nick Piggin wrote: OIC, you need a virtual address to evict the icache, so you can't flush at flush_dcache time? Or does ia64 have an instruction to flush the whole icache? (it would be worth testing, to see how much performance suffers). IIRC, there is a PAL call to flush the whole cache (but that is quite a heavy call). Though you really don't need to be doing this. I'm puzzled by that remark: the ia64 flush_icache_range always has a virtual address, it uses the kernel virtual address; it takes no interest in whether there's a user virtual address. Caches on Itanium are physical. So, it doesn't matter what virtual address you use to flush a cache line, cache line containing specific physical memory will be flushed. Really? I was under the vague impression that L1 i/d caches were virtual and required this flushing... but I guess so long as the ISA says that fc/fc.i flushes all caches corresponding to the physical address of the provided virtual address, then that's what matters. For the cases where you have virtual caches, update_mmu_cache is the API to use. But it happens after the pte is installed. -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Rohit Seth wrote: -Original Message- From: Nick Piggin [mailto:[EMAIL PROTECTED] Sent: Friday, April 27, 2007 7:00 PM To: [EMAIL PROTECTED] Cc: Mike Stroyan; Andrew Morton; Hugh Dickins; Luck, Tony; [EMAIL PROTECTED]; linux-kernel@vger.kernel.org Subject: Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path Rohit Seth wrote: You mean by user space? If so, then it is user space responsibility to do the appropriate operations (like flush icache in this case). No, I mean places that set PG_arch_1. flush_dcache_page. This can happen for mapped pages in write, splice, install_arg_page looks questionable, direct IO... If a user is requesting kernel to do (for example) write on a page that is already mapped with execute and write permissions then it should be treated as if the user space is doing modifications to that page. There is no change in protections so lazy_prot_mmu_update shouldn't be called even though PG_arch_1 is (I think) set. Does it answer your concern? I'm not sure that I would agree. For direct modifications of memory via a passed in user virtual address, perhaps. For operations on pagecache, we may not even have a handle to issue the flush cache instruction on (ie. a user virtual address), let alone know whether anyone else is mapping the page. What if you were to say remove all the PG_arch_1 code, and do something really simple like flush icache in flush_dcache_page? Would performance suffer horribly? On Itanium, I think it will have some performance penalty (horrible or not I don't know) as you will be invalidating the caches more often. And they alsways look for last 0.1% performance that they can get. Sure, but if we _only_ flushed when page_mapcount was raised, then we should have most of the benefits of lazy flushing, and the main places were we do extra flushes are those where aliases could potentially occur under the old scheme. -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Hi Nick, -Original Message- From: Nick Piggin [mailto:[EMAIL PROTECTED] Sent: Friday, April 27, 2007 11:03 PM To: Hugh Dickins Cc: [EMAIL PROTECTED]; Mike Stroyan; Andrew Morton; Luck, Tony; [EMAIL PROTECTED]; linux-kernel@vger.kernel.org Subject: Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path Hugh Dickins wrote: > On Sat, 28 Apr 2007, Nick Piggin wrote: > >>OIC, you need a virtual address to evict the icache, so you can't >>flush at flush_dcache time? Or does ia64 have an instruction to flush >>the whole icache? (it would be worth testing, to see how much >>performance suffers). > > > I'm puzzled by that remark: the ia64 flush_icache_range always has a > virtual address, it uses the kernel virtual address; it takes no > interest in whether there's a user virtual address. >I _think_ what it is doing is actually flushing dcache lines dirtied >via the kernel virtual address (yes, I think flush_icache > in lazy_mmu_prot_update is actually just flushing the dcache, but >I could be wrong? [*]). It is invalidating any entries (containing same physical address) in both I and D caches. Any dirty lines in D cache are written back to memory before getting invalidated (ofcourse). > There are supposedly no icache lines at that point[**]: For this bug to trigger there has to be a (stale) entry in icache containing the old contents of a page that just got updated by kernel as explicit copying of data (DMAs are coherent on ia64, meaning if a device were to write to memory then architecture guarnatees that both I and D caches are invalidated). > the bug fix at the start of this thread >is due to icache lines becoming present before the flush. For this mail thread, the old contents were not flushed out of icache when the old mapping was removed. And they were still present when the free page got mapped again. Cheers, -rohit - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Hi Hugh, -Original Message- From: Hugh Dickins [mailto:[EMAIL PROTECTED] Sent: Friday, April 27, 2007 10:34 PM To: Rohit Seth Cc: Nick Piggin; Mike Stroyan; Andrew Morton; Luck, Tony; [EMAIL PROTECTED]; linux-kernel@vger.kernel.org Subject: Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path On Fri, 27 Apr 2007, Rohit Seth wrote: > lazy_mmu_prot_update was added specifically for notifying change in > protection. So, in a way it is closer to update_mmu_cache (Which is > for change in mappings itself). Though for ia64 implementation, this > ends up flushing the icaches when needed. >The ia64 implementation is the only one which has any use for it, Even Itanium didn't need it for almost 5 years :) Though I think archs that have incoherent I & D caches could be (theoritically) exposed to same (original) mprotect code path bug that triggered this API. >and >it's only interested when it's executable i.e. "lazy_mmu_prot_update" >is a name concealing some overdesign. You are right that ia64 is only interested in whne the execute permissions kick in (and FWIW ia64 used to use update_mmu_cache API to do what it is now doing lazy_mmu_prot_update). Though the idea was to design an API that any arch can use to know when ever there is change in protections on a mapping. Cheers, -rohit - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Fw: [PATCH] ia64: race flushing icache in do_no_page path
-Original Message- From: Hugh Dickins [mailto:[EMAIL PROTECTED] Sent: Friday, April 27, 2007 10:20 PM To: Nick Piggin Cc: [EMAIL PROTECTED]; Mike Stroyan; Andrew Morton; Luck, Tony; [EMAIL PROTECTED]; linux-kernel@vger.kernel.org Subject: Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path On Sat, 28 Apr 2007, Nick Piggin wrote: > > OIC, you need a virtual address to evict the icache, so you can't > flush at flush_dcache time? Or does ia64 have an instruction to flush > the whole icache? (it would be worth testing, to see how much > performance suffers). IIRC, there is a PAL call to flush the whole cache (but that is quite a heavy call). Though you really don't need to be doing this. >I'm puzzled by that remark: the ia64 flush_icache_range always has a >virtual address, it uses the kernel virtual address; it takes >no interest in whether there's a user virtual address. Caches on Itanium are physical. So, it doesn't matter what virtual address you use to flush a cache line, cache line containing specific physical memory will be flushed. For the cases where you have virtual caches, update_mmu_cache is the API to use. -rohit - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: Fw: [PATCH] ia64: race flushing icache in do_no_page path
-Original Message- From: Nick Piggin [mailto:[EMAIL PROTECTED] Sent: Friday, April 27, 2007 7:00 PM To: [EMAIL PROTECTED] Cc: Mike Stroyan; Andrew Morton; Hugh Dickins; Luck, Tony; [EMAIL PROTECTED]; linux-kernel@vger.kernel.org Subject: Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path Rohit Seth wrote: > > >> You mean by user space? If so, then it is user space responsibility to >> do the appropriate operations (like flush icache in this case). >No, I mean places that set PG_arch_1. flush_dcache_page. This can happen >for mapped pages in write, splice, install_arg_page looks questionable, direct IO... If a user is requesting kernel to do (for example) write on a page that is already mapped with execute and write permissions then it should be treated as if the user space is doing modifications to that page. There is no change in protections so lazy_prot_mmu_update shouldn't be called even though PG_arch_1 is (I think) set. Does it answer your concern? >What if you were to say remove all the PG_arch_1 code, and do >something really simple like flush icache in >flush_dcache_page? Would performance suffer horribly? On Itanium, I think it will have some performance penalty (horrible or not I don't know) as you will be invalidating the caches more often. And they alsways look for last 0.1% performance that they can get. -rohit - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Hugh Dickins wrote: On Sat, 28 Apr 2007, Nick Piggin wrote: OIC, you need a virtual address to evict the icache, so you can't flush at flush_dcache time? Or does ia64 have an instruction to flush the whole icache? (it would be worth testing, to see how much performance suffers). I'm puzzled by that remark: the ia64 flush_icache_range always has a virtual address, it uses the kernel virtual address; it takes no interest in whether there's a user virtual address. I _think_ what it is doing is actually flushing dcache lines dirtied via the kernel virtual address (yes, I think flush_icache in lazy_mmu_prot_update is actually just flushing the dcache, but I could be wrong? [*]). There are supposedly no icache lines at that point[**]: the bug fix at the start of this thread is due to icache lines becoming present before the flush. [*] and if I'm right, that means that icache fills don't see dirty dcache, but rather always load from memory? I hope someone can answer these questions? [**] except due to _existing_ executable mappings, of course, but they seem to have been comprehensively disregarded... -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Fri, 27 Apr 2007, Rohit Seth wrote: > On Fri, 2007-04-27 at 15:18 +0100, Hugh Dickins wrote: > > Right. Extra flush_icache_page routines will add cost to archs that > have non-null definition of this routine. BTW, isn't flush_icache_page > marked for deprecation? Yes, flush_icache_page is marked for deprecation: but that's hardly a reason to add another under a different name! (Not quite what you did, but...) > lazy_mmu_prot_update was added specifically for notifying change in > protection. So, in a way it is closer to update_mmu_cache (Which is for > change in mappings itself). Though for ia64 implementation, this ends > up flushing the icaches when needed. The ia64 implementation is the only one which has any use for it, and it's only interested when it's executable i.e. "lazy_mmu_prot_update" is a name concealing some overdesign. > Hopefully my reply is useful. Yes, thanks Rohit, and I'll want to read through it again later. In particular, I've now a better idea what's "lazy" about it. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Sat, 28 Apr 2007, Nick Piggin wrote: > > OIC, you need a virtual address to evict the icache, so you can't > flush at flush_dcache time? Or does ia64 have an instruction to > flush the whole icache? (it would be worth testing, to see how much > performance suffers). I'm puzzled by that remark: the ia64 flush_icache_range always has a virtual address, it uses the kernel virtual address; it takes no interest in whether there's a user virtual address. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Nick Piggin wrote: Rohit Seth wrote: You mean by user space? If so, then it is user space responsibility to do the appropriate operations (like flush icache in this case). No, I mean places that set PG_arch_1. flush_dcache_page. This can happen for mapped pages in write, splice, install_arg_page looks questionable, direct IO... Oh, and also ptrace! I think I was almost fooled by that attempt to flush the cache in copy_to_user_page. But that also fails if you map the underlying page with multiple virtual addresses (or processes, if the icache is not flushed on ctxsw), because those others won't have their caches flushed, right? -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Nick Piggin wrote: What if you were to say remove all the PG_arch_1 code, and do something really simple like flush icache in flush_dcache_page? Would performance suffer horribly? OIC, you need a virtual address to evict the icache, so you can't flush at flush_dcache time? Or does ia64 have an instruction to flush the whole icache? (it would be worth testing, to see how much performance suffers). -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Hugh Dickins wrote: On Fri, 27 Apr 2007, Nick Piggin wrote: But that's because of ia64's cache coherency implementation. I don't really follow the documentation to know whether it should be one way or the other, but surely it should be done either before or after the set_pte_at, not both. Anyway, how about fremap or mprotect, for example? ... OK, I'm still not sure that I understand why lazy_mmu_prot_update should be used rather than flush_icache_page (in concept, not ia64 implementation). Sure, flush_icache_page isn't given the pte, but let's assume we can change that. You're asking lots of good questions. I wish the ia64 people would know the answers, but from the length of time the "lazy_mmu_prot_update" stuff took to get into the tree, and the length of time it's taken to be found defective, I suspect they don't, and we'll have to guess for them. Some guesses I'm working with... I presume Mike and Anil are correct, that it needs to be done before putting pte into page table, not left until after: but as you've guessed, that needs to be done everywhere, not just in the two places so far identified. When it was discussed last year (in connection with Peter's page cleaning patches) it was thought to be a variant of update_mmu_cache() (after setting pte), and we added the fremap one to accompany it; but now it looks to be a variant of flush_icache_page() (before setting pte). Right. I think. I believe lazy_mmu_prot_update(pteval) came into existence primarily for mprotect's change_pte_range() case. If ia64 filled in its flush_icache_page(vma, page), that could have been used there (checking 'vm_flags & VM_EXEC' instead of pte_exec): but that would involve a relatively expensive(?) pte_page() in a place which doesn't need to know the struct page for other cases. Well, I think we could always add a pte argument to flush_icache_page... Then, there might be logic to have a flush_lazy_icache_page when changing protections, but that operation (currently called lazy_mmu_prot_update) really doesn't seem like it should be called in all the other places that it is, flush_icache_page should be used for that. But AFAIKS, if we really want correctness, flush_icache_page should go away and be implemented in flush_dcache_page. Well, not pte_page(), it needs to be vm_normal_page() doesn't it? and ia64's current lazy_mmu_prot_update is unsafe when !pfn_valid. Some flush_icache_pages are already in place, others are not: do we need to add some? But those architectures which have a non-empty flush_icache_page seem to have survived without the additional calls - so they might be unnecessarily slowed down by additional calls. Well flush_icache seems to be intended solely to bring icache in sync with dcache modifications, but they try to skimp out on most of the flushes required to handle dcache aliases... but really, I don't think that is possible to do 100% correctly. -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Rohit Seth wrote: On Fri, 2007-04-27 at 21:55 +1000, Nick Piggin wrote: That's the theory. However, I'd still like to know how the arch code can make the assertion that icache is known to be at all times other than at the time of a fault? Kernel needs to only worry about the updates that it does. So, if kernel is writing into a page that is getting marked with execute permission then it will need to make sure that caches are coherent. ia64 Kernel keeps track of whether it has done any write operation on a page or not using PG_arch_1. And accordingly flushes icaches. It flushes icache at fault time, I know. What I don't know is why we leave them to drift out of sync afterwards. Ie. what if an operation which causes incoherency is carried out _after_ an executable mapping is installed for that page. You mean by user space? If so, then it is user space responsibility to do the appropriate operations (like flush icache in this case). No, I mean places that set PG_arch_1. flush_dcache_page. This can happen for mapped pages in write, splice, install_arg_page looks questionable, direct IO... Actually there are various windows where mapped pages can be !uptodate, so there is technically most of the filesystem code as well, but I'm trying to stamp those out, so let's ignore that for now. What if you were to say remove all the PG_arch_1 code, and do something really simple like flush icache in flush_dcache_page? Would performance suffer horribly? -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Fri, 2007-04-27 at 15:18 +0100, Hugh Dickins wrote: > I presume Mike and Anil are correct, that it needs to be done before > putting pte into page table, not left until after: but as you've > guessed, that needs to be done everywhere, not just in the two > places so far identified. > That sounds about right. Before installing new mapping, kernel should ensure there are no stale contents in caches or TLB. lazy_mmu_prot_update needs to be called whenever the permissions on pte (about to) change. So if remapping is causing change in protection then lazy_mmu_prot_update needs to be called. > When it was discussed last year (in connection with Peter's page > cleaning patches) it was thought to be a variant of update_mmu_cache() > (after setting pte), and we added the fremap one to accompany it; > but now it looks to be a variant of flush_icache_page() (before > setting pte). > > I believe lazy_mmu_prot_update(pteval) came into existence primarily > for mprotect's change_pte_range() case. Yup. > If ia64 filled in its > flush_icache_page(vma, page), that could have been used there > (checking 'vm_flags & VM_EXEC' instead of pte_exec): but that would > involve a relatively expensive(?) pte_page() in a place which doesn't > need to know the struct page for other cases. > > Well, not pte_page(), it needs to be vm_normal_page() doesn't it? > and ia64's current lazy_mmu_prot_update is unsafe when !pfn_valid. > > Some flush_icache_pages are already in place, others are not: do > we need to add some? But those architectures which have a non-empty > flush_icache_page seem to have survived without the additional calls > - so they might be unnecessarily slowed down by additional calls. > Right. Extra flush_icache_page routines will add cost to archs that have non-null definition of this routine. BTW, isn't flush_icache_page marked for deprecation? > I believe that was the secondary reason for lazy_mmu_prot_update(), > perhaps better called ia64_flush_icache_page(): to allow calls to > be added where ia64 was (mistakenly) thought to want them, without > needing a protracted audit of how other architectures might be > impacted. > lazy_mmu_prot_update was added specifically for notifying change in protection. So, in a way it is closer to update_mmu_cache (Which is for change in mappings itself). Though for ia64 implementation, this ends up flushing the icaches when needed. Hopefully my reply is useful. -rohit - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Fri, 2007-04-27 at 21:55 +1000, Nick Piggin wrote: > That's the theory. However, I'd still like to know how the arch code can > make the assertion that icache is known to be at all times other than at > the time of a fault? > Kernel needs to only worry about the updates that it does. So, if kernel is writing into a page that is getting marked with execute permission then it will need to make sure that caches are coherent. ia64 Kernel keeps track of whether it has done any write operation on a page or not using PG_arch_1. And accordingly flushes icaches. > Ie. what if an operation which causes incoherency is carried out _after_ > an executable mapping is installed for that page. > You mean by user space? If so, then it is user space responsibility to do the appropriate operations (like flush icache in this case). -rohit - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
My book has a fairly detailed discussion of how these operations were supposed to work and what the reasoning behind them was. Unfortunately, I don't have time to really participate this discussion at the moment, but I hope somebody else has access to the book and would (re-)read it for some background (not to claim that it got everything right 100% but to ensure that earlier mistakes are not repeated...). --david On 4/27/07, Hugh Dickins <[EMAIL PROTECTED]> wrote: On Fri, 27 Apr 2007, Nick Piggin wrote: > > But that's because of ia64's cache coherency implementation. I don't really > follow the documentation to know whether it should be one way or the other, > but surely it should be done either before or after the set_pte_at, not both. > > Anyway, how about fremap or mprotect, for example? > ... > > OK, I'm still not sure that I understand why lazy_mmu_prot_update should be > used rather than flush_icache_page (in concept, not ia64 implementation). > Sure, flush_icache_page isn't given the pte, but let's assume we can change > that. You're asking lots of good questions. I wish the ia64 people would know the answers, but from the length of time the "lazy_mmu_prot_update" stuff took to get into the tree, and the length of time it's taken to be found defective, I suspect they don't, and we'll have to guess for them. Some guesses I'm working with... I presume Mike and Anil are correct, that it needs to be done before putting pte into page table, not left until after: but as you've guessed, that needs to be done everywhere, not just in the two places so far identified. When it was discussed last year (in connection with Peter's page cleaning patches) it was thought to be a variant of update_mmu_cache() (after setting pte), and we added the fremap one to accompany it; but now it looks to be a variant of flush_icache_page() (before setting pte). I believe lazy_mmu_prot_update(pteval) came into existence primarily for mprotect's change_pte_range() case. If ia64 filled in its flush_icache_page(vma, page), that could have been used there (checking 'vm_flags & VM_EXEC' instead of pte_exec): but that would involve a relatively expensive(?) pte_page() in a place which doesn't need to know the struct page for other cases. Well, not pte_page(), it needs to be vm_normal_page() doesn't it? and ia64's current lazy_mmu_prot_update is unsafe when !pfn_valid. Some flush_icache_pages are already in place, others are not: do we need to add some? But those architectures which have a non-empty flush_icache_page seem to have survived without the additional calls - so they might be unnecessarily slowed down by additional calls. I believe that was the secondary reason for lazy_mmu_prot_update(), perhaps better called ia64_flush_icache_page(): to allow calls to be added where ia64 was (mistakenly) thought to want them, without needing a protracted audit of how other architectures might be impacted. But I'm still trying to make sense of it. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-ia64" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- Mosberger Consulting LLC, http://www.mosberger-consulting.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Fri, 27 Apr 2007, Nick Piggin wrote: > > But that's because of ia64's cache coherency implementation. I don't really > follow the documentation to know whether it should be one way or the other, > but surely it should be done either before or after the set_pte_at, not both. > > Anyway, how about fremap or mprotect, for example? > ... > > OK, I'm still not sure that I understand why lazy_mmu_prot_update should be > used rather than flush_icache_page (in concept, not ia64 implementation). > Sure, flush_icache_page isn't given the pte, but let's assume we can change > that. You're asking lots of good questions. I wish the ia64 people would know the answers, but from the length of time the "lazy_mmu_prot_update" stuff took to get into the tree, and the length of time it's taken to be found defective, I suspect they don't, and we'll have to guess for them. Some guesses I'm working with... I presume Mike and Anil are correct, that it needs to be done before putting pte into page table, not left until after: but as you've guessed, that needs to be done everywhere, not just in the two places so far identified. When it was discussed last year (in connection with Peter's page cleaning patches) it was thought to be a variant of update_mmu_cache() (after setting pte), and we added the fremap one to accompany it; but now it looks to be a variant of flush_icache_page() (before setting pte). I believe lazy_mmu_prot_update(pteval) came into existence primarily for mprotect's change_pte_range() case. If ia64 filled in its flush_icache_page(vma, page), that could have been used there (checking 'vm_flags & VM_EXEC' instead of pte_exec): but that would involve a relatively expensive(?) pte_page() in a place which doesn't need to know the struct page for other cases. Well, not pte_page(), it needs to be vm_normal_page() doesn't it? and ia64's current lazy_mmu_prot_update is unsafe when !pfn_valid. Some flush_icache_pages are already in place, others are not: do we need to add some? But those architectures which have a non-empty flush_icache_page seem to have survived without the additional calls - so they might be unnecessarily slowed down by additional calls. I believe that was the secondary reason for lazy_mmu_prot_update(), perhaps better called ia64_flush_icache_page(): to allow calls to be added where ia64 was (mistakenly) thought to want them, without needing a protracted audit of how other architectures might be impacted. But I'm still trying to make sense of it. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Mike Stroyan wrote: On Thu, Apr 26, 2007 at 05:53:49PM +1000, Nick Piggin wrote: I had a couple of questions which I'm hoping someone would be kind enough to explain :) ... I wonder how this is different to all the other code which calls lazy_mmu_prot_update() after set_pte_at(). do_swap_page, for example, _could_ fault in executable code, couldn't it? The do_swap_page code does look suspect. It seems to be working on ia64 because a DMA transfer of data from swap to the allocated page is removing old lines from the icache. If code on an anonymous page was swapping in without direct DMA to the page then the same problem could occur. I can't think of a reasonable situation that would cause swapping in to not use DMA. Swapping to/from NFS does not seem reasonable to me anyway. But that's because of ia64's cache coherency implementation. I don't really follow the documentation to know whether it should be one way or the other, but surely it should be done either before or after the set_pte_at, not both. Anyway, how about fremap or mprotect, for example? It is because do_swap_page uses flush_icache_page()? So why doesn't the flush_icache_page() work in do_no_page as well? (It seems to look like a superset of lazy_mmu_prot_update on ia64?!?). flush_icache_page() on ia64 is provided by include/asm-ia64/cacheflush.h. It doesn't have any effect at all. #define flush_icache_page(vma,page) do { } while (0) You're right, sorry I was mistakenly looking at the flush_range. That was my only immediate concern with your patch... So, for finding and fixing that bug you have my admiration :) OK, I'm still not sure that I understand why lazy_mmu_prot_update should be used rather than flush_icache_page (in concept, not ia64 implementation). Sure, flush_icache_page isn't given the pte, but let's assume we can change that. lazy_mmu_prot_update() is supposed to get icache flushes done when they need to be. And it is supposed to avoid unneeded flushes when the icache is known to be clean for a page. That's the theory. However, I'd still like to know how the arch code can make the assertion that icache is known to be at all times other than at the time of a fault? Ie. what if an operation which causes incoherency is carried out _after_ an executable mapping is installed for that page. -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
On Thu, Apr 26, 2007 at 05:53:49PM +1000, Nick Piggin wrote: > I had a couple of questions which I'm hoping someone would be kind > enough to explain :) ... > I wonder how this is different to all the other code which calls > lazy_mmu_prot_update() after set_pte_at(). do_swap_page, for example, > _could_ fault in executable code, couldn't it? The do_swap_page code does look suspect. It seems to be working on ia64 because a DMA transfer of data from swap to the allocated page is removing old lines from the icache. If code on an anonymous page was swapping in without direct DMA to the page then the same problem could occur. I can't think of a reasonable situation that would cause swapping in to not use DMA. Swapping to/from NFS does not seem reasonable to me anyway. > It is because do_swap_page uses flush_icache_page()? So why doesn't > the flush_icache_page() work in do_no_page as well? (It seems to look > like a superset of lazy_mmu_prot_update on ia64?!?). flush_icache_page() on ia64 is provided by include/asm-ia64/cacheflush.h. It doesn't have any effect at all. #define flush_icache_page(vma,page) do { } while (0) lazy_mmu_prot_update() is supposed to get icache flushes done when they need to be. And it is supposed to avoid unneeded flushes when the icache is known to be clean for a page. -- Mike Stroyan, [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Fw: [PATCH] ia64: race flushing icache in do_no_page path
Hi, I had a couple of questions which I'm hoping someone would be kind enough to explain :) Andrew Morton wrote: guys, aplication crashes on million-dollar machines aren't nice. Please review carefully and urgently? Begin forwarded message: Date: Wed, 25 Apr 2007 18:16:15 -0600 From: Mike Stroyan <[EMAIL PROTECTED]> To: "Luck, Tony" <[EMAIL PROTECTED]> Cc: [EMAIL PROTECTED], linux-kernel@vger.kernel.org Subject: [PATCH] ia64: race flushing icache in do_no_page path This is a very similar problem to a copy-on-write cache flushing problem that Tony Luck fixed in July 2006. In this case the do_no_page function handles a fault in an executable or library that is mmapped from an NFS file system. The code is copied into a newly reallocated page. The lazy_mmu_prot_update() function should be used to flush old entries from the icache for that page on ia64 processors. But that call is made after a set_pte_at call that makes the page accessible to other threads executing the same code. This was seen to cause application crashes when an OpenMP application ran many threads calling same functions at the same time. The first thread to reach a page starts to fault in the new code. One of the other threads overtakes the first and executes old data from the icache. That could result in bad instructions. It is more obvious when an old cache line contains prefetched non-instruction bits that result in an illegal instruction trap. I wonder how this is different to all the other code which calls lazy_mmu_prot_update() after set_pte_at(). do_swap_page, for example, _could_ fault in executable code, couldn't it? It is because do_swap_page uses flush_icache_page()? So why doesn't the flush_icache_page() work in do_no_page as well? (It seems to look like a superset of lazy_mmu_prot_update on ia64?!?). And while we're looking at flush_icache_page, why is there none in do_wp_page (I admit, I'm not really up to scratch on d/i cache aliasing handling, but cachetlb.txt seems to suggest that cow_user_page fits the description). That is, if we're already trying to cover our butts wrt SMC, then do_wp_page _could_ be cow'ing executable code, couldn't it? And for that matter, I admit I don't understand how the icache flushing can be done lazily, only at change-protection time. Why is any flush_dcache_page() site not a problem for an _existing_ executable pte wrt d/i cache aliases? BTW. while I'm ranting, I hope all this stuff has gone so complex for a reason, and that being that the alternative simpler approach of more flushes, less lazy, less complex, less buggy was tested and found to be noticably slower... :) The problem has only been seen on montecito processors which have separate level 2 icache and dcache. This dcache to icache coherency problem is more likely to occur there because of the much larger level 2 icache. I suspect that the non-NFS case is working because direct DMA into the new page is making the instruction cache coherent. Any file system that uses a non-DMA copy into the text page could show the same problem. Signed-off-by: Mike Stroyan <[EMAIL PROTECTED]> diff --git a/mm/memory.c b/mm/memory.c index e7066e7..50c8848 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2291,6 +2291,7 @@ retry: entry = mk_pte(new_page, vma->vm_page_prot); if (write_access) entry = maybe_mkwrite(pte_mkdirty(entry), vma); + lazy_mmu_prot_update(entry); set_pte_at(mm, address, page_table, entry); if (anon) { inc_mm_counter(mm, anon_rss); @@ -2312,7 +2313,6 @@ retry: /* no need to invalidate: a not-present page shouldn't be cached */ update_mmu_cache(vma, address, entry); - lazy_mmu_prot_update(entry); unlock: pte_unmap_unlock(page_table, ptl); if (dirty_page) { -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] ia64: race flushing icache in do_no_page path
This is a very similar problem to a copy-on-write cache flushing problem that Tony Luck fixed in July 2006. In this case the do_no_page function handles a fault in an executable or library that is mmapped from an NFS file system. The code is copied into a newly reallocated page. The lazy_mmu_prot_update() function should be used to flush old entries from the icache for that page on ia64 processors. But that call is made after a set_pte_at call that makes the page accessible to other threads executing the same code. This was seen to cause application crashes when an OpenMP application ran many threads calling same functions at the same time. The first thread to reach a page starts to fault in the new code. One of the other threads overtakes the first and executes old data from the icache. That could result in bad instructions. It is more obvious when an old cache line contains prefetched non-instruction bits that result in an illegal instruction trap. The problem has only been seen on montecito processors which have separate level 2 icache and dcache. This dcache to icache coherency problem is more likely to occur there because of the much larger level 2 icache. I suspect that the non-NFS case is working because direct DMA into the new page is making the instruction cache coherent. Any file system that uses a non-DMA copy into the text page could show the same problem. Signed-off-by: Mike Stroyan <[EMAIL PROTECTED]> diff --git a/mm/memory.c b/mm/memory.c index e7066e7..50c8848 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2291,6 +2291,7 @@ retry: entry = mk_pte(new_page, vma->vm_page_prot); if (write_access) entry = maybe_mkwrite(pte_mkdirty(entry), vma); + lazy_mmu_prot_update(entry); set_pte_at(mm, address, page_table, entry); if (anon) { inc_mm_counter(mm, anon_rss); @@ -2312,7 +2313,6 @@ retry: /* no need to invalidate: a not-present page shouldn't be cached */ update_mmu_cache(vma, address, entry); - lazy_mmu_prot_update(entry); unlock: pte_unmap_unlock(page_table, ptl); if (dirty_page) { -- Mike Stroyan, [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/