Re: [PATCH] add a clear_pages function to clear pages of higher order
On Tue, Apr 05, 2005 at 10:15:18PM -0700, Gerrit Huizenga wrote:
> SpecSDET, Aim7 or ReAim from OSDL are probably what you are thinking of.

SDET isn't publicly available. I hope by now osdl-reaim is called
"osdl-aim7": http://lkml.org/lkml/2003/8/1/172

grant
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Tue, 05 Apr 2005 21:48:22 PDT, David Mosberger wrote:
> > On Tue, 5 Apr 2005 17:33:59 -0700 (PDT), Christoph Lameter <[EMAIL PROTECTED]> said:
>
>   Christoph> Which benchmark would you recommend for this?
>
> I don't know about "recommend", but I think SPECweb, SPECjbb,
> the-UNIX-multi-user-benchmark-whose-name-I-keep-forgetting, and in
> general anything that involves process-activity and/or large working
> sets might be interesting (in other words: anything but
> microbenchmarks, I'm afraid).

SpecSDET, Aim7 or ReAim from OSDL are probably what you are thinking of.

gerrit
Re: [PATCH] add a clear_pages function to clear pages of higher order
> On Tue, 5 Apr 2005 17:33:59 -0700 (PDT), Christoph Lameter <[EMAIL PROTECTED]> said:

  Christoph> Which benchmark would you recommend for this?

I don't know about "recommend", but I think SPECweb, SPECjbb,
the-UNIX-multi-user-benchmark-whose-name-I-keep-forgetting, and in
general anything that involves process-activity and/or large working
sets might be interesting (in other words: anything but
microbenchmarks, I'm afraid).

	--david
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Tue, 5 Apr 2005, David Mosberger wrote:

> What LMbench test other than fork/exec would you have expected to be
> affected by this?  LMbench is not a good benchmark for this (remember:
> it's a _micro_ benchmark).

LMbench does a variety of things and I expected to see at least something
on the page fault test and hopefully also some variation in the other
tests.

Which benchmark would you recommend for this?
Re: [PATCH] add a clear_pages function to clear pages of higher order
> On Tue, 5 Apr 2005 17:15:53 -0700 (PDT), Christoph Lameter <[EMAIL PROTECTED]> said:

  Christoph> On Thu, 24 Mar 2005, David Mosberger wrote:

  >> That's definitely the case.  See my earlier post on this topic:
  >> http://www.gelato.unsw.edu.au/linux-ia64/0409/11012.html
  >> Unfortunately, nobody reported any results for larger machines
  >> and/or more interesting workloads, so the patch is in limbo at
  >> this time.  Clearly, if the CPU that's clearing the page is
  >> likely to use that same page soon after, it'd be useful to use
  >> temporal stores.

  Christoph> Here are some numbers using lmbench of temporal writes
  Christoph> vs. non-temporal writes on ia64 (8p machine but lmbench
  Christoph> run only for one load).  There seems to be some benefit
  Christoph> for fork/exec but overall this does not seem to be a
  Christoph> clear win.  I suspect that the distinction between
  Christoph> temporal vs. nontemporal writes is more beneficial on
  Christoph> machines with smaller page sizes since the likelihood
  Christoph> that most cachelines of a page are used soon is increased
  Christoph> and therefore hot zeroing is more beneficial.

What LMbench test other than fork/exec would you have expected to be
affected by this?  LMbench is not a good benchmark for this (remember:
it's a _micro_ benchmark).

	--david
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Thu, 24 Mar 2005, David Mosberger wrote:

> That's definitely the case.  See my earlier post on this topic:
>
>   http://www.gelato.unsw.edu.au/linux-ia64/0409/11012.html
>
> Unfortunately, nobody reported any results for larger machines and/or
> more interesting workloads, so the patch is in limbo at this time.
> Clearly, if the CPU that's clearing the page is likely to use that
> same page soon after, it'd be useful to use temporal stores.

Here are some numbers using lmbench of temporal writes vs. non-temporal
writes on ia64 (8p machine but lmbench run only for one load). There seems
to be some benefit for fork/exec but overall this does not seem to be a
clear win. I suspect that the distinction between temporal vs. nontemporal
writes is more beneficial on machines with smaller page sizes since the
likelihood that most cachelines of a page are used soon is increased and
therefore hot zeroing is more beneficial.

                 L M B E N C H  3 . 0   S U M M A R Y
                 (Alpha software, do not distribute)

Basic system parameters
------------------------------------------------------------------------
Host                 OS Description          Mhz   tlb  cache  mem  scal
                                                  pages line   par  load
                                                        bytes
-------- ----------------------- -------------- ---- ----- ----- ------
margin   Linux 2.6.12-rc1-bk3    ia64-linux-gnu 1300  128             1
margin   Linux 2.6.12-rc1-bk3    ia64-linux-gnu 1300  128             1
margin   Linux 2.6.12-rc1-bk3    ia64-linux-gnu 1300  128             1
margin   Linux 2.6.12-rc1-bk3    ia64-linux-gnu 1300  128             1
margin   Linux 2.6.12-rc1-bk3    ia64-linux-gnu 1300  128             1
margin   Linux 2.6.12-rc1-bk3    ia64-linux-gnu 1300  128             1
margin   Linux 2.6.12-rc1-bk3    ia64-linux-gnu 1300  128             1
margin   Linux 2.6.12-rc1-bk3-dm ia64-linux-gnu 1300  128             1
margin   Linux 2.6.12-rc1-bk3-dm ia64-linux-gnu 1300  128             1
margin   Linux 2.6.12-rc1-bk3-dm ia64-linux-gnu 1300  128             1

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------
Host                 OS           Mhz null null      open slct sig  sig  fork exec sh
                                      call  I/O stat clos TCP  inst hndl proc proc proc
-------- ----------------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
margin   Linux 2.6.12-rc1-bk3    1300 0.04 0.26 4.90 6.11 15.7 0.39 2.43 528. 1926 4853
margin   Linux 2.6.12-rc1-bk3    1300 0.04 0.27 4.86 6.10 15.7 0.39 2.45 522. 1910 4260
margin   Linux 2.6.12-rc1-bk3    1300 0.04 0.26 4.85 6.10 15.8 0.39 2.40 526. 1916 4429
margin   Linux 2.6.12-rc1-bk3    1300 0.04 0.26 4.84 6.11 15.7 0.39 2.40 531. 1838 4429
margin   Linux 2.6.12-rc1-bk3    1300 0.04 0.26 4.85 6.11 15.8 0.39 2.47 553. 1931 5118
margin   Linux 2.6.12-rc1-bk3    1300 0.04 0.26 5.09 6.37 15.7 0.39 2.40 537. 1934 5133
margin   Linux 2.6.12-rc1-bk3    1300 0.04 0.26 5.09 6.35 15.8 0.39 2.40 555. 1939 5389
margin   Linux 2.6.12-rc1-bk3-dm 1300 0.04 0.26 4.88 6.10 15.8 0.39 2.42 519. 1829 4787
margin   Linux 2.6.12-rc1-bk3-dm 1300 0.04 0.26 4.87 6.09 15.8 0.39 2.40 516. 1830 5057
margin   Linux 2.6.12-rc1-bk3-dm 1300 0.04 0.27 4.86 6.10 15.8 0.39 2.40 512. 1878 5166

Context switching - times in microseconds - smaller is better
------------------------------------------------------------------------
Host                 OS          2p/0K  2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                                 ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
-------- ----------------------- ------ ------ ------ ------ ------ ------- -------
margin   Linux 2.6.12-rc1-bk3    7.3300 2.7400 7.0400 4.4600 6.6200 3.94000 8.38000
margin   Linux 2.6.12-rc1-bk3    7.6100 8.1000 7.3200 4.5900 7.1700 5.5     7.84000
margin   Linux 2.6.12-rc1-bk3    7.2400 8.     7.2100 4.3800 6.7500 4.77000 7.37000
margin   Linux 2.6.12-rc1-bk3    7.4100 8.0400 7.0500 4.5100 7.2500 4.11000 7.03000
margin   Linux 2.6.12-rc1-bk3    7.2600 8.2100 7.2400 4.6500 6.6500 4.08000 7.81000
margin   Linux 2.6.12-rc1-bk3    7.4600 7.9000 7.3800 4.3800 6.6200 4.83000 7.27000
margin   Linux 2.6.12-rc1-bk3    7.4400 8.2000 7.2000 5.8700 6.8000 4.86000 7.95000
margin   Linux 2.6.12-rc1-bk3-dm 7.4400 8.3100 7.1300 5.6900 6.6500 5.49000 7.49000
margin   Linux 2.6.12-rc1-bk3-dm 2.1300 8.0100 7.3800 4.6700 6.5500 4.
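A rough illustration of the page-size argument, assuming the 128-byte L3
cachelines used in arch/ia64/lib/clear_page.S: a 4k page holds 32 such
lines, a 16k page 128 and a 64k page 512. If the faulting access only
touches one line shortly after the clear, that is about 3% of a 4k page
but well under 1% of a 16k or 64k page, so keeping the freshly zeroed
page hot in the cache pays off less on the larger ia64 page sizes.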
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Sun, 27 Mar 2005, Andi Kleen wrote:

> > Clearly, if the CPU that's clearing the page is likely to use that
> > same page soon after, it'd be useful to use temporal stores.
>
> That is always the case in the current code (without Christoph's
> pre-cleaning daemon). The page fault handler clears and user space
> is guaranteed to need at least one cacheline from the fresh page
> because it just did a page fault on it. With non-temporal stores
> you guarantee at least one hard cache miss directly after
> the return to user space.

It is not the case that *all* the cachelines of a page are going to be
used right after zeroing. For the page fault case it is only guaranteed
that *one* cacheline will be used. In the PTE/PMD/PUD page allocation
cases it is likely that only a single cacheline is used. There are some
cases in the code (apart from the fault handler) where zeroed pages are
allocated with no guarantee of use (e.g. the allocations for buffers for
shared memory or pipes).

> I suspect even with precleaning the average time from cleaning to use
> will be quite short.

If the time is short then hot cleaning is the right way to go and
prezeroing is of no benefit. Prezeroing can only be of benefit if there
is sufficient time between the zeroing and the use of the data. It must
be sufficiently long for the cachelines to no longer be in the caches.
Then the loading of these cachelines may be avoided, which yields the
performance benefit.
Re: [PATCH] add a clear_pages function to clear pages of higher order
On 27 Mar 2005 19:12:20 +0200
Andi Kleen <[EMAIL PROTECTED]> wrote:

> With non temporal stores
> you guarantee at least one hard cache miss directly after
> the return to user space.

This is true if the cacheline were not present already at the time of
the non-temporal store. I know what you're trying to say, I'm just
clarifying.

The real question is if a large enough ratio of those cachelines in the
page get similarly accessed. I happen to think the answer to that for
any real example is yes. Yet, I have no way to prove this.

It would be cool to do some hacks under Xen or user-mode Linux to get
some real statistics about this. Actually, this could be done also with
hacks to valgrind or other similar tools. QEMU could also be used.
Re: [PATCH] add a clear_pages function to clear pages of higher order
> Clearly, if the CPU that's clearing the page is likely to use that
> same page soon after, it'd be useful to use temporal stores.

That is always the case in the current code (without Christoph's
pre-cleaning daemon). The page fault handler clears and user space is
guaranteed to need at least one cacheline from the fresh page because it
just did a page fault on it. With non-temporal stores you guarantee at
least one hard cache miss directly after the return to user space.

I suspect even with precleaning the average time from cleaning to use
will be quite short.

-Andi
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Thu, 24 Mar 2005, David S. Miller wrote: > Erm... were any of your test builds done with the new CONFIG_CLEAR_COLD > option enabled? :-) These were all fixed but I failed to do a "quilt refresh" sigh... The email issues are also fixed now sigh. What a day. > Next, replace your arch/sparc64/lib/clear_page.S diff with this one and > things would be working and we'll be using the proper temporal vs. > non-temporal stores on that platform. Thanks. Here is the patch with your changes and a "quilt refresh" ;-) - Introduces a new function clear_cold(void *pageaddress, int order) to clear pages of an arbitrary size with non temporal stores. Cold clearing is typically faster than hot clearing. Hot clearing is beneficial when the data is to be used soon. (Will also work well with the new hot and cold aware prezeroing daemon) Use cold clearing for huge pages. For ia64 also make clear_page uses temporal stores. Patch needs fixes to work properly on i386 and x86_64. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> Index: linux-2.6.11/mm/hugetlb.c === --- linux-2.6.11.orig/mm/hugetlb.c 2005-03-01 23:38:12.0 -0800 +++ linux-2.6.11/mm/hugetlb.c 2005-03-24 14:12:53.0 -0800 @@ -78,7 +78,6 @@ void free_huge_page(struct page *page) struct page *alloc_huge_page(void) { struct page *page; - int i; spin_lock(&hugetlb_lock); page = dequeue_huge_page(); @@ -89,8 +88,7 @@ struct page *alloc_huge_page(void) spin_unlock(&hugetlb_lock); set_page_count(page, 1); page[1].mapping = (void *)free_huge_page; - for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i) - clear_highpage(&page[i]); + prep_zero_page(page, HUGETLB_PAGE_ORDER, GFP_HIGHUSER | __GFP_COLD); return page; } Index: linux-2.6.11/mm/page_alloc.c === --- linux-2.6.11.orig/mm/page_alloc.c 2005-03-24 13:15:40.0 -0800 +++ linux-2.6.11/mm/page_alloc.c2005-03-24 18:39:22.0 -0800 @@ -633,11 +633,17 @@ void fastcall free_cold_page(struct page free_hot_cold_page(page, 1); } -static inline void prep_zero_page(struct page *page, int order, int gfp_flags) +void prep_zero_page(struct page *page, unsigned int order, int gfp_flags) { int i; BUG_ON((gfp_flags & (__GFP_WAIT | __GFP_HIGHMEM)) == __GFP_HIGHMEM); + +#ifdef CONFIG_CLEAR_COLD + if ((gfp_flags & __GFP_COLD) && !PageHighMem(page)) + clear_cold(page_address(page), order); + else +#endif for(i = 0; i < (1 << order); i++) clear_highpage(page + i); } Index: linux-2.6.11/include/linux/gfp.h === --- linux-2.6.11.orig/include/linux/gfp.h 2005-03-01 23:37:50.0 -0800 +++ linux-2.6.11/include/linux/gfp.h2005-03-24 14:16:44.0 -0800 @@ -131,4 +131,5 @@ extern void FASTCALL(free_cold_page(stru void page_alloc_init(void); +void prep_zero_page(struct page *, unsigned int order, int gfp_flags); #endif /* __LINUX_GFP_H */ Index: linux-2.6.11/arch/ia64/Kconfig === --- linux-2.6.11.orig/arch/ia64/Kconfig 2005-03-01 23:38:26.0 -0800 +++ linux-2.6.11/arch/ia64/Kconfig 2005-03-24 14:12:53.0 -0800 @@ -46,6 +46,10 @@ config GENERIC_IOMAP bool default y +config CLEAR_COLD + bool + default y + choice prompt "System type" default IA64_GENERIC Index: linux-2.6.11/include/asm-ia64/page.h === --- linux-2.6.11.orig/include/asm-ia64/page.h 2005-03-01 23:37:48.0 -0800 +++ linux-2.6.11/include/asm-ia64/page.h2005-03-24 14:12:53.0 -0800 @@ -57,6 +57,8 @@ # define STRICT_MM_TYPECHECKS extern void clear_page (void *page); +/* Clear arbitrary order page using nontemporal writes */ +extern void clear_cold (void *page, unsigned int order); extern void copy_page (void *to, void *from); /* Index: linux-2.6.11/arch/ia64/kernel/ia64_ksyms.c === --- 
linux-2.6.11.orig/arch/ia64/kernel/ia64_ksyms.c 2005-03-01 23:38:08.0 -0800 +++ linux-2.6.11/arch/ia64/kernel/ia64_ksyms.c 2005-03-24 14:12:53.0 -0800 @@ -39,6 +39,7 @@ EXPORT_SYMBOL(__up); #include EXPORT_SYMBOL(clear_page); +EXPORT_SYMBOL(clear_cold); #ifdef CONFIG_VIRTUAL_MEM_MAP #include Index: linux-2.6.11/arch/ia64/lib/clear_page.S === --- linux-2.6.11.orig/arch/ia64/lib/clear_page.S2005-03-01 23:37:47.0 -0800 +++ linux-2.6.11/arch/ia64/lib/clear_page.S 2005-03-24 14:24:29.0 -0800 @@ -7,6 +7,8
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Thu, 24 Mar 2005 14:49:55 -0800 (PST) Christoph Lameter <[EMAIL PROTECTED]> wrote: > Could you help me fix up this patch replacing the old clear_pages patch? Ok, first you need to mark the order and gfp arguments as unsigned for mm/page_alloc.c:prep_zero_page() so that it matches the prototype you added to include/linux/gfp.h else the compiler warns a lot. Next, in the same function in mm/page_alloc.c, "PageHighmem()" is typo'd, it should be "PageHighMem()". The clear_cold() call on the next line needs a semicolon. Erm... were any of your test builds done with the new CONFIG_CLEAR_COLD option enabled? :-) Next, replace your arch/sparc64/lib/clear_page.S diff with this one and things would be working and we'll be using the proper temporal vs. non-temporal stores on that platform. = arch/sparc64/lib/clear_page.S 1.1 vs edited = --- 1.1/arch/sparc64/lib/clear_page.S 2004-08-08 19:54:07 -07:00 +++ edited/arch/sparc64/lib/clear_page.S2005-03-24 15:56:33 -08:00 @@ -72,26 +72,34 @@ mov 1, %o4 clear_page_common: - VISEntryHalf membar #StoreLoad | #StoreStore | #LoadStore - fzero %f0 sethi %hi(PAGE_SIZE/64), %o1 mov %o0, %g1! remember vaddr for tlbflush - fzero %f2 or %o1, %lo(PAGE_SIZE/64), %o1 - faddd %f0, %f2, %f4 - fmuld %f0, %f2, %f6 - faddd %f0, %f2, %f8 - fmuld %f0, %f2, %f10 - faddd %f0, %f2, %f12 - fmuld %f0, %f2, %f14 -1: stda%f0, [%o0 + %g0] ASI_BLK_P +#define PREFETCH(x, y) prefetch x, y +#define PREFETCH_CODE 2 + + PREFETCH([%o0 + 0x000], PREFETCH_CODE) + PREFETCH([%o0 + 0x040], PREFETCH_CODE) + PREFETCH([%o0 + 0x080], PREFETCH_CODE) + PREFETCH([%o0 + 0x0c0], PREFETCH_CODE) + PREFETCH([%o0 + 0x100], PREFETCH_CODE) + PREFETCH([%o0 + 0x140], PREFETCH_CODE) + PREFETCH([%o0 + 0x180], PREFETCH_CODE) +1: + stx %g0, [%o0 + 0x00] + stx %g0, [%o0 + 0x08] + stx %g0, [%o0 + 0x10] + stx %g0, [%o0 + 0x18] + stx %g0, [%o0 + 0x20] + stx %g0, [%o0 + 0x28] + stx %g0, [%o0 + 0x30] + stx %g0, [%o0 + 0x38] + PREFETCH([%o0 + 0x1c0], PREFETCH_CODE) subcc %o1, 1, %o1 bne,pt %icc, 1b add%o0, 0x40, %o0 - membar #Sync - VISExitHalf brz,pn %o4, out nop @@ -101,5 +109,32 @@ stw %o2, [%g6 + TI_PRE_COUNT] out: retl +nop + + .globl clear_cold +clear_cold:/* %o0=dest, %o1=order */ + sethi %hi(PAGE_SIZE/64), %o2 + clr %o4 + or %o2, %lo(PAGE_SIZE/64), %o2 + sllx%o2, %o1, %o1 + VISEntryHalf + membar #StoreLoad | #StoreStore | #LoadStore + fzero %f0 + fzero %f2 + faddd %f0, %f2, %f4 + fmuld %f0, %f2, %f6 + faddd %f0, %f2, %f8 + fmuld %f0, %f2, %f10 + + faddd %f0, %f2, %f12 + fmuld %f0, %f2, %f14 +2: stda%f0, [%o0 + %g0] ASI_BLK_P + subcc %o1, 1, %o1 + bne,pt %icc, 2b +add%o0, 0x40, %o0 + membar #Sync + VISExitHalf + + retl nop - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Thu, 24 Mar 2005 14:49:55 -0800 (PST)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> On Thu, 24 Mar 2005, David S. Miller wrote:
>
> > > prep_zero_page would use a temporal clear for an order 0 page but a
> > > nontemporal clear for higher order pages.
> >
> > That sounds about right to me.
> >
> > Hmmm, I'm inspired to experiment with this on sparc64 a bit.
>
> Could you help me fix up this patch replacing the old clear_pages patch?

Sure, I'll play with it.

Meanwhile, here are some numbers. I changed just the clear_page()
implementation on sparc64 so that it used prefetching and normal temporal
stores. The machine is a uniprocessor 1.5Ghz Ultra-IIIi, 64K
write-through D-cache, 64K I-cache, 1MB L2 cache.

I did 4 timed 'vmlinux' builds after a fresh boot:

BEFORE:
real    9m8.720s    user  8m28.345s   sys  0m32.734s
real    9m2.034s    user  8m28.763s   sys  0m32.512s
real    9m1.848s    user  8m28.970s   sys  0m32.204s
real    9m1.701s    user  8m28.715s   sys  0m32.394s

AFTER:
real    9m2.241s    user  8m16.633s   sys  0m36.451s
real    8m53.739s   user  8m17.165s   sys  0m36.052s
real    8m54.089s   user  8m17.266s   sys  0m36.219s
real    8m54.071s   user  8m17.473s   sys  0m36.073s

So, at the very least, my results agree with D. Mosberger's on IA64.
At the cost of ~4 seconds of system time, we gain ~11 seconds of user
time. I'm pretty much convinced this is a win.

I wonder if it matters to do something similar for copy_page*() as well.
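In round numbers from the runs above: about 8m28.7s of user time before
vs. about 8m17.2s after, i.e. roughly 11.5 seconds of user time saved for
roughly 3.8 seconds of additional system time, a net gain of a little
under 8 seconds on a build of roughly nine minutes, or somewhat over 1%.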
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Thu, 24 Mar 2005, David S. Miller wrote: > > prep_zero_page would use a temporal clear for an order 0 page but a > > nontemporal clear for higher order pages. > > That sounds about right to me. > > Hmmm, I'm inspired to experiment with this on sparc64 a bit. Could you help me fix up this patch replacing the old clear_pages patch? Introduces a new function clear_cold(void *pageaddress, int order) to clear pages of an arbitrary size with non temporal stores. Cold clearing is typically faster than hot clearing. Hot clearing is beneficial when the data is to be used soon. (The hot cold distincion also work well with the new hot and cold aware prezeroing daemon) - Use cold clearing for huge pages. - For ia64 also make clear_page uses temporal stores. - Patch needs fixes to work properly on i386, x86_64 and sparc64. - There may be other allocations that can benefit from the increased performance possible for cold zeroed pages if the pages are not to be used right away. Add __GFP_COLD to the gfp_flags for those. Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> Index: linux-2.6.11/mm/hugetlb.c === --- linux-2.6.11.orig/mm/hugetlb.c 2005-03-01 23:38:12.0 -0800 +++ linux-2.6.11/mm/hugetlb.c 2005-03-24 14:12:53.0 -0800 @@ -78,7 +78,6 @@ void free_huge_page(struct page *page) struct page *alloc_huge_page(void) { struct page *page; - int i; spin_lock(&hugetlb_lock); page = dequeue_huge_page(); @@ -89,8 +88,7 @@ struct page *alloc_huge_page(void) spin_unlock(&hugetlb_lock); set_page_count(page, 1); page[1].mapping = (void *)free_huge_page; - for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i) - clear_highpage(&page[i]); + prep_zero_page(page, HUGETLB_PAGE_ORDER, GFP_HIGHUSER | __GFP_COLD); return page; } Index: linux-2.6.11/mm/page_alloc.c === --- linux-2.6.11.orig/mm/page_alloc.c 2005-03-24 13:15:40.0 -0800 +++ linux-2.6.11/mm/page_alloc.c2005-03-24 14:15:15.0 -0800 @@ -633,11 +633,17 @@ void fastcall free_cold_page(struct page free_hot_cold_page(page, 1); } -static inline void prep_zero_page(struct page *page, int order, int gfp_flags) +void prep_zero_page(struct page *page, int order, int gfp_flags) { int i; BUG_ON((gfp_flags & (__GFP_WAIT | __GFP_HIGHMEM)) == __GFP_HIGHMEM); + +#ifdef CONFIG_CLEAR_COLD + if ((gfp_flags & __GFP_COLD) && !PageHighmem(page)) + clear_cold(page_address(page), order) + else +#endif for(i = 0; i < (1 << order); i++) clear_highpage(page + i); } Index: linux-2.6.11/include/linux/gfp.h === --- linux-2.6.11.orig/include/linux/gfp.h 2005-03-01 23:37:50.0 -0800 +++ linux-2.6.11/include/linux/gfp.h2005-03-24 14:12:53.0 -0800 @@ -131,4 +131,5 @@ extern void FASTCALL(free_cold_page(stru void page_alloc_init(void); +void prep_zero_page(struct page *, unsigned int order, unsigned int gfp_flags); #endif /* __LINUX_GFP_H */ Index: linux-2.6.11/arch/ia64/Kconfig === --- linux-2.6.11.orig/arch/ia64/Kconfig 2005-03-01 23:38:26.0 -0800 +++ linux-2.6.11/arch/ia64/Kconfig 2005-03-24 14:12:53.0 -0800 @@ -46,6 +46,10 @@ config GENERIC_IOMAP bool default y +config CLEAR_COLD + bool + default y + choice prompt "System type" default IA64_GENERIC Index: linux-2.6.11/include/asm-ia64/page.h === --- linux-2.6.11.orig/include/asm-ia64/page.h 2005-03-01 23:37:48.0 -0800 +++ linux-2.6.11/include/asm-ia64/page.h2005-03-24 14:12:53.0 -0800 @@ -57,6 +57,8 @@ # define STRICT_MM_TYPECHECKS extern void clear_page (void *page); +/* Clear arbitrary order page using nontemporal writes */ +extern void clear_cold (void *page, unsigned int order); extern void copy_page (void *to, void *from); /* Index: 
linux-2.6.11/arch/ia64/kernel/ia64_ksyms.c === --- linux-2.6.11.orig/arch/ia64/kernel/ia64_ksyms.c 2005-03-01 23:38:08.0 -0800 +++ linux-2.6.11/arch/ia64/kernel/ia64_ksyms.c 2005-03-24 14:12:53.0 -0800 @@ -39,6 +39,7 @@ EXPORT_SYMBOL(__up); #include EXPORT_SYMBOL(clear_page); +EXPORT_SYMBOL(clear_cold); #ifdef CONFIG_VIRTUAL_MEM_MAP #include Index: linux-2.6.11/arch/ia64/lib/clear_page.S === --- linux-2.6.11.orig/arch/ia64/lib/clear_page.S2005-03-01 23:37:47.0 -0800 +++ linux-2.6.11/arch/ia64/lib/clear_page.S 2005-03-24 14:12:53.0 -0800 @@ -7,6 +7,8 @@ * 1/06/01 davidm
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Thu, 24 Mar 2005 10:41:06 -0800 (PST)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> So it would be useful to have
>
> clear_page   -> Temporal. Only zaps one page
>
> and
>
> clear_pages  -> Zaps an arbitrary order of pages, non-temporal
>
> Rework the clear_pages patch to do just that? Maybe rename clear_pages
> to clear_pages_nt?
>
> prep_zero_page would use a temporal clear for an order 0 page but a
> nontemporal clear for higher order pages.

That sounds about right to me.

Hmmm, I'm inspired to experiment with this on sparc64 a bit. :-)
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Thu, 24 Mar 2005, David Mosberger wrote:

> That's definitely the case.  See my earlier post on this topic:
>
>   http://www.gelato.unsw.edu.au/linux-ia64/0409/11012.html
>
> Unfortunately, nobody reported any results for larger machines and/or
> more interesting workloads, so the patch is in limbo at this time.
> Clearly, if the CPU that's clearing the page is likely to use that
> same page soon after, it'd be useful to use temporal stores.

So it would be useful to have

clear_page   -> Temporal. Only zaps one page

and

clear_pages  -> Zaps an arbitrary order of pages, non-temporal

Rework the clear_pages patch to do just that? Maybe rename clear_pages to
clear_pages_nt?

prep_zero_page would use a temporal clear for an order 0 page but a
nontemporal clear for higher order pages.
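In code, that dispatch would look roughly like the following sketch.
clear_pages_nt() is the hypothetical non-temporal helper named above, not
an existing function, and the gfp_flags handling of the real
prep_zero_page is omitted:

static inline void prep_zero_page_sketch(struct page *page, unsigned int order)
{
	int i;

	if (order > 0 && !PageHighMem(page)) {
		/* Higher order: clear with non-temporal stores; most of
		 * these cachelines will not be touched again soon. */
		clear_pages_nt(page_address(page), order);
		return;
	}

	/* Order 0 (or highmem): temporal clear, since the faulting CPU
	 * is about to touch at least one line of this page. */
	for (i = 0; i < (1 << order); i++)
		clear_highpage(page + i);
}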
Re: [PATCH] add a clear_pages function to clear pages of higher order
> On Fri, 18 Mar 2005 20:28:08 +0100, Andi Kleen <[EMAIL PROTECTED]> said:

  >> stores in general for clearing pages? I checked and Itanium has
  >> always used non-temporal stores. So there will be no benefit for
  >> us from this

  Andi> That is weird. I would actually try to switch to temporal
  Andi> stores, maybe it will improve some benchmarks.

That's definitely the case.  See my earlier post on this topic:

  http://www.gelato.unsw.edu.au/linux-ia64/0409/11012.html

Unfortunately, nobody reported any results for larger machines and/or
more interesting workloads, so the patch is in limbo at this time.
Clearly, if the CPU that's clearing the page is likely to use that
same page soon after, it'd be useful to use temporal stores.

	--david
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Friday 18 March 2005 21:28, Andi Kleen wrote:
> On Fri, Mar 18, 2005 at 07:00:06AM -0800, Christoph Lameter wrote:
> > On Fri, 18 Mar 2005, Denis Vlasenko wrote:
> >
> > > NT stores are not about 5% increase. 200%-300%. Provided you are ok
> > > with the fact that the zeroed page ends up evicted from cache.
> > > Luckily, this is exactly what you want with prezeroing.
> >
> > These are pretty significant results. Maybe it's best to use non-temporal
>
> The differences are actually less. I do not know what Denis benchmarked,
> but in my tests the difference was never more than ~10%. He got a zero
> too much?

No. See attached.

# gcc -O2 0main.c
# ./a.out
Page clear/copy benchmark program. buffer size: 1 Mb
Each test tried 64 times, max and min CPU cycles per page are reported.
Please disregard max values. They are due to system interference only.
clear_page() tests:
normal_clear_page    - took 44214 max, 12615 min cycles per page
normal_clear_page    - took 18969 max, 12649 min cycles per page
repstosl_clear_page  - took 19897 max, 12655 min cycles per page
movq_clear_page      - took 39391 max, 10782 min cycles per page
movntq_clear_page    - took 21612 max,  4779 min cycles per page
copy_page() tests:

I'm basically saying that 'microbenchmark-visible' performance of NT
stores is 200-300% higher than that of 'normal' stores.

BTW: cache eviction is not an intrinsic property of non-temporal stores.
It's merely how they're implemented in current CPUs: if NT stores hit a
cached line, invalidate it and push the stores to the bus. Else just push
the stores to the bus without reading the cacheline from RAM first.

It is possible that some future CPU won't evict the cacheline if NT
stores happen to hit it: "if NT stores hit a cached line, MODIFY it and
push the stores to the bus".
--
vda

page_asm.tar.bz2
Description: application/tbz
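For reference, a minimal user-space approximation of the non-temporal
clear being measured above, written with SSE2 streaming stores rather
than the MMX movntq used in the attachment (the function name and fixed
4k size are illustrative only):

#include <emmintrin.h>	/* _mm_stream_si128, _mm_sfence */
#include <stdlib.h>

#define PAGE_SIZE 4096

static void movnt_clear_page(void *page)
{
	__m128i zero = _mm_setzero_si128();
	__m128i *p = page;
	size_t i;

	/* Streaming stores write-combine straight to memory and do not
	 * allocate the lines in the cache hierarchy. */
	for (i = 0; i < PAGE_SIZE / sizeof(*p); i++)
		_mm_stream_si128(&p[i], zero);
	_mm_sfence();	/* the stores are weakly ordered; fence before reuse */
}

int main(void)
{
	void *page;

	if (posix_memalign(&page, PAGE_SIZE, PAGE_SIZE))
		return 1;
	movnt_clear_page(page);
	free(page);
	return 0;
}

Compile with -msse2 on 32-bit x86; timing this against a plain memset()
of the same buffer gives a rough idea of the cold-vs-hot trade-off
discussed in this thread.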
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Fri, 18 Mar 2005, Andi Kleen wrote:

> It does not make any sense if you think of it - the memory bus
> of the CPU cannot be that much faster than the cache.

The memory bus would be able to reach a higher rate if properly optimized
for sequential writes to memory. A cache typically does random writes.
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Fri, Mar 18, 2005 at 07:00:06AM -0800, Christoph Lameter wrote:
> On Fri, 18 Mar 2005, Denis Vlasenko wrote:
>
> > NT stores are not about 5% increase. 200%-300%. Provided you are ok
> > with the fact that the zeroed page ends up evicted from cache.
> > Luckily, this is exactly what you want with prezeroing.
>
> These are pretty significant results. Maybe it's best to use non-temporal

The differences are actually less. I do not know what Denis benchmarked,
but in my tests the difference was never more than ~10%. He got a zero
too much?

It does not make any sense if you think of it - the memory bus of the CPU
cannot be that much faster than the cache.

And the drawback of eating the cache misses later is really very
significant.

> stores in general for clearing pages? I checked and Itanium has always
> used non-temporal stores. So there will be no benefit for us from this

That is weird. I would actually try to switch to temporal stores, maybe
it will improve some benchmarks.

> approach (we have 16k and 64k page sizes which may make the situation a
> bit different). Try to update the i386 architectures to do the same?

Definitely not.

You can experiment with using it for the cleaner daemon, but even there I
would use some heuristic to make sure you only use it on pages that are
at the end of a pretty long queue. E.g. if you can guarantee that the
page allocator will go through 500k-1MB before going to the NT page that
is cache cold it may be a good idea. But that might be pretty complicated
and I am not sure it will be worth it.

But for the clear running in the page fault handler context it is
definitely a bad idea.

-Andi
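For scale (illustrative arithmetic only): 500k-1MB of intervening
allocations corresponds to roughly 128-256 pages of 4k, or 32-64 pages of
16k, so the heuristic amounts to requiring that a non-temporally cleared
page sit behind a few hundred other allocations before it is handed out
cache cold.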
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Fri, 18 Mar 2005, Denis Vlasenko wrote:

> NT stores are not about 5% increase. 200%-300%. Provided you are ok with
> the fact that the zeroed page ends up evicted from cache. Luckily, this
> is exactly what you want with prezeroing.

These are pretty significant results. Maybe it's best to use non-temporal
stores in general for clearing pages? I checked and Itanium has always
used non-temporal stores. So there will be no benefit for us from this
approach (we have 16k and 64k page sizes which may make the situation a
bit different). Try to update the i386 architectures to do the same?

Or, for prezeroing, you could register a zeroing driver that would use
the non-temporal stores with V8 of the prezeroing patches.

In any case the clear_pages patch is not useful the way it was intended
for us and I have dropped it from the prezeroing patch.
Re: [PATCH] add a clear_pages function to clear pages of higher order
> Andi Kleen (iirc) says that non-temporal stores seem to be a
> big win in microbenchmarks (and I second that), but they are
> a net loss when we are going to use the zeroed page just after
> zeroing. He recommends avoiding non-temporal stores.

The rule of thumb is to only use non-temporal stores when your data set
is bigger than the L2/L3 caches of the CPU. This means >1MB. The kernel
normally never works on data sets that big.

For Christoph's new background cleaner daemon it may be worth it when the
queue is a FIFO. This means it is likely there is a relatively long time
between the clearing operation and a workload using it. But even then it
is a very close call and would need clear benchmark numbers in
macrobenchmarks.

-Andi
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Thursday 17 March 2005 03:33, Christoph Lameter wrote: > On Fri, 11 Mar 2005, Denis Vlasenko wrote: > > > Andi Kleen (iirc) says that non-temporal stores seem to be > > big win in microbenchmarks (and I second that), but they are > > a net loss when we are going to use zeroed page just after > > zeroing. He recommends avoid using non-temporal stores > > > > With this new page prezeroing infrastructure, that argument > > most likely is not right anymore. Especially clearing of > > high-order pages definitely will benefit from NT stores > > because they do not kill L1 data cache in the process. > > > > I don't have K8 and therefore cannot be 100% sure, but > > I really doubt that K8 optimize "rep stosq" into _NT_ stores. > > Hmm. That would be interesting to know and may be necessary to justify > the continued existence of this patch. I tried to get some numbers on > the performance wins for zeroing larger pages with the patch as is (no > NT stores) and came up with: > > Processor Performance Increase > > Itanium 2 1.3Ghz M1/R51.5% > AMD Athlon 64 3200+ i386 mode 3% > AMD Athlon 64 3200+ x86_64 mode 3.3% > > (this is if the zeroing engine is the cpu of course. Prezeroing > may be done through some DMA gizmo independent of the cpu) > > Itanium has more extensive optimization capabilities and > seems to be able to better cope with the loop logic for regular > clear_page. Thus the improvement is even less on Itanium. > > Numbers obtained with the following patch that allows to get performance > data from /proc/meminfo on zeroing performance (just divide Cycles by > Pages for clear_page and clear_pages): Here is a patch which allows to try different page zeroing optimizations to be tested at runtime via sysctl. Was run tested in 2.6.8 time. Rediffed to 2.6.11. Feel free to adapt to your patch and test. Also attached is a tarball for microbenchmarking routines. There are two result files. Duron: normal_clear_page - took 8644 max, 8400 min cycles per page repstosl_clear_page - took 8626 max, 8418 min cycles per page movq_clear_page - took 8647 max, 8300 min cycles per page movntq_clear_page - took 2777 max, 2720 min cycles per page And amd64: normal_clear_page - took 9427 max, 5781 min cycles per page repstosl_clear_page - took 9305 max, 5680 min cycles per page movq_clear_page - took 6167 max, 5576 min cycles per page movntq_clear_page - took 5456 max, 2354 min cycles per page NT stores are not about 5% increase. 200%-300%. Provided you are ok with the fact that zeroed page ends up evicted from cache. Luckily, this is exactly what you want with prezeroing. -- vda diff -urpN linux-2.6.11.src/arch/i386/lib/Makefile linux-2.6.11-nt.src/arch/i386/lib/Makefile --- linux-2.6.11.src/arch/i386/lib/Makefile Tue Oct 19 00:53:10 2004 +++ linux-2.6.11-nt.src/arch/i386/lib/Makefile Fri Mar 18 11:30:51 2005 @@ -4,7 +4,7 @@ lib-y = checksum.o delay.o usercopy.o getuser.o memcpy.o strstr.o \ - bitops.o + bitops.o page_ops.o mmx_page.o sse_page.o lib-$(CONFIG_X86_USE_3DNOW) += mmx.o lib-$(CONFIG_HAVE_DEC_LOCK) += dec_and_lock.o diff -urpN linux-2.6.11.src/arch/i386/lib/mmx.c linux-2.6.11-nt.src/arch/i386/lib/mmx.c --- linux-2.6.11.src/arch/i386/lib/mmx.c Tue Oct 19 00:54:23 2004 +++ linux-2.6.11-nt.src/arch/i386/lib/mmx.c Fri Mar 18 11:30:51 2005 @@ -120,280 +120,3 @@ void *_mmx_memcpy(void *to, const void * kernel_fpu_end(); return p; } - -#ifdef CONFIG_MK7 - -/* - * The K7 has streaming cache bypass load/store. The Cyrix III, K6 and - * other MMX using processors do not. 
- */ - -static void fast_clear_page(void *page) -{ - int i; - - kernel_fpu_begin(); - - __asm__ __volatile__ ( - " pxor %%mm0, %%mm0\n" : : - ); - - for(i=0;i<4096/64;i++) - { - __asm__ __volatile__ ( - " movntq %%mm0, (%0)\n" - " movntq %%mm0, 8(%0)\n" - " movntq %%mm0, 16(%0)\n" - " movntq %%mm0, 24(%0)\n" - " movntq %%mm0, 32(%0)\n" - " movntq %%mm0, 40(%0)\n" - " movntq %%mm0, 48(%0)\n" - " movntq %%mm0, 56(%0)\n" - : : "r" (page) : "memory"); - page+=64; - } - /* since movntq is weakly-ordered, a "sfence" is needed to become - * ordered again. - */ - __asm__ __volatile__ ( - " sfence \n" : : - ); - kernel_fpu_end(); -} - -static void fast_copy_page(void *to, void *from) -{ - int i; - - kernel_fpu_begin(); - - /* maybe the prefetch stuff can go before the expensive fnsave... - * but that is for later. -AV - */ - __asm__ __volatile__ ( - "1: prefetch (%0)\n" - " prefetch 64(%0)\n" - " prefetch 128(%0)\n" - " prefetch 192(%0)\n" - " prefetch 256(%0)\n" - "2: \n" - ".section .fixup, \"ax\"\n" - "3: movw $0x1AEB, 1b\n" /* jmp on 26 bytes */ - " jmp 2b\n" - ".previous\n" - ".section __ex_table,\"a\"\n" - " .align 4\n" - "
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Fri, 11 Mar 2005, Denis Vlasenko wrote: > Andi Kleen (iirc) says that non-temporal stores seem to be > big win in microbenchmarks (and I second that), but they are > a net loss when we are going to use zeroed page just after > zeroing. He recommends avoid using non-temporal stores > > With this new page prezeroing infrastructure, that argument > most likely is not right anymore. Especially clearing of > high-order pages definitely will benefit from NT stores > because they do not kill L1 data cache in the process. > > I don't have K8 and therefore cannot be 100% sure, but > I really doubt that K8 optimize "rep stosq" into _NT_ stores. Hmm. That would be interesting to know and may be necessary to justify the continued existence of this patch. I tried to get some numbers on the performance wins for zeroing larger pages with the patch as is (no NT stores) and came up with: Processor Performance Increase Itanium 2 1.3Ghz M1/R5 1.5% AMD Athlon 64 3200+ i386 mode 3% AMD Athlon 64 3200+ x86_64 mode 3.3% (this is if the zeroing engine is the cpu of course. Prezeroing may be done through some DMA gizmo independent of the cpu) Itanium has more extensive optimization capabilities and seems to be able to better cope with the loop logic for regular clear_page. Thus the improvement is even less on Itanium. Numbers obtained with the following patch that allows to get performance data from /proc/meminfo on zeroing performance (just divide Cycles by Pages for clear_page and clear_pages): Index: linux-2.6.11/mm/page_alloc.c === --- linux-2.6.11.orig/mm/page_alloc.c 2005-03-16 17:12:51.0 -0800 +++ linux-2.6.11/mm/page_alloc.c2005-03-16 17:17:28.0 -0800 @@ -633,13 +633,33 @@ void fastcall free_cold_page(struct page free_hot_cold_page(page, 1); } -static inline void prep_zero_page(struct page *page, int order, int gfp_flags) +void prep_zero_page(struct page *page, unsigned int order, unsigned int gfp_flags) { int i; + unsigned long t1; BUG_ON((gfp_flags & (__GFP_WAIT | __GFP_HIGHMEM)) == __GFP_HIGHMEM); + +#ifdef CONFIG_CLEAR_PAGES + if (!PageHighMem(page) && order>4) { + unsigned long t; + + t1=get_cycles(); + clear_pages(page_address(page), order); + t = get_cycles() - t1; + add_page_state(clear_pages_cycles, t); + add_page_state(clear_pages_order, 1 << order); + inc_page_state(clear_pages_nr); + return; + } +#endif + + t1=get_cycles(); for(i = 0; i < (1 << order); i++) clear_highpage(page + i); + add_page_state(clear_page_cycles, get_cycles() - t1); + add_page_state(clear_page_order, 1 << order); + inc_page_state(clear_page_nr); } /* Index: linux-2.6.11/include/linux/page-flags.h === --- linux-2.6.11.orig/include/linux/page-flags.h2005-03-16 17:12:51.0 -0800 +++ linux-2.6.11/include/linux/page-flags.h 2005-03-16 17:13:02.0 -0800 @@ -131,6 +131,13 @@ struct page_state { unsigned long allocstall; /* direct reclaim calls */ unsigned long pgrotated;/* pages rotated to tail of the LRU */ + + unsigned long clear_page_nr;/* Nr of clear_page request */ + unsigned long clear_page_cycles; /* Cycles spent in clear_page */ + unsigned long clear_page_order; /* Sum of orders */ + unsigned long clear_pages_nr; /* Nr of clear_pages requests */ + unsigned long clear_pages_cycles; /* Nr of cycles in clear_pages */ + unsigned long clear_pages_order;/* Sum of orders */ }; extern void get_page_state(struct page_state *ret); Index: linux-2.6.11/fs/proc/proc_misc.c === --- linux-2.6.11.orig/fs/proc/proc_misc.c 2005-03-16 17:12:50.0 -0800 +++ linux-2.6.11/fs/proc/proc_misc.c2005-03-16 17:22:18.0 -0800 @@ -127,7 +127,7 @@ 
static int meminfo_read_proc(char *page, unsigned long allowed; struct vmalloc_info vmi; - get_page_state(&ps); + get_full_page_state(&ps); get_zone_counts(&active, &inactive, &free); /* @@ -168,7 +168,13 @@ static int meminfo_read_proc(char *page, "PageTables: %8lu kB\n" "VmallocTotal: %8lu kB\n" "VmallocUsed: %8lu kB\n" - "VmallocChunk: %8lu kB\n", + "VmallocChunk: %8lu kB\n" + "ClearPage # %8lu\n" + "ClearPage Pgs %8lu\n" + "ClearPage Cyc %8lu\n" + "ClearPages # %8lu\n" + "ClearPages Pg %8lu\n" + "ClearPages Cy %8lu\n",
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Friday 11 March 2005 03:03, Christoph Lameter wrote:
> Changelog:
>   - use Kconfig and CONFIG_CLEAR_PAGES
>
> The zeroing of a page of an arbitrary order in page_alloc.c and in
> hugetlb.c may benefit from a clear_page that is capable of zeroing
> multiple pages at once. The following patch adds a function
> "clear_pages" that is capable of clearing multiple continuous pages
> at once.
>
> Patch against 2.6.11-bk6
>
> Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>
[snip]
> -clear_page_end:
> +clear_pages_end:
>
>  /* C stepping K8 run faster using the string instructions.
>     It is also a lot simpler. Use this when possible */

Andi Kleen (iirc) says that non-temporal stores seem to be a big win in
microbenchmarks (and I second that), but they are a net loss when we are
going to use the zeroed page just after zeroing. He recommends avoiding
non-temporal stores.

With this new page prezeroing infrastructure, that argument most likely
is not right anymore. Especially clearing of high-order pages will
definitely benefit from NT stores because they do not kill the L1 data
cache in the process.

I don't have a K8 and therefore cannot be 100% sure, but I really doubt
that K8 optimizes "rep stosq" into _NT_ stores.

Andi?
--
vda
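A rough illustration of the cache-pollution point, assuming cache sizes
in the range mentioned elsewhere in this thread (e.g. 64K L1, 1MB L2):
temporally zeroing a single 1MB high-order block, let alone a
multi-megabyte huge page, writes more data than the L1 and most of the L2
can hold, so nearly every line of the clear displaces something that was
in use.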
Re: [PATCH] add a clear_pages function to clear pages of higher order
Changelog: - use Kconfig and CONFIG_CLEAR_PAGES The zeroing of a page of a arbitrary order in page_alloc.c and in hugetlb.c may benefit from a clear_page that is capable of zeroing multiple pages at once. The following patch adds a function "clear_pages" that is capable of clearing multiple continuous pages at once. Patch against 2.6.11-bk6 Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]> Index: linux-2.6.11/mm/page_alloc.c === --- linux-2.6.11.orig/mm/page_alloc.c 2005-03-10 14:42:43.0 -0800 +++ linux-2.6.11/mm/page_alloc.c2005-03-10 15:01:53.0 -0800 @@ -628,11 +628,19 @@ void fastcall free_cold_page(struct page free_hot_cold_page(page, 1); } -static inline void prep_zero_page(struct page *page, int order, int gfp_flags) +void prep_zero_page(struct page *page, unsigned int order, unsigned int gfp_flags) { int i; BUG_ON((gfp_flags & (__GFP_WAIT | __GFP_HIGHMEM)) == __GFP_HIGHMEM); + +#ifdef CONFIG_CLEAR_PAGES + if (!PageHighMem(page)) { + clear_pages(page_address(page), order); + return; + } +#endif + for(i = 0; i < (1 << order); i++) clear_highpage(page + i); } Index: linux-2.6.11/mm/hugetlb.c === --- linux-2.6.11.orig/mm/hugetlb.c 2005-03-01 23:38:12.0 -0800 +++ linux-2.6.11/mm/hugetlb.c 2005-03-10 15:01:53.0 -0800 @@ -78,7 +78,6 @@ void free_huge_page(struct page *page) struct page *alloc_huge_page(void) { struct page *page; - int i; spin_lock(&hugetlb_lock); page = dequeue_huge_page(); @@ -89,8 +88,7 @@ struct page *alloc_huge_page(void) spin_unlock(&hugetlb_lock); set_page_count(page, 1); page[1].mapping = (void *)free_huge_page; - for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i) - clear_highpage(&page[i]); + prep_zero_page(page, HUGETLB_PAGE_ORDER, GFP_HIGHUSER); return page; } Index: linux-2.6.11/include/asm-ia64/page.h === --- linux-2.6.11.orig/include/asm-ia64/page.h 2005-03-01 23:37:48.0 -0800 +++ linux-2.6.11/include/asm-ia64/page.h2005-03-10 15:02:47.0 -0800 @@ -56,8 +56,9 @@ # ifdef __KERNEL__ # define STRICT_MM_TYPECHECKS -extern void clear_page (void *page); +extern void clear_pages (void *page, int order); extern void copy_page (void *to, void *from); +#define clear_page(__page) clear_pages(__page, 0) /* * clear_user_page() and copy_user_page() can't be inline functions because Index: linux-2.6.11/arch/ia64/kernel/ia64_ksyms.c === --- linux-2.6.11.orig/arch/ia64/kernel/ia64_ksyms.c 2005-03-01 23:38:08.0 -0800 +++ linux-2.6.11/arch/ia64/kernel/ia64_ksyms.c 2005-03-10 15:01:53.0 -0800 @@ -38,7 +38,7 @@ EXPORT_SYMBOL(__down_trylock); EXPORT_SYMBOL(__up); #include -EXPORT_SYMBOL(clear_page); +EXPORT_SYMBOL(clear_pages); #ifdef CONFIG_VIRTUAL_MEM_MAP #include Index: linux-2.6.11/arch/ia64/lib/clear_page.S === --- linux-2.6.11.orig/arch/ia64/lib/clear_page.S2005-03-01 23:37:47.0 -0800 +++ linux-2.6.11/arch/ia64/lib/clear_page.S 2005-03-10 15:01:53.0 -0800 @@ -7,6 +7,7 @@ * 1/06/01 davidm Tuned for Itanium. 
* 2/12/02 kchen Tuned for both Itanium and McKinley * 3/08/02 davidm Some more tweaking + * 12/10/04 clameter Make it work on pages of order size */ #include @@ -29,27 +30,33 @@ #define dst4 r11 #define dst_last r31 +#define totsizer14 -GLOBAL_ENTRY(clear_page) +GLOBAL_ENTRY(clear_pages) .prologue - .regstk 1,0,0,0 - mov r16 = PAGE_SIZE/L3_LINE_SIZE-1 // main loop count, -1=repeat/until + .regstk 2,0,0,0 + mov r16 = PAGE_SIZE/L3_LINE_SIZE// main loop count + mov totsize = PAGE_SIZE .save ar.lc, saved_lc mov saved_lc = ar.lc - + ;; .body + adds dst1 = 16, in0 mov ar.lc = (PREFETCH_LINES - 1) mov dst_fetch = in0 - adds dst1 = 16, in0 adds dst2 = 32, in0 + shl r16 = r16, in1 + shl totsize = totsize, in1 ;; .fetch:stf.spill.nta [dst_fetch] = f0, L3_LINE_SIZE adds dst3 = 48, in0 // executing this multiple times is harmless br.cloop.sptk.few .fetch + add r16 = -1,r16 + add dst_last = totsize, dst_fetch + adds dst4 = 64, in0 ;; - addl dst_last = (PAGE_SIZE - PREFETCH_LINES*L3_LINE_SIZE), dst_fetch mov ar.lc = r16 // one L3 line per iteration - adds dst4 = 64, in0 + adds dst_last = -PREFETCH_LINES*L3_LINE_SIZE, dst_last ;; #ifdef CONFIG_ITANI
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Thu, 10 Mar 2005, Dave Hansen wrote:

> > +extern void clear_pages (void *page, int order);
> >  extern void copy_page (void *to, void *from);
> > +#define clear_page(__page) clear_pages(__page, 0)
> > +#define __HAVE_ARCH_CLEAR_PAGES
>
> Although this is a simple instance, could this please be done in a
> Kconfig file?  If that #define happens inside of other #ifdefs, it can
> be quite hard to decipher the special .config incantation to get it set.
> On the other hand, if the dependencies are spelled out in a Kconfig
> entry...

Ok will do.

> BTW, I tried applying this to 2.6.11-bk6, and it rejected:
> ...
> patching file include/asm-i386/page.h
> Hunk #2 FAILED at 28.
> 1 out of 2 hunks FAILED -- saving rejects to file
> include/asm-i386/page.h.rej
> ...
>
> There were some more rejects as well.  Were there some other patches
> applied first?

Patches work fine here.
Re: [PATCH] add a clear_pages function to clear pages of higher order
On Thu, 2005-03-10 at 12:35 -0800, Christoph Lameter wrote:
> +#ifdef __HAVE_ARCH_CLEAR_PAGES
> +	if (!PageHighMem(page)) {
> +		clear_pages(page_address(page), order);
> +		return;
> +	}
> +#endif
> +
>  	for(i = 0; i < (1 << order); i++)
>  		clear_highpage(page + i);
>  }
...
> --- linux-2.6.11.orig/include/asm-ia64/page.h	2005-03-01 23:37:48.0 -0800
> +++ linux-2.6.11/include/asm-ia64/page.h	2005-03-10 10:57:10.0 -0800
> @@ -56,8 +56,10 @@
>  # ifdef __KERNEL__
>  # define STRICT_MM_TYPECHECKS
>
> -extern void clear_page (void *page);
> +extern void clear_pages (void *page, int order);
>  extern void copy_page (void *to, void *from);
> +#define clear_page(__page)	clear_pages(__page, 0)
> +#define __HAVE_ARCH_CLEAR_PAGES

Although this is a simple instance, could this please be done in a
Kconfig file? If that #define happens inside of other #ifdefs, it can
be quite hard to decipher the special .config incantation to get it set.
On the other hand, if the dependencies are spelled out in a Kconfig
entry...

BTW, I tried applying this to 2.6.11-bk6, and it rejected:

...
patching file include/asm-i386/page.h
Hunk #2 FAILED at 28.
1 out of 2 hunks FAILED -- saving rejects to file
include/asm-i386/page.h.rej
...

There were some more rejects as well. Were there some other patches
applied first?

-- Dave
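The kind of Kconfig entry Dave is asking for would look roughly like the
following sketch (illustrative only; it corresponds to the
CONFIG_CLEAR_PAGES option used in Christoph's reposted patch elsewhere in
this thread), added to arch/ia64/Kconfig:

config CLEAR_PAGES
	bool
	default y

mm/page_alloc.c would then test #ifdef CONFIG_CLEAR_PAGES instead of
relying on __HAVE_ARCH_CLEAR_PAGES being defined in an architecture
header.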