Re: [PATCH, RFC 0/6] Avoid cache trashing on clearing huge/gigantic page

2012-08-09 Thread H. Peter Anvin

On 07/20/2012 05:50 AM, Kirill A. Shutemov wrote:

From: "Kirill A. Shutemov" 

Clearing a 2MB huge page will typically blow away several levels of CPU
caches.  To avoid this, clear only the 4K area around the fault address
through the cache and use cache-avoiding clears for the rest of the 2MB
area.

It would be nice to test the patchset with more workloads. Especially if
you see performance regression with THP.

Any feedback is appreciated.

Andi Kleen (6):
   THP: Use real address for NUMA policy
   mm: make clear_huge_page tolerate non aligned address
   THP: Pass real, not rounded, address to clear_huge_page
   x86: Add clear_page_nocache
   mm: make clear_huge_page cache clear only around the fault address
   x86: switch the 64bit uncached page clear to SSE/AVX v2
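
As a rough illustration of the idea in the cover letter above (not the
patch code itself), the loop below clears every 4K subpage of a 2MB
region with a cache-avoiding clear except the subpage that contains the
fault address. The names sketch_clear_page(), sketch_clear_page_nocache()
and sketch_clear_huge_page() are hypothetical, and memset() merely stands
in for the non-temporal stores a real clear_page_nocache() would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SUBPAGE_SIZE 4096UL                /* 4K base page */
#define HUGE_SIZE    (2UL * 1024 * 1024)   /* 2MB huge page */

/* Hypothetical stand-in for the ordinary, cache-filling page clear. */
static void sketch_clear_page(unsigned char *page)
{
	memset(page, 0, SUBPAGE_SIZE);
}

/*
 * Hypothetical stand-in for an arch's cache-avoiding clear. Real code
 * would use non-temporal stores; memset() keeps the sketch portable.
 */
static void sketch_clear_page_nocache(unsigned char *page)
{
	memset(page, 0, SUBPAGE_SIZE);
}

/*
 * Clear a whole huge page. Only the subpage containing fault_off (a byte
 * offset into the huge page, not necessarily aligned) goes through the
 * cache. Returns the index of that subpage.
 */
static size_t sketch_clear_huge_page(unsigned char *huge, size_t fault_off)
{
	size_t nr = HUGE_SIZE / SUBPAGE_SIZE;
	size_t target = fault_off / SUBPAGE_SIZE;
	size_t i;

	assert(fault_off < HUGE_SIZE);

	for (i = 0; i < nr; i++) {
		unsigned char *p = huge + i * SUBPAGE_SIZE;

		if (i == target)
			sketch_clear_page(p);          /* keep this one hot */
		else
			sketch_clear_page_nocache(p);  /* bypass the cache */
	}
	return target;
}
```

Passing the real, unrounded fault offset is what lets the loop pick the
subpage to keep hot, which is presumably why the series first makes
clear_huge_page tolerate non-aligned addresses.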



This is a mix of x86-specific and generic changes... does anyone mind if 
I put this into the -tip tree?


-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH, RFC 0/6] Avoid cache trashing on clearing huge/gigantic page

2012-07-25 Thread Christoph Lameter
On Wed, 25 Jul 2012, Andi Kleen wrote:

> > Why exempt the 4K around the fault address? Is there a regression if that
> > is not exempted?
>
> You would get an immediate cache miss when the faulting instruction
> is reexecuted.

Nope. You would not get cache misses for all cachelines in the 4k range.
Only one.

> > I guess for anonymous huge pages one may assume that there will be at
> > least one write to one cache line in the 4k page. Is it useful to get all
> > the cachelines in the page in the cache?
>
> We did some measurements -- comparing 4K and 2MB with some tracing
> of fault patterns -- and a lot of apps don't use the full 2MB area.
> The apps with THP regressions usually used less than others.
> The patchkit significantly reduced some of the regressions.

Yup, they won't use the full 2MB area. But are they using all the cache
lines of the 4k page that we are making hot?
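
To put numbers on the question, here is a small sketch of the arithmetic.
It assumes a 64-byte cache line, which is typical for x86 but is not
stated anywhere in this thread; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64UL    /* assumption: typical x86 line size */
#define SUBPAGE_SIZE    4096UL  /* 4K base page */

/* Number of cache lines a cached clear of one 4K page makes hot. */
static size_t lines_per_subpage(void)
{
	return SUBPAGE_SIZE / CACHE_LINE_SIZE;
}

/* Index, within its 4K page, of the cache line that contains addr. */
static size_t line_in_subpage(unsigned long addr)
{
	return (addr % SUBPAGE_SIZE) / CACHE_LINE_SIZE;
}

/*
 * Lines warmed by the cached clear that the application has not touched,
 * given how many distinct lines of the page it actually used.
 */
static size_t untouched_lines(size_t touched)
{
	assert(touched <= lines_per_subpage());
	return lines_per_subpage() - touched;
}
```

Under that assumption the cached clear warms 64 lines, while the
re-executed faulting instruction is only guaranteed to need the one line
returned by line_in_subpage(); whether the other 63 pay off depends on
the application's access pattern.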




Re: [PATCH, RFC 0/6] Avoid cache trashing on clearing huge/gigantic page

2012-07-25 Thread Andi Kleen
On Wed, Jul 25, 2012 at 01:51:01PM -0500, Christoph Lameter wrote:
> On Fri, 20 Jul 2012, Kirill A. Shutemov wrote:
> 
> > From: "Kirill A. Shutemov" 
> >
> > Clearing a 2MB huge page will typically blow away several levels of CPU
> > caches.  To avoid this, clear only the 4K area around the fault address
> > through the cache and use cache-avoiding clears for the rest of the 2MB
> > area.
> 
> Why exempt the 4K around the fault address? Is there a regression if that
> is not exempted?

You would get an immediate cache miss when the faulting instruction
is reexecuted.

> 
> I guess for anonymous huge pages one may assume that there will be at
> least one write to one cache line in the 4k page. Is it useful to get all
> the cachelines in the page in the cache?

We did some measurements -- comparing 4K and 2MB with some tracing 
of fault patterns -- and a lot of apps don't use the full 2MB area. 
The apps with THP regressions usually used less than others.
The patchkit significantly reduced some of the regressions.

> 
> Also note that if we get later into hugepage use for the page cache we
> would want the cache to be cold because the contents have to come in from
> a storage medium.

Page cache is not cleared, so never runs this code.


-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [PATCH, RFC 0/6] Avoid cache trashing on clearing huge/gigantic page

2012-07-25 Thread Christoph Lameter
On Fri, 20 Jul 2012, Kirill A. Shutemov wrote:

> From: "Kirill A. Shutemov" 
>
> Clearing a 2MB huge page will typically blow away several levels of CPU
> caches.  To avoid this, clear only the 4K area around the fault address
> through the cache and use cache-avoiding clears for the rest of the 2MB
> area.

Why exempt the 4K around the fault address? Is there a regression if that
is not exempted?

I guess for anonymous huge pages one may assume that there will be at
least one write to one cache line in the 4k page. Is it useful to get all
the cachelines in the page in the cache?

Also note that if we get later into hugepage use for the page cache we
would want the cache to be cold because the contents have to come in from
a storage medium.




Re: [PATCH, RFC 0/6] Avoid cache trashing on clearing huge/gigantic page

2012-07-24 Thread Kirill A. Shutemov
On Mon, Jul 23, 2012 at 04:30:20PM -0700, Andrew Morton wrote:
> On Fri, 20 Jul 2012 15:50:16 +0300
> "Kirill A. Shutemov"  wrote:
> 
> > Clearing a 2MB huge page will typically blow away several levels of CPU
> > caches.  To avoid this, clear only the 4K area around the fault address
> > through the cache and use cache-avoiding clears for the rest of the 2MB
> > area.
> > 
> > It would be nice to test the patchset with more workloads. Especially if
> > you see performance regression with THP.
> > 
> > Any feedback is appreciated.
> 
> This all looks pretty sane to me.  Some detail-poking from the x86 guys
> would be nice.
> 
> What do other architectures need to do?  Simply implement
> clear_page_nocache()?

And define ARCH_HAS_USER_NOCACHE to 1.

> I believe that powerpc is one, not sure about
> others.  Please update the changelogs to let arch maintainers know
> what they should do and cc those people on future versions?

Okay.
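
A minimal sketch of what that could look like for an architecture
maintainer, assuming the generic code falls back to the ordinary cached
clear when ARCH_HAS_USER_NOCACHE is not set. The fallback layout and the
sketch_* names are assumptions made for illustration; only
clear_page_nocache() and ARCH_HAS_USER_NOCACHE come from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SUBPAGE_SIZE 4096UL  /* 4K base page */

/*
 * An architecture with a cache-avoiding clear would define this to 1 in
 * its own headers. Leaving it undefined selects the fallback below.
 */
#ifndef ARCH_HAS_USER_NOCACHE
#define ARCH_HAS_USER_NOCACHE 0
#endif

/* Hypothetical stand-in for the ordinary, cache-filling page clear. */
static void sketch_clear_page(void *page)
{
	assert(page != NULL);
	memset(page, 0, SUBPAGE_SIZE);
}

#if ARCH_HAS_USER_NOCACHE
/* Supplied by the architecture, e.g. with non-temporal stores. */
void clear_page_nocache(void *page);
#define sketch_clear_nocache(p) clear_page_nocache(p)
#else
/* No arch support: behave exactly like the cached clear. */
#define sketch_clear_nocache(p) sketch_clear_page(p)
#endif
```

With this shape an architecture that does nothing keeps today's
behaviour, and one that opts in only has to provide the single function.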

-- 
 Kirill A. Shutemov




Re: [PATCH, RFC 0/6] Avoid cache trashing on clearing huge/gigantic page

2012-07-23 Thread Andrew Morton
On Fri, 20 Jul 2012 15:50:16 +0300
"Kirill A. Shutemov"  wrote:

> Clearing a 2MB huge page will typically blow away several levels of CPU
> caches.  To avoid this, clear only the 4K area around the fault address
> through the cache and use cache-avoiding clears for the rest of the 2MB
> area.
> 
> It would be nice to test the patchset with more workloads. Especially if
> you see performance regression with THP.
> 
> Any feedback is appreciated.

This all looks pretty sane to me.  Some detail-poking from the x86 guys
would be nice.

What do other architectures need to do?  Simply implement
clear_page_nocache()?  I believe that powerpc is one, not sure about
others.  Please update the changelogs to let arch maintainers know
what they should do and cc those people on future versions?
