Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Rik van Riel
On Thu, 31 Jan 2008 12:32:24 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Thu, Jan 31, 2008 at 05:52:09AM -0500, Rik van Riel wrote:

> > Don't malloc() and free() hopelessly fragment memory
> > over time, ensuring that little related data can be
> > found inside each 1MB chunk if the process is large
> > enough?  (say, firefox)
> 
> Even if they do (I don't know if it's true or not) it does not really 
> matter because on modern hard disks/systems it does not cost less to 
> transfer 1MB versus 4K. The actual threshold seems to be rising in
> fact.

That is definitely true.

> The only drawback is that the swap might be full sooner, but 
> I would actually consider this a feature because it would likely
> end many prolonged oom death dances much sooner.

A second drawback would be that we evict more potentially
useful data every time we swap in a whole lot of extra
data around the little bit of data we need.

On the other hand, swapping should be the exception on
many of today's workloads.

Maybe we can measure how many of the swapped in pages end
up being used and how many are evicted again without being
used and automatically change our chunk size based on those
statistics?

I would expect most desktop systems to end up with large
chunks, because they rarely swap.
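
[Illustration only: a minimal C sketch of the feedback loop suggested
above, where the swap-in chunk grows when most read-ahead pages get used
and shrinks when most are evicted untouched. The struct, the counters,
the 75%/25% thresholds and the 256-page cap are assumptions made for
this sketch, not existing kernel names or tunables.]

```c
/* Hypothetical statistics gathered over some sampling interval. */
struct swapin_stats {
    unsigned long readahead_pages; /* extra pages swapped in around a fault */
    unsigned long used_pages;      /* of those, pages touched before eviction */
};

#define MIN_CHUNK_PAGES 1UL
#define MAX_CHUNK_PAGES 256UL      /* 1MB, assuming 4K pages */

/*
 * Return the next chunk size in pages: double it when at least 75% of
 * the read-ahead pages were used, halve it when at most 25% were used,
 * otherwise leave it alone.  With no samples (a system that rarely
 * swaps) the current size is kept.
 */
unsigned long adapt_chunk_pages(unsigned long chunk,
                                const struct swapin_stats *st)
{
    unsigned long hit_pct;

    if (st->readahead_pages == 0)
        return chunk;

    hit_pct = st->used_pages * 100 / st->readahead_pages;

    if (hit_pct >= 75) {
        chunk *= 2;
        if (chunk > MAX_CHUNK_PAGES)
            chunk = MAX_CHUNK_PAGES;
    } else if (hit_pct <= 25) {
        chunk /= 2;
        if (chunk < MIN_CHUNK_PAGES)
            chunk = MIN_CHUNK_PAGES;
    }
    return chunk;
}
```

[Starting from a large default and only shrinking on measured waste
would match the expectation that rarely-swapping desktops end up with
large chunks.]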
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Andi Kleen
On Thu, Jan 31, 2008 at 05:52:09AM -0500, Rik van Riel wrote:
> On Thu, 31 Jan 2008 12:06:10 +0100
> Andi Kleen <[EMAIL PROTECTED]> wrote:
> 
> > > Yeah, the 2.5 switch to physical scanning killed us there.
> > > 
> > > I still don't know why my
> > > allocate-swapspace-according-to-virtual-address change didn't
> > > help.  Much.  Marcelo played with that a bit too.
> > 
> > I've been thinking about just always doing swap on > page clusters. 
> > Any reason swapping couldn't be done on e.g. 1MB chunks? 
> 
> Don't malloc() and free() hopelessly fragment memory
> over time, ensuring that little related data can be
> found inside each 1MB chunk if the process is large
> enough?  (say, firefox)

Even if they do (I don't know if it's true or not) it does not really 
matter because on modern hard disks/systems it does not cost less to 
transfer 1MB versus 4K. The actual threshold seems to be rising in fact.

The only drawback is that the swap might be full sooner, but 
I would actually consider this a feature because it would likely
end many prolonged oom death dances much sooner.

-Andi 


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Rik van Riel
On Thu, 31 Jan 2008 12:06:10 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:

> > Yeah, the 2.5 switch to physical scanning killed us there.
> > 
> > I still don't know why my
> > allocate-swapspace-according-to-virtual-address change didn't
> > help.  Much.  Marcelo played with that a bit too.
> 
> I've been thinking about just always doing swap on > page clusters. 
> Any reason swapping couldn't be done on e.g. 1MB chunks? 

Don't malloc() and free() hopelessly fragment memory
over time, ensuring that little related data can be
found inside each 1MB chunk if the process is large
enough?  (say, firefox)


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Andi Kleen
> Yeah, the 2.5 switch to physical scanning killed us there.
> 
> I still don't know why my allocate-swapspace-according-to-virtual-address
> change didn't help.  Much.  Marcelo played with that a bit too.

I've been thinking about just always doing swap on > page clusters. 
Any reason swapping couldn't be done on e.g. 1MB chunks? 

-Andi


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Andrew Morton
On Thu, 31 Jan 2008 11:15:08 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote:

> Peter Zijlstra <[EMAIL PROTECTED]> writes:
> >
> > Ah, that is Lennart's Pulse Audio thing, he has samples in memory which
> > might not have been used for a while, and he wants to be able to
> > pre-fetch those when he suspects they might need to be played. So that
> > once the audio thread comes along and stuffs them down /dev/dsp it's all
> > nice in memory.
> 
> The real problem that seems to make swapping so slow is that the data
> tends to be badly fragmented on the swap partition. I suspect if that
> problem was attacked the need for such prefetching would be far less
> because swap in would be much faster.
> 

Yeah, the 2.5 switch to physical scanning killed us there.

I still don't know why my allocate-swapspace-according-to-virtual-address
change didn't help.  Much.  Marcelo played with that a bit too.



Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Andrew Morton
On Thu, 31 Jan 2008 11:10:13 +0100 Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> 
> On Thu, 2008-01-31 at 02:05 -0800, Andrew Morton wrote:
> > On Thu, 31 Jan 2008 10:53:26 +0100 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > 
> > > 
> > > On Thu, 2008-01-31 at 01:47 -0800, Andrew Morton wrote:
> > > > On Thu, 31 Jan 2008 10:35:18 +0100 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > 
> > > > > On Thu, 2008-01-31 at 01:12 -0800, Andrew Morton wrote:
> > > > > 
> > > > > > Implementation-wise: make_pages_present() _can_ be converted to do this.
> > > > > > But it's a lot of patching, and the result will be a cleaner, faster and
> > > > > > smaller core MM.  Whereas your approach is easy, but adds more code and
> > > > > > leaves the old stuff slow-and-dirty.
> > > > > > 
> > > > > > Guess which approach is preferred? ;)
> > > > > 
> > > > > Ok, I'll look at using make_pages_present().
> > > > 
> > > > Am still curious to know what inspired this change.  What are the use
> > > > cases?  Performance testing results, etc?
> > > 
> > > Ah, that is Lennart's Pulse Audio thing, he has samples in memory which
> > > might not have been used for a while, and he wants to be able to
> > > pre-fetch those when he suspects they might need to be played. So that
> > > once the audio thread comes along and stuffs them down /dev/dsp it's all
> > > nice in memory.
> > > 
> > > Since it's all soft real-time at best, he feels it's better to do a best
> > > effort at not hitting swap than it is to strain the system with mlock
> > > usage.
> > 
> > hrm.  Does he know about pthread_create()?
> 
> I'm very sure he does. So you're suggesting to just create a thread and
> touch that memory and be done with it?
> 
> Lennart?

That would get him out of trouble.  But it certainly makes _sense_ for the
kernel to implement MADV_WILLNEED for anon memory.  From a consistency POV.

But I don't know that the usefulness of the feature is worth actually
expending code on.  Heck, after five-odd years I'm still asking every
second person I meet "why don't you use fadvise()?"  (Response: h!)
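
[Illustration only: the userspace side of the fadvise() hint mentioned
above, as a minimal C sketch. posix_fadvise() and POSIX_FADV_WILLNEED
are the standard interface; the helper name prefetch_file() is made up
for this example.]

```c
#define _XOPEN_SOURCE 600
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Ask the kernel to start reading a whole file into the page cache
 * without blocking the caller on the I/O.
 * Returns 0 on success or an errno-style error number on failure.
 */
int prefetch_file(const char *path)
{
    int fd, err;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;

    /* offset 0, len 0 means "from the start to the end of the file";
     * posix_fadvise() returns the error number directly. */
    err = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);

    close(fd);
    return err;
}
```

[This is the file-backed analogue of what MADV_WILLNEED would provide
for anonymous memory: a non-blocking hint, not a guarantee.]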




Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Andi Kleen
Peter Zijlstra <[EMAIL PROTECTED]> writes:
>
> Ah, that is Lennart's Pulse Audio thing, he has samples in memory which
> might not have been used for a while, and he wants to be able to
> pre-fetch those when he suspects they might need to be played. So that
> once the audio thread comes along and stuffs them down /dev/dsp it's all
> nice in memory.

The real problem that seems to make swapping so slow is that the data
tends to be badly fragmented on the swap partition. I suspect if that
problem was attacked the need for such prefetching would be far less
because swap in would be much faster.

-Andi


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Peter Zijlstra

On Thu, 2008-01-31 at 02:05 -0800, Andrew Morton wrote:
> On Thu, 31 Jan 2008 10:53:26 +0100 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> 
> > 
> > On Thu, 2008-01-31 at 01:47 -0800, Andrew Morton wrote:
> > > On Thu, 31 Jan 2008 10:35:18 +0100 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > > 
> > > > 
> > > > On Thu, 2008-01-31 at 01:12 -0800, Andrew Morton wrote:
> > > > 
> > > > > Implementation-wise: make_pages_present() _can_ be converted to do this.
> > > > > But it's a lot of patching, and the result will be a cleaner, faster and
> > > > > smaller core MM.  Whereas your approach is easy, but adds more code and
> > > > > leaves the old stuff slow-and-dirty.
> > > > > 
> > > > > Guess which approach is preferred? ;)
> > > > 
> > > > Ok, I'll look at using make_pages_present().
> > > 
> > > Am still curious to know what inspired this change.  What are the use
> > > cases?  Performance testing results, etc?
> > 
> > Ah, that is Lennart's Pulse Audio thing, he has samples in memory which
> > might not have been used for a while, and he wants to be able to
> > pre-fetch those when he suspects they might need to be played. So that
> > once the audio thread comes along and stuffs them down /dev/dsp it's all
> > nice in memory.
> > 
> > Since it's all soft real-time at best, he feels it's better to do a best
> > effort at not hitting swap than it is to strain the system with mlock
> > usage.
> 
> hrm.  Does he know about pthread_create()?

I'm very sure he does. So you're suggesting to just create a thread and
touch that memory and be done with it?

Lennart?
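
[Illustration only: a minimal C sketch of the "create a thread and touch
that memory" approach asked about above. A helper thread reads one byte
per page so that any swap-in faults are taken there instead of in the
audio thread. The names prefault_job, prefault_worker and prefault_async
are hypothetical.]

```c
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Region to fault in; must stay valid until the helper thread exits. */
struct prefault_job {
    const volatile unsigned char *addr;
    size_t len;
};

/* Thread body: touch every page of the region, blocking on swap-in here. */
static void *prefault_worker(void *arg)
{
    struct prefault_job *job = arg;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char sink = 0;
    size_t off;

    for (off = 0; off < job->len; off += page)
        sink ^= job->addr[off];            /* volatile read forces the access */
    if (job->len)
        sink ^= job->addr[job->len - 1];   /* cover an unaligned tail page */

    (void)sink;
    return NULL;
}

/* Start the prefault in the background; returns 0 or a pthread error code. */
int prefault_async(struct prefault_job *job, pthread_t *tid)
{
    return pthread_create(tid, NULL, prefault_worker, job);
}
```

[Unlike an asynchronous MADV_WILLNEED, the helper thread waits for each
page in turn, but the calling thread is free to carry on. The caller
should pthread_join() or detach the thread.]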



Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Andrew Morton
On Thu, 31 Jan 2008 10:53:26 +0100 Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> 
> On Thu, 2008-01-31 at 01:47 -0800, Andrew Morton wrote:
> > On Thu, 31 Jan 2008 10:35:18 +0100 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > 
> > > 
> > > On Thu, 2008-01-31 at 01:12 -0800, Andrew Morton wrote:
> > > 
> > > > Implementation-wise: make_pages_present() _can_ be converted to do this.
> > > > But it's a lot of patching, and the result will be a cleaner, faster and
> > > > smaller core MM.  Whereas your approach is easy, but adds more code and
> > > > leaves the old stuff slow-and-dirty.
> > > > 
> > > > Guess which approach is preferred? ;)
> > > 
> > > Ok, I'll look at using make_pages_present().
> > 
> > Am still curious to know what inspired this change.  What are the use
> > cases?  Performance testing results, etc?
> 
> Ah, that is Lennart's Pulse Audio thing, he has samples in memory which
> might not have been used for a while, and he wants to be able to
> pre-fetch those when he suspects they might need to be played. So that
> once the audio thread comes along and stuffs them down /dev/dsp it's all
> nice in memory.
> 
> Since it's all soft real-time at best, he feels it's better to do a best
> effort at not hitting swap than it is to strain the system with mlock
> usage.

hrm.  Does he know about pthread_create()?


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Peter Zijlstra

On Thu, 2008-01-31 at 01:47 -0800, Andrew Morton wrote:
> On Thu, 31 Jan 2008 10:35:18 +0100 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> 
> > 
> > On Thu, 2008-01-31 at 01:12 -0800, Andrew Morton wrote:
> > 
> > > Implementation-wise: make_pages_present() _can_ be converted to do this. 
> > > But it's a lot of patching, and the result will be a cleaner, faster and
> > > smaller core MM.  Whereas your approach is easy, but adds more code and
> > > leaves the old stuff slow-and-dirty.
> > > 
> > > Guess which approach is preferred? ;)
> > 
> > Ok, I'll look at using make_pages_present().
> 
> Am still curious to know what inspired this change.  What are the use
> cases?  Performance testing results, etc?

Ah, that is Lennart's Pulse Audio thing, he has samples in memory which
might not have been used for a while, and he wants to be able to
pre-fetch those when he suspects they might need to be played. So that
once the audio thread comes along and stuffs them down /dev/dsp it's all
nice in memory.

Since it's all soft real-time at best, he feels it's better to do a best
effort at not hitting swap than it is to strain the system with mlock
usage.



Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Andrew Morton
On Thu, 31 Jan 2008 10:35:18 +0100 Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> 
> On Thu, 2008-01-31 at 01:12 -0800, Andrew Morton wrote:
> 
> > Implementation-wise: make_pages_present() _can_ be converted to do this. 
> > But it's a lot of patching, and the result will be a cleaner, faster and
> > smaller core MM.  Whereas your approach is easy, but adds more code and
> > leaves the old stuff slow-and-dirty.
> > 
> > Guess which approach is preferred? ;)
> 
> Ok, I'll look at using make_pages_present().

Am still curious to know what inspired this change.  What are the use
cases?  Performance testing results, etc?


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Peter Zijlstra

On Thu, 2008-01-31 at 01:12 -0800, Andrew Morton wrote:

> Implementation-wise: make_pages_present() _can_ be converted to do this. 
> But it's a lot of patching, and the result will be a cleaner, faster and
> smaller core MM.  Whereas your approach is easy, but adds more code and
> leaves the old stuff slow-and-dirty.
> 
> Guess which approach is preferred? ;)

Ok, I'll look at using make_pages_present().



Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Andrew Morton
On Thu, 31 Jan 2008 09:44:00 +0100 Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> 
> On Wed, 2008-01-30 at 14:40 -0800, Andrew Morton wrote:
> > On Wed, 30 Jan 2008 18:28:59 +0100
> > Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> > 
> > > Implement MADV_WILLNEED for anonymous pages by walking the page tables and
> > > starting asynchronous swap cache reads for all encountered swap pages.
> > 
> > Why cannot this use (a perhaps suitably-modified) make_pages_present()?
> 
> Because make_pages_present() relies on page faults to bring data in and
> will thus wait for all data to be present before returning.
> 
> This solution is async; it will just issue a read for the requested
> pages and move on.
> 

I of course realise that.  I also realise that swapin_readahead() is
_supposed_ to make the difference moot.

There's something you guys aren't telling us.  Several things, actually. 
Please don't do that.



Implementation-wise: make_pages_present() _can_ be converted to do this. 
But it's a lot of patching, and the result will be a cleaner, faster and
smaller core MM.  Whereas your approach is easy, but adds more code and
leaves the old stuff slow-and-dirty.

Guess which approach is preferred? ;)




Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Peter Zijlstra

On Wed, 2008-01-30 at 14:40 -0800, Andrew Morton wrote:
> On Wed, 30 Jan 2008 18:28:59 +0100
> Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> 
> > Implement MADV_WILLNEED for anonymous pages by walking the page tables and
> > starting asynchronous swap cache reads for all encountered swap pages.
> 
> Why cannot this use (a perhaps suitably-modified) make_pages_present()?

Because make_pages_present() relies on page faults to bring data in and
will thus wait for all data to be present before returning.

This solution is async; it will just issue a read for the requested
pages and move on.
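
[Illustration only: what the caller's side of this would look like from
userspace, as a minimal C sketch. mmap(), madvise() and MADV_WILLNEED are
the real interfaces; the helper names alloc_anon() and hint_willneed()
are made up here. Whether the hint does anything for anonymous memory
depends on the kernel, which is exactly what this patch is about, so the
return value is reported rather than assumed to be 0.]

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of private anonymous memory; returns NULL on failure. */
void *alloc_anon(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Tell the kernel that [addr, addr + len) will be needed soon.
 * addr must be page-aligned.  The call is only a hint: it is meant to
 * queue reads and return, not to wait until the pages are resident.
 * Returns 0 on success, -1 with errno set otherwise.
 */
int hint_willneed(void *addr, size_t len)
{
    return madvise(addr, len, MADV_WILLNEED);
}
```

[A caller in the position described in this thread would issue the hint
some time before the data is needed and then simply access the memory;
if the reads have not completed by then, the access faults as usual.]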


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Peter Zijlstra

On Wed, 2008-01-30 at 14:40 -0800, Andrew Morton wrote:
 On Wed, 30 Jan 2008 18:28:59 +0100
 Peter Zijlstra [EMAIL PROTECTED] wrote:
 
  Implement MADV_WILLNEED for anonymous pages by walking the page tables and
  starting asynchonous swap cache reads for all encountered swap pages.
 
 Why cannot this use (a perhaps suitably-modified) make_pages_present()?

Because make_pages_present() relies on page faults to bring data in and
will thus wait for all data to be present before returning.

This solution is async; it will just issue a read for the requested
pages and moves on.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Andrew Morton
On Thu, 31 Jan 2008 09:44:00 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote:

 
 On Wed, 2008-01-30 at 14:40 -0800, Andrew Morton wrote:
  On Wed, 30 Jan 2008 18:28:59 +0100
  Peter Zijlstra [EMAIL PROTECTED] wrote:
  
   Implement MADV_WILLNEED for anonymous pages by walking the page tables and
   starting asynchonous swap cache reads for all encountered swap pages.
  
  Why cannot this use (a perhaps suitably-modified) make_pages_present()?
 
 Because make_pages_present() relies on page faults to bring data in and
 will thus wait for all data to be present before returning.
 
 This solution is async; it will just issue a read for the requested
 pages and moves on.
 

I of course realise that.  I also realise that swapin_readahead() is
_supposed_ to make the difference moot.

There's something you guys aren't telling us.  Several things, actually. 
Please don't do that.



Implementation-wise: make_pages_present() _can_ be converted to do this. 
But it's a lot of patching, and the result will be a cleaner, faster and
smaller core MM.  Whereas your approach is easy, but adds more code and
leaves the old stuff slow-and-dirty.

Guess which approach is preferred? ;)


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Peter Zijlstra

On Thu, 2008-01-31 at 01:12 -0800, Andrew Morton wrote:

 Implementation-wise: make_pages_present() _can_ be converted to do this. 
 But it's a lot of patching, and the result will be a cleaner, faster and
 smaller core MM.  Whereas your approach is easy, but adds more code and
 leaves the old stuff slow-and-dirty.
 
 Guess which approach is preferred? ;)

Ok, I'll look at using make_pages_present().

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Andrew Morton
On Thu, 31 Jan 2008 10:35:18 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote:

 
 On Thu, 2008-01-31 at 01:12 -0800, Andrew Morton wrote:
 
  Implementation-wise: make_pages_present() _can_ be converted to do this. 
  But it's a lot of patching, and the result will be a cleaner, faster and
  smaller core MM.  Whereas your approach is easy, but adds more code and
  leaves the old stuff slow-and-dirty.
  
  Guess which approach is preferred? ;)
 
 Ok, I'll look at using make_pages_present().

Am still curious to know what inspired this change.  What are the use
cases?  Performance testing results, etc?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Peter Zijlstra

On Thu, 2008-01-31 at 01:47 -0800, Andrew Morton wrote:
 On Thu, 31 Jan 2008 10:35:18 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote:
 
  
  On Thu, 2008-01-31 at 01:12 -0800, Andrew Morton wrote:
  
   Implementation-wise: make_pages_present() _can_ be converted to do this. 
   But it's a lot of patching, and the result will be a cleaner, faster and
   smaller core MM.  Whereas your approach is easy, but adds more code and
   leaves the old stuff slow-and-dirty.
   
   Guess which approach is preferred? ;)
  
  Ok, I'll look at using make_pages_present().
 
 Am still curious to know what inspired this change.  What are the use
 cases?  Performance testing results, etc?

Ah, that is Lennarts Pulse Audio thing, he has samples in memory which
might not have been used for a while, and he wants to be able to
pre-fetch those when he suspects they might need to be played. So that
once the audio thread comes along and stuffs them down /dev/dsp its all
nice in memory.

Since its all soft real-time at best he feels its better to do a best
effort at not hitting swap than it is to strain the system with mlock
usage.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Andrew Morton
On Thu, 31 Jan 2008 10:53:26 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote:

 
 On Thu, 2008-01-31 at 01:47 -0800, Andrew Morton wrote:
  On Thu, 31 Jan 2008 10:35:18 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote:
  
   
   On Thu, 2008-01-31 at 01:12 -0800, Andrew Morton wrote:
   
Implementation-wise: make_pages_present() _can_ be converted to do 
this. 
But it's a lot of patching, and the result will be a cleaner, faster and
smaller core MM.  Whereas your approach is easy, but adds more code and
leaves the old stuff slow-and-dirty.

Guess which approach is preferred? ;)
   
   Ok, I'll look at using make_pages_present().
  
  Am still curious to know what inspired this change.  What are the use
  cases?  Performance testing results, etc?
 
 Ah, that is Lennarts Pulse Audio thing, he has samples in memory which
 might not have been used for a while, and he wants to be able to
 pre-fetch those when he suspects they might need to be played. So that
 once the audio thread comes along and stuffs them down /dev/dsp its all
 nice in memory.
 
 Since its all soft real-time at best he feels its better to do a best
 effort at not hitting swap than it is to strain the system with mlock
 usage.

hrm.  Does he know about pthread_create()?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Peter Zijlstra

On Thu, 2008-01-31 at 02:05 -0800, Andrew Morton wrote:
 On Thu, 31 Jan 2008 10:53:26 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote:
 
  
  On Thu, 2008-01-31 at 01:47 -0800, Andrew Morton wrote:
   On Thu, 31 Jan 2008 10:35:18 +0100 Peter Zijlstra [EMAIL PROTECTED] 
   wrote:
   

On Thu, 2008-01-31 at 01:12 -0800, Andrew Morton wrote:

 Implementation-wise: make_pages_present() _can_ be converted to do 
 this. 
 But it's a lot of patching, and the result will be a cleaner, faster 
 and
 smaller core MM.  Whereas your approach is easy, but adds more code 
 and
 leaves the old stuff slow-and-dirty.
 
 Guess which approach is preferred? ;)

Ok, I'll look at using make_pages_present().
   
   Am still curious to know what inspired this change.  What are the use
   cases?  Performance testing results, etc?
  
  Ah, that is Lennarts Pulse Audio thing, he has samples in memory which
  might not have been used for a while, and he wants to be able to
  pre-fetch those when he suspects they might need to be played. So that
  once the audio thread comes along and stuffs them down /dev/dsp its all
  nice in memory.
  
  Since its all soft real-time at best he feels its better to do a best
  effort at not hitting swap than it is to strain the system with mlock
  usage.
 
 hrm.  Does he know about pthread_create()?

I'm very sure he does. So you're suggesting to just create a thread and
touch that memory and be done with it?

Lennart?

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Andi Kleen
Peter Zijlstra [EMAIL PROTECTED] writes:

 Ah, that is Lennarts Pulse Audio thing, he has samples in memory which
 might not have been used for a while, and he wants to be able to
 pre-fetch those when he suspects they might need to be played. So that
 once the audio thread comes along and stuffs them down /dev/dsp its all
 nice in memory.

The real problem that seems to make swapping so slow is that the data
tends to be badly fragmented on the swap partition. I suspect if that
problem was attached the need for such prefetching would be far less
because swap in would be much faster.

-Andi
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Andrew Morton
On Thu, 31 Jan 2008 11:10:13 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote:

 
 On Thu, 2008-01-31 at 02:05 -0800, Andrew Morton wrote:
  On Thu, 31 Jan 2008 10:53:26 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote:
  
   
   On Thu, 2008-01-31 at 01:47 -0800, Andrew Morton wrote:
On Thu, 31 Jan 2008 10:35:18 +0100 Peter Zijlstra [EMAIL PROTECTED] 
wrote:

 
 On Thu, 2008-01-31 at 01:12 -0800, Andrew Morton wrote:
 
  Implementation-wise: make_pages_present() _can_ be converted to do 
  this. 
  But it's a lot of patching, and the result will be a cleaner, 
  faster and
  smaller core MM.  Whereas your approach is easy, but adds more code 
  and
  leaves the old stuff slow-and-dirty.
  
  Guess which approach is preferred? ;)
 
 Ok, I'll look at using make_pages_present().

Am still curious to know what inspired this change.  What are the use
cases?  Performance testing results, etc?
   
   Ah, that is Lennarts Pulse Audio thing, he has samples in memory which
   might not have been used for a while, and he wants to be able to
   pre-fetch those when he suspects they might need to be played. So that
   once the audio thread comes along and stuffs them down /dev/dsp its all
   nice in memory.
   
   Since its all soft real-time at best he feels its better to do a best
   effort at not hitting swap than it is to strain the system with mlock
   usage.
  
  hrm.  Does he know about pthread_create()?
 
 I'm very sure he does. So you're suggesting to just create a thread and
 touch that memory and be done with it?
 
 Lennart?

That would get him out of trouble.  But it certainly makes _sense_ for the
kernel to implement MADV_WILLNEED for anon memory.  From a consistency POV.

But I don't know that the usefulness of the feature is worth actually
expending code on.  Heck, after five-odd years I'm still asking every
second person I meet why don't you use fadvise()?  (Reponse: h!)


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Andrew Morton
On Thu, 31 Jan 2008 11:15:08 +0100 Andi Kleen [EMAIL PROTECTED] wrote:

 Peter Zijlstra [EMAIL PROTECTED] writes:
 
  Ah, that is Lennarts Pulse Audio thing, he has samples in memory which
  might not have been used for a while, and he wants to be able to
  pre-fetch those when he suspects they might need to be played. So that
  once the audio thread comes along and stuffs them down /dev/dsp its all
  nice in memory.
 
 The real problem that seems to make swapping so slow is that the data
 tends to be badly fragmented on the swap partition. I suspect if that
 problem was attached the need for such prefetching would be far less
 because swap in would be much faster.
 

Yeah, the 2.5 switch to physical scanning killed us there.

I still don't know why my allocate-swapspace-according-to-virtual-address
change didn't help.  Much.  Marcelo played with that a bit too.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Andi Kleen
> Yeah, the 2.5 switch to physical scanning killed us there.
>
> I still don't know why my allocate-swapspace-according-to-virtual-address
> change didn't help.  Much.  Marcelo played with that a bit too.

I've been thinking about just always doing swap on  page clusters. 
Any reason swapping couldn't be done on e.g. 1MB chunks? 

-Andi
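
For a sense of scale, here is a back-of-the-envelope model of the 1MB-chunk idea. Everything numeric in it is an assumption for illustration, not a measurement from this thread: 4 KiB pages, a fixed positioning cost per request, and a constant sequential transfer rate. The function names are hypothetical.

```c
/* Number of pages covered by one swap chunk. */
unsigned long pages_per_chunk(unsigned long chunk_bytes,
                              unsigned long page_bytes)
{
    return chunk_bytes / page_bytes;
}

/*
 * Very rough cost of one disk request in milliseconds:
 * a fixed positioning (seek + rotation) cost plus transfer time.
 * seek_ms and mb_per_s are assumed device parameters.
 */
double read_ms(double seek_ms, double mb_per_s, unsigned long bytes)
{
    return seek_ms + (double)bytes / (mb_per_s * 1e6) * 1e3;
}
```

With the assumed values of 8 ms positioning time and 60 MB/s, a 1 MiB chunk covers 256 pages and read_ms(8.0, 60.0, 1UL << 20) comes out far below 256 separate calls of read_ms(8.0, 60.0, 4096), because the positioning cost is paid once instead of 256 times. The model says nothing about whether the extra 255 pages are actually wanted.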


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-31 Thread Rik van Riel
On Thu, 31 Jan 2008 12:06:10 +0100
Andi Kleen <[EMAIL PROTECTED]> wrote:

> > Yeah, the 2.5 switch to physical scanning killed us there.
> >
> > I still don't know why my
> > allocate-swapspace-according-to-virtual-address change didn't
> > help.  Much.  Marcelo played with that a bit too.
>
> I've been thinking about just always doing swap on  page clusters.
> Any reason swapping couldn't be done on e.g. 1MB chunks?

Don't malloc() and free() hopelessly fragment memory
over time, ensuring that little related data can be
found inside each 1MB chunk if the process is large
enough?  (say, firefox)


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-30 Thread Andrew Morton
On Wed, 30 Jan 2008 18:28:59 +0100
Peter Zijlstra <[EMAIL PROTECTED]> wrote:

> Implement MADV_WILLNEED for anonymous pages by walking the page tables and
> starting asynchronous swap cache reads for all encountered swap pages.

Why cannot this use (a perhaps suitably-modified) make_pages_present()?


Re: [PATCH] mm: MADV_WILLNEED implementation for anonymous memory

2008-01-30 Thread Matt Mackall

On Wed, 2008-01-30 at 18:28 +0100, Peter Zijlstra wrote:
> Subject: mm: MADV_WILLNEED implementation for anonymous memory
> 
> Implement MADV_WILLNEED for anonymous pages by walking the page tables and
> starting asynchronous swap cache reads for all encountered swap pages.
> 
> Doing so required a modification to the page table walking library functions.
> Previously ->pte_entry() could be called while holding a kmap_atomic, to
> overcome this problem the pte walker is changed to copy batches of the pmd
> and iterate them.

That's a pretty reasonable approach. My original approach was to buffer
a page worth of PTEs with all the attendant malloc annoyances. Then
Andrew and I came up with another fix a bit ago by effectively doing a
batch of size 1: mapping and immediately unmapping per PTE. That's
basically a no-op on !HIGHPTE but could potentially be expensive in the
HIGHPTE case. Your approach might be a good complexity/performance
middle ground.

Unfortunately, I think we only implemented our fix in one of the
relevant places: the /proc/pid/pagemap code hooks a callback at the pte
table level and then does its own walk across the table. Perhaps I
should refactor this so that it hooks in at the pte entry level of the
walker instead.

> +/*
> + * Much of the complication here is to work around CONFIG_HIGHPTE which needs
> + * to kmap the pmd. So copy batches of ptes from the pmd and iterate over
> + * those.
> + */
> +#define WALK_BATCH_SIZE  32
> +
>  static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
> const struct mm_walk *walk, void *private)
>  {
>   pte_t *pte;
> + pte_t ptes[WALK_BATCH_SIZE];
> + unsigned long start;
> + unsigned int i;
>   int err = 0;
>  
> - pte = pte_offset_map(pmd, addr);
>   do {
> - err = walk->pte_entry(pte, addr, addr + PAGE_SIZE, private);
> - if (err)
> -break;
> - } while (pte++, addr += PAGE_SIZE, addr != end);
> + start = addr;
>  
> - pte_unmap(pte);
> + pte = pte_offset_map(pmd, addr);
> + for (i = 0; i < WALK_BATCH_SIZE && addr != end;
> + i++, pte++, addr += PAGE_SIZE)
> + ptes[i] = *pte;

Looks like this could be:

    for (i = 0; i < WALK_BATCH_SIZE && addr + i * PAGE_SIZE != end;
         i++)
        ptes[i] = pte[i];

> + pte_unmap(pte);
> +
> + for (i = 0, pte = ptes, addr = start;
> + i < WALK_BATCH_SIZE && addr != end;
> + i++, pte++, addr += PAGE_SIZE) {
> + err = walk->pte_entry(pte, addr, addr + PAGE_SIZE,
> + private);

    for (i = 0; i < WALK_BATCH_SIZE && addr != end;
         i++, addr += PAGE_SIZE) {
        err = walk->pte_entry(ptes[i], addr, addr + PAGE_SIZE,
                              private);

And we can ditch start.

Also, one wonders if setting batch size to 1 will then convince the
compiler to collapse this into a more trivial loop in the !HIGHPTE case.

-- 
Mathematics is the supreme nostalgia of our time.
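
For readers who want to experiment with the batching scheme reviewed above, here is a small userspace model of it with the "ditch start" simplification applied. It is a sketch under stated assumptions, not the kernel patch: pte_t is a plain unsigned long, the table pointer stands in for what pte_offset_map() would return, PAGE_SIZE is fixed at 4096, and count_nonzero is a made-up callback. The callback receives &ptes[i], since pte_entry() takes a pointer in the original hunk.

```c
#include <stddef.h>

#define PAGE_SIZE        4096UL
#define WALK_BATCH_SIZE  32

typedef unsigned long pte_t;    /* stand-in for the kernel's pte_t */

typedef int (*pte_entry_fn)(pte_t *pte, unsigned long addr,
                            unsigned long end, void *private);

/*
 * Walk [addr, end) in batches: copy up to WALK_BATCH_SIZE entries out of
 * the table, then run the callback on the private copies. In the kernel
 * the copy loop is the only part that would sit between pte_offset_map()
 * and pte_unmap(). Assumes end - addr is a multiple of PAGE_SIZE.
 */
int walk_pte_range_batched(const pte_t *table, unsigned long addr,
                           unsigned long end, pte_entry_fn pte_entry,
                           void *private)
{
    pte_t ptes[WALK_BATCH_SIZE];
    unsigned int i, n;
    int err = 0;

    while (addr != end) {
        /* Copy phase: addr is left alone, so no saved 'start' is needed. */
        for (n = 0; n < WALK_BATCH_SIZE && addr + n * PAGE_SIZE != end; n++)
            ptes[n] = table[n];
        table += n;

        /* Callback phase: runs on the copies and may sleep in the model. */
        for (i = 0; i < n; i++, addr += PAGE_SIZE) {
            err = pte_entry(&ptes[i], addr, addr + PAGE_SIZE, private);
            if (err)
                return err;
        }
    }
    return err;
}

/* Hypothetical callback: count entries that are not zero. */
int count_nonzero(pte_t *pte, unsigned long addr, unsigned long end,
                  void *private)
{
    (void)addr;
    (void)end;
    if (*pte != 0)
        ++*(unsigned long *)private;
    return 0;
}
```

Changing WALK_BATCH_SIZE to 1 in this model gives the map-and-unmap-per-PTE behaviour described earlier in the message, which makes it easy to compare the two variants side by side.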


