Re: VM Requirement Document - v0.0
Hi!

> Here's my first pass at a VM requirements document,
> for the embedded, desktop, and server cases. At the end is
> a summary of general rules that should take care of all of
> these cases.
>
> Bandwidth Descriptions:
>
> immediate: RAM, on-chip cache, etc.
> fast: Flash reads, ROMs, etc.

Flash reads are sometimes pretty slow. (Flash over IDE over PCMCIA...
2MB/sec bandwidth. Slower than most hard drives.)

> medium: Hard drives, CD-ROMs, 100Mb ethernet, etc.

CD-ROMs are way slower than hard drives (mostly due to seek times).

> slow: Flash writes, floppy disks, CD-WR burners
> packeted: Reads/writes should be in as large a packet as possible
>
> Embedded Case
> -------------
>
> Overview
>
> In the embedded case, the primary VM motivation is to
> use as _little_ caching of the filesystem for reads as
> possible because (a) reads are very fast and (b) we don't
> have any swap. However, we want to cache _writes_ as hard
> as possible, because Flash is slow, and prone to wear.
>
> Machine Description
> -------------------
> RAM: 4-64Mb (reads: immediate, writes: immediate)

MB, not Mb. 4Mb = 0.5MB.

> Flash: 4-128Mb (reads: fast, writes: slow, packeted)
> CDROM: 640-800Mb (reads: medium)
> Swap: 0Mb
>
> Motivations
>
> * Don't write to the (slow, packeted) devices until
>   you need to free up memory for processes.
> * Never cache reads from immediate/fast devices.

Flash connected over PCMCIA over IDE is *very* slow. You must cache it.

--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
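The bandwidth classes and the "general rules" above (with the PCMCIA correction from the follow-up) can be condensed into a toy policy table. This is only a sketch of the document's intent, not kernel code; the type names and the `slow_bus` flag are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Speed classes from the document's bandwidth descriptions. */
enum speed { IMMEDIATE, FAST, MEDIUM, SLOW };

struct policy {
    bool cache_reads;   /* keep read pages in the page cache?   */
    bool buffer_writes; /* coalesce writes into large packets?  */
};

/* Hypothetical policy chooser for the embedded case:
 * - never cache reads from immediate/fast devices,
 * - buffer writes aggressively for slow, packeted devices.
 * slow_bus covers the follow-up's point: flash behind PCMCIA/IDE
 * reads at ~2 MB/sec and must be cached after all. */
static struct policy embedded_policy(enum speed reads, enum speed writes,
                                     bool slow_bus)
{
    struct policy p;
    p.cache_reads = slow_bus || (reads != IMMEDIATE && reads != FAST);
    p.buffer_writes = (writes == SLOW);
    return p;
}
```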
Re: VM Requirement Document - v0.0
On Friday 06 July 2001 21:09, Rik van Riel wrote:
> On Thu, 5 Jul 2001, Daniel Phillips wrote:
> > Let me comment on this again, having spent a couple of minutes
> > more thinking about it. Would you be happy paying 1% of your
> > battery life to get 80% less sluggish response after a memory
> > pig exits?
>
> Just to pull a few random numbers out of my ass too,
> how about 50% of battery life for the same optimistic
> 80% less sluggishness ?
>
> How about if it were only 30% of battery life?

It's not as random as that. The idea being considered was: suppose a
program starts up, goes through a period of intense, cache-sucking
activity, then exits. Could we reload the applications it just displaced
so that the disk activity to reload them doesn't have to take place the
first time the user touches the keyboard/mouse? Sure, we obviously can;
with how much complexity is another question entirely ;-)

So probably we could eliminate more than 80% of the latency we now see in
such a situation; I was being conservative. Now what's the cost in
battery life? Suppose it's a 128 meg machine, 1/3 filled with program
text and data. Hopefully, the working sets that were evicted are largely
coherent, so we'll read them back in at a rate not too badly degraded
from the drive's transfer rate, say 5 MB/sec. This gives about three
seconds of intense reading to restore something resembling the previous
working set, then the disk can spin down and perhaps the machine will
suspend itself.

So the question is, how much longer did the machine have to run to do
this? Well, on my machine updatedb takes 5-10 minutes, so the 3 seconds
of activity tacked onto the end of the episode amounts to less than 1%,
and this is where the 1% figure came from.

I'm not saying this would be an easy hack, just that it's possible and
the numbers work.
--
Daniel
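Daniel's back-of-the-envelope can be checked mechanically. The 3 seconds of reload and the 5-10 minute updatedb episode are the figures from his post; the helper below is nothing more than the division behind the 1% claim.

```c
#include <assert.h>

/* Battery cost of preloading: extra seconds of disk activity divided
 * by the length of the whole episode it is tacked onto. */
static double battery_fraction(double reload_seconds, double episode_seconds)
{
    return reload_seconds / episode_seconds;
}
```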
Re: VM Requirement Document - v0.0
On Thu, 5 Jul 2001, Daniel Phillips wrote:
> Let me comment on this again, having spent a couple of minutes
> more thinking about it. Would you be happy paying 1% of your
> battery life to get 80% less sluggish response after a memory
> pig exits?

Just to pull a few random numbers out of my ass too, how about 50% of
battery life for the same optimistic 80% less sluggishness ?

How about if it were only 30% of battery life?

Rik
--
Executive summary of a recent Microsoft press release:
  "we are concerned about the GNU General Public License (GPL)"

http://www.surriel.com/
http://www.conectiva.com/
http://distro.conectiva.com/
Re: VM Requirement Document - v0.0
> Well, on a laptop memory and disk bandwidth are rarely wasted - they
> cost battery life.

I've been playing around with different scenarios to see the differences
in performance. A good way to trigger the cache problem is to untar a
couple of kernel source trees or other large amounts of files, until free
memory is down to less than 2MB. Then try to fire up a few apps that need
some memory. The hard drive thrashes around as the VM tries to free up
enough space, often using swap instead of flushing out the cache. These
source trees can then be deleted, which frees up the memory the cache was
using, and performance returns to where it should be.

However, if I just fire up enough apps to use up all the memory and then
go into swap, response is still acceptable. If an app requires loading
from swap there is just a short lag while the VM does its thing, and then
life is good. I don't expect to be able to run more apps than I have
memory for without a performance hit, but I do expect to be able to run
with over 128MB of "real" free memory and not suffer from performance
degradation (which doesn't happen at present).

Mike
Re: VM Requirement Document - v0.0
Daniel Phillips <[EMAIL PROTECTED]> writes:
> Also, notice that the scenario we were originally discussing, the
> off-hours updatedb, doesn't normally happen on laptops because they
> tend to be suspended at that time.

No, even worse, it happens when you open the laptop for the first time in
the morning, thanks to anacron.

--
Alan Shutko <[EMAIL PROTECTED]> - In a variety of flavors!
For children with short attention spans: boomerangs that don't come back.
Re: VM Requirement Document - v0.0
On Thursday 05 July 2001 17:00, Xavier Bestel wrote:
> On 05 Jul 2001 17:04:00 +0200, Daniel Phillips wrote:
> > Also, notice that the scenario we were originally discussing, the
> > off-hours updatedb, doesn't normally happen on laptops because they
> > tend to be suspended at that time.
>
> Suspended != halted. The updatedb stuff starts over when I bring it
> back to life (RH6.2, dunno for other distribs)

Yes, but then it's normally overlapped with other work you are doing,
like trying to read your mail. Different problem, one we also perform
poorly at, but for different reasons.

--
Daniel
Re: VM Requirement Document - v0.0
On 05 Jul 2001 17:04:00 +0200, Daniel Phillips wrote:
> > Well, on a laptop memory and disk bandwidth are rarely wasted - they
> > cost battery life.
>
> Let me comment on this again, having spent a couple of minutes more
> thinking about it. Would you be happy paying 1% of your battery life to
> get 80% less sluggish response after a memory pig exits?

Told like this, of course I agree!

> Also, notice that the scenario we were originally discussing, the
> off-hours updatedb, doesn't normally happen on laptops because they
> tend to be suspended at that time.

Suspended != halted. The updatedb stuff starts over when I bring it back
to life (RH6.2, dunno for other distribs)

Xav
Re: VM Requirement Document - v0.0
On Thursday 05 July 2001 16:00, Xavier Bestel wrote:
> On 05 Jul 2001 15:02:51 +0200, Daniel Phillips wrote:
> > Here's an idea I just came up with while I was composing this... along
> > the lines of using unused bandwidth for something that at least has a
> > chance of being useful. Suppose we come to the end of a period of
> > activity, the general 'temperature' starts to drop and disks fall
> > idle. At this point we could consult a history of which currently
> > running processes have been historically active and grow their working
> > sets by reading in from disk. Otherwise, the memory and the disk
> > bandwidth is just wasted, right? This we can do inside the kernel and
> > not require coders to mess up their apps with hints. Of course, they
> > should still take the time to reengineer them to reduce the cache
> > footprint.
>
> Well, on a laptop memory and disk bandwidth are rarely wasted - they
> cost battery life.

Let me comment on this again, having spent a couple of minutes more
thinking about it. Would you be happy paying 1% of your battery life to
get 80% less sluggish response after a memory pig exits?

Also, notice that the scenario we were originally discussing, the
off-hours updatedb, doesn't normally happen on laptops because they tend
to be suspended at that time.

--
Daniel
Re: VM Requirement Document - v0.0
On Thursday 05 July 2001 16:00, Xavier Bestel wrote:
> On 05 Jul 2001 15:02:51 +0200, Daniel Phillips wrote:
> > Here's an idea I just came up with while I was composing this... along
> > the lines of using unused bandwidth for something that at least has a
> > chance of being useful. Suppose we come to the end of a period of
> > activity, the general 'temperature' starts to drop and disks fall
> > idle. At this point we could consult a history of which currently
> > running processes have been historically active and grow their working
> > sets by reading in from disk. Otherwise, the memory and the disk
> > bandwidth is just wasted, right? This we can do inside the kernel and
> > not require coders to mess up their apps with hints. Of course, they
> > should still take the time to reengineer them to reduce the cache
> > footprint.
>
> Well, on a laptop memory and disk bandwidth are rarely wasted - they
> cost battery life.

Then turn the feature off.

--
Daniel
Re: VM Requirement Document - v0.0
On 05 Jul 2001 15:02:51 +0200, Daniel Phillips wrote:
> Here's an idea I just came up with while I was composing this... along
> the lines of using unused bandwidth for something that at least has a
> chance of being useful. Suppose we come to the end of a period of
> activity, the general 'temperature' starts to drop and disks fall idle.
> At this point we could consult a history of which currently running
> processes have been historically active and grow their working sets by
> reading in from disk. Otherwise, the memory and the disk bandwidth is
> just wasted, right? This we can do inside the kernel and not require
> coders to mess up their apps with hints. Of course, they should still
> take the time to reengineer them to reduce the cache footprint.

Well, on a laptop memory and disk bandwidth are rarely wasted - they cost
battery life.

Xav
Re: VM Requirement Document - v0.0
On Thursday 05 July 2001 03:49, you wrote:
> > Getting the user's "interactive" programs loaded back
> > in afterwards is a separate, much more difficult problem
> > IMHO, but no doubt still has a reasonable solution.
>
> Possibly stupid suggestion... Maybe the interactive/GUI programs should
> wake up once in a while and touch a couple of their pages? Go too far
> with this and you'll just get in the way of performance, but I don't
> think it would hurt to have processes waking up every couple of minutes
> and touching glibc, libqt, libgtk, etc so they stay hot in memory... A
> very slow incremental "caress" of the address space could eliminate the
> "I-just-logged-in-this-morning-and-dammit-everything-has-been-paged-out"
> problem.

Personally, I'm in idea collection mode for that one. First things
first: from my point of view, our basic replacement policy seems to be
broken. The algorithms seem to be burning too much cpu and not doing
enough useful work. Worse, they seem to have a nasty tendency to
livelock themselves, i.e., get into situations where the mm is doing
little other than scanning and transferring pages from list to list.

IMHO, if these things were fixed, much of the 'interactive problem' would
go away because reloading the working set for the mouse, for example,
would just take a few milliseconds. If not, then we should take a good
hard look at why the desktops have such poor working set granularity.
Furthermore, approaches that rely on applications touching what they
believe to be their own working sets aren't going to work very well if
the mm incorrectly processes the page reference information, or
incorrectly balances it against other things that might be going on, so
let's be sure the basics are working properly. Marcelo has the right
idea with his attention to better memory management statistical
monitoring. How nice it would be if he got together with the guy working
on the tracing module...

That said, yes, it's good to think about hinting ideas, and maybe bless
the idea of applications 'touching themselves' (yes, the allusion was
intentional). Here's an idea I just came up with while I was composing
this... along the lines of using unused bandwidth for something that at
least has a chance of being useful. Suppose we come to the end of a
period of activity: the general 'temperature' starts to drop and disks
fall idle. At this point we could consult a history of which currently
running processes have been historically active and grow their working
sets by reading in from disk. Otherwise, the memory and the disk
bandwidth is just wasted, right? This we can do inside the kernel and
not require coders to mess up their apps with hints. Of course, they
should still take the time to reengineer them to reduce the cache
footprint.

/me decides to stop spouting and write some code

--
Daniel
Re: VM Requirement Document - v0.0
> Getting the user's "interactive" programs loaded back
> in afterwards is a separate, much more difficult problem
> IMHO, but no doubt still has a reasonable solution.

Possibly stupid suggestion... Maybe the interactive/GUI programs should
wake up once in a while and touch a couple of their pages? Go too far
with this and you'll just get in the way of performance, but I don't
think it would hurt to have processes waking up every couple of minutes
and touching glibc, libqt, libgtk, etc so they stay hot in memory... A
very slow incremental "caress" of the address space could eliminate the
"I-just-logged-in-this-morning-and-dammit-everything-has-been-paged-out"
problem.

Regards,

Dan
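Dan's "caress" can be prototyped without any kernel help: map the regions you care about and touch one byte per page from time to time. Below is a sketch of just the touching step; the caller, the sleep interval, and the choice of mappings are assumed and left out.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Touch one byte in every page of a region so the VM sees a recent
 * reference and keeps the pages warm.  volatile stops the compiler
 * from optimising the otherwise-dead reads away.  Returns the number
 * of pages touched. */
static size_t caress(const volatile char *base, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t touched = 0;
    for (size_t off = 0; off < len; off += page) {
        (void)base[off];
        touched++;
    }
    return touched;   /* caller sleeps a few minutes, then repeats */
}
```

As the thread notes, this only helps if the VM weighs those references sensibly; it is a workaround for policy, not a fix.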
Re: VM Requirement Document - v0.0
> Remember that the first message was about a laptop. At 4:00AM there's
> no activity but the updatedb one (and the other cron jobs). Simply,
> there's no 'accessed-often' data. Moreover, I'd bet that 90% of the
> metadata touched by updatedb won't be accessed at all in the future.
> Laptop users don't do find /usr/share/terminfo/ so often.

Maybe, but I would think that most laptops get switched off at night.
Then when turned on again in the morning, anacron realizes it missed the
nightly cron jobs and runs everything.

This really does make an incredible difference to the system. If I
remove the updatedb job from cron.daily, the machine won't touch swap all
day and runs like a charm. (That's with vmware, mozilla, openoffice, all
applications that like big chunks of memory.)

Mike
Re: VM Requirement Document - v0.0
On Wednesday 04 July 2001 11:41, Marco Colombo wrote:
> On Tue, 3 Jul 2001, Daniel Phillips wrote:
> > On Tuesday 03 July 2001 12:33, Marco Colombo wrote:
> > > Oh, yes, since that PAGE_AGE_BG_INTERACTIVE_MINIMUM is applied only
> > > when background aging, maybe it's not enough to keep processes like
> > > updatedb from causing interactive pages to be evicted.
> > > That's why I said we should have another way to detect that kind of
> > > activity... well, the application could just let us know (no need to
> > > embed an autotuning-genetic-page-replacement-optimizer into the
> > > kernel). We should just drop all FS metadata accessed by updatedb,
> > > since we know that's one-shot only, without raising pressure at all.
> >
> > Note that some of updatedb's metadata pages are of the accessed-often
> > kind, e.g., directory blocks and inodes. A blanket low priority on
> > all the pages updatedb touches just won't do.
>
> Remember that the first message was about a laptop. At 4:00AM there's
> no activity but the updatedb one (and the other cron jobs). Simply,
> there's no 'accessed-often' data. Moreover, I'd bet that 90% of the
> metadata touched by updatedb won't be accessed at all in the future.
> Laptop users don't do find /usr/share/terminfo/ so often.

The problem is when you have a directory block, say, that has to stay
around quite a few seconds before dropping into disuse. You sure don't
want that block treated as 'accessed-once'. The goal here is to get
through the updatedb as quickly as possible.

Getting the user's "interactive" programs loaded back in afterwards is a
separate, much more difficult problem IMHO, but no doubt still has a
reasonable solution. I'm not that worried about it; my feeling is: if we
fix up the MM so it doesn't bog down with a lot of pages in cache and, in
addition, do better readahead, interactive performance will be just fine.

> > > Just like
> > > (not that I'm proposing it) putting those "one-shot" pages directly
> > > on the inactive-clean list instead of the active list. How an
> > > application could declare such a behaviour is an open question, of
> > > course. Maybe it's even possible to detect it. And BTW that's really
> > > fine tuning. Evicting an 8 hours old page may be a mistake sometime,
> > > but it's never a *big* mistake.
> >
> > IMHO, updatedb *should* evict all the "interactive" pages that aren't
> > actually doing anything[1]. That way it should run faster, provided
> > of course its accessed-once pages are properly given low priority.
>
> So in the morning you find your Gnome session completely on swap,
> and at the same time a lot of free mem.

> > I see three page priority levels:
> >
> > 0 - accessed-never/aged to zero
> > 1 - accessed-once/just loaded
> > 2 - accessed-often
> >
> > with these transitions:
> >
> > 0 -> 1, if a page is accessed
> > 1 -> 2, if a page is accessed a second time
> > 1 -> 0, if a page gets old
> > 2 -> 0, if a page gets old
> >
> > The 0 and 1 level pages are on a fifo queue, the 2 level pages are
> > scanned clock-wise, relying on the age computation[2]. Eviction
> > candidates are taken from the cold end of the 0 level list, unless it
> > is empty, in which case they are taken from the 1 level list. In
> > desperation, eviction candidates are taken from the 2 level list,
> > i.e., random eviction policy, as opposed to what we do now which is
> > to initiate an emergency scan of the active list for new inactive
> > candidates - rather like calling a quick board meeting when the
> > building is on fire.
>
> Well, it's just aging faster when it's needed. Random evicting is not
> good.

It's better than getting bogged down in scanning latency just at the
point you should be starting new writeouts. Obviously, it's a tradeoff.

> List 2 is ordered by age, and there are always better candidates
> at the end of the list than at the front. The higher the pressure,
> the shorter is the time a page has to rest idle to get at the end of
> the list. But the list *is* ordered.

No, list 2 is randomly ordered. Pages move from the initial trial list
to the active list with 0 temperature, and drop in just behind the
one-hand scan pointer (which we actually implement as the head of the
list). After that they get "aged" up or down as we do now. (New
improved terminology: heated or cooled according to the referenced bit.)

> > Note that the above is only a very slight departure from the current
> > design. And by the way, this is just brainstorming, it hasn't reached
> > the 'proposal' stage yet.
> >
> > [1] It would be nice to have a mechanism whereby the evicted
> > 'interactive' pages are automatically reloaded when updatedb has
> > finished its work. This is a case of scavenging unused disk bandwidth
> > for something useful, i.e., improving the interactive experience.
>
> updatedb doesn't really need all the memory it takes. All it needs is
> a small buffer to sequentially scan all the disk. So we should just
> drop all the pages it references, since we already know they won't be
> referenced again by anyone else.
Re: VM Requirement Document - v0.0
On Wednesday 04 July 2001 10:32, Marco Colombo wrote:
> On Tue, 3 Jul 2001, Daniel Phillips wrote:
> > On Monday 02 July 2001 20:42, Rik van Riel wrote:
> > > On Thu, 28 Jun 2001, Marco Colombo wrote:
> > > > I'm not sure that, in general, recent pages with only one access
> > > > are still better eviction candidates compared to 8 hours old
> > > > pages. Here we need either another way to detect one-shot
> > > > activity (like the one performed by updatedb),
> > >
> > > Fully agreed, but there is one problem with this idea.
> > > Suppose you have a maximum of 20% of your RAM for these
> > > "one-shot" things, now how are you going to be able to
> > > page in an application with a working set of, say, 25%
> > > the size of RAM ?
> >
> > Easy. What's the definition of working set? Those pages that are
> > frequently referenced. So as the application starts up some of its
> > pages will get promoted from used-once to used-often. (On the other
> > hand, the target behavior here conflicts with the goal of grouping
> > several temporally-related accesses to the same page together as one
> > access, so there's a subtle distinction to be made here, see below.)
> [...]
>
> In Rik's example, the ws is larger than available memory. Part of it
> (the "hottest" one) will get double-accesses, but other pages will keep
> contending the few available (physical) pages with no chance of being
> accessed twice. But see my previous posting...

But that's exactly what we want. Note that the idea of reserving a fixed
amount of memory for "one-shot" pages wasn't mine. I see no reason to
set a limit. There's only one criterion: does a page get referenced
between the time it's created and when its probation period expires?
Once a page makes it into the active (level 2) set it's on an equal
footing with lots of others, and it's up to our intrepid one-hand clock
to warm it up or cool it down as appropriate.

On the other hand, if the page gets sent to death row it still has a few
chances to prove its worth before being cleaned up and sent to the
aba^H^H^H^H^H^H^H^H reclaimed. (Apologies for the multiplying
metaphors ;-)

--
Daniel
Re: VM Requirement Document - v0.0
On Tue, 3 Jul 2001, Daniel Phillips wrote:
> On Tuesday 03 July 2001 12:33, Marco Colombo wrote:
> > Oh, yes, since that PAGE_AGE_BG_INTERACTIVE_MINIMUM is applied only
> > when background aging, maybe it's not enough to keep processes like
> > updatedb from causing interactive pages to be evicted.
> > That's why I said we should have another way to detect that kind of
> > activity... well, the application could just let us know (no need to
> > embed an autotuning-genetic-page-replacement-optimizer into the
> > kernel). We should just drop all FS metadata accessed by updatedb,
> > since we know that's one-shot only, without raising pressure at all.
>
> Note that some of updatedb's metadata pages are of the accessed-often
> kind, e.g., directory blocks and inodes. A blanket low priority on all
> the pages updatedb touches just won't do.

Remember that the first message was about a laptop. At 4:00AM there's no
activity but the updatedb one (and the other cron jobs). Simply, there's
no 'accessed-often' data. Moreover, I'd bet that 90% of the metadata
touched by updatedb won't be accessed at all in the future. Laptop users
don't do find /usr/share/terminfo/ so often.

> Just like
> (not that I'm proposing it) putting those "one-shot" pages directly on
> the inactive-clean list instead of the active list. How an application
> could declare such a behaviour is an open question, of course. Maybe
> it's even possible to detect it. And BTW that's really fine tuning.
> Evicting an 8 hours old page may be a mistake sometime, but it's never
> a *big* mistake.

> IMHO, updatedb *should* evict all the "interactive" pages that aren't
> actually doing anything[1]. That way it should run faster, provided of
> course its accessed-once pages are properly given low priority.

So in the morning you find your Gnome session completely on swap, and at
the same time a lot of free mem.
> I see three page priority levels:
>
> 0 - accessed-never/aged to zero
> 1 - accessed-once/just loaded
> 2 - accessed-often
>
> with these transitions:
>
> 0 -> 1, if a page is accessed
> 1 -> 2, if a page is accessed a second time
> 1 -> 0, if a page gets old
> 2 -> 0, if a page gets old
>
> The 0 and 1 level pages are on a fifo queue, the 2 level pages are
> scanned clock-wise, relying on the age computation[2]. Eviction
> candidates are taken from the cold end of the 0 level list, unless it
> is empty, in which case they are taken from the 1 level list. In
> desperation, eviction candidates are taken from the 2 level list, i.e.,
> random eviction policy, as opposed to what we do now which is to
> initiate an emergency scan of the active list for new inactive
> candidates - rather like calling a quick board meeting when the
> building is on fire.

Well, it's just aging faster when it's needed. Random evicting is not
good. List 2 is ordered by age, and there are always better candidates
at the end of the list than at the front. The higher the pressure, the
shorter is the time a page has to rest idle to get at the end of the
list. But the list *is* ordered.

> Note that the above is only a very slight departure from the current
> design. And by the way, this is just brainstorming, it hasn't reached
> the 'proposal' stage yet.
>
> [1] It would be nice to have a mechanism whereby the evicted
> 'interactive' pages are automatically reloaded when updatedb has
> finished its work. This is a case of scavenging unused disk bandwidth
> for something useful, i.e., improving the interactive experience.

updatedb doesn't really need all the memory it takes. All it needs is a
small buffer to sequentially scan all the disk. So we should just drop
all the pages it references, since we already know they won't be
referenced again by anyone else.

> [2] I much prefer the hot/cold terminology over old/young. The latter
> gets confusing because a 'high' age is 'young'. I'd rather think of a
> high value as being 'hot'.

True. s/page->age/page->temp/g B-)

.TM.
--
Marco Colombo
Technical Manager
ESI s.r.l.
[EMAIL PROTECTED]
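The three levels and four transitions quoted above are concrete enough to write down as a toy state machine. This models only the level transitions from the proposal, not the FIFO queues, the one-hand clock, or the age computation, and the names are invented.

```c
#include <assert.h>

enum level {
    COLD      = 0,  /* accessed-never / aged to zero */
    PROBATION = 1,  /* accessed-once / just loaded   */
    ACTIVE    = 2   /* accessed-often                */
};

struct vpage { enum level level; };

/* 0 -> 1 on first access, 1 -> 2 on the second. */
static void touch_page(struct vpage *p)
{
    if (p->level == COLD)
        p->level = PROBATION;
    else if (p->level == PROBATION)
        p->level = ACTIVE;
    /* ACTIVE pages stay ACTIVE; only aging cools them. */
}

/* 1 -> 0 and 2 -> 0 when a page gets old. */
static void age_page(struct vpage *p)
{
    p->level = COLD;
}
```

Eviction order then follows from the levels alone: take COLD pages first, PROBATION pages next, and raid ACTIVE only in desperation, which is the point Daniel and Marco are arguing about above.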
Re: VM Requirement Document - v0.0
On Tue, 3 Jul 2001, Daniel Phillips wrote:
> On Monday 02 July 2001 20:42, Rik van Riel wrote:
> > On Thu, 28 Jun 2001, Marco Colombo wrote:
> > > I'm not sure that, in general, recent pages with only one access
> > > are still better eviction candidates compared to 8 hours old pages.
> > > Here we need either another way to detect one-shot activity (like
> > > the one performed by updatedb),
> >
> > Fully agreed, but there is one problem with this idea.
> > Suppose you have a maximum of 20% of your RAM for these
> > "one-shot" things, now how are you going to be able to
> > page in an application with a working set of, say, 25%
> > the size of RAM ?
>
> Easy. What's the definition of working set? Those pages that are
> frequently referenced. So as the application starts up some of its
> pages will get promoted from used-once to used-often. (On the other
> hand, the target behavior here conflicts with the goal of grouping
> several temporally-related accesses to the same page together as one
> access, so there's a subtle distinction to be made here, see below.)
[...]

In Rik's example, the ws is larger than available memory. Part of it
(the "hottest" one) will get double-accesses, but other pages will keep
contending the few available (physical) pages with no chance of being
accessed twice. But see my previous posting...

.TM.
--
Marco Colombo
Technical Manager
ESI s.r.l.
[EMAIL PROTECTED]
Re: VM Requirement Document - v0.0
On Tue, 3 Jul 2001, Daniel Phillips wrote:
> And by the way, this is just brainstorming, it hasn't reached the
> 'proposal' stage yet.

So while we're here, an idea someone proposed in #debian while discussing
this thread ([EMAIL PROTECTED], you know who you are): QoS for
application paging on desktops. Basically you designate to the kernel
which applications you want to give privileges, and it avoids swapping
them out, even if they've been idle for a long time. You designate your
desktop apps, and then when updatedb comes along they don't get kicked
(but something more intensive like a kernel compile would claim the
pages). Maybe it would be as simple as a category of apps whose pages
won't get kicked before a singly-touched page (like an updatedb or
streaming media run).

For the record, I'm impressed with the new VM design, and I think its
unbiased behaviour (once the bugs are ironed out) will be exactly what
I'm looking for in life (traditional Unix "the fair way") :) Currently
using a 4-way RS/6000 running AIX 4.2 which has been up for a long time
and is running a lot of programs (even though the active set is quite
reasonable), and decides to swap at evil times :)

Looking forward to the tweaks/settings options that will appear on this
VM over the next little while...

Cheers,

Ari Heitner
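A crude form of Ari's QoS already exists in POSIX: a process can pin its own pages with mlockall(). Unlike the priority scheme he proposes, this removes the pages from the VM's reach entirely (a kernel compile could not reclaim them either), and it needs privilege or a generous RLIMIT_MEMLOCK, so it is a blunt sketch of the idea rather than the mechanism discussed.

```c
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>

/* Pin the process's current and future pages in RAM so updatedb-style
 * cache pressure can never page them out.  Needs CAP_IPC_LOCK or a
 * large enough RLIMIT_MEMLOCK, so failure is common and must be
 * handled.  Returns 0 on success, the errno value otherwise. */
static int pin_this_process(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == 0)
        return 0;
    return errno;   /* typically ENOMEM or EPERM without privilege */
}
```

A desktop session manager could call this for the few apps the user cares about, at the cost of that memory being unavailable for everything else.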
Re: VM Requirement Document - v0.0
On Monday 02 July 2001 20:42, Rik van Riel wrote: > On Thu, 28 Jun 2001, Marco Colombo wrote: > > I'm not sure that, in general, recent pages with only one access are > > still better eviction candidates compared to 8 hours old pages. Here > > we need either another way to detect one-shot activity (like the one > > performed by updatedb), > > Fully agreed, but there is one problem with this idea. > Suppose you have a maximum of 20% of your RAM for these > "one-shot" things, now how are you going to be able to > page in an application with a working set of, say, 25% > the size of RAM ? Easy. What's the definition of working set? Those pages that are frequently referenced. So as the application starts up some of its pages will get promoted from used-once to used-often. (On the other hand, the target behavior here conflicts with the goal of grouping several temporally-related accesses to the same page together as one access, so there's a subtle distinction to be made here, see below.) The point here is that there are such things as run-once program pages, just as there are use-once file pages. Both should get low priority and be evicted early, regardless of the fact they were just loaded. > If you don't have any special measures, the pages from > this "new" application will always be treated as one-shot > pages and the process will never be able to be cached in > memory completely... The self-balancing way of doing this is to promote pages from the old end of the used-once list to the used-often (active) list at a rate corresponding to the fault-in rate, so we get more aggressive promotion of referenced-often pages during program loading, and conversely, aggressive demotion of referenced-once pages. -- Daniel
Re: VM Requirement Document - v0.0
An amendment to my previous post... > I see three page priority levels: > > 0 - accessed-never/aged to zero > 1 - accessed-once/just loaded > 2 - accessed-often > > with these transitions: > > 0 -> 1, if a page is accessed > 1 -> 2, if a page is accessed a second time > 1 -> 0, if a page gets old > 2 -> 0, if a page gets old Better: 1 -> 0, if a page was not referenced before arriving at the old end 1 -> 2, if it was Meaning that multiple accesses to pages on the level 1 list are treated as a single access. In addition, this reflects what we can do practically with the hardware referenced bit. -- Daniel
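The amended transitions can be written down as a tiny state machine. This is a toy model of the three levels (the enum names are made up for illustration; a touch while on the level-1 list only sets the referenced bit, which is inspected when the page reaches the old end):

```c
#include <assert.h>

/* Toy model of the amended scheme: a level-1 page reaching the old
 * end of its list is promoted to level 2 if its referenced bit was
 * set, and demoted to level 0 otherwise.  Multiple touches while on
 * the level-1 list thus count as a single access. */
enum level { AGED_ZERO = 0, USED_ONCE = 1, USED_OFTEN = 2 };

/* Decision taken when a page arrives at the old end of its list. */
int age_at_old_end(int level, int referenced)
{
    if (level == USED_ONCE)
        return referenced ? USED_OFTEN : AGED_ZERO;
    return AGED_ZERO;        /* level-2 pages that grow old drop to 0 */
}

/* A touch only ever lifts a level-0 page onto the level-1 list;
 * everything else is recorded via the hardware referenced bit. */
int on_access(int level)
{
    return level == AGED_ZERO ? USED_ONCE : level;
}
```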
Re: VM Requirement Document - v0.0
On Tuesday 03 July 2001 12:33, Marco Colombo wrote: > Oh, yes, since that PAGE_AGE_BG_INTERACTIVE_MINIMUM is applied only > when background aging, maybe it's not enough to keep processes like > updatedb from causing interactive pages to be evicted. > That's why I said we should have another way to detect that kind of > activity... well, the application could just let us know (no need to > embed an autotuning-genetic-page-replacement-optimizer into the kernel). > We should just drop all FS metadata accessed by updatedb, since we > know that's one-shot only, without raising pressure at all. Note that some of updatedb's metadata pages are of the accessed-often kind, e.g., directory blocks and inodes. A blanket low priority on all the pages updatedb touches just won't do. > Just like > (not that I'm proposing it) putting those "one-shot" pages directly on > the inactive-clean list instead of the active list. How an application > could declare such a behaviour is an open question, of course. Maybe it's > even possible to detect it. And BTW that's really fine tuning. > Evicting an 8 hours old page may be a mistake sometime, but it's never > a *big* mistake. IMHO, updatedb *should* evict all the "interactive" pages that aren't actually doing anything[1]. That way it should run faster, provided of course its accessed-once pages are properly given low priority. I see three page priority levels: 0 - accessed-never/aged to zero 1 - accessed-once/just loaded 2 - accessed-often with these transitions: 0 -> 1, if a page is accessed 1 -> 2, if a page is accessed a second time 1 -> 0, if a page gets old 2 -> 0, if a page gets old The 0 and 1 level pages are on a fifo queue, the 2 level pages are scanned clock-wise, relying on the age computation[2]. Eviction candidates are taken from the cold end of the 0 level list, unless it is empty, in which case they are taken from the 1 level list. 
In desperation, eviction candidates are taken from the 2 level list, i.e., random eviction policy, as opposed to what we do now which is to initiate an emergency scan of the active list for new inactive candidates - rather like calling a quick board meeting when the building is on fire. Note that the above is only a very slight departure from the current design. And by the way, this is just brainstorming, it hasn't reached the 'proposal' stage yet. [1] It would be nice to have a mechanism whereby the evicted 'interactive' pages are automatically reloaded when updatedb has finished its work. This is a case of scavenging unused disk bandwidth for something useful, i.e., improving the interactive experience. [2] I much prefer the hot/cold terminology over old/young. The latter gets confusing because a 'high' age is 'young'. I'd rather think of a high value as being 'hot'. -- Daniel
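The victim-selection order described above (cold end of the level-0 fifo first, then level 1, then level 2 only in desperation) is simple enough to sketch. Queue contents are reduced to counts here, purely for illustration:

```c
#include <assert.h>

/* Sketch of the selection order from the post: take eviction
 * candidates from the level-0 fifo, fall back to the level-1 fifo,
 * and only in desperation pull from the level-2 (accessed-often)
 * set, which amounts to a random eviction policy. */
int choose_victim_level(int n0, int n1, int n2)
{
    if (n0 > 0) return 0;   /* aged-to-zero pages: free candidates */
    if (n1 > 0) return 1;   /* accessed-once pages: next cheapest */
    if (n2 > 0) return 2;   /* desperation: evict from the hot set */
    return -1;              /* out of pages entirely */
}
```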
Re: VM Requirement Document - v0.0
On Mon, 2 Jul 2001, Rik van Riel wrote: > On Thu, 28 Jun 2001, Marco Colombo wrote: > > > I'm not sure that, in general, recent pages with only one access are > > still better eviction candidates compared to 8 hours old pages. Here > > we need either another way to detect one-shot activity (like the one > > performed by updatedb), > > Fully agreed, but there is one problem with this idea. > Suppose you have a maximum of 20% of your RAM for these > "one-shot" things, now how are you going to be able to > page in an application with a working set of, say, 25% > the size of RAM ? > > If you don't have any special measures, the pages from > this "new" application will always be treated as one-shot > pages and the process will never be able to be cached in > memory completely... I see your point. Running Gnome on a 64MB box means you have most of the pages that are "warm" (using my definition), so there's little room for "cold" (new) pages, and maybe they don't get a chance of being accessed a second time before they are evicted, which leads to thrashing if you're trying to start something really big (well, I guess the access pattern within a typical ws is not uniformly distributed, so some pages will get accessed twice, but I see the problem). I'll try and make my point a bit clearer. I was referring to background aging only. When aging is caused by pressure, you don't make any difference between pages. I don't know how the idea to give high values for page->age on the second access instead of the first is going to be implemented, but I'm assuming that new pages are going to be placed on the active list with a low age value (PAGE_AGE_START_FIRST ?), maybe even 0 (well, I'm not a guru of course). 
I'm just saying that, to avoid Mike's "problem" (which BTW I don't believe is a big one, really), we could stop background aging on interactive pages (short form for "pages that belong to the ws of an interactive process") at a certain minimum age, say PAGE_AGE_BG_INTERACTIVE_MINIMUM, with PAGE_AGE_BG_INTERACTIVE_MINIMUM > PAGE_AGE_START_FIRST. Weighting the difference between the two ages, you can give long-standing interactive pages some advantage vs new pages. But they will be aged below PAGE_AGE_START_FIRST and eventually moved to the inactive list. After all, they *are* good candidates. Does this make some sense? Oh, yes, since that PAGE_AGE_BG_INTERACTIVE_MINIMUM is applied only during background aging, maybe it's not enough to keep processes like updatedb from causing interactive pages to be evicted. That's why I said we should have another way to detect that kind of activity... well, the application could just let us know (no need to embed an autotuning-genetic-page-replacement-optimizer into the kernel). We should just drop all FS metadata accessed by updatedb, since we know that's one-shot only, without raising pressure at all. Just like (not that I'm proposing it) putting those "one-shot" pages directly on the inactive-clean list instead of the active list. How an application could declare such a behaviour is an open question, of course. Maybe it's even possible to detect it. And BTW that's really fine tuning. Evicting an 8-hour-old page may be a mistake sometimes, but it's never a *big* mistake. > > Rik > -- > Virtual memory is like a game you can't win; > However, without VM there's truly nothing to lose... > > http://www.surriel.com/ http://distro.conectiva.com/ > > Send all your spam to [EMAIL PROTECTED] (spam digging piggy) .TM. -- / / / / / / Marco Colombo ___/ ___ / / Technical Manager / / / ESI s.r.l. 
_/ _/ _/ [EMAIL PROTECTED]
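Marco's clamp could look something like this. The constants and the interactive flag are illustrative assumptions (PAGE_AGE_BG_INTERACTIVE_MINIMUM is his proposed name, not an existing kernel tunable):

```c
#include <assert.h>

/* Sketch of the proposal: background aging never takes an
 * interactive page below a floor, while pressure-driven aging
 * treats all pages alike.  Values are made up for illustration. */
#define PAGE_AGE_START_FIRST            2
#define PAGE_AGE_BG_INTERACTIVE_MINIMUM 4

int age_down(int age, int interactive, int background)
{
    /* Only background aging of interactive pages gets a floor. */
    int floor = (background && interactive)
                    ? PAGE_AGE_BG_INTERACTIVE_MINIMUM : 0;
    return age > floor ? age - 1 : floor;
}
```

Because the floor sits above PAGE_AGE_START_FIRST, a long-standing interactive page keeps a head start over a freshly faulted-in page, yet still loses it once real pressure (background == 0) arrives.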
Re: VM Requirement Document - v0.0
On Thu, 28 Jun 2001, Marco Colombo wrote: > I'm not sure that, in general, recent pages with only one access are > still better eviction candidates compared to 8 hours old pages. Here > we need either another way to detect one-shot activity (like the one > performed by updatedb), Fully agreed, but there is one problem with this idea. Suppose you have a maximum of 20% of your RAM for these "one-shot" things, now how are you going to be able to page in an application with a working set of, say, 25% the size of RAM ? If you don't have any special measures, the pages from this "new" application will always be treated as one-shot pages and the process will never be able to be cached in memory completely... Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://distro.conectiva.com/ Send all your spam to [EMAIL PROTECTED] (spam digging piggy)
Re: VM Requirement Document - v0.0
[...] > immediate: RAM, on-chip cache, etc. > fast: Flash reads, ROMs, etc. > medium: Hard drives, CD-ROMs, 100Mb ethernet, etc. > slow: Flash writes, floppy disks, CD-WR burners > packeted: Reads/writes should be in as large a packet as possible > > Embedded Case [...] > Desktop Case I'm not sure there's any point in separating the cases like this. The complex part of the VM is the caching part => to be a good cache you must take into account the speed of accesses to the cached medium, including warm-up times for sleepy drives etc. It would be really cool if the VM could do that, so e.g. in the ideal world you could connect up a slow hard drive and have its contents cached as swap on your fast hard drive(!) (not a new idea btw and already implemented elsewhere). I.e. from the point of view of the VM a computer is just a group of data storage units and it's allowed to use up certain parts of each one to do stuff [...] -- http://ape.n3.net
Re: VM Requirement Document - v0.0
On Thu, 28 Jun 2001, Daniel Phillips wrote: > On Thursday 28 June 2001 14:20, [EMAIL PROTECTED] wrote: > > > If individual pages could be classified as code (text segments), > > > data, file cache, and so on, I would specify costs to the paging > > > of such pages in or out. This way I can make the system perfer > > > to drop a file cache page that has not been accessed for five > > > minutes, over a program text page that has not been acccessed > > > for one hour (or much more). > > > > This would be extremely useful. My laptop has 256mb of ram, but every day > > it runs the updatedb for locate. This fills the memory with the file > > cache. Interactivity is then terrible, and swap is unnecessarily used. On > > the laptop all this hard drive thrashing is bad news for battery life > > (plus the fact that laptop hard drives are not the fastest around). I > > purposely do not run more applications than can comfortably fit in the > > 256mb of memory. > > > > If fact, to get interactivity back, I've got a small 10 liner that mallocs > > memory to *force* stuff into swap purely so I can have a large block of > > memory back for interactivity. > > > > Something simple that did "you haven't used this file for 30mins, flush it > > out of the cache would be sufficient" > > Updatedb fills memory full of clean file pages so there's nothing to flush. > Did you mean "evict"? Well, I believe all inodes get dirtied for access time update, unless the FS is mounted no_atime. And it does write its database file... > Roughly speaking we treat clean pages as "instantly relaimable". Eviction > and reclaiming are done in the same step (look at reclaim_page). The key to > efficient mm is nothing more or less than choosing the best victim for > reclaiming and we aren't doing a spectacularly good job of that right now. 
> > There is a simple change in strategy that will fix up the updatedb case quite > nicely, it goes something like this: a single access to a page (e.g., reading > it) isn't enough to bring it to the front of the LRU queue, but accessing it > twice or more is. This is being looked at. You mean that pages that belong to interactive applications (working sets) won't be evicted to make room for the cache? And that pages just filled with data read by updatedb will be chosen instead (a kind of drop-behind)? There's nothing really wrong with the kernel "swapping out" interactive applications at 4 a.m.; their pages have the property of both not being accessed recently and (the kernel doesn't know, of course) not going to be useful in the near future (say for another 4 hours). In the end they *are* good candidates for eviction. > Note that we don't actually use an LRU queue, we use a more efficient > approximation called aging, so the above is not a recipe for implementation. I'm not sure that, in general, recent pages with only one access are still better eviction candidates compared to 8 hours old pages. Here we need either another way to detect one-shot activity (like the one performed by updatedb), or to keep pages that belong to the working set of interactive processes somewhat "warm", and never let them age too much. A page with only one (read) access can be "cold". A page with more than one access becomes "hot". Aging moves pages towards the "cold" state, and of course "cold" pages are the best candidates for eviction. Pages belonging to interactive processes are never moved from the "warm" state into the "cold" state by the background aging. Maybe this can be implemented by just leaving such pages on the active list, and deactivating them only on pressure. Or not letting their age reach 0. (Well, I'm not really into the current VM implementation. I guess that those single-access pages will be placed on the end of the active list with age 0, or something like that). 
If I understand the current VM code, after 8 hours of idle time, all pages of interactive applications will be on the inactive(_clean?) list, ready for eviction. Even if you place new pages (the updatedb activity) at the *end* of the active list (instead of the front), it won't be enough to prevent application pages from being evicted. It won't solve Mike's problem, that is. > > -- > Daniel .TM. -- / / / / / / Marco Colombo ___/ ___ / / Technical Manager / / / ESI s.r.l. _/ _/ _/ [EMAIL PROTECTED]
Re: VM Requirement Document - v0.0
On Thursday 28 June 2001 17:21, Jonathan Morton wrote: > >There is a simple change in strategy that will fix up the updatedb case > > quite nicely, it goes something like this: a single access to a page > > (e.g., reading it) isn't enough to bring it to the front of the LRU > > queue, but accessing it twice or more is. This is being looked at. > > Say, when a page is created due to a page fault, page->age is set to > zero instead of whatever it is now. This isn't quite enough. We do want to be able to assign a ranking to members of the accessed-once set, and we do want to distinguish between newly created pages and pages that have aged all the way to zero. > Then, on the first access, it is > incremented to one. All accesses where page->age was previously zero > cause it to be incremented to one, and subsequent accesses where > page->age is non-zero cause a doubling rather than an increment. > This gives a nice heavy priority boost to frequently-accessed pages... While on that topic, could somebody please explain to me why exponential aging is better than linear aging by a suitably chosen increment? It's clear what's wrong with it: after 32 hits you lose all further information. I suspect there are more problems with it than that. -- Daniel
Re: VM Requirement Document - v0.0
>There is a simple change in strategy that will fix up the updatedb case quite >nicely, it goes something like this: a single access to a page (e.g., reading >it) isn't enough to bring it to the front of the LRU queue, but accessing it >twice or more is. This is being looked at. Say, when a page is created due to a page fault, page->age is set to zero instead of whatever it is now. Then, on the first access, it is incremented to one. All accesses where page->age was previously zero cause it to be incremented to one, and subsequent accesses where page->age is non-zero cause a doubling rather than an increment. This gives a nice heavy priority boost to frequently-accessed pages... >Note that we don't actually use a LRU queue, we use a more efficient >approximation called aging, so the above is not a recipe for implementation. Maybe it is, but in a slightly lateral manner as above. -- -- from: Jonathan "Chromatix" Morton mail: [EMAIL PROTECTED] (not for attachments) website: http://www.chromatix.uklinux.net/vnc/ geekcode: GCS$/E dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*) tagline: The key to knowledge is not to rely on people to teach you it. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
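Jonathan's increment-then-double rule, sketched with saturation. It also makes Daniel's objection from elsewhere in the thread concrete: after roughly 31 doublings a 32-bit age carries no further information (toy code, not a kernel patch):

```c
#include <assert.h>
#include <limits.h>

/* Sketch of the proposal: first touch sets page->age to 1, each
 * subsequent touch doubles it.  With saturation at INT_MAX the
 * counter stops distinguishing hot pages after ~31 doublings,
 * which is the stated drawback of exponential aging. */
int touch(int age)
{
    if (age == 0)
        return 1;            /* first access */
    if (age > INT_MAX / 2)
        return INT_MAX;      /* saturated: further hits are lost */
    return age * 2;          /* subsequent accesses double the age */
}
```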
Re: VM Requirement Document - v0.0
On Thursday 28 June 2001 15:37, Alan Cox wrote: > > The problem with updatedb is that it pushes all applications to the swap, > > and when you get back in the morning, everything has to be paged back > > from swap just because the (stupid) OS is prepared for yet another > > updatedb run. > > Updatedb is a bit odd in that it mostly sucks in metadata and the buffer to > page cache balancing is a bit suspect IMHO. For Ext2, most or all of that metadata will be moved into the page cache early in 2.5, and other filesystems will likely follow that lead. That's not to say the buffer/page cache balancing shouldn't get attention, just that this particular problem will die by itself. -- Daniel
Re: VM Requirement Document - v0.0
On Thursday 28 June 2001 14:20, [EMAIL PROTECTED] wrote: > > If individual pages could be classified as code (text segments), > > data, file cache, and so on, I would specify costs to the paging > > of such pages in or out. This way I can make the system prefer > > to drop a file cache page that has not been accessed for five > > minutes, over a program text page that has not been accessed > > for one hour (or much more). > > This would be extremely useful. My laptop has 256MB of RAM, but every day > it runs the updatedb for locate. This fills the memory with the file > cache. Interactivity is then terrible, and swap is unnecessarily used. On > the laptop all this hard drive thrashing is bad news for battery life > (plus the fact that laptop hard drives are not the fastest around). I > purposely do not run more applications than can comfortably fit in the > 256MB of memory. > > In fact, to get interactivity back, I've got a small 10-liner that mallocs > memory to *force* stuff into swap purely so I can have a large block of > memory back for interactivity. > > Something simple that did "you haven't used this file for 30 mins, flush it > out of the cache" would be sufficient. Updatedb fills memory full of clean file pages so there's nothing to flush. Did you mean "evict"? Roughly speaking we treat clean pages as "instantly reclaimable". Eviction and reclaiming are done in the same step (look at reclaim_page). The key to efficient mm is nothing more or less than choosing the best victim for reclaiming and we aren't doing a spectacularly good job of that right now. There is a simple change in strategy that will fix up the updatedb case quite nicely, it goes something like this: a single access to a page (e.g., reading it) isn't enough to bring it to the front of the LRU queue, but accessing it twice or more is. This is being looked at. 
Note that we don't actually use an LRU queue, we use a more efficient approximation called aging, so the above is not a recipe for implementation. -- Daniel
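The "touch twice to stay" rule resembles the 2Q-style scheme: a first access parks a page on a probationary list, and only a repeat access promotes it to a protected list that updatedb's one-shot pages never reach. A minimal sketch (list names are illustrative, not from any kernel tree):

```c
#include <assert.h>

/* Sketch of two-list promotion: a first touch puts a page on a
 * probationary fifo (cheap to reclaim), and only a second touch
 * promotes it to the protected list.  One-shot streams then age
 * out of probation without displacing the protected set. */
enum list_id { NOWHERE = -1, PROBATION = 0, PROTECTED = 1 };

int next_list(int current)
{
    switch (current) {
    case NOWHERE:   return PROBATION;  /* single read: a candidate victim */
    case PROBATION: return PROTECTED;  /* second touch: working set */
    default:        return PROTECTED;  /* already protected: stay */
    }
}
```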
Re: VM Requirement Document - v0.0
> > Updatedb is a bit odd in that it mostly sucks in metadata and the buffer to > > page cache balancing is a bit suspect IMHO. > > In 2.4.6-pre, the buffer cache is no longer used for metadata, right? For ext2 directory blocks the page cache is now used.
Re: VM Requirement Document - v0.0
On Thu, 28 Jun 2001, Alan Cox wrote: > > > That isn't really down to labelling pages, what you are talking about is what > > > you get for free when page aging works right (eg 2.0.39) but don't get in > > > 2.2 - and don't yet (although it's coming) quite get right in 2.4.6pre. > > > > Correct, but all pages are not equal. > > That is the whole point of page aging done right. The use of a page dictates > how it is aged before being discarded. So pages referenced once are aged > rapidly, but once they get touched a couple of times then you know they aren't > streaming I/O. There are other related techniques like punishing pages that > are touched when streaming I/O is done to pages further down the same file - > FreeBSD does this one for example Are you saying that classification of pages will not be useful? Only looking at the page access patterns can certainly reveal a lot, but tuning how to punish different pages is useful. > > The problem with updatedb is that it pushes all applications to the swap, > > and when you get back in the morning, everything has to be paged back from > > swap just because the (stupid) OS is prepared for yet another updatedb > > run. > > Updatedb is a bit odd in that it mostly sucks in metadata and the buffer to > page cache balancing is a bit suspect IMHO. In 2.4.6-pre, the buffer cache is no longer used for metadata, right? /Tobias
Re: VM Requirement Document - v0.0
> > That isn't really down to labelling pages, what you are talking about is what > > you get for free when page aging works right (eg 2.0.39) but don't get in > > 2.2 - and don't yet (although it's coming) quite get right in 2.4.6pre. > > Correct, but all pages are not equal. That is the whole point of page aging done right. The use of a page dictates how it is aged before being discarded. So pages referenced once are aged rapidly, but once they get touched a couple of times then you know they aren't streaming I/O. There are other related techniques like punishing pages that are touched when streaming I/O is done to pages further down the same file - FreeBSD does this one for example. > The problem with updatedb is that it pushes all applications to the swap, > and when you get back in the morning, everything has to be paged back from > swap just because the (stupid) OS is prepared for yet another updatedb > run. Updatedb is a bit odd in that it mostly sucks in metadata and the buffer to page cache balancing is a bit suspect IMHO. Alan
Re: VM Requirement Document - v0.0
On Thu, 28 Jun 2001, Alan Cox wrote: > > This would be extremely useful. My laptop has 256MB of RAM, but every day > > it runs the updatedb for locate. This fills the memory with the file > > cache. Interactivity is then terrible, and swap is unnecessarily used. On > > the laptop all this hard drive thrashing is bad news for battery life > > That isn't really down to labelling pages, what you are talking about is what > you get for free when page aging works right (eg 2.0.39) but don't get in > 2.2 - and don't yet (although it's coming) quite get right in 2.4.6pre. Correct, but all pages are not equal. The problem with updatedb is that it pushes all applications to the swap, and when you get back in the morning, everything has to be paged back from swap just because the (stupid) OS is prepared for yet another updatedb run. Other bad activities include copying lots of files, tar/untarring and CD writing. They all cause unwanted paging, at least for the desktop user. /Tobias
Re: VM Requirement Document - v0.0
Stefan Hoffmeister <[EMAIL PROTECTED]> writes: [...] > Windows NT/2000 has flags that can be set for each CreateFile operation > ("open" in Unix terms), for instance > > FILE_ATTRIBUTE_TEMPORARY > > FILE_FLAG_WRITE_THROUGH > FILE_FLAG_NO_BUFFERING > FILE_FLAG_RANDOM_ACCESS > FILE_FLAG_SEQUENTIAL_SCAN > > If Linux does not have a mechanism that would allow the signalling of > specific use cases, it might be helpful to implement such a hinting > system? madvise(2) does it on mappings IIRC -- Seeking summer job at last minute - see http://ape.n3.net/cv.html
Re: VM Requirement Document - v0.0
On 28 Jun 2001, Xavier Bestel wrote: > On 28 Jun 2001 14:02:09 +0200, Tobias Ringstrom wrote: > > > This would be very useful, I think. Would it be very hard to classify > > pages like this (text/data/cache/...)? > > How would you classify a page of perl code ? I don't know how the Perl interpreter works, but I think it byte-compiles the code and puts it in the data segment, which also would have a high paging cost. The Perl source code would be paged in/out before running binaries such as shells and the window system, but the same thing would happen to binaries with a short life-span, I suppose. Perhaps cached executables and cached data files can be classified differently as well. What I meant to ask with the question above was if it would be hard to implement the classification in the kernel. /Tobias
Re: VM Requirement Document - v0.0
On 28 Jun 2001 14:02:09 +0200, Tobias Ringstrom wrote: > This would be very useful, I think. Would it be very hard to classify > pages like this (text/data/cache/...)? How would you classify a page of perl code ? Xav
Re: VM Requirement Document - v0.0
> This would be extremely useful. My laptop has 256mb of ram, but every day > it runs the updatedb for locate. This fills the memory with the file > cache. Interactivity is then terrible, and swap is unnecessarily used. On > the laptop all this hard drive thrashing is bad news for battery life That isn't really down to labelling pages, what you are talking about is what you get for free when page aging works right (eg 2.0.39) but don't get in 2.2 - and don't yet (although it's coming) quite get right in 2.4.6pre.
Re: VM Requirement Document - v0.0
> If individual pages could be classified as code (text segments), > data, file cache, and so on, I would specify costs to the paging > of such pages in or out. This way I can make the system prefer > to drop a file cache page that has not been accessed for five > minutes, over a program text page that has not been accessed > for one hour (or much more). This would be extremely useful. My laptop has 256MB of RAM, but every day it runs the updatedb for locate. This fills the memory with the file cache. Interactivity is then terrible, and swap is unnecessarily used. On the laptop all this hard drive thrashing is bad news for battery life (plus the fact that laptop hard drives are not the fastest around). I purposely do not run more applications than can comfortably fit in the 256MB of memory. In fact, to get interactivity back, I've got a small 10-liner that mallocs memory to *force* stuff into swap purely so I can have a large block of memory back for interactivity. Something simple that did "you haven't used this file for 30 mins, flush it out of the cache" would be sufficient. Mike
Re: VM Requirement Document - v0.0
On Thu, 28 Jun 2001, Helge Hafting wrote:

> Preventing swap-thrashing at all cost doesn't help if the
> machine loses to io-thrashing instead. Performance will be
> just as much down, although perhaps more satisfying because
> people aren't that surprised if explicit file operations
> take a long time. They hate it when moving the mouse
> or something causes a disk access even if their
> apps run faster. :-(

Exactly. I still want the ability to tune the system according to my
taste. I've been thinking about this for some time, and I've specifically
tried to come up with nice tunables, completely ignoring whether they are
possible now or not.

If individual pages could be classified as code (text segments), data,
file cache, and so on, I would specify costs for paging such pages in or
out. This way I can make the system prefer to drop a file cache page that
has not been accessed for five minutes over a program text page that has
not been accessed for one hour (or much more).

This would be very useful, I think. Would it be very hard to classify
pages like this (text/data/cache/...)? Any reason why this is a bad idea?

/Tobias
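[A userspace toy model of the cost idea above. The class names and cost
numbers are invented for illustration; nothing like this existed in the
2.4 VM.]

```python
# Cost-weighted page eviction: each page class gets a cost multiplier,
# and the eviction victim is the page with the lowest
# "cost of losing it" = class_cost / seconds_idle.
from dataclasses import dataclass

# Hypothetical per-class costs: program text is expensive to refault
# (seeks and latency the user feels directly), file cache is cheap.
CLASS_COST = {"text": 100.0, "data": 50.0, "cache": 1.0}

@dataclass
class Page:
    kind: str       # "text", "data" or "cache"
    idle: float     # seconds since last access

def eviction_score(page):
    # Lower score == better eviction victim.
    return CLASS_COST[page.kind] / page.idle

def pick_victim(pages):
    return min(pages, key=eviction_score)

if __name__ == "__main__":
    pages = [Page("text", 3600.0),   # program text idle one hour
             Page("cache", 300.0)]   # file cache idle five minutes
    # Pure LRU would evict the text page; the cost model keeps it
    # and drops the five-minute-old cache page instead.
    print(pick_victim(pages).kind)
```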
Re: VM Requirement Document - v0.0
Helge Hafting wrote:
>
> Martin Knoblauch wrote:
> >
> > maybe more specific: If the hit-rate is low and the cache is already
> > 70+% of the system's memory, the chances may be slim that more cache
> > is going to improve the hit-rate.
>
> Oh, but this is possible. You can get into situations where
> the (file cache) working set needs 80% or so of memory
> to get a near-perfect hitrate, and where
> using 70% of memory will thrash madly due to the file access

that's why I said "maybe" :-) Sure, another 5% of cache may improve
things, but it may also kill the interactive performance. That's why
there should probably be more than one VM strategy, to accommodate
servers and workstations/laptops.

> pattern. And this won't be a problem either, if
> the working set of "other" (non-file)
> stuff is below 20% of memory. The total size of
> non-file stuff may be above 20% though, so something goes
> into swap.

And that is the problem. Too much seems to go into swap, at least for
interactive work. Unfortunately, with 128MB of memory I cannot entirely
turn off swap. I will see how things are going once I have 256 or 512 MB
(hopefully soon :-)

> I definitely want the machine to work under such circumstances,
> so an arbitrary limit of 70% won't work.

Do not take the 70% as an arbitrary limit. I never said that. The 70% is
just my situation. The problems may arise at 60% cache or at 97.38%
cache.

> Preventing swap-thrashing at all cost doesn't help if the

Never said at all cost.

> machine loses to io-thrashing instead. Performance will be
> just as much down, although perhaps more satisfying because
> people aren't that surprised if explicit file operations
> take a long time. They hate it when moving the mouse
> or something causes a disk access even if their
> apps run faster. :-(

Absolutely true. And if the main purpose of the machine is interactive
work (we do want to make Linux a success on the desktop, don't we?), it
should not be hampered by an IO improvement that may be only of secondary
importance to the user (who is the final "customer" for all the work that
is done on the kernel :-). On big servers a little paging now and then
may be absolutely OK, as long as the IO is going strong.

I have been observing the discussions of VM behaviour in 2.4.x for some
time. They are mostly very entertaining and revealing. But they also show
that one solution does not seem to benefit all possible scenarios.
Therefore either more than one VM strategy is necessary, or better means
of tuning the cache behaviour, or both. Definitely better ways of
measuring the VM efficiency seem to be needed. While implementing VM
strategies is probably out of the question for a lot of the people that
complain, I hope that at least my complaints are kind of useful.

Martin
--
Martin Knoblauch      | email:  [EMAIL PROTECTED]
TeraPort GmbH         | Phone:  +49-89-510857-309
C+ITS                 | Fax:    +49-89-510857-111
http://www.teraport.de | Mobile: +49-170-4904759
Re: VM Requirement Document - v0.0
Martin Knoblauch wrote:
>
> maybe more specific: If the hit-rate is low and the cache is already
> 70+% of the system's memory, the chances may be slim that more cache
> is going to improve the hit-rate.

Oh, but this is possible. You can get into situations where the (file
cache) working set needs 80% or so of memory to get a near-perfect
hitrate, and where using 70% of memory will thrash madly due to the file
access pattern. And this won't be a problem either, if the working set of
"other" (non-file) stuff is below 20% of memory. The total size of
non-file stuff may be above 20% though, so something goes into swap.

I definitely want the machine to work under such circumstances, so an
arbitrary limit of 70% won't work.

Preventing swap-thrashing at all cost doesn't help if the machine loses
to io-thrashing instead. Performance will be just as much down, although
perhaps more satisfying because people aren't that surprised if explicit
file operations take a long time. They hate it when moving the mouse or
something causes a disk access even if their apps run faster. :-(

Helge Hafting
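[The cliff Helge describes - near-perfect hitrate at 80% of memory,
madly thrashing at 70% - is easy to reproduce with a toy LRU cache and a
cyclic access pattern. Sizes here are made up; the effect is generic.]

```python
# Simulate an LRU cache against a cyclic scan of a fixed working set.
# When the cache is even slightly smaller than the working set, LRU
# always evicts exactly the page that is about to be needed again.
from collections import OrderedDict

def lru_hitrate(cache_size, working_set, accesses):
    cache, hits = OrderedDict(), 0
    for i in range(accesses):
        key = i % working_set            # cyclic scan of the working set
        if key in cache:
            hits += 1
            cache.move_to_end(key)       # mark most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / accesses

if __name__ == "__main__":
    # 100-page cache over an 80-page working set: hits almost always.
    # 70-page cache over the same working set: hits *never*.
    print(lru_hitrate(100, 80, 10_000))
    print(lru_hitrate(70, 80, 10_000))
```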
Re: VM Requirement Document - v0.0
Rik van Riel wrote:
>
> On Wed, 27 Jun 2001, Martin Knoblauch wrote:
>
> > I do not care much whether the cache is using 99% of the system's
> > memory or 50%. As long as there is free memory, using it for cache
> > is great. I care a lot if the cache takes down interactivity,
> > because it pushes out processes that it thinks idle, but that I
> > need in 5 seconds. The cache's pressure against processes
>
> Too bad that processes are in general cached INSIDE the cache.
>
> You'll have to write a new balancing story now ;)

maybe that is part of "the answer" :-)

Martin
Re: VM Requirement Document - v0.0
On Wed, 27 Jun 2001, Martin Knoblauch wrote:

> I do not care much whether the cache is using 99% of the system's
> memory or 50%. As long as there is free memory, using it for cache is
> great. I care a lot if the cache takes down interactivity, because it
> pushes out processes that it thinks idle, but that I need in 5
> seconds. The cache's pressure against processes

Too bad that processes are in general cached INSIDE the cache.

You'll have to write a new balancing story now ;)

regards,

Rik
--
Executive summary of a recent Microsoft press release:

  "we are concerned about the GNU General Public License (GPL)"

http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/
Re: VM Requirement Document - v0.0
> Rik> ... but I fail to see this one. If we get a low cache hit rate,
> Rik> couldn't that just mean we allocated too little memory for the
> Rik> cache ?
>
> Or that we're doing big sequential reads of file(s) which are larger
> than memory, in which case expanding the cache size buys us nothing,
> and can actually hurt us a lot.

I've got an idea about how to handle this situation generally (without
sending 'tips' to the kernel via madvise() or anything similar). Instead
of sorting cached pages (I mean blocks of files) by last touch time, and
dropping the oldest page(s) if we're short on memory, I would propose
this nicer algorithm (this is relevant only to the read cache):

Suppose that f1,f2,...fN files are cached, their sizes are s1,s2,...sN
and they were last touched t1,t2,...tN seconds ago. (t1

> I personally don't feel that the cache should be allowed to grow over
> 50% of the system's memory at all, we've got so much in the cache at
> that point, that we're probably not hitting it all that much.
>
> This is why the discussion on the other cache scanning algorithm
> (2Q+?) was so interesting, since it looked to handle both the LRU
> vs. FIFO tradeoffs very nicely.
Re: VM Requirement Document - v0.0
On Tue, 26 Jun 2001, Rik van Riel wrote:

> On Tue, 26 Jun 2001, John Stoffel wrote:
>
> > >> * If we're getting low cache hit rates, don't flush
> > >> processes to swap.
> > >> * If we're getting good cache hit rates, flush old, idle
> > >> processes to swap.
> >
> > Rik> ... but I fail to see this one. If we get a low cache hit rate,
> > Rik> couldn't that just mean we allocated too little memory for the
> > Rik> cache ?
> >
> > Or that we're doing big sequential reads of file(s) which are
> > larger than memory, in which case expanding the cache size buys
> > us nothing, and can actually hurt us a lot.
>
> That's a big "OR". I think we should have an algorithm to
> see which of these two is the case, otherwise we're just
> making the wrong decision half of the time.
>
> Also, in many systems we'll be doing IO on _multiple_ files
> at the same time, so I guess this will have to be a file-by-file
> decision.

Of course, you can always think of a "bad" behaviour. That should really
be a page-by-page decision. An application may have both data and
meta-data in the same file. You want to keep the metadata in core (think
of access by an index: it's much better if all of the index is there,
even some unused parts) *and* cache commonly used data (that's just a
cache of hot objects; normal replacement algorithms may be used) *and*
drop-behind data on sequential scans... trying to understand what an
application is doing, in order to foresee what it will be doing, is bad
attitude. Let's give an application writer a way to code it sanely
(setting per-file VM attributes is fine). If an application is not
friendly (gives no hints on its VM behaviour), just punish it. I mean,
when tuning the VM behaviour, system health and friendly applications'
performance are the goals - do whatever necessary to preserve them, even
kill the offender and rm its executable if someone runs it again
(*grin*) B-).

.TM.
--
Marco Colombo
Technical Manager
ESI s.r.l.
[EMAIL PROTECTED]
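[Per-file hints of the kind Marco asks for did eventually materialize as
posix_fadvise(), well after this thread. A sketch of the sequential-scan
plus drop-behind case from Python, on a platform that supports it; the
function name and window size are illustrative.]

```python
# Stream a file sequentially while advising the kernel per-fd:
# POSIX_FADV_SEQUENTIAL asks for aggressive read-ahead, and
# POSIX_FADV_DONTNEED after each chunk asks the kernel to drop that
# chunk's cached pages (the "drop-behind" behaviour).
import os

def stream_file(path, chunk=1 << 20):
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        total = 0
        while True:
            data = os.read(fd, chunk)
            if not data:
                break
            total += len(data)
            # We will not reuse what we just read - let it go.
            os.posix_fadvise(fd, total - len(data), len(data),
                             os.POSIX_FADV_DONTNEED)
        return total
    finally:
        os.close(fd)
```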
Re: VM Requirement Document - v0.0
On 26 Jun 2001 20:43:33 -0400, Dan Maas wrote:

> > Windows NT/2000 has flags that can be set for each CreateFile
> > operation ("open" in Unix terms), for instance
> >
> > FILE_ATTRIBUTE_TEMPORARY
> > FILE_FLAG_WRITE_THROUGH
> > FILE_FLAG_NO_BUFFERING
> > FILE_FLAG_RANDOM_ACCESS
> > FILE_FLAG_SEQUENTIAL_SCAN

We do (nearly) already have O_DIRECT, which won't touch the cache (alas,
I don't think it will read ahead more).

Xav
Re: VM Requirement Document - v0.0
>> * If we're getting low cache hit rates, don't flush
>> processes to swap.
>> * If we're getting good cache hit rates, flush old, idle
>> processes to swap.

Rik> ... but I fail to see this one. If we get a low cache hit rate,
Rik> couldn't that just mean we allocated too little memory for the
Rik> cache ?

maybe more specific: If the hit-rate is low and the cache is already
70+% of the system's memory, the chances may be slim that more cache is
going to improve the hit-rate.

I do not care much whether the cache is using 99% of the system's memory
or 50%. As long as there is free memory, using it for cache is great. I
care a lot if the cache takes down interactivity, because it pushes out
processes that it thinks idle, but that I need in 5 seconds. The cache's
pressure against processes should decrease with the (relative) size of
the cache, especially in low hit-rate situations.

OT: I asked the question before somewhere else. Are there interfaces to
the VM that expose the various cache sizes and, more important,
hit-rates to userland? I would love to see (or maybe help writing, in my
free time) a tool to just visualize/analyze the efficiency of the VM
system.

Martin
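[Cache *sizes*, though not hit-rates, are already visible in
/proc/meminfo; a tool like the one Martin wants could start from
something like this. The parsing is a sketch; field availability varies
by kernel version.]

```python
# Read memory and page-cache sizes from /proc/meminfo (Linux).
# Each interesting line looks like "Cached:   81584 kB".
def meminfo(path="/proc/meminfo"):
    info = {}
    with open(path) as f:
        for line in f:
            key, sep, rest = line.partition(":")
            parts = rest.split()
            if sep and parts and parts[0].isdigit():
                info[key.strip()] = int(parts[0])   # value in kB
    return info

if __name__ == "__main__":
    m = meminfo()
    for k in ("MemTotal", "MemFree", "Buffers", "Cached"):
        print("%-10s %8d kB" % (k, m.get(k, 0)))
```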
Re: VM Requirement Document - v0.0
> I personally don't feel that the cache should be allowed to grow over
> 50% of the system's memory at all, we've got so much in the cache at
> that point, that we're probably not hitting it all that much.

That depends very much on what you're using the system for. Suppose
you're running a trivial database application on a gigantic disk array -
the name of the game is to cache as much metadata as possible, and that
goes directly to the bottom line as performance. Might as well use 90%+
of your memory for that.

The conclusion to draw here is, the balance between file cache and
process memory should be able to slide all the way from one extreme to
the other. It's not a requirement that that be fully automatic, but it's
highly desirable.

--
Daniel
Re: VM Requirement Document - v0.0
On Tue, Jun 26, 2001 at 08:43:33PM -0400, Dan Maas wrote:

> (hrm, maybe I could hack up my own manual read-ahead/drop-behind with
> mmap() and memory locking...)

Just to argue portability for a moment (portability of the expected
results, that is, vs APIs): would this technique work across a variety
of OSes? Would the recent caching difficulties of the 2.4.* series have
handled such a technique in a reasonable fashion?

mrc
--
Mike Castle  [EMAIL PROTECTED]  www.netcom.com/~dalgoda/
We are all of us living in the shadow of Manhattan.  -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc
Re: VM Requirement Document - v0.0
> Windows NT/2000 has flags that can be set for each CreateFile operation
> ("open" in Unix terms), for instance
>
> FILE_ATTRIBUTE_TEMPORARY
> FILE_FLAG_WRITE_THROUGH
> FILE_FLAG_NO_BUFFERING
> FILE_FLAG_RANDOM_ACCESS
> FILE_FLAG_SEQUENTIAL_SCAN

There is a BSD-originated convention for this - madvise(). If you look
in the Linux VM code there is a bit of explicit code for different
madvise access patterns, but I'm not sure if it's 100% supported.

Drop-behind would be really, really nice to have for my multimedia
applications. I routinely deal with very large video files (several
times larger than my RAM). When I sequentially read through such files a
bit at a time, I do NOT want the old pages sitting there in RAM while
all of my other running programs are rudely paged out...

(hrm, maybe I could hack up my own manual read-ahead/drop-behind with
mmap() and memory locking...)

Regards,
Dan
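[The mmap() hack Dan has in mind might look roughly like this - map the
file, scan it a window at a time, and madvise(MADV_DONTNEED) each window
once consumed so its pages don't crowd everything else out. A sketch
only; the window size is arbitrary and behaviour is OS-dependent.]

```python
# Manual drop-behind over a large file using mmap + madvise.
import mmap, os

WINDOW = 1 << 20   # 1 MB windows (arbitrary; page-aligned)

def checksum_streaming(path):
    fd = os.open(path, os.O_RDONLY)
    size = os.fstat(fd).st_size
    total = 0
    with mmap.mmap(fd, size, prot=mmap.PROT_READ) as m:
        m.madvise(mmap.MADV_SEQUENTIAL)   # hint: linear scan ahead
        for off in range(0, size, WINDOW):
            chunk = m[off:off + WINDOW]
            total = (total + sum(chunk)) % (1 << 32)
            # Done with this window - ask the kernel to drop its pages.
            m.madvise(mmap.MADV_DONTNEED, off, len(chunk))
    os.close(fd)
    return total
```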
Re: VM Requirement Document - v0.0
On Tue, Jun 26, 2001 at 03:48:09PM -0700, Jeffrey W. Baker wrote:

> These flags would be really handy. We already have the raw device for
> sequential reading of e.g. CDROM and DVD devices.

Not going to help 99% of the applications out there.

mrc
Re: VM Requirement Document - v0.0
On Wed, 27 Jun 2001, Stefan Hoffmeister wrote:

> : On Tue, 26 Jun 2001 18:42:56 -0300 (BRST), Rik van Riel wrote:
>
> > On Tue, 26 Jun 2001, John Stoffel wrote:
> >
> > > Or that we're doing big sequential reads of file(s) which are
> > > larger than memory, in which case expanding the cache size buys
> > > us nothing, and can actually hurt us a lot.
> >
> > That's a big "OR". I think we should have an algorithm to
> > see which of these two is the case, otherwise we're just
> > making the wrong decision half of the time.
>
> Windows NT/2000 has flags that can be set for each CreateFile
> operation ("open" in Unix terms), for instance
>
> FILE_ATTRIBUTE_TEMPORARY
> FILE_FLAG_WRITE_THROUGH
> FILE_FLAG_NO_BUFFERING
> FILE_FLAG_RANDOM_ACCESS
> FILE_FLAG_SEQUENTIAL_SCAN
>
> If Linux does not have a mechanism that would allow the signalling of
> specific use cases, it might be helpful to implement such a hinting
> system?

These flags would be really handy. We already have the raw device for
sequential reading of e.g. CDROM and DVD devices.

-jwb
Re: VM Requirement Document - v0.0
: On Tue, 26 Jun 2001 18:42:56 -0300 (BRST), Rik van Riel wrote:

> On Tue, 26 Jun 2001, John Stoffel wrote:
>
> > Or that we're doing big sequential reads of file(s) which are
> > larger than memory, in which case expanding the cache size buys
> > us nothing, and can actually hurt us a lot.
>
> That's a big "OR". I think we should have an algorithm to
> see which of these two is the case, otherwise we're just
> making the wrong decision half of the time.

Windows NT/2000 has flags that can be set for each CreateFile operation
("open" in Unix terms), for instance

  FILE_ATTRIBUTE_TEMPORARY
  FILE_FLAG_WRITE_THROUGH
  FILE_FLAG_NO_BUFFERING
  FILE_FLAG_RANDOM_ACCESS
  FILE_FLAG_SEQUENTIAL_SCAN

If Linux does not have a mechanism that would allow the signalling of
specific use cases, it might be helpful to implement such a hinting
system?

Disclaimer: I am clueless about what the kernel provides at this time.
Re: VM Requirement Document - v0.0
On Tue, Jun 26, 2001 at 06:21:21PM -0300, Rik van Riel wrote:

> > * If we're getting low cache hit rates, don't flush
> > processes to swap.
> > * If we're getting good cache hit rates, flush old, idle
> > processes to swap.
>
> ... but I fail to see this one. If we get a low cache hit
> rate, couldn't that just mean we allocated too little memory
> for the cache ?

Hmmm. I didn't take that into consideration. But at the same time,
shouldn't a VM be able to determine that its cache strategy is causing
_more_ (absolute) misses by increasing its cache size? The percentage of
misses may go down, but total device I/O may stay the same.

So let's see... I'll rephrase that 'Motivation' as:

  * Minimize the total medium/slow I/Os that occur over a
    sliding window of time.

Is that a more general case?

> Also, how would we translate all these requirements into
> VM strategies ?

First, I would like to translate them into measurements. Once we know
how to measure these criteria, it's possible to formalize the feedback
mechanism/accounting that a VM should be aware of.

In the end, I would like a VM to have some idea of how well it's
performing, and to be able to attempt various well-known strategies
based upon its own performance.

--
Jason McMullan, Senior Linux Consultant
Linuxcare, Inc. 412.432.6457 tel, 412.656.3519 cell
[EMAIL PROTECTED], http://www.linuxcare.com/
Linuxcare. Putting open source to work.
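[The sliding-window measurement proposed above can be stated precisely.
A small sketch, with the window length and the event representation
invented for illustration:]

```python
# Count device I/O events inside a sliding time window, so two cache
# strategies can be compared by the absolute number of medium/slow
# I/Os they cost, not by hit percentage alone.
from collections import deque

class IOWindow:
    def __init__(self, seconds):
        self.seconds = seconds
        self.events = deque()   # timestamps of misses -> device I/Os

    def record(self, t):
        self.events.append(t)
        self._expire(t)

    def count(self, now):
        self._expire(now)
        return len(self.events)

    def _expire(self, now):
        # Drop events older than the window.
        while self.events and self.events[0] <= now - self.seconds:
            self.events.popleft()
```

A VM comparing strategies would then prefer whichever keeps
`count(now)` lowest over time, exactly the "total medium/slow I/Os"
criterion rather than a hit-rate ratio.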
Re: VM Requirement Document - v0.0
On Tue, 26 Jun 2001, John Stoffel wrote:

> >> * If we're getting low cache hit rates, don't flush
> >> processes to swap.
> >> * If we're getting good cache hit rates, flush old, idle
> >> processes to swap.
>
> Rik> ... but I fail to see this one. If we get a low cache hit rate,
> Rik> couldn't that just mean we allocated too little memory for the
> Rik> cache ?
>
> Or that we're doing big sequential reads of file(s) which are
> larger than memory, in which case expanding the cache size buys
> us nothing, and can actually hurt us a lot.

That's a big "OR". I think we should have an algorithm to see which of
these two is the case, otherwise we're just making the wrong decision
half of the time.

Also, in many systems we'll be doing IO on _multiple_ files at the same
time, so I guess this will have to be a file-by-file decision.

> I personally don't feel that the cache should be allowed to grow over
> 50% of the system's memory at all, we've got so much in the cache at
> that point, that we're probably not hitting it all that much.

Remember that disk cache includes stuff like mmap()ed executables and
swap-backed user memory. Do you really want to limit those too?

regards,

Rik
Re: VM Requirement Document - v0.0
>> * If we're getting low cache hit rates, don't flush
>> processes to swap.
>> * If we're getting good cache hit rates, flush old, idle
>> processes to swap.

Rik> ... but I fail to see this one. If we get a low cache hit rate,
Rik> couldn't that just mean we allocated too little memory for the
Rik> cache ?

Or that we're doing big sequential reads of file(s) which are larger
than memory, in which case expanding the cache size buys us nothing, and
can actually hurt us a lot.

I personally don't feel that the cache should be allowed to grow over
50% of the system's memory at all; we've got so much in the cache at
that point that we're probably not hitting it all that much.

This is why the discussion on the other cache scanning algorithm (2Q+?)
was so interesting, since it looked to handle both the LRU vs. FIFO
tradeoffs very nicely.

Rik> I am very much interested in continuing this discussion...

Me too, even if I can just contribute comments and not much code.

John
John Stoffel - Senior Unix Systems Administrator - Lucent Technologies
[EMAIL PROTECTED] - http://www.lucent.com - 978-952-7548
Re: VM Requirement Document - v0.0
On Tue, 26 Jun 2001, Jason McMullan wrote:

> If we take all the motivations from the above, and list
> them, we get:
>
> * Don't write to the (slow, packeted) devices until
>   you need to free up memory for processes.
> * Never cache reads from immediate/fast devices.
> * Keep packetized devices as continuously idle as possible.
>   Small chunks of idleness don't count. You want to have
>   maximal stretches of idleness for the device.
> * Keep running processes as fully in memory as possible.

I agree with your modification, and with the obvious 4 points above ...

> * If we're getting low cache hit rates, don't flush
>   processes to swap.
> * If we're getting good cache hit rates, flush old, idle
>   processes to swap.

... but I fail to see this one. If we get a low cache hit rate, couldn't
that just mean we allocated too little memory for the cache?

I am very much interested in continuing this discussion...

Also, how would we translate all these requirements into VM strategies?

regards,

Rik