Re: VM Requirement Document - v0.0
Hi!

> Here's my first pass at a VM requirements document,
> for the embedded, desktop, and server cases. At the end is
> a summary of general rules that should take care of all of
> these cases.
>
> Bandwidth Descriptions:
>
> immediate: RAM, on-chip cache, etc.
> fast: Flash reads, ROMs, etc.

Flash reads are sometimes pretty slow. (Flash over IDE over PCMCIA...
2MB/sec bandwidth. Slower than most hard drives.)

> medium: Hard drives, CD-ROMs, 100Mb ethernet, etc.

CD-ROMs are way slower than hard drives (mostly due to seek times).

> slow: Flash writes, floppy disks, CD-WR burners
> packeted: Reads/writes should be in as large a packet as possible
>
> Embedded Case
> -------------
>
> Overview
>
> In the embedded case, the primary VM motivation is to
> use as _little_ caching of the filesystem for reads as
> possible because (a) reads are very fast and (b) we don't
> have any swap. However, we want to cache _writes_ as hard
> as possible, because Flash is slow, and prone to wear.
>
> Machine Description
> -------------------
> RAM: 4-64Mb (reads: immediate, writes: immediate)

MB, not Mb. 4Mb = 0.5MB.

> Flash: 4-128Mb (reads: fast, writes: slow, packeted)
> CDROM: 640-800Mb (reads: medium)
> Swap: 0Mb
>
> Motivations
>
> * Don't write to the (slow, packeted) devices until
>   you need to free up memory for processes.
> * Never cache reads from immediate/fast devices.

Flash connected over PCMCIA over IDE is *very* slow. You must cache it.

--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
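The bandwidth classes and the "general rules" above (with the PCMCIA correction from the follow-up) can be condensed into a toy policy table. This is only a sketch of the document's intent, not kernel code; the type names and the `slow_bus` flag are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Speed classes from the document's bandwidth descriptions. */
enum speed { IMMEDIATE, FAST, MEDIUM, SLOW };

struct policy {
    bool cache_reads;   /* keep read pages in the page cache?   */
    bool buffer_writes; /* coalesce writes into large packets?  */
};

/* Hypothetical policy chooser for the embedded case:
 * - never cache reads from immediate/fast devices,
 * - buffer writes aggressively for slow, packeted devices.
 * slow_bus covers the follow-up's point: flash behind PCMCIA/IDE
 * reads at ~2 MB/sec and must be cached after all. */
static struct policy embedded_policy(enum speed reads, enum speed writes,
                                     bool slow_bus)
{
    struct policy p;
    p.cache_reads = slow_bus || (reads != IMMEDIATE && reads != FAST);
    p.buffer_writes = (writes == SLOW);
    return p;
}
```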
Re: VM Requirement Document - v0.0
On Friday 06 July 2001 21:09, Rik van Riel wrote:
> On Thu, 5 Jul 2001, Daniel Phillips wrote:
> > Let me comment on this again, having spent a couple of minutes
> > more thinking about it. Would you be happy paying 1% of your
> > battery life to get 80% less sluggish response after a memory
> > pig exits?
>
> Just to pull a few random numbers out of my ass too,
> how about 50% of battery life for the same optimistic
> 80% less sluggishness ?
>
> How about if it were only 30% of battery life?

It's not as random as that. The idea being considered was: suppose a
program starts up, goes through a period of intense, cache-sucking
activity, then exits. Could we reload the applications it just displaced
so that the disk activity to reload them doesn't have to take place the
first time the user touches the keyboard/mouse? Sure, we obviously can;
with how much complexity is another question entirely ;-)

So probably we could eliminate more than 80% of the latency we now see in
such a situation; I was being conservative. Now what's the cost in
battery life? Suppose it's a 128 meg machine, 1/3 filled with program
text and data. Hopefully, the working sets that were evicted are largely
coherent, so we'll read them back in at a rate not too badly degraded
from the drive's transfer rate, say 5 MB/sec. This gives about three
seconds of intense reading to restore something resembling the previous
working set, then the disk can spin down and perhaps the machine will
suspend itself.

So the question is, how much longer did the machine have to run to do
this? Well, on my machine updatedb takes 5-10 minutes, so the 3 seconds
of activity tacked onto the end of the episode amounts to less than 1%,
and this is where the 1% figure came from.

I'm not saying this would be an easy hack, just that it's possible and
the numbers work.
--
Daniel
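Daniel's back-of-the-envelope can be checked mechanically. The 3 seconds of reload and the 5-10 minute updatedb episode are the figures from his post; the helper below is nothing more than the division behind the 1% claim.

```c
#include <assert.h>

/* Battery cost of preloading: extra seconds of disk activity divided
 * by the length of the whole episode it is tacked onto. */
static double battery_fraction(double reload_seconds, double episode_seconds)
{
    return reload_seconds / episode_seconds;
}
```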
Re: VM Requirement Document - v0.0
On Thu, 5 Jul 2001, Daniel Phillips wrote:
> Let me comment on this again, having spent a couple of minutes
> more thinking about it. Would you be happy paying 1% of your
> battery life to get 80% less sluggish response after a memory
> pig exits?

Just to pull a few random numbers out of my ass too, how about 50% of
battery life for the same optimistic 80% less sluggishness ?

How about if it were only 30% of battery life?

Rik
--
Executive summary of a recent Microsoft press release:
  "we are concerned about the GNU General Public License (GPL)"

http://www.surriel.com/
http://www.conectiva.com/
http://distro.conectiva.com/
Re: VM Requirement Document - v0.0
> Well, on a laptop memory and disk bandwidth are rarely wasted - they
> cost battery life.

I've been playing around with different scenarios to see the differences
in performance. A good way to trigger the cache problem is to untar a
couple of kernel source trees or other large amounts of files, until free
memory is down to less than 2MB. Then try to fire up a few apps that need
some memory. The hard drive thrashes around as the VM tries to free up
enough space, often using swap instead of flushing out the cache. These
source trees can then be deleted, which frees up the memory the cache was
using, and performance returns to where it should be.

However, if I just fire up enough apps to use up all the memory and then
go into swap, response is still acceptable. If an app requires loading
from swap there is just a short lag while the VM does its thing, and then
life is good. I don't expect to be able to run more apps than I have
memory for without a performance hit, but I do expect to be able to run
with over 128MB of "real" free memory and not suffer from performance
degradation (which doesn't happen at present).

Mike
Re: VM Requirement Document - v0.0
Daniel Phillips <[EMAIL PROTECTED]> writes:
> Also, notice that the scenario we were originally discussing, the
> off-hours updatedb, doesn't normally happen on laptops because they
> tend to be suspended at that time.

No, even worse, it happens when you open the laptop for the first time in
the morning, thanks to anacron.

--
Alan Shutko <[EMAIL PROTECTED]> - In a variety of flavors!
For children with short attention spans: boomerangs that don't come back.
Re: VM Requirement Document - v0.0
On Thursday 05 July 2001 17:00, Xavier Bestel wrote:
> On 05 Jul 2001 17:04:00 +0200, Daniel Phillips wrote:
> > Also, notice that the scenario we were originally discussing, the
> > off-hours updatedb, doesn't normally happen on laptops because they
> > tend to be suspended at that time.
>
> Suspended != halted. The updatedb stuff starts over when I bring it
> back to life (RH6.2, dunno for other distribs)

Yes, but then it's normally overlapped with other work you are doing,
like trying to read your mail. Different problem, one we also perform
poorly at, but for different reasons.

--
Daniel
Re: VM Requirement Document - v0.0
On 05 Jul 2001 17:04:00 +0200, Daniel Phillips wrote:
> > Well, on a laptop memory and disk bandwidth are rarely wasted - they
> > cost battery life.
>
> Let me comment on this again, having spent a couple of minutes more
> thinking about it. Would you be happy paying 1% of your battery life to
> get 80% less sluggish response after a memory pig exits?

Told like this, of course I agree!

> Also, notice that the scenario we were originally discussing, the
> off-hours updatedb, doesn't normally happen on laptops because they
> tend to be suspended at that time.

Suspended != halted. The updatedb stuff starts over when I bring it back
to life (RH6.2, dunno for other distribs)

Xav
Re: VM Requirement Document - v0.0
On Thursday 05 July 2001 16:00, Xavier Bestel wrote:
> On 05 Jul 2001 15:02:51 +0200, Daniel Phillips wrote:
> > Here's an idea I just came up with while I was composing this... along
> > the lines of using unused bandwidth for something that at least has a
> > chance of being useful. Suppose we come to the end of a period of
> > activity, the general 'temperature' starts to drop and disks fall
> > idle. At this point we could consult a history of which currently
> > running processes have been historically active and grow their working
> > sets by reading in from disk. Otherwise, the memory and the disk
> > bandwidth is just wasted, right? This we can do inside the kernel and
> > not require coders to mess up their apps with hints. Of course, they
> > should still take the time to reengineer them to reduce the cache
> > footprint.
>
> Well, on a laptop memory and disk bandwidth are rarely wasted - they
> cost battery life.

Let me comment on this again, having spent a couple of minutes more
thinking about it. Would you be happy paying 1% of your battery life to
get 80% less sluggish response after a memory pig exits?

Also, notice that the scenario we were originally discussing, the
off-hours updatedb, doesn't normally happen on laptops because they tend
to be suspended at that time.

--
Daniel
Re: VM Requirement Document - v0.0
On Thursday 05 July 2001 16:00, Xavier Bestel wrote:
> On 05 Jul 2001 15:02:51 +0200, Daniel Phillips wrote:
> > Here's an idea I just came up with while I was composing this... along
> > the lines of using unused bandwidth for something that at least has a
> > chance of being useful. Suppose we come to the end of a period of
> > activity, the general 'temperature' starts to drop and disks fall
> > idle. At this point we could consult a history of which currently
> > running processes have been historically active and grow their working
> > sets by reading in from disk. Otherwise, the memory and the disk
> > bandwidth is just wasted, right? This we can do inside the kernel and
> > not require coders to mess up their apps with hints. Of course, they
> > should still take the time to reengineer them to reduce the cache
> > footprint.
>
> Well, on a laptop memory and disk bandwidth are rarely wasted - they
> cost battery life.

Then turn the feature off.

--
Daniel
Re: VM Requirement Document - v0.0
On 05 Jul 2001 15:02:51 +0200, Daniel Phillips wrote:
> Here's an idea I just came up with while I was composing this... along
> the lines of using unused bandwidth for something that at least has a
> chance of being useful. Suppose we come to the end of a period of
> activity, the general 'temperature' starts to drop and disks fall idle.
> At this point we could consult a history of which currently running
> processes have been historically active and grow their working sets by
> reading in from disk. Otherwise, the memory and the disk bandwidth is
> just wasted, right? This we can do inside the kernel and not require
> coders to mess up their apps with hints. Of course, they should still
> take the time to reengineer them to reduce the cache footprint.

Well, on a laptop memory and disk bandwidth are rarely wasted - they cost
battery life.

Xav
Re: VM Requirement Document - v0.0
On Thursday 05 July 2001 03:49, you wrote:
> > Getting the user's "interactive" programs loaded back
> > in afterwards is a separate, much more difficult problem
> > IMHO, but no doubt still has a reasonable solution.
>
> Possibly stupid suggestion... Maybe the interactive/GUI programs should
> wake up once in a while and touch a couple of their pages? Go too far
> with this and you'll just get in the way of performance, but I don't
> think it would hurt to have processes waking up every couple of minutes
> and touching glibc, libqt, libgtk, etc so they stay hot in memory... A
> very slow incremental "caress" of the address space could eliminate the
> "I-just-logged-in-this-morning-and-dammit-everything-has-been-paged-out"
> problem.

Personally, I'm in idea collection mode for that one. First things
first: from my point of view, our basic replacement policy seems to be
broken. The algorithms seem to be burning too much cpu and not doing
enough useful work. Worse, they seem to have a nasty tendency to
livelock themselves, i.e., get into situations where the mm is doing
little other than scanning and transferring pages from list to list.

IMHO, if these things were fixed, much of the 'interactive problem' would
go away because reloading the working set for the mouse, for example,
would just take a few milliseconds. If not, then we should take a good
hard look at why the desktops have such poor working set granularity.
Furthermore, approaches that rely on applications touching what they
believe to be their own working sets aren't going to work very well if
the mm incorrectly processes the page reference information, or
incorrectly balances it against other things that might be going on, so
let's be sure the basics are working properly. Marcelo has the right
idea with his attention to better memory management statistical
monitoring. How nice it would be if he got together with the guy working
on the tracing module...

That said, yes, it's good to think about hinting ideas, and maybe bless
the idea of applications 'touching themselves' (yes, the allusion was
intentional). Here's an idea I just came up with while I was composing
this... along the lines of using unused bandwidth for something that at
least has a chance of being useful. Suppose we come to the end of a
period of activity: the general 'temperature' starts to drop and disks
fall idle. At this point we could consult a history of which currently
running processes have been historically active and grow their working
sets by reading in from disk. Otherwise, the memory and the disk
bandwidth is just wasted, right? This we can do inside the kernel and
not require coders to mess up their apps with hints. Of course, they
should still take the time to reengineer them to reduce the cache
footprint.

/me decides to stop spouting and write some code

--
Daniel
Re: VM Requirement Document - v0.0
> Getting the user's "interactive" programs loaded back
> in afterwards is a separate, much more difficult problem
> IMHO, but no doubt still has a reasonable solution.

Possibly stupid suggestion... Maybe the interactive/GUI programs should
wake up once in a while and touch a couple of their pages? Go too far
with this and you'll just get in the way of performance, but I don't
think it would hurt to have processes waking up every couple of minutes
and touching glibc, libqt, libgtk, etc so they stay hot in memory... A
very slow incremental "caress" of the address space could eliminate the
"I-just-logged-in-this-morning-and-dammit-everything-has-been-paged-out"
problem.

Regards,

Dan
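Dan's "caress" can be prototyped without any kernel help: map the regions you care about and touch one byte per page from time to time. Below is a sketch of just the touching step; the caller, the sleep interval, and the choice of mappings are assumed and left out.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Touch one byte in every page of a region so the VM sees a recent
 * reference and keeps the pages warm.  volatile stops the compiler
 * from optimising the otherwise-dead reads away.  Returns the number
 * of pages touched. */
static size_t caress(const volatile char *base, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t touched = 0;
    for (size_t off = 0; off < len; off += page) {
        (void)base[off];
        touched++;
    }
    return touched;   /* caller sleeps a few minutes, then repeats */
}
```

As the thread notes, this only helps if the VM weighs those references sensibly; it is a workaround for policy, not a fix.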
Re: VM Requirement Document - v0.0
> Remember that the first message was about a laptop. At 4:00AM there's
> no activity but the updatedb one (and the other cron jobs). Simply,
> there's no 'accessed-often' data. Moreover, I'd bet that 90% of the
> metadata touched by updatedb won't be accessed at all in the future.
> Laptop users don't do find /usr/share/terminfo/ so often.

Maybe, but I would think that most laptops get switched off at night.
Then when turned on again in the morning, anacron realizes it missed the
nightly cron jobs and runs everything.

This really does make an incredible difference to the system. If I
remove the updatedb job from cron.daily, the machine won't touch swap all
day and runs like a charm. (That's with vmware, mozilla, openoffice, all
applications that like big chunks of memory.)

Mike
Re: VM Requirement Document - v0.0
On Wednesday 04 July 2001 11:41, Marco Colombo wrote:
> On Tue, 3 Jul 2001, Daniel Phillips wrote:
> > On Tuesday 03 July 2001 12:33, Marco Colombo wrote:
> > > Oh, yes, since that PAGE_AGE_BG_INTERACTIVE_MINIMUM is applied only
> > > when background aging, maybe it's not enough to keep processes like
> > > updatedb from causing interactive pages to be evicted.
> > > That's why I said we should have another way to detect that kind of
> > > activity... well, the application could just let us know (no need to
> > > embed an autotuning-genetic-page-replacement-optimizer into the
> > > kernel). We should just drop all FS metadata accessed by updatedb,
> > > since we know that's one-shot only, without raising pressure at all.
> >
> > Note that some of updatedb's metadata pages are of the accessed-often
> > kind, e.g., directory blocks and inodes. A blanket low priority on
> > all the pages updatedb touches just won't do.
>
> Remember that the first message was about a laptop. At 4:00AM there's
> no activity but the updatedb one (and the other cron jobs). Simply,
> there's no 'accessed-often' data. Moreover, I'd bet that 90% of the
> metadata touched by updatedb won't be accessed at all in the future.
> Laptop users don't do find /usr/share/terminfo/ so often.

The problem is when you have a directory block, say, that has to stay
around quite a few seconds before dropping into disuse. You sure don't
want that block treated as 'accessed-once'. The goal here is to get
through the updatedb as quickly as possible.

Getting the user's "interactive" programs loaded back in afterwards is a
separate, much more difficult problem IMHO, but no doubt still has a
reasonable solution. I'm not that worried about it; my feeling is: if we
fix up the MM so it doesn't bog down with a lot of pages in cache and, in
addition, do better readahead, interactive performance will be just fine.

> > > Just like
> > > (not that I'm proposing it) putting those "one-shot" pages directly
> > > on the inactive-clean list instead of the active list. How an
> > > application could declare such a behaviour is an open question, of
> > > course. Maybe it's even possible to detect it. And BTW that's really
> > > fine tuning. Evicting an 8 hours old page may be a mistake sometime,
> > > but it's never a *big* mistake.
> >
> > IMHO, updatedb *should* evict all the "interactive" pages that aren't
> > actually doing anything[1]. That way it should run faster, provided
> > of course its accessed-once pages are properly given low priority.
>
> So in the morning you find your Gnome session completely on swap,
> and at the same time a lot of free mem.

> > I see three page priority levels:
> >
> > 0 - accessed-never/aged to zero
> > 1 - accessed-once/just loaded
> > 2 - accessed-often
> >
> > with these transitions:
> >
> > 0 -> 1, if a page is accessed
> > 1 -> 2, if a page is accessed a second time
> > 1 -> 0, if a page gets old
> > 2 -> 0, if a page gets old
> >
> > The 0 and 1 level pages are on a fifo queue, the 2 level pages are
> > scanned clock-wise, relying on the age computation[2]. Eviction
> > candidates are taken from the cold end of the 0 level list, unless it
> > is empty, in which case they are taken from the 1 level list. In
> > desperation, eviction candidates are taken from the 2 level list,
> > i.e., random eviction policy, as opposed to what we do now which is
> > to initiate an emergency scan of the active list for new inactive
> > candidates - rather like calling a quick board meeting when the
> > building is on fire.
>
> Well, it's just aging faster when it's needed. Random evicting is not
> good.

It's better than getting bogged down in scanning latency just at the
point you should be starting new writeouts. Obviously, it's a tradeoff.

> List 2 is ordered by age, and there are always better candidates
> at the end of the list than at the front. The higher the pressure,
> the shorter is the time a page has to rest idle to get at the end of
> the list. But the list *is* ordered.

No, list 2 is randomly ordered. Pages move from the initial trial list
to the active list with 0 temperature, and drop in just behind the
one-hand scan pointer (which we actually implement as the head of the
list). After that they get "aged" up or down as we do now. (New
improved terminology: heated or cooled according to the referenced bit.)

> > Note that the above is only a very slight departure from the current
> > design. And by the way, this is just brainstorming, it hasn't reached
> > the 'proposal' stage yet.
> >
> > [1] It would be nice to have a mechanism whereby the evicted
> > 'interactive' pages are automatically reloaded when updatedb has
> > finished its work. This is a case of scavenging unused disk bandwidth
> > for something useful, i.e., improving the interactive experience.
>
> updatedb doesn't really need all the memory it takes. All it needs is
> a small buffer to sequentially scan all the disk. So we should just
> drop all the pages it references, since we already know they won't be
> referenced again by anyone else.
Re: VM Requirement Document - v0.0
On Wednesday 04 July 2001 10:32, Marco Colombo wrote:
> On Tue, 3 Jul 2001, Daniel Phillips wrote:
> > On Monday 02 July 2001 20:42, Rik van Riel wrote:
> > > On Thu, 28 Jun 2001, Marco Colombo wrote:
> > > > I'm not sure that, in general, recent pages with only one access
> > > > are still better eviction candidates compared to 8 hours old
> > > > pages. Here we need either another way to detect one-shot
> > > > activity (like the one performed by updatedb),
> > >
> > > Fully agreed, but there is one problem with this idea.
> > > Suppose you have a maximum of 20% of your RAM for these
> > > "one-shot" things, now how are you going to be able to
> > > page in an application with a working set of, say, 25%
> > > the size of RAM ?
> >
> > Easy. What's the definition of working set? Those pages that are
> > frequently referenced. So as the application starts up some of its
> > pages will get promoted from used-once to used-often. (On the other
> > hand, the target behavior here conflicts with the goal of grouping
> > several temporally-related accesses to the same page together as one
> > access, so there's a subtle distinction to be made here, see below.)
> [...]
>
> In Rik's example, the ws is larger than available memory. Part of it
> (the "hottest" one) will get double-accesses, but other pages will keep
> contending the few available (physical) pages with no chance of being
> accessed twice. But see my previous posting...

But that's exactly what we want. Note that the idea of reserving a fixed
amount of memory for "one-shot" pages wasn't mine. I see no reason to
set a limit. There's only one criterion: does a page get referenced
between the time it's created and when its probation period expires?
Once a page makes it into the active (level 2) set it's on an equal
footing with lots of others, and it's up to our intrepid one-hand clock
to warm it up or cool it down as appropriate.

On the other hand, if the page gets sent to death row it still has a few
chances to prove its worth before being cleaned up and sent to the
aba^H^H^H^H^H^H^H^H reclaimed. (Apologies for the multiplying
metaphors ;-)

--
Daniel
Re: VM Requirement Document - v0.0
On Tue, 3 Jul 2001, Daniel Phillips wrote:
> On Tuesday 03 July 2001 12:33, Marco Colombo wrote:
> > Oh, yes, since that PAGE_AGE_BG_INTERACTIVE_MINIMUM is applied only
> > when background aging, maybe it's not enough to keep processes like
> > updatedb from causing interactive pages to be evicted.
> > That's why I said we should have another way to detect that kind of
> > activity... well, the application could just let us know (no need to
> > embed an autotuning-genetic-page-replacement-optimizer into the
> > kernel). We should just drop all FS metadata accessed by updatedb,
> > since we know that's one-shot only, without raising pressure at all.
>
> Note that some of updatedb's metadata pages are of the accessed-often
> kind, e.g., directory blocks and inodes. A blanket low priority on all
> the pages updatedb touches just won't do.

Remember that the first message was about a laptop. At 4:00AM there's no
activity but the updatedb one (and the other cron jobs). Simply, there's
no 'accessed-often' data. Moreover, I'd bet that 90% of the metadata
touched by updatedb won't be accessed at all in the future. Laptop users
don't do find /usr/share/terminfo/ so often.

> Just like
> (not that I'm proposing it) putting those "one-shot" pages directly on
> the inactive-clean list instead of the active list. How an application
> could declare such a behaviour is an open question, of course. Maybe
> it's even possible to detect it. And BTW that's really fine tuning.
> Evicting an 8 hours old page may be a mistake sometime, but it's never
> a *big* mistake.

> IMHO, updatedb *should* evict all the "interactive" pages that aren't
> actually doing anything[1]. That way it should run faster, provided of
> course its accessed-once pages are properly given low priority.

So in the morning you find your Gnome session completely on swap, and at
the same time a lot of free mem.
> I see three page priority levels:
>
> 0 - accessed-never/aged to zero
> 1 - accessed-once/just loaded
> 2 - accessed-often
>
> with these transitions:
>
> 0 -> 1, if a page is accessed
> 1 -> 2, if a page is accessed a second time
> 1 -> 0, if a page gets old
> 2 -> 0, if a page gets old
>
> The 0 and 1 level pages are on a fifo queue, the 2 level pages are
> scanned clock-wise, relying on the age computation[2]. Eviction
> candidates are taken from the cold end of the 0 level list, unless it
> is empty, in which case they are taken from the 1 level list. In
> desperation, eviction candidates are taken from the 2 level list, i.e.,
> random eviction policy, as opposed to what we do now which is to
> initiate an emergency scan of the active list for new inactive
> candidates - rather like calling a quick board meeting when the
> building is on fire.

Well, it's just aging faster when it's needed. Random evicting is not
good. List 2 is ordered by age, and there are always better candidates
at the end of the list than at the front. The higher the pressure, the
shorter is the time a page has to rest idle to get at the end of the
list. But the list *is* ordered.

> Note that the above is only a very slight departure from the current
> design. And by the way, this is just brainstorming, it hasn't reached
> the 'proposal' stage yet.
>
> [1] It would be nice to have a mechanism whereby the evicted
> 'interactive' pages are automatically reloaded when updatedb has
> finished its work. This is a case of scavenging unused disk bandwidth
> for something useful, i.e., improving the interactive experience.

updatedb doesn't really need all the memory it takes. All it needs is a
small buffer to sequentially scan all the disk. So we should just drop
all the pages it references, since we already know they won't be
referenced again by anyone else.

> [2] I much prefer the hot/cold terminology over old/young. The latter
> gets confusing because a 'high' age is 'young'. I'd rather think of a
> high value as being 'hot'.

True. s/page->age/page->temp/g B-)

.TM.
--
Marco Colombo
Technical Manager
ESI s.r.l.
[EMAIL PROTECTED]
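The three levels and four transitions quoted above are concrete enough to write down as a toy state machine. This models only the level transitions from the proposal, not the FIFO queues, the one-hand clock, or the age computation, and the names are invented.

```c
#include <assert.h>

enum level {
    COLD      = 0,  /* accessed-never / aged to zero */
    PROBATION = 1,  /* accessed-once / just loaded   */
    ACTIVE    = 2   /* accessed-often                */
};

struct vpage { enum level level; };

/* 0 -> 1 on first access, 1 -> 2 on the second. */
static void touch_page(struct vpage *p)
{
    if (p->level == COLD)
        p->level = PROBATION;
    else if (p->level == PROBATION)
        p->level = ACTIVE;
    /* ACTIVE pages stay ACTIVE; only aging cools them. */
}

/* 1 -> 0 and 2 -> 0 when a page gets old. */
static void age_page(struct vpage *p)
{
    p->level = COLD;
}
```

Eviction order then follows from the levels alone: take COLD pages first, PROBATION pages next, and raid ACTIVE only in desperation, which is the point Daniel and Marco are arguing about above.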
Re: VM Requirement Document - v0.0
On Tue, 3 Jul 2001, Daniel Phillips wrote:
> On Monday 02 July 2001 20:42, Rik van Riel wrote:
> > On Thu, 28 Jun 2001, Marco Colombo wrote:
> > > I'm not sure that, in general, recent pages with only one access
> > > are still better eviction candidates compared to 8 hours old pages.
> > > Here we need either another way to detect one-shot activity (like
> > > the one performed by updatedb),
> >
> > Fully agreed, but there is one problem with this idea.
> > Suppose you have a maximum of 20% of your RAM for these
> > "one-shot" things, now how are you going to be able to
> > page in an application with a working set of, say, 25%
> > the size of RAM ?
>
> Easy. What's the definition of working set? Those pages that are
> frequently referenced. So as the application starts up some of its
> pages will get promoted from used-once to used-often. (On the other
> hand, the target behavior here conflicts with the goal of grouping
> several temporally-related accesses to the same page together as one
> access, so there's a subtle distinction to be made here, see below.)
[...]

In Rik's example, the ws is larger than available memory. Part of it
(the "hottest" one) will get double-accesses, but other pages will keep
contending the few available (physical) pages with no chance of being
accessed twice. But see my previous posting...

.TM.
--
Marco Colombo
Technical Manager
ESI s.r.l.
[EMAIL PROTECTED]
Re: VM Requirement Document - v0.0
On Tue, 3 Jul 2001, Daniel Phillips wrote:
> And by the way, this is just brainstorming, it hasn't reached the
> 'proposal' stage yet.

So while we're here, an idea someone proposed in #debian while discussing
this thread ([EMAIL PROTECTED], you know who you are): QoS for
application paging on desktops. Basically you designate to the kernel
which applications you want to give privileges, and it avoids swapping
them out, even if they've been idle for a long time. You designate your
desktop apps, and then when updatedb comes along they don't get kicked
(but something more intensive like a kernel compile would claim the
pages). Maybe it would be as simple as a category of apps whose pages
won't get kicked before a singly-touched page (like an updatedb or
streaming media run).

For the record, I'm impressed with the new VM design, and I think its
unbiased behaviour (once the bugs are ironed out) will be exactly what
I'm looking for in life (traditional Unix "the fair way") :) Currently
using a 4-way RS/6000 running AIX 4.2 which has been up for a long time
and is running a lot of programs (even though the active set is quite
reasonable), and decides to swap at evil times :)

Looking forward to the tweaks/settings options that will appear on this
VM over the next little while...

Cheers,

Ari Heitner
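A crude form of Ari's QoS already exists in POSIX: a process can pin its own pages with mlockall(). Unlike the priority scheme he proposes, this removes the pages from the VM's reach entirely (a kernel compile could not reclaim them either), and it needs privilege or a generous RLIMIT_MEMLOCK, so it is a blunt sketch of the idea rather than the mechanism discussed.

```c
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>

/* Pin the process's current and future pages in RAM so updatedb-style
 * cache pressure can never page them out.  Needs CAP_IPC_LOCK or a
 * large enough RLIMIT_MEMLOCK, so failure is common and must be
 * handled.  Returns 0 on success, the errno value otherwise. */
static int pin_this_process(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == 0)
        return 0;
    return errno;   /* typically ENOMEM or EPERM without privilege */
}
```

A desktop session manager could call this for the few apps the user cares about, at the cost of that memory being unavailable for everything else.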
Re: VM Requirement Document - v0.0
On Monday 02 July 2001 20:42, Rik van Riel wrote: > On Thu, 28 Jun 2001, Marco Colombo wrote: > > I'm not sure that, in general, recent pages with only one access are > > still better eviction candidates compared to 8 hours old pages. Here > > we need either another way to detect one-shot activity (like the one > > performed by updatedb), > > Fully agreed, but there is one problem with this idea. > Suppose you have a maximum of 20% of your RAM for these > "one-shot" things, now how are you going to be able to > page in an application with a working set of, say, 25% > the size of RAM ? Easy. What's the definition of working set? Those pages that are frequently referenced. So as the application starts up some of its pages will get promoted from used-once to used-often. (On the other hand, the target behavior here conflicts with the goal of grouping several temporally-related accesses to the same page together as one access, so there's a subtle distinction to be made here, see below.) The point here is that there are such things as run-once program pages, just as there are use-once file pages. Both should get low priority and be evicted early, regardless of the fact they were just loaded. > If you don't have any special measures, the pages from > this "new" application will always be treated as one-shot > pages and the process will never be able to be cached in > memory completely... The self-balancing way of doing this is to promote pages from the old end of the used-once list to the used-often (active) list at a rate corresponding to the fault-in rate, so we get more aggressive promotion of referenced-often pages during program loading, and conversely, aggressive demotion of referenced-once pages. -- Daniel
Re: VM Requirement Document - v0.0
An amendment to my previous post... > I see three page priority levels: > > 0 - accessed-never/aged to zero > 1 - accessed-once/just loaded > 2 - accessed-often > > with these transitions: > > 0 -> 1, if a page is accessed > 1 -> 2, if a page is accessed a second time > 1 -> 0, if a page gets old > 2 -> 0, if a page gets old Better: 1 -> 0, if a page was not referenced before arriving at the old end 1 -> 2, if it was Meaning that multiple accesses to pages on the level 1 list are treated as a single access. In addition, this reflects what we can do practically with the hardware referenced bit. -- Daniel
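The amended transitions can be written down as a tiny state machine. This is a toy model of the three levels (the enum names are made up for illustration; a touch while on the level-1 list only sets the referenced bit, which is inspected when the page reaches the old end):

```c
#include <assert.h>

/* Toy model of the amended scheme: a level-1 page reaching the old
 * end of its list is promoted to level 2 if its referenced bit was
 * set, and demoted to level 0 otherwise.  Multiple touches while on
 * the level-1 list thus count as a single access. */
enum level { AGED_ZERO = 0, USED_ONCE = 1, USED_OFTEN = 2 };

/* Decision taken when a page arrives at the old end of its list. */
int age_at_old_end(int level, int referenced)
{
    if (level == USED_ONCE)
        return referenced ? USED_OFTEN : AGED_ZERO;
    return AGED_ZERO;        /* level-2 pages that grow old drop to 0 */
}

/* A touch only ever lifts a level-0 page onto the level-1 list;
 * everything else is recorded via the hardware referenced bit. */
int on_access(int level)
{
    return level == AGED_ZERO ? USED_ONCE : level;
}
```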
Re: VM Requirement Document - v0.0
On Tuesday 03 July 2001 12:33, Marco Colombo wrote: > Oh, yes, since that PAGE_AGE_BG_INTERACTIVE_MINIMUM is applied only > when background aging, maybe it's not enough to keep processes like > updatedb from causing interactive pages to be evicted. > That's why I said we should have another way to detect that kind of > activity... well, the application could just let us know (no need to > embed an autotuning-genetic-page-replacement-optimizer into the kernel). > We should just drop all FS metadata accessed by updatedb, since we > know that's one-shot only, without raising pressure at all. Note that some of updatedb's metadata pages are of the accessed-often kind, e.g., directory blocks and inodes. A blanket low priority on all the pages updatedb touches just won't do. > Just like > (not that I'm proposing it) putting those "one-shot" pages directly on > the inactive-clean list instead of the active list. How an application > could declare such a behaviour is an open question, of course. Maybe it's > even possible to detect it. And BTW that's really fine tuning. > Evicting an 8 hours old page may be a mistake sometime, but it's never > a *big* mistake. IMHO, updatedb *should* evict all the "interactive" pages that aren't actually doing anything[1]. That way it should run faster, provided of course its accessed-once pages are properly given low priority. I see three page priority levels: 0 - accessed-never/aged to zero 1 - accessed-once/just loaded 2 - accessed-often with these transitions: 0 -> 1, if a page is accessed 1 -> 2, if a page is accessed a second time 1 -> 0, if a page gets old 2 -> 0, if a page gets old The 0 and 1 level pages are on a fifo queue, the 2 level pages are scanned clock-wise, relying on the age computation[2]. Eviction candidates are taken from the cold end of the 0 level list, unless it is empty, in which case they are taken from the 1 level list. 
In desperation, eviction candidates are taken from the 2 level list, i.e., random eviction policy, as opposed to what we do now which is to initiate an emergency scan of the active list for new inactive candidates - rather like calling a quick board meeting when the building is on fire. Note that the above is only a very slight departure from the current design. And by the way, this is just brainstorming, it hasn't reached the 'proposal' stage yet. [1] It would be nice to have a mechanism whereby the evicted 'interactive' pages are automatically reloaded when updatedb has finished its work. This is a case of scavenging unused disk bandwidth for something useful, i.e., improving the interactive experience. [2] I much prefer the hot/cold terminology over old/young. The latter gets confusing because a 'high' age is 'young'. I'd rather think of a high value as being 'hot'. -- Daniel
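The victim-selection order described above (cold end of the level-0 fifo first, then level 1, then level 2 only in desperation) is simple enough to sketch. Queue contents are reduced to counts here, purely for illustration:

```c
#include <assert.h>

/* Sketch of the selection order from the post: take eviction
 * candidates from the level-0 fifo, fall back to the level-1 fifo,
 * and only in desperation pull from the level-2 (accessed-often)
 * set, which amounts to a random eviction policy. */
int choose_victim_level(int n0, int n1, int n2)
{
    if (n0 > 0) return 0;   /* aged-to-zero pages: free candidates */
    if (n1 > 0) return 1;   /* accessed-once pages: next cheapest */
    if (n2 > 0) return 2;   /* desperation: evict from the hot set */
    return -1;              /* out of pages entirely */
}
```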
Re: VM Requirement Document - v0.0
On Mon, 2 Jul 2001, Rik van Riel wrote: > On Thu, 28 Jun 2001, Marco Colombo wrote: > > > I'm not sure that, in general, recent pages with only one access are > > still better eviction candidates compared to 8 hours old pages. Here > > we need either another way to detect one-shot activity (like the one > > performed by updatedb), > > Fully agreed, but there is one problem with this idea. > Suppose you have a maximum of 20% of your RAM for these > "one-shot" things, now how are you going to be able to > page in an application with a working set of, say, 25% > the size of RAM ? > > If you don't have any special measures, the pages from > this "new" application will always be treated as one-shot > pages and the process will never be able to be cached in > memory completely... I see your point. Running Gnome on a 64MB box means you have most of the pages that are "warm" (using my definition), so there's little room for "cold" (new) pages, and maybe they don't get a chance of being accessed a second time before they are evicted, which leads to thrashing if you're trying to start something really big (well, I guess the access pattern within a typical ws is not uniformly distributed, so some pages will get accessed twice, but I see the problem). I'll try and make my point a bit clearer. I was referring to background aging only. When aging is caused by pressure, you don't make any difference between pages. I don't know how the idea to give high values for page->age on the second access instead of the first is going to be implemented, but I'm assuming that new pages are going to be placed on the active list with a low age value (PAGE_AGE_START_FIRST ?), maybe even 0 (well, I'm not a guru of course). 
I'm just saying that, to avoid Mike's "problem" (which BTW I don't believe is a big one, really), we could stop background aging on interactive pages (short form for "pages that belong to the ws of an interactive process") at a certain minimum age, say PAGE_AGE_BG_INTERACTIVE_MINIMUM, with PAGE_AGE_BG_INTERACTIVE_MINIMUM > PAGE_AGE_START_FIRST. Weighting the difference between the two ages, you can give long-standing interactive pages some advantage vs new pages. But they will be aged below PAGE_AGE_START_FIRST and eventually moved to the inactive list. After all, they *are* good candidates. Does this make some sense? Oh, yes, since that PAGE_AGE_BG_INTERACTIVE_MINIMUM is applied only during background aging, maybe it's not enough to keep processes like updatedb from causing interactive pages to be evicted. That's why I said we should have another way to detect that kind of activity... well, the application could just let us know (no need to embed an autotuning-genetic-page-replacement-optimizer into the kernel). We should just drop all FS metadata accessed by updatedb, since we know that's one-shot only, without raising pressure at all. Just like (not that I'm proposing it) putting those "one-shot" pages directly on the inactive-clean list instead of the active list. How an application could declare such a behaviour is an open question, of course. Maybe it's even possible to detect it. And BTW that's really fine tuning. Evicting an 8-hour-old page may be a mistake sometimes, but it's never a *big* mistake. > > Rik > -- > Virtual memory is like a game you can't win; > However, without VM there's truly nothing to lose... > > http://www.surriel.com/ http://distro.conectiva.com/ > > Send all your spam to [EMAIL PROTECTED] (spam digging piggy) .TM. -- / / / / / / Marco Colombo ___/ ___ / / Technical Manager / / / ESI s.r.l. 
_/ _/ _/ [EMAIL PROTECTED]
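Marco's clamp could look something like this. The constants and the interactive flag are illustrative assumptions (PAGE_AGE_BG_INTERACTIVE_MINIMUM is his proposed name, not an existing kernel tunable):

```c
#include <assert.h>

/* Sketch of the proposal: background aging never takes an
 * interactive page below a floor, while pressure-driven aging
 * treats all pages alike.  Values are made up for illustration. */
#define PAGE_AGE_START_FIRST            2
#define PAGE_AGE_BG_INTERACTIVE_MINIMUM 4

int age_down(int age, int interactive, int background)
{
    /* Only background aging of interactive pages gets a floor. */
    int floor = (background && interactive)
                    ? PAGE_AGE_BG_INTERACTIVE_MINIMUM : 0;
    return age > floor ? age - 1 : floor;
}
```

Because the floor sits above PAGE_AGE_START_FIRST, a long-standing interactive page keeps a head start over a freshly faulted-in page, yet still loses it once real pressure (background == 0) arrives.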
Re: VM Requirement Document - v0.0
On Thu, 28 Jun 2001, Marco Colombo wrote: > I'm not sure that, in general, recent pages with only one access are > still better eviction candidates compared to 8 hours old pages. Here > we need either another way to detect one-shot activity (like the one > performed by updatedb), Fully agreed, but there is one problem with this idea. Suppose you have a maximum of 20% of your RAM for these "one-shot" things, now how are you going to be able to page in an application with a working set of, say, 25% the size of RAM ? If you don't have any special measures, the pages from this "new" application will always be treated as one-shot pages and the process will never be able to be cached in memory completely... Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://distro.conectiva.com/ Send all your spam to [EMAIL PROTECTED] (spam digging piggy)
Re: VM Requirement Document - v0.0
[...] > immediate: RAM, on-chip cache, etc. > fast: Flash reads, ROMs, etc. > medium: Hard drives, CD-ROMs, 100Mb ethernet, etc. > slow: Flash writes, floppy disks, CD-WR burners > packeted: Reads/writes should be in as large a packet as possible > > Embedded Case [...] > Desktop Case I'm not sure there's any point in separating the cases like this. The complex part of the VM is the caching part => to be a good cache you must take into account the speed of accesses to the cached medium, including warm-up times for sleepy drives etc. It would be really cool if the VM could do that, so e.g. in the ideal world you could connect up a slow hard drive and have its contents cached as swap on your fast hard drive(!) (not a new idea btw and already implemented elsewhere). I.e. from the point of view of the VM a computer is just a group of data storage units and it's allowed to use up certain parts of each one to do stuff [...] -- http://ape.n3.net
Re: VM Requirement Document - v0.0
On Thu, 28 Jun 2001, Daniel Phillips wrote: > On Thursday 28 June 2001 14:20, [EMAIL PROTECTED] wrote: > > > If individual pages could be classified as code (text segments), > > > data, file cache, and so on, I would specify costs to the paging > > > of such pages in or out. This way I can make the system perfer > > > to drop a file cache page that has not been accessed for five > > > minutes, over a program text page that has not been acccessed > > > for one hour (or much more). > > > > This would be extremely useful. My laptop has 256mb of ram, but every day > > it runs the updatedb for locate. This fills the memory with the file > > cache. Interactivity is then terrible, and swap is unnecessarily used. On > > the laptop all this hard drive thrashing is bad news for battery life > > (plus the fact that laptop hard drives are not the fastest around). I > > purposely do not run more applications than can comfortably fit in the > > 256mb of memory. > > > > If fact, to get interactivity back, I've got a small 10 liner that mallocs > > memory to *force* stuff into swap purely so I can have a large block of > > memory back for interactivity. > > > > Something simple that did "you haven't used this file for 30mins, flush it > > out of the cache would be sufficient" > > Updatedb fills memory full of clean file pages so there's nothing to flush. > Did you mean "evict"? Well, I believe all inodes get dirtied for access time update, unless the FS is mounted no_atime. And it does write its database file... > Roughly speaking we treat clean pages as "instantly relaimable". Eviction > and reclaiming are done in the same step (look at reclaim_page). The key to > efficient mm is nothing more or less than choosing the best victim for > reclaiming and we aren't doing a spectacularly good job of that right now. 
> > There is a simple change in strategy that will fix up the updatedb case quite > nicely, it goes something like this: a single access to a page (e.g., reading > it) isn't enough to bring it to the front of the LRU queue, but accessing it > twice or more is. This is being looked at. You mean that pages that belong to interactive applications (working sets) won't be evicted to make room for the cache? And that pages just filled with data read by updatedb will be chosen instead (a kind of drop-behind)? There's nothing really wrong with the kernel "swapping out" interactive applications at 4 a.m.; their pages have the property of both not being accessed recently and (the kernel doesn't know, of course) not going to be useful in the near future (say for another 4 hours). In the end they *are* good candidates for eviction. > Note that we don't actually use an LRU queue, we use a more efficient > approximation called aging, so the above is not a recipe for implementation. I'm not sure that, in general, recent pages with only one access are still better eviction candidates compared to 8 hours old pages. Here we need either another way to detect one-shot activity (like the one performed by updatedb), or to keep pages that belong to the working set of interactive processes somewhat "warm", and never let them age too much. A page with only one (read) access can be "cold". A page with more than one access becomes "hot". Aging moves pages towards the "cold" state, and of course "cold" pages are the best candidates for eviction. Pages belonging to interactive processes are never moved from the "warm" state into the "cold" state by the background aging. Maybe this can be implemented by just leaving such pages on the active list, and deactivating them only on pressure. Or not letting their age reach 0. (Well, I'm not really into the current VM implementation. I guess that those single-access pages will be placed on the end of the active list with age 0, or something like that). 
If I understand the current VM code, after 8 hours of idle time, all pages of interactive applications will be on the inactive(_clean?) list, ready for eviction. Even if you place new pages (the updatedb activity) at the *end* of the active list (instead of the front), it won't be enough to prevent application pages from being evicted. It won't solve Mike's problem, that is. > > -- > Daniel .TM. -- / / / / / / Marco Colombo ___/ ___ / / Technical Manager / / / ESI s.r.l. _/ _/ _/ [EMAIL PROTECTED]
Re: VM Requirement Document - v0.0
On Thursday 28 June 2001 17:21, Jonathan Morton wrote: > >There is a simple change in strategy that will fix up the updatedb case > > quite nicely, it goes something like this: a single access to a page > > (e.g., reading it) isn't enough to bring it to the front of the LRU > > queue, but accessing it twice or more is. This is being looked at. > > Say, when a page is created due to a page fault, page->age is set to > zero instead of whatever it is now. This isn't quite enough. We do want to be able to assign a ranking to members of the accessed-once set, and we do want to distinguish between newly created pages and pages that have aged all the way to zero. > Then, on the first access, it is > incremented to one. All accesses where page->age was previously zero > cause it to be incremented to one, and subsequent accesses where > page->age is non-zero cause a doubling rather than an increment. > This gives a nice heavy priority boost to frequently-accessed pages... While on that topic, could somebody please explain to me why exponential aging is better than linear aging by a suitably chosen increment? It's clear what's wrong with it: after 32 hits you lose all further information. I suspect there are more problems with it than that. -- Daniel
Re: VM Requirement Document - v0.0
>There is a simple change in strategy that will fix up the updatedb case quite >nicely, it goes something like this: a single access to a page (e.g., reading >it) isn't enough to bring it to the front of the LRU queue, but accessing it >twice or more is. This is being looked at. Say, when a page is created due to a page fault, page->age is set to zero instead of whatever it is now. Then, on the first access, it is incremented to one. All accesses where page->age was previously zero cause it to be incremented to one, and subsequent accesses where page->age is non-zero cause a doubling rather than an increment. This gives a nice heavy priority boost to frequently-accessed pages... >Note that we don't actually use a LRU queue, we use a more efficient >approximation called aging, so the above is not a recipe for implementation. Maybe it is, but in a slightly lateral manner as above. -- -- from: Jonathan "Chromatix" Morton mail: [EMAIL PROTECTED] (not for attachments) website: http://www.chromatix.uklinux.net/vnc/ geekcode: GCS$/E dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*) tagline: The key to knowledge is not to rely on people to teach you it. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
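Jonathan's increment-then-double rule, sketched with saturation. It also makes Daniel's objection from elsewhere in the thread concrete: after roughly 31 doublings a 32-bit age carries no further information (toy code, not a kernel patch):

```c
#include <assert.h>
#include <limits.h>

/* Sketch of the proposal: first touch sets page->age to 1, each
 * subsequent touch doubles it.  With saturation at INT_MAX the
 * counter stops distinguishing hot pages after ~31 doublings,
 * which is the stated drawback of exponential aging. */
int touch(int age)
{
    if (age == 0)
        return 1;            /* first access */
    if (age > INT_MAX / 2)
        return INT_MAX;      /* saturated: further hits are lost */
    return age * 2;          /* subsequent accesses double the age */
}
```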
Re: VM Requirement Document - v0.0
On Thursday 28 June 2001 15:37, Alan Cox wrote: > > The problem with updatedb is that it pushes all applications to the swap, > > and when you get back in the morning, everything has to be paged back > > from swap just because the (stupid) OS is prepared for yet another > > updatedb run. > > Updatedb is a bit odd in that it mostly sucks in metadata and the buffer to > page cache balancing is a bit suspect IMHO. For Ext2, most or all of that metadata will be moved into the page cache early in 2.5, and other filesystems will likely follow that lead. That's not to say the buffer/page cache balancing shouldn't get attention, just that this particular problem will die by itself. -- Daniel
Re: VM Requirement Document - v0.0
On Thursday 28 June 2001 14:20, [EMAIL PROTECTED] wrote: > > If individual pages could be classified as code (text segments), > > data, file cache, and so on, I would specify costs to the paging > > of such pages in or out. This way I can make the system prefer > > to drop a file cache page that has not been accessed for five > > minutes, over a program text page that has not been accessed > > for one hour (or much more). > > This would be extremely useful. My laptop has 256MB of RAM, but every day > it runs the updatedb for locate. This fills the memory with the file > cache. Interactivity is then terrible, and swap is unnecessarily used. On > the laptop all this hard drive thrashing is bad news for battery life > (plus the fact that laptop hard drives are not the fastest around). I > purposely do not run more applications than can comfortably fit in the > 256MB of memory. > > In fact, to get interactivity back, I've got a small 10-liner that mallocs > memory to *force* stuff into swap purely so I can have a large block of > memory back for interactivity. > > Something simple that did "you haven't used this file for 30 mins, flush it > out of the cache" would be sufficient. Updatedb fills memory full of clean file pages so there's nothing to flush. Did you mean "evict"? Roughly speaking we treat clean pages as "instantly reclaimable". Eviction and reclaiming are done in the same step (look at reclaim_page). The key to efficient mm is nothing more or less than choosing the best victim for reclaiming and we aren't doing a spectacularly good job of that right now. There is a simple change in strategy that will fix up the updatedb case quite nicely, it goes something like this: a single access to a page (e.g., reading it) isn't enough to bring it to the front of the LRU queue, but accessing it twice or more is. This is being looked at. 
Note that we don't actually use an LRU queue, we use a more efficient approximation called aging, so the above is not a recipe for implementation. -- Daniel
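The "touch twice to stay" rule resembles the 2Q-style scheme: a first access parks a page on a probationary list, and only a repeat access promotes it to a protected list that updatedb's one-shot pages never reach. A minimal sketch (list names are illustrative, not from any kernel tree):

```c
#include <assert.h>

/* Sketch of two-list promotion: a first touch puts a page on a
 * probationary fifo (cheap to reclaim), and only a second touch
 * promotes it to the protected list.  One-shot streams then age
 * out of probation without displacing the protected set. */
enum list_id { NOWHERE = -1, PROBATION = 0, PROTECTED = 1 };

int next_list(int current)
{
    switch (current) {
    case NOWHERE:   return PROBATION;  /* single read: a candidate victim */
    case PROBATION: return PROTECTED;  /* second touch: working set */
    default:        return PROTECTED;  /* already protected: stay */
    }
}
```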
Re: VM Requirement Document - v0.0
> > Updatedb is a bit odd in that it mostly sucks in metadata and the buffer to > > page cache balancing is a bit suspect IMHO. > > In 2.4.6-pre, the buffer cache is no longer used for metadata, right? For ext2 directory blocks the page cache is now used.
Re: VM Requirement Document - v0.0
On Thu, 28 Jun 2001, Alan Cox wrote: > > > That isn't really down to labelling pages, what you are talking about is what > > > you get for free when page aging works right (eg 2.0.39) but don't get in > > > 2.2 - and don't yet (although it's coming) quite get right in 2.4.6pre. > > > > Correct, but all pages are not equal. > > That is the whole point of page aging done right. The use of a page dictates > how it is aged before being discarded. So pages referenced once are aged > rapidly, but once they get touched a couple of times then you know they aren't > streaming I/O. There are other related techniques like punishing pages that > are touched when streaming I/O is done to pages further down the same file - > FreeBSD does this one for example Are you saying that classification of pages will not be useful? Only looking at the page access patterns can certainly reveal a lot, but tuning how to punish different pages is useful. > > The problem with updatedb is that it pushes all applications to the swap, > > and when you get back in the morning, everything has to be paged back from > > swap just because the (stupid) OS is prepared for yet another updatedb > > run. > > Updatedb is a bit odd in that it mostly sucks in metadata and the buffer to > page cache balancing is a bit suspect IMHO. In 2.4.6-pre, the buffer cache is no longer used for metadata, right? /Tobias
Re: VM Requirement Document - v0.0
> > That isn't really down to labelling pages, what you are talking about is what > > you get for free when page aging works right (eg 2.0.39) but don't get in > > 2.2 - and don't yet (although it's coming) quite get right in 2.4.6pre. > > Correct, but all pages are not equal. That is the whole point of page aging done right. The use of a page dictates how it is aged before being discarded. So pages referenced once are aged rapidly, but once they get touched a couple of times then you know they aren't streaming I/O. There are other related techniques like punishing pages that are touched when streaming I/O is done to pages further down the same file - FreeBSD does this one for example. > The problem with updatedb is that it pushes all applications to the swap, > and when you get back in the morning, everything has to be paged back from > swap just because the (stupid) OS is prepared for yet another updatedb > run. Updatedb is a bit odd in that it mostly sucks in metadata and the buffer to page cache balancing is a bit suspect IMHO. Alan
Re: VM Requirement Document - v0.0
On Thu, 28 Jun 2001, Alan Cox wrote: > > This would be extremely useful. My laptop has 256MB of RAM, but every day > > it runs the updatedb for locate. This fills the memory with the file > > cache. Interactivity is then terrible, and swap is unnecessarily used. On > > the laptop all this hard drive thrashing is bad news for battery life > > That isn't really down to labelling pages, what you are talking about is what > you get for free when page aging works right (eg 2.0.39) but don't get in > 2.2 - and don't yet (although it's coming) quite get right in 2.4.6pre. Correct, but all pages are not equal. The problem with updatedb is that it pushes all applications to the swap, and when you get back in the morning, everything has to be paged back from swap just because the (stupid) OS is prepared for yet another updatedb run. Other bad activities include copying lots of files, tar/untarring and CD writing. They all cause unwanted paging, at least for the desktop user. /Tobias
Re: VM Requirement Document - v0.0
Stefan Hoffmeister <[EMAIL PROTECTED]> writes: [...] > Windows NT/2000 has flags that can be set for each CreateFile operation > ("open" in Unix terms), for instance > > FILE_ATTRIBUTE_TEMPORARY > > FILE_FLAG_WRITE_THROUGH > FILE_FLAG_NO_BUFFERING > FILE_FLAG_RANDOM_ACCESS > FILE_FLAG_SEQUENTIAL_SCAN > > If Linux does not have a mechanism that would allow the signalling of > specific use cases, it might be helpful to implement such a hinting > system? madvise(2) does it on mappings IIRC -- Seeking summer job at last minute - see http://ape.n3.net/cv.html
Re: VM Requirement Document - v0.0
On 28 Jun 2001, Xavier Bestel wrote: > On 28 Jun 2001 14:02:09 +0200, Tobias Ringstrom wrote: > > > This would be very useful, I think. Would it be very hard to classify > > pages like this (text/data/cache/...)? > > How would you classify a page of perl code ? I don't know how the Perl interpreter works, but I think it byte-compiles the code and puts it in the data segment, which also would have a high paging cost. The Perl source code would be paged in/out before running binaries such as shells and the window system, but the same thing would happen to binaries with a short life-span, I suppose. Perhaps cached executables and cached data files can be classified differently as well. What I meant to ask with the question above was if it would be hard to implement the classification in the kernel. /Tobias
Re: VM Requirement Document - v0.0
On 28 Jun 2001 14:02:09 +0200, Tobias Ringstrom wrote: > This would be very useful, I think. Would it be very hard to classify > pages like this (text/data/cache/...)? How would you classify a page of perl code ? Xav
Re: VM Requirement Document - v0.0
> This would be extremely useful. My laptop has 256mb of ram, but every day > it runs the updatedb for locate. This fills the memory with the file > cache. Interactivity is then terrible, and swap is unnecessarily used. On > the laptop all this hard drive thrashing is bad news for battery life That isn't really down to labelling pages, what you are talking about is what you get for free when page aging works right (eg 2.0.39) but don't get in 2.2 - and don't yet (although it's coming) quite get right in 2.4.6pre.
Re: VM Requirement Document - v0.0
> If individual pages could be classified as code (text segments), > data, file cache, and so on, I would specify costs to the paging > of such pages in or out. This way I can make the system prefer > to drop a file cache page that has not been accessed for five > minutes, over a program text page that has not been accessed > for one hour (or much more). This would be extremely useful. My laptop has 256MB of RAM, but every day it runs the updatedb for locate. This fills the memory with the file cache. Interactivity is then terrible, and swap is unnecessarily used. On the laptop all this hard drive thrashing is bad news for battery life (plus the fact that laptop hard drives are not the fastest around). I purposely do not run more applications than can comfortably fit in the 256MB of memory. In fact, to get interactivity back, I've got a small 10-liner that mallocs memory to *force* stuff into swap purely so I can have a large block of memory back for interactivity. Something simple that did "you haven't used this file for 30 mins, flush it out of the cache" would be sufficient. Mike
Re: VM Requirement Document - v0.0
On Thu, 28 Jun 2001, Helge Hafting wrote:

> Preventing swap-thrashing at all cost doesn't help if the
> machine loses to io-thrashing instead. Performance will be
> just as much down, although perhaps more satisfying because
> people aren't that surprised if explicit file operations
> take a long time. They hate it when moving the mouse
> or something causes a disk access even if their
> apps run faster. :-(

Exactly. I still want the ability to tune the system according to my
taste. I've been thinking about this for some time, and I've specifically
tried to come up with nice tunables, completely ignoring whether they are
possible now or not.

If individual pages could be classified as code (text segments), data,
file cache, and so on, I would specify costs for paging such pages in or
out. This way I can make the system prefer to drop a file cache page that
has not been accessed for five minutes over a program text page that has
not been accessed for one hour (or much more).

This would be very useful, I think. Would it be very hard to classify
pages like this (text/data/cache/...)? Any reason why this is a bad idea?

/Tobias
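[A userspace toy model of the cost idea above. The class names and cost
numbers are invented for illustration; nothing like this existed in the
2.4 VM.]

```python
# Cost-weighted page eviction: each page class gets a cost multiplier,
# and the eviction victim is the page with the lowest
# "cost of losing it" = class_cost / seconds_idle.
from dataclasses import dataclass

# Hypothetical per-class costs: program text is expensive to refault
# (seeks and latency the user feels directly), file cache is cheap.
CLASS_COST = {"text": 100.0, "data": 50.0, "cache": 1.0}

@dataclass
class Page:
    kind: str       # "text", "data" or "cache"
    idle: float     # seconds since last access

def eviction_score(page):
    # Lower score == better eviction victim.
    return CLASS_COST[page.kind] / page.idle

def pick_victim(pages):
    return min(pages, key=eviction_score)

if __name__ == "__main__":
    pages = [Page("text", 3600.0),   # program text idle one hour
             Page("cache", 300.0)]   # file cache idle five minutes
    # Pure LRU would evict the text page; the cost model keeps it
    # and drops the five-minute-old cache page instead.
    print(pick_victim(pages).kind)
```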
Re: VM Requirement Document - v0.0
Helge Hafting wrote:
>
> Martin Knoblauch wrote:
> >
> > maybe more specific: If the hit-rate is low and the cache is already
> > 70+% of the system's memory, the chances may be slim that more cache
> > is going to improve the hit-rate.
>
> Oh, but this is possible. You can get into situations where
> the (file cache) working set needs 80% or so of memory
> to get a near-perfect hitrate, and where
> using 70% of memory will thrash madly due to the file access

that's why I said "maybe" :-) Sure, another 5% of cache may improve
things, but it may also kill the interactive performance. That's why
there should probably be more than one VM strategy, to accommodate
servers and workstations/laptops.

> pattern. And this won't be a problem either, if
> the working set of "other" (non-file)
> stuff is below 20% of memory. The total size of
> non-file stuff may be above 20% though, so something goes
> into swap.

And that is the problem. Too much seems to go into swap, at least for
interactive work. Unfortunately, with 128MB of memory I cannot entirely
turn off swap. I will see how things are going once I have 256 or 512 MB
(hopefully soon :-)

> I definitely want the machine to work under such circumstances,
> so an arbitrary limit of 70% won't work.

Do not take the 70% as an arbitrary limit. I never said that. The 70% is
just my situation. The problems may arise at 60% cache or at 97.38%
cache.

> Preventing swap-thrashing at all cost doesn't help if the

Never said at all cost.

> machine loses to io-thrashing instead. Performance will be
> just as much down, although perhaps more satisfying because
> people aren't that surprised if explicit file operations
> take a long time. They hate it when moving the mouse
> or something causes a disk access even if their
> apps run faster. :-(

Absolutely true. And if the main purpose of the machine is interactive
work (we do want to make Linux a success on the desktop, don't we?), it
should not be hampered by an IO improvement that may be only of secondary
importance to the user (who is the final "customer" for all the work that
is done on the kernel :-). On big servers a little paging now and then
may be absolutely OK, as long as the IO is going strong.

I have been observing the discussions of VM behaviour in 2.4.x for some
time. They are mostly very entertaining and revealing. But they also show
that one solution does not seem to benefit all possible scenarios.
Therefore either more than one VM strategy is necessary, or better means
of tuning the cache behaviour, or both. Definitely better ways of
measuring the VM efficiency seem to be needed. While implementing VM
strategies is probably out of the question for a lot of the people that
complain, I hope that at least my complaints are kind of useful.

Martin
--
Martin Knoblauch      | email:  [EMAIL PROTECTED]
TeraPort GmbH         | Phone:  +49-89-510857-309
C+ITS                 | Fax:    +49-89-510857-111
http://www.teraport.de | Mobile: +49-170-4904759
Re: VM Requirement Document - v0.0
Martin Knoblauch wrote:
>
> maybe more specific: If the hit-rate is low and the cache is already
> 70+% of the system's memory, the chances may be slim that more cache
> is going to improve the hit-rate.

Oh, but this is possible. You can get into situations where the (file
cache) working set needs 80% or so of memory to get a near-perfect
hitrate, and where using 70% of memory will thrash madly due to the file
access pattern. And this won't be a problem either, if the working set of
"other" (non-file) stuff is below 20% of memory. The total size of
non-file stuff may be above 20% though, so something goes into swap.

I definitely want the machine to work under such circumstances, so an
arbitrary limit of 70% won't work.

Preventing swap-thrashing at all cost doesn't help if the machine loses
to io-thrashing instead. Performance will be just as much down, although
perhaps more satisfying because people aren't that surprised if explicit
file operations take a long time. They hate it when moving the mouse or
something causes a disk access even if their apps run faster. :-(

Helge Hafting
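[The cliff Helge describes - near-perfect hitrate at 80% of memory,
madly thrashing at 70% - is easy to reproduce with a toy LRU cache and a
cyclic access pattern. Sizes here are made up; the effect is generic.]

```python
# Simulate an LRU cache against a cyclic scan of a fixed working set.
# When the cache is even slightly smaller than the working set, LRU
# always evicts exactly the page that is about to be needed again.
from collections import OrderedDict

def lru_hitrate(cache_size, working_set, accesses):
    cache, hits = OrderedDict(), 0
    for i in range(accesses):
        key = i % working_set            # cyclic scan of the working set
        if key in cache:
            hits += 1
            cache.move_to_end(key)       # mark most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / accesses

if __name__ == "__main__":
    # 100-page cache over an 80-page working set: hits almost always.
    # 70-page cache over the same working set: hits *never*.
    print(lru_hitrate(100, 80, 10_000))
    print(lru_hitrate(70, 80, 10_000))
```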
Re: VM Requirement Document - v0.0
Rik van Riel wrote:
>
> On Wed, 27 Jun 2001, Martin Knoblauch wrote:
>
> > I do not care much whether the cache is using 99% of the system's
> > memory or 50%. As long as there is free memory, using it for cache
> > is great. I care a lot if the cache takes down interactivity,
> > because it pushes out processes that it thinks idle, but that I
> > need in 5 seconds. The cache's pressure against processes
>
> Too bad that processes are in general cached INSIDE the cache.
>
> You'll have to write a new balancing story now ;)

maybe that is part of "the answer" :-)

Martin
Re: VM Requirement Document - v0.0
On Wed, 27 Jun 2001, Martin Knoblauch wrote:

> I do not care much whether the cache is using 99% of the system's
> memory or 50%. As long as there is free memory, using it for cache is
> great. I care a lot if the cache takes down interactivity, because it
> pushes out processes that it thinks idle, but that I need in 5
> seconds. The cache's pressure against processes

Too bad that processes are in general cached INSIDE the cache.

You'll have to write a new balancing story now ;)

regards,

Rik
--
Executive summary of a recent Microsoft press release:

  "we are concerned about the GNU General Public License (GPL)"

http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/
Re: VM Requirement Document - v0.0
> Rik> ... but I fail to see this one. If we get a low cache hit rate,
> Rik> couldn't that just mean we allocated too little memory for the
> Rik> cache ?
>
> Or that we're doing big sequential reads of file(s) which are larger
> than memory, in which case expanding the cache size buys us nothing,
> and can actually hurt us a lot.

I've got an idea about how to handle this situation generally (without
sending 'tips' to the kernel via madvise() or anything similar). Instead
of sorting cached pages (I mean blocks of files) by last touch time, and
dropping the oldest page(s) if we're short on memory, I would propose
this nicer algorithm (this is relevant only to the read cache):

Suppose that f1,f2,...fN files are cached, their sizes are s1,s2,...sN
and they were last touched t1,t2,...tN seconds ago. (t1

> I personally don't feel that the cache should be allowed to grow over
> 50% of the system's memory at all, we've got so much in the cache at
> that point, that we're probably not hitting it all that much.
>
> This is why the discussion on the other cache scanning algorithm
> (2Q+?) was so interesting, since it looked to handle both the LRU
> vs. FIFO tradeoffs very nicely.
Re: VM Requirement Document - v0.0
On Tue, 26 Jun 2001, Rik van Riel wrote:

> On Tue, 26 Jun 2001, John Stoffel wrote:
>
> > >> * If we're getting low cache hit rates, don't flush
> > >> processes to swap.
> > >> * If we're getting good cache hit rates, flush old, idle
> > >> processes to swap.
> >
> > Rik> ... but I fail to see this one. If we get a low cache hit rate,
> > Rik> couldn't that just mean we allocated too little memory for the
> > Rik> cache ?
> >
> > Or that we're doing big sequential reads of file(s) which are
> > larger than memory, in which case expanding the cache size buys
> > us nothing, and can actually hurt us a lot.
>
> That's a big "OR". I think we should have an algorithm to
> see which of these two is the case, otherwise we're just
> making the wrong decision half of the time.
>
> Also, in many systems we'll be doing IO on _multiple_ files
> at the same time, so I guess this will have to be a file-by-file
> decision.

Of course, you can always think of a "bad" behaviour. That should really
be a page-by-page decision. An application may have both data and
meta-data in the same file. You want to keep the metadata in core (think
of access by an index: it's much better if all of the index is there,
even some unused parts) *and* cache commonly used data (that's just a
cache of hot objects; normal replacement algorithms may be used) *and*
drop-behind data on sequential scans... trying to understand what an
application is doing, in order to foresee what it will be doing, is bad
attitude. Let's give an application writer a way to code it sanely
(setting per-file VM attributes is fine). If an application is not
friendly (gives no hints on its VM behaviour), just punish it. I mean,
when tuning the VM behaviour, system health and friendly applications'
performance are the goals - do whatever necessary to preserve them, even
kill the offender and rm its executable if someone runs it again
(*grin*) B-).

.TM.
--
Marco Colombo
Technical Manager
ESI s.r.l.
[EMAIL PROTECTED]
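[Per-file hints of the kind Marco asks for did eventually materialize as
posix_fadvise(), well after this thread. A sketch of the sequential-scan
plus drop-behind case from Python, on a platform that supports it; the
function name and window size are illustrative.]

```python
# Stream a file sequentially while advising the kernel per-fd:
# POSIX_FADV_SEQUENTIAL asks for aggressive read-ahead, and
# POSIX_FADV_DONTNEED after each chunk asks the kernel to drop that
# chunk's cached pages (the "drop-behind" behaviour).
import os

def stream_file(path, chunk=1 << 20):
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        total = 0
        while True:
            data = os.read(fd, chunk)
            if not data:
                break
            total += len(data)
            # We will not reuse what we just read - let it go.
            os.posix_fadvise(fd, total - len(data), len(data),
                             os.POSIX_FADV_DONTNEED)
        return total
    finally:
        os.close(fd)
```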
Re: VM Requirement Document - v0.0
On 26 Jun 2001 20:43:33 -0400, Dan Maas wrote:

> > Windows NT/2000 has flags that can be set for each CreateFile
> > operation ("open" in Unix terms), for instance
> >
> > FILE_ATTRIBUTE_TEMPORARY
> > FILE_FLAG_WRITE_THROUGH
> > FILE_FLAG_NO_BUFFERING
> > FILE_FLAG_RANDOM_ACCESS
> > FILE_FLAG_SEQUENTIAL_SCAN

We do (nearly) already have O_DIRECT, which won't touch the cache (alas,
I don't think it will read ahead more).

Xav
Re: VM Requirement Document - v0.0
>> * If we're getting low cache hit rates, don't flush
>> processes to swap.
>> * If we're getting good cache hit rates, flush old, idle
>> processes to swap.

Rik> ... but I fail to see this one. If we get a low cache hit rate,
Rik> couldn't that just mean we allocated too little memory for the
Rik> cache ?

maybe more specific: If the hit-rate is low and the cache is already
70+% of the system's memory, the chances may be slim that more cache is
going to improve the hit-rate.

I do not care much whether the cache is using 99% of the system's memory
or 50%. As long as there is free memory, using it for cache is great. I
care a lot if the cache takes down interactivity, because it pushes out
processes that it thinks idle, but that I need in 5 seconds. The cache's
pressure against processes should decrease with the (relative) size of
the cache, especially in low hit-rate situations.

OT: I asked the question before somewhere else. Are there interfaces to
the VM that expose the various cache sizes and, more important,
hit-rates to userland? I would love to see (or maybe help writing, in my
free time) a tool to just visualize/analyze the efficiency of the VM
system.

Martin
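[Cache *sizes*, though not hit-rates, are already visible in
/proc/meminfo; a tool like the one Martin wants could start from
something like this. The parsing is a sketch; field availability varies
by kernel version.]

```python
# Read memory and page-cache sizes from /proc/meminfo (Linux).
# Each interesting line looks like "Cached:   81584 kB".
def meminfo(path="/proc/meminfo"):
    info = {}
    with open(path) as f:
        for line in f:
            key, sep, rest = line.partition(":")
            parts = rest.split()
            if sep and parts and parts[0].isdigit():
                info[key.strip()] = int(parts[0])   # value in kB
    return info

if __name__ == "__main__":
    m = meminfo()
    for k in ("MemTotal", "MemFree", "Buffers", "Cached"):
        print("%-10s %8d kB" % (k, m.get(k, 0)))
```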
Re: VM Requirement Document - v0.0
> I personally don't feel that the cache should be allowed to grow over
> 50% of the system's memory at all, we've got so much in the cache at
> that point, that we're probably not hitting it all that much.

That depends very much on what you're using the system for. Suppose
you're running a trivial database application on a gigantic disk array -
the name of the game is to cache as much metadata as possible, and that
goes directly to the bottom line as performance. Might as well use 90%+
of your memory for that.

The conclusion to draw here is, the balance between file cache and
process memory should be able to slide all the way from one extreme to
the other. It's not a requirement that that be fully automatic, but it's
highly desirable.

--
Daniel
Re: VM Requirement Document - v0.0
On Tue, Jun 26, 2001 at 08:43:33PM -0400, Dan Maas wrote:

> (hrm, maybe I could hack up my own manual read-ahead/drop-behind with
> mmap() and memory locking...)

Just to argue portability for a moment (portability of the expected
results, that is, vs APIs): would this technique work across a variety
of OSes? Would the recent caching difficulties of the 2.4.* series have
handled such a technique in a reasonable fashion?

mrc
--
Mike Castle  [EMAIL PROTECTED]  www.netcom.com/~dalgoda/
We are all of us living in the shadow of Manhattan.  -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc
Re: VM Requirement Document - v0.0
> Windows NT/2000 has flags that can be set for each CreateFile operation
> ("open" in Unix terms), for instance
>
> FILE_ATTRIBUTE_TEMPORARY
> FILE_FLAG_WRITE_THROUGH
> FILE_FLAG_NO_BUFFERING
> FILE_FLAG_RANDOM_ACCESS
> FILE_FLAG_SEQUENTIAL_SCAN

There is a BSD-originated convention for this - madvise(). If you look
in the Linux VM code there is a bit of explicit code for different
madvise access patterns, but I'm not sure if it's 100% supported.

Drop-behind would be really, really nice to have for my multimedia
applications. I routinely deal with very large video files (several
times larger than my RAM). When I sequentially read through such files a
bit at a time, I do NOT want the old pages sitting there in RAM while
all of my other running programs are rudely paged out...

(hrm, maybe I could hack up my own manual read-ahead/drop-behind with
mmap() and memory locking...)

Regards,
Dan
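[The mmap() hack Dan has in mind might look roughly like this - map the
file, scan it a window at a time, and madvise(MADV_DONTNEED) each window
once consumed so its pages don't crowd everything else out. A sketch
only; the window size is arbitrary and behaviour is OS-dependent.]

```python
# Manual drop-behind over a large file using mmap + madvise.
import mmap, os

WINDOW = 1 << 20   # 1 MB windows (arbitrary; page-aligned)

def checksum_streaming(path):
    fd = os.open(path, os.O_RDONLY)
    size = os.fstat(fd).st_size
    total = 0
    with mmap.mmap(fd, size, prot=mmap.PROT_READ) as m:
        m.madvise(mmap.MADV_SEQUENTIAL)   # hint: linear scan ahead
        for off in range(0, size, WINDOW):
            chunk = m[off:off + WINDOW]
            total = (total + sum(chunk)) % (1 << 32)
            # Done with this window - ask the kernel to drop its pages.
            m.madvise(mmap.MADV_DONTNEED, off, len(chunk))
    os.close(fd)
    return total
```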
Re: VM Requirement Document - v0.0
On Tue, Jun 26, 2001 at 03:48:09PM -0700, Jeffrey W. Baker wrote:

> These flags would be really handy. We already have the raw device for
> sequential reading of e.g. CDROM and DVD devices.

Not going to help 99% of the applications out there.

mrc
Re: VM Requirement Document - v0.0
On Wed, 27 Jun 2001, Stefan Hoffmeister wrote:

> : On Tue, 26 Jun 2001 18:42:56 -0300 (BRST), Rik van Riel wrote:
>
> > On Tue, 26 Jun 2001, John Stoffel wrote:
> >
> > > Or that we're doing big sequential reads of file(s) which are
> > > larger than memory, in which case expanding the cache size buys
> > > us nothing, and can actually hurt us a lot.
> >
> > That's a big "OR". I think we should have an algorithm to
> > see which of these two is the case, otherwise we're just
> > making the wrong decision half of the time.
>
> Windows NT/2000 has flags that can be set for each CreateFile
> operation ("open" in Unix terms), for instance
>
> FILE_ATTRIBUTE_TEMPORARY
> FILE_FLAG_WRITE_THROUGH
> FILE_FLAG_NO_BUFFERING
> FILE_FLAG_RANDOM_ACCESS
> FILE_FLAG_SEQUENTIAL_SCAN
>
> If Linux does not have a mechanism that would allow the signalling of
> specific use cases, it might be helpful to implement such a hinting
> system?

These flags would be really handy. We already have the raw device for
sequential reading of e.g. CDROM and DVD devices.

-jwb
Re: VM Requirement Document - v0.0
: On Tue, 26 Jun 2001 18:42:56 -0300 (BRST), Rik van Riel wrote:

> On Tue, 26 Jun 2001, John Stoffel wrote:
>
> > Or that we're doing big sequential reads of file(s) which are
> > larger than memory, in which case expanding the cache size buys
> > us nothing, and can actually hurt us a lot.
>
> That's a big "OR". I think we should have an algorithm to
> see which of these two is the case, otherwise we're just
> making the wrong decision half of the time.

Windows NT/2000 has flags that can be set for each CreateFile operation
("open" in Unix terms), for instance

  FILE_ATTRIBUTE_TEMPORARY
  FILE_FLAG_WRITE_THROUGH
  FILE_FLAG_NO_BUFFERING
  FILE_FLAG_RANDOM_ACCESS
  FILE_FLAG_SEQUENTIAL_SCAN

If Linux does not have a mechanism that would allow the signalling of
specific use cases, it might be helpful to implement such a hinting
system?

Disclaimer: I am clueless about what the kernel provides at this time.
Re: VM Requirement Document - v0.0
On Tue, Jun 26, 2001 at 06:21:21PM -0300, Rik van Riel wrote:

> > * If we're getting low cache hit rates, don't flush
> > processes to swap.
> > * If we're getting good cache hit rates, flush old, idle
> > processes to swap.
>
> ... but I fail to see this one. If we get a low cache hit
> rate, couldn't that just mean we allocated too little memory
> for the cache ?

Hmmm. I didn't take that into consideration. But at the same time,
shouldn't a VM be able to determine that its cache strategy is causing
_more_ (absolute) misses by increasing its cache size? The percentage of
misses may go down, but total device I/O may stay the same.

So let's see... I'll rephrase that 'Motivation' as:

  * Minimize the total medium/slow I/Os that occur over a
    sliding window of time.

Is that a more general case?

> Also, how would we translate all these requirements into
> VM strategies ?

First, I would like to translate them into measurements. Once we know
how to measure these criteria, it's possible to formalize the feedback
mechanism/accounting that a VM should be aware of.

In the end, I would like a VM to have some idea of how well it's
performing, and to be able to attempt various well-known strategies
based upon its own performance.

--
Jason McMullan, Senior Linux Consultant
Linuxcare, Inc. 412.432.6457 tel, 412.656.3519 cell
[EMAIL PROTECTED], http://www.linuxcare.com/
Linuxcare. Putting open source to work.
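[The sliding-window measurement proposed above can be stated precisely.
A small sketch, with the window length and the event representation
invented for illustration:]

```python
# Count device I/O events inside a sliding time window, so two cache
# strategies can be compared by the absolute number of medium/slow
# I/Os they cost, not by hit percentage alone.
from collections import deque

class IOWindow:
    def __init__(self, seconds):
        self.seconds = seconds
        self.events = deque()   # timestamps of misses -> device I/Os

    def record(self, t):
        self.events.append(t)
        self._expire(t)

    def count(self, now):
        self._expire(now)
        return len(self.events)

    def _expire(self, now):
        # Drop events older than the window.
        while self.events and self.events[0] <= now - self.seconds:
            self.events.popleft()
```

A VM comparing strategies would then prefer whichever keeps
`count(now)` lowest over time, exactly the "total medium/slow I/Os"
criterion rather than a hit-rate ratio.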
Re: VM Requirement Document - v0.0
On Tue, 26 Jun 2001, John Stoffel wrote:

> >> * If we're getting low cache hit rates, don't flush
> >> processes to swap.
> >> * If we're getting good cache hit rates, flush old, idle
> >> processes to swap.
>
> Rik> ... but I fail to see this one. If we get a low cache hit rate,
> Rik> couldn't that just mean we allocated too little memory for the
> Rik> cache ?
>
> Or that we're doing big sequential reads of file(s) which are
> larger than memory, in which case expanding the cache size buys
> us nothing, and can actually hurt us a lot.

That's a big "OR". I think we should have an algorithm to see which of
these two is the case, otherwise we're just making the wrong decision
half of the time.

Also, in many systems we'll be doing IO on _multiple_ files at the same
time, so I guess this will have to be a file-by-file decision.

> I personally don't feel that the cache should be allowed to grow over
> 50% of the system's memory at all, we've got so much in the cache at
> that point, that we're probably not hitting it all that much.

Remember that disk cache includes stuff like mmap()ed executables and
swap-backed user memory. Do you really want to limit those too?

regards,

Rik
Re: VM Requirement Document - v0.0
>> * If we're getting low cache hit rates, don't flush
>> processes to swap.
>> * If we're getting good cache hit rates, flush old, idle
>> processes to swap.

Rik> ... but I fail to see this one. If we get a low cache hit rate,
Rik> couldn't that just mean we allocated too little memory for the
Rik> cache ?

Or that we're doing big sequential reads of file(s) which are larger
than memory, in which case expanding the cache size buys us nothing, and
can actually hurt us a lot.

I personally don't feel that the cache should be allowed to grow over
50% of the system's memory at all; we've got so much in the cache at
that point that we're probably not hitting it all that much.

This is why the discussion on the other cache scanning algorithm (2Q+?)
was so interesting, since it looked to handle both the LRU vs. FIFO
tradeoffs very nicely.

Rik> I am very much interested in continuing this discussion...

Me too, even if I can just contribute comments and not much code.

John
John Stoffel - Senior Unix Systems Administrator - Lucent Technologies
[EMAIL PROTECTED] - http://www.lucent.com - 978-952-7548
Re: VM Requirement Document - v0.0
On Tue, 26 Jun 2001, Jason McMullan wrote:

> If we take all the motivations from the above, and list
> them, we get:
>
> * Don't write to the (slow, packeted) devices until
>   you need to free up memory for processes.
> * Never cache reads from immediate/fast devices.
> * Keep packetized devices as continuously idle as possible.
>   Small chunks of idleness don't count. You want to have
>   maximal stretches of idleness for the device.
> * Keep running processes as fully in memory as possible.

I agree with your modification, and with the obvious 4 points above ...

> * If we're getting low cache hit rates, don't flush
>   processes to swap.
> * If we're getting good cache hit rates, flush old, idle
>   processes to swap.

... but I fail to see this one. If we get a low cache hit rate, couldn't
that just mean we allocated too little memory for the cache?

I am very much interested in continuing this discussion...

Also, how would we translate all these requirements into VM strategies?

regards,

Rik