Re: [PATCH] VM fixes + RSS limits 2.4.0-test13-pre5

2001-01-07 Thread Peter Chubb



Ingo wrote:
> On Wed, Jan 03, 2001 at 09:43:54AM -0200, Rik van Riel wrote:
> > On Fri, 28 Dec 2000, Mike Sklar wrote:
> > > If I wanted to adjust the rlim_cur value of a running
> > > process, is there any sort of interface for that?
> > 
> > Hmmm, I don't think there is an interface to adjust the
> > per-process ulimit settings on-the-fly ...
> > 
> > Does anybody know if there's an interface for this ?

> If you don't mean "kill -TERM", no there isn't. It would be evil
> to the process anyway.

The RSS limits patch I sent to linux-kernel some time ago provided an
experimental /proc interface to allow exactly this.
The patch against 2.2.16 is still on our FTP server at 

ftp://ftp-au.aurema.com/private/aurpjc31/linux-2216-rsslimit.diff.bz2

Here's the patch against 2.4.0.  The main differences between this and
Rik's patch are:
  -- You choose soft or hard limits at kernel config time with my
     patch; with Rik's you get both (rlim_cur is `soft', rlim_max is
     `hard').
  -- Rik's patch does some extra stuff to the VM code as well as
     the RSS limits.
  -- Rik's patch doesn't affect swap behaviour (except in so far
     as processes over their RSS limit will tend to swap, which reduces
     memory pressure on all other processes); with my patch,
     processes over their RSS limit suffer somewhat.
  -- My patch puts the limit into the struct mm for slightly more
     cache-friendly behaviour, and to allow later interfacing with
     per-user resource-management software (it should be possible
     to write a kernel module that adjusts RSS limits to implement
     per-user limits without affecting per-process RLIMIT values).
  -- My patch has a /proc interface to allow setting
     rlimit[RLIMIT_RSS].
  -- My patch implements the rss accounting fields so that time -v
     gives reasonable output.


Index: linux-2.4.0/CREDITS
===
RCS file: /wrk/CVSROOT/linux-2.4/CREDITS,v
retrieving revision 1.1.1.5
diff -u -b -u -r1.1.1.5 CREDITS
--- linux-2.4.0/CREDITS 2001/01/04 23:02:54 1.1.1.5
+++ linux-2.4.0/CREDITS 2001/01/08 04:41:41
@@ -491,6 +491,24 @@
 S: Stanford, California 94305
 S: USA
 
+N: Kingsley Cheung
+E: [EMAIL PROTECTED]
+D: Page fault calculation
+D: /proc/<pid>/rss support
+D: kswapd improvements regarding process RSS limits 
+S: Aurema Pty Limited
+S: PO Box 305, Strawberry Hills NSW 2012, 
+S: Australia 
+
+N: Peter Chubb
+E: [EMAIL PROTECTED]
+D: Page fault calculation
+D: /proc/<pid>/rss support
+D: kswapd improvements regarding process RSS limits 
+S: Aurema Pty Limited
+S: PO Box 305, Strawberry Hills NSW 2012, 
+S: Australia 
+
 N: Juan Jose Ciarlante
 W: http://juanjox.kernelnotes.org/
 E: [EMAIL PROTECTED]
Index: linux-2.4.0/Documentation/Configure.help
===
RCS file: /wrk/CVSROOT/linux-2.4/Documentation/Configure.help,v
retrieving revision 1.1.1.6
diff -u -b -u -r1.1.1.6 Configure.help
--- linux-2.4.0/Documentation/Configure.help 2001/01/07 21:44:33 1.1.1.6
+++ linux-2.4.0/Documentation/Configure.help 2001/01/08 04:41:41
@@ -16955,6 +16955,50 @@
   another UltraSPARC-IIi-cEngine boardset with a 7-segment display,
   you should say N to this option. 
 
+RSS Softlimits (EXPERIMENTAL)
+CONFIG_RSS_SOFTLIMIT
+  If you want the setrlimit(RLIMIT_RSS, ...) system call to work, say
+  Y either here or for RSS Hardlimits.  If you don't understand this
+  you don't need it, so say N.
+
+  RSS Softlimits will make it more likely that pages will be stolen
+  from processes that have a resident set size (i.e., real memory
+  footprint) greater than their limit.  Processes with a limit set
+  that is below their actual need may still exceed their limits, and
+  in this instance kswapd may work excessively hard.
+
+  Because of the way that RSS is measured and controlled, the limit is
+  approximate only.
+
+  It is harmless to have RSS Softlimits and RSS Hardlimits both set.
+
+RSS Hardlimits (EXPERIMENTAL)
+CONFIG_RSS_HARDLIMIT
+  If you want the setrlimit(RLIMIT_RSS, ...) system call to work, say
+  Y either here or for RSS Softlimits.  If you don't understand this
+  you don't need it, so say N.
+
+  RSS Hardlimits changes the behaviour of the kernel at page-fault
+  time.  If a process is over its RSS limit when it wants to get a new
+  page, then with this configuration option enabled the process's
+  memory space will be reduced before the page-fault continues.
+
+  Because of the way that RSS is measured and controlled, the actual
+  memory footprint of a process may exceed the set limit for a short
+  time.
+
+  It is harmless to have RSS Softlimits and RSS Hardlimits both set.
+
+Support for /proc/pid/rss (EXPERIMENTAL)
+CONFIG_PROC_RSS
+  Saying Y here adds an extra file inside each process directory in the
+  /proc file system that allows measurement and control of resident
+  set size (real m

Re: [PATCH] VM fixes + RSS limits 2.4.0-test13-pre5

2001-01-03 Thread Ingo Oeser

On Wed, Jan 03, 2001 at 09:43:54AM -0200, Rik van Riel wrote:
> On Fri, 28 Dec 2000, Mike Sklar wrote:
> > If I wanted to adjust the rlim_cur value of a running
> > process, is there any sort of interface for that?
> 
> Hmmm, I don't think there is an interface to adjust the
> per-process ulimit settings on-the-fly ...
> 
> Does anybody know if there's an interface for this ?

If you don't mean "kill -TERM", no there isn't. It would be evil
to the process anyway.

Some[1] programs query their resource limits on startup to scale to a
sane amount of memory usage for caching, operation buffers and
the like. If you readjust a limit to something smaller, they'll be
killed soon, and if you readjust it to something bigger, they won't
use it.


Regards

Ingo Oeser

[1] I would like to write "most programs", but most programs
   assume that they will never run out of memory and leave it to
   the administrator/user to take care of this issue :-(
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag 
    come and join the fun   
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: [PATCH] VM fixes + RSS limits 2.4.0-test13-pre5

2001-01-03 Thread Rik van Riel

On Fri, 28 Dec 2000, Mike Sklar wrote:

> If I wanted to adjust the rlim_cur value of a running
> process, is there any sort of interface for that?

Hmmm, I don't think there is an interface to adjust the
per-process ulimit settings on-the-fly ...

Does anybody know if there's an interface for this ?

regards,

Rik
--
Hollywood goes for world dumbination,
Trailer at 11.

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/




Re: [PATCH] VM fixes + RSS limits 2.4.0-test13-pre5 / test13-pre7

2000-12-30 Thread Dieter Nützel

Hello Rik,

I did some more benchmarks on this --- phew, that took me some time... :-)

Test machine: 256 MB, K7 550 SlotA, SCSI, IDE, ReiserFS 3.6.23, Blocksize=4K
Test: dbench 48

2.4.0-test13-pre5 + Rik's VM fix #2

/dev/sda7:
 Timing buffered disk reads:  64 MB in  6.07 seconds = 10.54 MB/sec

Throughput 7.54785 MB/sec (NB=9.43482 MB/sec  75.4785 MBit/sec)
41.200u 95.870s 13:59.50 16.3%  0+0k 0+0io 1797pf+0w

-O -mcpu=k6 -mpreferred-stack-boundary=2 -malign-functions=4
-fschedule-insns2 -fexpensive-optimizations

Throughput 7.7981 MB/sec (NB=9.74762 MB/sec  77.981 MBit/sec)
42.180u 96.620s 13:32.54 17.0%  0+0k 0+0io 1799pf+0w

--

/dev/hdc1:
 Timing buffered disk reads:  64 MB in  2.89 seconds = 22.15 MB/sec

Throughput 9.4113 MB/sec (NB=11.7641 MB/sec  94.113 MBit/sec)
36.990u 117.720s 11:13.24 22.9% 0+0k 0+0io 1505pf+0w

-O -mcpu=k6 -mpreferred-stack-boundary=2 -malign-functions=4
-fschedule-insns2 -fexpensive-optimizations

Throughput 10.254 MB/sec (NB=12.8175 MB/sec  102.54 MBit/sec)
36.620u 112.870s 10:17.91 24.1% 0+0k 0+0io 1505pf+0w

***

2.4.0-test13-pre7

/dev/sda7:
 Timing buffered disk reads:  64 MB in  6.07 seconds = 10.54 MB/sec

Throughput 9.61382 MB/sec (NB=12.0173 MB/sec  96.1382 MBit/sec)
43.950u 96.790s 10:59.06 21.3%  0+0k 0+0io 1746pf+0w

-O -mcpu=k6 -mpreferred-stack-boundary=2 -malign-functions=4
-fschedule-insns2 -fexpensive-optimizations

Throughput 10.8312 MB/sec (NB=13.539 MB/sec  108.312 MBit/sec)
44.510u 93.000s 9:44.99 23.5%   0+0k 0+0io 1795pf+0w

-

/dev/hdc1:
 Timing buffered disk reads:  64 MB in  2.89 seconds = 22.15 MB/sec

Throughput 12.3312 MB/sec (NB=15.414 MB/sec  123.312 MBit/sec)
35.220u 112.630s 8:33.83 28.7%  0+0k 0+0io 1505pf+0w

-O -mcpu=k6 -mpreferred-stack-boundary=2 -malign-functions=4
-fschedule-insns2 -fexpensive-optimizations

Throughput 14.4331 MB/sec (NB=18.0414 MB/sec  144.331 MBit/sec)
36.060u 119.760s 7:19.00 35.4%  0+0k 0+0io 1505pf+0w

Addition:
Your fix shows some 'bad' swap behavior under my 'normal' load (3D medical
visualization). It does a 'little' swapping out and in. Mostly the
(unneeded?) swap-in hurts performance. A small 'cp -a X11R6 X11R6-new' takes
more than twice as long. When my system hits the 'ZERO swap wall', the
currently running process (render) aborts immediately and restarts. With
test13-pre7 it runs several times longer (render generates some more frames),
but then the load goes up to 10 and render gets killed.

SunWave1>cat /proc/version
Linux version 2.4.0-test13-pre7 (root@SunWave1) (gcc version 2.95.2 19991024 
(release)) #1 Sat Dec 30 22:13:04 CET 2000
SunWave1>free -t
             total       used       free     shared    buffers     cached
Mem:        255728     164980      90748          0      34160      46488
-/+ buffers/cache:      84332     171396
Swap:       200772          8     200764
Total:      456500     164988     291512

Happy New Year!
I'll be back on Monday.

-Dieter

-- 
Dieter Nützel
Graduate Student, Computer Science

University of Hamburg
Department of Computer Science
Cognitive Systems Group
Vogt-Kölln-Straße 30
D-22527 Hamburg, Germany

email: [EMAIL PROTECTED]
@home: [EMAIL PROTECTED]



Re: [PATCH] VM fixes + RSS limits 2.4.0-test13-pre5

2000-12-29 Thread Dieter Nützel

On Friday, 29 December 2000 14:38, you wrote:
> On Fri, 29 Dec 2000, Dieter Nützel wrote:
> > Your patch didn't apply cleanly.
> > Do you have another version?
>
> It should apply just fine. What error messages did
> patch give ?
>
Applied #2 against my running 2.4.0-test13-pre5 + ReiserFS 3.6.23 tree and
a clean test13-pre5 (test12 + test13-pre5). Same for both of them:

SunWave1>patch -p0 -E -N http://www.tux.org/lkml/



[PATCH] VM fixes + RSS limits 2.4.0-test13-pre5

2000-12-28 Thread Rik van Riel

Hi Linus,

I know this is probably not the birthday present you've been
hoping for, but here is a patch against 2.4.0-test13-pre5 which
does the following - trivial - things:

1. trivially implement RSS ulimit support, with
   p->rlim[RLIMIT_RSS].rlim_max treated as a hard limit
   and .rlim_cur treated as a soft limit

2. fix the return value from try_to_swap_out() to return
   success whenever we make the RSS of a process smaller

3. clean up refill_inactive() ... try_to_swap_out() returns
   the expected result now, so things should be balanced again

4. only call deactivate_page() from generic_file_write() if we
   write "beyond the end of" the page, so partially written
   pages stay active and will remain in memory longer (8% more
   performance for dbench, as tested by Daniel Phillips)

5. (minor) s/unsigned int gfp_mask/int gfp_mask/ in vmscan.c
   ... we had both types used, which is rather inconsistent

Please consider including this patch in the next 2.4 pre-patch,
IMHO all of these things are fairly trivial and it seems to run
very nicely on my test box ;)

regards,

Rik
--
Hollywood goes for world dumbination,
Trailer at 11.

http://www.surriel.com/
http://www.conectiva.com/   http://distro.conectiva.com.br/


--- linux-2.4.0-test13-pre5/mm/filemap.c.orig   Thu Dec 28 19:11:39 2000
+++ linux-2.4.0-test13-pre5/mm/filemap.c        Thu Dec 28 19:28:06 2000
@@ -1912,7 +1912,7 @@
 
/* Make sure this doesn't exceed the process's max rss. */
error = -EIO;
-   rlim_rss = current->rlim ?  current->rlim[RLIMIT_RSS].rlim_cur :
+   rlim_rss = current->rlim ?  (current->rlim[RLIMIT_RSS].rlim_cur >> PAGE_SHIFT) :
LONG_MAX; /* default: see resource.h */
if ((vma->vm_mm->rss + (end - start)) > rlim_rss)
return error;
@@ -2438,7 +2438,7 @@
}
 
while (count) {
-   unsigned long bytes, index, offset;
+   unsigned long bytes, index, offset, partial = 0;
char *kaddr;
 
/*
@@ -2448,8 +2448,10 @@
offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
index = pos >> PAGE_CACHE_SHIFT;
bytes = PAGE_CACHE_SIZE - offset;
-   if (bytes > count)
+   if (bytes > count) {
bytes = count;
+   partial = 1;
+   }
 
/*
 * Bring in the user page that we will copy from _first_.
@@ -2491,9 +2493,17 @@
buf += status;
}
 unlock:
-   /* Mark it unlocked again and drop the page.. */
+   /*
+* Mark it unlocked again and release the page.
+* In order to prevent large (fast) file writes
+* from causing too much memory pressure we move
+* completely written pages to the inactive list.
+* We do, however, try to keep the pages that may
+* still be written to (ie. partially written pages).
+*/
UnlockPage(page);
-   deactivate_page(page);
+   if (!partial)
+   deactivate_page(page);
page_cache_release(page);
 
if (status < 0)
--- linux-2.4.0-test13-pre5/mm/memory.c.orig    Thu Dec 28 19:11:39 2000
+++ linux-2.4.0-test13-pre5/mm/memory.c Thu Dec 28 19:12:04 2000
@@ -1198,6 +1198,12 @@
pgd = pgd_offset(mm, address);
pmd = pmd_alloc(pgd, address);
 
+   if (mm->rss >= (current->rlim[RLIMIT_RSS].rlim_max >> PAGE_SHIFT)) {
+   lock_kernel();
+   enforce_rss_limit(mm, GFP_HIGHUSER);
+   unlock_kernel();
+   }
+
if (pmd) {
pte_t * pte = pte_alloc(pmd, address);
if (pte)
--- linux-2.4.0-test13-pre5/mm/vmscan.c.orig    Thu Dec 28 19:11:40 2000
+++ linux-2.4.0-test13-pre5/mm/vmscan.c Thu Dec 28 20:30:10 2000
@@ -49,7 +49,8 @@
if ((!VALID_PAGE(page)) || PageReserved(page))
goto out_failed;
 
-   if (mm->swap_cnt)
+   /* RSS trimming doesn't change the process' chances wrt. normal swap */
+   if (mm->swap_cnt && !(gfp_mask & __GFP_RSS_LIMIT))
mm->swap_cnt--;
 
onlist = PageActive(page);
@@ -58,7 +59,13 @@
age_page_up(page);
goto out_failed;
}
-   if (!onlist)
+   /*
+* SUBTLE: if the page is on the active list and we're not doing
+* RSS ulimit trimming, then we let refill_inactive_scan() take
+* care of the down aging. Always aging down here would severely
+* disadvantage shared mappings (of eg libc.so).
+*/
+   if (!onlist || (gfp_mask & __GFP_RSS_LIMIT))
/* The page is still mapped, so it can't be freeable... */
age_page_down_ageonly(page);
 
@@ -85,8 +92,8 @@
 * we c