Hi,

there is an ongoing discussion, initiated by Max, about whether Apache::SizeLimit does the right thing in reporting the current amount of RAM a process does not share with any other process as
    unshared_size = total_size - shared_size

Max suggests changing that definition to

    unshared_size = rss - shared_size

Besides the fact that such a change should be announced very loudly, perhaps by a new major version, because it requires current installations to adjust their parameters, I am not sure whether it is the right way. (I am talking about Linux here.)

What does that mean?
====================

The total size of a process comprises its complete address space. Normally, by far not all of this space is present in RAM. When the process accesses a part of its address space that is not present, the CPU generates an interrupt (a page fault) and the operating system reads in that piece from disk or allocates an empty page, and thus makes the accessed page present. Then the operation is repeated and this time it succeeds. The process normally is not aware of all this.

The part of the process that is really present in RAM is the RSS.

Now, Linux comes with the /proc/$PID/smaps file that reports sizes for shared and private portions of the process' address space.

How does that work?
===================

Linux organizes RAM and address spaces in fixed-size chunks, so-called pages (normally 4 kB). Now, a single page of RAM can belong to only one process or it can be used by multiple processes (for example because they use the same read-only code of the C library). So, each page has a reference count. If that refcount is 1 the page is used by only one process and hence private to it. If the refcount is >1 the page is shared among multiple processes.
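To make the figures from /proc/$PID/smaps a bit more concrete, here is a minimal sketch in Python that sums the per-mapping fields and computes both candidate definitions of unshared_size. The field names (Size, Rss, Shared_Clean, Shared_Dirty, ...) are the ones smaps prints; the function names and the sample text are made up for illustration, and this is not how Apache::SizeLimit itself is implemented.

```python
import re

# Matches smaps value lines such as "Rss:                 132 kB".
# Mapping header lines and lines without a kB value do not match.
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")

def smaps_totals(text):
    """Sum every kB-valued field over all mappings in smaps-formatted text."""
    totals = {}
    for line in text.splitlines():
        m = FIELD.match(line.strip())
        if m:
            name, kb = m.group(1), int(m.group(2))
            totals[name] = totals.get(name, 0) + kb
    return totals

def unshared_candidates(totals):
    """Return both definitions of unshared_size discussed in this mail (in kB)."""
    shared = totals.get("Shared_Clean", 0) + totals.get("Shared_Dirty", 0)
    return {
        "size_minus_shared": totals.get("Size", 0) - shared,
        "rss_minus_shared": totals.get("Rss", 0) - shared,
    }

# Made-up sample with two mappings; the numbers are illustrative only.
SAMPLE = """\
00400000-0040b000 r-xp 00000000 08:01 1234   /bin/example
Size:                 44 kB
Rss:                  40 kB
Shared_Clean:         40 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
0060b000-0062c000 rw-p 00000000 00:00 0      [heap]
Size:                132 kB
Rss:                  60 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        60 kB
"""

totals = smaps_totals(SAMPLE)
print(totals["Size"], totals["Rss"])
print(unshared_candidates(totals))
```

On a Linux box the same functions could be fed the contents of /proc/self/smaps instead of SAMPLE. In the sample the two definitions differ by exactly the part of Size that is not in Rss.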

When /proc/$PID/smaps is read for a process, Linux walks all pages of the process and classifies them into 3 groups:

- the page is present in RAM and has a refcount==1
  ==> add it to the process total size and to the private portion
- the page is present in RAM and has a refcount>1
  ==> add it to the process total size and to the shared portion
- the page is not present in RAM
  ==> add it to the process total size

The point here is that for a page that is not present Linux cannot read the refcount, because that count is not present in RAM either. So, to decide whether a page is shared or not it would have to read in the page. This is too expensive an operation just to read the refcount.

So, while in theory a page is either used by only one process and hence private, or by multiple processes and hence shared, in practice we have

    total_size = private + shared + notpresent

where notpresent is either shared or private, we cannot know.

How are processes created?
==========================

Under Linux a process is created by the fork() or clone() system calls. In theory the operating system duplicates the complete address space of the process calling fork. One copy belongs to the original process (the parent), the other is for the new process (the child). But if we really had to copy the whole address space, fork() would be a really expensive operation. In fact, only a table holding pointers to the pages that comprise the parent's address space is duplicated. All pages are marked read-only and their reference count is incremented.

Now, if one of the processes wants to write to a page, the CPU again generates an interrupt because the page is marked as read-only. The operating system catches that interrupt, and only now is the actual page duplicated: one page for the writing process and one for the others. The refcount of the new page becomes 1, that of the old one is decremented. This working pattern is called copy-on-write.

With the apache web server we have one parent process that spawns many children.
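The classification into private, shared and notpresent and the copy-on-write step described above can be mimicked with a toy model. This is only a sketch in Python: Page, AddressSpace and their methods are made-up names, not kernel interfaces, and one list entry stands for one page (None meaning "not present").

```python
class Page:
    """One page of RAM; refcount = number of address spaces mapping it."""
    def __init__(self):
        self.refcount = 1

class AddressSpace:
    """Toy page table: a list whose entries are a Page or None (not present)."""
    def __init__(self, pages):
        self.table = list(pages)

    def fork(self):
        # Only the table is duplicated; every present page gains a reference.
        for page in self.table:
            if page is not None:
                page.refcount += 1
        return AddressSpace(self.table)

    def write(self, index):
        page = self.table[index]
        if page is None:
            # Simplification: a not-present page becomes a fresh private page.
            self.table[index] = Page()
        elif page.refcount > 1:
            # Copy-on-write: the writer gets its own copy with refcount 1,
            # the old page loses one reference.
            page.refcount -= 1
            self.table[index] = Page()
        # refcount == 1: already private, nothing to do.

    def classify(self):
        """Count pages the way the smaps walk is described in this mail."""
        counts = {"size": 0, "private": 0, "shared": 0, "notpresent": 0}
        for page in self.table:
            counts["size"] += 1
            if page is None:
                counts["notpresent"] += 1
            elif page.refcount == 1:
                counts["private"] += 1
            else:
                counts["shared"] += 1
        return counts

parent = AddressSpace([Page(), Page(), Page(), None])
child = parent.fork()
print(child.classify())   # everything present is shared right after fork
child.write(0)            # triggers copy-on-write for page 0
print(child.classify())
print(parent.classify())
```

In this model size always equals private + shared + notpresent, and a write to a shared page moves one page from shared to private without changing size.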
At first, almost all of the child's address space is shared with the parent due to copy-on-write. Over its lifetime the child's private address space grows in two ways:

* it allocates more memory ==> total_size grows, unshared grows but shared stays the same.
* it writes to portions that were initially shared with the parent ==> unshared grows, shared shrinks but total_size does not change.

Now, what is the goal of Apache::SizeLimit?
===========================================

If the overall working set of all apache children becomes larger than the RAM available on the system, the operating system has to fetch code and/or data from disk for each request, and by doing so it has to evict pages that will be needed shortly after by the next request. Apache::SizeLimit (ASL hereafter) tries to avoid this situation. Note that there is no problem with large process sizes and heavy swap space usage if the data remains in swap and is normally not used.

ASL can monitor a few values per apache child: the total size of the process, the RSS portion, the portion that is reported as shared and the private part.

As for the "unshared" value above, it can be defined as (all right-hand values are reported by /proc/$PID/smaps):

1) unshared = size - shared
2) unshared = rss - shared
3) unshared = private
   this is just the same as 2) because rss = shared + private

What are the implications?
==========================

1) since

       size = shared + private + notpresent

   unshared becomes

       unshared = private + notpresent

   Of notpresent we know nothing for sure. It is a mix of shared and private pages. Now, if an administrator turns off swapping (swapoff /dev/...), the part of notpresent that has been in swap becomes present. So, we can expect shared to grow considerably. unshared will shrink by the same amount. As for absolute values, unshared in this case is quite a large number because it includes notpresent.

2) here unshared lacks the notpresent part.
   So the actual number is much smaller than it would be in case 1. But if an administrator turns off swapping now, a part of notpresent will be added to unshared. unshared may suddenly jump over the limit in all apache children. Well, an administrator doing that on a busy web server should be converted into a httpd ...

So, I am quite undecided what to do. Please comment!

See also http://foertsch.name/ModPerl-Tricks/Measuring-memory-consumption/index.shtml

What else could be done to hit ASL's goal?
==========================================

There are a few other status fields that could possibly be used:

- "Swap" in /proc/$PID/smaps
  I don't know for sure what that means, but it sounds good. I need to inspect the kernel code.
- "Referenced" in /proc/$PID/smaps
  can be used to find out how much RAM a process has accessed since the last reset of the counter. We could reset it in PerlPostReadRequestHandler and read it in a $r->pool cleanup.
- "Pss" in /proc/$PID/smaps
  the size of each present page divided by its refcount, summed over the segment
- "VmSwap" in /proc/$PID/status
  for example: terminate if the process starts to use swap space

Certainly there are more.

Torsten Förtsch

-- 
Need professional modperl support? Hire me! (http://foertsch.name)

Like fantasy? http://kabatinte.net