Hi,
There is an ongoing discussion, initiated by Max, about whether
Apache::SizeLimit does the right thing when it reports the amount of RAM a
process does not share with any other process as
unshared_size = total_size - shared_size
Max suggests changing that definition to
unshared_size = rss - shared_size
Apart from the fact that such a change would have to be announced very
loudly, perhaps with a new major version, because it requires current
installations to adjust their parameters, I am not sure it is the right way.
(I am talking about Linux here.)
What does that mean?
====================
The total size of a process comprises its complete address space. Normally,
only a part of this space is present in RAM. When the process accesses a part
of its address space that is not present, the CPU generates an interrupt (a
page fault) and the operating system reads that piece in from disk or
allocates an empty page, thus making the accessed page present. Then the
operation is repeated and this time it succeeds. The process normally is not
aware of any of this.
The part of the process that is really present in RAM is the RSS.
Now, Linux provides the /proc/$PID/smaps file, which reports sizes for the
shared and private portions of the process' address space.
How does that work?
===================
Linux organizes RAM and address spaces in fixed-size chunks, so-called pages
(normally 4 KiB). A single page of RAM can belong to only one process, or it
can be used by multiple processes (for example because they use the same
read-only code from the C library). So, each page has a reference count. If
that refcount is 1, the page is used by only one process and hence is private
to it. If the refcount is >1, the page is shared among multiple processes.
When /proc/$PID/smaps is read for a process, Linux walks all pages of the
process and classifies them into 3 groups:
- the page is present in RAM and has a refcount==1
==> add it to the process total size and to the private portion
- the page is present in RAM and has a refcount>1
==> add it to the process total size and to the shared portion
- the page is not present in RAM
==> add it to the process total size
The point here is that for a page that is not present Linux cannot read the
refcount, because that count is not present in RAM either. So, to decide
whether such a page is shared or not, it would have to read the page in. That
is too expensive an operation just to read the refcount.
So, while in theory a page is either used by only one process and hence
private, or by multiple processes and hence shared, in practice we have
total_size = private + shared + notpresent
where each notpresent page is either shared or private; we cannot know which.
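To make the three-way split concrete, here is a minimal sketch that sums the
per-mapping counters of an smaps dump and derives notpresent from them. The
function names summarize_smaps and read_smaps are made up for this
illustration; they are not part of Apache::SizeLimit. It assumes the usual
"Name:  123 kB" line format and ignores huge pages.

```python
import re

# Counter names as they appear in /proc/$PID/smaps; values are in kB.
FIELDS = ("Size", "Rss", "Shared_Clean", "Shared_Dirty",
          "Private_Clean", "Private_Dirty")
LINE = re.compile(r"^(\w+):\s+(\d+) kB$")


def summarize_smaps(text):
    """Sum the per-mapping counters of an smaps dump (all values in kB)."""
    totals = dict.fromkeys(FIELDS, 0)
    for line in text.splitlines():
        m = LINE.match(line.strip())
        # Mapping header lines and unrelated counters are skipped.
        if m and m.group(1) in totals:
            totals[m.group(1)] += int(m.group(2))
    shared = totals["Shared_Clean"] + totals["Shared_Dirty"]
    private = totals["Private_Clean"] + totals["Private_Dirty"]
    return {
        "size": totals["Size"],
        "rss": totals["Rss"],
        "shared": shared,
        "private": private,
        # Whatever is neither counted as shared nor as private.
        "notpresent": totals["Size"] - shared - private,
    }


def read_smaps(pid="self"):
    """Hypothetical helper: summarize the smaps file of a live process."""
    with open(f"/proc/{pid}/smaps") as fh:
        return summarize_smaps(fh.read())
```

On a Linux box, read_smaps() with no argument would summarize the calling
process itself.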
How are processes created?
==========================
Under Linux a process is created by the fork() or clone() system calls. In
theory the operating system duplicates the complete address space of the
process calling fork(). One copy belongs to the original process (the parent),
the other to the new process (the child).
But if we really had to copy the whole address space fork() would be a really
expensive operation. In fact, only a table holding pointers to the pages that
comprise the parent's address space is duplicated. And all pages are marked
read-only and their reference count is incremented.
Now, if one of the processes wants to write to a page, the CPU again generates
an interrupt because the page is marked as read-only. The operating system
catches that interrupt, and only now is the actual page duplicated: one page
for the writing process and one for the others. The refcount of the new page
becomes 1 and that of the old one is decremented. This working pattern is
called copy-on-write.
With the Apache web server we have one parent process that spawns many
children. At first, almost all of a child's address space is shared with the
parent due to copy-on-write. Over its lifetime the child's private address
space grows in two ways:
* it allocates more memory
==> total_size grows, unshared grows but shared stays the same.
* it writes to portions that were initially shared with the parent
==> unshared grows, shared shrinks but total_size does not change.
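The two growth paths can be played through with a toy model of pages and
refcounts. This is only an illustration of the bookkeeping described above,
not how the kernel implements it; the classes Page and Process are invented
for the sketch, and every page is assumed to be present in RAM.

```python
class Page:
    """A page of RAM with a reference count."""

    def __init__(self):
        self.refcount = 1


class Process:
    """Toy process: just a page table, i.e. a list of Page references."""

    def __init__(self, pages=None):
        self.pages = list(pages) if pages is not None else []

    def allocate(self, n):
        # Freshly allocated pages are private (refcount 1).
        self.pages.extend(Page() for _ in range(n))

    def fork(self):
        # Only the page table is copied; every page gains a reference.
        for page in self.pages:
            page.refcount += 1
        return Process(self.pages)

    def write(self, index):
        # Copy-on-write: a shared page is replaced by a private copy.
        page = self.pages[index]
        if page.refcount > 1:
            page.refcount -= 1
            self.pages[index] = Page()

    def stats(self):
        shared = sum(1 for p in self.pages if p.refcount > 1)
        return {"total": len(self.pages),
                "shared": shared,
                "unshared": len(self.pages) - shared}


parent = Process()
parent.allocate(10)
child = parent.fork()
after_fork = child.stats()    # everything is shared with the parent
child.allocate(3)
after_alloc = child.stats()   # total and unshared grow, shared is unchanged
child.write(0)
after_write = child.stats()   # unshared grows, shared shrinks, total is unchanged
```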
Now, what is the goal of Apache::SizeLimit?
===========================================
If the overall working set of all Apache children becomes larger than the
available RAM of the system, the operating system has to fetch code and/or
data from disk for each request, and by doing so it has to evict pages that
will be needed by the next request shortly after.
Apache::SizeLimit (ASL hereafter) tries to avoid this situation.
Note that large process sizes and heavy swap space usage are not a problem as
long as the data stays in swap and is normally not used.
ASL can monitor a few values per Apache child: the total size of the process,
the RSS portion, the portion that is reported as shared and the private part.
The "unshared" value above can be defined as one of the following
(all right-hand-side values are reported by /proc/$PID/smaps):
1) unshared = size - shared
2) unshared = rss - shared
3) unshared = private
this is just the same as 2) because rss = shared + private
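Written out as code, the three candidates look like this. The function name
and the numbers in the example call are hypothetical and only serve to show
that 2) and 3) coincide while 1) is larger by exactly size - rss.

```python
def unshared_candidates(size, rss, shared, private):
    """Return the three candidate definitions of "unshared" (values in kB)."""
    return {
        "1) size - shared": size - shared,
        "2) rss - shared": rss - shared,
        "3) private": private,
    }


# Made-up example: 100000 kB address space, 60000 kB of it present in RAM.
example = unshared_candidates(size=100000, rss=60000,
                              shared=45000, private=15000)
```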
What are the implications?
==========================
1) Since size = shared + private + notpresent, unshared becomes
unshared = private + notpresent
We know nothing for sure about notpresent. It is a mix of shared and private
pages.
Now, if an administrator turns off swapping (swapoff /dev/...), the part of
notpresent that resided in swap becomes present. So, we can expect shared to
grow considerably. unshared will shrink by the same amount.
As for absolute values, unshared in this case is quite a large number because
it includes notpresent.
2) Here unshared lacks the notpresent part, so the actual number is much
smaller than it would be in case 1.
But if an administrator turns off swapping now, a part of notpresent will be
added to unshared. unshared may suddenly jump over the limit in all Apache
children.
Well, an administrator doing that on a busy web server should be converted
into a httpd ...
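The opposite reactions of the two definitions to a swapoff can be checked with
a few made-up numbers. The sketch below assumes that 40000 kB are not present
and that, once swapped in, 25000 kB of them turn out to be shared and 15000 kB
private; swap_in, unshared_1 and unshared_2 are names invented here.

```python
def swap_in(state, shared_kb, private_kb):
    """Return the state after pages moved from notpresent into RAM."""
    moved = shared_kb + private_kb
    if moved > state["notpresent"]:
        raise ValueError("cannot swap in more than is not present")
    return {
        "shared": state["shared"] + shared_kb,
        "private": state["private"] + private_kb,
        "notpresent": state["notpresent"] - moved,
    }


def unshared_1(state):
    # Definition 1: size - shared = private + notpresent
    return state["private"] + state["notpresent"]


def unshared_2(state):
    # Definition 2: rss - shared = private
    return state["private"]


# Hypothetical child process, all values in kB.
before = {"shared": 45000, "private": 15000, "notpresent": 40000}
after = swap_in(before, shared_kb=25000, private_kb=15000)

drop_1 = unshared_1(before) - unshared_1(after)   # definition 1 shrinks
jump_2 = unshared_2(after) - unshared_2(before)   # definition 2 grows
```

With these numbers definition 1 drops by the amount that turned out to be
shared, while definition 2 jumps by the amount that turned out to be private.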
So, I am quite undecided what to do.
Please comment!
See also
http://foertsch.name/ModPerl-Tricks/Measuring-memory-consumption/index.shtml
What else could be done to hit ASL's goal?
==========================================
There are a few other status fields that could possibly be used:
- "Swap" in /proc/$PID/smaps
I don't know for sure what that means, but it sounds good. I need to inspect
the kernel code.
- "Referenced" in /proc/$PID/smaps
can be used to find out how much RAM a process has accessed since the last
reset of the counter. We could reset it in a PerlPostReadRequestHandler and
read it in a $r->pool cleanup.
- "Pss" in /proc/$PID/smaps
the proportional set size: each resident page is counted as its size divided
by the number of processes sharing it
- "VmSwap" in /proc/$PID/status
for example: terminate if the process starts to use swap space
Certainly more.
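As an illustration of the last idea, here is a minimal sketch that pulls
VmSwap out of the text of /proc/$PID/status. The helper names and the limit_kb
parameter are assumptions made for this example, and a kernel that does not
report VmSwap simply yields None.

```python
import re


def vm_swap_kb(status_text):
    """Return the VmSwap value in kB, or None if the field is missing."""
    m = re.search(r"^VmSwap:\s+(\d+) kB$", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def uses_swap(pid="self", limit_kb=0):
    """Hypothetical check: has the process more than limit_kb in swap?"""
    with open(f"/proc/{pid}/status") as fh:
        swap = vm_swap_kb(fh.read())
    return swap is not None and swap > limit_kb
```

A child could call uses_swap() in a cleanup phase and exit when it returns
true; whether that is a sensible policy is a separate question.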
Torsten Förtsch