G'Day Mike,

On Wed, 2 Nov 2005, Mike Gerdts wrote:

> Whenever I have tried to take a look at who is using lots of memory on
> a system running non-trivial workloads, I have met with limited
> success largely due to the fact that pages in the RSS of multiple
> processes are counted multiple times.  In an extreme example of a busy
> 15k domain with hundreds of gigabytes of physical RAM and less than 50
> GB swap, "prstat -a" will report that many terabytes of memory are in
> the resident set for the oracle user.  This is impossible.
>
> On Solaris 9 and earlier this wasn't a big deal.  I would just look at
> vmstat's data to be sure that bad things weren't happening to the
> system.
>
> Now I would like to be able to cap memory per zone.  Because rcapd
> only seems to be able to set limits on all processes in a project, I
> am pretty much forced to put all of the non-system processes (those
> that use all the RAM, I hope) into a single project in each zone.
> However, perusing the rcapd source, it looks as though it too simply
> adds up the RSS without regard for things like shared memory segments
> or mmap'd files.  As such, zones with many programs that make use of
> shared memory segments, large executables or libraries, or other large
> memory mapped files, will be unfairly targeted by rcapd.  There are
> other problems that further limit the usefulness of setting memory
> caps per project in an attempt to isolate the impact of piggy zones.
>
> The problems I see with the current model are:
>
> 1) Pages mapped in multiple processes' address spaces are counted
> multiple times.
> 2) If root is not well controlled in a zone (somewhat encouraged by
> marketing, it seems) the zone's root user can do away with the memory
> caps altogether.
> 3) There is no way to indicate that misbehaving programs in one zone
> cannot consume all of the system's swap or cause such heavy paging
> activity that other zones are significantly affected.
> 4) /tmp can be misused to use lots of RAM.  This problem is worse if
> root in the zone is not well controlled (size=X mount option no longer
> reliable).

This is all quite correct, and quite well expressed :-). AFAIK the Zone
guys do know about it and there are some RFEs about this, e.g.,

        http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=5103071
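To make problem 1 concrete, here is a rough Python sketch (not how prstat or rcapd actually gather their numbers) comparing a naive sum of per-process RSS with a count of distinct physical pages. The process snapshot is entirely made up: every hypothetical process maps the same 1000-page shared segment plus 10 private pages, and an 8 KB page size is assumed.

```python
# Hypothetical snapshot: each "process" maps a set of physical page numbers.
# Pages 0..shared_pages-1 stand in for one shared memory segment mapped by
# every process; each process also gets its own private pages.
PAGE_SIZE = 8192  # bytes; an assumed 8 KB page size


def build_snapshot(nprocs, shared_pages=1000, private_pages=10):
    """Return {pid: set_of_page_numbers} for a made-up workload."""
    shared = set(range(shared_pages))
    procs = {}
    next_page = shared_pages
    for pid in range(nprocs):
        private = set(range(next_page, next_page + private_pages))
        next_page += private_pages
        procs[pid] = shared | private
    return procs


def summed_rss(procs):
    """Naive total: add up every process's RSS, so shared pages count N times."""
    return sum(len(pages) for pages in procs.values()) * PAGE_SIZE


def distinct_rss(procs):
    """Count each physical page once, however many processes map it."""
    all_pages = set().union(*procs.values())
    return len(all_pages) * PAGE_SIZE


if __name__ == "__main__":
    procs = build_snapshot(nprocs=100)
    print("summed RSS:  ", summed_rss(procs) // 1024, "KB")
    print("distinct RSS:", distinct_rss(procs) // 1024, "KB")
```

With 100 such processes the naive sum reports 808000 KB while only 16000 KB of distinct pages are actually resident, which is the same effect as "prstat -a" reporting terabytes on a box with hundreds of gigabytes.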

There have been several public mentions of "Memory Sets" and "Swap Sets",
which are planned for future updates to Solaris. The following quote from
http://www.sun.com/blueprints/0505/819-2679.pdf suggests how they will be
added,

        "While the processor set is the only type of resource set
        available in the Solaris OS, the resource pool abstraction allows
        other types of resource sets, such as memory sets, to be added in
        later Solaris OS versions."

As pools are controlled by root in the global zone, this should solve all
of your (our) worries. I've left a few deliberate blanks on the resource
summary at http://www.brendangregg.com/zones.html#resource0 for future
Solaris updates.

In the meantime, Zones are terribly useful. Certainly, there will be a few
people who will hang out for the Solaris update, especially for highly
secure environments.

cheers,

Brendan

_______________________________________________
perf-discuss mailing list