On Fri, Oct 29, 2010 at 11:00 AM, Robin Cotgrove <ro...@rjcnet.co.uk> wrote:
> I need some assistance and guidance in writing a DTrace script or, even
> better, finding an example one which would help me identify what's going on
> on our system. Intermittently, and we think it might be happening after about
> 60 days, on an E2900, 192GB, 24 core, Solaris 10 11/06 system with a fairly
> new patch cluster (Generic_142900-13), we suddenly hit a problem which
> results in processes failing to start with the error message 'resource
> temporarily unavailable'. This is leading to Oracle crash/startup issues.
>
> I ran a simple du command at the time it was happening and got the following
> response.
>
> ‘du: No more processes: Resource temporarily unavailable’

Does anything get logged to /var/adm/messages?

>
> Approximately 6500 TCP connections on server at time. 6000 unix processes. 
> The max UNIX processes per user is set to 29995. 60GB free physical memory 
> and no swap being used. Absolutely baffling us at mo.

Swap may not be used, but it is certainly reserved.  Note that Solaris
has multiple definitions of swap.  That disk space you allocated and
called "swap" is one thing.  The overall RAM and swap device backed
address space is another.

Unlike Linux (default config), Solaris does not allow memory to be
overcommitted.  If something does malloc(1024 * 1024 * 1024 * 1024),
the call will fail on Solaris unless you have 1 TB of free "swap"
(memory + swap devices).  On Linux, the malloc would likely succeed.
Only when you actually start writing to more pages of the allocated
memory than your system has in RAM + swap devices will the Linux Out
of Memory Killer kick in and start selecting things to kill to free
up memory.
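
To make the Solaris side of that concrete, here is a minimal sketch of
strict reservation accounting.  It is only a toy model: struct swap_acct
and reserve_strict() are names I made up for illustration, not kernel
interfaces, and it ignores details such as the minfree threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of strict swap accounting; not a real kernel API. */
struct swap_acct {
        size_t total_mb;  /* RAM + swap devices usable for reservations */
        size_t resv_mb;   /* already reserved, whether touched or not */
};

/* Try to reserve mb megabytes; refuse up front rather than overcommit. */
bool reserve_strict(struct swap_acct *a, size_t mb)
{
        assert(a->resv_mb <= a->total_mb);      /* model invariant */
        if (mb > a->total_mb - a->resv_mb)
                return false;   /* caller would see malloc() or fork() fail */
        a->resv_mb += mb;
        return true;
}
```

The point of the model is that the check happens at reservation time, so
a request can fail even though none of the reserved pages have been
written to yet.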

We can see this with two runs of /opt/DTT/Mem/swapinfo.d on my
OpenSolaris system.  You can get this for Solaris 10 as part of the
DTraceToolkit.

# /opt/DTT/Mem/swapinfo.d
...
Swap _______Total  2496 MB
Swap         Resv   619 MB
Swap        Avail  1877 MB
Swap    (Minfree)   222 MB

# /opt/DTT/Mem/swapinfo.d
...
Swap _______Total  2224 MB
Swap         Resv  2047 MB
Swap        Avail   176 MB
Swap    (Minfree)   222 MB


One thing I just noticed - minfree does not become 176 MB as I would
have expected.  Be careful with that value!

Why was there such a big difference in Avail?  Because I ran this program:

/* Save as foo.c then compile with gcc -o foo foo.c */
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv) {
        if ( malloc(1024 * 1024 * 1700) == NULL ) {
                perror("malloc");
                exit(1);
        }
        sleep(5);
        exit(0);
}

A likely scenario that would cause a database server to temporarily
reserve a lot more swap is when a new oracle process is created.  When
a process forks, memory is reserved for all of the pages of memory
that are anonymous (i.e. not an mmapped file or device), read-write,
and not shared.  This is required to support the copy-on-write
mechanism used by the virtual memory system.  You can use pmap to take
a look at the memory mappings of a process to get an idea of how much
space this takes.
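
As a rough way to turn pmap output into a number, the rule above can be
sketched as a filter over the mappings.  This is a simplification under
my own assumptions: struct mapping is a hypothetical record you would
fill in by hand (or from a script) for each line of pmap output, and the
function only applies the three conditions described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one mapping, e.g. transcribed from pmap. */
struct mapping {
        size_t kbytes;     /* size of the mapping in KB */
        bool   anonymous;  /* not backed by a file or device */
        bool   writable;   /* mapped read-write */
        bool   shared;     /* shared with other processes */
};

/* Estimate the KB of swap a fork of this process would need to reserve. */
size_t fork_reserve_kbytes(const struct mapping *maps, size_t n)
{
        size_t total = 0;

        assert(maps != NULL || n == 0);
        for (size_t i = 0; i < n; i++) {
                /* Only private, writable, anonymous memory is counted. */
                if (maps[i].anonymous && maps[i].writable && !maps[i].shared)
                        total += maps[i].kbytes;
        }
        return total;
}
```

Shared memory segments and read-only text drop out of the sum, so a
process with a large shared segment may need far less reservation at
fork time than its total size suggests.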

To look at the amount of available swap that matters, refer to the
swap column of vmstat.  For things like this that are transient, you
may have trouble seeing it, even with "vmstat 1".  Note that while you
are looking at vmstat output, you should always ignore the first line
of output - it is a pretty much useless average since boot.  If you
need to get values at a higher resolution, you may want to adapt
swapinfo.d from the DTraceToolkit to use the profile provider to
quantize the available swap value.
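
If you have not used quantize before, the idea is power-of-two
bucketing: each sample is counted in the bucket whose range contains it,
so a handful of very low samples stands out even among thousands of
normal ones.  The sketch below shows roughly that bucketing in C.  It is
not the DTrace implementation, and pow2_bucket() and record_sample() are
names I invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bucket 0 holds the value 0; bucket i holds values in [2^(i-1), 2^i). */
#define NBUCKETS 65

/* Return the bucket index for v, i.e. the number of significant bits. */
size_t pow2_bucket(uint64_t v)
{
        size_t i = 0;

        while (v != 0) {
                v >>= 1;
                i++;
        }
        return i;
}

/* Count one sample of available swap (in MB) in its bucket. */
void record_sample(uint64_t counts[NBUCKETS], uint64_t avail_mb)
{
        assert(counts != NULL);
        counts[pow2_bucket(avail_mb)]++;
}
```

With this scheme a sample of 1877 MB is counted in the 1024 to 2047
bucket and a sample of 176 MB in the 128 to 255 bucket, so a brief dip
shows up as a separate row rather than being averaged away.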

>
> Not managed to truss a failing command when it happened yet because it's so
> intermittent in its nature.
>
> We've checked all the usual suspects including max processes per users and 
> cannot find the cause. Need a way to monitor all the internal kernel 
> resources to see what we're hitting. Suggestions please on a postcard. All 
> welcome.

It seems quite likely to me that you will find that the swap that is
available to reserve temporarily dips to a minuscule value.  If this
is the case, adding more swap will help.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
