On Fri, Oct 29, 2010 at 11:00 AM, Robin Cotgrove <ro...@rjcnet.co.uk> wrote:
> I need some assistance and guidance in writing a DTrace script or, even
> better, finding an example one which would help me identify what's going
> on on our system. Intermittently, and we think it might be happening
> after about 60 days, on an E2900, 192GB, 24 core, Solaris 10 11/06 system
> with a fairly new patch cluster (Generic_142900-13), we suddenly hit a
> problem which results in processes failing to start with the error
> message 'resource temporarily unavailable'. This is leading to Oracle
> crash/startup issues.
>
> I ran a simple du command at the time it was happening and got the
> following response.
>
> ‘du: No more processes: Resource temporarily unavailable’

Does anything get logged to /var/adm/messages?

> Approximately 6500 TCP connections on server at time. 6000 unix processes.
> The max UNIX processes per user is set to 29995. 60GB free physical memory
> and no swap being used. Absolutely baffling us at mo.

Swap may not be used, but it is certainly reserved. Note that Solaris
has multiple definitions of swap. That disk space you allocated and
called "swap" is one thing. The overall address space backed by RAM and
swap devices is another.

Unlike Linux (default config), Solaris does not allow memory to be
overcommitted. If something asks for 1 TB, e.g.
malloc(1024UL * 1024 * 1024 * 1024) in a 64-bit process, the call will
fail on Solaris unless you have 1 TB of free "swap" (memory + swap
devices). On Linux, the malloc would likely succeed. When you actually
start writing to more pages of the allocated memory than your system has
in RAM + swap devices, the Linux Out of Memory Killer will kick in and
start selecting things to kill to free up memory.

We can see this with two runs of /opt/DTT/Mem/swapinfo.d on my
OpenSolaris system. You can get this for Solaris 10 as part of the
DTraceToolkit.

    # /opt/DTT/Mem/swapinfo.d
    ...
    Swap _______Total   2496 MB
    Swap         Resv    619 MB
    Swap        Avail   1877 MB
    Swap    (Minfree)    222 MB

    # /opt/DTT/Mem/swapinfo.d
    ...
    Swap _______Total   2224 MB
    Swap         Resv   2047 MB
    Swap        Avail    176 MB
    Swap    (Minfree)    222 MB

One thing I just noticed - minfree does not become 176 MB as I would
have expected. Be careful with that value!

Why was there such a big difference in Avail? Because I ran this
program while taking the second sample:

    /* Save as foo.c then compile with gcc -o foo foo.c */
    #include <unistd.h>
    #include <stdlib.h>
    #include <stdio.h>

    int
    main(int argc, char **argv)
    {
        /* Reserve 1700 MB; the pages are never written to. */
        if (malloc((size_t)1700 * 1024 * 1024) == NULL) {
            perror("malloc");
            exit(1);
        }
        /* Hold the reservation long enough to run swapinfo.d. */
        sleep(5);
        exit(0);
    }

A likely scenario that would cause a database server to temporarily
reserve a lot more swap is when a new oracle process is created.
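
As a rough sketch of that scenario (my own illustration, not something
from the DTraceToolkit or from Oracle), the hypothetical helper below
dirties a private allocation and then forks. On a Solaris system that is
short of reservable swap I would expect the fork to fail with EAGAIN or
ENOMEM. strerror(EAGAIN) is the same "Resource temporarily unavailable"
text that du printed. The name reserve_then_fork() and the choice to
return the errno value are assumptions made for the example.

```c
/* Sketch only: reserve_then_fork() is a hypothetical helper. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the allocation and the fork both succeed, else an errno. */
int
reserve_then_fork(size_t bytes)
{
    char *buf;
    pid_t pid;
    int err;

    buf = malloc(bytes);
    if (buf == NULL) {
        err = (errno != 0) ? errno : ENOMEM;
        fprintf(stderr, "malloc: %s\n", strerror(err));
        return (err);
    }

    /* Write to every byte so the pages are private, writable and dirty. */
    memset(buf, 1, bytes);

    /* The child needs its own reservation for the pages it might copy. */
    pid = fork();
    if (pid == -1) {
        err = errno;
        fprintf(stderr, "fork: %s\n", strerror(err));
        free(buf);
        return (err);
    }
    if (pid == 0)
        _exit(0);               /* child: exit without touching anything */

    (void) waitpid(pid, NULL, 0);   /* parent: reap the child */
    free(buf);
    return (0);
}
```

Call it from a small main() with a size close to what swapinfo.d reports
as Avail, for example reserve_then_fork((size_t)1700 * 1024 * 1024) on
the system above. On Linux with the default overcommit settings the same
call will usually just succeed.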

When a process forks, memory is reserved for all of the pages of memory
that are anonymous (e.g. not an mmapped file or device), read-write, and
not shared. This is required to support the copy-on-write mechanism used
by the virtual memory system. You can use pmap to take a look at the
memory mappings of a process to get an idea of how much space this
takes.

To look at the amount of available swap that matters, refer to the swap
column of vmstat. For things like this that are transient, you may have
trouble seeing it, even with "vmstat 1". Note that while you are looking
at vmstat output, you should always ignore the first line of output - it
is a pretty much useless average since boot.

If you need to get values at a higher resolution, you may want to adapt
swapinfo.d from the DTraceToolkit to use the profile provider to
quantize the available swap value.

> Not managed to truss a failing command when it happened yet because it's
> so intermittent in its nature.
>
> We've checked all the usual suspects including max processes per users
> and cannot find the cause. Need a way to monitor all the internal kernel
> resources to see what we're hitting. Suggestions please on a postcard.
> All welcome.

It seems quite likely to me that you will find that the swap that is
available to reserve temporarily dips to a minuscule value. If this is
the case, adding more swap will help.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org