Mike is correct. Pretty much every time I've seen this, it's VM (VM = virtual memory = swap) related.
There's a DTrace script below that you can run when you hit this problem; it will show us which system call is failing with an EAGAIN error (errno 11). It is most likely fork(2). (And yes, I know printing the errno in the return action is superfluous, given that we test it in the predicate; that's just me being OCD and sanity checking.) A second DTrace script further down should provide a kernel stack trace if it is a fork(2) failure.

Or... (disk is cheap) run "swap -a" to add swap space and see if the problem goes away.

Thanks
/jim

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Mark each system call this thread enters. */
syscall:::entry
{
    self->flag[probefunc] = 1;
}

/* Report any system call returning with errno 11 (EAGAIN). */
syscall:::return
/self->flag[probefunc] && errno == 11/
{
    printf("syscall: %s, arg0: %d, arg1: %d, errno: %d\n\n",
        probefunc, arg0, arg1, errno);
}

/* Clear the flag on every return so thread-local storage is released. */
syscall:::return
/self->flag[probefunc]/
{
    self->flag[probefunc] = 0;
}

------------------------------------------------------------------------

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Aggregate kernel and user stacks for every fork(2) entry. */
syscall::forksys:entry
{
    self->flag = 1;
    @ks[stack(), ustack()] = count();
}

/* On the first failed fork, print errno and the stacks aggregated so far, then exit. */
syscall::forksys:return
/self->flag && arg0 == -1 && errno != 0/
{
    printf("fork failed, errno: %d\n", errno);
    printa(@ks);
    clear(@ks);
    exit(0);
}

/* Clear the flag after every fork return. */
syscall::forksys:return
/self->flag/
{
    self->flag = 0;
}

On Oct 29, 2010, at 12:00 PM, Robin Cotgrove wrote:

> I need some assistance and guidance in writing a DTrace script or, even
> better, finding an example one that would help me identify what's going on
> on our system. Intermittently (we think it might be happening after about
> 60 days), on an E2900 (192GB, 24 cores, Solaris 10 11.06) with a fairly new
> patch cluster (Generic_142900-13), we suddenly hit a problem whereby
> processes fail to start with the error message 'resource temporarily
> unavailable'. This is leading to Oracle crash/startup issues.
>
> I ran a simple du command while it was happening and got the following
> response.
>
> ‘du: No more processes: Resource temporarily unavailable’
>
> Approximately 6500 TCP connections on the server at the time.
> 6000 UNIX processes. The max UNIX processes per user is set to 29995.
> 60GB free physical memory and no swap being used. Absolutely baffling us
> at the mo.
>
> Not managed to truss a failing command when it happened yet, because it's
> so intermittent in its nature.
>
> We've checked all the usual suspects, including max processes per user,
> and cannot find the cause. Need a way to monitor all the internal kernel
> resources to see what we're hitting. Suggestions please on a postcard. All
> welcome.
>
> Robin Cotgrove
> --
> This message posted from opensolaris.org
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss@opensolaris.org