Mike is correct. Pretty much every time I've seen this, it's
VM (VM = virtual memory = swap) related.

There's a DTrace script below you can run when you hit this
problem that will show us which system call is failing with an
EAGAIN error. It is most likely fork(2) (and yes, I know printing
the errno in the return action is superfluous given we use it
in the predicate - it's me being OCD and sanity checking).

A second DTrace script further down should provide a kernel
stack trace if it is a fork(2) failure.

Or....(disk is cheap) "swap -a" (add swap space) and see if the
problem goes away.

Thanks
/jim


#!/usr/sbin/dtrace -s

#pragma D option quiet

syscall:::entry
{
        self->flag[probefunc] = 1;
}
syscall:::return
/self->flag[probefunc] && errno == 11/
{
        printf("syscall: %s, arg0: %d, arg1: %d, errno: %d\n\n",
            probefunc, arg0, arg1, errno);
        self->flag[probefunc] = 0;
}
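
Since the problem is intermittent, a variant that keeps counting may be
easier to leave running than one that prints every hit. What follows is
only a sketch and has not been run on your system: it counts EAGAIN
returns by process name and system call and prints the totals every 10
seconds. The aggregation name @fails and the 10 second interval are
arbitrary choices of mine, not anything from the script above.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* errno 11 is EAGAIN, the same value tested in the script above */
syscall:::return
/errno == 11/
{
        /* key on process name and system call name */
        @fails[execname, probefunc] = count();
}

/* assumed interval: report and reset the counts every 10 seconds */
tick-10s
{
        printa("%-16s %-16s %@d\n", @fails);
        trunc(@fails);
}
```

If nothing is failing, the tick probe prints nothing, so the output can be
redirected to a file and checked after the next occurrence.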


------------------------------------------------------------------------------------------------------------------------

#!/usr/sbin/dtrace -s

#pragma D option quiet

syscall::forksys:entry
{
        self->flag = 1;
        @ks[stack(),ustack()] = count();
}
syscall::forksys:return
/self->flag && arg0 == -1 && errno != 0/
{
        printf("fork failed, errno: %d\n",errno);
        printa(@ks);
        clear(@ks);
        exit(0);
}
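
One caveat with recording stack() at the entry probe is that it captures
the stack at the point of entry, not at the point where the kernel decides
to fail. A possible alternative, offered as a rough sketch only, is to
watch fbt return probes while a thread is inside forksys and report any
kernel function that returns 11. Assumptions to keep in mind: the variable
name in_fork is mine, fbt is an unstable interface, enabling every fbt
return probe adds noticeable overhead, and a function can return 11 for
reasons unrelated to EAGAIN, so expect some false positives.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

syscall::forksys:entry
{
        self->in_fork = 1;
}

/* arg1 of an fbt return probe is the function's return value */
fbt:::return
/self->in_fork && arg1 == 11/
{
        printf("%s`%s returned 11\n", probemod, probefunc);
        stack();
}

/* stop tracing this thread once the system call returns */
syscall::forksys:return
/self->in_fork/
{
        self->in_fork = 0;
}
```

Inner functions return before their callers, so the first function
reported for a failing fork is the likeliest place where the error
originated.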


On Oct 29, 2010, at 12:00 PM, Robin Cotgrove wrote:

> I need some assistance and guidance in writing a DTrace script or even 
> better, finding an example one which would help me identify what's going on 
> on our system. Intermittently, and we think it might be happening after about 60 
> days, on a E2900, 192GB, 24 core, Solaris 10 11.06 system with a fairly new 
> patch cluster (Generic_142900-13) we are running into a problem whereby we 
> suddenly hit a problem which results in processes failing to start and 
> getting the error message 'resource temporarily unavailable' error. This is 
> leading to Oracle crash/startup issues.
> 
> I ran a simple du command at the time it was happening and got the following 
> response.
> 
> ‘du: No more processes: Resource temporarily unavailable’     
> 
> Approximately 6500 TCP connections on server at time. 6000 unix processes. 
> The max UNIX processes per user is set to 29995. 60GB free physical memory 
> and no swap being used. Absolutely baffling us at mo. 
> 
> Not managed to truss a failing command when it happened yet because it's so 
> intermittent in its nature. 
> 
> We've checked all the usual suspects including max processes per users and 
> cannot find the cause. Need a way to monitor all the internal kernel 
> resources to see what we're hitting. Suggestions please on a postcard. All 
> welcome. 
> 
> Robin Cotgrove
> -- 
> This message posted from opensolaris.org
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss@opensolaris.org
