James and Mike,

Thanks for the hints. The process is not a zombie:

From ps -ef:

# ps -ef |grep ifind |grep -v grep
root 9973 1 0 Mar 21 ? 0:34 /opt/galaxy/iDataAgent/ifind -j 28995 -a 2:52 -t 2 -d backupmedia_server
root 9141 1 0 00:00:27 ? 0:04 /opt/galaxy/iDataAgent/ifind -j 29234 -a 2:52 -t 1 -d backupmedia_server
root 22521 1 0 12:50:00 ? 0:06 /opt/galaxy/iDataAgent/ifind -j 29234 -a 2:52 -t 1 -d backupmedia_server
# truss -fp 9973
truss: no such process: 9973

These processes are part of our backup software (Commvault) that scans filesystems to populate the server's indexes. I am also pretty sure it's not dumping core, as at least one has been around for a while.

From the thread stack trace:

> ::ps !grep ifind
R 22521 22484 22484 22484 0 0x4a004000 000003002248b160 ifind
R 9973 1 12809 12809 0 0x4a304902 0000030000c65b40 ifind
R 9141 1 12809 12809 0 0x4a304902 0000030016cd18d0 ifind
> 0000030000c65b40::print proc_t p_tlist | ::findstack -v
stack pointer for thread 300117d32a0: 2a1039ec841
[ 000002a1039ec841 cv_wait+0x38() ]
  000002a1039ec8f1 top_begin_async+0x90(600274d3180, 3, 100, 0, 1, 60023ac2620)
  000002a1039ec9a1 ufs_syncip+0x298(60045392cf0, 400, 0, 15, 1000, 0)
  000002a1039eca51 ufs_idle_free+0x68(60045392cf0, 300117d32a4, 0, 60045390140, 300117d32a4, 0)
  000002a1039ecb01 ufs_idle_some+0x180(2, 60045392cf0, c0, 0, 18db010, 18d90f8)
  000002a1039ecbb1 ufs_lookup+0x240(30017142d40, 2a1039ed680, 2a1039ed678, 60029b33d00, fe2f, fd82)
  000002a1039ecc91 fop_lookup+0x28(30017142d40, 2a1039ed680, 2a1039ed678, 129c858, 0, 30000d94f00)
  000002a1039ecd51 lookuppnvp+0x344(2a1039ed940, 0, 2f, 2a1039ed678, 2a1039ed680, 6002184da40)
  000002a1039ecf91 lookuppnat+0x120(30017142d40, 0, 0, 0, 2a1039edad8, 0)
  000002a1039ed051 lookupnameat+0x5c(0, 0, 0, 0, 2a1039edad8, 0)
  000002a1039ed161 cstatat_getvp+0x198(ffd19400, 456878, 1, 0, 2a1039edad8, 0)
  000002a1039ed221 cstatat64_32+0x40(ffffffffffd19553, 456878, 1000, ffbea6e0, 1000, 0)
  000002a1039ed2e1 syscall_trap32+0xcc(456878, ffbea6e0, 0, 0, 0, 0)
>

For what it's worth, the server was just
patched with a January patch cluster plus 139483-05.

Thanks again,

--Brett

On 3/22/09, James C. McPherson <James.McPherson at sun.com> wrote:
> On Sun, 22 Mar 2009 16:21:15 -0700
> Michael Schuster <Michael.Schuster at Sun.COM> wrote:
>
> > Brett Monroe wrote:
> > > Hey all,
> > >
> > > I am seeing an issue on one of our Solaris 10 servers and I would like
> > > to get more insight into what is going on. I suspect it is a kernel
> > > bug and I think mdb is the only way I can look into the kernel to see
> > > what's going on (with respect to this issue). My mdb skills are close
> > > to non-existent so please bear with me. :) Anyway, here is what I am
> > > seeing:
> > >
> > > The Server is running Solaris 10 Kernel 138888-02. I have some
> > > processes that appear in the process table and in /proc but they won't
> > > die if killed and can't be trussed and p* commands fail with the error
> > > "no such process."
> >
> > do they appear in a 'ps -ef' listing, perhaps as "defunct"? in that case,
> > you have so-called zombies, which are processes that have exited but whose
> > exit code still needs to be reaped.
>
>
> Hi Brett,
> a mate just pointed out that you might have come across the
> situation where if the proc is large enough and it recently
> received a signal, it could be in the process of dumping core.
>
> iirc you'd want to check the p_siginfo part of the proc structure
> to make sure of that.
>
>
> cheers,
> James
> --
> Senior Kernel Software Engineer, Solaris
> Sun Microsystems
> http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
>
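
For anyone reading the ::findstack -v output in this thread: the frames are listed innermost first, so the thread is sleeping in cv_wait, called from top_begin_async, and the whole chain started from a 32-bit stat64 call (cstatat64_32) that went through a UFS lookup. Below is a minimal sketch, not part of the original exchange, that pulls just the function names out of pasted ::findstack text so the call chain is easier to read. The helper name call_chain and the abridged sample are illustrative assumptions; the regular expression only assumes the "16 hex digit frame address, then symbol+0xoffset(" layout seen in the trace.

```python
import re

# One ::findstack -v frame: a 16-digit hex frame address, whitespace,
# then symbol+0xoffset followed by the opening paren of the argument list.
FRAME_RE = re.compile(r"\b[0-9a-f]{16}\s+([A-Za-z_][A-Za-z0-9_]*)\+0x[0-9a-f]+\(")


def call_chain(findstack_text):
    """Return the function names in the order printed (innermost frame first)."""
    return FRAME_RE.findall(findstack_text)


# Abridged copy of the trace from the message (middle frames omitted).
SAMPLE = """\
stack pointer for thread 300117d32a0: 2a1039ec841
[ 000002a1039ec841 cv_wait+0x38() ]
  000002a1039ec8f1 top_begin_async+0x90(600274d3180, 3, 100, 0, 1, 60023ac2620)
  000002a1039ec9a1 ufs_syncip+0x298(60045392cf0, 400, 0, 15, 1000, 0)
  000002a1039ed221 cstatat64_32+0x40(ffffffffffd19553, 456878, 1000, ffbea6e0, 1000, 0)
  000002a1039ed2e1 syscall_trap32+0xcc(456878, ffbea6e0, 0, 0, 0, 0)
"""

if __name__ == "__main__":
    # Print the chain on one line, innermost call on the left.
    print(" <- ".join(call_chain(SAMPLE)))
```

The thread header line and the hex arguments inside the parentheses are skipped because neither is followed by a symbol+0xoffset( pattern.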