After exchanging emails with John, I can see from the truss output he supplied that we are looping within search_state_machine(). It's not the loop I thought, but a more complex one that transitions through more than two states. Source code line numbers are taken from http://src.opensolaris.org/source/xref/onnv/onnvgate/usr/src/lib/libsldap/common/ns_reads.c :
Starting at:
1)
/3...@3: -> libsldap:get_next_session(0x8142a90, 0x0, 0x8142b08, 0xfec9fb55)
/3...@3: <- libsldap:get_next_session() = 0
The only caller of this function is L2248 (HERE#1), implying cookie->state == NEXT_SESSION:
2247         case NEXT_SESSION:
2248             if (get_next_session(cookie) < 0)
2249                 cookie->new_state = RESTART_SESSION;
2250             else
2251                 cookie->new_state = NEXT_SEARCH;    <==== HERE#1
2252             break;
2)
/3...@3: -> libsldap:setup_next_search(0x8142a90, 0x0, 0x8142b08, 0xfec9fb55)
/3...@3: <- libsldap:setup_next_search() = 0
/3...@3: -> libsldap:paging_supported(0x8142a90, 0x0, 0x8142b08, 0xfec9fb55)
/3...@3: <- libsldap:paging_supported() = 1
Again, the only caller is at L2275 (HERE#2). We already know state == NEXT_SEARCH:
2268         case NEXT_SEARCH:
2269             /* setup referrals search if necessary */
2270             if (cookie->refpos) {
2271                 if (setup_referral_search(cookie) < 0) {
2272                     cookie->new_state = EXIT;
2273                     break;
2274                 }
2275             } else if (setup_next_search(cookie) < 0) {    <==== HERE#2
2276                 cookie->new_state = EXIT;
2277                 break;
2278             }
2279             /* only do VLV/PAGE on scopes onelevel/subtree */
2280             if (paging_supported(cookie)) {    <==== HERE#3
2281                 if (cookie->use_paging &&
2282                     (cookie->scope != LDAP_SCOPE_BASE)) {
2283                     cookie->index = 1;
2284                     if (cookie->listType == VLVCTRLFLAG)
2285                         cookie->new_state = NEXT_VLV;
2286                     else
2287                         cookie->new_state = NEXT_PAGE;
2288                     break;
2289                 }
2290             }
2291             cookie->new_state = ONE_SEARCH;
2292             break;
paging_supported() was also called, so we know the cookie state
cannot be EXIT (at least from L2272 or L2276).
3)
/3...@3: -> libldap:ldap_search_ext(0x8149a88, 0x80a5728, 0x2, 0x80f0d68)
(I don't have the return value from this.)
I suspect we set ONE_SEARCH above, then go back round the loop via L2313 (setting new_state to DO_SEARCH), then round the loop again to L2317:
2317         case DO_SEARCH:
2318             rc = ldap_search_ext(cookie->conn->ld,
2319                 cookie->basedn,
2320                 cookie->scope,
2321                 cookie->filter,
2322                 cookie->attribute,
2323                 0,
2324                 cookie->p_serverctrls,
2325                 NULL,
2326                 &cookie->search_timeout, 0,
2327                 &cookie->msgId);
2328             if (rc != LDAP_SUCCESS) {
2329                 if (rc == LDAP_BUSY ||
2330                     rc == LDAP_UNAVAILABLE ||
2331                     rc == LDAP_UNWILLING_TO_PERFORM ||
2332                     rc == LDAP_CONNECT_ERROR ||
2333                     rc == LDAP_SERVER_DOWN) {
2334
2335                     if (cookie->reinit_on_retriable_err) {
2336                         cookie->err_rc = rc;
2337                         cookie->new_state = REINIT;
2338                     } else
2339                         cookie->new_state =
2340                             NEXT_SESSION;
Although we don't know the return value from ldap_search_ext(), this code
shows that the loop is more complex than I first thought. We could have
set new_state to NEXT_SESSION here and, as long as the return was not
LDAP_CONNECT_ERROR or LDAP_SERVER_DOWN (L2356), the code breaks out of
the switch at L2423 with new_state=NEXT_SESSION, and we pick up from #1
again.
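
To make the suspected cycle easier to follow, here is a minimal sketch of the transitions described above. It is not the libsldap code: the enum, the struct and both function names are made up for illustration, only the states named in the excerpts are modelled, and it assumes paging is not used and reinit_on_retriable_err is not set.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the states named in the excerpts. The real
 * enum in ns_reads.c has more members; OTHER_STATE is hypothetical and
 * stands for every path this sketch does not model. */
typedef enum {
    NEXT_SESSION,
    RESTART_SESSION,
    NEXT_SEARCH,
    ONE_SEARCH,
    DO_SEARCH,
    OTHER_STATE
} model_state_t;

/* Hypothetical inputs: what each traced call is assumed to return. */
typedef struct {
    int session_rc;        /* get_next_session(): < 0 means failure */
    int search_retriable;  /* non-zero: ldap_search_ext() failed with a
                            * code that selects NEXT_SESSION (assumes
                            * reinit_on_retriable_err is not set) */
} model_inputs_t;

/* One pass through the switch: map the current state to new_state. */
model_state_t
model_step(model_state_t state, const model_inputs_t *in)
{
    switch (state) {
    case NEXT_SESSION:   /* L2247-L2252 */
        return (in->session_rc < 0) ? RESTART_SESSION : NEXT_SEARCH;
    case NEXT_SEARCH:    /* L2268-L2292, assuming setup succeeded and
                          * paging was not used */
        return ONE_SEARCH;
    case ONE_SEARCH:     /* via L2313 */
        return DO_SEARCH;
    case DO_SEARCH:      /* L2317-L2340 */
        return in->search_retriable ? NEXT_SESSION : OTHER_STATE;
    default:
        return OTHER_STATE;
    }
}

/* Count how often NEXT_SESSION is re-entered within max_steps passes. */
int
count_session_reentries(const model_inputs_t *in, int max_steps)
{
    model_state_t state = NEXT_SESSION;
    int reentries = 0;

    assert(in != NULL && max_steps > 0);
    for (int i = 0; i < max_steps; i++) {
        state = model_step(state, in);
        if (state == NEXT_SESSION)
            reentries++;
        /* Stop once we leave the part of the machine modelled here. */
        if (state == OTHER_STATE || state == RESTART_SESSION)
            break;
    }
    return reentries;
}
```

With session_rc = 0 and search_retriable set, the model returns to NEXT_SESSION every four passes and never leaves, which matches the order of calls in the truss output. With search_retriable clear, it leaves the cycle on the first DO_SEARCH.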
Interestingly, bug 6274517 "libsldap:search_state_machine() falls into recursive loop if ldap_search_ext() returns 91" was fixed in snv_27, but hints that the logic within this function needs looking at.
Also, bug 6532913 "wrong error handling in libsldap" hints that there may be a generic problem with error handling in libsldap.
The evaluation of bug 6494750 hints that the write (which fails with EPIPE) may be the write at the end of ldap_search_ext() - although this is just speculation.
My guess is that ldap_search_ext did not return LDAP_SUCCESS, but whatever error it did return is not covered in either 'if' statement which followed.
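
Once a longer truss gives us the actual return value, a small helper like the one below could tell us whether it falls into the first 'if' (L2329-L2333). This is only a sketch: the function name is hypothetical, and the numeric values are my assumption of what Mozilla-derived ldap.h headers commonly define (91 for LDAP_CONNECT_ERROR agrees with the title of bug 6274517), so they should be checked against the ldap.h on the affected system.

```c
#include <stdbool.h>

/*
 * ASSUMPTION: numeric values as commonly defined in Mozilla-derived
 * ldap.h headers; verify against the header on the affected system.
 * The MODEL_ prefix avoids clashing with the real macros.
 */
enum {
    MODEL_LDAP_SUCCESS = 0x00,
    MODEL_LDAP_BUSY = 0x33,
    MODEL_LDAP_UNAVAILABLE = 0x34,
    MODEL_LDAP_UNWILLING_TO_PERFORM = 0x35,
    MODEL_LDAP_SERVER_DOWN = 0x51,
    MODEL_LDAP_CONNECT_ERROR = 0x5b
};

/* True if rc is one of the codes tested by the first 'if' in DO_SEARCH. */
bool
rc_takes_retry_branch(int rc)
{
    switch (rc) {
    case MODEL_LDAP_BUSY:
    case MODEL_LDAP_UNAVAILABLE:
    case MODEL_LDAP_UNWILLING_TO_PERFORM:
    case MODEL_LDAP_CONNECT_ERROR:
    case MODEL_LDAP_SERVER_DOWN:
        return true;
    default:
        /* Success and every other error code skip that branch. */
        return false;
    }
}
```

If the helper returns false for a non-zero rc, the error took some other path through DO_SEARCH, which is the case my guess above is about.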
I've run out of time on this (and it's not an area of code I'm familiar with). Is there anybody else who can progress this?
I've attached the truss that John sent me, although we may need a longer sample to complete the circle of functions.
Thanks,
Brian

Brian Ruthven - Sun UK wrote:
This is all based on a bit of speculation and guesswork, but it looks like there is a possibility of spinning round a loop between __s_api_conn_mt_get() and match_conn_mt().

There's also a possibility of alternating between cookie->state == NEXT_SESSION and RESTART_SESSION if get_next_session(cookie) returns <0 and i_flags has NS_LDAP_HARD set.

Can you do this:

# truss -t\!all -u:: -o /tmp/nscd.truss -p <PID>

Let it run for a few seconds, then Ctrl-C it. Send me the resulting /tmp/nscd.truss file (gzipped, and privately to avoid spamming everybody on the list with an attachment). This should hopefully tell us how much of the stack is looping.

Regards,
Brian

John Ryan wrote:

r...@bs-ssvr02:~# truss -Tgetpid -p 17841
/3: getpid() = 17841
[1]
r...@bs-ssvr02:~# pstack 17841
17841: /usr/sbin/nscd
----------------- lwp# 1 / thread# 1 --------------------
 feef1667 pause ()
 08058f13 main (1, 8047e4c, 8047e54, feffb7b4) + 7b3
 0805861d _start (1, 8047eec, 0, 8047efb, 8047f0c, 8047f1d) + 7d
----------------- lwp# 2 / thread# 2 --------------------
 feef1f88 door (0, 0, 0, 0, 0, 8)
 feed8804 door_unref_func (45b1, fef7f000, fe38efec, feeecd1e) + 44
 feeecd56 _thrp_setup (fe280200) + 7e
 feeecfe0 _lwp_start (fe280200, 0, 0, feeecd1e, 0, 0)
----------------- lwp# 3 / thread# 3 --------------------
 feef1457 getpid ()
 fecc2571 __s_api_conn_mt_get (0, 0, 0, fe279c94, 813f4b4, 80c0f08) + 141
 fecadd4b getConnection (0, 0, 0, fe279c90, fe279c94, 813f4b4) + 7b
 fecae43c __s_api_getConnection (0, 0, 0, fe279c90, fe279c94, 813f4b4) + 34
 fec9efd6 get_next_session (813f470, 0, 813f4e8, fec9fb55) + 8e
 fec9fec8 search_state_machine (813f470, 1, 0, feca0c9b) + 384
 feca0ef4 ldap_list (0, fecfea24, fe27a778, fecec1dc, fecfe530, 0) + 290
 feca10a1 __ns_ldap_list (fecfea24, fe27a778, fecec1dc, fecfe530, 0, 0) + a5
 feceb732 _nss_ldap_lookup (80f2090, fe27aac8, fecfea24, fe27a778, 0, fecec1dc) + 4e
 fece9571 getbyname (80f2090, fe27aac8, 0, 806a16a) + c5
 0806a101 nss_search (0, 80693b8, 4, fe27aac8) + 6b1
 0806ac7c nss_psearch (fe27acb8, 4000, fe27ab98, 0) + f0
 0805ce6f lookup_int (fe27ecb8, 0, fe27ecc0, 0) + 763
 0805d7c8 nsc_lookup (fe27ecb8, 0, 10, d0) + 18
 0806f6b1 lookup (fe27ed48, b8, 0, 1) + 13d
 0806fc4d switcher (deadbeed, fe27ed48, b8, 0, 0, 806fa40) + 20d
 feef1ff2 __door_return () + 52
----------------- lwp# 4 / thread# 4 --------------------
 feef1757 read (5, fe17f674, 94c)
 08070a88 rts_mon (0, fef7f000, fe17ffec, feeecd1e) + 5c
 feeecd56 _thrp_setup (fe281200) + 7e
 feeecfe0 _lwp_start (fe281200, 0, 0, feeecd1e, 0, 0)
----------------- lwp# 5 / thread# 5 --------------------
 feeed01b lwp_park (0, 0, 0)
 feee677d cond_wait_queue (fe07cae8, 80a7b78, 0, feee6c46) + 60
 feee6cbe __cond_wait (fe07cae8, 80a7b78, fe07cad0, feee6d03) + 86
 feee6d11 cond_wait (fe07cae8, 80a7b78, 0, 813e308) + 24
 0805e9b5 nscd_wait (80a8ac8) + 81
 0805c9f9 lookup_int (fe080cb8, 0, fe080cc0, 0) + 2ed
 0805d7c8 nsc_lookup (fe080cb8, 0, 10, d0) + 18
 0806f6b1 lookup (fe080d48, b8, 0, 1) + 13d
 0806fc4d switcher (deadbeed, fe080d48, b8, 0, 0, 806fa40) + 20d
 feef1ff2 __door_return () + 52
----------------- lwp# 6 / thread# 6 --------------------
 feef0f47 nanosleep (fdf81f74, fdf81f7c)
 feedce69 sleep (190, 80a7a88, 14, f) + 31
 0805d93a revalidate (80a8ac8, fef7f000, fdf81fec, feeecd1e) + 8e
 feeecd56 _thrp_setup (fe282a00) + 7e
 feeecfe0 _lwp_start (fe282a00, 0, 0, feeecd1e, 0, 0)
----------------- lwp# 7 / thread# 7 --------------------
 feef0f47 nanosleep (fde82f74, fde82f7c)
 feedce69 sleep (259, 100, 258, feeec017) + 31
 0805e435 reaper (80a8ac8, fef7f000, fde82fec, feeecd1e) + 1c9
 feeecd56 _thrp_setup (fe283200) + 7e
 feeecfe0 _lwp_start (fe283200, 0, 0, feeecd1e, 0, 0)
----------------- lwp# 8 / thread# 8 --------------------
 feef0f47 nanosleep (fdd83f74, fdd83f7c)
 feedce69 sleep (258, fee11800, 200, feeec017) + 31
 0805d905 revalidate (80d9688, fef7f000, fdd83fec, feeecd1e) + 59
 feeecd56 _thrp_setup (fe283a00) + 7e
 feeecfe0 _lwp_start (fe283a00, 0, 0, feeecd1e, 0, 0)
----------------- lwp# 9 / thread# 9 --------------------
 feef0f47 nanosleep (fdc84f74, fdc84f7c)
 feedce69 sleep (258, 8088b1c, 80888d4, 258) + 31
 0805e424 reaper (80d9688, fef7f000, fdc84fec, feeecd1e) + 1b8
 feeecd56 _thrp_setup (fe284200) + 7e
 feeecfe0 _lwp_start (fe284200, 0, 0, feeecd1e, 0, 0)
----------------- lwp# 10 / thread# 10 --------------------
 feeed01b lwp_park (0, 0, 0)
 feee677d cond_wait_queue (fdb81ae8, 80a7b78, 0, feee6c46) + 60
 feee6cbe __cond_wait (fdb81ae8, 80a7b78, fdb81ad0, feee6d03) + 86
 feee6d11 cond_wait (fdb81ae8, 80a7b78, 0, 813e308) + 24
 0805e9b5 nscd_wait (80a8ac8) + 81
 0805c9f9 lookup_int (fdb85cb8, 0, fdb85cc0, 0) + 2ed
 0805d7c8 nsc_lookup (fdb85cb8, 0, 10, d0) + 18
 0806f6b1 lookup (fdb85d48, b8, 0, 1) + 13d
 0806fc4d switcher (deadbeed, fdb85d48, b8, 0, 0, 806fa40) + 20d
 feef1ff2 __door_return () + 52
----------------- lwp# 11 / thread# 11 --------------------
 feef0f47 nanosleep (fda86f74, fda86f7c)
 feedce69 sleep (258, fee11800, 200, feeec017) + 31
 0805d905 revalidate (80a8648, fef7f000, fda86fec, feeecd1e) + 59
 feeecd56 _thrp_setup (fe285200) + 7e
 feeecfe0 _lwp_start (fe285200, 0, 0, feeecd1e, 0, 0)
----------------- lwp# 12 / thread# 12 --------------------
 feef0f47 nanosleep (fd987f74, fd987f7c)
 feedce69 sleep (259, 100, 258, feeec017) + 31
 0805e435 reaper (80a8648, fef7f000, fd987fec, feeecd1e) + 1c9
 feeecd56 _thrp_setup (fe285a00) + 7e
 feeecfe0 _lwp_start (fe285a00, 0, 0, feeecd1e, 0, 0)
----------------- lwp# 13 / thread# 13 --------------------
 feef0f47 nanosleep (fd888f74, fd888f7c)
 feedce69 sleep (960, 80a7908, 14, 5a) + 31
 0805d93a revalidate (80a8948, fef7f000, fd888fec, feeecd1e) + 8e
 feeecd56 _thrp_setup (fe286200) + 7e
 feeecfe0 _lwp_start (fe286200, 0, 0, feeecd1e, 0, 0)
----------------- lwp# 14 / thread# 14 --------------------
 feef0f47 nanosleep (fd789f74, fd789f7c)
 feedce69 sleep (e11, 100, e10, feeec017) + 31
 0805e435 reaper (80a8948, fef7f000, fd789fec, feeecd1e) + 1c9
 feeecd56 _thrp_setup (fe286a00) + 7e
 feeecfe0 _lwp_start (fe286a00, 0, 0, feeecd1e, 0, 0)
----------------- lwp# 15 / thread# 15 --------------------
 feef0f47 nanosleep (fd68af74, fd68af7c)
 feedce69 sleep (960, 80a7248, 14, 5a) + 31
 0805d93a revalidate (80a81c8, fef7f000, fd68afec, feeecd1e) + 8e
 feeecd56 _thrp_setup (fe287200) + 7e
 feeecfe0 _lwp_start (fe287200, 0, 0, feeecd1e, 0, 0)
----------------- lwp# 16 / thread# 16 --------------------
 feef0f47 nanosleep (fd58bf74, fd58bf7c)
 feedce69 sleep (e11, 100, e10, feeec017) + 31
 0805e435 reaper (80a81c8, fef7f000, fd58bfec, feeecd1e) + 1c9
 feeecd56 _thrp_setup (fe287a00) + 7e
 feeecfe0 _lwp_start (fe287a00, 0, 0, feeecd1e, 0, 0)
----------------- lwp# 17 / thread# 17 --------------------
 feef0f47 nanosleep (fd48cf74, fd48cf7c)
 feedce69 sleep (960, 80da608, 14, 5a) + 31
 0805d93a revalidate (80d9208, fef7f000, fd48cfec, feeecd1e) + 8e
 feeecd56 _thrp_setup (fe288200) + 7e
 feeecfe0 _lwp_start (fe288200, 0, 0, feeecd1e, 0, 0)
----------------- lwp# 18 / thread# 18 --------------------
 feef0f47 nanosleep (fd38df74, fd38df7c)
 feedce69 sleep (e11, 100, e10, feeec017) + 31
 0805e435 reaper (80d9208, fef7f000, fd38dfec, feeecd1e) + 1c9
 feeecd56 _thrp_setup (fe288a00) + 7e
 feeecfe0 _lwp_start (fe288a00, 0, 0, feeecd1e, 0, 0)
----------------- lwp# 21 / thread# 21 --------------------
 feeed01b lwp_park (0, fd28ef3c, 0)
 feee677d cond_wait_queue (fe8027e0, fe802800, fd28ef3c, feee6966) + 60
 feee6b43 cond_wait_common (fe8027e0, fe802800, fd28ef3c, feee6d86) + 1eb
 feee6e3c __cond_timedwait (fe8027e0, fe802800, fd28efac, feee6e70) + c4
 feee6e81 cond_timedwait (fe8027e0, fe802800) + 27
 fe7d8165 umem_update_thread (0, fef7f000, fd28efec, feeecd1e) + 191
 feeecd56 _thrp_setup (fe289200) + 7e
 feeecfe0 _lwp_start (fe289200, 0, 0, feeecd1e, 0, 0)
----------------- lwp# 22 / thread# 22 --------------------
 feef1e0a door (4, fd15af04, 0, 0, 0, 3)
 feedd2a0 door_call (4, fd15af04, 80d43f8, fecb6956) + c8
 fecb6997 __ns_ldap_trydoorcall_send (fd15af88, fd15af90, fd15af8c, fecc3be7) + 4f
 fecc3c3b get_server_change (80d43f8, fef7f000, fd15efec, feeecd1e) + 233
 feeecd56 _thrp_setup (fe289a00) + 7e
 feeecfe0 _lwp_start (fe289a00, 0, 0, feeecd1e, 0, 0)
----------------- lwp# 137 / thread# 137 --------------------
 feef1fc1 __door_return () + 21
----------------- lwp# 136 / thread# 136 --------------------
 feeed01b lwp_park (0, 0, 0)
 feee677d cond_wait_queue (fcf5cae8, 80a7b78, 0, feee6c46) + 60
 feee6cbe __cond_wait (fcf5cae8, 80a7b78, fcf5cad0, feee6d03) + 86
 feee6d11 cond_wait (fcf5cae8, 80a7b78, 0, 813e308) + 24
 0805e9b5 nscd_wait (80a8ac8) + 81
 0805c9f9 lookup_int (fcf60cb8, 0, fcf60cc0, 0) + 2ed
 0805d7c8 nsc_lookup (fcf60cb8, 0, 10, d0) + 18
 0806f6b1 lookup (fcf60d48, b8, 0, 1) + 13d
 0806fc4d switcher (deadbeed, fcf60d48, b8, 0, 0, 806fa40) + 20d
 feef1ff2 __door_return () + 52
----------------- lwp# 238 / thread# 238 --------------------
 feef1fc1 __door_return () + 21
----------------- lwp# 237 / thread# 237 --------------------
 feef1fc1 __door_return () + 21
r...@bs-ssvr02:~#
r...@bs-ssvr02:~# pfiles 17841
17841: /usr/sbin/nscd
  Current rlimit: 256 file descriptors
   0: S_IFCHR mode:0666 dev:313,0 ino:6815752 uid:0 gid:3 rdev:13,2
      O_RDONLY|O_LARGEFILE
      /devices/pseudo/m...@0:null
   1: S_IFCHR mode:0600 dev:313,0 ino:50855942 uid:0 gid:3 rdev:97,1
      O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE
      /devices/pseudo/sys...@0:msglog
   2: S_IFREG mode:0644 dev:182,65538 ino:410655 uid:0 gid:0 size:12353286
      O_WRONLY|O_APPEND
      /var/log/nscd.log
   3: S_IFDOOR mode:0777 dev:315,0 ino:0 uid:0 gid:0 size:0
      O_RDWR FD_CLOEXEC door to nscd[17841]
   4: S_IFDOOR mode:0444 dev:323,0 ino:50 uid:0 gid:0 size:0
      O_RDONLY FD_CLOEXEC door to ldap_cachemgr[217]
      /var/run/ldap_cache_door
   5: S_IFSOCK mode:0666 dev:322,0 ino:13899 uid:0 gid:0 size:0
      O_RDWR
      SOCK_RAW
      SO_SNDBUF(8192),SO_RCVBUF(8192)
      peername: AF_ROUTE
   6: S_IFCHR mode:0000 dev:313,0 ino:39964 uid:0 gid:0 rdev:41,117
      O_RDWR FD_CLOEXEC
      sockname: AF_INET 0.0.0.0 port: 64901
      /devices/pseudo/u...@0:udp
   7: S_IFSOCK mode:0666 dev:322,0 ino:13898 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
      SOCK_STREAM
      SO_SNDBUF(49152),SO_RCVBUF(49152)
      sockname: AF_INET6 :: port: 0
r...@bs-ssvr02:~#

Cheers
John
-- 
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG
nscd.truss.gz
Description: application/gzip
_______________________________________________ networking-discuss mailing list [email protected]
