After exchanging emails with John, the truss output he supplied shows that we are looping within search_state_machine(). It's not the loop I thought, but a more complex one that transitions through more than two states. Source code line numbers are taken from http://src.opensolaris.org/source/xref/onnv/onnvgate/usr/src/lib/libsldap/common/ns_reads.c :

Starting at:

1)
/3...@3: -> libsldap:get_next_session(0x8142a90, 0x0, 0x8142b08, 0xfec9fb55)
   /3...@3:   <- libsldap:get_next_session() = 0
The only caller of this function is at L2248, implying cookie->state == NEXT_SESSION. Since it returned 0, we take the else branch (HERE#1):
      2247         case NEXT_SESSION:
      2248             if (get_next_session(cookie) < 0)
      2249                 cookie->new_state = RESTART_SESSION;
      2250             else
      2251                 cookie->new_state = NEXT_SEARCH;   <==== HERE#1
      2252             break;


2)
/3...@3: -> libsldap:setup_next_search(0x8142a90, 0x0, 0x8142b08, 0xfec9fb55)
   /3...@3:   <- libsldap:setup_next_search() = 0
/3...@3: -> libsldap:paging_supported(0x8142a90, 0x0, 0x8142b08, 0xfec9fb55)
   /3...@3:   <- libsldap:paging_supported() = 1
Again, the only caller is at L2275 (HERE#2). We already know state == NEXT_SEARCH:
      2268         case NEXT_SEARCH:
      2269             /* setup referrals search if necessary */
      2270             if (cookie->refpos) {
      2271                 if (setup_referral_search(cookie) < 0) {
      2272                     cookie->new_state = EXIT;
      2273                     break;
      2274                 }
      2275             } else if (setup_next_search(cookie) < 0) {   <==== HERE#2
      2276                 cookie->new_state = EXIT;
      2277                 break;
      2278             }
      2279             /* only do VLV/PAGE on scopes onelevel/subtree */
      2280             if (paging_supported(cookie)) {   <==== HERE#3
      2281                 if (cookie->use_paging &&
      2282                     (cookie->scope != LDAP_SCOPE_BASE)) {
      2283                     cookie->index = 1;
      2284                     if (cookie->listType == VLVCTRLFLAG)
      2285                         cookie->new_state = NEXT_VLV;
      2286                     else
      2287                         cookie->new_state = NEXT_PAGE;
      2288                     break;
      2289                 }
      2290             }
      2291             cookie->new_state = ONE_SEARCH;
      2292             break;

paging_supported() was also called (HERE#3), so we know new_state cannot have been set to EXIT (at least not from L2272 or L2276).

3)
   /3...@3:   -> libldap:ldap_search_ext(0x8149a88, 0x80a5728, 0x2, 0x80f0d68)
   (I don't have the return value from this)
I suspect we set ONE_SEARCH above, then go back round the loop via L2313 (setting new_state to DO_SEARCH), then round the loop again to L2317:
      2317         case DO_SEARCH:
      2318             rc = ldap_search_ext(cookie->conn->ld,
      2319                 cookie->basedn,
      2320                 cookie->scope,
      2321                 cookie->filter,
      2322                 cookie->attribute,
      2323                 0,
      2324                 cookie->p_serverctrls,
      2325                 NULL,
      2326                 &cookie->search_timeout, 0,
      2327                 &cookie->msgId);
      2328             if (rc != LDAP_SUCCESS) {
      2329                 if (rc == LDAP_BUSY ||
      2330                     rc == LDAP_UNAVAILABLE ||
      2331                     rc == LDAP_UNWILLING_TO_PERFORM ||
      2332                     rc == LDAP_CONNECT_ERROR ||
      2333                     rc == LDAP_SERVER_DOWN) {
      2334
      2335                     if (cookie->reinit_on_retriable_err) {
      2336                         cookie->err_rc = rc;
      2337                         cookie->new_state = REINIT;
      2338                     } else
      2339                         cookie->new_state =
      2340                             NEXT_SESSION;

Although we don't know the return value from ldap_search_ext(), this shows that the loop is more complex than I first thought. We could have set new_state to NEXT_SESSION here. Then, as long as the return value was not LDAP_CONNECT_ERROR or LDAP_SERVER_DOWN (L2356), the code breaks out of the switch at L2423 with new_state == NEXT_SESSION, and we pick up from #1 again.
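
To make the suspected cycle concrete, here is a minimal sketch of it. This is not the libsldap code: the ST_* and RC_* names and both functions are hypothetical stand-ins, get_next_session() and setup_next_search() are assumed to succeed every time (as in the truss), paging is assumed not to be used, and a successful search is collapsed straight to an exit state.

```c
/* Hypothetical, trimmed-down model of the states walked through above. */
typedef enum {
    ST_NEXT_SESSION,
    ST_NEXT_SEARCH,
    ST_ONE_SEARCH,
    ST_DO_SEARCH,
    ST_EXIT
} sketch_state_t;

/* Stand-ins for the result of ldap_search_ext(): success, or a retriable error. */
typedef enum { RC_SUCCESS, RC_BUSY } sketch_rc_t;

/* One pass round the loop: map the current state to the next one. */
static sketch_state_t
sketch_step(sketch_state_t state, sketch_rc_t search_rc)
{
    switch (state) {
    case ST_NEXT_SESSION:   /* assumed: get_next_session() returned 0 */
        return ST_NEXT_SEARCH;
    case ST_NEXT_SEARCH:    /* assumed: setup_next_search() returned 0, no paging */
        return ST_ONE_SEARCH;
    case ST_ONE_SEARCH:
        return ST_DO_SEARCH;
    case ST_DO_SEARCH:      /* a retriable error sends us back to NEXT_SESSION */
        return (search_rc == RC_SUCCESS) ? ST_EXIT : ST_NEXT_SESSION;
    default:
        return ST_EXIT;
    }
}

/* Count how many times NEXT_SESSION is entered within max_steps passes. */
static int
sketch_count_sessions(sketch_rc_t search_rc, int max_steps)
{
    sketch_state_t state = ST_NEXT_SESSION;
    int sessions = 0;

    for (int i = 0; i < max_steps && state != ST_EXIT; i++) {
        if (state == ST_NEXT_SESSION)
            sessions++;
        state = sketch_step(state, search_rc);
    }
    return sessions;
}
```

With the search always failing with a retriable error, the model re-enters NEXT_SESSION every fourth pass and never reaches the exit state on its own, which matches the repeating pattern in the truss. With a successful search it visits NEXT_SESSION once and stops.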

Interestingly, bug 6274517 "libsldap:search_state_machine() falls into recursive loop if ldap_search_ext() returns 91" was fixed in snv_27, and it hints that the logic within this function needs looking at.

Also, bug 6532913 "wrong error handling in libsldap" hints that there may be a generic problem with error handling in libsldap.

The evaluation of bug 6494750 hints that the write (which fails with EPIPE) may be the write at the end of ldap_search_ext() - although this is just speculation.


My guess is that ldap_search_ext did not return LDAP_SUCCESS, but whatever error it did return is not covered in either 'if' statement which followed.


I've run out of time on this (and it's not an area of code I'm familiar with). Is there anybody else who can progress this?

I've attached the truss that John sent me, although we may need a longer sample to complete the circle of functions.

Thanks,
Brian



Brian Ruthven - Sun UK wrote:

This is all based on a bit of speculation and guesswork, but it looks like there is a possibility of spinning round a loop between __s_api_conn_mt_get() and match_conn_mt().

There's also a possibility of alternating between cookie->state == NEXT_SESSION and RESTART_SESSION if get_next_session(cookie) returns <0 and i_flags has NS_LDAP_HARD set.
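
As a rough illustration of that second possibility, the sketch below models only the two states involved. The ALT_* names, SK_HARD (a stand-in for NS_LDAP_HARD) and both functions are hypothetical, and the RESTART_SESSION handling is an assumption on my part: I have taken it that a hard lookup goes straight back to NEXT_SESSION, and simplified everything else to an exit.

```c
#define SK_HARD 0x1     /* hypothetical stand-in for NS_LDAP_HARD */

typedef enum {
    ALT_NEXT_SESSION,
    ALT_RESTART_SESSION,
    ALT_NEXT_SEARCH,
    ALT_EXIT
} alt_state_t;

/* One pass round the loop for the two states of interest. */
static alt_state_t
alt_step(alt_state_t state, int session_rc, int i_flags)
{
    switch (state) {
    case ALT_NEXT_SESSION:
        /* mirrors the NEXT_SESSION case: a negative return means restart */
        return (session_rc < 0) ? ALT_RESTART_SESSION : ALT_NEXT_SEARCH;
    case ALT_RESTART_SESSION:
        /* assumption: a hard lookup retries instead of giving up */
        return (i_flags & SK_HARD) ? ALT_NEXT_SESSION : ALT_EXIT;
    default:
        return ALT_EXIT;
    }
}

/*
 * Returns 1 if the model is still bouncing between NEXT_SESSION and
 * RESTART_SESSION after max_steps passes, 0 if it left the pair.
 */
static int
alt_spins(int session_rc, int i_flags, int max_steps)
{
    alt_state_t state = ALT_NEXT_SESSION;

    for (int i = 0; i < max_steps; i++) {
        state = alt_step(state, session_rc, i_flags);
        if (state != ALT_NEXT_SESSION && state != ALT_RESTART_SESSION)
            return 0;
    }
    return 1;
}
```

Under those assumptions the pair only spins when both conditions hold: get_next_session() keeps failing and the hard flag is set. Either a successful session or a soft lookup leaves the pair on the first or second pass.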

Can you do this:

# truss -t\!all -u:: -o /tmp/nscd.truss -p <PID>

Let it run for a few seconds, then Ctrl-C it. Send me the resulting /tmp/nscd.truss file (gzipped, and privately to avoid spamming everybody on the list with an attachment). This should hopefully tell us how much of the stack is looping.
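
If the file turns out to be large, a small helper along these lines could pull the function name out of each entry line so the calls can be tallied. It is only a sketch: truss_entry_name() is a hypothetical name, and it assumes entry lines look like "-> library:function(args...)" while return lines start with "<-".

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the "library:function" part of a truss -u entry line into out.
 * Returns 1 on success, 0 if the line is not an entry line or out is
 * too small to hold the name.
 */
static int
truss_entry_name(const char *line, char *out, size_t outlen)
{
    const char *start = strstr(line, "-> ");
    if (start == NULL)
        return 0;               /* return lines ("<-") and others are skipped */
    start += 3;                 /* step over the "-> " marker */

    const char *end = strchr(start, '(');
    if (end == NULL || end == start)
        return 0;               /* no argument list, or empty name */

    size_t len = (size_t)(end - start);
    if (len >= outlen)
        return 0;               /* name would not fit with its terminator */

    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

Feeding each line of /tmp/nscd.truss through this and counting the distinct names should show which functions repeat and in what order.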

Regards,
Brian


John Ryan wrote:
r...@bs-ssvr02:~# truss -Tgetpid -p 17841
/3:     getpid()                                        = 17841 [1]
r...@bs-ssvr02:~# pstack 17841
17841:  /usr/sbin/nscd
-----------------  lwp# 1 / thread# 1  --------------------
 feef1667 pause    ()
 08058f13 main     (1, 8047e4c, 8047e54, feffb7b4) + 7b3
 0805861d _start   (1, 8047eec, 0, 8047efb, 8047f0c, 8047f1d) + 7d
-----------------  lwp# 2 / thread# 2  --------------------
 feef1f88 door     (0, 0, 0, 0, 0, 8)
 feed8804 door_unref_func (45b1, fef7f000, fe38efec, feeecd1e) + 44
 feeecd56 _thrp_setup (fe280200) + 7e
 feeecfe0 _lwp_start (fe280200, 0, 0, feeecd1e, 0, 0)
-----------------  lwp# 3 / thread# 3  --------------------
 feef1457 getpid   ()
 fecc2571 __s_api_conn_mt_get (0, 0, 0, fe279c94, 813f4b4, 80c0f08) + 141
 fecadd4b getConnection (0, 0, 0, fe279c90, fe279c94, 813f4b4) + 7b
 fecae43c __s_api_getConnection (0, 0, 0, fe279c90, fe279c94, 813f4b4) + 34
 fec9efd6 get_next_session (813f470, 0, 813f4e8, fec9fb55) + 8e
 fec9fec8 search_state_machine (813f470, 1, 0, feca0c9b) + 384
 feca0ef4 ldap_list (0, fecfea24, fe27a778, fecec1dc, fecfe530, 0) + 290
 feca10a1 __ns_ldap_list (fecfea24, fe27a778, fecec1dc, fecfe530, 0, 0) + a5
 feceb732 _nss_ldap_lookup (80f2090, fe27aac8, fecfea24, fe27a778, 0, fecec1dc) + 4e
 fece9571 getbyname (80f2090, fe27aac8, 0, 806a16a) + c5
 0806a101 nss_search (0, 80693b8, 4, fe27aac8) + 6b1
 0806ac7c nss_psearch (fe27acb8, 4000, fe27ab98, 0) + f0
 0805ce6f lookup_int (fe27ecb8, 0, fe27ecc0, 0) + 763
 0805d7c8 nsc_lookup (fe27ecb8, 0, 10, d0) + 18
 0806f6b1 lookup   (fe27ed48, b8, 0, 1) + 13d
 0806fc4d switcher (deadbeed, fe27ed48, b8, 0, 0, 806fa40) + 20d
 feef1ff2 __door_return () + 52
-----------------  lwp# 4 / thread# 4  --------------------
 feef1757 read     (5, fe17f674, 94c)
 08070a88 rts_mon  (0, fef7f000, fe17ffec, feeecd1e) + 5c
 feeecd56 _thrp_setup (fe281200) + 7e
 feeecfe0 _lwp_start (fe281200, 0, 0, feeecd1e, 0, 0)
-----------------  lwp# 5 / thread# 5  --------------------
 feeed01b lwp_park (0, 0, 0)
 feee677d cond_wait_queue (fe07cae8, 80a7b78, 0, feee6c46) + 60
 feee6cbe __cond_wait (fe07cae8, 80a7b78, fe07cad0, feee6d03) + 86
 feee6d11 cond_wait (fe07cae8, 80a7b78, 0, 813e308) + 24
 0805e9b5 nscd_wait (80a8ac8) + 81
 0805c9f9 lookup_int (fe080cb8, 0, fe080cc0, 0) + 2ed
 0805d7c8 nsc_lookup (fe080cb8, 0, 10, d0) + 18
 0806f6b1 lookup   (fe080d48, b8, 0, 1) + 13d
 0806fc4d switcher (deadbeed, fe080d48, b8, 0, 0, 806fa40) + 20d
 feef1ff2 __door_return () + 52
-----------------  lwp# 6 / thread# 6  --------------------
 feef0f47 nanosleep (fdf81f74, fdf81f7c)
 feedce69 sleep    (190, 80a7a88, 14, f) + 31
 0805d93a revalidate (80a8ac8, fef7f000, fdf81fec, feeecd1e) + 8e
 feeecd56 _thrp_setup (fe282a00) + 7e
 feeecfe0 _lwp_start (fe282a00, 0, 0, feeecd1e, 0, 0)
-----------------  lwp# 7 / thread# 7  --------------------
 feef0f47 nanosleep (fde82f74, fde82f7c)
 feedce69 sleep    (259, 100, 258, feeec017) + 31
 0805e435 reaper   (80a8ac8, fef7f000, fde82fec, feeecd1e) + 1c9
 feeecd56 _thrp_setup (fe283200) + 7e
 feeecfe0 _lwp_start (fe283200, 0, 0, feeecd1e, 0, 0)
-----------------  lwp# 8 / thread# 8  --------------------
 feef0f47 nanosleep (fdd83f74, fdd83f7c)
 feedce69 sleep    (258, fee11800, 200, feeec017) + 31
 0805d905 revalidate (80d9688, fef7f000, fdd83fec, feeecd1e) + 59
 feeecd56 _thrp_setup (fe283a00) + 7e
 feeecfe0 _lwp_start (fe283a00, 0, 0, feeecd1e, 0, 0)
-----------------  lwp# 9 / thread# 9  --------------------
 feef0f47 nanosleep (fdc84f74, fdc84f7c)
 feedce69 sleep    (258, 8088b1c, 80888d4, 258) + 31
 0805e424 reaper   (80d9688, fef7f000, fdc84fec, feeecd1e) + 1b8
 feeecd56 _thrp_setup (fe284200) + 7e
 feeecfe0 _lwp_start (fe284200, 0, 0, feeecd1e, 0, 0)
-----------------  lwp# 10 / thread# 10  --------------------
 feeed01b lwp_park (0, 0, 0)
 feee677d cond_wait_queue (fdb81ae8, 80a7b78, 0, feee6c46) + 60
 feee6cbe __cond_wait (fdb81ae8, 80a7b78, fdb81ad0, feee6d03) + 86
 feee6d11 cond_wait (fdb81ae8, 80a7b78, 0, 813e308) + 24
 0805e9b5 nscd_wait (80a8ac8) + 81
 0805c9f9 lookup_int (fdb85cb8, 0, fdb85cc0, 0) + 2ed
 0805d7c8 nsc_lookup (fdb85cb8, 0, 10, d0) + 18
 0806f6b1 lookup   (fdb85d48, b8, 0, 1) + 13d
 0806fc4d switcher (deadbeed, fdb85d48, b8, 0, 0, 806fa40) + 20d
 feef1ff2 __door_return () + 52
-----------------  lwp# 11 / thread# 11  --------------------
 feef0f47 nanosleep (fda86f74, fda86f7c)
 feedce69 sleep    (258, fee11800, 200, feeec017) + 31
 0805d905 revalidate (80a8648, fef7f000, fda86fec, feeecd1e) + 59
 feeecd56 _thrp_setup (fe285200) + 7e
 feeecfe0 _lwp_start (fe285200, 0, 0, feeecd1e, 0, 0)
-----------------  lwp# 12 / thread# 12  --------------------
 feef0f47 nanosleep (fd987f74, fd987f7c)
 feedce69 sleep    (259, 100, 258, feeec017) + 31
 0805e435 reaper   (80a8648, fef7f000, fd987fec, feeecd1e) + 1c9
 feeecd56 _thrp_setup (fe285a00) + 7e
 feeecfe0 _lwp_start (fe285a00, 0, 0, feeecd1e, 0, 0)
-----------------  lwp# 13 / thread# 13  --------------------
 feef0f47 nanosleep (fd888f74, fd888f7c)
 feedce69 sleep    (960, 80a7908, 14, 5a) + 31
 0805d93a revalidate (80a8948, fef7f000, fd888fec, feeecd1e) + 8e
 feeecd56 _thrp_setup (fe286200) + 7e
 feeecfe0 _lwp_start (fe286200, 0, 0, feeecd1e, 0, 0)
-----------------  lwp# 14 / thread# 14  --------------------
 feef0f47 nanosleep (fd789f74, fd789f7c)
 feedce69 sleep    (e11, 100, e10, feeec017) + 31
 0805e435 reaper   (80a8948, fef7f000, fd789fec, feeecd1e) + 1c9
 feeecd56 _thrp_setup (fe286a00) + 7e
 feeecfe0 _lwp_start (fe286a00, 0, 0, feeecd1e, 0, 0)
-----------------  lwp# 15 / thread# 15  --------------------
 feef0f47 nanosleep (fd68af74, fd68af7c)
 feedce69 sleep    (960, 80a7248, 14, 5a) + 31
 0805d93a revalidate (80a81c8, fef7f000, fd68afec, feeecd1e) + 8e
 feeecd56 _thrp_setup (fe287200) + 7e
 feeecfe0 _lwp_start (fe287200, 0, 0, feeecd1e, 0, 0)
-----------------  lwp# 16 / thread# 16  --------------------
 feef0f47 nanosleep (fd58bf74, fd58bf7c)
 feedce69 sleep    (e11, 100, e10, feeec017) + 31
 0805e435 reaper   (80a81c8, fef7f000, fd58bfec, feeecd1e) + 1c9
 feeecd56 _thrp_setup (fe287a00) + 7e
 feeecfe0 _lwp_start (fe287a00, 0, 0, feeecd1e, 0, 0)
-----------------  lwp# 17 / thread# 17  --------------------
 feef0f47 nanosleep (fd48cf74, fd48cf7c)
 feedce69 sleep    (960, 80da608, 14, 5a) + 31
 0805d93a revalidate (80d9208, fef7f000, fd48cfec, feeecd1e) + 8e
 feeecd56 _thrp_setup (fe288200) + 7e
 feeecfe0 _lwp_start (fe288200, 0, 0, feeecd1e, 0, 0)
-----------------  lwp# 18 / thread# 18  --------------------
 feef0f47 nanosleep (fd38df74, fd38df7c)
 feedce69 sleep    (e11, 100, e10, feeec017) + 31
 0805e435 reaper   (80d9208, fef7f000, fd38dfec, feeecd1e) + 1c9
 feeecd56 _thrp_setup (fe288a00) + 7e
 feeecfe0 _lwp_start (fe288a00, 0, 0, feeecd1e, 0, 0)
-----------------  lwp# 21 / thread# 21  --------------------
 feeed01b lwp_park (0, fd28ef3c, 0)
 feee677d cond_wait_queue (fe8027e0, fe802800, fd28ef3c, feee6966) + 60
 feee6b43 cond_wait_common (fe8027e0, fe802800, fd28ef3c, feee6d86) + 1eb
 feee6e3c __cond_timedwait (fe8027e0, fe802800, fd28efac, feee6e70) + c4
 feee6e81 cond_timedwait (fe8027e0, fe802800) + 27
 fe7d8165 umem_update_thread (0, fef7f000, fd28efec, feeecd1e) + 191
 feeecd56 _thrp_setup (fe289200) + 7e
 feeecfe0 _lwp_start (fe289200, 0, 0, feeecd1e, 0, 0)
-----------------  lwp# 22 / thread# 22  --------------------
 feef1e0a door     (4, fd15af04, 0, 0, 0, 3)
 feedd2a0 door_call (4, fd15af04, 80d43f8, fecb6956) + c8
 fecb6997 __ns_ldap_trydoorcall_send (fd15af88, fd15af90, fd15af8c, fecc3be7) + 4f
 fecc3c3b get_server_change (80d43f8, fef7f000, fd15efec, feeecd1e) + 233
 feeecd56 _thrp_setup (fe289a00) + 7e
 feeecfe0 _lwp_start (fe289a00, 0, 0, feeecd1e, 0, 0)
-----------------  lwp# 137 / thread# 137  --------------------
 feef1fc1 __door_return () + 21
-----------------  lwp# 136 / thread# 136  --------------------
 feeed01b lwp_park (0, 0, 0)
 feee677d cond_wait_queue (fcf5cae8, 80a7b78, 0, feee6c46) + 60
 feee6cbe __cond_wait (fcf5cae8, 80a7b78, fcf5cad0, feee6d03) + 86
 feee6d11 cond_wait (fcf5cae8, 80a7b78, 0, 813e308) + 24
 0805e9b5 nscd_wait (80a8ac8) + 81
 0805c9f9 lookup_int (fcf60cb8, 0, fcf60cc0, 0) + 2ed
 0805d7c8 nsc_lookup (fcf60cb8, 0, 10, d0) + 18
 0806f6b1 lookup   (fcf60d48, b8, 0, 1) + 13d
 0806fc4d switcher (deadbeed, fcf60d48, b8, 0, 0, 806fa40) + 20d
 feef1ff2 __door_return () + 52
-----------------  lwp# 238 / thread# 238  --------------------
 feef1fc1 __door_return () + 21
-----------------  lwp# 237 / thread# 237  --------------------
 feef1fc1 __door_return () + 21
r...@bs-ssvr02:~#
r...@bs-ssvr02:~# pfiles 17841
17841:  /usr/sbin/nscd
  Current rlimit: 256 file descriptors
   0: S_IFCHR mode:0666 dev:313,0 ino:6815752 uid:0 gid:3 rdev:13,2
      O_RDONLY|O_LARGEFILE
      /devices/pseudo/m...@0:null
   1: S_IFCHR mode:0600 dev:313,0 ino:50855942 uid:0 gid:3 rdev:97,1
      O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE
      /devices/pseudo/sys...@0:msglog
   2: S_IFREG mode:0644 dev:182,65538 ino:410655 uid:0 gid:0 size:12353286
      O_WRONLY|O_APPEND
      /var/log/nscd.log
   3: S_IFDOOR mode:0777 dev:315,0 ino:0 uid:0 gid:0 size:0
      O_RDWR FD_CLOEXEC  door to nscd[17841]
   4: S_IFDOOR mode:0444 dev:323,0 ino:50 uid:0 gid:0 size:0
      O_RDONLY FD_CLOEXEC  door to ldap_cachemgr[217]
      /var/run/ldap_cache_door
   5: S_IFSOCK mode:0666 dev:322,0 ino:13899 uid:0 gid:0 size:0
      O_RDWR
        SOCK_RAW
        SO_SNDBUF(8192),SO_RCVBUF(8192)
        peername: AF_ROUTE
   6: S_IFCHR mode:0000 dev:313,0 ino:39964 uid:0 gid:0 rdev:41,117
      O_RDWR FD_CLOEXEC
        sockname: AF_INET 0.0.0.0  port: 64901
      /devices/pseudo/u...@0:udp
   7: S_IFSOCK mode:0666 dev:322,0 ino:13898 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
        SOCK_STREAM
        SO_SNDBUF(49152),SO_RCVBUF(49152)
        sockname: AF_INET6 ::  port: 0
r...@bs-ssvr02:~#

Cheers
John


--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG

Attachment: nscd.truss.gz
Description: application/gzip

_______________________________________________
networking-discuss mailing list