Hello, I recently upgraded curl from version 7.50 to 7.78 on RHEL6 using openldap and openssl. After the upgrade, a long-standing in-house library of ours that uses libcurl to access a credential server and an LDAP server started blocking in the sread call in multi_wait.
Context: A multithreaded Java web service calls this in-house library, which returns access credentials for each user accessing the service. The web service handles thousands of users and up to 15 or so calls per second at peak. The curl calls to the credential server and to the LDAP server are each protected by their own mutex, so only one credential call and only one LDAP call can be in progress at a time.

I fixed the blocking in multi_wait by adding a nonblocking pselect before the sread. This is not a complete solution, since it isn't adequate for all operating systems. It may also not be the "curl" way of doing things, but I confess that I don't know the codebase, so this was the quickest option.

The blocking seems to occur only for the secure LDAP calls. It doesn't appear to be the same openldap problem listed under "known bugs" in the release notes for 7.78. Rolling back to curl 7.50, which uses the same openldap, fixes the problem, so I'm inclined to think the issue has nothing to do with openldap.

The infinite while loop around the sread call depends on an external condition, namely that the socket is nonblocking, which seems pretty weak to me. I added an fcntl call to get the flags and test for O_NONBLOCK. It turns out that it's not always set. When O_NONBLOCK is not set, the sread (recv) blocks indefinitely. In addition, I'm seeing EBADF errors from the fcntl and pselect calls, and ENOTSOCK from the sread (recv) call. Since I'm not familiar with the codebase, I don't understand how it's possible to get EBADF, ENOTSOCK, or a blocking socket in multi_wait. Maybe an async connect is in progress and the socket isn't fully configured before multi_wait is called. I have no idea.

In addition, the overall timeout I set for the curl call (CURLOPT_TIMEOUT) is ignored. The partial exception is when I set up a progress function.
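For readers who want to see the shape of the check described above (fcntl to test O_NONBLOCK, then a zero-timeout pselect before each recv), here is a minimal standalone sketch. It is not libcurl's code and not the actual patch; the helper names fd_is_nonblocking and drain_without_blocking are made up for illustration, and it assumes a POSIX system where pselect is available.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: 1 if O_NONBLOCK is set on fd, 0 if it is not,
   -1 if fcntl itself fails (for example with EBADF). */
int fd_is_nonblocking(int fd)
{
  int flags = fcntl(fd, F_GETFL, 0);
  if(flags == -1)
    return -1;
  return (flags & O_NONBLOCK) ? 1 : 0;
}

/* Hypothetical helper: discard whatever is pending on fd without ever
   blocking, even if the socket is in blocking mode. Returns the number
   of bytes discarded, or -1 on error with errno set. */
ssize_t drain_without_blocking(int fd)
{
  char buf[64];
  ssize_t total = 0;

  if(fd < 0 || fd >= FD_SETSIZE) {
    errno = EBADF;              /* fd cannot be used with select */
    return -1;
  }
  for(;;) {
    fd_set readfds;
    struct timespec zero = {0, 0};  /* zero timeout: poll, never wait */
    int ready;
    ssize_t n;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    ready = pselect(fd + 1, &readfds, NULL, NULL, &zero, NULL);
    if(ready < 0) {
      if(errno == EINTR)
        continue;
      return -1;                /* e.g. EBADF for a closed descriptor */
    }
    if(ready == 0)
      break;                    /* nothing readable, so recv would block */

    n = recv(fd, buf, sizeof(buf), 0);
    if(n < 0) {
      if(errno == EINTR)
        continue;
      if(errno == EAGAIN || errno == EWOULDBLOCK)
        break;
      return -1;                /* e.g. ENOTSOCK */
    }
    if(n == 0)
      break;                    /* peer closed the connection */
    total += n;
  }
  return total;
}
```

The point of the readiness check is that the loop no longer relies on the socket having been put into nonblocking mode somewhere else. It does not explain why the descriptor is sometimes blocking or invalid in the first place.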
If I set up a progress function that always returns the "continue" flag, I see a trace from libcurl that says "Operation timed out after 10000ms" (I have a 10s timeout). But instead of breaking out of the blocking sread, I get a SEGV in ldapsb_tls_write (openldap.c), at this line:

    ret = (li->send)(data, FIRSTSOCKET, buf, len, &err);

li->send is NULL, resulting in the SEGV. Adding "if (li && li->send)" in front of this line fixed the problem.

BTW, the same problem occurs if the progress function returns something other than the "continue" flag. I get a trace saying "Operation was aborted by an application callback", followed by the SEGV. Again, the SEGV only happens if I set up a progress function, and adding the check for li->send fixes it.

Both of these problems are fixed for me. I'm happy to discuss in more detail if necessary.

Regards,
Bill Smith
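For illustration, the kind of NULL guard described above can be sketched in isolation. This is not the openldap.c code: struct conn_info, send_fn, guarded_write and count_send are hypothetical stand-ins, and returning -1 with ENOTCONN when the callback is missing is an assumption about what a caller might want, not what libcurl does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-ins, NOT libcurl's real types. */
typedef ssize_t (*send_fn)(void *ctx, const void *buf, size_t len, int *err);

struct conn_info {
  send_fn send;  /* may be NULL, e.g. after the transfer was torn down */
  void *ctx;     /* opaque state handed back to the send callback */
};

/* Call li->send only if both the struct and the function pointer are
   present; otherwise fail cleanly instead of dereferencing NULL. */
ssize_t guarded_write(struct conn_info *li, const void *buf, size_t len)
{
  int err = 0;
  ssize_t ret;

  if(!li || !li->send) {
    errno = ENOTCONN;  /* assumed error code for "no send callback" */
    return -1;
  }
  ret = li->send(li->ctx, buf, len, &err);
  if(ret < 0)
    errno = err ? err : EIO;
  return ret;
}

/* Example callback for the sketch: counts bytes instead of sending. */
ssize_t count_send(void *ctx, const void *buf, size_t len, int *err)
{
  (void)buf;
  (void)err;
  *(size_t *)ctx += len;
  return (ssize_t)len;
}
```

A guard like this only turns the crash into an error return. It does not explain why the send pointer is NULL after a timeout or callback abort, which is probably the more interesting question for the list.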
