Hi All,
I am using the ZooKeeper C client library (version 3.6.2) to connect to
ZooKeeper servers.
The problem I am facing is that the library gets stuck in a pthread_join()
call and never returns.
The scenario is as follows:
* The ZooKeeper C client connects to ZooKeeper over an mTLS connection.
* The client loses network connectivity to the ZooKeeper servers.
* While connectivity is lost, the client code calls zookeeper_close().
* zookeeper_close() never returns.
* The state of the zh handle during this time is ZOO_SSL_CONNECTING_STATE.
I made the program dump core while it was stuck in this state. The backtrace
shows that zookeeper_close() calls adaptor_finish(), which gets stuck in the
pthread_join() call for the IO thread.
This indicates that the IO thread itself was stuck and never exited.
The backtrace for the IO thread shows this call chain:
do_io() -> zookeeper_process() -> check_events() -> init_ssl_for_handler() ->
init_ssl_for_socket().
Looking at the code, there is a while(1) loop in init_ssl_for_socket(). I
added the zh->close_requested check (the if block at the end of the snippet
below) to that loop, and it seems to have fixed the problem for me. Can
anybody tell me whether this is correct, or whether this problem has already
been fixed in other releases?
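    while(1) {
        int rc;
        int sock = fd->sock;
        struct timeval tv;
        fd_set s_rfds, s_wfds;
        tv.tv_sec = 1;
        tv.tv_usec = 0;
        FD_ZERO(&s_rfds);
        FD_ZERO(&s_wfds);
        /* Added: leave the retry loop once zookeeper_close() was requested. */
        if(zh->close_requested)
        {
            return ZSSLCONNECTIONERROR;
        }
        /* ... rest of the loop unchanged ... */
    }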
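
To make the pattern easier to discuss, here is a minimal standalone sketch of
what I believe is happening. It is not the ZooKeeper code: struct demo_handle,
demo_handshake(), demo_io_thread(), demo_close_during_handshake() and
DEMO_HANDSHAKE_ABORTED are all made-up names, a pipe that is never written to
stands in for the unreachable server, and the 100 ms timeout is arbitrary. The
flag is atomic only because the sketch reads it from another thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/select.h>
#include <unistd.h>

#define DEMO_HANDSHAKE_ABORTED (-1)   /* hypothetical error code */

/* Hypothetical stand-in for the few handle fields that matter here. */
struct demo_handle {
    atomic_int close_requested;   /* set by the thread that closes the handle */
    int sock;                     /* descriptor that never becomes readable */
    int result;                   /* what the handshake loop returned */
};

/* Same shape as the retry loop: wait on the socket with a timeout, retry
 * forever, but give up as soon as a close has been requested. */
static int demo_handshake(struct demo_handle *h)
{
    assert(h->sock >= 0 && h->sock < FD_SETSIZE);
    while (1) {
        struct timeval tv;
        fd_set rfds;

        if (atomic_load(&h->close_requested))
            return DEMO_HANDSHAKE_ABORTED;

        tv.tv_sec = 0;
        tv.tv_usec = 100000;      /* 100 ms, arbitrary for this sketch */
        FD_ZERO(&rfds);
        FD_SET(h->sock, &rfds);
        if (select(h->sock + 1, &rfds, NULL, NULL, &tv) > 0)
            return 0;             /* peer answered; never happens here */
    }
}

static void *demo_io_thread(void *arg)
{
    struct demo_handle *h = arg;
    h->result = demo_handshake(h);
    return NULL;
}

/* Starts the IO thread, requests a close, and joins the thread. */
int demo_close_during_handshake(void)
{
    struct demo_handle h;
    int fds[2];
    pthread_t io;

    if (pipe(fds) != 0)
        return 1;
    atomic_init(&h.close_requested, 0);
    h.sock = fds[0];              /* read end; nothing is ever written */
    h.result = 0;

    if (pthread_create(&io, NULL, demo_io_thread, &h) != 0) {
        close(fds[0]);
        close(fds[1]);
        return 1;
    }
    atomic_store(&h.close_requested, 1);
    pthread_join(io, NULL);       /* returns because the loop saw the flag */

    close(fds[0]);
    close(fds[1]);
    return h.result;
}
```

Calling demo_close_during_handshake() returns DEMO_HANDSHAKE_ABORTED once the
worker notices the flag. If the close_requested check is removed from
demo_handshake(), the pthread_join() in this sketch blocks forever, which
matches the hang I see in zookeeper_close().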
Many Thanks,
-Parag