Heikki,

* Heikki Linnakangas (hlinnakan...@vmware.com) wrote:
> Hmm. Are you sure you're getting an SSL connection? Run it with
> something like this to make sure:
sslmode=require doesn't help on Unix domain connections. :)

Was able to get it to lock with both 9.2.4 and master, and with both versions of the SSL library (1.0.1c-4ubuntu8.1 and 1.0.1e-3). The lock that I got showed these stacks:

Thread #2:
  pthread_mutex_lock
  init_ssl_system ; fe-secure.c:917

Thread #3:
  pthread_mutex_lock
  pq_lockingcallback
  ...
  SSL_new
  pqsecure_open_client ; fe-secure.c:275

Thread #2 waiting at 917 makes sense: he's waiting on ssl_config_mutex, which the other thread holds, before moving in to set up his own SSL connection. What's odd is that thread #3 is waiting on a lock in the lock array. Both threads agree that ssl_open_connections is only 1 (thread #3's; thread #2 hasn't gotten to incrementing it yet). Looking at the lock array, only one of the locks is taken out and its owner is thread #3, meaning that SSL apparently caused a deadlock by trying to take a lock which it had already taken.

Changing the lock type to recursive masks the self-locking issue, but then I got a case where, with the same stack traces as above, the lock in the array was held by thread #2 instead, where thread #2 is in init_ssl_system, well before it has made any calls into SSL since the previous PQfinish happened.

I've also caught a thread still holding a lock when it drops into destroy_ssl_system(), simply by trying to unlock all of the locks in the array there. With the recursive lock type, all such attempts should error out (either the lock is held by someone else, or it's already unlocked), so I checked for *successful* unlocks:

  2: DEBUG: database connection established
  2: DEBUG: about to call PQfinish()
  successfully unlocked mutex!

Having that happen can then cause the deadlock: the other thread can end up waiting on the lock that we're still holding while we're in destroy_ssl_system(), where we in turn are waiting on the ssl_config_mutex lock that the first thread has.
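For anyone who wants to try the same kind of check, here is a minimal sketch of the idea, assuming POSIX recursive mutexes. This is not the code I ran inside libpq: lockarray, NLOCKS, init_lockarray, sweep_unlock and demo_leaked_lock are all made-up names for illustration, and the array size is arbitrary. It relies only on the POSIX rule that unlocking a recursive mutex the caller does not own (or one that is not locked at all) returns an error, so a zero return means the calling thread really was still holding that lock.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <pthread.h>

#define NLOCKS 8                /* hypothetical size, for illustration only */

static pthread_mutex_t lockarray[NLOCKS];   /* stand-in for the real lock array */

/* Initialise every mutex in the array as recursive; returns 0 on success. */
int init_lockarray(void)
{
    pthread_mutexattr_t attr;
    int i;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE) != 0)
    {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    for (i = 0; i < NLOCKS; i++)
    {
        if (pthread_mutex_init(&lockarray[i], &attr) != 0)
        {
            pthread_mutexattr_destroy(&attr);
            return -1;
        }
    }
    pthread_mutexattr_destroy(&attr);
    return 0;
}

/*
 * Try to unlock every mutex once. A recursive mutex reports an error when
 * the caller does not own it, so each success is a lock this thread was
 * still holding. Returns the number of successful unlocks.
 */
int sweep_unlock(void)
{
    int i;
    int held = 0;

    for (i = 0; i < NLOCKS; i++)
    {
        if (pthread_mutex_unlock(&lockarray[i]) == 0)
            held++;
    }
    return held;
}

/* Leak one lock on purpose, then let the sweep find it. */
int demo_leaked_lock(void)
{
    int rc = pthread_mutex_lock(&lockarray[3]);

    assert(rc == 0);
    (void) rc;
    return sweep_unlock();
}
```

Call init_lockarray() once, then sweep_unlock() at the point where no lock should be held; any non-zero result is worth logging. Note that the sweep releases only one level of a recursive lock per call, so it is a diagnostic, not a cleanup.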
Even leaving that code in there, which unlocks all the locks during destroy_ssl_system(), it still deadlocked on me with the same stack trace as above, with thread #2 holding a lock in the pq_lockarray which thread #3 is trying to get (while thread #3 holds the ssl_config_mutex lock that thread #2 is waiting on). Very curious.

Out of time right now to look into it, but will probably be back at it later tonight.

Thanks,

Stephen