Hi,
I previously reported this issue with the subject "ssl3_get_server_hello
not threadsafe(?)" to -dev. Rephrased a bit and reposted to -bugs. This
is definitely relevant for 0.9.8i, but seems to be so for current
snapshots too.
I've a simple program (used as a stress tool) that creates a number of
threads, and uses a shared SSL_CTX to create an SSL object for each
thread. My understanding is that this fits with the intended
thread-safety model of the library: the SSL_CTX structure is intended to
be shareable between different threads without additional locking by the
application. Sporadically, I get "SSL_R_WRONG_CIPHER_RETURNED" errors as
the threads negotiated with the server. Stepping through to reproduce
the problem, I got to ssl3_get_server_hello(), where (at ssl/s3_clnt.c
line 802 in 0.9.8i) it does this:
> sk=ssl_get_ciphers_by_id(s);
> i=sk_SSL_CIPHER_find(sk,c);
The stack returned by ssl_get_ciphers_by_id happens to be the one from
the SSL_CTX structure: there's no specific set of ciphers for the SSL
object itself. This stack is shared between all threads accessing the
SSL_CTX structure.
The implementation of sk_SSL_CIPHER_find is a macro that ends up
invoking "sk_find", in turn invoking "internal_find". This calls
sk_sort() before doing a binary search of the stack. This means that
sk_sort can be invoked from different threads simultaneously without any
lock protection. A quick analysis of the instance of the SSL_CTX
structure at the point that I get my "wrong cipher returned" error shows
that the "sorted" field of the stack is indeed set, but when running
through the list of ciphers, I see that the ID fields are mostly, but
not completely, sorted - hence the intermittent failure of
sk_SSL_CIPHER_find.
I think the best way to avoid this problem is to ensure that the cipher
stack is sorted before there's any possibility of different threads
using the SSL_CTX structure: My reading is that the ciphers_by_id will
end up being sorted most of the time anyway, and the overhead of sorting
the small number of items in it justifies sorting it earlier rather than
later. The attached patch fixes the problem locally for me, but I'm not
familiar enough with OpenSSL to be sure its the best thing to do.
diff -u -r1.1.1.2 ssl_ciph.c
--- ssl/ssl_ciph.c 18 Nov 2008 12:53:20 -0000 1.1.1.2
+++ ssl/ssl_ciph.c 28 Nov 2008 11:41:41 -0000
@@ -1088,6 +1088,14 @@
*cipher_list_by_id = tmp_cipher_list;
(void)sk_SSL_CIPHER_set_cmp_func(*cipher_list_by_id,ssl_cipher_ptr_id_cmp);
+ /*
+ * XXX: Without this, the ostensibly readonly operation
+ * sk_SSL_CIPHER_find will invoke the sort. A single SSL_CTX object
+ * shared between multiple threads could therefore be updated
+ * unsafely
+ */
+ sk_SSL_CIPHER_sort(*cipher_list_by_id);
+
return(cipherstack);
}