RE: Locking inefficiency

Levin, Igor Wed, 11 Jun 2014 00:46:25 -0700

                Geoff,
we did not seem to be able to figure out what openssl Makefile actually builds 
crypto/threads/th-lock.c

In our particular case we explicitly included that file when building our 
server, but for “pure” OpenSSL, what make includes th-lock.c ?

                Igor.

From: Geoffrey Thorpe [mailto:ge...@geoffthorpe.com]
Sent: Monday, June 09, 2014 10:13 PM
To: openssl-dev@openssl.org
Cc: Levin, Igor; Salz, Rich
Subject: Re: Locking inefficiency

Hey Bodo,

On Mon, Jun 9, 2014 at 3:15 PM, Bodo Moeller 
<bmoel...@acm.org<mailto:bmoel...@acm.org>> wrote:
Geoffrey Thorpe <ge...@geoffthorpe.com<mailto:ge...@geoffthorpe.com>>:

First, you're right, pthreads_locking_callback() is collapsing everything to a 
mutex.

I was well aware of this and thought we did this for compatibility reasons 
(because I couldn't think of any other reasonable explanation, I guess).  If 
actual read-write locks are just as portable, I think it's a no-brainer that we 
should switch to them.  (After all, our code is already prepared for that, for 
applications that provide appropriate custom callbacks.  It's just the default 
that falls behind.)

Well, who knows what the legacy is behind some of the SSLeay-esque assumptions, 
but I dare say that any pthread implementation that didn't have rw-locks is 
likely to be relegated to history by now.

In any case, I suspect we will soon require that "supported platforms" be those 
for which automated builds and tests are running regularly. Before the next 
stable branch is cut, in any case.

So I'm going to propose that we initially put this patch into the development 
head only, and defer a decision on whether to cherry-pick it into stable 
branches until that testing is in place. The use of mutex instead of rwclock is 
not a functional bug, and for this to be a noticeable performance issue (ie. 
the kind of workload where this matters) is likely to be in situations where 
downstream patching of the openssl release/checkout is OK. Well, until such 
time as a cherry-pick-to-stable decision can be made with confidence, in any 
case.

I certainly don't want us to replace the mutex-using code with something that 
uses mutex *or* rwlock depending on whether OPENSSL_PTHREADS_SUCK is defined or 
not.

BTW, I mention this because NPTL headers apparently cage the rwlock definitions 
in some #ifdef-ery that I think we want to avoid in the mutex->rwlock changes 
in openssl. Rather than grappling with the 
will-some-platform-fail-in-some-subtle-way issues, I prefer that we rely on the 
short-term arrival of platform coverage/testing to detect the issue if there is 
one to cater for.

What do you think?

For future work, a lock-free approach (using thread-local storage?) certainly 
might make sense, but switching to read-write locks in the default callbacks 
should be a tiny change with significant benefits to multithreaded applications.

Yeah, but a couple of things come to mind.

(1) rwlocks (under optimised conditions anyway) seem to be essentially "lock 
free" in the fast-path case anyway, ie. for the read-lock/no-contention case, 
due to futex magic. That means no context-switch (to the kernel or otherwise) 
in that by-far-the-most-common case. So I think a change to rwlocks is likely 
to eliminate the observable syscall and contention overheads anyway. (2) that 
error code is pretty convoluted and is one of my top candidates for getting an 
axe taken to it, so fiddling with it to get the fast-path lockless (in spite of 
pthread optimisations) sounds like a bad use of time.

Cheers,
Geoff

RE: Locking inefficiency

Reply via email to