OK, I've taken a look at this, and scratched my head a little. It's a
touch complicated by the fact that thread-ids have changed in the head
of development relative to what you're looking at in 0.9.8. But I'm now
wondering if you haven't misunderstood the nature of openssl's threading
support;
On Tuesday 31 March 2009 04:52:19 Marc Haisenko wrote:
> Hi folks,
> last week I described a problem with RSA blinding and its locking[1].
> Coincidentally another user ran into the same problem in a totally
> different scenario, the work-around of disabling the blinding fixed
> the issue for him[2].
>
> So here's a patch to fix this issue.
>
> The issue is that the rsa_get_blinding method checks whether there is
> concurrent access to the blinding and tells its caller whether locking
> needs to be done. The check is done by comparing thread/process IDs.
> If they differ locking needs to be done. The problem is that locking
> is not done if the IDs are same. It is assumed that the creator
> doesn't need to do locking which is correct for a single-threaded
> application but is wrong in a multi- threaded/processed environment
> once another process has accessed the blinding.
We need to separate multi-threading from multi-processing here. Once
you've forked, there should be no locking relationship whatsoever
between processes, irrespective of any threads (and thread-ids) within
those processes. Do you have some reason to believe this isn't the case?
The fork has essentially duplicated the process and created isolation
between the two (copy-on-write or otherwise). So our only concern
*should* be between threads within the same process.
So ... multi-threading. My understanding of the RSA blinding (which I
didn't write) is that rsa->blinding is lazy-initialised by whichever
thread gets to it first. The call to RSA_setup_blinding() creates the
blinding struct and tags it with the current thread-id. Subsequent
accesses to RSA blinding will use rsa->blinding if and only if the
caller is the thread that lazy-initialised it (ipso facto, if processing
of a given RSA struct is always in one thread - this is optimal).
Accesses to RSA blinding from other threads should use rsa->mt_blinding
(which is lazy-initialised by the first such thread) and this places
obligations on the caller - ie. do your own locking, don't use a common
blinding factor, etc.
This should work. I'm wondering if I understood what you meant by "The
problem is that locking is not done if the IDs are same." ... If the IDs
are the same, that means you're in the same thread, period!! If that's
not the case, then the problem is that your thread-id callback isn't set
up correctly. For threading support to work correctly, you need to
provide hooks in order to make openssl compatible with your thread model
(pthreads or otherwise). For 0.9.8, the APIs include;
void CRYPTO_set_locking_callback(void (*func)(int mode,int type,
const char *file,int line));
void CRYPTO_set_id_callback(unsigned long (*func)(void));
Locking callbacks default to nops, and id callbacks default to different
things depending on platform, but one possibility is "&errno" (ie. the
address of the errno var, in the hopes that it's thread-local) and
another is getpid() (ie. in the hopes that the threads have distinct
pids too). It may be that the default behaviour is degenerating on your
platform and not correctly differentiating between threads.
>
> As a fix I've introduced a new flag for struct rsa_st,
> RSA_FLAG_BLINDING_NEEDS_LOCKING. Once it has been noticed that locking
> has to be done the flag is set and from then on we always do locking,
> whether we are the creator or not.
>
> However, I'm scratching my head why this fixes the issue given the
> fact that there are actually two blindings: one when doing no locking
> (rsa->blinding) and another one when doing locking (rsa->mt_blinding).
> Yet our load test indicates that this really does fix the issue:
> without it we get about a dozen Bad Record MAC per day due to the
> blinding. Since I've implemented the attached patch none has occured
> (which is about 1.5 days of load test now).
As before, the thread detection ('id_callback') might be failing you such
that all threads appear to be "the same thread"? Try calling
CRYPTO_thread_id() from different threads in the same process and see
whether they return the same value. This might explain why you "fixed"
the problem by locking all callers even when they all have the same
thread ID...
Cheers,
Geoff
--
Un terrien, c'est un singe avec des clefs de char...
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [email protected]
Automated List Manager [email protected]