Re: [openssl-dev] [openssl-users] Changing malloc/debug stuff

2015-12-17 Thread Nico Williams
On Thu, Dec 17, 2015 at 08:16:50PM +, Salz, Rich wrote:
> > > https://github.com/openssl/openssl/pull/450
> > 
> > This seems much more sane.
> 
> I'll settle for less insane :)

That is, I think, the best you can do.  Some allocations might have
taken place by the time a wrapper or alternative allocator is
installed, in which case something bad will happen.  In the case of
alternative allocators the something bad is "it blows up", while in the
case of a wrapper the something bad is "some state/whatever will be
off".

A fully sane approach would be to have every allocated object internally
point to its destructor, and then always destroy by calling that
destructor instead of a global one.  (Or call a global one that knows
how to find the object's private destructor pointer, and then calls
that.)  If you wish, something more OO-ish.  But for many allocations
that's not possible because they aren't "objects" in the sense that
matters.  You could always wrap allocations so that each one has room at
the front for a pointer to the corresponding destructor, and then return
the address just past that pointer, but this will be very heavy-duty for
many allocations.  So, all in all, I like and prefer your approach.
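
To make the "room at the front for the destructor" idea concrete, here is
a minimal sketch in C.  All names (tagged_alloc, tagged_free,
counting_free) are hypothetical and not part of OpenSSL; the sketch only
assumes that the release function matching the allocator is known at
allocation time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; nothing here is an OpenSSL API. */

/* Header placed in front of every payload.  The extra union members
 * only pad the header so the payload stays suitably aligned. */
typedef union {
    void (*free_fn)(void *block);
    long double pad_ld;
    long long pad_ll;
    void *pad_p;
} tagged_header;

/* Allocate `size` payload bytes with alloc_fn and remember the release
 * function that matches it. */
void *tagged_alloc(size_t size, void *(*alloc_fn)(size_t),
                   void (*free_fn)(void *))
{
    tagged_header *h;

    assert(alloc_fn != NULL && free_fn != NULL);
    if (size > (size_t)-1 - sizeof(*h))
        return NULL;                 /* size + header would overflow */
    h = alloc_fn(sizeof(*h) + size);
    if (h == NULL)
        return NULL;
    h->free_fn = free_fn;
    return h + 1;                    /* address just past the header */
}

/* Release a payload with whatever function was recorded for it, no
 * matter which allocator is installed globally by now. */
void tagged_free(void *payload)
{
    tagged_header *h;

    if (payload == NULL)
        return;
    h = (tagged_header *)payload - 1;
    h->free_fn(h);
}

/* Example "alternative allocator" release function that counts calls. */
static int counted_frees;
static void counting_free(void *block)
{
    counted_frees++;
    free(block);
}
```

The cost is visible here: every allocation grows by one maximally
aligned header, which is the "heavy-duty" overhead mentioned above.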

Nico
-- 
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev


Re: [openssl-dev] [openssl-users] Changing malloc/debug stuff

2015-12-17 Thread Nico Williams
On Thu, Dec 17, 2015 at 09:28:28AM +, Salz, Rich wrote:
> I want to change the memory alloc/debug things.
> 
> Right now there are several undocumented functions to allow you to
> swap-out the malloc/realloc/free routines, wrappers that call those
> routines, debug versions of those wrappers, and functions to set the
> set-options versions of those functions.  Yes, really :)  Is anyone
> using that stuff?

This is another one of those things that isn't easy to deal with sanely
the way OpenSSL is actually used (i.e., by other libraries as well as by
apps).

> I want to change the model so that there are three wrappers around
> malloc/realloc/free, and that the only thing you can do is change that
> wrapper.  This is vastly simpler and easier to understand.  I also
> documented it.  A version can be found at
> https://github.com/openssl/openssl/pull/450

This seems much more sane.
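
For readers who have not looked at the pull request, the model being
described could look roughly like the sketch below.  This is not the
code from the PR; the names (my_malloc, my_set_mem_functions, and so on)
are made up for illustration, and the refusal to swap after the first
allocation is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names; this is not the API from the pull request. */
typedef void *(*my_malloc_fn)(size_t);
typedef void *(*my_realloc_fn)(void *, size_t);
typedef void (*my_free_fn)(void *);

static my_malloc_fn cur_malloc = malloc;
static my_realloc_fn cur_realloc = realloc;
static my_free_fn cur_free = free;
static int allocation_seen;   /* not thread-safe; a sketch only */

/* Replace all three routines at once.  Returns 1 on success, 0 if any
 * argument is NULL or an allocation has already been made. */
int my_set_mem_functions(my_malloc_fn m, my_realloc_fn r, my_free_fn f)
{
    if (allocation_seen || m == NULL || r == NULL || f == NULL)
        return 0;
    cur_malloc = m;
    cur_realloc = r;
    cur_free = f;
    return 1;
}

/* The three wrappers that the rest of the library would call. */
void *my_malloc(size_t n)
{
    allocation_seen = 1;
    return cur_malloc(n);
}

void *my_realloc(void *p, size_t n)
{
    allocation_seen = 1;
    return cur_realloc(p, n);
}

void my_free(void *p)
{
    cur_free(p);
}

/* Example replacement that counts how often it is called. */
static int counting_calls;
static void *counting_malloc(size_t n)
{
    counting_calls++;
    return malloc(n);
}
```

Installing the three routines in a single call, and refusing once memory
has been handed out, avoids mixing blocks from two allocators.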

Nico


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-16 Thread Nico Williams
Another interesting portable atomics library is
https://github.com/mintomic/mintomic

FYI, I took a stab at a simple portable atomics that uses GCC/clang
__atomic, or __sync, or Win32 Interlocked*, or a single global lock, and
with a fallback to unsafe, non-atomic implementations for no-threads
configurations; adding C11 support will be trivial.  For just
increment/decrement and CAS this is really small, and I think that's
enough for OpenSSL for starters.
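
A minimal sketch of what such a layer might look like follows.  It is not
the code referred to above; the px_ names are hypothetical, the
global-lock variant is omitted for brevity, and the last branch is the
unsafe no-threads fallback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical px_ names; pick one implementation at compile time. */
#if defined(__GNUC__) && defined(__ATOMIC_SEQ_CST)
/* GCC/clang __atomic builtins. */
static int px_atomic_add(int *p, int v)
{
    return __atomic_add_fetch(p, v, __ATOMIC_SEQ_CST);
}
static int px_atomic_cas(int *p, int expected, int desired)
{
    return __atomic_compare_exchange_n(p, &expected, desired, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
#elif defined(__GNUC__)
/* Older GCC __sync builtins. */
static int px_atomic_add(int *p, int v)
{
    return __sync_add_and_fetch(p, v);
}
static int px_atomic_cas(int *p, int expected, int desired)
{
    return __sync_bool_compare_and_swap(p, expected, desired);
}
#elif defined(_WIN32)
#include <windows.h>
/* Win32 Interlocked*; LONG and int are both 32 bits on Windows. */
static int px_atomic_add(int *p, int v)
{
    return (int)InterlockedExchangeAdd((volatile LONG *)p, v) + v;
}
static int px_atomic_cas(int *p, int expected, int desired)
{
    return InterlockedCompareExchange((volatile LONG *)p, desired,
                                      expected) == expected;
}
#else
/* No-threads fallback: NOT atomic, only valid without threads. */
static int px_atomic_add(int *p, int v)
{
    return *p += v;
}
static int px_atomic_cas(int *p, int expected, int desired)
{
    assert(p != NULL);
    if (*p != expected)
        return 0;
    *p = desired;
    return 1;
}
#endif
```

px_atomic_add() returns the new value and px_atomic_cas() returns
nonzero when the swap happened, so callers never see which branch was
compiled in.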


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-15 Thread Nico Williams
On Tue, Dec 15, 2015 at 01:24:12PM +0100, Florian Weimer wrote:
> * Nico Williams:
> 
> > On Tue, Dec 08, 2015 at 11:19:32AM +0100, Florian Weimer wrote:
> >> > Maybe http://trac.mpich.org/projects/openpa/ would fit the bill?
> >> 
> >> It seems to have trouble keeping up with new architectures.
> >
> > New architectures are not really a problem because between a) decent
> > compilers with C11 and/or non-C11 atomic intrinsics,
> 
> Not on Windows.

Windows has a family of functions for atomic addition, compare-and-swap,
etcetera:

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686360%28v=vs.85%29.aspx#interlocked_functions

Solaris/Illumos has its own as well.

Linux has several atomics libraries.

And there are several open-source portable atomics libraries as well.

I.e., between compiler non-C11 atomic intrinsics, C11 intrinsics, OS
atomic function libraries, and portable open-source atomics libraries,
we can cover almost all the bases.

> > What's the alternative anyways?
> 
> Using C++11.

Sure, but only to implement a C atomics library for the rest of OpenSSL
to use.

So that makes five alternatives, plus the two stub implementations (one
with global locks, one with no locking/atomics).  Any platform not
covered will get one of the stub implementations and its users will have
to contribute a better implementation.

We have a surfeit of options, not a dearth of them.  I don't think lack
of atomics primitives is remotely a concern.  We should use atomic
primitives in OpenSSL.

Nico


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-15 Thread Nico Williams
On Tue, Dec 15, 2015 at 09:57:32AM -0600, Benjamin Kaduk wrote:
> On 12/15/2015 06:43 AM, Kurt Roeckx wrote:
> > On Tue, Dec 15, 2015 at 01:24:12PM +0100, Florian Weimer wrote:
> >> Using C++11.
> > I think this is a relevant article:
> > http://herbsutter.com/2012/05/03/reader-qa-what-about-vc-and-c99/
> >
> 
> I think an article from 2012 is no longer current; something like
> http://blogs.msdn.com/b/vcblog/archive/2015/06/19/c-11-14-17-features-in-vs-2015-rtm.aspx
> might be a better source.

Yes, but not everyone will have the latest and greatest compilers.
Still, the Windows interlocked function family is enough.  See my other
post just now.

Nico


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-15 Thread Nico Williams
On Tue, Dec 15, 2015 at 06:15:32PM +, Salz, Rich wrote:
> > I.e., between compiler non-C11 atomic intrinsics, C11 intrinsics, OS atomic
> > function libraries, and portable open-source atomics libraries, we can cover
> > almost all the bases.
> 
> Agreed.

Thanks.  This is helpful.  I now think we'll have an easy road to
consensus on how to get thread-safety in OpenSSL.  I will try to
contribute infrastructure, but there's a lot of heavy lifting to do that
will take a long time, and I can't do it all.

Nico


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-15 Thread Nico Williams
On Tue, Dec 15, 2015 at 07:54:35PM +0100, Kurt Roeckx wrote:
> Also, if you want to use atomics we really want the C11 / C++11
> memory model which prevents certain important optimizations.

Right, because compilers can reorder some operations.  But we've been
living with this pre-C11 for decades.  We can't yet require C11 (though
I'd sure like to).

BTW, there's this:

http://blogs.msdn.com/b/vcblog/archive/2015/05/01/bringing-clang-to-windows.aspx

which, IIUC means we can get C11 on Windows with MSVC with a clang
frontend.  So maybe Windows is a non-issue, and maybe C11 is a
non-issue.  Though I'm sure there are platforms where OpenSSL can't expect
C11 yet, so I suspect we're stuck with C90+ for a while.

Nico


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-09 Thread Nico Williams
On Wed, Dec 09, 2015 at 02:33:46AM -0600, Nico Williams wrote:
> No more installing callbacks to get locking and atomics.

I should explain why.

First, lock callbacks are a serious detriment to usability.

Second, they are an admission that OpenSSL is incomplete.

Third, if we have lock callbacks to install, then we have the risk of
racing (by multiple libraries using OpenSSL) to install them.  Unless
there's a single function to install *all* such callbacks, then there's
no way to install callbacks atomically.  But every once in a while we'll
need to add an Nth callback, thus breaking the ABI or atomicity.

So, no, no lock callbacks.  OpenSSL should work thread-safely out of the
box like other libraries.  That means that the default configuration
should be to use pthreads on *nix, for example.  We'll need an atomics
library (e.g., OpenPA, or something new) with safe and sane -if not very
performant- defaults that use global locks for platform/compiler
combinations where there's no built-in atomics intrinsics or system
library.  It should be possible to have a no-threads configuration where
the locks and atomics are non-concurrent-safe implementations.

Nico


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-09 Thread Nico Williams
On Thu, Dec 10, 2015 at 07:06:15AM +1000, Paul Dale wrote:
> Thanks for the clarification.  I was making an assumption that
> following the existing locking model, which did seem over complicated,
> was desirable.  Now that that is shot down, things can be much
> simpler.

Exactly :)

Sorry if I was a bit brusque.  Since inertia is strong, I figured I
needed to make a forceful argument.  However, it seems it was easy to
get consensus after all.

> It would make more sense to have a structure containing the reference
> counter and (optionally?) a lock to use for that counter.

It'd work but it'd be a complication, since now every integer to be used
with atomic increment/decrement, or CAS, or whatever, now needs to be a
struct type with the integer as one field and the rest as opaque.  It'd
be much nicer to be able to use ints normally, though I would agree that
having a special type for atomics has the benefit that it is
self-describing.
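
To illustrate the trade-off, a struct-wrapped counter might look like the
sketch below.  The px_counter type and its functions are hypothetical;
the sketch assumes GCC/clang __atomic builtins and otherwise falls back
to a plain, non-thread-safe placeholder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical self-describing type for ints used with atomics.
 * Callers should treat the contents as opaque. */
typedef struct {
    int v;
} px_counter;

/* Add `amount` and return the new value. */
static int px_counter_add(px_counter *c, int amount)
{
    assert(c != NULL);
#if defined(__GNUC__) && defined(__ATOMIC_SEQ_CST)
    return __atomic_add_fetch(&c->v, amount, __ATOMIC_SEQ_CST);
#else
    /* Placeholder for a lock-based or platform-specific version;
     * not thread-safe as written. */
    return c->v += amount;
#endif
}

/* Read the current value. */
static int px_counter_get(px_counter *c)
{
    assert(c != NULL);
#if defined(__GNUC__) && defined(__ATOMIC_SEQ_CST)
    return __atomic_load_n(&c->v, __ATOMIC_SEQ_CST);
#else
    return c->v;
#endif
}
```

With this type, passing a plain int * to px_counter_add() is a compile
error, which is the self-describing benefit; the price is that existing
int fields have to change type.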

It's perfectly fine to have a worst-case atomics implementation that
uses a global lock.  Yes, that would be slow, but we need some incentive
to add true atomics for each platform, and making it slow is the exact
right incentive.

So even if for documentation and type-safety reasons we wanted to wrap
ints in structs for ints meant to be used with atomics, I'd still want
the worst-case atomics implementation to be slow.

> With atomics, the lock isn't there or at least isn't used.  Without
> them, it is.  This is because, I somewhat suspect having a fall back
> global lock for all atomic operations would be worse than the current
> situation where at least a few different locks are used.

That's a feature :)

Nico


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-08 Thread Nico Williams
On Tue, Dec 08, 2015 at 11:19:32AM +0100, Florian Weimer wrote:
> > Maybe http://trac.mpich.org/projects/openpa/ would fit the bill?
> 
> It seems to have trouble keeping up with new architectures.

New architectures are not really a problem because between a) decent
compilers with C11 and/or non-C11 atomic intrinsics, b) asm-coded
atomics, and c) mutex-based dumb atomics, we can get full coverage.
Anyone who's still not satisfied can then contribute missing asm-coded
atomics to OpenPA.  I suspect that OpenSSL using OpenPA is likely to
lead to contributions to OpenPA that will make it better anyways.

What's the alternative anyways?

We're talking about API and performance enhancements to OpenSSL to go
faster on platforms for which there are atomics, and maybe slower
otherwise -- or maybe not; maybe we can implement context up-/down-ref
functions that use fine-grained (or even global) locking as a fallback
that yields performance comparable to today's.

If OpenPA's (or some other such library's) license works for OpenSSL,
someone might start using it.  That someone might be me.  So that seems
like a good question to ask: is OpenPA's license compatible with
OpenSSL's?  For inclusion into OpenSSL's tree, or for use by OpenSSL?

Nico


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-08 Thread Nico Williams
On Wed, Dec 09, 2015 at 09:27:16AM +1000, Paul Dale wrote:
> It will be possible to support atomics in such a way that there is no
> performance penalty for machines without them or for single threaded
> operation.  My sketchy design is along the lines of adding a new API
> CRYPTO_add_atomic that takes the same arguments as CRYPTO_add (i.e.
> reference to counter, value to add and lock to use):
> 
> CRYPTO_add_atomic(int *addr, int amount, int lock)
>     if have-atomics then
>         atomic_add(addr, amount)
>     else if (lock == have-lock-already)
>         *addr += amount
>     else
>         CRYPTO_add(addr, amount, lock)

"have-atomics" must be known at compile time.

"lock" should not be needed because we should always have atomics, even
when we don't have true atomics: just use a global lock in a stub
implementation of atomic_add() and such.  KISS.  Besides, this will add
pressure to add true atomics wherever they are truly needed.
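
A sketch of that stub, under stated assumptions: HAVE_TRUE_ATOMICS is a
hypothetical macro that the build configuration would define, the
true-atomics branch assumes GCC/clang builtins, and the fallback uses a
single pthread mutex.  The caller never passes a lock.

```c
#include <assert.h>
#include <stddef.h>

/* HAVE_TRUE_ATOMICS is a hypothetical configure-time macro. */
#if !defined(HAVE_TRUE_ATOMICS)
#include <pthread.h>
/* One global lock shared by every stub atomic operation. */
static pthread_mutex_t atomics_global_lock = PTHREAD_MUTEX_INITIALIZER;
#endif

/* Add `amount` to *addr and return the new value. */
int atomic_add(int *addr, int amount)
{
#if defined(HAVE_TRUE_ATOMICS)
    return __atomic_add_fetch(addr, amount, __ATOMIC_SEQ_CST);
#else
    int result, rc;

    rc = pthread_mutex_lock(&atomics_global_lock);
    assert(rc == 0);
    result = (*addr += amount);
    rc = pthread_mutex_unlock(&atomics_global_lock);
    assert(rc == 0);
    (void)rc;                 /* silence unused warning under NDEBUG */
    return result;
#endif
}
```

Because the choice is made by the preprocessor, there is no run-time
"have-atomics" test and no per-call lock argument.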

Nico


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-07 Thread Nico Williams
On Mon, Dec 07, 2015 at 02:41:35PM +0100, Florian Weimer wrote:
> On 11/25/2015 06:48 PM, Kurt Roeckx wrote:
> > Please note that we use C, not C++.  But C11 has the same atomics
> > extensions as C++11.
> 
> C++11 support is much more widespread than C11 support.  You will have
> trouble finding reliable support for C11 atomics with the Microsoft
> toolchain.
>
> [...]
>
> It is a lot of work getting the atomics right on all supported platforms.

The MSFT toolchain has its own intrinsics, as do GCC/clang.  A variety of
OSes have their own atomics libraries (e.g., Solaris/Illumos, FreeBSD,
and others).  Linux has several as well, but I am not sure that the
licensing on those will be compatible to link against (much less to
incorporate as source in OpenSSL).  Some of the BSD or CDDL licensed
libraries might be possible to incorporate as source into OpenSSL.

It's a solvable problem, but yes, a lot of work :(  Still, it seems
worth doing.

Nico


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-12-07 Thread Nico Williams
Maybe http://trac.mpich.org/projects/openpa/ would fit the bill?


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-30 Thread Nico Williams
On Tue, Dec 01, 2015 at 09:21:34AM +1000, Paul Dale wrote:
> However, the obstacle preventing 100% CPU utilisation for both stacks
> is lock contention.  The NSS folks apparently spent a lot of effort
> addressing this and they have a far more scalable locking model than
> OpenSSL: one lock per context for all the different kinds of context
> versus a small number of global locks.

I prefer APIs which state that they are "thread-safe provided the
application accesses each XYZ context from only one thread at a time".

Leave it to the application to do locking, as much as possible.  Many
threaded applications won't need locking here because they may naturally
have only one thread using a given context.

Also, for something like a TLS context, ideally it should be naturally
possible to have two threads active, as long as one thread only reads
and the other thread only writes.  There can be some dragons here with
respect to fatal events and deletion of a context, but the simplest
thing to do is to use atomics for manipulating state like "had a fatal
alert", and use reference counts to defer deletion (then if the
application developer wants it this way, each of the reader and writer
threads can have a reference and the last one to stop using the context
deletes it).
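
A minimal sketch of that pattern, assuming GCC/clang __atomic builtins;
the ctx_sketch type and its functions are hypothetical and far simpler
than a real TLS context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical context: a reference count plus a sticky fatal flag. */
typedef struct {
    int refs;
    int fatal;
} ctx_sketch;

/* Create a context holding one reference for the caller. */
ctx_sketch *ctx_new(void)
{
    ctx_sketch *c = calloc(1, sizeof(*c));

    if (c != NULL)
        c->refs = 1;
    return c;
}

/* Take an extra reference, e.g. one each for reader and writer. */
void ctx_ref(ctx_sketch *c)
{
    assert(c != NULL);
    __atomic_add_fetch(&c->refs, 1, __ATOMIC_SEQ_CST);
}

/* Drop a reference; the last holder frees the context.
 * Returns 1 if the context was freed, 0 otherwise. */
int ctx_unref(ctx_sketch *c)
{
    assert(c != NULL);
    if (__atomic_sub_fetch(&c->refs, 1, __ATOMIC_SEQ_CST) == 0) {
        free(c);
        return 1;
    }
    return 0;
}

/* Record "had a fatal alert" so the other thread can observe it. */
void ctx_set_fatal(ctx_sketch *c)
{
    assert(c != NULL);
    __atomic_store_n(&c->fatal, 1, __ATOMIC_SEQ_CST);
}

int ctx_is_fatal(ctx_sketch *c)
{
    assert(c != NULL);
    return __atomic_load_n(&c->fatal, __ATOMIC_SEQ_CST);
}
```

Neither thread needs a lock for these two pieces of shared state, and
whichever thread calls ctx_unref() last performs the deletion.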

> There is definitely scope for improvement here.  My atomic operation
> suggestion is one approach which was quick and easy to validate,
> better might be more locks since it doesn't introduce a new paradigm
> and is more widely supported (C11 notwithstanding).

A platform compatibility atomics library would be simple enough (plenty
exist, I believe).  For platforms where no suitable implementation
exists you can use a single global lock, and if there's not even that,
then you can use non-atomic implementations and pretend it's all OK or
fail to build (users of such platforms will quickly provide real
implementations).

(Most compilers have pre-C11 atomics intrinsics and many OSes have
atomics libraries.)

Nico


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Nico Williams
On Mon, Nov 23, 2015 at 01:53:47AM +, Viktor Dukhovni wrote:

[NetBSD header commentary extracts:]

> /*
>  * Use macros to rename many pthread functions to the corresponding
>  * libc symbols which are either trivial/no-op stubs or the real

No renaming is necessary if one's link-editor and RTLD support
filters...

  An ELF filter is a forwarding saying that the implementation of a
  symbol in the filter object is to be found elsewhere, e.g., in some
  other object.

  In Solaris/Illumos this is used to maintain backwards compatibility
  when symbols get moved from one library to another.

  E.g., libpthread and libdl moved into libc, but they remain as filters
  so that objects linked with those old libraries will a) still find
  them, b) still find the symbols they expect in them, c) get the correct
  implementations of those symbols from the object now providing them
  (here: libc).

  Filters are awesome.  Lack of universal support for them is very
  frustrating.  On Linux, for example, it's possible to create filters
  with strong link-editor-fu, but the RTLD does not support them.

>  * thing, depending on whether libpthread is linked in to the
>  * program. This permits code, particularly libraries that do not
>  * directly use threads but want to be thread-safe in the presence of
>  * threaded callers, to use pthread mutexes and the like without
>  * unnecessarily including libpthread in their linkage.

Just move these into libc, lock, stock, and barrel, and if you want to
have fast versions for the single-threaded case, just arrange for slower
versions to get hot-patched-in when pthread_create() is first called.

Or even just use a branch/computed jump (whichever is faster) to avoid
having to hot-patch.

It's important that pthread_mutex_init/lock/trylock/unlock/destroy work
correctly even in the optimized single-threaded case.  The main thread
might init and acquire some locks then create a second thread that will
block acquiring those locks.

>  * Left out of this list are functions that can't sensibly be trivial
>  * or no-op stubs in a single-threaded process (pthread_create,
>  * pthread_kill, pthread_detach), functions that normally block and
>  * wait for another thread to do something (pthread_join), and

Just move them into libc anyways.

>  * functions that don't make sense without the previous functions
>  * (pthread_attr_*). The pthread_cond_wait and pthread_cond_timedwait
>  * functions are useful in implementing certain protection mechanisms,
>  * though a non-buggy app shouldn't end up calling them in
>  * single-threaded mode.
>  *
>  * The rename is done as:
>  * #define pthread_foo __libc_foo
>  * instead of
>  * #define pthread_foo(x) __libc_foo((x))
>  * in order that taking the address of the function ("func =
>  * &pthread_foo;") continue to work.
>  *
>  * POSIX/SUSv3 requires that its functions exist as functions (even if
>  * macro versions exist) and specifically that "#undef pthread_foo" is
>  * legal and should not break anything. Code that does such will not
>  * successfully get the stub behavior implemented here and will
>  * require libpthread to be linked in.
>  */

All the more reason to not rename these symbols!  All you need is ELF
filter support.

Nico


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Nico Williams
On Tue, Nov 24, 2015 at 11:32:32AM +1000, Peter Waltenberg wrote:
> I wasn't saying there was anything wrong with mmap(), just that guard pages
> only work if you can guarantee your overrun hits the guard page (and
> doesn't just step over it). Large stack allocations increase the odds of
> 'stepping over' the guard pages. It's still better than not having guard
> pages, but they aren't a hard guarantee that you won't have mysterious bugs
> still.
> 
> You obviously realize that, but bn_prime() is the classic example of
> allocating very large chunks of memory on the stack.

Sure.  Rich Salz claims this is all fixed in master, so guard pages
strike me as a plus, not a wash.

> As for fibres, I doubt it'll work in general, the issue there is simply
> the range of OS's OpenSSL supports. If you wire it in you still have to run
> with man+dog+world in the process, that's a hard ask. One of the good
> points about OpenSSL up until now, it tends to not break those big messy
> apps where a whole lot of independently developed code ends up in the same
> process.

That's... a joke, right?

OpenSSL very much does "break those big messy apps ...".  I don't see
the fibers project making that worse, nor better -- it's neutral.

Let me give you some examples.

Using Heimdal's libgssapi from Java JGSS (with the JNI GSS wrapper)
blows up due to no one initializing OpenSSL's lock callbacks.  Heimdal,
of course, uses OpenSSL.  Who should be initializing the lock callbacks?
Not the JVM -- why should it know/assume that some library used via
dlopen() will want OpenSSL?

But it's not a library's place to initialize OpenSSL's lock callbacks
either!  Suppose a Java program were loading via dlopen() *two*
libraries that use OpenSSL, and suppose different threads are racing to
do this.

This happens, and it happens because OpenSSL is used in so many
*libraries*, not just *programs*.  OpenSSL is a prime case of how a
library meant for use by programs comes to be used by libraries.
Another example is PKCS#11.  That's no excuse.  Use by libraries must be
supported.

And historically OpenSSL has been very bad at keeping its ABI
backwards-compatible, so DLL Hell cases often involve OpenSSL.

The more layers, the higher the likelihood of breakage involving
OpenSSL.  All threaded pluggable-software-using software (name services
switch, PAM, JNI, ...) is vulnerable to these problems.

If the OpenSSL team finally decides to do something about sane locking
by default, then it will be a huge improvement.  If this thread provides
the impetus, so much the better.

Nico


Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Nico Williams
[Viktor asked me for my advice on this issue and bounced me the post
 that I'm following up to.  -Nico]

The summary of what I've to say is that making libcrypto and libssl need
-lpthread is something that does require discussion, as it will have
detrimental effects on some users.  Personally, I think that those
detrimental effects are a good thing (see below), but nonetheless I
encourage you to discuss whether this is actually what OpenSSL should
do.  In particular, it may be possible to avoid -lpthread on some
systems and still get a subset of libpthread functionality from libc or
the compiler (e.g., thread-locals), and that may be worth doing.

On Mon, Nov 23, 2015 at 01:53:47AM +, Viktor Dukhovni wrote:
> As a side-effect of the MacOS/X MR I've become aware that the async
> code in its current state links master with "-lpthread", and defines
> macros that enable multi-threaded (as opposed to merely thread-safe)
> compilation into OpenSSL.
> 
> commit 757d14905e3877abaa9f258f3dd0694ae3c7c270
> Author: Matt Caswell <m...@openssl.org>
> Date:   Thu Nov 19 14:55:09 2015 +
> 
>   Add pthread support
> 
>   The forthcoming async code needs to use pthread thread local variables. This
>   updates the various Configurations to add the necessary flags. In many cases
>   this is an educated guess as I don't have access to most of these
>   environments! There is likely to be some tweaking needed.
> 
>   Reviewed-by: Kurt Roeckx <k...@openssl.org>
> 
> This is quite possibly not the right thing to do, and deserves
> some attention from the team.  We might even seek some
> outside advice from folks well versed in platform-specific library
> engineering (Christos Zoulas from NetBSD, Nico Williams formerly
> from Sun, ...).
> 
> My concern is that introducing -lpthread automatically converts
> single-threaded applications that link with OpenSSL into threaded
> applications (with a single thread).  This may well have undesirable
> consequences.

Some background may be needed.  When threading was introduced in the
90s, and in some cases still to this day, generally the end result was
that the system had to support a number of "process models", with potential
transitions from one to another at run-time:

 - single-threaded, dynamically linked
 - single-threaded, statically  linked
 - multi-threaded,  dynamically linked
 - multi-threaded,  statically  linked
 - multi-threaded,  mixed linkage

The statically-linked models can become mixed-linkage models via
dlopen().  The single-threaded model can become a threaded model via
dlopen() of an object linked with -lpthread, or by dlopen()ing
libpthread itself.

Solaris 9 and under used to have a veritable rat's nest of code to deal
with the process model transitions from single-threaded to
multi-threaded.  Solaris 10 unified these and moved libpthread and libdl
into libc [with filters left behind for backwards compatibility].  Thus,
on Solaris 10 and up, and Illumos, OpenSSL using -lpthread or not makes
no difference.

I'm quite fond of the approach taken by Solaris 10 and up (and thus also
Illumos): there is but one process model, and it is threaded, with
pthreads in libc.  But that need not be the way it works everywhere.
Some systems may still support multiple process models.

For a library like OpenSSL making use of -lpthread does mean dictating
to users that they may only use a threaded process model.  OTOH, not
using -lpthread allows the user to choose a process model unconstrained
by such a library.

Until now OpenSSL has avoided forcing the user to choose any particular
process model.  With this commit OpenSSL is now taking the reverse
stance.  This seems like a very significant change that should at least
be noted prominently in the release notes, but it should also be
discussed, indeed.

Personally, I believe this change is a good thing, as OpenSSL really
ought to either automatically initialize its "lock callbacks" or do away
with them completely (leaving backwards compatibility stubs) and use the
OS' locking facility by default / only.  Automatic lock callback
initialization without forcing the use of -lpthread and still allowing
static linking would be tricky indeed (for example: OpenSSL couldn't use
weak symbols to detect when -lpthread gets brought in).

Still, if -lpthread avoidance were still desired, you'd have to find an
alternative to pthread_key_create(), pthread_getspecific(), and friends.

For thread-specifics the obvious answer is to use C compiler
thread-local variable support.  This might or might not be available;
this would have to be determined at build configuration time.  Still,
where the compiler supports thread-locals, OpenSSL could avoid
-lpthread.
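
As a rough sketch of that build-time choice: the PX_THREAD_LOCAL macro
and the px_ functions below are hypothetical, and the preprocessor checks
stand in for a real configuration-time test.  Only the last branch needs
the pthread key functions.

```c
#include <assert.h>
#include <stddef.h>

/* Pick a compiler thread-local keyword if one is known to exist. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define PX_THREAD_LOCAL _Thread_local
#elif defined(__GNUC__)
#define PX_THREAD_LOCAL __thread
#elif defined(_MSC_VER)
#define PX_THREAD_LOCAL __declspec(thread)
#endif

#if defined(PX_THREAD_LOCAL)
/* Compiler thread-locals: no pthread calls are needed here. */
static PX_THREAD_LOCAL void *px_current;

void px_set_current(void *p)
{
    px_current = p;
}

void *px_get_current(void)
{
    return px_current;
}
#else
/* Fallback: POSIX thread-specific data. */
#include <pthread.h>

static pthread_key_t px_key;
static pthread_once_t px_key_once = PTHREAD_ONCE_INIT;

static void px_make_key(void)
{
    int rc = pthread_key_create(&px_key, NULL);

    assert(rc == 0);
    (void)rc;
}

void px_set_current(void *p)
{
    pthread_once(&px_key_once, px_make_key);
    pthread_setspecific(px_key, p);
}

void *px_get_current(void)
{
    pthread_once(&px_key_once, px_make_key);
    return pthread_getspecific(px_key);
}
#endif
```

Callers use px_set_current() and px_get_current() either way, so the
dependency on libpthread becomes a property of the build, not of the
calling code.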

For pthread mutex functions (for lock callbacks) and, perhaps,
pthread_once() (for automatic initia

Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Nico Williams
[Resend, with slight edits.]

[Viktor asked me for my advice on this issue and bounced me the post
 that I'm following up to.  -Nico]

The summary of what I've to say is that making libcrypto and libssl need
-lpthread is something that does require discussion, as it will have
detrimental effects on some users.  Personally, I think that those
detrimental effects are a good thing (see below), but nonetheless I
encourage you to discuss whether this is actually what OpenSSL should
do.  In particular, it may be possible to avoid -lpthread on some
systems and still get a subset of libpthread functionality from libc or
the compiler (e.g., thread-locals), and that may be worth doing.

On a slightly related note, I asked and Viktor tells me that fiber
stacks are allocated with malloc().  I would prefer that they were
allocated with mmap(), because then you get a guard page.  A guard page
would allow one to safely tune down fiber stack size to whatever
OpenSSL actually needs for a given use.
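
For illustration, a guarded stack allocation on a POSIX system might look
like the sketch below.  The fiber_stack type and function names are
hypothetical, and the sketch assumes a downward-growing stack, so the
guard page sits at the lowest address.

```c
#define _DEFAULT_SOURCE 1   /* ask glibc to expose MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Hypothetical descriptor for one fiber stack. */
typedef struct {
    void *base;          /* start of the mapping (the guard page) */
    size_t total;        /* guard page plus usable bytes */
    void *usable;        /* lowest usable stack address */
    size_t usable_size;  /* usable bytes, rounded up to whole pages */
} fiber_stack;

/* Map `want` bytes of stack with one inaccessible page below it.
 * Returns 0 on success, -1 on failure. */
int fiber_stack_alloc(fiber_stack *fs, size_t want)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t pagesz, usable, total;
    void *base;

    assert(fs != NULL);
    if (page <= 0 || want == 0)
        return -1;
    pagesz = (size_t)page;
    if (want > (size_t)-1 - 2 * pagesz)
        return -1;                       /* rounding would overflow */
    usable = (want + pagesz - 1) / pagesz * pagesz;
    total = usable + pagesz;

    base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    /* Any access to the lowest page now faults instead of silently
     * overwriting neighbouring memory. */
    if (mprotect(base, pagesz, PROT_NONE) != 0) {
        munmap(base, total);
        return -1;
    }
    fs->base = base;
    fs->total = total;
    fs->usable = (char *)base + pagesz;
    fs->usable_size = usable;
    return 0;
}

void fiber_stack_free(fiber_stack *fs)
{
    assert(fs != NULL);
    munmap(fs->base, fs->total);
}
```

An overrun that lands in the guard page is caught immediately; one that
jumps past it (a very large stack frame, say) is not.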

Comments below.

On Mon, Nov 23, 2015 at 01:53:47AM +, Viktor Dukhovni wrote:
> As a side-effect of the MacOS/X MR I've become aware that the async
> code in its current state links master with "-lpthread", and defines
> macros that enable multi-threaded (as opposed to merely thread-safe)
> compilation into OpenSSL.
> 
> commit 757d14905e3877abaa9f258f3dd0694ae3c7c270
> Author: Matt Caswell <m...@openssl.org>
> Date:   Thu Nov 19 14:55:09 2015 +
> 
>   Add pthread support
> 
>   The forthcoming async code needs to use pthread thread local variables. This
>   updates the various Configurations to add the necessary flags. In many cases
>   this is an educated guess as I don't have access to most of these
>   environments! There is likely to be some tweaking needed.
> 
>   Reviewed-by: Kurt Roeckx <k...@openssl.org>
> 
> This is quite possibly not the right thing to do, and deserves
> some attention from the team.  We might even seek some
> outside advice from folks well versed in platform-specific library
> engineering (Christos Zoulas from NetBSD, Nico Williams formerly
> from Sun, ...).
> 
> My concern is that introducing -lpthread automatically converts
> single-threaded applications that link with OpenSSL into threaded
> applications (with a single thread).  This may well have undesirable
> consequences.

Some background may be needed.  When threading was introduced in the
90s, and in some cases still to this day, generally the end result was
that the system had to support a number of "process models", with potential
transitions from one to another at run-time:

 - single-threaded, dynamically linked
 - single-threaded, statically  linked
 - multi-threaded,  dynamically linked
 - multi-threaded,  statically  linked
 - multi-threaded,  mixed linkage

The statically-linked models can become mixed-linkage models via
dlopen().  The single-threaded model can become a threaded model via
dlopen() of an object linked with -lpthread, or by dlopen()ing
libpthread itself.

Solaris 9 and under used to have a veritable rat's nest of code to deal
with the process model transitions from single-threaded to
multi-threaded.  Solaris 10 unified these and moved libpthread and libdl
into libc [with filters left behind for backwards compatibility].  Thus,
on Solaris 10 and up, and Illumos, OpenSSL using -lpthread or not makes
no difference.

I'm quite fond of the approach taken by Solaris 10 and up (and thus also
Illumos): there is but one process model, and it is threaded, with
pthreads in libc.  But that need not be the way it works everywhere.
Some systems may still support multiple process models.

For a library like OpenSSL making use of -lpthread does mean dictating
to users that they may only use a threaded process model.  OTOH, not
using -lpthread allows the user to choose a process model unconstrained
by such a library.

Until now OpenSSL has avoided forcing the user to choose any particular
process model.  With this commit OpenSSL is taking the reverse
stance.  This seems like a very significant change that should at least
be noted prominently in the release notes, but it should also be
discussed, indeed.

Personally, I believe this change is a good thing, as OpenSSL really
ought to either automatically initialize its "lock callbacks" or do away
with them completely (leaving backwards compatibility stubs) and use the
OS' locking facility by default / only.  Automatic lock callback
initialization without forcing the use of -lpthread and still allowing
static linking would be tricky indeed (for example: OpenSSL couldn't use
weak symbols to detect when -lpthread gets brought in).

Still, if -lpthread avoidance were still desired, you'd have to find an
alternative to pthread_key_create(), pthread_getspecific(), and friends.

For thread-spe

Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread

2015-11-23 Thread Nico Williams
On Mon, Nov 23, 2015 at 08:34:29PM +, Matt Caswell wrote:
> On 23/11/15 17:49, Nico Williams wrote:
> > On a slightly related note, I asked and Viktor tells me that fiber
> > stacks are allocated with malloc().  I would prefer that they were
> > allocated with mmap(), because then you get a guard page.  A guard page
> > would allow one to safely tune down fiber stack size to whatever
> > OpenSSL actually needs for a given use.
> 
> Interesting. I'll take a look at that.

Please do.  It will make this much safer.  Also, you might want to run
some experiments to find the best stack size on each platform.  The
smaller the stack you can get away with, the better.
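
A minimal sketch of the guard-page idea, assuming a POSIX system with
mmap() and mprotect() and a stack that grows downward.  Note that the
guard page has to be set up explicitly with mprotect().  The names
fiber_stack, fiber_stack_alloc() and fiber_stack_free() are made up for
illustration and are not OpenSSL APIs:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical descriptor for one fiber stack (not an OpenSSL type). */
struct fiber_stack {
    void  *map;      /* start of the whole mapping; the guard page is first */
    size_t map_len;  /* total mapped length, guard page included */
    void  *base;     /* lowest usable address, just above the guard page */
    size_t size;     /* usable bytes */
};

/* Map `want` bytes (rounded up to whole pages) plus one guard page. */
int fiber_stack_alloc(struct fiber_stack *fs, size_t want)
{
    long pg = sysconf(_SC_PAGESIZE);
    if (pg <= 0 || want == 0)
        return -1;

    size_t page = (size_t)pg;
    size_t usable = (want + page - 1) / page * page;
    size_t total = usable + page;

    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Assumes a downward-growing stack: an overflow runs off the low
     * end, so the lowest page is made inaccessible and faults. */
    if (mprotect(p, page, PROT_NONE) != 0) {
        munmap(p, total);
        return -1;
    }

    fs->map = p;
    fs->map_len = total;
    fs->base = (char *)p + page;
    fs->size = usable;
    return 0;
}

void fiber_stack_free(struct fiber_stack *fs)
{
    munmap(fs->map, fs->map_len);
}
```

With something like this an overflow faults immediately instead of
silently corrupting a neighbouring heap allocation, which is what makes
experimenting with smaller stack sizes tolerable.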

> > Still, if -lpthread avoidance were still desired, you'd have to find an
> > alternative to pthread_key_create(), pthread_getspecific(), and friends.
> 
> Just a point to note about this. The async code that introduced this has
> 3 different implementations:
> 
> - posix
> - windows
> - null
> 
> The detection code will check if you have a suitable posix or windows
> implementation and use that. Otherwise the fallback position is to use
> the null implementation. With "null" everything will compile and run but
> you won't be able to use any of the new async functionality.
> 
> Only the posix implementation uses the pthread* functions (and only for
> thread local storage). Part of the requirement of the posix detection
> code is that you have "Configured" with "threads" enabled. This is the
> default. However it is possible to explicitly configure with
> "no-threads". This suppresses stuff like the "-D_REENTRANT" flag. It
> now will also force the use of the null implementation for async and
> hence will not use any of the pthread functions.

Ah, I see.  I think that's fine.  Maybe Viktor misunderstood this?

> One other option we could pursue is to use the "__thread" syntax for
> thread local variables and avoid the need for libpthread altogether. An
> earlier version of the code did this. I have not found a way to reliably
> detect at compile time the capability to do this and my understanding is
> that this is a lot less portable.

I use this in an autoconf project (I know, OpenSSL doesn't use autoconf):

  dnl Thread local storage
  have___thread=no
  AC_MSG_CHECKING(for thread-local storage)
  AC_LINK_IFELSE([AC_LANG_SOURCE([
  static __thread int x ;
  int main () { x = 123; return x; }
  ])], have___thread=yes)
  if test $have___thread = yes; then
    AC_DEFINE([HAVE___THREAD],1,[Define to 1 if the system supports __thread])
  fi
  AC_MSG_RESULT($have___thread)

Is there something wrong with that that I should know?  I suppose the
test could use threads to make real sure that it's getting thread-
locals, in case the compiler is simply ignoring __thread.  Are there
compilers that ignore __thread??
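
A sketch of what such a stronger run-time check might look like,
assuming a compiler that accepts __thread and a pthreads
implementation to create the second thread (so the probe itself has to
link against pthreads where that is a separate library).  The names
tls_probe and tls_is_per_thread() are made up for illustration:

```c
#include <pthread.h>
#include <stddef.h>

/* One instance per thread if the toolchain really honors __thread. */
static __thread int tls_probe;

static void *probe_thread(void *arg)
{
    (void)arg;
    tls_probe = 456;   /* would clobber the caller's copy if shared */
    return NULL;
}

/* Returns 1 if tls_probe is per-thread, 0 if it is shared between
 * threads, and -1 if the helper thread could not be run. */
int tls_is_per_thread(void)
{
    pthread_t t;

    tls_probe = 123;
    if (pthread_create(&t, NULL, probe_thread, NULL) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    return tls_probe == 123;
}
```

In a configure-style test this would be run (not just linked), which
does not help when cross-compiling.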

Nico


Re: [openssl-dev] curve25519

2015-06-22 Thread Nico Williams
On Sun, Jun 21, 2015 at 10:36:30PM +, Pascal Cuoq wrote:
> Short answer:
> 
> No tools that are useful for usable implementations of asymmetric
> cryptography, that I know of, but useful tools for confirming that
> symmetric cryptography designed for constant-time implementation was
> correctly implemented.
> 
> Long answer:
> 
> [...]
> 
> Following this line of reasoning gives you two tools that you can use:
> 
> [...]
> 
> There are two problems with both these approaches:
> 
> - the assumption about knowledge about the time taken by assembly
> instructions. Both Frama-C and, to the best of my knowledge, ctgrind,
> ignore intermediate values computed from secrets and passed as
> arguments to the division instruction. On a modern processor the
> execution time of division reveals information about its inputs:

That's correct.

Reasoning about CT is hard, and doing so from C source is harder because
the optimizer can potentially convert CT code to not-CT code.

This argues for analyzing assembly, not C.

> So to summarize the first problem, there are a lot of assumptions.
> That doesn't prevent going forward, and taking any new discovery of an
> information leak through timing as a bug and fixing them as flaws are
> discovered in our assumptions, but you won't be “checking” that C code
> is “constant-time” in the sense that you might check the proof of a
> theorem. But even more serious is the second problem:
> [...]
> [...]. Also you
> should bear in mind that when implementing cache-aware constant-time
> look-up tables, you are really assuming a lot of things about the
> behavior of the processor.  [...]

At some point one is forced to dispense with static analysis or, rather,
to augment it with dynamic analysis.

The beauty of dynamic analysis is that it really short-circuits all of
these considerations and gives you an answer you can trust.  But one
does have to make sure to eliminate sources of jitter (e.g., quiesce the
system, disable interrupts on the CPU doing the benchmarking, ...).  If
one is able to trace execution to pinpoint sources of variability, then
dynamic analysis can even point out where secrets leak into timing.
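
As a rough illustration only, a dynamic check can be as simple as
timing the same routine on inputs from different classes and comparing
the distributions.  The sketch below assumes POSIX clock_gettime() and
a GCC/Clang-style compiler barrier; ct_equal() and mean_ns() are
made-up names, and a real harness would collect many samples and apply
a statistical test rather than compare two means:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() */
#endif
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Routine under test: comparison without data-dependent branches. */
int ct_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    unsigned char diff = 0;

    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);
    return diff == 0;
}

static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Mean nanoseconds per call over `reps` calls on one fixed input pair. */
double mean_ns(const unsigned char *a, const unsigned char *b,
               size_t n, unsigned reps)
{
    volatile int sink = 0;
    uint64_t start = now_ns();

    for (unsigned i = 0; i < reps; i++) {
        sink ^= ct_equal(a, b, n);
        /* Compiler barrier (GCC/Clang extension) so the call is not
         * hoisted out of the loop. */
        __asm__ __volatile__("" ::: "memory");
    }

    uint64_t end = now_ns();
    (void)sink;
    return (double)(end - start) / (double)reps;
}
```

One would call mean_ns() once with equal buffers and once with buffers
that differ in the first byte; a consistent gap between the two, on a
quiesced system, suggests that the secret is leaking into timing.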

Nico


Re: [openssl-dev] OpenSSL support on Solaris 11 (built on Solaris 10)

2015-06-16 Thread Nico Williams
On Tue, Jun 16, 2015 at 12:51:31PM +0530, Atul Thosar wrote:
> Currently, we build OpenSSL v0.9.8zc on Solaris 10 (SunOS, sun4u, sparc)
> and it works well on Solaris 10 platform. We use Sun Studio 12 compiler.
> 
> We would like to run it on Solaris 11.2 (SunOS, sun4v, sparc) platform w/o
> changing the build platform. I mean we will continue to build OpenSSL on
> Solaris 10 and run it on Solaris 11.
> 
> Has anyone encountered such a situation?  Appreciate any help/pointers if this
> mechanism will work?

Historically the approach you describe has worked quite well with
Solaris because the ABI is quite stable.  This is particularly the case
if you use only features that are extremely unlikely to be removed in a
new minor release (Solaris 12 would be a minor release).

(You should not build on anything older than S10 to run on S10 or later,
mostly due to subtle changes in how snprintf() works.)

But you should read the ABI compatibility promises that Oracle makes and
decide for yourself.

Nico


Re: [openssl-dev] Self-initialization of locking/threadid callbacks and auto-detection of features

2015-06-15 Thread Nico Williams
On Thu, Jun 11, 2015 at 10:41:58AM +0200, Florian Weimer wrote:
> Detecting things in libcrypto is very difficult on GNU/Linux due to the
> way dynamic linking works.

Details?

> On GNU/Linux, you should try very hard to avoid linking -lpthread and
> restrict yourself to the pthreads API subset which is available without
> -lpthread.  If something is missing, we (as in glibc upstream) can move
> additional functionality from libpthread to libc.

[My apologies for making this so long.  There's a lot to say about this.]

Perhaps OpenSSL should have several configuration flavors for Linux
then.  If you want to statically link a non-threaded program with
OpenSSL, then you should use the libpthread static link meant for it.

Or perhaps we could have the OpenSSL static link archives assume
non-threaded processes (callers must initialize lock callbacks) but the
shared objects link with -lpthread and assume a threaded environment.
(This means supporting two process models instead of *four*.)

In any case, use of OpenSSL by *libraries* (which is rather common, I
gather; certainly in the case of Kerberos implementations) is currently
a disaster waiting to happen, and sometimes a disaster that strikes.

A workaround for libraries may be to use a private copy (as if by static
linking) of OpenSSL with distinct SONAME/symbols and initialize that
copy properly.  This is generally safe (we've tried it) but also a bit
troublesome.  On the plus side this means that ABI incompatibilities
between OpenSSL releases become a non-issue.

Or indeed, libpthread should move into libc (which I gather would take a
long time and is beyond what we can do here).

For background, Solaris moved libpthread into libc around 2004, during
S10 development.  Solaris 10 also dropped support for static linking of
libc and dropped all static linking archives from OS/Net (but not ld(1)
support for static linking).  Before that S9 had basically four process
models: statically-linked,-not-threaded, statically-linked,-threaded,
dynamically-linked,-not-threaded, dynamically-linked,-threaded.  This
was awful in a great many ways (think of statically-linked, single-
threaded programs that dlopen()ed dynamically-linked threaded objects
via, e.g., the name service switch).  Dropping support for statically
linking with libc greatly simplified a lot of things.  Linux ought to do
the same, but there's a fascination with static linking in Linux land...

...Static linking wouldn't be so bad if it had dependency tracking and
symbol resolution semantics closer to dynamic linking.  The startup
times for one process would be awesome, though startup times for entire
systems would be awful.  (Solaris 10 boot times improved significantly
when static linking was dropped.)  And then there's the need for PIE to
still get any benefit from ASLR, which then applies to the whole
program, not each library.  (Can gdb debug PIE nowadays?  Last I checked
it could not.)

(Moving the implementation of a library to another requires support for
shared object filters, at least on Solaris, so that dynamically-linked
dependents of libpthread and such will find the symbols they need there,
though the RTLD knows to go look in the object they moved to.  We say
that libpthread is a filter of libc because only the pthread-related
symbols of libc appear in libpthread.  IIUC the Linux RTLD does not
support filters.)

(I help maintain an open source project that distributes a statically-
linked executable as that's a great way to avoid needing packaging and
installers.  So perhaps I shouldn't speak so ill of static linking.  But
then, static linking truly is awful.)

> Linking -lpthread has a real performance hit for unthreaded
> applications, so core libraries should avoid it.

It shouldn't.  Having *four* process models is an enormous burden on
developers, especially libc and libpthread developers, but this burden
leaks into other system libraries, such as OpenSSL's.  Solaris saw a
sizeable net performance win in switching to a single process model
(dynamically-linked, threaded) (for many processes there's just one
thread, natch).

> If you have received advice to the contrary, your source of advice is
> wrong. :-)

Or maybe I'm rather spoiled by Solaris and am continually surprised to
see that developers on Linux must struggle with problems that Solaris
tackled a decade ago.

In any case, the initialization problems when OpenSSL is used by
*libraries* are simply unacceptable.  Either that means that OpenSSL
cannot and must not be used by libraries, or we must find a solution
that doesn't suck.  See above and let me know which paths, if any, are
the way forward; I offered several, and there may be more that I have
not seen.

Nico


Re: [openssl-dev] Self-initialization of locking/threadid callbacks and auto-detection of features

2015-06-15 Thread Nico Williams
On Mon, Jun 15, 2015 at 06:19:49PM +, Salz, Rich wrote:
> My overall goal is that I want to remove the thread callback stuff.

Excellent.

> Ideally we have two options: no threads and system-threads.

Presumably that would be either a configure-time option or a run-time
automatic option, but not an option given to the caller via an API.

> It seems that on Linux shared/static libraries might be an issue.  I
> hope we can resolve and simplify that.

With weak symbols this problem can go away, but when linking statically
it does make the order of dependencies matter in the final link edit.
It'd have to be -lpthread -lcrypto -lssl; -lpthread could not come after
OpenSSL; the strong symbol definition has to come first.

Nico


Re: [openssl-dev] Self-initialization of locking/threadid callbacks and auto-detection of features

2015-06-15 Thread Nico Williams

Hmm, another option is to use weak symbols to detect presence of
pthreads.  This should work regardless of whether static or dynamic
linking is used.
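
A minimal sketch of the weak-symbol technique, assuming a toolchain
that supports "#pragma weak" (GCC, Clang and Sun Studio do).  The
function have_pthreads() is a made-up name, and on systems where the
pthreads implementation lives in libc the check is always true:

```c
#include <pthread.h>
#include <stddef.h>

/* Make the reference weak: if nothing in the final link provides
 * pthread_key_create, its address resolves to NULL instead of
 * producing an unresolved-symbol error. */
#pragma weak pthread_key_create

/* Returns 1 if a pthreads implementation is present in the process
 * image, 0 otherwise. */
int have_pthreads(void)
{
    return pthread_key_create != NULL;
}
```

With static linking a weak reference does not by itself pull libpthread
out of an archive, so whether this reports 1 depends on what else in
the link brings the strong definition in.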

A statically-linked, single-threaded program that dlopen()s an object
that brings in libpthread will have different OpenSSL dependencies for
the dynamically-loaded objects than for the initial statically-linked
program.  So everything should work out.

Nico


Re: [openssl-dev] [openssl-users] Replacing RFC2712 (was Re: Kerberos)

2015-05-11 Thread Nico Williams
On Mon, May 11, 2015 at 04:42:49PM +, Viktor Dukhovni wrote:
> On Mon, May 11, 2015 at 11:25:33AM -0500, Nico Williams wrote:
> 
> >  - If you don't want to depend on server certs, use anon-(EC)DH
> >    ciphersuites.
> > 
> >    Clients and servers must reject[*] TLS connections using such a
> >    ciphersuite but not using a GSS-authenticated application protocol.
> 
> [*] Except when employing unauthenticated encrypted communication
> to mitigate passive monitoring (opportunistic security).

As this would be replacing RFC2712, it's not opportunistic to begin with :)


[openssl-dev] Replacing RFC2712 (was Re: Kerberos)

2015-05-11 Thread Nico Williams
On Fri, May 08, 2015 at 10:57:52PM -0500, Nico Williams wrote:
> I should have mentioned NPN and ALPN too.
> [...]

A few more details:

 - If you don't want to depend on server certs, use anon-(EC)DH
   ciphersuites.

   Clients and servers must reject TLS connections using such a
   ciphersuite but not using a GSS-authenticated application protocol.

 - The protocol MUST use GSS channel binding to TLS.

 - Use SASL/GS2 instead of plain GSS and you get to use an authzid
   (optional) and you get a builtin authorization status result message
   at no extra cost, and all while still using GSS.

You get to optimize only the mechanism negotiation, and you get TLS w/
Kerberos (and others) and without PKIX (if you don't want it).

See RFCs 7301, 5801, 5056, and 5929 (but note that the TLS session hash
extension is required).

Nico


Re: [openssl-dev] [openssl-users] Kerberos

2015-05-08 Thread Nico Williams
On Fri, May 08, 2015 at 05:17:29PM -0400, Nathaniel McCallum wrote:
> I agree that the current situation is not sustainable. I was only
> hoping to start a conversation about how to improve the situation.

RFC2712 uses Authenticator, which is an ASN.1 type quite clearly NOT
intended for use outside RFC1510 because it isn't a PDU.  RFC2712
unnecessarily constructed its own AP-REQ that's different from the
RFC1510 (now 4120) AP-REQ.

This is bad for a variety of reasons, not the least of which are
complicating Kerberos APIs and/or RFC2712 implementations (which might
have to parse out the Authenticator and Ticket from a plain AP-REQ).

I also notice that the EncryptedPreMasterSecret is under-specified (is
it a Kerberos EncryptedData?  who knows?).

RFC2712 could be replaced with a properly-done protocol that uses
Kerberos in the full TLS handshake (i.e., not in session resumption).
This would be the lowest-effort fix.

A generic GSS-in-TLS extension would require much more energy (see
below).

> For instance, there is this: http://tls-kdh.arpa2.net/

Yes, it'd be nice to add PFS to the Kerberos AP exchange, and we just
might get there, but adding Kerberos and/or GSS to TLS is a very
different undertaking.

> I don't see any reason this couldn't be expanded to do GSSAPI.

Well, that's difficult because GSS has arbitrary round trips...

You're not the first to want this, see for example here:

https://tools.ietf.org/html/draft-santesson-tls-gssapi-01
https://tools.ietf.org/html/draft-williams-tls-app-sasl-opt-04

And more if you consider other efforts like False Start and look past
GSS/SASL.  Probably many more than I know of then...

Two main design axes:

1) When does the GSS context token begin, and how is channel binding
done.

 - no GSS mech negotiation, first GSS context token goes in TLS
   ClientHello;

   (channel binding done via MIC tokens or GSS_Pseudo_random() output
   exchanges)

or (e.g., if the client needs to negotiate mechs)

 - TLS ClientHello carries client mechList, server announces a mech in
   its handshake message, first GSS context token goes in second client
   handshake flight with normal channel binding

(Both options could be specified, with clients choosing as desired.)

2) How many GSS context tokens can be exchanged and who is responsible
for continuing past the traditional TLS handshake.

 - one round trip only

or

 - arbitrary round trips continued by TLS or by the application

The first order of business is to decide on whether or not to support
multiple round trips (IMO we must; what's the point if not?).

The second is to decide whether or not additional context token round
trips are to be done by the application, both as to how they appear on
the wire and how they appear in the API.

The third is to decide whether GSS mechanism negotiation is supported,
and whether it can be optimized away when it's not needed.

The fourth is to decide whether SASL (with SASL/GS2 to get GSS) isn't
better, since if we're going to spend a pair of flights in negotiation,
we might as well let server-talks-first SASL mechs get a leg up on GSS.
Remember, SASL can do GSS just fine via SASL/GS2 [RFC5801].

> But maybe this mailing list isn't the right place for such a
> discussion.

Well, TLS WG would be the right forum, but they are busy with TLS 1.3.
Some of us could get together elsewhere, probably not here.

> Perhaps the right question to ask is how much interest there would be
> in improving this situation in the TLS WG and whether or not OpenSSL
> would have interest in implementing such a project.

My impression is: none, because TLS WG is too busy at this time, and in
the past it has been very difficult to get the necessary level of
implementor effort.  Past performance is not always a predictor of
future performance.

It would help if GSS had better, less niche mechanisms.  For example: if
Kerberos had PKCROSS (based on DANE, say), that would help.  Or if ABFAB
went viral.  But for now everyone in the TLS world is happy _enough_
with WebPKI for server (should be service, but hey) authentication and
bearer tokens for user authentication.

Part of the problem is that HTTP authentication schemes (whether in HTTP
proper or not) have no real binding to TLS, and HTTP is basically a
routable (and usually routed) protocol anyways, which complicates
everything.  But HTTPS is the main consumer of TLS.  One might think
that adding user authentication options to TLS would be desirable for
HTTP applications, but again, the routing inherent to HTTP means that
routing must pass along user authentication information, but this isn't
always easy.  And HTTP is stateless and so doesn't deal well with
needing continuation of authentication exchanges, so bearer tokens it
basically kinda has to be, so that better mechanisms lose their appeal.

If the main consumer of GSS-in-TLS were to be something other than HTTP,
well, great, but still, HTTPS is the biggest consumer (next is SMTP)...
And it's easier then to 

Re: [openssl-dev] [openssl-users] Kerberos

2015-05-08 Thread Nico Williams

I should have mentioned NPN and ALPN too.

A TLS application could use ALPN to negotiate the use of a variant of
the real application protocol, with the variant starting with a
channel-bound GSS context token exchange.

The ALPN approach can optimize the GSS mechanism negotiation, at the
price of a cartesian explosion of {app protocols} x {GSS mechs}.  A
variant based on the same idea could avoid the cartesian explosion.  But
hey, TLS is the land of cartesian explosions; when in Rome...
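
To make the explosion concrete, here is a sketch that builds an ALPN
protocol list in the wire format of RFC 7301 (each name preceded by a
one-byte length) for every {app protocol} x {GSS mech} pair.  The
"<app>+gss-<mech>" naming scheme and the helper names alpn_append() and
alpn_cross() are purely hypothetical; nothing here is registered or
proposed anywhere:

```c
#include <stdio.h>
#include <string.h>

/* Append one length-prefixed protocol name to `buf`, which already
 * holds `len` bytes.  Returns the new length, or 0 if the name is
 * empty, longer than 255 bytes, or does not fit in `cap`. */
size_t alpn_append(unsigned char *buf, size_t cap, size_t len,
                   const char *name)
{
    size_t n = strlen(name);

    if (n == 0 || n > 255 || len + 1 + n > cap)
        return 0;
    buf[len] = (unsigned char)n;
    memcpy(buf + len + 1, name, n);
    return len + 1 + n;
}

/* Emit one entry per (app, mech) pair using the hypothetical
 * "<app>+gss-<mech>" naming.  Returns the total length, or 0 on error. */
size_t alpn_cross(unsigned char *buf, size_t cap,
                  const char *const *apps, size_t napps,
                  const char *const *mechs, size_t nmechs)
{
    char name[256];
    size_t len = 0;

    for (size_t i = 0; i < napps; i++) {
        for (size_t j = 0; j < nmechs; j++) {
            int w = snprintf(name, sizeof name, "%s+gss-%s",
                             apps[i], mechs[j]);
            if (w < 0 || (size_t)w >= sizeof name)
                return 0;
            size_t next = alpn_append(buf, cap, len, name);
            if (next == 0)
                return 0;
            len = next;
        }
    }
    return len;
}
```

Two application protocols and two mechanisms already yield four
entries; the resulting buffer is in the length-prefixed form that ALPN
APIs such as OpenSSL's SSL_CTX_set_alpn_protos() take.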

The ALPN approach would be my preference here.  With TLS libraries
implementing the GSS context exchange, naturally.  The result would be
roughly what you seem to have in mind.

If we ask TLS WG, I strongly suspect that we'll be asked to look at ALPN
first.

I should add that I also would like to see the RFC4121 Kerberos GSS
mechanism gain PFS, independently of TLS gaining GSS.

Nico


Re: [openssl-dev] Circumstances cause CBC often to be preferred over GCM modes

2014-12-16 Thread Nico Williams
On Tue, Dec 16, 2014 at 12:26:32PM -0500, Salz, Rich wrote:
> > In particular there MUST NOT be any fragile hand-tuning.  All ordering needs
> > to be based on general principles.
> 
> This is not a universally-held view.

I think the key word is fragile, not hand-tuning.

Subtracting (in local configuration) algorithms from a keyword denoting
all known-strong algorithms is hand-tuning, but not fragile hand-tuning.

Nico
-- 
___
openssl-dev mailing list
openssl-dev@openssl.org
https://mta.opensslfoundation.net/mailman/listinfo/openssl-dev


Re: [openssl-dev] Circumstances cause CBC often to be preferred over GCM modes

2014-12-16 Thread Nico Williams
On Tue, Dec 16, 2014 at 05:17:01PM +, Viktor Dukhovni wrote:
> In particular there MUST NOT be any fragile hand-tuning.  All
> ordering needs to be based on general principles.

+1.

> One might for example say that any CBC cipher at 128+ bits gets a
> baseline sorting strength of 128 bits.  One might then apply either
> @STRENGTH or @SPEED (new), the first of which adds 1 to any
> CBC cipher whose key is longer than 128-bits, the second to those
> that are equal to 128 bits.
> 
> With AES AEAD the baseline could be 129, with similar STRENGTH
> vs.  SPEED boosts.  Which would ensure that AEAD@128 beats CBC@256.
> 
> However, where do we fit ChaCha20/Poly-1305?  Again, not hand-placement,
> but some extensible algorithm.

Any algorithm numeric strength assignments should be baked into the
library, or perhaps configurable in a configuration file.  They should
not be known to applications.  Ditto for any computation of overall
strength of cipher-mode combinations.

Internally using such numeric strength assignments is fine.

In particular I want you to avoid the problem that Cyrus SASL had (and
still has) with the security strength factor (SSF), where entire
security mechanisms get boiled down to a single numeric strength factor
even though a mechanism might negotiate cryptographic algorithms, and
where applications end up hardcoding SSF values for things like Kerberos
V5 (which has SSF of 56, because initially Kerberos V5 only supported
1DES).  This is a horrible problem to have.

Nico


Re: [openssl-dev] Circumstances cause CBC often to be preferred over GCM modes

2014-12-16 Thread Nico Williams
On Tue, Dec 16, 2014 at 01:04:17PM -0500, Salz, Rich wrote:
> > Subtracting (in local configuration) algorithms from a keyword denoting all
> > known-strong algorithms is hand-tuning, but not fragile hand-tuning.
> 
> Three years ago RC4 was known-strong.  Two years ago DES-CBC was
> known-strong.  Now we only have AES-GCM. At what point do we think
> ChaCha/Poly is known-strong, and who gets to make that call?  Dan?
> Adam?

Changing the internal relative strength weightings of these requires
pushing out new code.  Something that... happens all the time.

I'm not against local configuration of these things as, say, a temporary
override while waiting for patches.  The configuration needs to be
simple and not fragile.

Subtracting from named sets of algorithms and sorting by desired
attributes (speed, strength), is a non-fragile way to specify
administrative preferences.

Assigning numeric algorithm strength in a config file is fragile but
acceptable for emergencies.

> Who said these are known-strong and when did they say it, and are
> they still correct? And where and how does a system admin find those
> things out.

This is why I'm advising against exposing any sort of numeric algorithm
strength assessments to _applications_: once those are baked in in the
application they can't be changed.

I realize that there was no proposal to do so.  However, any time
numeric algorithm strength assessments are discussed is also a good time
to warn others to avoid the SASL SSF mistake.

Nico


Re: [openssl-dev] Circumstances cause CBC often to be preferred over GCM modes

2014-12-16 Thread Nico Williams
On Tue, Dec 16, 2014 at 06:26:50PM +, Viktor Dukhovni wrote:
> Internally OpenSSL has a multi-dimensional property matrix, and
> preferences between numerically equal ciphers are based on other
> properties.  The (stable) numeric sorting just re-arranges blocks
> of ciphers already sorted by other means.  Thus preference for PFS
> already puts kEECDH and kDHE ahead of kRSA for otherwise equally
> strong ciphers.
> 
> When the user chooses a group of ciphers to add to a cipherlist,
> the members of the group retain their relative order.  However,
> there are interesting games that can be played with:
> 
>   aRSA:-kRSA:ALL:@STRENGTH
> 
>   (prefers kRSA over PFS).
> 
> which perturb the order, because the most recently removed elements
> end up at the top of the list when ALL is added.  So the relative
> preferences of various properties can be changed.

Iterating subtraction and addition seems like a fragile way to indicate
preference.

> The SASL problem mostly does not bite OpenSSL.  However, @STRENGTH
> (which is often needed) is not sufficiently tunable; it is not easy
> to prefer AES-128 with AEAD over 256 with CBC.  For that we need
> some new mappings that produce slightly different effective strengths.

My preference would be: subtract undesired algorithms from a named set,
then specify order of preference via some method other than iteratively
adding and subtracting algorithms.  Something like:

DEFAULT:-FOO128:::PFS,AEAD,speed,strength

(Whatever is in DEFAULT minus FOO128, sort by PFS, AEAD, speed,
strength.)

Preferences should affect the order in which cipher suites are
advertised/picked, not which ones are advertised.

Algorithms that are not desired should not be advertised/used.
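
A sketch of the "subtract, then sort by an ordered list of attributes"
semantics.  Everything in it is hypothetical: the struct suite fields,
the pref_key names and select_suites() only model the idea and bear no
relation to OpenSSL's actual cipher-list code or to any real strength
or speed figures:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sort keys; for each one, a higher value is preferred. */
enum pref_key { KEY_PFS, KEY_AEAD, KEY_SPEED, KEY_STRENGTH };

/* Hypothetical suite descriptor. */
struct suite {
    const char *name;
    int pfs, aead, speed, strength;
    int index;   /* position after subtraction; used as a tie-breaker */
};

/* Key order for the comparator (file scope, so not reentrant). */
static const enum pref_key *g_keys;
static size_t g_nkeys;

static int key_value(const struct suite *s, enum pref_key k)
{
    switch (k) {
    case KEY_PFS:      return s->pfs;
    case KEY_AEAD:     return s->aead;
    case KEY_SPEED:    return s->speed;
    case KEY_STRENGTH: return s->strength;
    }
    return 0;
}

/* Lexicographic comparison on the requested keys, best first; ties
 * keep the original relative order. */
static int cmp_suites(const void *pa, const void *pb)
{
    const struct suite *a = pa;
    const struct suite *b = pb;

    for (size_t i = 0; i < g_nkeys; i++) {
        int va = key_value(a, g_keys[i]);
        int vb = key_value(b, g_keys[i]);
        if (va != vb)
            return va > vb ? -1 : 1;
    }
    return (a->index > b->index) - (a->index < b->index);
}

/* Drop the suite named `excluded` (may be NULL), then order the rest
 * by `keys`.  Returns how many suites remain at the front of `s`. */
size_t select_suites(struct suite *s, size_t n, const char *excluded,
                     const enum pref_key *keys, size_t nkeys)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (excluded != NULL && strcmp(s[i].name, excluded) == 0)
            continue;
        s[kept] = s[i];
        s[kept].index = (int)kept;
        kept++;
    }
    g_keys = keys;
    g_nkeys = nkeys;
    qsort(s, kept, sizeof s[0], cmp_suites);
    return kept;
}
```

The point of the model is that subtraction decides membership and the
key list decides only the order, so swapping two keys changes the
ranking but never adds or removes a suite.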

Nico


Re: [openssl-dev] Circumstances cause CBC often to be preferred over GCM modes

2014-12-16 Thread Nico Williams
On Tue, Dec 16, 2014 at 09:50:32PM +0100, Hubert Kario wrote:
> > My preference would be: subtract undesired algorithms from a named set,
> > then specify order of preference via some method other than iteratively
> > adding and subtracting algorithms.  Something like:
> > 
> > DEFAULT:-FOO128:::PFS,AEAD,speed,strength

I would add that this would work like one would expect, and that
speed,strength would give [potentially] different results than
strength,speed, as one would expect.

> Let's say, for the sake of argument, that CBC mode is significantly broken,
> even in EtM mode (that's another can of worms[1]), then many people will want
> to prioritise *non-PFS* versions of AEAD ciphers above any other ciphers. And
> they will want to do it for the same reason people currently leave RC4 in.

Right.  Any crypto is better than no crypto (or, rather, the identity
ciphersuite), but the weak crypto has to go last.  Of course, for
one-offs like this hypothetical one might want a way to indicate that
some algorithms are least preferred and others most, not just sets of
algorithms.

> 1 - by another can of worms I mean: what if it's broken only in MtE
> mode? how to specify different ciphers depending on presence of this
> extension, so that in MtE only AEAD ciphers are available while if
> EtM is on, the list gains CBC ciphers? This is similar to the problem
> with BEAST: ordering with RC4 at the front for TLS1.0 is sort-of OK,
> not so much for TLSv1.1 and later...

Specifying ciphersuites and preference on a per-protocol version basis
would help.  Specifying in more context-dependent ways would be nice
but now you'd need a way to name/identify the context.

Nico


Re: Segfault seen with OpenLDAP: locking callback issue

2013-12-05 Thread Nico Williams

Please see the thread_safety branch of my github clone of OpenSSL
(https://github.com/nicowilliams/openssl).  I've not had time to get
around to finishing it, but it might well work for you.

Nico
-- 
__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-11-08 Thread Nico Williams
On Fri, Nov 8, 2013 at 4:08 AM, Bodo Moeller via RT r...@openssl.org wrote:
> Alternatives would be (a) using a new lock for safe static initialization,

Maybe you could try my patches on my thread_safety branch of my github
clone of OpenSSL?  (https://github.com/nicowilliams/openssl)

Nico


Re: [openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms

2013-11-08 Thread Nico Williams
On Fri, Nov 8, 2013 at 2:43 PM, Andy Polyakov via RT r...@openssl.org wrote:
> > Alternatives would be (a) using a new lock for safe static initialization,
> > or (b) more code duplication to avoid the need for an explicit pointer
> > (there could be two separate implementations for the higher-level
> > routines).  However, given the 1% performance penalty, that's a minor issue
> > at this point.
> 
> While if (functiona==NULL || functionb==NULL) { assign functiona,
> functionb } can be unsafe, I'd argue that if (functiona==NULL) { assign
> functiona } followed by if (functionb) { assign functionb } is.

That's not thread-safe.  There's no memory barrier so the writes can
be reordered.  The compiler itself could reorder those instructions.
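
One way to get the needed ordering, sketched here with C11 atomics
(which the code under discussion could not assume, so this swaps in a
different technique), is to put both function pointers in one table and
publish a single pointer to it with release/acquire semantics.  The
names ops, fast_ops, impl_a(), impl_b() and get_ops() are made up for
illustration:

```c
#include <stdatomic.h>
#include <stddef.h>

typedef int (*op_fn)(int);

/* Both function pointers live in one table, so they are published
 * together by a single pointer store. */
struct ops {
    op_fn a;
    op_fn b;
};

static int impl_a(int x) { return x + 1; }
static int impl_b(int x) { return x * 2; }

/* Fully initialized before it can ever be published. */
static const struct ops fast_ops = { impl_a, impl_b };

static _Atomic(const struct ops *) current_ops = NULL;

const struct ops *get_ops(void)
{
    /* Acquire pairs with the release store below: a thread that sees
     * a non-NULL pointer also sees a complete table. */
    const struct ops *p =
        atomic_load_explicit(&current_ops, memory_order_acquire);

    if (p == NULL) {
        /* Selection is idempotent (every thread would pick the same
         * table), so racing initializers are harmless. */
        p = &fast_ops;
        atomic_store_explicit(&current_ops, p, memory_order_release);
    }
    return p;
}
```

Callers then use get_ops()->a(...) and get_ops()->b(...); no reader can
observe one pointer set and the other not.  Where C11 atomics are not
available, pthread_once() or a lock gives the same guarantee.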

Nico
--


Re: Self-initialization of locking/threadid callbacks and auto-detection of features

2013-10-31 Thread Nico Williams
On Oct 31, 2013 8:19 AM, Kevin Fowler kevpfow...@gmail.com wrote:

> Not to stand in the way of progress at all, but just to note we
> cross-compile OpenSSL libraries for an embedded linux target that is still
> using libc_r, not libpthread. That will not change anytime soon for us, at
> least on legacy systems.
>
> Besides that, it seems much of this discussion is about native builds and
> config-time detection of threading support. What about cross-compiling?

I will be introducing new targets.  For cross-compilation you must tell
./Configure the target name and there's no auto-detection.  If you want
auto-detection you run ./config on the target host.  (At least that's my
understanding of how OpenSSL build configuration works.)


Re: Self-initialization of locking/threadid callbacks and auto-detection of features

2013-10-30 Thread Nico Williams
On Wed, Oct 30, 2013 at 11:15:27AM +0100, Corinna Vinschen wrote:
> Please, before any change is made in terms of threading, let me point
> out that Cygwin is NOT Windows, even if it runs on Windows.  Cygwin
> provides its own pthreads implementation which is always available,
> even without explicit linking against -lpthread.
>
> So, for Cygwin, please use pthreads, not Windows threading, otherwise
> applications linked against libcrypto might be broken in subtle ways.

Of course.  But surely OpenSSL needs to be built separately (and
differently) for native Windows and Cygwin.  I.e., an OpenSSL DLL built
against the Windows CRT can't be used from a Cygwin app.


Re: Self-initialization of locking/threadid callbacks and auto-detection of features

2013-10-30 Thread Nico Williams
On Wed, Oct 30, 2013 at 06:12:08PM +0100, Corinna Vinschen wrote:
> Right.  I'm just trying to raise awareness before Cygwin gets sorted
> into the "runs on Windows so use Windows functions" drawer, accidentally,
> as happened too often in the past in various projects.

I won't.

I'll basically be adding new targets for every existing target that has
a standard (POSIX or Win32) threading library.  I probably won't change
what target ./config picks -- I think core OpenSSL developers must make
that decision.

Nico
-- 


Re: Self-initialization of locking/threadid callbacks and auto-detection of features

2013-10-30 Thread Nico Williams
On Tue, Oct 29, 2013 at 09:58:25PM +0100, Andy Polyakov wrote:


Another thing: run-time selection from among multiple pthreads
implementations can only safely be managed by the RTLD, and dependencies
on pthreads must still be declared via the linker-editor.

Libraries should be able to refer to pthread_* symbols and still use the
run-time-selected pthread implementation *even* if the library was
linked with -lpthread (and therefore a specific libpthread.so
compilation symlink).

If an OS delivers two (or more) pthreads implementations then:

 - if the ABIs of those pthreads implementations all match then the RTLD
   should be able to substitute any of them regardless of which
   libpthread any one object in the process was linked with;

   (e.g., via LD_PRELOAD, different library search paths, ..., but all
   the libpthreads would have to have the same SONAMEs and symbol
   version numbers)

 - but if their ABIs are not compatible then no run-time substitution is
   safe, no matter what the mechanism.

   (unless compatibility shims are involved, but then we *really* must
   leave it all to the RTLD and we can't be using dlsym() or weak
   symbols)

I don't think libraries should be failing to declare their dependencies
and then use dlsym(RTLD_DEFAULT) and dlopen()+dlsym() as a way to
run-time link-edit-load their undeclared dependencies.  Doing so is
fundamentally racy and so would have to be done in .init sections, and
results (which pthread gets used) would be unpredictable (since that
would depend on object load and .init firing order).  Plus we'd lose
observability via ldd(1).  I can't see any way to justify this without
having examples of OSes where this is the documented way to handle
dynamic linking against libpthread.  But maybe I'm missing something?
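
For reference, the kind of run-time probe I'm arguing against looks
roughly like the sketch below.  The function name is hypothetical, and I
use the handle from dlopen(NULL, ...) in place of RTLD_DEFAULT only
because the former is plain POSIX; both search roughly the same global
scope.  The answer is a snapshot: it can be "no" now and "yes" after
some later dlopen() brings in libpthread, which is exactly the race.

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Returns 1 if a pthread_mutex_lock symbol is visible in the process's
 * global symbol scope at the moment of the call, 0 otherwise.
 */
int pthreads_visible_now(void)
{
    void *self = dlopen(NULL, RTLD_LAZY);   /* handle for the global scope */
    int found;

    if (self == NULL)
        return 0;
    found = dlsym(self, "pthread_mutex_lock") != NULL;
    dlclose(self);
    return found;
}
```

Nothing ties the result to the library's declared dependencies, so
ldd(1) cannot tell you which pthread implementation, if any, will end up
being used.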

Nico
-- 


Re: Self-initialization of locking/threadid callbacks and auto-detection of features

2013-10-30 Thread Nico Williams

It's getting closer, see:

https://github.com/nicowilliams/openssl/compare/thread_safety

I spoke to a Linux expert and was told that the correct thing to do in
the shared build of OpenSSL is to always link with -lpthread.  I assert
that this is also the correct thing to do on Solaris.  It's been so for
a long time now.  I.e., OpenSSL should default to using native threads
on all POSIX-like and Win32/64 systems (others will have to contribute
similar support for other environments).

My patches nonetheless will allow apps to provide alternate threading
callbacks, just as today, except that the callbacks now may not be
changed once set, and they are set to native implementations where
possible if they are needed before they are set.
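
A minimal sketch of those semantics, assuming C11 atomics; this is not
the code on the branch, and every name here (lock_cb, native_lock,
set_lock_cb_once, get_lock_cb) is invented for illustration:

```c
#include <stdatomic.h>
#include <stddef.h>

typedef void (*lock_cb)(int mode, int n);

/* Stand-in for a native (pthreads or Win32) locking implementation. */
static void native_lock(int mode, int n)
{
    (void)mode;
    (void)n;
}

/* NULL means "not set yet"; once non-NULL it never changes again. */
static _Atomic(lock_cb) current_cb;

/* Returns 1 if cb was installed, 0 if a callback was already in place. */
int set_lock_cb_once(lock_cb cb)
{
    lock_cb expected = NULL;

    return atomic_compare_exchange_strong(&current_cb, &expected, cb);
}

/* Returns the callback in effect, installing the native one if needed. */
lock_cb get_lock_cb(void)
{
    lock_cb cb = atomic_load(&current_cb);

    if (cb == NULL) {
        /* Needed before anyone set it: fall back to the native default. */
        set_lock_cb_once(native_lock);
        cb = atomic_load(&current_cb);   /* ours, or a racing setter's */
    }
    return cb;
}
```

An app that calls the setter before any lock is taken gets its own
callback; one that calls it too late gets 0 back and can tell that the
native default is already in use.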

Nico
-- 


Re: Self-initialization of locking/threadid callbacks and auto-detection of features

2013-10-29 Thread Nico Williams
On Tue, Oct 29, 2013 at 09:58:25PM +0100, Andy Polyakov wrote:
> I feel like saying few words. One should recognize that by the time
> multi-threading support was taking shape there was a whole variety
> of threading implementations and callbacks were the only way to
> convey the specifics. Nowadays we're pretty much talking only about

Certainly.  I'm not saying "omg, that OpenSSL is insanez!!".

I should repeat my problem case: we have libraries that depend on
OpenSSL libraries, and it should be possible for our libraries to
work in threaded programs.  But right now a Java program that uses JGSS
with the JNI shim to the C GSS-API libraries using Heimdal's libgss
crashes in RAND_bytes().  Heimdal can't provide locking callbacks (how
could it? it might step on some other caller's toes in the same
process).  The app can't either (since in this case it's not using
OpenSSL directly at all).

> pthreads and Windows, and one can indeed argue why wouldn't OpenSSL
> simply default to either of the two when appropriate. While it's
> more than appropriate on Windows as it is, on pthreads-powered
> Unices it's not as obvious. Because pthreads can be undesired in
> some situations. Basically it boils down to question whether or not
> libcrypto may be linked with libpthread or not. And answer is
> actually not desired. Ideally libcrypto should *detect* if

More details would be nice.  What follows is speculation of mine as to
what you might have in mind (or even if you don't, it might be
relevant).

To summarize the below:

OpenSSL .so's should always be linked with -lpthread on OSes where
there's a standard libpthread that threaded apps are expected to
use; OpenSSL static link archives should assume no threading
libraries and rely on the callers to provide the threading
callbacks.

Distros that focus on static linking should ship OpenSSL static link
archives only and should ensure that dependents of OpenSSL initialize
it correctly when used in threaded programs.

Linux still has the multiple process models problem that Solaris 9 used
to have:

 - statically linked with libc but not -lpthread
 - statically linked with libc and -lpthread
 - dynamically linked with libc but not -lpthread
 - dynamically linked with libc and -lpthread

 - mixed: not originally linked with -lpthread but a dlopen()ed object brought
   it in

I dunno about Linux, but IIUC loading libpthread at run-time in the
third case works fine.  It also works fine in S8 and S9.  It's loading
libpthread at run-time in the first case that leads to fireworks.

The mixed case is a very difficult one to handle correctly in any
library.  Merely using dlsym() on RTLD_DEFAULT is not good enough:
pthreads symbols might appear later on.

Typically the mixed case results when an app uses the name service
switch without nscd, or PAM, and one of those pluggable things loads
libcrypto.  If the app also uses libcrypto then... who knows.

(It's clearly insane to support a static libc and have an RTLD.  And yet
Solaris used to support that...  But then, it stopped supporting that
for a reason: it was insanely expensive in terms of engineering costs.)

I'm not sure that OpenSSL should support such madness in any way other
than at ./Configure time: the builder person picks.

I.e., this is the distro builder's problem.  Some distros are very
focused on static linking.  Others are not.

Static link archives don't record dependencies, so we can't speak of
building libcrypto.a with -lpthread.  So in the static link build option
OpenSSL needs to know whether to assume pthreads or not, and only the
builder can tell OpenSSL.

In the shared object build of OpenSSL it's just easier to link with
-lpthread if the target has pthreads.

E.g., a distro that is mostly-dynamically-linked should just ship
OpenSSL libs linked with -lpthread, while a distro that is
mostly-statically-linked should ship OpenSSL .a's built to not use
pthreads.  A distro with both should *probably* build OpenSSL .a's w/o
pthread and .so's with.  A distro might want to require that pthreads is
*always* available, even when linking statically.

Or perhaps you were thinking of apps that use GNU Pth, but those should
either build their own private OpenSSL libs or they should use static
link OpenSSL archives.  I don't think OpenSSL should bend over backwards
to support third-party alternate threading libraries where the OS has a
single supported standard.  If an OS has multiple supported alternatives
then OpenSSL should pick one at run-time, but this is very difficult
(see below).

> *hosting* application is linked with libpthread and only *then*
> adjust accordingly. Is there way to detect if application is linked
> with libpthread at run-time? There is DSO_global_lookup.

See above.  Using weak symbols might work, but it might be very
OS-dependent; I'm not prepared to research that for every target, or
even more than one or two targets.  Using dlsym() seems like asking for
trouble.
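
For the record, the weak-symbol variant would look something like the
sketch below.  It assumes GCC or Clang on an ELF target (#pragma weak is
not portable), and the names demo_lock, have_pthreads and bump_counter
are made up.  On a system where the pthread functions live in libc
itself (recent glibc, as far as I know) the check is always true, which
is another way of saying the result is OS-dependent.

```c
#include <pthread.h>
#include <stddef.h>

/* Make the references weak: they resolve to NULL if nothing defines them. */
#pragma weak pthread_mutex_lock
#pragma weak pthread_mutex_unlock

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

/* 1 if both symbols were resolved when this object was linked/loaded. */
int have_pthreads(void)
{
    return pthread_mutex_lock != NULL && pthread_mutex_unlock != NULL;
}

/* Increments a shared counter, locking only if pthreads is present. */
long bump_counter(void)
{
    int threaded = have_pthreads();   /* decide once per call */
    long v;

    if (threaded)
        pthread_mutex_lock(&demo_lock);
    v = ++counter;
    if (threaded)
        pthread_mutex_unlock(&demo_lock);
    return v;
}
```

Note that this does nothing for the mixed case above: the weak
references are resolved when the object is loaded, so a libpthread that
shows up later via dlopen() is not noticed.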

 As for 

Re: Self-initialization of locking/threadid callbacks and auto-detection of features

2013-10-27 Thread Nico Williams
I'm making progress slowly (not my main project).  I've run into a bit
of a problem: the dynlock callback setting cannot be made thread-safe
due to the setter API's using three functions to set three related
callbacks.

Also, I'm not sure that the dynlocks need a default implementation.  It
seems they don't, at least not as far as *apps* are concerned.  Please
let me know, though I'm inclined to provide a default implementation
anyways, just because it's no big deal.

A few notes and questions:

 - Locks in OpenSSL are really reader-writer locks, but the sample code
   in crypto/threads/*.c uses mutexes only.

   How important is it that reader/writer locks be used instead of
   exclusive locks?

 - The add_lock stuff should just use OPENSSL_atomic_add() wherever it
   exists.

   How would I determine at build-time (in ./Configure and in
   ./crypto/lock.c) whether OPENSSL_atomic_add is available?

 - I'll be adding a single setter for dynlock callbacks, and deprecating
   the old ones.

   Any objections?

 - As I get closer to having code that can be tested...

   I can provide tests of the one-time initialization of things, but
   it'd be nice to test threaded functionality in general -- is there
   such a general test?  If so, please point me to it.

Nico
-- 


Re: Self-initialization of locking/threadid callbacks and auto-detection of features

2013-10-23 Thread Nico Williams
On Wed, Oct 23, 2013 at 08:32:35AM +1000, Peter Waltenberg wrote:
> There is no 'safe' way to do this other than hardwired. Admitted, we have a
> fairly ugly stack on which to find that out, multiple independently
> developed lumps of code jammed into the same process, quite a few using
> dlopen()/dlclose() on other libraries - multiples of them calling the
> crypto. code.

Oh, good point.

I think what I'll do is add targets that denote "always use the OS
thread library; disallow setting these callbacks", and a corresponding
command-line option to ./config.  This should be the best option in
general because of the possibility of the text for callbacks being
unmapped when the provider gets dlclose()ed.

Then maybe there's no need to bother with the pthread_once()/
InitOnceExecuteOnce() business.  I had assumed, going in, that I needed
to preserve existing semantics as much as possible, but because that
might still be the case (even if you're right as to what the ideal
should be) I will do *both*.  (Who knows, maybe there's a program out
there that insists on using the gnu pth library and not the OS' native
threading library.  Or maybe there's no need to support such oddities.)

> All those lumps of code think they 'own' the crypto. stack - worst case
> scenario was a dlopen()'d library setting the callbacks, then being
> unloaded while other parts of the stack were still using crypto.
> Surprisingly - that still worked on some OS's - but some (like AIX/HPUX)
> unmap program text immediately on dlclose().

Right.  That's a question regarding patch contribution: is it OK to
leak things that can only really be torn down at dlclose() / unload
time and which effectively never happens?  git grep '\.fini\' finds
nothing in the tree (while git grep '\.init\' does), so I'm guessing
yes, it's OK.

> Personally, I'd suggest making it a build option to turn off default locking
> and use whatever the OS provides by default. That'll allow the few corner
> cases to continue doing whatever weird things they were doing before, but
> remove the big risk factor for the vast majority of users. And it is
> becoming a big risk factor now.
>
> It certainly shouldn't be an issue for the OS installed OpenSSL which
> probably covers most of your users, the only sane choice there is the OS
> default locking scheme anyway.

Agreed.  Thanks for your response,

Nico
-- 


Re: Self-initialization of locking/threadid callbacks and auto-detection of features

2013-10-22 Thread Nico Williams
> But I could have a target that has a weak dependency on pthreads and is
> safe when the library is present.  Ditto Windows (be unsafe pre-vista/2008,
> safe in vista/2008 and later, using same OpenSSL DLLs builds).  I'd rather
> add this variation later, after the meat of this work is done, assuming
> such a variation is desired.

Of course, that assumes a degree of ABI compatibility in the OS that may
not in fact be available -- a great reason not to go for that...


Re: Self-initialization of locking/threadid callbacks and auto-detection of features

2013-10-22 Thread Nico Williams
On Oct 22, 2013 10:28 AM, Ben Laurie b...@links.org wrote:
> On 22 October 2013 06:47, Nico Williams n...@cryptonector.com wrote:
> > What I need to know:
> >
> >  - should I add new targets to ./Configure?  For now I modified the
> > linux-elf target, but this feels wrong to me.
> >
> >  - what about Windows?  I either need to have different targets for
> > pre-Vista/2008, or I have to write a once-initialization function for older
> > Windows (which I can and know how to do; it's just more work, and in
> > particular I couldn't test it, so I'm not inclined to do it).
> >
> >  - if so, should ./config automatically pick the new targets where there
> > is appropriate threading support?


> I've been musing about a more autoconf-like approach for some time now
> (but, for the love of all that is fluffy, not using autoconf itself, which
> sucks) - it seems this is a good reason to go down that path.

Well, I'm not signing up for that, not yet anyways!  :)  Short-term advice
will do.  I think I'll just add new targets and ./config logic for picking
them.

The fact that targets are stable-ish is useful, as it allows building
whatever targets one can build (or cross-build) on the host.  autoconf
can't do this, and that's one more reason not to autoconf.

> Interesting question is: what to do if no appropriate locking mechanism
> is discovered?

I think for a linux-elf-pthread target the dependency on pthreads should be
hard.  The old linux-elf target should remain thread-unsafe (I can't make
OpenSSL fully thread-safe without a thread library).

But I could have a target that has a weak dependency on pthreads and is
safe when the library is present.  Ditto Windows (be unsafe pre-vista/2008,
safe in vista/2008 and later, using same OpenSSL DLLs builds).  I'd rather
add this variation later, after the meat of this work is done, assuming
such a variation is desired.

Nico
--


Re: Self-initialization of locking/threadid callbacks and auto-detection of features

2013-10-21 Thread Nico Williams
On Monday, October 21, 2013, Salz, Rich wrote:

> I like your proposal, but I'd prefer to see an "already initialized" error
> code returned. Or a flag to the (new?) init api that says "ignore if
> already set"


Thanks for your reply!

I can add an error, but note that the caller can set then get the callbacks
and compare to check whether the caller's callbacks were taken.  I could
also add a new set of callback setters with ignore-if-set flags.  As long
as the existing ones behave reliably in the already-set case.

In the already-set case I think it may well be best to ignore without
failing on the theory that the caller that first set the callbacks must
have set sufficiently useful ones anyways... and that where the OS has a
good enough default threading library, that's the one that will be used by
all DSOs calling OpenSSL in the same process, as otherwise all hell would
already be breaking loose anyways!  (I can imagine twisted cases where this
would not be true, but they seem exceedingly unlikely.)

If you want to see the half-baked bits I have (which build on Linux, but
which aren't tested) to see what I'm up to, see
https://github.com/nicowilliams/openssl, specifically the thread_safety
branch.  See the XXX comments in rand_lib.c in particular.  The outline:
add a thread-safe one-time initialization function, built on whatever the
OS provides, then use that to make callback init thread-safe.
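
On POSIX targets the one-time initialization piece can be as small as
the sketch below, which assumes pthread_once() is available.  The names
(callbacks_once, init_default_callbacks, ensure_callbacks) are
placeholders and not what is on the branch; a Windows build would need
an equivalent built on InitOnceExecuteOnce() or similar.

```c
#include <pthread.h>

static pthread_once_t callbacks_once = PTHREAD_ONCE_INIT;
static int init_runs;   /* how many times the initializer actually ran */

/* Runs at most once per process, no matter how many threads race here. */
static void init_default_callbacks(void)
{
    /* Real code would install native locking/threadid callbacks here. */
    init_runs++;
}

/* Call at the top of any entry point that needs the callbacks set up.
 * Returns 0 on success or a pthread error number. */
int ensure_callbacks(void)
{
    return pthread_once(&callbacks_once, init_default_callbacks);
}

/* Exposed only so the behaviour can be checked in a test. */
int init_run_count(void)
{
    return init_runs;
}
```

Every caller of ensure_callbacks() returns only after the initializer
has completed, so nothing downstream observes half-initialized state.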

What I need to know:

 - should I add new targets to ./Configure?  For now I modified the
linux-elf target, but this feels wrong to me.

 - what about Windows?  I either need to have different targets for
pre-Vista/2008, or I have to write a once-initialization function for older
Windows (which I can and know how to do; it's just more work, and in
particular I couldn't test it, so I'm not inclined to do it).

 - if so, should ./config automatically pick the new targets where there is
appropriate threading support?

 - how to allocate error codes for the "already initialized" errors that you
suggest?

 - should I work to make sure that it's possible to change the default RAND
method after it's been set once?

   The code in rand_lib.c is currently fundamentally thread-unsafe, though
it could be accidentally thread-safe if, e.g., ENGINE_finish() doesn't
actually tear down state at all.  The simplest fix involves setting the
default only once, as with the callbacks, but here I feel that's a shaky
idea, that I should allow RAND method changes at any time, in a thread-safe
manner -- more work for me, but less surprising.
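
To show what I mean by allowing changes at any time, here is a rough
sketch that guards the default-method pointer with a mutex.  It is not
the rand_lib.c code: struct rand_method, builtin_bytes, alt_bytes and
the two functions are invented for illustration, the "generators" just
fill a fixed byte and are NOT random, and the sketch ignores the harder
part, which is keeping an engine-provided method alive while a caller is
still using it.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

struct rand_method {
    int (*bytes)(unsigned char *buf, int num);
};

/* Placeholder generators: fixed fill bytes, for demonstration only. */
static int builtin_bytes(unsigned char *buf, int num)
{
    memset(buf, 0x00, (size_t)num);
    return 1;
}

static int alt_bytes(unsigned char *buf, int num)
{
    memset(buf, 0xFF, (size_t)num);
    return 1;
}

static const struct rand_method builtin_method = { builtin_bytes };
static const struct rand_method alt_method = { alt_bytes };

static pthread_mutex_t method_lock = PTHREAD_MUTEX_INITIALIZER;
static const struct rand_method *default_method = &builtin_method;

/* May be called at any time, from any thread; NULL restores the builtin. */
void set_default_method(const struct rand_method *m)
{
    pthread_mutex_lock(&method_lock);
    default_method = (m != NULL) ? m : &builtin_method;
    pthread_mutex_unlock(&method_lock);
}

/* Takes a snapshot of the current method under the lock, then calls it. */
int random_bytes(unsigned char *buf, int num)
{
    const struct rand_method *m;

    pthread_mutex_lock(&method_lock);
    m = default_method;
    pthread_mutex_unlock(&method_lock);
    return m->bytes(buf, num);
}
```

A caller that races with set_default_method() gets either the old or the
new method, never a torn pointer; that is the "less surprising"
behaviour, at the cost of a lock on every call.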

Nico
-- 

(sent from a mobile device with lousy typing options, and no plain text
button)
(my patches need rebasing to squash and split up, need tests, need
finishing, but if you have comments I would love them sooner than later! :)