Re: [openssl-dev] [openssl-users] Changing malloc/debug stuff
On Thu, Dec 17, 2015 at 08:16:50PM +, Salz, Rich wrote:
> > > https://github.com/openssl/openssl/pull/450
> >
> > This seems much more sane.
>
> I'll settle for less insane :)

That is, I think, the best you can do.

Some allocations might have taken place by the time a wrapper or alternative allocator is installed, in which case something bad will happen. In the case of alternative allocators the something bad is "it blows up", while in the case of a wrapper the something bad is "some state/whatever will be off".

A fully sane approach would be to have every allocated object internally point to its destructor, and then always destroy by calling that destructor instead of a global one. (Or call a global one that knows how to find the object's private destructor pointer, and then calls that.) If you wish, something more OO-ish.

But for many allocations that's not possible because they aren't "objects" in the sense that matters. You could always wrap allocations so that they always have room at the front for the corresponding destructor, then return the offset of the end of that pointer, but this will be very heavy-duty for many allocations.

So, all in all, I like and prefer your approach.

Nico
--
___
openssl-dev mailing list
To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
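The "room at the front for the corresponding destructor" idea can be sketched roughly as follows. This is only an illustration, not code from the pull request: the names hdr_alloc, hdr_free, alloc_hdr and counting_free are hypothetical, and the sketch assumes a C11 compiler for max_align_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header placed in front of every wrapped allocation. */
typedef union alloc_hdr {
    void (*release)(void *base);  /* routine that must free this block */
    max_align_t align;            /* keeps the caller's pointer aligned (C11) */
} alloc_hdr;

/* Allocate size bytes with alloc() and remember the matching release(). */
static void *hdr_alloc(size_t size, void *(*alloc)(size_t),
                       void (*release)(void *))
{
    alloc_hdr *h;

    if (alloc == NULL || release == NULL || size > SIZE_MAX - sizeof(*h))
        return NULL;
    h = alloc(sizeof(*h) + size);
    if (h == NULL)
        return NULL;
    h->release = release;
    return h + 1;                 /* caller sees the memory after the header */
}

/* Free a block with whatever routine was current when it was allocated. */
static void hdr_free(void *p)
{
    alloc_hdr *h;

    if (p == NULL)
        return;
    h = (alloc_hdr *)p - 1;       /* step back to the hidden header */
    assert(h->release != NULL);
    h->release(h);
}

/* Demo-only stand-in for an "alternative allocator" installed later. */
static int counted_frees;
static void counting_free(void *base)
{
    counted_frees++;
    free(base);
}
```

A block allocated before an allocator swap is still released by the routine recorded in its own header, which is the property described above. The price is one header per allocation, which is why this is heavy-duty for small objects.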
Re: [openssl-dev] [openssl-users] Changing malloc/debug stuff
On Thu, Dec 17, 2015 at 09:28:28AM +, Salz, Rich wrote:
> I want to change the memory alloc/debug things.
>
> Right now there are several undocumented functions to allow you to
> swap-out the malloc/realloc/free routines, wrappers that call those
> routines, debug versions of those wrappers, and functions to set the
> set-options versions of those functions. Yes, really :) Is anyone
> using that stuff?

This is another one of those things that isn't easy to deal with sanely the way OpenSSL is actually used (i.e., by other libraries as well as by apps).

> I want to change the model so that there are three wrappers around
> malloc/realloc/free, and that the only thing you can do is change that
> wrapper. This is vastly simpler and easier to understand. I also
> documented it. A version can be found at
> https://github.com/openssl/openssl/pull/450

This seems much more sane.

Nico
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
Another interesting portable atomics library is https://github.com/mintomic/mintomic

FYI, I took a stab at a simple portable atomics implementation that uses GCC/clang __atomic, or __sync, or Win32 Interlocked*, or a single global lock, with a fallback to unsafe, non-atomic implementations for no-threads configurations; adding C11 support will be trivial.

For just increment/decrement and CAS this is really small, and I think that's enough for OpenSSL for starters.
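A layered selection like the one described might look roughly like this. It is a sketch, not the code referred to in the message: the sk_ names and the SK_NO_THREADS macro are hypothetical, and the Win32 branch assumes int and LONG are both 32 bits wide.

```c
#include <assert.h>

#if defined(__GNUC__) && defined(__ATOMIC_SEQ_CST)
/* GCC/clang __atomic builtins. */
static int sk_atomic_add(int *p, int n)
{
    return __atomic_add_fetch(p, n, __ATOMIC_SEQ_CST);
}
static int sk_atomic_cas(int *p, int expected, int desired)
{
    return __atomic_compare_exchange_n(p, &expected, desired, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
#elif defined(__GNUC__)
/* Older GCC: legacy __sync builtins. */
static int sk_atomic_add(int *p, int n)
{
    return __sync_add_and_fetch(p, n);
}
static int sk_atomic_cas(int *p, int expected, int desired)
{
    return __sync_bool_compare_and_swap(p, expected, desired);
}
#elif defined(_WIN32)
/* Win32 Interlocked* family. */
#include <windows.h>
static int sk_atomic_add(int *p, int n)
{
    return (int)InterlockedExchangeAdd((volatile LONG *)p, (LONG)n) + n;
}
static int sk_atomic_cas(int *p, int expected, int desired)
{
    return InterlockedCompareExchange((volatile LONG *)p, (LONG)desired,
                                      (LONG)expected) == (LONG)expected;
}
#elif defined(SK_NO_THREADS)
/* No-threads configuration: plain, non-atomic operations. */
static int sk_atomic_add(int *p, int n)
{
    *p += n;
    return *p;
}
static int sk_atomic_cas(int *p, int expected, int desired)
{
    if (*p != expected)
        return 0;
    *p = desired;
    return 1;
}
#else
/* Last resort: one global lock around every operation. */
#include <pthread.h>
static pthread_mutex_t sk_global_lock = PTHREAD_MUTEX_INITIALIZER;
static int sk_atomic_add(int *p, int n)
{
    int r;
    pthread_mutex_lock(&sk_global_lock);
    r = (*p += n);
    pthread_mutex_unlock(&sk_global_lock);
    return r;
}
static int sk_atomic_cas(int *p, int expected, int desired)
{
    int ok;
    pthread_mutex_lock(&sk_global_lock);
    ok = (*p == expected);
    if (ok)
        *p = desired;
    pthread_mutex_unlock(&sk_global_lock);
    return ok;
}
#endif

/* Drop one reference; returns 1 when the caller released the last one. */
static int sk_ref_down(int *refs)
{
    int left = sk_atomic_add(refs, -1);
    assert(left >= 0);            /* negative means an unbalanced release */
    return left == 0;
}
```

Both operations return the same thing on every backend (the new value for add, non-zero on success for CAS), so callers such as sk_ref_down() need no per-platform code.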
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
On Tue, Dec 15, 2015 at 01:24:12PM +0100, Florian Weimer wrote:
> * Nico Williams:
>
> > On Tue, Dec 08, 2015 at 11:19:32AM +0100, Florian Weimer wrote:
> >> > Maybe http://trac.mpich.org/projects/openpa/ would fit the bill?
> >>
> >> It seems to have trouble to keep up with new architectures.
> >
> > New architectures are not really a problem because between a) decent
> > compilers with C11 and/or non-C11 atomic intrinsics,
>
> Not on Windows.

Windows has a family of functions for atomic addition, compare-and-swap, et cetera:

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686360%28v=vs.85%29.aspx#interlocked_functions

Solaris/Illumos has its own as well. Linux has several atomics libraries. And there are several open-source portable atomics libraries as well.

I.e., between compiler non-C11 atomic intrinsics, C11 intrinsics, OS atomic function libraries, and portable open-source atomics libraries, we can cover almost all the bases.

> > What's the alternative anyways?
>
> Using C++11.

Sure, but only for a C atomics library for the rest of OpenSSL.

So that makes five alternatives, plus the two stub implementations (one with global locks, one with no locking/atomics). Any platform not covered will get one of the stub implementations and its users will have to contribute a better implementation.

We have a surfeit of options, not a dearth of them. I don't think lack of atomics primitives is remotely a concern. We should use atomic primitives in OpenSSL.

Nico
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
On Tue, Dec 15, 2015 at 09:57:32AM -0600, Benjamin Kaduk wrote:
> On 12/15/2015 06:43 AM, Kurt Roeckx wrote:
> > On Tue, Dec 15, 2015 at 01:24:12PM +0100, Florian Weimer wrote:
> >> Using C++11.
> > I think this is a relevant article:
> > http://herbsutter.com/2012/05/03/reader-qa-what-about-vc-and-c99/
> >
>
> I think an article from 2012 is no longer current; something like
> http://blogs.msdn.com/b/vcblog/archive/2015/06/19/c-11-14-17-features-in-vs-2015-rtm.aspx
> might be a better source.

Yes, but not everyone will have the latest and greatest compilers.

Still, the Windows interlocked function family is enough. See my other post just now.

Nico
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
On Tue, Dec 15, 2015 at 06:15:32PM +, Salz, Rich wrote:
> > I.e., between compiler non-C11 atomic intrinsics, C11 intrinsics, OS atomic
> > function libraries, and portable open-source atomics libraries, we can cover
> > almost all the bases.
>
> Agreed.

Thanks. This is helpful. I now think we'll have an easy road to consensus on how to get thread-safety in OpenSSL. I will try to contribute infrastructure, but there's a lot of heavy lifting to do that will take a long time, and I can't do it all.

Nico
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
On Tue, Dec 15, 2015 at 07:54:35PM +0100, Kurt Roeckx wrote:
> Also, if you want to use atomics we really want the C11 / C++11
> memory model which prevents certain important optimizations.

Right, because compilers can reorder some operations. But we've been living with this pre-C11 for decades. We can't yet require C11 (though I'd sure like to).

BTW, there's this:

http://blogs.msdn.com/b/vcblog/archive/2015/05/01/bringing-clang-to-windows.aspx

which, IIUC, means we can get C11 on Windows with MSVC with a clang frontend.

So maybe Windows is a non-issue, and maybe C11 is a non-issue. Though I'm sure there are platforms where OpenSSL can't expect C11 yet, so I suspect we're stuck with C90+ for a while.

Nico
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
On Wed, Dec 09, 2015 at 02:33:46AM -0600, Nico Williams wrote:
> No more installing callbacks to get locking and atomics.

I should explain why.

First, lock callbacks are a serious detriment to usability.

Second, they are an admission that OpenSSL is incomplete.

Third, if we have lock callbacks to install, then we have the risk of racing (by multiple libraries using OpenSSL) to install them. Unless there's a single function to install *all* such callbacks, there's no way to install callbacks atomically. But every once in a while we'll need to add an Nth callback, thus breaking the ABI or atomicity.

So, no, no lock callbacks. OpenSSL should work thread-safely out of the box like other libraries. That means that the default configuration should be to use pthreads on *nix, for example.

We'll need an atomics library (e.g., OpenPA, or something new) with safe and sane -if not very performant- defaults that use global locks for platform/compiler combinations where there are no built-in atomics intrinsics or system library.

It should be possible to have a no-threads configuration where the locks and atomics are non-concurrent-safe implementations.

Nico
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
On Thu, Dec 10, 2015 at 07:06:15AM +1000, Paul Dale wrote:
> Thanks for the clarification. I was making an assumption that
> following the existing locking model, which did seem over complicated,
> was desirable. Now that that is shot down, things can be much
> simpler.

Exactly :) Sorry if I was a bit brusque. Since inertia is strong, I figured I needed to make a forceful argument. However, it seems it was easy to get consensus after all.

> It would make more sense to have a structure containing the reference
> counter and (optionally?) a lock to use for that counter.

It'd work but it'd be a complication, since every integer to be used with atomic increment/decrement, or CAS, or whatever, now needs to be a struct type with the integer as one field and the rest as opaque. It'd be much nicer to be able to use ints normally, though I would agree that having a special type for atomics has the benefit that it is self-describing.

It's perfectly fine to have a worst-case atomics implementation that uses a global lock. Yes, that would be slow, but we need some incentive to add true atomics for each platform, and making it slow is the exact right incentive.

So even if for documentation and type-safety reasons we wanted to wrap ints in structs for ints meant to be used with atomics, I'd still want the worst-case atomics implementation to be slow.

> With atomics, the lock isn't there or at least isn't used. Without
> them, it is. This is because, I somewhat suspect having a fall back
> global lock for all atomic operations would be worse than the current
> situation where at least a few different locks are used.

That's a feature :)

Nico
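For illustration, the "special type for atomics" combined with a worst-case global-lock backend could be sketched like this. The demo_ names are hypothetical, and nothing here is meant to reflect an actual OpenSSL API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Self-describing wrapper: callers are meant to treat it as opaque and
 * only touch it through the functions below. */
typedef struct demo_atomic_int {
    int value;
} demo_atomic_int;

/* Worst-case backend: every atomic operation takes one global lock.
 * Deliberately simple, and slow under contention. */
static pthread_mutex_t demo_atomics_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add n and return the new value. Note there is no per-counter lock
 * argument; the choice of backend is invisible to the caller. */
static int demo_atomic_add(demo_atomic_int *a, int n)
{
    int result;

    assert(a != NULL);
    pthread_mutex_lock(&demo_atomics_lock);
    a->value += n;
    result = a->value;
    pthread_mutex_unlock(&demo_atomics_lock);
    return result;
}

/* Read the current value under the same lock. */
static int demo_atomic_read(demo_atomic_int *a)
{
    return demo_atomic_add(a, 0);
}
```

Swapping the body of demo_atomic_add() for a compiler intrinsic would change nothing for callers, so a platform can move from the slow stub to true atomics without any API change.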
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
On Tue, Dec 08, 2015 at 11:19:32AM +0100, Florian Weimer wrote:
> > Maybe http://trac.mpich.org/projects/openpa/ would fit the bill?
>
> It seems to have trouble to keep up with new architectures.

New architectures are not really a problem because between a) decent compilers with C11 and/or non-C11 atomic intrinsics, b) asm-coded atomics, and c) mutex-based dumb atomics, we can get full coverage. Anyone who's still not satisfied can then contribute missing asm-coded atomics to OpenPA. I suspect that OpenSSL using OpenPA is likely to lead to contributions to OpenPA that will make it better anyways.

What's the alternative anyways?

We're talking about API and performance enhancements to OpenSSL to go faster on platforms for which there are atomics, and maybe slower otherwise -- or maybe not; maybe we can implement context up-/down-ref functions that use fine-grained (or even global) locking as a fallback that yields performance comparable to today's.

If OpenPA's (or some other such library's) license works for OpenSSL, someone might start using it. That someone might be me. So that seems like a good question to ask: is OpenPA's license compatible with OpenSSL's? For inclusion into OpenSSL's tree, or for use by OpenSSL?

Nico
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
On Wed, Dec 09, 2015 at 09:27:16AM +1000, Paul Dale wrote:
> It will be possible to support atomics in such a way that there is no
> performance penalty for machines without them or for single threaded
> operation. My sketchy design is along the lines of adding a new API
> CRYPTO_add_atomic that takes the same arguments as CRYPTO_add (i.e.
> reference to counter, value to add and lock to use):
>
> CRYPTO_add_atomic(int *addr, int amount, int lock)
>     if have-atomics then
>         atomic_add(addr, amount)
>     else if (lock == have-lock-already)
>         *addr += amount
>     else
>         CRYPTO_add(addr, amount, lock)

"have-atomics" must be known at compile time.

"lock" should not be needed because we should always have atomics, even when we don't have true atomics: just use a global lock in a stub implementation of atomic_add() and such.

KISS.

Besides, this will add pressure to add true atomics wherever they are truly needed.

Nico
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
On Mon, Dec 07, 2015 at 02:41:35PM +0100, Florian Weimer wrote:
> On 11/25/2015 06:48 PM, Kurt Roeckx wrote:
> > Please note that we use C, not C++. But C11 has the same atomics
> > extensions as C++11.
>
> C++11 support is much more widespread than C11 support. You will have
> trouble finding reliable support for C11 atomics with the Microsoft
> toolchain.
>
> [...]
>
> It is a lot of work getting the atomics right on all supported platforms.

The MSFT toolchain has its own intrinsics, as do GCC/clang.

A variety of OSes have their own atomics libraries (e.g., Solaris/Illumos, FreeBSD, and others). Linux has several as well, but I am not sure that the licensing on those will be compatible to link against (much less to incorporate as source in OpenSSL). Some of the BSD or CDDL licensed libraries might be possible to incorporate as source into OpenSSL.

It's a solvable problem, but yes, a lot of work :( Still, it seems worth doing.

Nico
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
Maybe http://trac.mpich.org/projects/openpa/ would fit the bill?
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
On Tue, Dec 01, 2015 at 09:21:34AM +1000, Paul Dale wrote:
> However, the obstacle preventing 100% CPU utilisation for both stacks
> is lock contention. The NSS folks apparently spent a lot of effort
> addressing this and they have a far more scalable locking model than
> OpenSSL: one lock per context for all the different kinds of context
> versus a small number of global locks.

I prefer APIs which state that they are "thread-safe provided the application accesses each XYZ context from only one thread at a time". Leave it to the application to do locking, as much as possible. Many threaded applications won't need locking here because they may naturally have only one thread using a given context.

Also, for something like a TLS context, ideally it should be naturally possible to have two threads active, as long as one thread only reads and the other thread only writes. There can be some dragons here with respect to fatal events and deletion of a context, but the simplest thing to do is to use atomics for manipulating state like "had a fatal alert", and use reference counts to defer deletion (then if the application developer wants it this way, each of the reader and writer threads can have a reference and the last one to stop using the context deletes it).

> There is definitely scope for improvement here. My atomic operation
> suggestion is one approach which was quick and easy to validate,
> better might be more locks since it doesn't introduce a new paradigm
> and is more widely supported (C11 notwithstanding).

A platform compatibility atomics library would be simple enough (plenty exist, I believe). For platforms where no suitable implementation exists you can use a single global lock, and if there's not even that, then you can use non-atomic implementations and pretend it's all OK or fail to build (users of such platforms will quickly provide real implementations).

(Most compilers have pre-C11 atomics intrinsics and many OSes have atomics libraries.)

Nico
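The "atomic fatal flag plus reference counts to defer deletion" scheme can be sketched as below. This is a minimal illustration assuming C11 <stdatomic.h> is available (which the thread notes cannot be taken for granted); the demo_ctx type and its functions are hypothetical and are not OpenSSL APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical context shared by one reader and one writer thread. */
typedef struct demo_ctx {
    atomic_int refs;    /* number of outstanding references */
    atomic_int fatal;   /* set once a fatal alert has been seen */
} demo_ctx;

/* Demo-only counter so the deferred deletion can be observed. */
static int demo_ctx_frees;

/* Create a context holding one reference for the caller. */
static demo_ctx *demo_ctx_new(void)
{
    demo_ctx *c = malloc(sizeof(*c));

    if (c == NULL)
        return NULL;
    atomic_init(&c->refs, 1);
    atomic_init(&c->fatal, 0);
    return c;
}

/* Take an extra reference, e.g. one for the writer thread. */
static demo_ctx *demo_ctx_ref(demo_ctx *c)
{
    assert(c != NULL);
    atomic_fetch_add(&c->refs, 1);
    return c;
}

/* Drop a reference; whoever drops the last one frees the context. */
static void demo_ctx_unref(demo_ctx *c)
{
    if (atomic_fetch_sub(&c->refs, 1) == 1) {
        demo_ctx_frees++;
        free(c);
    }
}

/* Either thread may record or query the fatal state without a lock. */
static void demo_ctx_set_fatal(demo_ctx *c) { atomic_store(&c->fatal, 1); }
static int demo_ctx_is_fatal(demo_ctx *c) { return atomic_load(&c->fatal); }
```

Each of the reader and writer holds its own reference, so neither has to know whether the other is still running when it finishes: the context goes away only when the second demo_ctx_unref() happens.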
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
On Mon, Nov 23, 2015 at 01:53:47AM +, Viktor Dukhovni wrote:

[NetBSD header commentary extracts:]

> /*
> * Use macros to rename many pthread functions to the corresponding
> * libc symbols which are either trivial/no-op stubs or the real

No renaming is necessary if one's link-editor and RTLD support filters...

An ELF filter is a forwarder: it says that the implementation of a symbol in the filter object is to be found elsewhere, e.g., in some other object. In Solaris/Illumos this is used to maintain backwards compatibility when symbols get moved from one library to another. E.g., libpthread and libdl moved into libc, but they remain as filters so that objects linked with those old libraries will a) still find them, b) still find the symbols they expect in them, c) get the correct implementations of those symbols from the object now providing them (here: libc).

Filters are awesome. Lack of universal support for them is very frustrating. On Linux, for example, it's possible to create filters with strong link-editor-fu, but the RTLD does not support them.

> * thing, depending on whether libpthread is linked in to the
> * program. This permits code, particularly libraries that do not
> * directly use threads but want to be thread-safe in the presence of
> * threaded callers, to use pthread mutexes and the like without
> * unnecessarily including libpthread in their linkage.

Just move these into libc, lock, stock, and barrel, and if you want to have fast versions for the single-threaded case, just arrange for slower versions to get hot-patched-in when pthread_create() is first called. Or even just use a branch/computed jump (whichever is faster) to avoid having to hot-patch.

It's important that pthread_mutex_init/lock/trylock/unlock/destroy work correctly even in the optimized single-threaded case. The main thread might init and acquire some locks then create a second thread that will block acquiring those locks.

> * Left out of this list are functions that can't sensibly be trivial
> * or no-op stubs in a single-threaded process (pthread_create,
> * pthread_kill, pthread_detach), functions that normally block and
> * wait for another thread to do something (pthread_join), and

Just move them into libc anyways.

> * functions that don't make sense without the previous functions
> * (pthread_attr_*). The pthread_cond_wait and pthread_cond_timedwait
> * functions are useful in implementing certain protection mechanisms,
> * though a non-buggy app shouldn't end up calling them in
> * single-threaded mode.
> *
> * The rename is done as:
> * #define pthread_foo __libc_foo
> * instead of
> * #define pthread_foo(x) __libc_foo((x))
> * in order that taking the address of the function ("func =
> * &pthread_foo;") continue to work.
> *
> * POSIX/SUSv3 requires that its functions exist as functions (even if
> * macro versions exist) and specifically that "#undef pthread_foo" is
> * legal and should not break anything. Code that does such will not
> * successfully get the stub behavior implemented here and will
> * require libpthread to be linked in.
> */

All the more reason to not rename these symbols! All you need is ELF filter support.

Nico
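The point in the quoted header about object-like versus function-like macros can be seen in a few lines. This sketch uses made-up names (libc_lock_stub, demo_lock, demo_lock_call) rather than the real NetBSD symbols.

```c
#include <assert.h>

/* Hypothetical stand-in for a libc stub such as __libc_foo. */
static int libc_lock_stub(int *m)
{
    *m = 1;
    return 0;
}

/* Object-like rename: every use of the name is rewritten, including
 * uses that are not calls. */
#define demo_lock libc_lock_stub

/* Function-like rename: only rewritten when followed by '('. */
#define demo_lock_call(m) libc_lock_stub((m))

typedef int (*lock_fn)(int *);

/* Taking the address works with the object-like macro, because
 * &demo_lock expands to &libc_lock_stub. Writing &demo_lock_call would
 * not compile: without '(' the macro is not expanded and no function
 * of that name exists. */
static lock_fn pick_lock(void)
{
    lock_fn f = &demo_lock;

    assert(f == &libc_lock_stub);
    return f;
}
```

With an ELF filter there is no macro at all, so both the call and the address-of case resolve to the real symbol at link time.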
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
On Tue, Nov 24, 2015 at 11:32:32AM +1000, Peter Waltenberg wrote:
> I wasn't saying there was anything wrong with mmap(), just that guard pages
> only work if you can guarantee your overrun hits the guard page (and
> doesn't just step over it). Large stack allocations increase the odds of
> 'stepping over' the guard pages. It's still better than not having guard
> pages, but they aren't a hard guarantee that you won't have mysterious bugs
> still.
>
> You obviously realize that, but bn_prime() is the classic example of
> allocating very large chunks of memory on the stack.

Sure. Rich Salz claims this is all fixed in master, so guard pages strike me as a plus, not a wash.

> As for fibres, I doubt it'll work in general, the issue there is simply
> the range of OS's OpenSSL supports. If you wire it in you still have to run
> with man+dog+world in the process, that's a hard ask. One of the good
> points about OpenSSL up until now, it tends to not break those big messy
> apps where a whole lot of independently developed code ends up in the same
> process.

That's... a joke, right? OpenSSL very much does "break those big messy apps ...". I don't see the fibers project making that worse, nor better -- it's neutral.

Let me give you some examples. Using Heimdal's libgssapi from Java JGSS (with the JNI GSS wrapper) blows up due to no one initializing OpenSSL's lock callbacks. Heimdal, of course, uses OpenSSL. Who should be initializing the lock callbacks? Not the JVM -- why should it know/assume that some library used via dlopen() will want OpenSSL? But it's not a library's place to initialize OpenSSL's lock callbacks either!

Suppose a Java program were loading via dlopen() *two* libraries that use OpenSSL, and suppose different threads are racing to do this. This happens, and it happens because OpenSSL is used in so many *libraries*, not just *programs*.

OpenSSL is a prime case of how a library meant for use by programs comes to be used by libraries. Another example is PKCS#11. That's no excuse. Use by libraries must be supported.

And historically OpenSSL has been very bad at keeping its ABI backwards-compatible, so DLL Hell cases often involve OpenSSL. The more layers, the higher the likelihood of breakage involving OpenSSL.

All threaded pluggable-software-using software (name services switch, PAM, JNI, ...) is vulnerable to these problems.

If the OpenSSL team finally decides to do something about sane locking by default, then it will be a huge improvement. If this thread provides the impetus, so much the better.

Nico
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
[Viktor asked me for my advice on this issue and bounced me the post that I'm following up to. -Nico]

The summary of what I've to say is that making libcrypto and libssl need -lpthread is something that does require discussion, as it will have detrimental effects on some users. Personally, I think that those detrimental effects are a good thing (see below), but nonetheless I encourage you to discuss whether this is actually what OpenSSL should do. In particular, it may be possible to avoid -lpthread on some systems and still get a subset of libpthread functionality from libc or the compiler (e.g., thread-locals), and that may be worth doing.

On Mon, Nov 23, 2015 at 01:53:47AM +, Viktor Dukhovni wrote:
> As a side-effect of the MacOS/X MR I've become aware that the async
> code in its current state links master with "-lpthread", and defines
> macros that enable multi-threaded (as opposed to merely thread-safe)
> compilation into OpenSSL.
>
> commit 757d14905e3877abaa9f258f3dd0694ae3c7c270
> Author: Matt Caswell <m...@openssl.org>
> Date: Thu Nov 19 14:55:09 2015 +
>
> Add pthread support
>
> The forthcoming async code needs to use pthread thread local variables. This
> updates the various Configurations to add the necessary flags. In many cases
> this is an educated guess as I don't have access to most of these
> environments! There is likely to be some tweaking needed.
>
> Reviewed-by: Kurt Roeckx <k...@openssl.org>
>
> This is quite possibly not the right thing to do, and deserves
> some attention from the team. We might even seek some
> outside advice from folks well versed in platform-specific library
> engineering (Christos Zoulas from NetBSD, Nico Williams formerly
> from Sun, ...).
>
> My concern is that introducing -lpthread automatically converts
> single-threaded applications that link with OpenSSL into threaded
> applications (with a single thread). This may well have undesirable
> consequences.

Some background may be needed. When threading was introduced in the 90s, and in some cases still to this day, generally the end result was that the system had to support a number of "process models", with potential transitions from one to another at run-time:

- single-threaded, dynamically linked
- single-threaded, statically linked
- multi-threaded, dynamically linked
- multi-threaded, statically linked
- multi-threaded, mixed linkage

The statically-linked models can become mixed-linkage models via dlopen(). The single-threaded model can become a threaded model via dlopen() of an object linked with -lpthread, or by dlopen()ing libpthread itself.

Solaris 9 and under used to have a veritable rat's nest of code to deal with the process model transitions from single-threaded to multi-threaded. Solaris 10 unified these and moved libpthread and libdl into libc [with filters left behind for backwards compatibility]. Thus, on Solaris 10 and up, and Illumos, OpenSSL using -lpthread or not makes no difference.

I'm quite fond of the approach taken by Solaris 10 and up (and thus also Illumos): there is but one process model, and it is threaded, with pthreads in libc. But that need not be the way it works everywhere. Some systems may still support multiple process models.

For a library like OpenSSL, making use of -lpthread does mean dictating to users that they may only use a threaded process model. OTOH, not using -lpthread allows the user to choose a process model unconstrained by such a library.

Until now OpenSSL has avoided forcing the user to choose any particular process model. With this commit OpenSSL is now taking the reverse stance. This seems like a very significant change that should at least be noted prominently in the release notes, but it should also be discussed, indeed.

Personally, I believe this change is a good thing, as OpenSSL really ought to either automatically initialize its "lock callbacks" or do away with them completely (leaving backwards compatibility stubs) and use the OS' locking facility by default / only. Automatic lock callback initialization without forcing the use of -lpthread and still allowing static linking would be tricky indeed (for example: OpenSSL couldn't use weak symbols to detect when -lpthread gets brought in).

Still, if -lpthread avoidance were still desired, you'd have to find an alternative to pthread_key_create(), pthread_getspecific(), and friends.

For thread-specifics the obvious answer is to use C compiler thread-local variable support. This might or might not be available; this would have to be determined at build configuration time. Still, where the compiler supports thread-locals, OpenSSL could avoid -lpthread.

For pthread mutex functions (for lock callbacks) and, perhaps, pthread_once() (for automatic initia
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
[Resend, with slight edits.]

[Viktor asked me for my advice on this issue and bounced me the post that I'm following up to. -Nico]

The summary of what I've to say is that making libcrypto and libssl need -lpthread is something that does require discussion, as it will have detrimental effects on some users. Personally, I think that those detrimental effects are a good thing (see below), but nonetheless I encourage you to discuss whether this is actually what OpenSSL should do. In particular, it may be possible to avoid -lpthread on some systems and still get a subset of libpthread functionality from libc or the compiler (e.g., thread-locals), and that may be worth doing.

On a slightly related note, I asked and Viktor tells me that fiber stacks are allocated with malloc(). I would prefer that they were allocated with mmap(), because then you get a guard page. A guard page would allow one to safely tune down fiber stack size to whatever OpenSSL actually needs for a given use.

Comments below.

On Mon, Nov 23, 2015 at 01:53:47AM +, Viktor Dukhovni wrote:
> As a side-effect of the MacOS/X MR I've become aware that the async
> code in its current state links master with "-lpthread", and defines
> macros that enable multi-threaded (as opposed to merely thread-safe)
> compilation into OpenSSL.
>
> commit 757d14905e3877abaa9f258f3dd0694ae3c7c270
> Author: Matt Caswell <m...@openssl.org>
> Date: Thu Nov 19 14:55:09 2015 +
>
> Add pthread support
>
> The forthcoming async code needs to use pthread thread local variables. This
> updates the various Configurations to add the necessary flags. In many cases
> this is an educated guess as I don't have access to most of these
> environments! There is likely to be some tweaking needed.
>
> Reviewed-by: Kurt Roeckx <k...@openssl.org>
>
> This is quite possibly not the right thing to do, and deserves
> some attention from the team. We might even seek some
> outside advice from folks well versed in platform-specific library
> engineering (Christos Zoulas from NetBSD, Nico Williams formerly
> from Sun, ...).
>
> My concern is that introducing -lpthread automatically converts
> single-threaded applications that link with OpenSSL into threaded
> applications (with a single thread). This may well have undesirable
> consequences.

Some background may be needed. When threading was introduced in the 90s, and in some cases still to this day, generally the end result was that the system had to support a number of "process models", with potential transitions from one to another at run-time:

- single-threaded, dynamically linked
- single-threaded, statically linked
- multi-threaded, dynamically linked
- multi-threaded, statically linked
- multi-threaded, mixed linkage

The statically-linked models can become mixed-linkage models via dlopen(). The single-threaded model can become a threaded model via dlopen() of an object linked with -lpthread, or by dlopen()ing libpthread itself.

Solaris 9 and under used to have a veritable rat's nest of code to deal with the process model transitions from single-threaded to multi-threaded. Solaris 10 unified these and moved libpthread and libdl into libc [with filters left behind for backwards compatibility]. Thus, on Solaris 10 and up, and Illumos, OpenSSL using -lpthread or not makes no difference.

I'm quite fond of the approach taken by Solaris 10 and up (and thus also Illumos): there is but one process model, and it is threaded, with pthreads in libc. But that need not be the way it works everywhere. Some systems may still support multiple process models.

For a library like OpenSSL, making use of -lpthread does mean dictating to users that they may only use a threaded process model. OTOH, not using -lpthread allows the user to choose a process model unconstrained by such a library.

Until now OpenSSL has avoided forcing the user to choose any particular process model. With this commit OpenSSL is now taking the reverse stance. This seems like a very significant change that should at least be noted prominently in the release notes, but it should also be discussed, indeed.

Personally, I believe this change is a good thing, as OpenSSL really ought to either automatically initialize its "lock callbacks" or do away with them completely (leaving backwards compatibility stubs) and use the OS' locking facility by default / only. Automatic lock callback initialization without forcing the use of -lpthread and still allowing static linking would be tricky indeed (for example: OpenSSL couldn't use weak symbols to detect when -lpthread gets brought in).

Still, if -lpthread avoidance were still desired, you'd have to find an alternative to pthread_key_create(), pthread_getspecific(), and friends.

For thread-spe
Re: [openssl-dev] [openssl-team] Discussion: design issue: async and -lpthread
On Mon, Nov 23, 2015 at 08:34:29PM +, Matt Caswell wrote:
> On 23/11/15 17:49, Nico Williams wrote:
> > On a slightly related note, I asked and Viktor tells me that fiber
> > stacks are allocated with malloc().  I would prefer that they were
> > allocated with mmap(), because then you get a guard page.  A guard page
> > would allow one to safely tune down fiber stack size to whatever
> > OpenSSL actually needs for a given use.
>
> Interesting. I'll take a look at that.

Please do. It will make this much safer. Also, you might want to run some experiments to find the best stack size on each platform. The smaller the stack you can get away with, the better.

> > Still, if -lpthread avoidance were still desired, you'd have to find an
> > alternative to pthread_key_create(), pthread_getspecific(), and friends.
>
> Just a point to note about this. The async code that introduced this has
> 3 different implementations:
>
> - posix
> - windows
> - null
>
> The detection code will check if you have a suitable posix or windows
> implementation and use that. Otherwise the fallback position is to use
> the null implementation. With "null" everything will compile and run but
> you won't be able to use any of the new async functionality.
>
> Only the posix implementation uses the pthread* functions (and only for
> thread local storage). Part of the requirement of the posix detection
> code is that you have "Configured" with "threads" enabled. This is the
> default. However it is possible to explicitly configure with
> "no-threads". This suppresses stuff like the "-D_REENTRANT" flag. It
> now will also force the use of the null implementation for async and
> hence will not use any of the pthread functions.

Ah, I see. I think that's fine. Maybe Viktor misunderstood this?

> One other option we could pursue is to use the "__thread" syntax for
> thread local variables and avoid the need for libpthread altogether. An
> earlier version of the code did this. I have not found a way to reliably
> detect at compile time the capability to do this and my understanding is
> that this is a lot less portable.

I use this in an autoconf project (I know, OpenSSL doesn't use autoconf):

    dnl Thread local storage
    have___thread=no
    AC_MSG_CHECKING(for thread-local storage)
    AC_LINK_IFELSE([AC_LANG_SOURCE([
    static __thread int x ;
    int main () { x = 123; return x; }
    ])], have___thread=yes)
    if test $have___thread = yes; then
       AC_DEFINE([HAVE___THREAD],1,[Define to 1 if the system supports __thread])
    fi
    AC_MSG_RESULT($have___thread)

Is there something wrong with that that I should know? I suppose the test could use threads to make real sure that it's getting thread-locals, in case the compiler is simply ignoring __thread. Are there compilers that ignore __thread??

Nico
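A rough sketch of what "use threads to make real sure" could look like, offered as an assumption and not as anything the autoconf check above or OpenSSL actually does: write the __thread variable from a second thread and confirm the first thread's copy is untouched. The names tls_probe, probe_thread and tls_is_per_thread are invented for illustration, and the check needs whatever link flag the platform requires for pthread_create().

```c
#include <pthread.h>
#include <stddef.h>

/* One instance per thread, provided the toolchain really honors __thread. */
static __thread int tls_probe = 0;

/* Runs in the helper thread and overwrites that thread's copy only. */
static void *probe_thread(void *arg)
{
    (void)arg;
    tls_probe = 456;
    return NULL;
}

/*
 * Returns 1 if the helper thread's write did not disturb the caller's copy,
 * 0 if the variable turned out to be shared (i.e. __thread was ignored),
 * and -1 if the helper thread could not be started or joined.
 */
int tls_is_per_thread(void)
{
    pthread_t tid;

    tls_probe = 123;
    if (pthread_create(&tid, NULL, probe_thread, NULL) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;

    return tls_probe == 123 ? 1 : 0;
}
```

In a configure-style run test one would presumably wrap this in a main() that exits 0 only when tls_is_per_thread() returns 1. That makes it a run-time check, so it would not help when cross-compiling.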
Re: [openssl-dev] curve25519
On Sun, Jun 21, 2015 at 10:36:30PM +, Pascal Cuoq wrote: Short answer: No tools that are useful for usable implementations of asymmetric cryptography, that I know of, but useful tools for confirming that symmetric cryptography designed for constant-time implementation was correctly implemented. Long answer: [...] Following this line of reasoning gives you two tools that you can use: [...] There are two problems with both these approaches: - the assumption about knowledge about the time taken by assembly instructions. Both Frama-C and, to the best of my knowledge, ctgrind, ignore intermediate values computed from secrets and passed as arguments to the division instruction. On a modern processor the execution time of division reveals information about its inputs: That's correct. Reasoning about CT is hard, and doing so from C source is harder because the optimizer can potentially convert CT code to not-CT code. This argues for analyzing assembly, not C. So to summarize the first problem, there are a lot of assumptions. That doesn't prevent going forward, and taking any new discovery of an information leak through timing as a bug and fixing them as flaws are discovered in our assumptions, but you won't be “checking” that C code is “constant-time” in the sense that you might check the proof of a theorem. But even more serious is the second problem: [...] [...]. Also you should bear in mind that when implementing cache-aware constant-time look-up tables, you are really assuming a lot of things about the behavior of the processor. [...] At some point one is forced to dispense with -or, rather, augment- static analysis with dynamic analysis. The beauty of dynamic analysis is that it really short-circuits all of these considerations and gives you an answer you can trust. But one does have to make sure to eliminate sources of jitter (e.g., quiesce the system, disable interrupts on the CPU doing the benchmarking, ...). 
If one is able to trace execution to pinpoint sources of variability, then dynamic analysis can even point out where secrets leak into timing.

Nico
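To make the source-level side of this concrete, here is a minimal sketch of the usual branch-free comparison idiom next to an early-exit one. The names ct_equal and leaky_equal are hypothetical (this is not OpenSSL's CRYPTO_memcmp). As discussed above, nothing in the C source guarantees that the compiler keeps ct_equal constant-time, so the generated assembly or actual measurements would still have to be checked.

```c
#include <stddef.h>

/*
 * Early-exit comparison: the loop stops at the first mismatch, so the
 * running time depends on how many leading bytes agree.
 */
int leaky_equal(const void *a, const void *b, size_t len)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    size_t i;

    for (i = 0; i < len; i++)
        if (pa[i] != pb[i])
            return 0;
    return 1;
}

/*
 * Branch-free comparison: every byte is always read and the differences
 * are OR-ed together, so control flow does not depend on the contents.
 * The volatile qualifiers are only a hint to discourage the optimizer
 * from reintroducing an early exit; they are not a guarantee.
 */
int ct_equal(const void *a, const void *b, size_t len)
{
    const volatile unsigned char *pa = a;
    const volatile unsigned char *pb = b;
    unsigned char diff = 0;
    size_t i;

    for (i = 0; i < len; i++)
        diff |= pa[i] ^ pb[i];

    /* diff == 0 maps to 1, any other value maps to 0, without branching. */
    return (int)(1 & ((unsigned int)(diff - 1) >> 8));
}
```

Both functions return 1 for equal buffers and 0 otherwise; only the timing behavior is meant to differ, which is exactly the property that static analysis of the C source cannot fully establish.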
Re: [openssl-dev] OpenSSL support on Solaris 11 (built on Solaris 10)
On Tue, Jun 16, 2015 at 12:51:31PM +0530, Atul Thosar wrote: Currently, we build OpenSSL v0.9.8zc on Solaris 10 (SunOS, sun4u, sparc) and it works well on Solaris 10 platform. We use Sun Studio 12 compiler. We would like to run it on Solaris 11.2 (SunOS, sun4v, sparc) platform w/o changing the build platform. I mean we will continue to build OpenSSL on Solaris 10 and run it on Solaris 11. Has anyone encounter such situation? Appreciate any help/pointers if this mechanism will work? Historically the approach you describe has worked quite well with Solaris because the ABI is quite stable. This is particularly the case if you use only features that are extremely unlikely to be removed in a new minor release (Solaris 12 would be a minor release). (You should not build on anything older than S10 to run on S10 or later, mostly due to subtle changes in how snprintf() works.) But you should read the ABI compatibility promises that Oracle makes and decide for yourself. Nico -- ___ openssl-dev mailing list To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Re: [openssl-dev] Self-initialization of locking/threadid callbacks and auto-detection of features
On Thu, Jun 11, 2015 at 10:41:58AM +0200, Florian Weimer wrote:
> Detecting things in libcrypto is very difficult on GNU/Linux due to the way dynamic linking works.

Details?

> On GNU/Linux, you should try very hard to avoid linking -lpthread and restrict yourself to the pthreads API subset which is available without -lpthread. If something is missing, we (as in glibc upstream) can move additional functionality from libpthread to libc.

[My apologies for making this so long. There's a lot to say about this.]

Perhaps OpenSSL should have several configuration flavors for Linux then. If you want to statically link a non-threaded program with OpenSSL, then you should use the libpthread static link meant for it. Or perhaps we could have the OpenSSL static link archives assume non-threaded processes (callers must initialize lock callbacks) but the shared objects link with -lpthread and assume a threaded environment. (This means supporting two process models instead of *four*.)

In any case, use of OpenSSL by *libraries* (which is rather common, I gather; certainly in the case of Kerberos implementations) is currently a disaster waiting to happen, and sometimes a disaster that strikes. A workaround for libraries may be to use a private copy (as if by static linking) of OpenSSL with distinct SONAME/symbols and initialize that copy properly. This is generally safe (we've tried it) but also a bit troublesome. On the plus side this means that ABI incompatibilities between OpenSSL releases become a non-issue.

Or indeed, libpthread should move into libc (which I gather would take a long time and is beyond what we can do here).

For background, Solaris moved libpthread into libc around 2004, during S10 development. Solaris 10 also dropped support for static linking of libc and dropped all static linking archives from OS/Net (but not ld(1) support for static linking). Before that S9 had basically four process models: statically-linked,-not-threaded, statically-linked,-threaded, dynamically-linked,-not-threaded, dynamically-linked,-threaded. This was awful in a great many ways (think of statically-linked, single-threaded programs that dlopen()ed dynamically-linked threaded objects via, e.g., the name service switch). Dropping support for statically linking with libc greatly simplified a lot of things. Linux ought to do the same, but there's a fascination with static linking in Linux land...

...Static linking wouldn't be so bad if it had dependency tracking and symbol resolution semantics closer to dynamic linking. The startup times for one process would be awesome, though startup times for entire systems would be awful. (Solaris 10 boot times improved significantly when static linking was dropped.) And then there's the need for PIE to still get any benefit from ASLR, which then applies to the whole program, not each library. (Can gdb debug PIE nowadays? Last I checked it could not.)

(Moving the implementation of a library to another requires support for shared object filters, at least on Solaris, so that dynamically-linked dependents of libpthread and such will find the symbols they need there, though the RTLD knows to go look in the object they moved to. We say that libpthread is a filter of libc because only the pthread-related symbols of libc appear in libpthread. IIUC the Linux RTLD does not support filters.)

(I help maintain an open source project that distributes a statically-linked executable as that's a great way to avoid needing packaging and installers. So perhaps I shouldn't speak so ill of static linking. But then, static linking truly is awful.)

> Linking -lpthread has a real performance hit for unthreaded applications, so core libraries should avoid it.

It shouldn't. Having *four* process models is an enormous burden on developers, especially libc and libpthread developers, but this burden leaks into other system libraries, such as OpenSSL's. Solaris saw a sizeable net performance win in switching to a single process model (dynamically-linked, threaded) (for many processes there's just one thread, natch).

If you have received advice to the contrary, your source of advice is wrong. :-) Or maybe I'm rather spoiled by Solaris and am continually surprised to see that developers on Linux must struggle with problems that Solaris tackled a decade ago.

In any case, the initialization problems when OpenSSL is used by *libraries* are simply unacceptable. Either that means that OpenSSL cannot and must not be used by libraries, or we must find a solution that doesn't suck. See above and let me know which paths, if any, are the way forward; I offered several, and there may be more that I have not seen.

Nico
Re: [openssl-dev] Self-initialization of locking/threadid callbacks and auto-detection of features
On Mon, Jun 15, 2015 at 06:19:49PM +, Salz, Rich wrote: My overall goal is that I want to remove the thread callback stuff. Excellent. Ideally we have two options: no threads and system-threads. Presumably that would be either a configure-time option or a run-time automatic option, but not an option given to the caller via an API. It seems that on Linux shared/static libraries might be an issue. I hope we can resolve and simplify that. With weak symbols this problem can go away, but when linking statically it does make the order of dependencies matter in the final link edit. It'd have to be -lpthread -lcrypto -lssl; -lpthread could not come after OpenSSL; the strong symbol definition has to come first. Nico -- ___ openssl-dev mailing list To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Re: [openssl-dev] Self-initialization of locking/threadid callbacks and auto-detection of features
Hmm, another option is to use weak symbols to detect presence of pthreads. This should work regardless of whether static or dynamic linking is used. A statically-linked, single-threaded program that dlopen()s an object that brings in libpthread will have different OpenSSL dependencies for the dynamically-loaded objects than for the initial statically-linked program. So everything should work out.

Nico
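For readers unfamiliar with the trick, a minimal sketch of weak-symbol detection follows. It assumes GCC or Clang on an ELF platform (the weak attribute and __typeof__ are compiler extensions), and the names have_pthreads, counter_next and counter_lock are invented for illustration; none of this is OpenSSL code. On systems where the pthread functions live in libc (Solaris 10 and up, and glibc 2.34 and later) the check is simply always true.

```c
#include <pthread.h>
#include <stddef.h>

/*
 * Redeclare the functions as weak references.  If nothing in the link
 * provides them, their addresses evaluate to NULL instead of the link
 * failing with an undefined symbol.
 */
extern __typeof__(pthread_key_create) pthread_key_create __attribute__((weak));
extern __typeof__(pthread_mutex_lock) pthread_mutex_lock __attribute__((weak));
extern __typeof__(pthread_mutex_unlock) pthread_mutex_unlock __attribute__((weak));

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

/* Returns 1 if a pthreads implementation is present in this process. */
int have_pthreads(void)
{
    return pthread_key_create != NULL
        && pthread_mutex_lock != NULL
        && pthread_mutex_unlock != NULL;
}

/* Increments a shared counter, taking the lock only when pthreads exists. */
long counter_next(void)
{
    int use_lock = have_pthreads();
    long value;

    if (use_lock)
        pthread_mutex_lock(&counter_lock);
    value = ++counter;
    if (use_lock)
        pthread_mutex_unlock(&counter_lock);
    return value;
}
```

With static linking the outcome depends on whether the strong definitions are pulled into the final link, which is where the link-order concern (-lpthread ahead of the OpenSSL archives) comes from.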
Re: [openssl-dev] [openssl-users] Replacing RFC2712 (was Re: Kerberos)
On Mon, May 11, 2015 at 04:42:49PM +, Viktor Dukhovni wrote:
> On Mon, May 11, 2015 at 11:25:33AM -0500, Nico Williams wrote:
> > - If you don't want to depend on server certs, use anon-(EC)DH ciphersuites. Clients and servers must reject[*] TLS connections using such a ciphersuite but not using a GSS-authenticated application protocol.
>
> [*] Except when employing unauthenticated encrypted communication to mitigate passive monitoring (opportunistic security).

As this would be replacing RFC2712, it's not opportunistic to begin with :)
[openssl-dev] Replacing RFC2712 (was Re: Kerberos)
On Fri, May 08, 2015 at 10:57:52PM -0500, Nico Williams wrote: I should have mentioned NPN and ALPN too. [...] A few more details: - If you don't want to depend on server certs, use anon-(EC)DH ciphersuites. Clients and servers must reject TLS connections using such a ciphersuite but not using a GSS-authenticated application protocol. - The protocol MUST use GSS channel binding to TLS. - Use SASL/GS2 instead of plain GSS and you get to use an authzid (optional) and you get a builtin authorization status result message at no extra cost, and all while still using GSS. You get to optimize only the mechanism negotiation, and you get TLS w/ Kerberos (and others) and without PKIX (if you don't want it). See RFCs 7301, 5801, 5056, and 5929 (but note that the TLS session hash extension is required). Nico -- ___ openssl-dev mailing list To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Re: [openssl-dev] [openssl-users] Kerberos
On Fri, May 08, 2015 at 05:17:29PM -0400, Nathaniel McCallum wrote: I agree that the current situation is not sustainable. I was only hoping to start a conversation about how to improve the situation. RFC2712 uses Authenticator, which is an ASN.1 type quite clearly NOT intended for use outside RFC1510 because it isn't a PDU. RFC2712 unnecessarily constructed its own AP-REQ that's different from the RFC1510 (now 4120) AP-REQ. This is bad for a variety of reasons, not the least of which are complicating Kerberos APIs and/or RFC2712 implementations (which might have to parse out the Authenticator and Ticket from a plain AP-REQ). I also notice that the EncryptedPreMasterSecret is under-specified (is it a Kerberos EncryptedData? who knows?). RFC2712 could be replaced with a properly-done protocol that uses Kerberos in the full TLS handshake (i.e., not in session resumption). This would be the lowest-effort fix. A generic GSS-in-TLS extension would require much more energy (see below). For instance, there is this: http://tls-kdh.arpa2.net/ Yes, it'd be nice to add PFS to the Kerberos AP exchange, and we just might get there, but adding Kerberos and/or GSS to TLS is a very different undertaking. I don't see any reason this couldn't be expanded to do GSSAPI. Well, that's difficult because GSS has arbitrary round trips... You're not the first to want this, see for example here: https://tools.ietf.org/html/draft-santesson-tls-gssapi-01 https://tools.ietf.org/html/draft-williams-tls-app-sasl-opt-04 And more if you consider other efforts like False Start and look past GSS/SASL. Probably many more than I know of then... Two main design axes: 1) When does the GSS context token begin, and how is channel binding done. 
   - no GSS mech negotiation, first GSS context token goes in TLS ClientHello; (channel binding done via MIC tokens or GSS_Pseudo_random() output exchanges)

     or (e.g., if the client needs to negotiate mechs)

   - TLS ClientHello carries client mechList, server announces a mech in its handshake message, first GSS context token goes in second client handshake flight with normal channel binding

   (Both options could be specified, with clients choosing as desired.)

 x

 2) How many GSS context tokens can be exchanged and who is responsible for continuing past the traditional TLS handshake.

   - one round trip only

     or

   - arbitrary round trips continued by TLS or by the application

The first order of business is to decide on whether or not to support multiple round trips (IMO we must; what's the point if not?).

The second is to decide whether or not additional context token round trips are to be done by the application, both as to how they appear on the wire and how they appear in the API.

The third is to decide whether GSS mechanism negotiation is supported, and whether it can be optimized away when it's not needed.

The fourth is to decide whether SASL (with SASL/GS2 to get GSS) isn't better, since if we're going to spend a pair of flights in negotiation, we might as well let server-talks-first SASL mechs get a leg up on GSS. Remember, SASL can do GSS just fine via SASL/GS2 [RFC5801].

But maybe this mailing list isn't the right place for such a discussion.

Well, TLS WG would be the right forum, but they are busy with TLS 1.3. Some of us could get together elsewhere, probably not here.

Perhaps the right question to ask is how much interest there would be in improving this situation in the TLS WG and whether or not OpenSSL would have interest in implementing such a project.

My impression is: none, because TLS WG is too busy at this time, and in the past it has been very difficult to get the necessary level of implementor effort.
Past performance is not always a predictor of future performance. It would help if GSS had better, less niche mechanisms. For example: if Kerberos had PKCROSS (based on DANE, say), that would help. Or if ABFAB went viral. But for now everyone in the TLS world is happy _enough_ with WebPKI for server (should be service, but hey) authentication and bearer tokens for user authentication. Part of the problem is that HTTP authentication schemes (whether in HTTP proper or not) have no real binding to TLS, and HTTP is basically a routable (and usually routed) protocol anyways, which complicates everything. But HTTPS is the main consumer of TLS. One might think that adding user authentication options to TLS would be desirable for HTTP applications, but again, the routing inherent to HTTP means that routing must pass along user authentication information, but this isn't always easy. And HTTP is stateless and so doesn't deal well with needing continuation of authentication exchanges, so bearer tokens it basically kinda has to be, so that better mechanisms lose their appeal. If the main consumer of GSS-in-TLS were to be something other than HTTP, well, great, but still, HTTPS is the biggest consumer (next is SMTP)... And it's easier then to
Re: [openssl-dev] [openssl-users] Kerberos
I should have mentioned NPN and ALPN too. A TLS application could use ALPN to negotiate the use of a variant of the real application protocol, with the variant starting with a channel-bound GSS context token exchange. The ALPN approach can optimize the GSS mechanism negotiation, at the price of a cartesian explosion of {app protocols} x {GSS mechs}. A variant based on the same idea could avoid the cartesian explosion. But hey, TLS is the land of cartesian explosions; when in Rome... The ALPN approach would be my preference here. With TLS libraries implementing the GSS context exchange, naturally. The result would be roughly what you seem to have in mind. If we ask TLS WG, I strongly suspect that we'll be asked to look at ALPN first. I should add that I also would like to see the RFC4121 Kerberos GSS mechanism gain PFS, independently of TLS gaining GSS. Nico -- ___ openssl-dev mailing list To unsubscribe: https://mta.openssl.org/mailman/listinfo/openssl-dev
Re: [openssl-dev] Circumstances cause CBC often to be preferred over GCM modes
On Tue, Dec 16, 2014 at 12:26:32PM -0500, Salz, Rich wrote: In particular there MUST NOT be any fragile hand-tuning. All ordering needs to be based on general principles. This is not a universally-held view. I think the key word is fragile, not hand-tuning. Subtracting (in local configuration) algorithms from a keyword denoting all known-strong algorithms is hand-tuning, but not fragile hand-tuning. Nico -- ___ openssl-dev mailing list openssl-dev@openssl.org https://mta.opensslfoundation.net/mailman/listinfo/openssl-dev
Re: [openssl-dev] Circumstances cause CBC often to be preferred over GCM modes
On Tue, Dec 16, 2014 at 05:17:01PM +, Viktor Dukhovni wrote:
> In particular there MUST NOT be any fragile hand-tuning. All ordering needs to be based on general principles.

+1.

> One might for example say that any CBC cipher at 128+ bits gets a baseline sorting strength of 128 bits. One might then apply either @STRENGTH or @SPEED (new), the first of which adds 1 to any CBC cipher whose key is longer than 128-bits, the second to those that are equal to 128 bits. With AES AEAD the baseline could be 129, with similar STRENGTH vs. SPEED boosts. Which would ensure that AEAD@128 beats CBC@256. However, where do we fit ChaCha20/Poly-1305? Again, not hand-placement, but some extensible algorithm.

Any algorithm numeric strength assignments should be baked into the library, or perhaps configurable in a configuration file. They should not be known to applications. Ditto for any computation of overall strength of cipher-mode combinations.

Internally using such numeric strength assignments is fine.

In particular I want you to avoid the problem that Cyrus SASL had (and still has) with the security strength factor (SSF), where entire security mechanisms get boiled down to a single numeric strength factor even though a mechanism might negotiate cryptographic algorithms, and where applications end up hardcoding SSF values for things like Kerberos V5 (which has SSF of 56, because initially Kerberos V5 only supported 1DES). This is a horrible problem to have.

Nico
Re: [openssl-dev] Circumstances cause CBC often to be preferred over GCM modes
On Tue, Dec 16, 2014 at 01:04:17PM -0500, Salz, Rich wrote:
> > Subtracting (in local configuration) algorithms from a keyword denoting all known-strong algorithms is hand-tuning, but not fragile hand-tuning.
>
> Three years ago RC4 was known-strong. Two years ago DES-CBC was known-strong. Now we only have AES-GCM. At what point do we think ChaCha/Poly is known-strong, and who gets to make that call? Dan? Adam? Changing the internal relative strength weightings of these requires pushing out new code.

Something that... happens all the time.

I'm not against local configuration of these things as, say, a temporary override while waiting for patches. The configuration needs to be simple and not fragile. Subtracting from named sets of algorithms and sorting by desired attributes (speed, strength), is a non-fragile way to specify administrative preferences. Assigning numeric algorithm strength in a config file is fragile but acceptable for emergencies.

> Who said these are known-strong and when did they say it, and are they still correct? And where and how does a system admin find those things out.

This is why I'm advising against exposing any sort of numeric algorithm strength assessments to _applications_: once those are baked in in the application they can't be changed.

I realize that there was no proposal to do so. However, any time numeric algorithm strength assessments are discussed is also a good time to warn others to avoid the SASL SSF mistake.

Nico
Re: [openssl-dev] Circumstances cause CBC often to be preferred over GCM modes
On Tue, Dec 16, 2014 at 06:26:50PM +, Viktor Dukhovni wrote:
> Internally OpenSSL has a multi-dimensional property matrix, and preferences between numerically equal ciphers are based on other properties. The (stable) numeric sorting just re-arranges blocks of ciphers already sorted by other means. Thus preference for PFS already puts kEECDH and kDHE ahead of kRSA for otherwise equally strong ciphers. When the user chooses a group of ciphers to add to a cipherlist, the members of the group retain their relative order. However, there are interesting games that can be played with:
>
>     aRSA:-kRSA:ALL:@STRENGTH (prefers kRSA over PFS).
>
> which perturb the order, because the most recently removed elements end up at the top of the list when ALL is added. So the relative preferences of various properties can be changed.

Iterating subtraction and addition seems like a fragile way to indicate preference.

> The SASL problem mostly does not bite OpenSSL. However, @STRENGTH (which is often needed) is not sufficiently tunable, it is not easy to prefer AES-128 with AEAD over 256 with CBC. For that we need some new mappings that produce slightly different effective strengths.

My preference would be: subtract undesired algorithms from a named set, then specify order of preference via some method other than iteratively adding and subtracting algorithms. Something like:

    DEFAULT:-FOO128:::PFS,AEAD,speed,strength

(Whatever is in DEFAULT minus FOO128, sort by PFS, AEAD, speed, strength.)

Preferences should affect the order in which cipher suites are advertised/picked, not which ones are advertised. Algorithms that are not desired should not be advertised/used.

Nico
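To illustrate the "subtract, then sort by an ordered list of attributes" idea, here is a small sketch. Everything in it is hypothetical: the struct suite fields, the suite names, the speed ranks and the order_suites() function are invented for this example and do not reflect OpenSSL's internal cipher tables or any real cipher string syntax. The sort is a stable insertion sort, so suites that tie on every listed attribute keep their original relative order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a cipher suite; not an OpenSSL structure. */
struct suite {
    const char *name;
    int pfs;    /* 1 if the key exchange offers forward secrecy */
    int aead;   /* 1 if the cipher is an AEAD mode */
    int speed;  /* made-up relative speed rank, higher is faster */
    int bits;   /* symmetric key length */
};

enum sort_key { KEY_PFS, KEY_AEAD, KEY_SPEED, KEY_STRENGTH };

static int key_value(const struct suite *s, enum sort_key k)
{
    switch (k) {
    case KEY_PFS:      return s->pfs;
    case KEY_AEAD:     return s->aead;
    case KEY_SPEED:    return s->speed;
    case KEY_STRENGTH: return s->bits;
    }
    return 0;
}

/* Nonzero if a must come ahead of b under the ordered list of keys. */
static int preferred(const struct suite *a, const struct suite *b,
                     const enum sort_key *keys, size_t nkeys)
{
    size_t i;

    for (i = 0; i < nkeys; i++) {
        int va = key_value(a, keys[i]);
        int vb = key_value(b, keys[i]);
        if (va != vb)
            return va > vb;
    }
    return 0; /* tie on every key: keep the existing order */
}

/*
 * Drops every suite whose name contains `exclude` (if non-NULL), then
 * stable-sorts the rest by `keys`.  Returns the number of suites kept.
 */
size_t order_suites(struct suite *list, size_t n, const char *exclude,
                    const enum sort_key *keys, size_t nkeys)
{
    size_t i, j, kept = 0;

    for (i = 0; i < n; i++)
        if (exclude == NULL || strstr(list[i].name, exclude) == NULL)
            list[kept++] = list[i];

    for (i = 1; i < kept; i++) {
        struct suite cur = list[i];
        for (j = i; j > 0 && preferred(&cur, &list[j - 1], keys, nkeys); j--)
            list[j] = list[j - 1];
        list[j] = cur;
    }
    return kept;
}
```

Passing the keys as {KEY_PFS, KEY_AEAD, KEY_SPEED, KEY_STRENGTH} with exclude set to "FOO128" corresponds roughly to the string proposed above. Swapping the last two keys changes only how ties on PFS and AEAD are broken, and in neither case does the ordering change which suites remain in the list.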
Re: [openssl-dev] Circumstances cause CBC often to be preferred over GCM modes
On Tue, Dec 16, 2014 at 09:50:32PM +0100, Hubert Kario wrote: My preference would be: subtract undesired algorithms from a named set, then specify order of preference via some method other than iteratively adding and subtracting algorithms. Something like: DEFAULT:-FOO128:::PFS,AEAD,speed,strength I would add that this would work like one would expect, and that speed,strength would give [potentially] different results than strength,speed, as one would expect. Let's say, for the sake of argument, that CBC mode is significantly broken, even in EtM mode (that's another can of worms[1]), then many people will want to prioritise *non-PFS* versions of AEAD ciphers above any other ciphers. And they will want to do it for the same reason people currently leave RC4 in. Right. Any crypto is better than no crypto (or, rather, the identity ciphersuite), but the weak crypto has to go last. Of course, for one-offs like this hypothetical one might want a way to indicate that some algorithms are least preferred and others most, not just sets of algorithms. 1 - by another can of worms I mean: what if it's broken only in MtE mode? how to specify different ciphers depending on presence of this extension, so that in MtE only AEAD ciphers are available while if EtM is on, the list gains CBC ciphers? This is similar to the problem with BEAST: ordering with RC4 at the front for TLS1.0 is sort-of OK, not so much for TLSv1.1 and later... Specifying ciphersuites and preference on a per-protocol version basis would help. Specifying in more context-dependent ways would be nice but now you'd need a way to name/identify the context. Nico -- ___ openssl-dev mailing list openssl-dev@openssl.org https://mta.opensslfoundation.net/mailman/listinfo/openssl-dev
Re: Segfault seen with OpenLDAP: locking callback issue
Please see the thread_safety branch of my github clone of OpenSSL (https://github.com/nicowilliams/openssl). I've not had time to get around to finishing it, but it might well work for you. Nico -- __ OpenSSL Project http://www.openssl.org Development Mailing List openssl-dev@openssl.org Automated List Manager majord...@openssl.org
Re: [openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms
On Fri, Nov 8, 2013 at 4:08 AM, Bodo Moeller via RT r...@openssl.org wrote:
> Alternatives would be (a) using a new lock for safe static initialization,

Maybe you could try my patches on my thread_safety branch of my github clone of OpenSSL? (https://github.com/nicowilliams/openssl)

Nico
Re: [openssl.org #3149] [patch] Fast and side channel protected implementation of the NIST P-256 Elliptic Curve, for x86-64 platforms
On Fri, Nov 8, 2013 at 2:43 PM, Andy Polyakov via RT r...@openssl.org wrote:
> > Alternatives would be (a) using a new lock for safe static initialization, or (b) more code duplication to avoid the need for an explicit pointer (there could be two separate implementations for the higher-level routines). However, given the 1% performance penalty, that's a minor issue at this point.
>
> While if (functiona==NULL || functionb==NULL) { assign functiona, functionb } can be unsafe, I'd argue that if (functiona==NULL) { assign functiona } followed by if (functionb) { assign functionb } is.

That's not thread-safe. There's no memory barrier so the writes can be reordered. The compiler itself could reorder those instructions.

Nico
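For comparison, a minimal sketch of one well-known way to get safe one-time selection of function pointers on POSIX systems is pthread_once(), which guarantees that the initializer has completed before any caller proceeds. This is not the code from the patch under discussion; the names function_a, function_b, add_generic, mul_generic, call_a and call_b are placeholders made up for illustration.

```c
#include <pthread.h>

typedef int (*binop_fn)(int, int);

/* Stand-ins for the "generic" and "optimized" routines being selected. */
static int add_generic(int a, int b) { return a + b; }
static int mul_generic(int a, int b) { return a * b; }

static binop_fn function_a;
static binop_fn function_b;
static pthread_once_t select_once = PTHREAD_ONCE_INIT;

/* Runs exactly once, no matter how many threads race to get here. */
static void select_functions(void)
{
    function_a = add_generic;
    function_b = mul_generic;
}

int call_a(int x, int y)
{
    /* On return from pthread_once() the initializer has completed. */
    pthread_once(&select_once, select_functions);
    return function_a(x, y);
}

int call_b(int x, int y)
{
    pthread_once(&select_once, select_functions);
    return function_b(x, y);
}
```

The cost is one pthread_once() call per entry point, and it of course presumes a pthreads implementation is available to the library in the first place.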
Re: Self-initialization of locking/threadid callbacks and auto-detection of features
On Oct 31, 2013 8:19 AM, Kevin Fowler kevpfow...@gmail.com wrote: Not to stand in the way of progress at all, but just to note we cross-compile OpenSSL libraries for an embedded linux target that is still using libc_r, not libpthread. That will not change anytime soon for us, at least on legacy systems. Besides that, it seems much of this discussion is about native builds and config-time detection of threading support. What about cross-compiling? I will be introducing new targets. For cross-compilation you must tell ./Configure the target name and there's no auto-detection. If you want auto-detection you run ./config on the target host. (At least that's my understanding of how OpenSSL build configuration works.)
Re: Self-initialization of locking/threadid callbacks and auto-detection of features
On Wed, Oct 30, 2013 at 11:15:27AM +0100, Corinna Vinschen wrote:
> Please, before any change is made in terms of threading, let me point out that Cygwin is NOT Windows, even if it runs on Windows. Cygwin provides its own pthreads implementation which is always available, even without explicit linking against -lpthread. So, for Cygwin, please use pthreads, not Windows threading, otherwise applications linked against libcrypto might be broken in subtle ways.

Of course. But surely OpenSSL needs to be built separately (and differently) for native Windows and Cygwin. I.e., an OpenSSL DLL built against the Windows CRT can't be used from a Cygwin app.
Re: Self-initialization of locking/threadid callbacks and auto-detection of features
On Wed, Oct 30, 2013 at 06:12:08PM +0100, Corinna Vinschen wrote:
> Right. I'm just trying to raise awareness before Cygwin gets sorted into the "runs on Windows so use Windows functions" drawer, accidentally, as happened too often in the past in various projects.

I won't. I'll basically be adding new targets for every existing target that has a standard (POSIX or Win32) threading library. I probably won't change what target ./config picks -- I think core OpenSSL developers must make that decision.

Nico
Re: Self-initialization of locking/threadid callbacks and auto-detection of features
On Tue, Oct 29, 2013 at 09:58:25PM +0100, Andy Polyakov wrote: Another thing: run-time selection from among multiple pthreads implementations can only safely be managed by the RTLD, and dependencies on pthreads must still be declared via the link-editor. Libraries should be able to refer to pthread_* symbols and still use the run-time-selected pthread implementation *even* if the library was linked with -lpthread (and therefore a specific libpthread.so compilation symlink). If an OS delivers two (or more) pthreads implementations then:
- if the ABIs of those pthreads implementations all match then the RTLD should be able to substitute any of them regardless of which libpthread any one object in the process was linked with; (e.g., via LD_PRELOAD, different library search paths, ..., but all the libpthreads would have to have the same SONAMEs and symbol version numbers)
- but if their ABIs are not compatible then no run-time substitution is safe, no matter what the mechanism. (unless compatibility shims are involved, but then we *really* must leave it all to the RTLD and we can't be using dlsym() or weak symbols)
I don't think libraries should be failing to declare their dependencies and then use dlsym(RTLD_DEFAULT) and dlopen()+dlsym() as a way to run-time link-edit-load their undeclared dependencies. Doing so is fundamentally racy and so would have to be done in .init sections, and results (which pthread gets used) would be unpredictable (since that would depend on object load and .init firing order). Plus we'd lose observability via ldd(1). I can't see any way to justify this without having examples of OSes where this is the documented way to handle dynamic linking against libpthread. But maybe I'm missing something? Nico
Re: Self-initialization of locking/threadid callbacks and auto-detection of features
It's getting closer, see: https://github.com/nicowilliams/openssl/compare/thread_safety I spoke to a Linux expert and was told that the correct thing to do in the shared build of OpenSSL is to always link with -lpthread. I assert that this is also the correct thing to do on Solaris. It's been so for a long time now. I.e., OpenSSL should default to using native threads on all POSIX-like and Win32/64 systems (others will have to contribute similar support for other environments). My patches nonetheless will allow apps to provide alternate threading callbacks, just as today, except that the callbacks now may not be changed once set, and they are set to native implementations where possible if they are needed before they are set. Nico
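The callback policy described in this message (callbacks may be set once and never changed, and a native default is installed if a lock is needed before anyone has set one) could be sketched roughly as follows. This is only an illustration under assumptions, not the code in the thread_safety branch: `lock_cb`, `set_lock_cb_once()` and `get_lock_cb()` are hypothetical names, and a single pthread mutex stands in for OpenSSL's table of locks.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical callback type; not OpenSSL's real locking callback. */
typedef void (*lock_cb)(int locking, int n);

static pthread_mutex_t cb_guard = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t native_lock = PTHREAD_MUTEX_INITIALIZER;
static lock_cb current_cb = NULL;

/* Native default: one process-wide mutex stands in for a lock table. */
static void native_lock_cb(int locking, int n)
{
    (void)n;
    if (locking)
        pthread_mutex_lock(&native_lock);
    else
        pthread_mutex_unlock(&native_lock);
}

/* Installs cb only if nothing is installed yet.
 * Returns 1 if cb was taken, 0 if a callback was already set. */
int set_lock_cb_once(lock_cb cb)
{
    int installed = 0;

    assert(cb != NULL);
    pthread_mutex_lock(&cb_guard);
    if (current_cb == NULL) {
        current_cb = cb;
        installed = 1;
    }
    pthread_mutex_unlock(&cb_guard);
    return installed;
}

/* Returns the callback in effect, installing the native default
 * if a lock is needed before any caller has set one. */
lock_cb get_lock_cb(void)
{
    lock_cb cb;

    pthread_mutex_lock(&cb_guard);
    if (current_cb == NULL)
        current_cb = native_lock_cb;
    cb = current_cb;
    pthread_mutex_unlock(&cb_guard);
    return cb;
}
```

With this shape, a library that needs a lock early silently gets the native implementation, and a later attempt by an application to install its own callback is refused rather than swapping locks underneath running threads.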
Re: Self-initialization of locking/threadid callbacks and auto-detection of features
On Tue, Oct 29, 2013 at 09:58:25PM +0100, Andy Polyakov wrote:
> I feel like saying few words. One should recognize that by the time multi-threading support was taking shape there was a whole variety of threading implementations and callbacks were the only way to convey the specifics. Nowadays we're pretty much talking only about
Certainly. I'm not saying "omg, that OpenSSL is insanez!!". I should repeat my problem case: we have libraries that depend on OpenSSL libraries, and it should be possible for our libraries to work in threaded programs. But right now a Java program that uses JGSS with the JNI shim to the C GSS-API libraries using Heimdal's libgss crashes in RAND_bytes(). Heimdal can't provide locking callbacks (how could it? it might step on some other caller's toes in the same process). The app can't either (since in this case it's not using OpenSSL directly at all).
> pthreads and Windows, and one can indeed argue why wouldn't OpenSSL simply default to either of the two when appropriate. While it's more than appropriate on Windows as it is, on pthreads-powered Unices it's not as obvious. Because pthreads can be undesired in some situations. Basically it boils down to question whether or not libcrypto may be linked with libpthread or not. And answer is actually not desired. Ideally libcrypto should *detect* if
More details would be nice. What follows is speculation of mine as to what you might have in mind (or even if you don't, it might be relevant). To summarize the below: OpenSSL .so's should always be linked with -lpthread on OSes where there's a standard libpthread that threaded apps are expected to use; OpenSSL static link archives should assume no threading libraries and rely on the callers to provide the threading callbacks. Distros that focus on static linking should ship OpenSSL static link archives only and should ensure that dependents of OpenSSL initialize it correctly when used in threaded programs.
Linux still has the multiple process models problem that Solaris 9 used to have: - statically linked with libc but not -lpthread - statically linked with libc and -lpthread - dynamically linked with libc but not -lpthread - dynamically linked with libc and -lpthread - mixed: not originally linked with -lpthread but a dlopen()ed object brought it in I dunno about Linux, but IIUC loading libpthread at run-time in the third case works fine. It also works fine in S8 and S9. It's loading libpthread at run-time in the first case that leads to fireworks. The mixed case is a very difficult one to handle correctly in any library. Merely using dlsym() on RTLD_DEFAULT is not good enough: pthreads symbols might appear later on. Typically the mixed case results when an app uses the name service switch without nscd, or PAM, and one of those pluggable things loads libcrypto. If the app also uses libcrypto then... who knows. (It's clearly insane to support a static libc and have an RTLD. And yet Solaris used to support that... But then, it stopped supporting that for a reason: it was insanely expensive in terms of engineering costs.) I'm not sure that OpenSSL should support such madness in any way other than at ./Configure time: the builder person picks. I.e., this is the distro builder's problem. Some distros are very focused on static linking. Others are not. Static link archives don't record dependencies, so we can't speak of building libcrypto.a with -lpthread. So in the static link build option OpenSSL needs to know whether to assume pthreads or not, and only the builder can tell OpenSSL. In the shared object build of OpenSSL it's just easier to link with -lpthread if the target has pthreads. E.g., a distro that is mostly-dynamically-linked should just ship OpenSSL libs linked with -lpthread, while a distro that is mostly-statically-linked should ship OpenSSL .a's built to not use pthreads. A distro with both should *probably* build OpenSSL .a's w/o pthread and .so's with. 
A distro might want to require that pthreads is *always* available, even when linking statically. Or perhaps you were thinking of apps that use GNU Pth, but those should either build their own private OpenSSL libs or they should use static link OpenSSL archives. I don't think OpenSSL should bend over backwards to support third-party alternate threading libraries where the OS has a single supported standard. If an OS has multiple supported alternatives then OpenSSL should pick one at run-time, but this is very difficult (see below).
> *hosting* application is linked with libpthread and only *then* adjust accordingly. Is there way to detect if application is linked with libpthread at run-time? There is DSO_global_lookup.
See above. Using weak symbols might work, but it might be very OS-dependent; I'm not prepared to research that for every target, or even more than one or two targets. Using dlsym() seems like asking for trouble. As for
Re: Self-initialization of locking/threadid callbacks and auto-detection of features
I'm making progress slowly (not my main project). I've run into a bit of a problem: the dynlock callback setting cannot be made thread-safe due to the setter API's using three functions to set three related callbacks. Also, I'm not sure that the dynlocks need a default implementation. It seems they don't, at least not as far as *apps* are concerned. Please let me know, though I'm inclined to provide a default implementation anyways, just because it's no big deal. A few notes and questions:
- Locks in OpenSSL are really reader-writer locks, but the sample code in crypto/threads/*.c uses mutexes only. How important is it that reader/writer locks be used instead of exclusive locks?
- The add_lock stuff should just use OPENSSL_atomic_add() wherever it exists. How would I determine at build-time (in ./Configure and in ./crypto/lock.c) whether OPENSSL_atomic_add is available?
- I'll be adding a single setter for dynlock callbacks, and deprecating the old ones. Any objections?
- As I get closer to having code that can be tested... I can provide tests of the one-time initialization of things, but it'd be nice to test threaded functionality in general -- is there such a general test? If so, please point me to it.
Nico
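To make the dynlock point concrete: with three separate setters, another thread can observe a state where only one or two of the three related callbacks have been installed. A single setter that takes all three in one struct can install them atomically. The sketch below is an assumption-laden illustration, not the proposed OpenSSL API: `struct dynlock`, `struct dynlock_callbacks`, `set_dynlock_callbacks()` and `get_dynlock_callbacks()` are hypothetical names, and the default implementation uses a plain pthread mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical dynamic-lock type; OpenSSL's real structure differs. */
struct dynlock { pthread_mutex_t mutex; };

/* All three related callbacks travel together in one struct. */
struct dynlock_callbacks {
    struct dynlock *(*create)(void);
    void (*lock)(struct dynlock *l, int locking);
    void (*destroy)(struct dynlock *l);
};

static struct dynlock *default_create(void)
{
    struct dynlock *l = malloc(sizeof(*l));

    if (l != NULL && pthread_mutex_init(&l->mutex, NULL) != 0) {
        free(l);
        l = NULL;
    }
    return l;
}

static void default_lock(struct dynlock *l, int locking)
{
    int rc = locking ? pthread_mutex_lock(&l->mutex)
                     : pthread_mutex_unlock(&l->mutex);
    assert(rc == 0);
    (void)rc;
}

static void default_destroy(struct dynlock *l)
{
    pthread_mutex_destroy(&l->mutex);
    free(l);
}

/* A default implementation built on pthread mutexes. */
const struct dynlock_callbacks default_dynlock_callbacks = {
    default_create, default_lock, default_destroy
};

static pthread_mutex_t setter_guard = PTHREAD_MUTEX_INITIALIZER;
static struct dynlock_callbacks active;
static int active_set = 0;

/* Installs all three callbacks in one step. Returns 1 on success,
 * 0 if the set is incomplete or callbacks were already installed. */
int set_dynlock_callbacks(const struct dynlock_callbacks *cbs)
{
    int ok = 0;

    if (cbs == NULL || !cbs->create || !cbs->lock || !cbs->destroy)
        return 0;
    pthread_mutex_lock(&setter_guard);
    if (!active_set) {
        active = *cbs;
        active_set = 1;
        ok = 1;
    }
    pthread_mutex_unlock(&setter_guard);
    return ok;
}

/* Copies the installed callbacks to *out; returns 0 if none are set. */
int get_dynlock_callbacks(struct dynlock_callbacks *out)
{
    int ok;

    pthread_mutex_lock(&setter_guard);
    ok = active_set;
    if (ok)
        *out = active;
    pthread_mutex_unlock(&setter_guard);
    return ok;
}
```

Because the struct is copied under one mutex, a reader either sees no callbacks or a complete, consistent set.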
Re: Self-initialization of locking/threadid callbacks and auto-detection of features
On Wed, Oct 23, 2013 at 08:32:35AM +1000, Peter Waltenberg wrote:
> There is no 'safe' way to do this other than hardwired. Admitted, we have a fairly ugly stack on which to find that out, multiple independently developed lumps of code jammed into the same process, quite a few using dlopen()/dlclose() on other libraries - multiples of them calling the crypto. code.
Oh, good point. I think what I'll do is add targets that denote "always use the OS thread library; disallow setting these callbacks", and a corresponding command-line option to ./config. This should be the best option in general because of the possibility of the text for callbacks being unmapped when the provider gets dlclose()ed. Then maybe there's no need to bother with the pthread_once()/InitOnceExecuteOnce() business. I had assumed, going in, that I needed to preserve existing semantics as much as possible, but because that might still be the case (even if you're right as to what the ideal should be) I will do *both*. (Who knows, maybe there's a program out there that insists on using the gnu pth library and not the OS' native threading library. Or maybe there's no need to support such oddities.)
> All those lumps of code think they 'own' the crypto. stack - worst case scenario was a dlopen()'d library setting the callbacks, then being unloaded while other parts of the stack were still using crypto. Surprisingly - that still worked on some OS's - but some (like AIX/HPUX) unmap program text immediately on dlclose().
Right. That's a question regarding patch contribution: is it OK to leak things that can only really be torn down at dlclose() / unload time and which effectively never happens? git grep '\.fini\' finds nothing in the tree (while git grep '\.init\' does), so I'm guessing yes, it's OK.
> Personally, I'd suggest making it build option to turn off default locking and use whatever the OS provides by default. That'll allow the few corner cases to continue doing whatever weird things they were doing before, but remove the big risk factor for the vast majority of users. And it is becoming a big risk factor now. It certainly shouldn't be an issue for the OS installed OpenSSL which probably covers most of your users, the only sane choice there is the OS default locking scheme anyway.
Agreed. Thanks for your response, Nico
Re: Self-initialization of locking/threadid callbacks and auto-detection of features
But I could have a target that has a weak dependency on pthreads and is safe when the library is present. Ditto Windows (be unsafe pre-vista/2008, safe in vista/2008 and later, using same OpenSSL DLLs builds). I'd rather add this variation later, after the meat of this work is done, assuming such a variation is desired. Of course, that assumes a degree of ABI compatibility in the OS that may not in fact be available -- a great reason not to go for that...
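For readers unfamiliar with the "weak dependency" idea, here is a rough sketch of what it can look like with GCC or Clang on an ELF platform. It is an assumption-heavy illustration, not something taken from the patches: `have_pthreads()`, `maybe_lock()` and `maybe_unlock()` are hypothetical names, `#pragma weak` is toolchain-specific, and on systems where the pthread functions live in libc itself the probe simply always reports that pthreads is present. The mixed case where libpthread only arrives later via dlopen() is not handled here, which is part of why the ABI caveat above matters.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Mark the pthread symbols used here as weak references (GCC/Clang, ELF).
 * If no pthread library is linked in, their addresses resolve to NULL. */
#pragma weak pthread_create
#pragma weak pthread_mutex_lock
#pragma weak pthread_mutex_unlock

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 1 if the pthread functions were resolved at link/load time. */
int have_pthreads(void)
{
    return pthread_create != NULL
        && pthread_mutex_lock != NULL
        && pthread_mutex_unlock != NULL;
}

/* Takes the lock only when a pthread implementation is present;
 * without one the process is assumed to be single-threaded. */
void maybe_lock(void)
{
    if (have_pthreads()) {
        int rc = pthread_mutex_lock(&big_lock);
        assert(rc == 0);
        (void)rc;
    }
}

void maybe_unlock(void)
{
    if (have_pthreads()) {
        int rc = pthread_mutex_unlock(&big_lock);
        assert(rc == 0);
        (void)rc;
    }
}
```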
Re: Self-initialization of locking/threadid callbacks and auto-detection of features
On Oct 22, 2013 10:28 AM, Ben Laurie b...@links.org wrote:
> On 22 October 2013 06:47, Nico Williams n...@cryptonector.com wrote:
> > What I need to know:
> > - should I add new targets to ./Configure? For now I modified the linux-elf target, but this feels wrong to me.
> > - what about Windows? I either need to have different targets for pre-vista/2008 or I have to write a once initialization function for older Windows (which I can and know how to do, it's just more work, and in particular I couldn't test it, so I'm not inclined to do it).
> > - if so, should ./config automatically pick the new targets where there is appropriate threading support?
> I've been musing about a more autoconf-like approach for some time now (but, for the love of all that is fluffy, not using autoconf itself, which sucks) - it seems this is a good reason to go down that path.
Well, I'm not signing up for that, not yet anyways! :) Short-term advice will do. I think I'll just add new targets and ./config logic for picking them. The fact that targets are stable-ish is useful, as it allows building whatever targets one can build (or cross-build) on the host. autoconf can't do this, and that's one more reason not to autoconf.
> Interesting question is: what to do if no appropriate locking mechanism is discovered?
I think for a linux-elf-pthread target the dependency on pthreads should be hard. The old linux-elf target should remain thread-unsafe (I can't make OpenSSL fully thread-safe without a thread library). But I could have a target that has a weak dependency on pthreads and is safe when the library is present. Ditto Windows (be unsafe pre-vista/2008, safe in vista/2008 and later, using same OpenSSL DLLs builds). I'd rather add this variation later, after the meat of this work is done, assuming such a variation is desired. Nico
Re: Self-initialization of locking/threadid callbacks and auto-detection of features
On Monday, October 21, 2013, Salz, Rich wrote:
> I like your proposal, but I'd prefer to see an "already initialized" error code returned. Or a flag to the (new?) init api that says "ignore if already set"
Thanks for your reply! I can add an error, but note that the caller can set then get the callbacks and compare to check whether the caller's callbacks were taken. I could also add a new set of callback setters with ignore-if-set flags. As long as the existing ones behave reliably in the already-set case. In the already-set case I think it may well be best to ignore without failing on the theory that the caller that first set the callbacks must have set sufficiently useful ones anyways... and that where the OS has a good enough default threading library, that's the one that will be used by all DSOs calling OpenSSL in the same process, as otherwise all hell would already be breaking loose anyways! (I can imagine twisted cases where this would not be true, but they seem exceedingly unlikely.)
If you want to see the half-baked bits I have (which build on Linux, but which aren't tested) to see what I'm up to, see https://github.com/nicowilliams/openssl, specifically the thread_safety branch. See the XXX comments in rand_lib.c in particular. The outline: add a thread-safe one-time initialization function, built on whatever the OS provides, then use that to make callback init thread-safe.
What I need to know:
- should I add new targets to ./Configure? For now I modified the linux-elf target, but this feels wrong to me.
- what about Windows? I either need to have different targets for pre-vista/2008 or I have to write a once initialization function for older Windows (which I can and know how to do, it's just more work, and in particular I couldn't test it, so I'm not inclined to do it).
- if so, should ./config automatically pick the new targets where there is appropriate threading support?
- how to allocate error codes for the "already initialized" errors that you suggest?
- should I work to make sure that it's possible to change the default RAND method after it's been set once? The code in rand_lib.c is currently fundamentally thread-unsafe, though it could be accidentally thread-safe if, e.g., ENGINE_finish() doesn't actually tear down state at all. The simplest fix involves setting the default only once, as with the callbacks, but here I feel that's a shaky idea, that I should allow RAND method changes at any time, in a thread-safe manner -- more work for me, but less surprising.
Nico
(sent from a mobile device with lousy typing options, and no plain text button) (my patches need rebasing to squash and split up, need tests, need finishing, but if you have comments I would love them sooner than later! :)
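The "one-time initialization function, built on whatever the OS provides" from the outline in this message might look something like the sketch below. It is a guess at the shape, not the code from the branch: `ensure_initialized()` and `init_defaults()` are hypothetical names. The POSIX side uses pthread_once(); the Windows side uses InitOnceExecuteOnce(), which is only available on Vista/2008 and later, matching the pre-vista/2008 concern raised in the list above.

```c
#include <assert.h>

#ifdef _WIN32
#include <windows.h>
static INIT_ONCE once_ctl = INIT_ONCE_STATIC_INIT;
#else
#include <pthread.h>
static pthread_once_t once_ctl = PTHREAD_ONCE_INIT;
#endif

static int init_runs = 0;   /* counts how often the initializer ran */

/* Hypothetical initializer: real code would install default
 * locking/threadid callbacks here. */
static void init_defaults(void)
{
    init_runs++;
}

#ifdef _WIN32
/* Adapter to the callback signature InitOnceExecuteOnce() expects. */
static BOOL CALLBACK once_trampoline(PINIT_ONCE once, PVOID param, PVOID *ctx)
{
    (void)once;
    (void)param;
    (void)ctx;
    init_defaults();
    return TRUE;
}
#endif

/* Runs init_defaults() exactly once per process, no matter how many
 * threads call this concurrently. Returns how many times the
 * initializer has run, which is 1 after any successful call. */
int ensure_initialized(void)
{
#ifdef _WIN32
    BOOL ok = InitOnceExecuteOnce(&once_ctl, once_trampoline, NULL, NULL);
    assert(ok);
    (void)ok;
#else
    int rc = pthread_once(&once_ctl, init_defaults);
    assert(rc == 0);
    (void)rc;
#endif
    return init_runs;
}
```

Every entry point that needs the callbacks would call `ensure_initialized()` first; the once primitive guarantees that late callers block until the first caller's initialization has finished.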