If you want this in the mainstream code, you'll need to detect the capability at runtime and use your alternate code paths only if the hardware is present. It's not even to Intel's advantage if OpenSSL crashes and burns on older Intel CPUs, and most bulk users of OpenSSL (OS vendors) won't want to mess around installing different OpenSSL versions for different hardware.
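A minimal sketch of that kind of runtime check and dispatch, in plain C rather than the perlasm OpenSSL actually uses. It assumes GCC or Clang (for <cpuid.h> and __get_cpuid) and relies on the AES-NI feature flag being CPUID leaf 1, ECX bit 25. The names cpu_has_aesni, select_aes_impl, aes_generic and aes_aesni are hypothetical placeholders, not OpenSSL functions, and the two "implementations" only return a label so the dispatch is visible.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>              /* GCC/Clang helper for the CPUID instruction */
#define HAVE_X86_CPUID 1
#endif

/* CPUID leaf 1 reports AES-NI support in ECX bit 25. */
#define CPUID_ECX_AESNI (1u << 25)

/* Returns 1 if the running CPU advertises AES-NI, 0 otherwise. */
static int cpu_has_aesni(void)
{
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx & CPUID_ECX_AESNI) != 0;
#else
    return 0;                   /* non-x86 or unknown compiler: generic path */
#endif
}

/* Hypothetical stand-ins for the generic and AES-NI code paths. */
typedef const char *(*aes_impl_fn)(void);

static const char *aes_generic(void) { return "generic"; }
static const char *aes_aesni(void)   { return "aesni"; }

/*
 * Pick the code path once and cache it, so the detection cost is paid
 * a single time. Assumption: the first call happens before any threads
 * are started; a real library would need proper one-time initialisation.
 */
static aes_impl_fn select_aes_impl(void)
{
    static aes_impl_fn cached = NULL;

    if (cached == NULL)
        cached = cpu_has_aesni() ? aes_aesni : aes_generic;
    assert(cached != NULL);
    return cached;
}
```

A caller would fetch the function pointer once via select_aes_impl() and call through it. Because the check runs on the machine that executes the code, an image built on new hardware and cloned to older machines still falls back to the generic path.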
Autodetection is the best option if the detection overhead is reasonable - take a look at crypto/x86_64cpuid.pl for how to do the detection logic neatly. There are advantages in this being present all the time and dynamically enabled if it can be done; most users/OS vendors wouldn't bother to configure an engine backend anyway.

I'll disagree with Andy on that aspect only. The engine modules aren't particularly useful for this situation, where the function is inherent in some subset of CPUs: the engines will only get used by the few end users that can be bothered to configure them. I doubt the OS vendors would bother to enable an engine by default. Testing of the possible configurations is expensive, and the cost of support calls if they mess up makes autodetecting the engine to use a very unattractive proposition. (e.g. You get scenarios like building an image on a system with the new hardware then cloning it across large numbers of machines ....)

Peter

From: Andy Polyakov <[EMAIL PROTECTED]>
To: openssl-dev@openssl.org
Date: 10/12/2008 05:42
Subject: Re: [PATCH RFC] Add support to Intel AES-NI instruction set for x86_64 platform

As for RFC part. NO! This is NOT the way to do it. For several reasons (in ascending order of importance):

- OpenSSL assembler modules are maintained as dual-ABI, i.e. suitable for both Unix and Win64;
- "and $-16, %rdx" is unacceptable in this context. The relevant interface is exposed to the end-user and we have to allow for the possibility that the key schedule is memcpy-ed to a location with different alignment;
- a zero-copy CBC routine gives a fair performance improvement even in the ordinary case, and driving an ultra-fast block function from C would be just wasteful. In other words AESENC/DEC would benefit more from a dedicated CBC routine (see even comment below);
- the implementation should allow for pipelining;

As for the latter. I refer to the possibility of scheduling multiple AESENC/DEC with the same key schedule element and multiple data chunks.
It's possible in modes that allow for parallelization (e.g. ECB, CBC decrypt, CTR), and as far as I understand it is even recommended. So we are kind of obliged to leave room for this option.

The answer is an engine. I mean this preferably should be implemented as an engine that will be able to take full advantage of the architecture, not as a patch to the general purpose block function.

> This patch adds support to Intel AES-NI instruction set for x86_64
> platform.
>
> Intel AES-NI is a new set of Single Instruction Multiple Data (SIMD)
> instructions that are going to be introduced in the next generation of
> Intel processor, as of 2009.

Hardware however is not expected before 2010, right?

A.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev@openssl.org
Automated List Manager                           [EMAIL PROTECTED]