>> In case you consider submitting assembler code there is couple of
>> requirements that has to be met. Inline assembler (or exotic intrinsics)
>> is not considered as viable option for MMX/SSE (or any code bigger than
>> couple of instructions), perlasm code is.
>
> As it is available in the MSC, Intel and GCC compilers, I have realized it
> with intrinsics.

Well, they are not available in *all* MSC and GCC versions, are they?
Another reason for favoring real assembler is that it's not uncommon to
find yourself at the compiler's mercy to produce efficient code, and
performance can vary significantly from version to version. As we aim to
support quite a range of developer environments, there is no reason to
"punish" users with "wrong" compiler versions. Frankly, it's troublesome
enough to even identify "wrong"/"right" compiler versions.

> Sorry, did not know of this OpenSSL policy. The current
> construction is:
>
> #if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_AMD64) || defined(_M_X64)) || \
>     defined(__GNUC__) && (defined(__i386__) || defined(__amd64__) || defined(__x86_64__)) || \
>     defined(__ICL) || defined(_EMM_FUNCTIONALITY)
> #define EMM_INTRINSICS
> #endif
>
> #define X86_CPUID_BIT_SSE2 0x04000000
>
> #ifdef EMM_INTRINSICS
> if(X86_CPUID_BIT_SSE2 & OPENSSL_ia32cap)
> {
>     SSE2 version;
> }
> else
> #endif
> {
>     classic version;
> }
>
> This should compile and execute on any system.

"Should" is not part of the vocabulary in this context, "does" is, and the
only way to assure "does" in the long run is perlasm.

>>> - replacement of allocated tables by local (stack) tables (as the table
>>> generation is now faster than the overhead for an alloc),
>> Good idea if we settle for "one-shot" interface...
>
> Even for one block a local 256-bytes table is much faster than IBM's
> GCM_mult_noaccel() or B.Gladman's "slow field multiplier", both with 128
> unpredictable branches.

But we have to weigh it against our own implementation(s), not somebody
else's...

> And with SSE2, the table build is such efficient
> that it does not consume more cycles than one block. So, it will also be
> suited for more than one-shot applications.

Cool. But once again, I have benchmarking left to do, so we have to
postpone this particular topic...

A.
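
For readers who want the "256-byte local table versus 128 branches" comparison in concrete terms, here is a minimal C sketch. It assumes the table is the common 4-bit layout (16 entries of 128 bits, one per nibble value) in GCM's bit order; the thread does not show the actual patch, so that layout is an assumption. All names (u128, gf128_mulx, gf128_mul_bitwise, gf128_build_table, gf128_mul_table) are hypothetical and none of this is OpenSSL code. The bit-by-bit routine stands in for a multiplier with one data-dependent branch per bit; the table routine replaces those with 32 nibble lookups.

```c
#include <stdint.h>

/* 128-bit field element; hi holds GCM bytes 0..7, lo holds bytes 8..15.
 * In GCM's convention the leftmost bit (MSB of hi) is the x^0 coefficient. */
typedef struct { uint64_t hi, lo; } u128;

/* Multiply by x: shift right one bit and fold in R = 0xE1 || 0^120 when a
 * bit falls off the right-hand end. Written without a branch. */
static u128 gf128_mulx(u128 v)
{
    uint64_t lsb = v.lo & 1;
    v.lo = (v.lo >> 1) | (v.hi << 63);
    v.hi >>= 1;
    v.hi ^= ((uint64_t)0 - lsb) & 0xE100000000000000ULL;
    return v;
}

/* Reference multiplier: one data-dependent branch per bit of x (128 total). */
static u128 gf128_mul_bitwise(u128 x, u128 h)
{
    u128 z = {0, 0}, v = h;
    for (int i = 0; i < 128; i++) {
        uint64_t word = (i < 64) ? x.hi : x.lo;
        if ((word >> (63 - (i & 63))) & 1) {
            z.hi ^= v.hi;
            z.lo ^= v.lo;
        }
        v = gf128_mulx(v);
    }
    return z;
}

/* Build the 16-entry table t[n] = n(x) * h, where nibble value 8 is x^0,
 * 4 is x^1, 2 is x^2 and 1 is x^3. 16 entries * 16 bytes = 256 bytes. */
static void gf128_build_table(u128 t[16], u128 h)
{
    t[0].hi = 0;
    t[0].lo = 0;
    t[8] = h;
    t[4] = gf128_mulx(t[8]);
    t[2] = gf128_mulx(t[4]);
    t[1] = gf128_mulx(t[2]);
    /* Remaining entries are XOR combinations of the four basis entries. */
    for (int i = 2; i < 16; i <<= 1) {
        for (int j = 1; j < i; j++) {
            t[i + j].hi = t[i].hi ^ t[j].hi;
            t[i + j].lo = t[i].lo ^ t[j].lo;
        }
    }
}

/* Table multiplier: Horner's rule over the 32 nibbles of x, from the
 * rightmost (highest powers of x) to the leftmost. */
static u128 gf128_mul_table(u128 x, const u128 t[16])
{
    u128 z = {0, 0};
    for (int k = 31; k >= 0; k--) {
        uint64_t word = (k < 16) ? x.hi : x.lo;
        unsigned n = (unsigned)((word >> (60 - 4 * (k & 15))) & 0xF);
        /* z = z * x^4, then add the table entry for this nibble. */
        z = gf128_mulx(gf128_mulx(gf128_mulx(gf128_mulx(z))));
        z.hi ^= t[n].hi;
        z.lo ^= t[n].lo;
    }
    return z;
}
```

A caller would declare `u128 t[16];` on the stack, call `gf128_build_table(t, h)` once per key, and then call `gf128_mul_table(x, t)` per block. The sketch says nothing about actual cycle counts; those still have to come from benchmarking.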
______________________________________________________________________ OpenSSL Project http://www.openssl.org Development Mailing List openssl-dev@openssl.org Automated List Manager majord...@openssl.org