Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
> Provide these so that the assembler users can be oblivious about whether this is PIC or non-PIC, 64-bit or 32-bit, etc.

http://cvs.openssl.org/chngview?cn=22855. It's adapted for the Sun compiler. The most essential difference is that the latter doesn't handle ## correctly, therefore one has to pass complete register names as arguments, e.g. SPARC_LOAD_ADDRESS(OPENSSL_sparcv9cap_P,%g1,%g2). I had to compensate for other differences, made some rearrangements and changed the name of one internal macro to SPARC_SETUP_GOT_REG...

__ OpenSSL Project http://www.openssl.org Development Mailing List openssl-dev@openssl.org Automated List Manager majord...@openssl.org
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
>> Basically, in the context I'd prefer not to touch aes-sparcv9.pl and stick to aesni approach as the only one, i.e. keep T4 code as separate module referred from EVP. It allows to concentrate on things that matter, optimizing specific modes performance. By extension it's preferred approach even for other ciphers.
>
> Too many pieces of code still call interfaces such as AES_*() directly and do not go through EVP. This I considered a show stopper, and an absolute failure of design with the way the Oracle folks implemented crypto opcode support in their openssl changes. It is absolutely impractical to have an EVP only driver for this stuff.

As for Oracle, they all are [or definitely should be and have been] pro-EVP, because crypto support on pre-T4 was relying on the pluggable engine interface and EVP is the *only* way to utilize it. Secondly, if you stick to the old interface [and want parallelizable modes] you don't get adequate performance. AES-NI is available only through EVP (normally developers target multiple platforms). The EVP interface is the one that gets FIPS-validated, not the low-level one. There are a lot of incentives to use EVP, and most critical applications do so.

> I consider supporting the old APIs a requirement.

Not at arbitrary high costs...
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
From: Andy Polyakov ap...@openssl.org
Date: Fri, 28 Sep 2012 17:37:19 +0200

> As for Oracle, they all are [or definitely should be and have been] pro-EVP, because crypto support on pre-T4 was relying on pluggable engine interface and EVP is the *only* way to utilize it.

That's really Oracle's problem, and nothing I am concerned with at all.

> Secondly, if you stick to old interface [and want parallelizable modes] you don't get adequate performance. AES-NI is available only through EVP (normally developers target multiple platforms). EVP interface is the one that gets FIPS-validated, not low-level. There are a lot of incentives to use EVP, and most critical applications do so.

Even supposedly well maintained trees using openssl's interfaces, such as OpenSSH, still use a mixture of EVP and direct AES calls.

It is impractical to say that everyone should convert. A library is supposed to be maximally useful to its users, both existing and new. This is violated by simply dismissing existing users who don't use EVP.

>> I consider supporting the old APIs a requirement.
>
> Not at arbitrary high costs...

At least for AES and Camellia, the amount of changes necessary for T4 direct support was very low.

And BTW, there is precedent for this, as this is what is already done for the s390 crypto instruction support.
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
>> As for Oracle, they all are [or definitely should be and have been] pro-EVP, because crypto support on pre-T4 was relying on pluggable engine interface and EVP is the *only* way to utilize it.
>
> That's really Oracle's problem, and nothing I am concerned with at all.
>
>> Secondly, if you stick to old interface [and want parallelizable modes] you don't get adequate performance. AES-NI is available only through EVP (normally developers target multiple platforms). EVP interface is the one that gets FIPS-validated, not low-level. There are a lot of incentives to use EVP, and most critical applications do so.
>
> Even supposedly well maintained trees using openssl's interfaces such as OpenSSH still use a mixture of EVP and direct AES calls.

There is only one place where OpenSSH calls AES_* directly, and that's their own counter mode implementation. The reason they do so is that there was no EVP counter mode in OpenSSL at the time. But what do they do with it? They actually ... implement the EVP interface. So the only code modification required in OpenSSH is to look up whether counter mode is already provided and use it, or fall back to their own implementation.

> A library is supposed to be maximally useful to its users, both existing and new. This is violated by simply dismissing existing users who don't use EVP.

Give more examples.

>>> I consider supporting the old APIs a requirement.
>>
>> Not at arbitrary high costs...
>
> At least for AES and Camellia, the amount of changes necessary for T4 direct support was very low.

Not from my viewpoint...

> And BTW, there is precedent for this, as this is what is already done for the s390 crypto instruction support.

And I regret every bit of it! Day will come for a change... But you contradict yourself:-) If you don't care about what Oracle does, why do you care about IBM? It was a joke!
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
From: Andy Polyakov ap...@openssl.org
Date: Fri, 28 Sep 2012 21:05:27 +0200

>>> As for Oracle, they all are [or definitely should be and have been] pro-EVP, because crypto support on pre-T4 was relying on pluggable engine interface and EVP is the *only* way to utilize it.
>>
>> That's really Oracle's problem, and nothing I am concerned with at all.
>>
>>> Secondly, if you stick to old interface [and want parallelizable modes] you don't get adequate performance. AES-NI is available only through EVP (normally developers target multiple platforms). EVP interface is the one that gets FIPS-validated, not low-level. There are a lot of incentives to use EVP, and most critical applications do so.
>>
>> Even supposedly well maintained trees using openssl's interfaces such as OpenSSH still use a mixture of EVP and direct AES calls.
>
> There is only one place OpenSSH calls AES_* directly and that's their own counter mode implementations. The reason they do is that there was no EVP counter in OpenSSL at the time. But what do they do with it? They actually ... implement EVP interface. So that the only code modification that is required in OpenSSH is to lookup if counter is already provided and use it or fall back to own implementation.
>
>> A library is supposed to be maximally useful to its users, both existing and new. This is violated by simply dismissing existing users who don't use EVP.
>
> Give more examples.

Ok I could be convinced about crypto operations, in that case.

>> And BTW, there is precedent for this, as this is what is already done for the s390 crypto instruction support.
>
> And I regret every bit of it! Day will come for a change... But you contradict yourself:-) If you don't care about what Oracle does, why do you care about IBM? It was a joke!

My point was that I am generally against how Oracle designed their T4 openssl changes.

They even optimized hashing only via the EVP interfaces, and that's the real joke. Even something as common as GIT does direct SHA1 calls.

Also, instead of supporting the Montgomery Multiply and Square instructions directly, they translate between OpenSSL bignums and the bignum format used in Solaris's libsoftcrypto, then they call into libsoftcrypto to do the work.
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
From: Andy Polyakov ap...@openssl.org
Date: Fri, 28 Sep 2012 11:47:09 +0200

> It's adapted for Sun compiler. Most essential difference is that latter doesn't handle ## correctly,

It was hard for me to figure out what exactly this problem is since no specific details have been provided.

So I asked someone on a Solaris system to try:

    #define foo(x) insn %##x, xyzzy
    foo(g1)

and with SunPRO it gives:

    cc -E /tmp/foo | grep insn
    insn % g1 , xyzzy

Is that space causing a problem? From my testing GNU as is perfectly fine with things like:

    or % g1, 0, % g1

Does the SunPRO assembler reject these kinds of things?

If you are terse on the details of the problems which crop up on Solaris, I will be unable to help suggest alternatives. So please do provide them in the future.

Thank you.
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
>> I'll handle this, but differently. Specifically I won't go through GOT, but directly to variable, something like this:
>
> I would like to politely request that you don't go down this road.
>
>> .Lretl: retl
>>         nop
>>         ...
>>         sethi   %hi(var-.Lpic),%reg
>> .Lpic:  call    .Lretl
>>         add     %o7,%lo(var-.Lpic),%reg
>
> I honestly think it's easiest to simply generate correct PIC sequences, as my macros are trying to do.

Having been hit by various problems, it became a kind of occupational hazard: assume the bare minimum. The rule of thumb forged over the years is that if it's possible to rely on generic instruction set and assembler properties, not specific to some binary format, then that is the preferable approach. You can blame me for it, but it has served OpenSSL very well on numerous occasions and I see no reason to give it up. Naturally, unless one has to. And it seems to be the case here. Because (var-.Lpic) doesn't seem to work with external variables on SPARC Solaris. Unfortunate...

As for the preferred approach mentioned above: I for one have never actually tested the assembly modules on linux-sparc. But it didn't prevent me from being sure that they work. Because they were sheer code segments, which is a direct result of the conservative approach.

> We can add whatever ifdefs and code generation cases we need to sparc_arch.h
>
> The code that I'm emitting is identical to what GCC generates on Linux and Solaris under Sparc regardless of which assembler and linker are in use.

Well, the vendor compiler doesn't define __PIC__, so the macros wouldn't work there... Let me ponder... It might take time to figure it out (because of my time constraints), but I'll get there. I'm pretty confident that you'll be satisfied.

> I should know, I wrote much of the sparc GCC backend. If you describe to me what problems your scheme ran into, I can fix them up.

The above mentioned rule is a general rule. In the SPARC case it was rather a projection of experience with x86, x86_64, ppc on multiple systems (think a mixture of Windows, MacOS X, AIX).
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
From: Andy Polyakov ap...@openssl.org
Date: Sat, 22 Sep 2012 20:01:03 +0200

> And it seems to be the case here. Because (var-.Lpic) doesn't seem to work with external variables on SPARC Solaris. Unfortunate...

The simplistic existing expressions also won't work for des_enc.m4's local tables once the DES opcode code is added, because the 13-bit relocation isn't large enough. Yes I do remember you suggested a sethi/or based scheme to handle arbitrary symbols.

> Well, vendor compiler doesn't define __PIC__,

We have OPENSSL_PIC, just use that.
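The 13-bit limit and the sethi/or alternative mentioned above can be illustrated with a little arithmetic. The following is only a sketch in C, not code from the patch series: the function names are made up for illustration, and it models 32-bit values only. It relies on two properties of the SPARC instruction set: a signed 13-bit immediate covers -4096..4095, and %hi()/%lo() split a 32-bit value into its upper 22 and lower 10 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `offset` fits a SPARC signed 13-bit immediate (simm13). */
bool fits_simm13(int64_t offset)
{
    return offset >= -4096 && offset <= 4095;
}

/* %hi(value): the upper 22 bits, as placed by sethi. */
uint32_t sparc_hi22(uint32_t value)
{
    return value >> 10;
}

/* %lo(value): the lower 10 bits, as used by the following or/add. */
uint32_t sparc_lo10(uint32_t value)
{
    return value & 0x3ffu;
}

/* What "sethi %hi(v), r; or r, %lo(v), r" leaves in r (32-bit case). */
uint32_t sethi_or(uint32_t value)
{
    return (sparc_hi22(value) << 10) | sparc_lo10(value);
}
```

A table that ends up more than 4095 bytes away from the reference point no longer passes `fits_simm13()`, whereas the two-instruction pair reconstructs any 32-bit value.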
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
> About the RAS stack missing cost, every Sun produced UltraSPARC chip pushes unconditionally onto the RAS and does not special case the call .+8 pattern. Thinking about this logically, a RAS miss can (at best) perform like a full branch misprediction. Which on UltraSPARC results in a full pipeline flush as the mis-predicted fetched instructions need to be cancelled and cleared out of the pipeline so we can begin executing down the correct path. This can be huge, depending upon the contents of the improperly fetched path of instructions. In the worst possible case, up to 18 instructions can need to be cancelled (UltraSPARC-I programmers manual, section 16.2.9, page 270)

I have to say that 18 instructions is very optimistic; that high an IPC is rather rare (it has to be the right mixture of integer, load and floating point operations). One should rather think of cycles, not the number of instructions. And even if the number of cycles can appear substantial, it still has to be taken in perspective. For example, originally the AES code looked like this:

    1:  call    _sparcv9_AES_encrypt
        add     %o7,AES_Te-1b,%o4

By all means most optimal, because it doesn't consume an extra entry of the otherwise limited RAS. It didn't work with Purify, so it became

    1:  call    .+8
        add     %o7,AES-1b,%o4
        call    _sparcv8_AES_encrypt

Performance difference? Less than -1% for speed aes-128-cbc, -2% for speed -evp aes-128-ecb for small blocks and 0% for large blocks. On UltraSPARC. And as mentioned, SPARC V [and beyond] handles call .+8 explicitly. So unless it's shown that T-SPARCs suffer greatly, I wouldn't have a bad conscience using call .+8 under the circumstances. I agree that it's not the most optimal, but at the same time there is no real reason to feel bad about it.
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
From: Andy Polyakov ap...@openssl.org
Date: Sat, 22 Sep 2012 21:04:47 +0200

> I agree that it's not the most optimal, but at the same time no real reason to feel bad about it.

But on the other hand I've done all the work to implement the macros to do the PIC sequence properly.

You really don't have to implement anything.
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
From: David Miller da...@davemloft.net
Date: Sat, 22 Sep 2012 15:14:23 -0400 (EDT)

> From: Andy Polyakov ap...@openssl.org
> Date: Sat, 22 Sep 2012 21:04:47 +0200
>
>> I agree that it's not the most optimal, but at the same time no real reason to feel bad about it.
>
> But on the other hand I've done all the work to implement the macros to do the PIC sequence properly.
>
> You really don't have to implement anything.

BTW, two other points need restating:

1) My macros handle the non-PIC case optimally.

2) Your RAS corruption cost considerations are only considering the most immediate effect on the return from the assembler routine in question.

   Whereas the true RAS miss cost must be multiplied onto the next N functions up in the call chain, where N is the size of the RAS. Since all of those will miss as well.
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
>>> I agree that it's not the most optimal, but at the same time no real reason to feel bad about it.
>>
>> But on the other hand I've done all the work to implement the macros to do the PIC sequence properly.
>>
>> You really don't have to implement anything.
>
> BTW, two other points need restating:
>
> 1) My macros handle the non-PIC case optimally.
>
> 2) Your RAS corruption cost considerations are only considering the most immediate effect on the return from the assembler routine in question.
>
>    Whereas the true RAS miss cost must be multiplied onto the next N functions up in the call chain, where N is the size of the RAS. Since all of those will miss as well.

N is 4 on UltraSPARC. For comparison, in the AES case the depth from EVP_encrypt to the assembly code is 4, so the penalties don't spill onto the caller. [Apparently we are talking about an obsolete platform, as I measure no performance difference on T4 between the sequences depicted in the last message.] All I'm saying is that it doesn't have to be classified as absolutely critical to fix.

Basically, in the context I'd prefer not to touch aes-sparcv9.pl and stick to the aesni approach as the only one, i.e. keep T4 code as a separate module referred from EVP. It allows one to concentrate on things that matter, optimizing specific modes' performance. By extension it's the preferred approach even for other ciphers.
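The disagreement above about how far a RAS mismatch propagates can be made concrete with a toy model. The sketch below is an assumption-laden illustration in C, not a description of any real UltraSPARC pipeline: it assumes a 4-entry RAS (the N quoted in the thread), assumes an overflowing push discards the oldest entry, and counts a return with an empty RAS as a miss. All names and addresses are invented.

```c
#include <assert.h>
#include <stddef.h>

#define RAS_DEPTH 4   /* assumed size; the thread quotes N = 4 for UltraSPARC */

/* Toy return address stack: fixed size, oldest entry dropped on overflow. */
struct ras {
    unsigned long entry[RAS_DEPTH];
    size_t count;
};

static void ras_push(struct ras *r, unsigned long addr)
{
    if (r->count == RAS_DEPTH) {
        /* Full: discard the oldest prediction to make room. */
        for (size_t i = 1; i < RAS_DEPTH; i++)
            r->entry[i - 1] = r->entry[i];
        r->count--;
    }
    r->entry[r->count++] = addr;
}

/* Returns 1 if the popped prediction matches the real return address. */
static int ras_pop_matches(struct ras *r, unsigned long actual)
{
    if (r->count == 0)
        return 0;   /* nothing to predict with: counted as a miss */
    return r->entry[--r->count] == actual;
}

/* Simulate `call_depth` nested calls, optionally one unpaired "call .+8"
 * in the innermost function, then unwind and count mispredicted returns. */
int count_ras_misses(size_t call_depth, int unpaired_call)
{
    struct ras r = { {0}, 0 };
    int misses = 0;

    assert(call_depth > 0);
    for (size_t i = 1; i <= call_depth; i++)
        ras_push(&r, 0x1000 * i);   /* made-up return addresses */
    if (unpaired_call)
        ras_push(&r, 0xdead);       /* pushed, never consumed by a return */
    for (size_t i = call_depth; i >= 1; i--)
        misses += !ras_pop_matches(&r, 0x1000 * i);
    return misses;
}
```

In this model a call chain of depth 4 unwinds with no misses when every call is paired, and with four misses after a single unpaired call, because every later return pops an entry that is off by one. Whether real hardware resynchronizes sooner is outside what the model can say.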
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
From: Andy Polyakov ap...@openssl.org
Date: Sat, 22 Sep 2012 22:02:55 +0200

> Basically, in the context I'd prefer not to touch aes-sparcv9.pl and stick to aesni approach as the only one, i.e. keep T4 code as separate module referred from EVP. It allows to concentrate on things that matter, optimizing specific modes performance. By extension it's preferred approach even for other ciphers.

Too many pieces of code still call interfaces such as AES_*() directly and do not go through EVP.

This I considered a show stopper, and an absolute failure of design with the way the Oracle folks implemented crypto opcode support in their openssl changes.

It is absolutely impractical to have an EVP only driver for this stuff.

I consider supporting the old APIs a requirement.
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
> Provide these so that the assembler users can be oblivious about whether this is PIC or non-PIC, 64-bit or 32-bit, etc.
>
> It is important to use a real call and return to implement the obtaining of the %pc as part of the PIC sequence.
>
> Sequences such as:
>
>     call    . + 8
>      mov    %o7, %PIC_REG
>
> are to be avoided at all costs on UltraSPARC cpus. This is because such a sequence flushes the Return Address Stack (RAS) because the call is not paired with a return.
>
> Every time a call or jmpl with RD=%o7 is performed, the chip pushes the PC+8 onto the top of the RAS. The next jmpl %o7 + 8 or return %i7 + 8 the chip sees will cause it to pop the top entry off the RAS and begin fetching down that path. If there is a mis-match the entire pipeline is flushed and the chip restarts fetching down the correct path.
>
> Therefore, the above discouraged sequence will cause all of the RAS entries to mismatch and there will therefore be a full pipeline flush on every subsequent function return.

Well, last time I looked into this I could establish the following. call .+8 was actually used by the vendor compiler [maybe not anymore, I don't know, but in the Sun days it was used extensively]. The SPARC V manual is explicit about call .+8 *not* affecting the RAS. Purify was also discussed in this context, and it actually recognizes the construct and treats it specially. In other words it was considered widely adopted practice and it was found to be backed up by at least one hardware design. Penalties are measured to be minimal on UltraSPARC, two additional cycles (in comparison to 20 cycles for save and restore alone). But of course today the situation might be different and T-SPARCs can suffer from it more...

I'll handle this, but differently. Specifically I won't go through GOT, but directly to the variable, something like this:

    .Lretl: retl
            nop
            ...
            sethi   %hi(var-.Lpic),%reg
    .Lpic:  call    .Lretl
            add     %o7,%lo(var-.Lpic),%reg

This works with both Solaris and Linux toolchains and in both 32- and 64-bit mode (it was hell to get des_enc to work everywhere). In 64-bit mode it implies that the shared library itself is limited to 2GB, but that's considered a reasonable limitation. Avoiding GOT allows hiding OPENSSL_sparcv9cap_P with __attribute__((visibility("hidden")));. Now it's static. Once again, don't think about it no more, it will be taken care of.

As for OPENSSL_sparcv9cap_P itself. I'd rather extend it to int OPENSSL_sparcv9cap_P[2] and save %cfr as is to OPENSSL_sparcv9cap_P[1]. Any objections?
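The two-word layout proposed above can be pictured as follows. This is a hypothetical sketch in C, not OpenSSL code: the array name, the bit position and the helper function are all invented for illustration, and it uses unsigned int for the bit test where the message says int. The only thing taken from the thread is the idea that word 0 keeps the existing capability flags and word 1 stores the %cfr value unmodified.

```c
#include <stdbool.h>

/* Stand-in for the variable discussed in the thread; a real build would
 * use the library's own definition rather than this local one. */
unsigned int demo_sparcv9cap[2];

/* Hypothetical bit position within the saved %cfr word. */
#define DEMO_CFR_AES (1u << 0)

/* Word 0: existing capability flags. Word 1: %cfr stored verbatim. */
bool demo_has_aes_opcodes(void)
{
    return (demo_sparcv9cap[1] & DEMO_CFR_AES) != 0;
}
```

Keeping the raw register value in its own word means a later check needs no translation table between %cfr bits and library-defined flags, which appears to be the appeal of the proposal.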
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
From: Andy Polyakov ap...@openssl.org
Date: Fri, 21 Sep 2012 12:21:25 +0200

> I'll handle this, but differently. Specifically I won't go through GOT, but directly to variable, something like this:

I would like to politely request that you don't go down this road.

> .Lretl: retl
>         nop
>         ...
>         sethi   %hi(var-.Lpic),%reg
> .Lpic:  call    .Lretl
>         add     %o7,%lo(var-.Lpic),%reg

I honestly think it's easiest to simply generate correct PIC sequences, as my macros are trying to do.

We can add whatever ifdefs and code generation cases we need to sparc_arch.h

The code that I'm emitting is identical to what GCC generates on Linux and Solaris under Sparc regardless of which assembler and linker are in use.

I should know, I wrote much of the sparc GCC backend. If you describe to me what problems your scheme ran into, I can fix them up.

Did you test if my code sequences work for you?

It is also important to note that they are also specifically designed to be usable in leaf functions.

BTW, the real long term answer is to mark openssl internal symbols as hidden and then use GOT_DATA optimization sequences which will get rid of the GOT reference altogether. But that requires some configure checks to see if the assembler and linker support these constructs.

> As for OPENSSL_sparcv9cap_P itself. I'd rather extend it to int OPENSSL_sparcv9cap_P[2] and save %cfr as is to OPENSSL_sparcv9cap_P[1]. Any objections?

I think this is code masturbation at this early stage of the sparc crypto opcode support implementation and is something we can clean up later.
Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
Here is a more detailed reply specifically about generating correct and optimal Sparc PIC sequences.

Let's get the non-PIC static case out of the way, we should always use:

    set     symbol, %reg            ! 32-bit
    setx    symbol, %tmp_reg, %reg  ! 64-bit

Using calls to PIC stubs is completely pointless overhead when we are doing a static build.

If we are generating PIC we need a stub function, there are a lot of ways to do this. One scheme is to simply emit a stub in each source file where the stub is needed.

If the assembler and linker support got-data optimizations, we can emit the following sequence:

    sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %PIC_REG
    call    __sparc_pic_stub
     or     %PIC_REG, %lo(_GLOBAL_OFFSET_TABLE_+4), %PIC_REG
    sethi   %gdop_hix22(symbol), %TMP
    xor     %TMP, %gdop_lox10(symbol), %TMP
    LDPTR   [%PIC_REG + %TMP], %REG, %gdop(symbol)

If the linker finds that the resolution of symbol (f.e. the symbol is static to the compilation unit, or marked as 'hidden') can be done at final link time, that LDPTR above will be optimized into:

    add     %PIC_REG, %TMP, %REG

The symbol offset will also be adjusted, as needed, in the %gdop_*() sethi and xor instructions. And finally, the reference to the global offset table slot that would have been generated for 'symbol' will be removed.

Otherwise, if the linker and assembler lack gotdata optimization support, we use just a plain PIC sequence:

    sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %PIC_REG
    call    __sparc_pic_stub
     or     %PIC_REG, %lo(_GLOBAL_OFFSET_TABLE_+4), %PIC_REG
    sethi   %hi(symbol), %TMP
    or      %TMP, %lo(symbol), %TMP
    LDPTR   [%PIC_REG + %TMP], %REG

If this doesn't work in some cases, we need to discover exactly why instead of dismissing my approach completely.

Now, of course, all of the above is for -fPIC, but I see no sparc target (nor any target except one strange hpux case) that specifies -fpic instead of -fPIC in Configure. However that case is simple to accommodate as well, and I'd be happy to do so in my macros.

About the RAS stack missing cost, every Sun produced UltraSPARC chip pushes unconditionally onto the RAS and does not special case the call .+8 pattern.

Thinking about this logically, a RAS miss can (at best) perform like a full branch misprediction. Which on UltraSPARC results in a full pipeline flush as the mis-predicted fetched instructions need to be cancelled and cleared out of the pipeline so we can begin executing down the correct path. This can be huge, depending upon the contents of the improperly fetched path of instructions.

In the worst possible case, up to 18 instructions can need to be cancelled (UltraSPARC-I programmers manual, section 16.2.9, page 270)

Worse than the immediate cost of the RAS corruption is that every subsequent function return out of openssl is going to miss the RAS and incur the penalty as well.

I consider it absolutely critical that the PIC sequences support being used in leaf functions, without save and restore instructions. And my macros have been designed with this in mind. When used, one need not allocate a register window merely for the sake of performing a PIC sequence.

When we get past these initial patches and I post my DES work, you will see that I adjusted des_enc.m4 to use the new PIC interfaces I created. In fact I had to, because the 13-bit relocations used there no longer fit with the crypto opcode code added.

There are other problems in des_enc.m4, which I have fixed in my patches. As just one other example, it doesn't include opensslconf.h and therefore OPENSSL_SYSNAME_ULTRASPARC is never defined and the V9 sequences are never used for 32-bit, which hurts performance.

Only one valid set of CPP tests exists for the various cases we care about on sparc. __PIC__ means PIC code generation is in use. __arch64__ means 64-bit code generation, and __sparc_v9__ means V9 code can be used. These are fully standardized and both SunPRO and GCC set them consistently.
[PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.
Provide these so that the assembler users can be oblivious about whether this is PIC or non-PIC, 64-bit or 32-bit, etc.

It is important to use a real call and return to implement the obtaining of the %pc as part of the PIC sequence.

Sequences such as:

    call    . + 8
     mov    %o7, %PIC_REG

are to be avoided at all costs on UltraSPARC cpus. This is because such a sequence flushes the Return Address Stack (RAS) because the call is not paired with a return.

Every time a call or jmpl with RD=%o7 is performed, the chip pushes the PC+8 onto the top of the RAS. The next jmpl %o7 + 8 or return %i7 + 8 the chip sees will cause it to pop the top entry off the RAS and begin fetching down that path. If there is a mis-match the entire pipeline is flushed and the chip restarts fetching down the correct path.

Therefore, the above discouraged sequence will cause all of the RAS entries to mismatch and there will therefore be a full pipeline flush on every subsequent function return.

It is also highly discouraged to use rd %pc, %PIC_REG because that is extremely slow on UltraSPARC cpus. The cost of a RDPC instruction amounts essentially to a pipeline flush.

Signed-off-by: David S. Miller da...@davemloft.net
---
 crypto/sparc_arch.h | 70 +++
 1 file changed, 70 insertions(+)

diff --git a/crypto/sparc_arch.h b/crypto/sparc_arch.h
index 3ece96a..bcb4829 100644
--- a/crypto/sparc_arch.h
+++ b/crypto/sparc_arch.h
@@ -25,4 +25,74 @@ extern int OPENSSL_sparcv9cap_P;
 #define SPARCV9_MONTSQR (117)
 #define SPARCV9_CRC32C (118)
 
+#if __ASSEMBLER__
+
+#ifdef __PIC__
+#define SPARC_PIC_THUNK(reg) \
+    .align 32; \
+.Lpic_thunk: \
+    jmp %o7 + 8; \
+    add %o7, %##reg, %##reg;
+#else
+#define SPARC_PIC_THUNK(reg)
+#endif
+
+#define SPARC_PIC_THUNK_CALL(reg) \
+    sethi %hi(_GLOBAL_OFFSET_TABLE_-4), %##reg; \
+    call .Lpic_thunk; \
+    or %##reg, %lo(_GLOBAL_OFFSET_TABLE_+4), %##reg;
+
+#define SPARC_SETUP_PIC_REG(reg) \
+    SPARC_PIC_THUNK_CALL(reg)
+
+#define SPARC_SETUP_PIC_REG_LEAF(reg, tmp) \
+    mov %o7, %##tmp; \
+    SPARC_PIC_THUNK_CALL(reg); \
+    mov %##tmp, %o7;
+
+#ifdef __arch64__
+#define LDPTR ldx
+#else
+#define LDPTR ld
 #endif
+
+#ifdef __PIC__
+
+#define SPARC_LOAD_ADDRESS(SYM, reg, tmp) \
+    SPARC_SETUP_PIC_REG(reg); \
+    sethi %hi(SYM), %##tmp; \
+    or %##tmp, %lo(SYM), %##tmp; \
+    LDPTR [%##reg + %##tmp], %##reg;
+
+#define SPARC_LOAD_ADDRESS_LEAF(SYM, reg, tmp) \
+    SPARC_SETUP_PIC_REG_LEAF(reg, tmp); \
+    sethi %hi(SYM), %##tmp; \
+    or %##tmp, %lo(SYM), %##tmp; \
+    LDPTR [%##reg + %##tmp], %##reg;
+
+#else
+
+#ifdef __arch64__
+#define SPARC_LOAD_ADDRESS(SYM, reg, tmp) \
+    setx SYM, %##tmp, %##reg;
+#else
+#define SPARC_LOAD_ADDRESS(SYM, reg, tmp) \
+    set SYM, %##reg;
+#endif
+
+#define SPARC_LOAD_ADDRESS_LEAF(SYM, reg, tmp) \
+    SPARC_LOAD_ADDRESS(SYM, reg, tmp)
+
+#endif
+
+#define SPARC_LOAD_V9_CAPS(reg, tmp) \
+    SPARC_LOAD_ADDRESS(OPENSSL_sparcv9cap_P, reg, tmp); \
+    ld [%##reg], %##reg;
+
+#define SPARC_LOAD_V9_CAPS_LEAF(reg, tmp) \
+    SPARC_LOAD_ADDRESS_LEAF(OPENSSL_sparcv9cap_P, reg, tmp); \
+    ld [%##reg], %##reg;
+
+#endif /* __ASSEMBLER__ */
+
+#endif /* __SPARC_ARCH_H__ */
-- 
1.7.10.4