Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-28 Thread Andy Polyakov
 Provide these so that the assembler users can be oblivious about
 whether this is PIC or non-PIC, 64-bit or 32-bit, etc.

http://cvs.openssl.org/chngview?cn=22855. It's adapted for the Sun compiler.
The most essential difference is that the latter doesn't handle ## correctly,
therefore one has to pass complete register names as arguments, e.g.
SPARC_LOAD_ADDRESS(OPENSSL_sparcv9cap_P,%g1,%g2). I had to compensate for
other differences, made some rearrangements and renamed one
internal macro to SPARC_SETUP_GOT_REG...
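A minimal C sketch of why complete register names sidestep the problem, assuming only standard preprocessor behaviour: strictly conforming C leaves `%##g1` undefined, because pasting `%` and `g1` does not form a single valid preprocessing token, whereas stringizing an argument written as `%g1` is well defined. The macro `SETX_TEXT` and the functions below are hypothetical; they only build the instruction text and are not the sparc_arch.h macros.

```c
#include <string.h>

/* Hypothetical text-building counterpart of the 64-bit non-PIC
   SPARC_LOAD_ADDRESS(SYM, reg, tmp).  The caller passes complete register
   names such as %g1, so no "%##reg" token pasting is needed: #reg simply
   stringizes the argument exactly as it was written. */
#define SETX_TEXT(SYM, reg, tmp) "setx " #SYM ", " #tmp ", " #reg

static const char *load_caps_text(void)
{
    /* Mirrors the call form SPARC_LOAD_ADDRESS(OPENSSL_sparcv9cap_P,%g1,%g2). */
    return SETX_TEXT(OPENSSL_sparcv9cap_P, %g1, %g2);
}

/* Returns 1 if the expansion is the expected assembler line. */
static int load_caps_text_ok(void)
{
    return strcmp(load_caps_text(),
                  "setx OPENSSL_sparcv9cap_P, %g2, %g1") == 0;
}
```

The register name reaches the macro as one argument that the preprocessor never has to glue together, so the result does not depend on how a given cpp treats an invalid paste.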
__
OpenSSL Project http://www.openssl.org
Development Mailing List   openssl-dev@openssl.org
Automated List Manager   majord...@openssl.org


Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-28 Thread Andy Polyakov
 Basically, in the context I'd prefer not to touch aes-sparcv9.pl and
 stick to aesni approach as the only one, i.e. keep T4 code as
 separate module referred from EVP. It allows to concentrate on
 things that matter, optimizing specific modes performance. By
 extension it's preferred approach even for other ciphers.
 
 Too many pieces of code still call interfaces such as AES_*() directly
 and do not go through EVP.
 
 This I considered a show stopper, and an absolute failure of design
 with the way the Oracle folks implemented crypto opcode support in
 their openssl changes.
 
 It is absolutely impractical to have an EVP only driver for this
 stuff.

As for Oracle, they all are [or definitely should be and have been]
pro-EVP, because crypto support on pre-T4 was relying on pluggable
engine interface and EVP is the *only* way to utilize it. Secondly, if
you stick to old interface [and want parallelizable modes] you don't get
adequate performance. AES-NI is available only through EVP (normally
developers target multiple platforms). EVP interface is the one that
gets FIPS-validated, not low-level. There are a lot of incentives to use
EVP, and most critical applications do so.

 I consider supporting the old APIs a requirement.

Not at arbitrary high costs...


Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-28 Thread David Miller
From: Andy Polyakov ap...@openssl.org
Date: Fri, 28 Sep 2012 17:37:19 +0200

 As for Oracle, they all are [or definitely should be and have been]
 pro-EVP, because crypto support on pre-T4 was relying on pluggable
 engine interface and EVP is the *only* way to utilize it.

That's really Oracle's problem, and nothing I am concerned with at
all.

 Secondly, if you stick to old interface [and want parallelizable
 modes] you don't get adequate performance. AES-NI is available only
 through EVP (normally developers target multiple platforms). EVP
 interface is the one that gets FIPS-validated, not low-level. There
 are a lot of incentives to use EVP, and most critical applications do
 so.

Even supposedly well maintained trees using openssl's interfaces
such as OpenSSH still use a mixture of EVP and direct AES calls.

It is impractical to say that everyone should convert.

A library is supposed to be maximally useful to its users, both
existing and new.  This is violated by simply dismissing existing
users who don't use EVP.

 I consider supporting the old APIs a requirement.
 
 Not at arbitrary high costs...

At least for AES and Camellia, the amount of changes necessary for T4
direct support was very low.

And BTW, there is precedent for this, as this is what is already done
for the s390 crypto instruction support.


Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-28 Thread Andy Polyakov

 As for Oracle, they all are [or definitely should be and have been]
 pro-EVP, because crypto support on pre-T4 was relying on pluggable
 engine interface and EVP is the *only* way to utilize it.

 That's really Oracle's problem, and nothing I am concerned with at
 all.

 Secondly, if you stick to old interface [and want parallelizable
 modes] you don't get adequate performance. AES-NI is available only
 through EVP (normally developers target multiple platforms). EVP
 interface is the one that gets FIPS-validated, not low-level. There
 are a lot of incentives to use EVP, and most critical applications do
 so.

 Even supposedly well maintained trees using openssl's interfaces
 such as OpenSSH still use a mixture of EVP and direct AES calls.

There is only one place OpenSSH calls AES_* directly and that's their
own counter mode implementations. The reason they do is that there was
no EVP counter mode in OpenSSL at the time. But what do they do with it? They
actually ... implement the EVP interface. So the only code modification
required in OpenSSH is to look up whether counter mode is already provided,
and use it or fall back to their own implementation.

 A library is supposed to be maximally useful to its users, both
 existing and new.  This is violated by simply dismissing existing
 users who don't use EVP.

Give more examples.

 I consider supporting the old APIs a requirement.

 Not at arbitrary high costs...

 At least for AES and Camellia, the amount of changes necessary for T4
 direct support was very low.

Not from my viewpoint...

 And BTW, there is precedent for this, as this is what is already done
 for the s390 crypto instruction support.

And I regret every bit of it! The day will come for a change... But you
contradict yourself :-) If you don't care about what Oracle does, why do
you care about IBM? It was a joke!




Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-28 Thread David Miller
From: Andy Polyakov ap...@openssl.org
Date: Fri, 28 Sep 2012 21:05:27 +0200

 As for Oracle, they all are [or definitely should be and have been]
 pro-EVP, because crypto support on pre-T4 was relying on pluggable
 engine interface and EVP is the *only* way to utilize it.
 That's really Oracle's problem, and nothing I am concerned with at
 all.
 
 Secondly, if you stick to old interface [and want parallelizable
 modes] you don't get adequate performance. AES-NI is available only
 through EVP (normally developers target multiple platforms). EVP
 interface is the one that gets FIPS-validated, not low-level. There
 are a lot of incentives to use EVP, and most critical applications do
 so.
 Even supposedly well maintained trees using openssl's interfaces
 such as OpenSSH still use a mixture of EVP and direct AES calls.
 
 There is only one place OpenSSH calls AES_* directly and that's their
 own counter mode implementations. The reason they do is that there was
 no EVP counter in OpenSSL at the time. But what do they do with it?
 They actually ... implement EVP interface. So that the only code
 modification that is required in OpenSSH is to look up if counter is
 already provided and use it or fall back to own implementation.
 
 A library is supposed to be maximally useful to its users, both
 existing and new.  This is violated by simply dismissing existing
 users who don't use EVP.
 
 Give more examples.

Ok I could be convinced about crypto operations, in that case.

 And BTW, there is precedent for this, as this is what is already done
 for the s390 crypto instruction support.
 
 And I regret every bit of it! Day will come for a change... But you
 contradict yourself:-) If you don't care about what Oracle does, why
 do you care about IBM? It was a joke!

My point was that I specifically am generally against how Oracle
designed their T4 openssl changes.

They even optimized hashing only via the EVP interfaces, and that's
the real joke.  Even something common like GIT does direct SHA1
calls.

Also, instead of supporting Montgomery Multiply and Square
instructions directly, they translate between OpenSSL bignums and the
bignum format used in Solaris's libsoftcrypto, then they call into
libsoftcrypto to do the work.


Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-28 Thread David Miller
From: Andy Polyakov ap...@openssl.org
Date: Fri, 28 Sep 2012 11:47:09 +0200

 It's adapted for the Sun compiler.  The most essential difference is that
 the latter doesn't handle ## correctly,

It was hard for me to figure out what exactly this problem is since
no specific details have been provided.

So I asked someone on a Solaris system to try:


#define foo(x)  insn %##x, xyzzy

foo(g1)


and with SunPRO it gives:

cc -E /tmp/foo | grep insn
 insn % g1 , xyzzy 

Is that space causing a problem?  From my testing GNU as is
perfectly fine with things like:

or  % g1, 0, % g1

Does the SunPRO assembler reject these kinds of things?

If you are terse on the details of the problems which crop up on
Solaris, I will be unable to help suggest alternatives.  So please do
provide them in the future.

Thank you.



Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-22 Thread Andy Polyakov

 I'll handle this, but differently. Specifically I won't go through GOT,
 but directly to variable, something like this:

 I would like to politely request that you don't go down this road.

 .Lretl:
   retl
   nop
 ...
   sethi   %hi(var-.Lpic),%reg
 .Lpic:  call    .Lretl
   add %o7,%lo(var-.Lpic),%reg

 I honestly think it's easiest to simply generate correct PIC
 sequences, as my macros are trying to do.

Having been hit by various problems, it became kind of an occupational
hazard: assume the bare minimum. The rule of thumb forged over the years is
that if it's possible to rely on generic instruction set and assembler
properties, not specific to some binary format, then that's the preferable
approach. You can blame me for it, but it has served OpenSSL very well on
numerous occasions and I see no reason to give it up. Naturally, unless one
has to. And it seems to be the case here, because (var-.Lpic) doesn't seem
to work with external variables on SPARC Solaris. Unfortunate...

As for the preferred approach mentioned above: I for one have never actually
tested assembly modules on linux-sparc. But it didn't prevent me from
being sure that they work, because they were sheer code segments, which
is a direct result of the conservative approach.

 We can add whatever ifdefs and code generation cases we need to
 sparc_arch.h. The code that I'm emitting is identical to what GCC
 generates on Linux and Solaris under Sparc regardless of which
 assembler and linker are in use.

Well, the vendor compiler doesn't define __PIC__, so the macros wouldn't work
there... Let me ponder... It might take time to figure it out (because
of my time constraints), but I'll get there. I'm pretty confident that
you'll be satisfied.

 I should know, I wrote much of the sparc GCC backend.

 If you describe to me what problems your scheme ran into, I can fix
 them up.

The above mentioned rule is a general rule. In the SPARC case it was rather
a projection of experience with x86, x86_64, ppc on multiple systems
(think mixture of Windows, MacOS X, AIX).



Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-22 Thread David Miller
From: Andy Polyakov ap...@openssl.org
Date: Sat, 22 Sep 2012 20:01:03 +0200

 And it seems to be the case here. Because (var-.Lpic) doesn't seem
 to work with external variables on SPARC Solaris. Unfortunate...

The simplistic existing expressions also won't work for
des_enc.m4's local tables once the DES opcode code is added,
because the 13-bit relocation isn't large enough.

Yes I do remember you suggested a sethi/or based scheme to
handle arbitrary symbols.
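The arithmetic behind both points can be sketched in C: a SPARC simm13 immediate only spans -4096..4095, while a sethi/or pair rebuilds any 32-bit value from a 22-bit high part and a 10-bit low part. The function names are illustrative; they model the instruction semantics and are not OpenSSL code.

```c
#include <stdint.h>

/* Signed 13-bit immediate, as in "add %o7, table-1b, %o4". */
static int fits_simm13(long offset)
{
    return offset >= -4096 && offset <= 4095;
}

/* %hi(x): the 22 bits sethi places in bits 31..10. */
static uint32_t sparc_hi22(uint32_t x) { return x >> 10; }

/* %lo(x): the remaining 10 low bits supplied by the following or/add. */
static uint32_t sparc_lo10(uint32_t x) { return x & 0x3ff; }

/* Emulate "sethi %hi(x), r; or r, %lo(x), r" for a 32-bit value. */
static uint32_t sethi_or(uint32_t x)
{
    uint32_t r = sparc_hi22(x) << 10;  /* sethi zeroes bits 9..0 */
    return r | sparc_lo10(x);          /* or puts them back      */
}
```

Once the distance to a table leaves the simm13 range, a single add can no longer encode it, and the two-instruction form is needed.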

 Well, vendor compiler doesn't define __PIC__,

We have OPENSSL_PIC, just use that.


Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-22 Thread Andy Polyakov

 About the RAS stack missing cost, every Sun produced UltraSPARC chip
 pushes unconditionally onto the RAS and does not special case the

   call    .+8

 pattern.

 Thinking about this logically, a RAS miss can (at best) perform like a
 full branch misprediction.  Which on UltraSPARC results in a full
 pipeline flush as the mis-predicted fetched instructions need to be
 cancelled and cleared out of the pipeline so we can begin executing
 down the correct path.

 This can be huge, depending upon the contents of the improperly
 fetched path of instructions.  In the worst possible case, up to 18
 instructions can need to be cancelled (UltraSPARC-I programmers
 manual, section 16.2.9, page 270)

I have to say that 18 instructions is very optimistic; such high IPC is
rather rare (it has to be the right mixture of integer, load and floating
point operations). One should rather think in cycles, not in number of
instructions. And even if the number of cycles can appear substantial, it
still has to be taken in perspective. For example, originally the AES code
looked like this:

1:  call    _sparcv9_AES_encrypt
    add     %o7,AES_Te-1b,%o4

By all means the most optimal, because it doesn't consume an extra entry of
the otherwise limited RAS. It didn't work with Purify, so it became

1:  call    .+8
    add     %o7,AES-1b,%o4
    call    _sparcv8_AES_encrypt

Performance difference? Less than -1% for speed aes-128-cbc, -2% for
speed -evp aes-128-ecb for small blocks and 0% for large blocks. On
UltraSPARC. And as mentioned, SPARC V [and beyond] handles call .+8
explicitly. So unless it's shown that T-SPARCs suffer greatly, I
wouldn't have a bad conscience using call .+8 under the circumstances. I
agree that it's not the most optimal, but at the same time there's no real
reason to feel bad about it.




Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-22 Thread David Miller
From: Andy Polyakov ap...@openssl.org
Date: Sat, 22 Sep 2012 21:04:47 +0200

 I agree that it's not the most optimal, but at the same time no real
 reason to feel bad about it.

But on the other hand I've done all the work to implement the macros
to do the PIC sequence properly.  You really don't have to implement
anything.


Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-22 Thread David Miller
From: David Miller da...@davemloft.net
Date: Sat, 22 Sep 2012 15:14:23 -0400 (EDT)

 From: Andy Polyakov ap...@openssl.org
 Date: Sat, 22 Sep 2012 21:04:47 +0200
 
 I agree that it's not the most optimal, but at the same time no real
 reason to feel bad about it.
 
 But on the other hand I've done all the work to implement the macros
 to do the PIC sequence properly.  You really don't have to implement
 anything.

BTW, two other points need restating:

1) My macros handle the non-PIC case optimally.

2) Your RAS corruption cost considerations are only considering
   the most immediate effect on the return from the assembler
   routine in question.

   Whereas the true RAS miss cost must be multiplied onto the
   next N functions up in the call chain, where N is the size
   of the RAS.  Since all of those will miss as well.
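A toy C model of the Return Address Stack argument, under loudly simplified assumptions: a circular RAS of RAS_DEPTH = 4 entries (the UltraSPARC figure quoted in this thread), every call pushes, every return pops and is "mispredicted" when the popped entry differs from the real return address, and overflow silently overwrites the oldest entry. Real pipelines have more detail than this; the names and addresses are made up.

```c
#include <assert.h>

/* Simplified circular Return Address Stack.  RAS_DEPTH = 4 is an
   assumption taken from the UltraSPARC figure quoted in the thread. */
#define RAS_DEPTH  4
#define MAX_FRAMES 64

enum pc_method {
    PC_NONE,            /* innermost routine needs no %pc            */
    PC_CALL_DOT_PLUS_8, /* "call .+8": push with no matching return  */
    PC_THUNK            /* call a thunk that returns via jmp %o7+8   */
};

struct ras {
    unsigned long entry[RAS_DEPTH];
    int top;                            /* next slot to write; wraps */
};

static void ras_push(struct ras *r, unsigned long ret_addr)
{
    r->entry[r->top] = ret_addr;        /* overwrites the oldest entry when full */
    r->top = (r->top + 1) % RAS_DEPTH;
}

static unsigned long ras_pop(struct ras *r)
{
    r->top = (r->top + RAS_DEPTH - 1) % RAS_DEPTH;
    return r->entry[r->top];
}

/* Build a call chain `depth` frames deep, let the innermost routine obtain
   %pc using `method`, then unwind.  Returns the number of returns whose
   predicted target (the popped RAS entry) differs from the real one. */
static int count_mispredicts(int depth, enum pc_method method)
{
    struct ras r = { {0}, 0 };
    unsigned long actual[MAX_FRAMES];
    int i, misses = 0;

    assert(depth > 0 && depth <= MAX_FRAMES);
    for (i = 0; i < depth; i++) {
        actual[i] = 0x10000UL + 8UL * (unsigned long)i;  /* made-up addresses */
        ras_push(&r, actual[i]);
    }

    if (method == PC_CALL_DOT_PLUS_8) {
        ras_push(&r, 0xd0000UL);        /* never consumed by a return */
    } else if (method == PC_THUNK) {
        ras_push(&r, 0xe0000UL);        /* call .Lpic_thunk           */
        if (ras_pop(&r) != 0xe0000UL)   /* thunk returns right away   */
            misses++;
    }

    for (i = depth - 1; i >= 0; i--)
        if (ras_pop(&r) != actual[i])
            misses++;
    return misses;
}
```

In this model a 3-deep chain unwinds cleanly with the thunk but mispredicts all 3 returns with `call .+8`, and for an 8-deep chain the unpaired push adds exactly RAS_DEPTH extra mispredictions on top of the ordinary overflow misses. Those extra misses land on the innermost RAS_DEPTH returns, which is where the two sides above differ: whether that depth reaches past the library into its callers.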


Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-22 Thread Andy Polyakov

 I agree that it's not the most optimal, but at the same time no real
 reason to feel bad about it.

 But on the other hand I've done all the work to implement the macros
 to do the PIC sequence properly.  You really don't have to implement
 anything.

 BTW, two other points need restating:

 1) My macros handle the non-PIC case optimally.

 2) Your RAS corruption cost considerations are only considering
    the most immediate effect on the return from the assembler
    routine in question.

    Whereas the true RAS miss cost must be multiplied onto the
    next N functions up in the call chain, where N is the size
    of the RAS.  Since all of those will miss as well.

N is 4 on UltraSPARC. For comparison, in the AES case the depth from
EVP_encrypt to the assembly code is 4, so the penalties don't spill onto
the caller. [Apparently we are talking about an obsolete platform, as I
measure no performance difference between the sequences depicted in my last
message on T4.] All I'm saying is that it doesn't have to be classified as
absolutely critical to fix. Basically, in the context I'd prefer not
to touch aes-sparcv9.pl and stick to the aesni approach as the only one,
i.e. keep T4 code as a separate module referred from EVP. It allows us to
concentrate on things that matter, optimizing specific modes'
performance. By extension it's the preferred approach even for other ciphers.



Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-22 Thread David Miller
From: Andy Polyakov ap...@openssl.org
Date: Sat, 22 Sep 2012 22:02:55 +0200

 Basically, in the context I'd prefer not to touch aes-sparcv9.pl and
 stick to aesni approach as the only one, i.e. keep T4 code as
 separate module referred from EVP. It allows to concentrate on
 things that matter, optimizing specific modes performance. By
 extension it's preferred approach even for other ciphers.

Too many pieces of code still call interfaces such as AES_*() directly
and do not go through EVP.

This I considered a show stopper, and an absolute failure of design
with the way the Oracle folks implemented crypto opcode support in
their openssl changes.

It is absolutely impractical to have an EVP only driver for this
stuff.

I consider supporting the old APIs a requirement.


Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-21 Thread Andy Polyakov
 Provide these so that the assembler users can be oblivious about
 whether this is PIC or non-PIC, 64-bit or 32-bit, etc.
 
 It is important to use a real call and return to implement the
 obtaining of the %pc as part of the PIC sequence.  Sequences
 such as:
 
   call    . + 8
   mov %o7, %PIC_REG
 
 are to be avoided at all costs on UltraSPARC cpus.  This is because
 such a sequence flushes the Return Address Stack (RAS) because the
 call is not paired with a return.
 
 Every time a call or jmpl with RD=%o7 is performed, the chip pushes
 the PC+8 onto the top of the RAS.  The next jmpl %o7 + 8 or return
 %i7 + 8 the chip sees will cause it to pop the top entry off the RAS
 and begin fetching down that path.  If there is a mis-match the entire
 pipeline is flushed and the chip restarts fetching down the correct
 path.
 
 Therefore, the above discouraged sequence will cause all of the RAS
 entries to mismatch and there will therefore be a full pipeline flush
 on every subsequent function return.

Well, the last time I looked into this I could establish the following. call .+8
was actually used by the vendor compiler [maybe not anymore, I don't know,
but in Sun days it was used extensively]. The SPARC V manual is explicit
about call .+8 *not* affecting the RAS. Purify was also discussed in this
context, and it actually recognizes the construct and treats it
specially. In other words it was considered a widely adopted practice and
it was found to be backed up by at least one hardware design. Penalties
were measured to be minimal on UltraSPARC, two additional cycles (in
comparison to 20 cycles for save and restore alone). But of course today the
situation might be different and T-SPARCs can suffer from it more...

I'll handle this, but differently. Specifically I won't go through GOT,
but directly to variable, something like this:

.Lretl:
retl
nop
...
sethi   %hi(var-.Lpic),%reg
.Lpic:  call    .Lretl
add %o7,%lo(var-.Lpic),%reg

This works with both Solaris and Linux toolchains and in both 32- and
64-bit mode (it was hell to get des_enc to work everywhere). In 64-bit mode
it implies that the shared library itself is limited to 2GB, but that's
considered a reasonable limitation. Avoiding the GOT allows hiding
OPENSSL_sparcv9cap_P with __attribute__((visibility("hidden")));. Now
it's static.

Once again, don't think about it any more, it will be taken care of.

As for OPENSSL_sparcv9cap_P itself. I'd rather extend it to int
OPENSSL_sparcv9cap_P[2] and save %cfr as is to  OPENSSL_sparcv9cap_P[1].
Any objections?



Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-21 Thread David Miller
From: Andy Polyakov ap...@openssl.org
Date: Fri, 21 Sep 2012 12:21:25 +0200

 I'll handle this, but differently. Specifically I won't go through GOT,
 but directly to variable, something like this:

I would like to politely request that you don't go down this road.

 .Lretl:
   retl
   nop
 ...
   sethi   %hi(var-.Lpic),%reg
 .Lpic:  call    .Lretl
   add %o7,%lo(var-.Lpic),%reg

I honestly think it's easiest to simply generate correct PIC
sequences, as my macros are trying to do.

We can add whatever ifdefs and code generation cases we need to
sparc_arch.h. The code that I'm emitting is identical to what GCC
generates on Linux and Solaris under Sparc regardless of which
assembler and linker are in use.

I should know, I wrote much of the sparc GCC backend.

If you describe to me what problems your scheme ran into, I can fix
them up.

Did you test if my code sequences work for you?  It is also important
to note that they are also specifically designed to be usable in leaf
functions.

BTW, the real long-term answer is to mark openssl internal symbols as
hidden and then use GOT_DATA optimization sequences which will get
rid of the GOT reference altogether.  But that requires some configure
checks to see if the assembler and linker support these constructs.

 As for OPENSSL_sparcv9cap_P itself. I'd rather extend it to int
 OPENSSL_sparcv9cap_P[2] and save %cfr as is to  OPENSSL_sparcv9cap_P[1].
 Any objections?

I think this is code masturbation at this early stage of the sparc
crypto opcode support implementation and is something we can clean up
later.


Re: [PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-21 Thread David Miller

Here is a more detailed reply specifically about generating
correct and optimal Sparc PIC sequences.

Let's get the non-PIC static case out of the way, we should
always use:

set     symbol, %reg                ! 32-bit
setx    symbol, %tmp_reg, %reg      ! 64-bit

Using calls to PIC stubs is completely pointless overhead when we are
doing a static build.
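For reference, the 64-bit `setx` synthetic instruction is commonly expanded into a six-instruction sethi/or/sllx sequence using the %hh, %hm, %lm and %lo operators. The C sketch below emulates one such expansion to show that it rebuilds the full address as straight-line code with no call; the exact instruction order an assembler picks may differ, and the helper names are made up.

```c
#include <stdint.h>

/* Relocation operators used by setx on a 64-bit address. */
static uint64_t op_hh(uint64_t x) { return x >> 42; }              /* bits 63..42 */
static uint64_t op_hm(uint64_t x) { return (x >> 32) & 0x3ff; }    /* bits 41..32 */
static uint64_t op_lm(uint64_t x) { return (x >> 10) & 0x3fffff; } /* bits 31..10 */
static uint64_t op_lo(uint64_t x) { return x & 0x3ff; }            /* bits  9..0  */

/* On V9, sethi writes imm22 into bits 31..10 and clears all other bits. */
static uint64_t emu_sethi(uint64_t imm22) { return (imm22 & 0x3fffff) << 10; }

/* Emulates:  sethi %hh(sym), tmp    ; or    tmp, %hm(sym), tmp
              sllx  tmp, 32, tmp     ; sethi %lm(sym), reg
              or    reg, %lo(sym), reg ; or  reg, tmp, reg          */
static uint64_t emu_setx(uint64_t sym)
{
    uint64_t tmp, reg;

    tmp = emu_sethi(op_hh(sym)) | op_hm(sym);  /* upper 32 bits, low-aligned */
    tmp <<= 32;                                /* sllx tmp, 32, tmp          */
    reg = emu_sethi(op_lm(sym)) | op_lo(sym);  /* lower 32 bits              */
    return reg | tmp;
}
```

The price of the 64-bit form is a scratch register and six instructions, but no stub call and no return-address traffic at all.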

If we are generating PIC we need a stub function, there are a lot of
ways to do this.  One scheme is to simply emit a stub in each source
file where the stub is needed.

If the assembler and linker support got-data optimizations, we can
emit the following sequence:

sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %PIC_REG
call    __sparc_pic_stub
 or %PIC_REG, %lo(_GLOBAL_OFFSET_TABLE_+4), %PIC_REG
sethi   %gdop_hix22(symbol), %TMP
xor %TMP, %gdop_lox10(symbol), %TMP
LDPTR   [%PIC_REG + %TMP], %REG, %gdop(symbol)

If the linker finds that the resolution of symbol (f.e. the symbol
is static to the compilation unit, or marked as 'hidden') can be done
at final link time, that LDPTR above will be optimized into:

add %PIC_REG, %TMP, %REG

The symbol offset will also be adjusted, as needed, in the %gdop_*()
sethi and xor instructions.  And finally, the reference to the global
offset table slot that would have been generated for 'symbol', will be
removed.
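A toy C model of what the relaxation buys, assuming nothing beyond what is described above: before relaxation, %TMP holds the offset of the symbol's GOT slot and LDPTR reads the address stored there; after relaxation, %TMP holds the symbol's offset from the GOT base and a plain add produces the same address without touching memory. The arrays and function names are invented for illustration and are not linker internals.

```c
#include <stdint.h>

static unsigned int toy_symbol;   /* stands in for "symbol"        */
static uintptr_t toy_got[4];      /* stands in for the GOT itself  */

/* Before relaxation: LDPTR [%PIC_REG + %TMP], %REG, where %TMP is the
   byte offset of symbol's GOT slot.  Costs a load from memory. */
static uintptr_t addr_via_got_load(uintptr_t pic_reg, uintptr_t slot_off)
{
    return *(const uintptr_t *)(pic_reg + slot_off);
}

/* After relaxation: add %PIC_REG, %TMP, %REG, where %TMP is now
   symbol - GOT base.  No load, and the GOT slot can be dropped. */
static uintptr_t addr_via_add(uintptr_t pic_reg, uintptr_t sym_off)
{
    return pic_reg + sym_off;
}
```

Both paths yield the same address; the relaxed one simply avoids the memory access and the GOT entry.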

Otherwise, if the linker and assembler lack gotdata optimization
support, we use just a plain PIC sequence:

sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %PIC_REG
call    __sparc_pic_stub
 or %PIC_REG, %lo(_GLOBAL_OFFSET_TABLE_+4), %PIC_REG
sethi   %hi(symbol), %TMP
or  %TMP, %lo(symbol), %TMP
LDPTR   [%PIC_REG + %TMP], %REG

If this doesn't work in some cases, we need to discover exactly
why instead of dismissing my approach completely.
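The `-4`/`+4` adjustments in these sequences can be checked with a little arithmetic. Assuming the usual SPARC ELF convention that %hi/%lo of `_GLOBAL_OFFSET_TABLE_` in this position become PC-relative relocations (value = target minus the address of the relocated instruction), the sethi at address x and the or in the delay slot at x+8 both encode GOT - (x+4), and adding %o7 (the address of the call, x+4) in the stub yields the GOT base. A 32-bit C sketch with illustrative names:

```c
#include <stdint.h>

/* PC-relative %hi/%lo: target minus the address p of the relocated insn. */
static uint32_t pc_hi22(uint32_t target, uint32_t p) { return (target - p) >> 10; }
static uint32_t pc_lo10(uint32_t target, uint32_t p) { return (target - p) & 0x3ff; }

/* x = address of the sethi; the call is at x+4, the or (delay slot) at x+8.
   Returns %PIC_REG after the stub executes "add %o7, %PIC_REG, %PIC_REG". */
static uint32_t pic_reg_after_stub(uint32_t got, uint32_t x)
{
    uint32_t reg, o7;

    reg = pc_hi22(got - 4, x) << 10;   /* sethi %hi(_GLOBAL_OFFSET_TABLE_-4)          */
    o7  = x + 4;                       /* call writes its own address into %o7        */
    reg |= pc_lo10(got + 4, x + 8);    /* or %lo(_GLOBAL_OFFSET_TABLE_+4), delay slot */
    return o7 + reg;                   /* add %o7, %PIC_REG, %PIC_REG in the stub     */
}
```

Without the adjustments the two halves would be computed relative to different instruction addresses and would not combine into a consistent offset from %o7.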

Now, of course, all of the above is for -fPIC, but I see no sparc
target (nor any target except one strange hpux case) that specifies
-fpic instead of -fPIC in Configure.

However that case is simple to accommodate as well, and I'd be happy to
do so in my macros.

About the RAS stack missing cost, every Sun produced UltraSPARC chip
pushes unconditionally onto the RAS and does not special case the

call    .+8

pattern.

Thinking about this logically, a RAS miss can (at best) perform like a
full branch misprediction.  Which on UltraSPARC results in a full
pipeline flush as the mis-predicted fetched instructions need to be
cancelled and cleared out of the pipeline so we can begin executing
down the correct path.

This can be huge, depending upon the contents of the improperly
fetched path of instructions.  In the worst possible case, up to 18
instructions can need to be cancelled (UltraSPARC-I programmers
manual, section 16.2.9, page 270)

Worse than the immediate cost of the RAS corruption, is that every
subsequent function return out of openssl is going to miss the RAS
and incur the penalty as well.

I consider it absolutely critical that the PIC sequences support being
used in leaf functions, without save and restore instructions.  And my
macros have been designed with this in mind.

When used, one need not allocate a register window merely for the sake
of performing a PIC sequence.
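The reason the _LEAF variants take an extra scratch register is that the call to the stub overwrites %o7, which in a leaf routine (no save) still holds the caller's return address. A toy C model of the mov/call/mov sequence in SPARC_SETUP_PIC_REG_LEAF, with made-up addresses and hypothetical names:

```c
#include <stdint.h>

/* Only the registers the leaf sequence touches. */
struct leaf_regs {
    uint32_t o7;    /* return address the leaf routine's retl will use */
    uint32_t tmp;   /* the scratch register passed as "tmp"            */
};

/* Returns where "retl" (jmpl %o7 + 8) jumps once the PIC setup is done.
   caller_o7 is %o7 on entry; call_addr is the address of the stub call. */
static uint32_t leaf_retl_target(uint32_t caller_o7, uint32_t call_addr,
                                 int preserve_o7)
{
    struct leaf_regs r = { caller_o7, 0 };

    if (preserve_o7)
        r.tmp = r.o7;       /* mov %o7, %tmp                        */
    r.o7 = call_addr;       /* call .Lpic_thunk: %o7 = call address */
    if (preserve_o7)
        r.o7 = r.tmp;       /* mov %tmp, %o7                        */
    return r.o7 + 8;        /* retl                                 */
}
```

Without the save and restore of %o7 the leaf would "return" just past its own stub call. The non-leaf macros can skip this only because save has already moved the caller's return address into %i7.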

When we get past these initial patches and I post my DES work, you
will see that I adjusted des_enc.m4 to use the new PIC interfaces I
created.  In fact I had to, because the 13-bit relocations used there
no longer fit with the crypto opcode code added.

There are other problems in des_enc.m4, which I have fixed in my
patches.  As just one other example, it doesn't include opensslconf.h
and therefore OPENSSL_SYSNAME_ULTRASPARC is never defined and the V9
sequences are never used for 32-bit, which hurts performance.

Only one valid set of CPP tests exists for the various cases we care
about on sparc.  __PIC__ means PIC code generation is in use.
__arch64__ means 64-bit code generation, and __sparc_v9__ means V9
code can be used.  These are fully standardized and both SunPRO and
GCC set them consistently.


[PATCH 4/7] sparc: Add assembler macros for loading OPENSSL_sparcv9cap_P into a register.

2012-09-19 Thread David Miller

Provide these so that the assembler users can be oblivious about
whether this is PIC or non-PIC, 64-bit or 32-bit, etc.

It is important to use a real call and return to implement the
obtaining of the %pc as part of the PIC sequence.  Sequences
such as:

call    . + 8
mov %o7, %PIC_REG

are to be avoided at all costs on UltraSPARC cpus.  This is because
such a sequence flushes the Return Address Stack (RAS) because the
call is not paired with a return.

Every time a call or jmpl with RD=%o7 is performed, the chip pushes
the PC+8 onto the top of the RAS.  The next jmpl %o7 + 8 or return
%i7 + 8 the chip sees will cause it to pop the top entry off the RAS
and begin fetching down that path.  If there is a mis-match the entire
pipeline is flushed and the chip restarts fetching down the correct
path.

Therefore, the above discouraged sequence will cause all of the RAS
entries to mismatch and there will therefore be a full pipeline flush
on every subsequent function return.

It is also highly discouraged to use rd %pc, %PIC_REG because that
is extremely slow on UltraSPARC cpus.  The cost of a RDPC instruction
amounts essentially to a pipeline flush.

Signed-off-by: David S. Miller da...@davemloft.net
---
 crypto/sparc_arch.h |   70 +++
 1 file changed, 70 insertions(+)

diff --git a/crypto/sparc_arch.h b/crypto/sparc_arch.h
index 3ece96a..bcb4829 100644
--- a/crypto/sparc_arch.h
+++ b/crypto/sparc_arch.h
@@ -25,4 +25,74 @@ extern int OPENSSL_sparcv9cap_P;
 #define SPARCV9_MONTSQR        (1<<17)
 #define SPARCV9_CRC32C         (1<<18)
 
+#if __ASSEMBLER__
+
+#ifdef __PIC__
+#define SPARC_PIC_THUNK(reg)   \
+   .align  32; \
+.Lpic_thunk:   \
+   jmp %o7 + 8;\
+    add    %o7, %##reg, %##reg;
+#else
+#define SPARC_PIC_THUNK(reg)
+#endif
+
+#define SPARC_PIC_THUNK_CALL(reg)  \
+   sethi   %hi(_GLOBAL_OFFSET_TABLE_-4), %##reg;   \
+   call    .Lpic_thunk;    \
+or %##reg, %lo(_GLOBAL_OFFSET_TABLE_+4), %##reg;
+
+#define SPARC_SETUP_PIC_REG(reg)   \
+   SPARC_PIC_THUNK_CALL(reg)
+
+#define SPARC_SETUP_PIC_REG_LEAF(reg, tmp) \
+   mov %o7, %##tmp;\
+   SPARC_PIC_THUNK_CALL(reg);  \
+   mov %##tmp, %o7;
+
+#ifdef __arch64__
+#define LDPTR  ldx
+#else
+#define LDPTR  ld
 #endif
+
+#ifdef __PIC__
+
+#define SPARC_LOAD_ADDRESS(SYM, reg, tmp)  \
+   SPARC_SETUP_PIC_REG(reg);   \
+   sethi   %hi(SYM), %##tmp;   \
+   or  %##tmp, %lo(SYM), %##tmp;   \
+   LDPTR   [%##reg + %##tmp], %##reg;
+
+#define SPARC_LOAD_ADDRESS_LEAF(SYM, reg, tmp) \
+   SPARC_SETUP_PIC_REG_LEAF(reg, tmp); \
+   sethi   %hi(SYM), %##tmp;   \
+   or  %##tmp, %lo(SYM), %##tmp;   \
+   LDPTR   [%##reg + %##tmp], %##reg;
+
+#else
+
+#ifdef __arch64__
+#define SPARC_LOAD_ADDRESS(SYM, reg, tmp)  \
+   setx    SYM, %##tmp, %##reg;
+#else
+#define SPARC_LOAD_ADDRESS(SYM, reg, tmp)  \
+   set SYM, %##reg;
+#endif
+
+#define SPARC_LOAD_ADDRESS_LEAF(SYM, reg, tmp) \
+   SPARC_LOAD_ADDRESS(SYM, reg, tmp)
+
+#endif
+
+#define SPARC_LOAD_V9_CAPS(reg, tmp)   \
+   SPARC_LOAD_ADDRESS(OPENSSL_sparcv9cap_P, reg, tmp); \
+   ld  [%##reg], %##reg;
+
+#define SPARC_LOAD_V9_CAPS_LEAF(reg, tmp)  \
+   SPARC_LOAD_ADDRESS_LEAF(OPENSSL_sparcv9cap_P, reg, tmp);\
+   ld  [%##reg], %##reg;
+
+#endif /* __ASSEMBLER__ */
+
+#endif /* __SPARC_ARCH_H__ */
-- 
1.7.10.4
