>> Would it handle [the sub from %o7 being in the delay slot of the call]?
>
> Good idea, but no. This will fail regardless of whether the offset is the
> same as the target of the call. The reason is that this is still using an
> interprocedural trick to pass the called function its own address.
If label1 and label2 are the same, then yes, it's about nothing but
passing the caller's own address. However! If the labels are not the
same, *and* label2 is not referred to relative to the caller's entry
point, then it doesn't depend on any inter-procedural
context/knowledge/trick, it's pure callee domain. For reference, that's
how it's used in the aes-sparcv9 module: caller does not modify the sub
instruction's target register, but uses it exclusively as a base
register to index the S-box. In particular:
1:	call	_sparcv9_AES_encrypt
	sub	%o7,1b-AES_Te,%o4
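To spell out the arithmetic, here is a minimal sketch in Python rather
than SPARC code. The function name and all addresses are made up for
illustration, and it assumes the assembler folds 1b-AES_Te into a
link-time constant. The call leaves its own address in %o7, and the sub
in the delay slot subtracts that constant, so %o4 ends up pointing at
AES_Te wherever the module happens to be loaded.

```python
# Model of "1: call f" followed by "sub %o7,1b-AES_Te,%o4" in the delay slot.
# All names and offsets here are hypothetical, chosen only for illustration.

def table_address(load_base, call_offset, table_offset):
    """Return what %o4 holds once the delay-slot sub has executed."""
    o7 = load_base + call_offset             # call stores its own address in %o7
    link_const = call_offset - table_offset  # 1b-AES_Te, fixed at assembly time
    return o7 - link_const                   # sub %o7,1b-AES_Te,%o4


# The result tracks the load address, not the call target.
for base in (0x10000, 0x7F000000):
    assert table_address(base, 0x2A00, 0x2000) == base + 0x2000
```

Note that the target of the call never enters the calculation, only the
address of the call instruction itself.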
> Except
> for situations we manage heuristically, we don't carry knowledge like
> "register X contains a certain code address" from caller to callee. That
> isn't to say we couldn't handle it, but we don't.
But does it do *anything* about the suggested pair of instructions? I
mean, when Purify sees such a pair of instructions, does it [attempt to]
fix the sub up, or does it simply skip it [because it's not part of the
call .+8 pattern]? Once again, there are situations when you don't have
to pass the above-mentioned knowledge to caller, e.g. imagine that the
caller didn't use or modify %o4 and the callee wanted it for its *own*
purposes.
> So your patch and the
> new code in aes_sparcv9.pl will also kill today's Purify.
Well, I normally don't accept "will" for an answer, only "do[es]." I
mean, "will" implies that you haven't actually tested, right? But having
learned the gory details, I can accept that .PIC.me.up in des_enc has a
fair chance of failing (because of the .des_and-.PIC.me.up
self-reference), but what about aes_sparcv9? Could you actually test?
Please? Also test the attached patch, which removes .PIC.me.up from
des_enc. The patch is relative to
http://cvs.openssl.org/fileview?f=openssl/crypto/des/asm/des_enc.m4&v=1.9.
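For what it's worth, the replacement sequence in the attached patch
boils down to the following arithmetic, sketched here in Python with
made-up offsets. The function name and the numbers are illustrative
only, and the sketch assumes the link-time differences are small
positive values. sethi/or build the distance from label 1 to
.PIC.DES_SPtrans, call .+8 drops the address of label 1 into %o7, the
add in the delay slot yields the run-time address of .PIC.DES_SPtrans,
and the final sub steps back to .des_and.

```python
# Model of the five-instruction sequence from the patch.
# Offsets are hypothetical positions within the module, not real values.

def pic_setup(load_base, label1_off, sptrans_off, des_and_off):
    """Return (global1, out2) as left by the sequence."""
    global1 = sptrans_off - label1_off   # sethi %hi(..)/or %lo(..): .PIC.DES_SPtrans-1f
    o7 = load_base + label1_off          # 1: call .+8 stores its own address in %o7
    global1 = o7 + global1               # add %o7,global1,global1 (delay slot)
    out2 = global1 - (sptrans_off - des_and_off)  # sub global1,.PIC.DES_SPtrans-.des_and,out2
    return global1, out2


g1, o2 = pic_setup(0x40000, 0x1200, 0x3000, 0x2F00)
assert g1 == 0x40000 + 0x3000   # run-time address of .PIC.DES_SPtrans
assert o2 == 0x40000 + 0x2F00   # run-time address of .des_and
```

Everything is computed within one function, from the address of its own
call instruction, so no register has to carry a code address across a
procedure boundary.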
> Now for some more bad news, separate from the call8 question:
>
> Further digging in des_enc.m4 revealed another problem. Besides the actual
> instructions in .PIC.me.up there is something else Purify doesn't notice
> that it should patch. The data item at .PIC.DES_SPtrans is the 32-bit
> offset from its own location (in .text) to DES_SPtrans (in .rodata).
> Because of code movement, Purify needs to patch this data item, but
> doesn't notice that it should. (Once again, this is a nonstandard way for
> a program to get the address of a data item in a position-independent
> way.)
??? The latest version of des_enc.m4 does not have any offset data.
There *was* %r_disp32(DES_SPtrans) in an *earlier* version, but as long
as we're discussing modifications, we should concentrate on the latest
version. I mean, why would we look at an earlier version if the modified
latest one would replace it anyway?
> I completely understand the desire to optimize all this PIC nonsense away:
> the streamlined code currently in des_enc.m4 is much shorter and cleaner.
It was not really about optimization. It was more about getting the code
working with all assemblers and in both 32- and 64-bit contexts. I mean,
the actual reasons for the PIC-related modifications were compile
failures.
> I'm not saying that anything about des_enc.m4 is bad or wrong, just that
> Purify doesn't recognize it.
>
> My best idea so far is that you should write a C function that does what
> .PIC.me.up does, then compile it to assembly (twice, for 32-bit and
> 64-bit) and paste the assembly (with minimal changes) into des_enc.m4.
> This way you know the instruction pattern is exactly as Purify would see
> from the compiler.
As implied above, this was found to be error-prone in a multi-platform
context. Most notably, code that the Solaris assembler accepted could
not be assembled by the Linux one. The platform- and ABI-neutral
.PIC.me.up was introduced in order to avoid even further segmentation
with #ifdef THIS_OR_THAT and collateral changes to the Makefile.
> (The code under #ifdef OPENSSL_PIC almost does this,
> but not quite - it's still nonstandard.)
Again, you must be looking at an old version... The latest version has
neither #ifdef OPENSSL_PIC nor #ifdef ABI64. And that's the way I would
prefer to keep it:-)
> If you can not absorb the additional cost of being well-behaved from
> Purify's perspective, there are more extreme ideas. One is for you to ship
> a de-optimized version of the library (built with the "no-asm" parameter
> to Configure) for your users to use when they want to use PurifyPlus. Of
> course people can also build this themselves from source, once they learn
> they must.
OpenSSL does not ship any binaries to any users, and compile options are
the users' or their binary vendors' choice. Formally speaking, I could
dismiss this discussion by referring to the SUPPORT paragraph:
"If you have *any* problems with OpenSSL then please take the following
steps first:
...
- Remove ASM versions of libraries
...
"
Emphasis is mine. I'm *not* saying that I intend to withdraw, I'm only
saying that the suggestion about shipping binaries is misplaced:-)

A.
--- des_enc.m4.orig 2005-12-15 23:55:16.000000000 +0100
+++ des_enc.m4 2009-03-13 10:45:34.000000000 +0100
@@ -1180,8 +1180,11 @@
save %sp, FRAME, %sp
- call .PIC.me.up
- mov .PIC.me.up-(.-4),out0
+ sethi %hi(.PIC.DES_SPtrans-1f),global1
+ or global1,%lo(.PIC.DES_SPtrans-1f),global1
+1: call .+8
+ add %o7,global1,global1
+ sub global1,.PIC.DES_SPtrans-.des_and,out2
ld [in0], in5 ! left
cmp in2, 0 ! enc
@@ -1238,8 +1241,11 @@
save %sp, FRAME, %sp
- call .PIC.me.up
- mov .PIC.me.up-(.-4),out0
+ sethi %hi(.PIC.DES_SPtrans-1f),global1
+ or global1,%lo(.PIC.DES_SPtrans-1f),global1
+1: call .+8
+ add %o7,global1,global1
+ sub global1,.PIC.DES_SPtrans-.des_and,out2
! Set sbox address 1 to 6 and rotate halfs 3 left
! Errors caught by destest? Yes. Still? *NO*
@@ -1353,8 +1359,11 @@
save %sp, FRAME, %sp
- call .PIC.me.up
- mov .PIC.me.up-(.-4),out0
+ sethi %hi(.PIC.DES_SPtrans-1f),global1
+ or global1,%lo(.PIC.DES_SPtrans-1f),global1
+1: call .+8
+ add %o7,global1,global1
+ sub global1,.PIC.DES_SPtrans-.des_and,out2
ld [in0], in5 ! left
add in2, 120, in4 ! ks2
@@ -1395,8 +1404,11 @@
save %sp, FRAME, %sp
- call .PIC.me.up
- mov .PIC.me.up-(.-4),out0
+ sethi %hi(.PIC.DES_SPtrans-1f),global1
+ or global1,%lo(.PIC.DES_SPtrans-1f),global1
+1: call .+8
+ add %o7,global1,global1
+ sub global1,.PIC.DES_SPtrans-.des_and,out2
ld [in0], in5 ! left
add in3, 120, in4 ! ks3
@@ -1425,19 +1437,6 @@
.DES_decrypt3.end:
.size DES_decrypt3,.DES_decrypt3.end-DES_decrypt3
-! input: out0 offset between .PIC.me.up and caller
-! output: out0 pointer to .PIC.me.up
-! out2 pointer to .des_and
-! global1 pointer to DES_SPtrans
- .align 32
-.PIC.me.up:
- add out0,%o7,out0 ! pointer to .PIC.me.up
- sethi %hi(.des_and-.PIC.me.up),out2
- or out2,%lo(.des_and-.PIC.me.up),out2
- add out0,out2,out2
- retl
- add out2,.PIC.DES_SPtrans-.des_and,global1
-
! void DES_ncbc_encrypt(input, output, length, schedule, ivec, enc)
! *****************************************************************
@@ -1454,8 +1453,11 @@
define({OUTPUT}, { [%sp+BIAS+ARG0+1*ARGSZ] })
define({IVEC}, { [%sp+BIAS+ARG0+4*ARGSZ] })
- call .PIC.me.up
- mov .PIC.me.up-(.-4),out0
+ sethi %hi(.PIC.DES_SPtrans-1f),global1
+ or global1,%lo(.PIC.DES_SPtrans-1f),global1
+1: call .+8
+ add %o7,global1,global1
+ sub global1,.PIC.DES_SPtrans-.des_and,out2
cmp in5, 0 ! enc
@@ -1676,8 +1678,11 @@
define({KS2}, { [%sp+BIAS+ARG0+4*ARGSZ] })
define({KS3}, { [%sp+BIAS+ARG0+5*ARGSZ] })
- call .PIC.me.up
- mov .PIC.me.up-(.-4),out0
+ sethi %hi(.PIC.DES_SPtrans-1f),global1
+ or global1,%lo(.PIC.DES_SPtrans-1f),global1
+1: call .+8
+ add %o7,global1,global1
+ sub global1,.PIC.DES_SPtrans-.des_and,out2
LDPTR [%fp+BIAS+ARG0+7*ARGSZ], local3 ! enc
LDPTR [%fp+BIAS+ARG0+6*ARGSZ], local4 ! ivec