> Provide these so that the assembler users can be oblivious about
> whether this is PIC or non-PIC, 64-bit or 32-bit, etc.
> 
> It is important to use a real call and return to implement the
> obtaining of the %pc as part of the PIC sequence.  Sequences
> such as:
> 
>       call    . + 8
>       mov     %o7, %PIC_REG
> 
> are to be avoided at all costs on UltraSPARC cpus.  This is because
> such a sequence flushes the Return Address Stack (RAS) because the
> call is not paired with a return.
> 
> Every time a call or jmpl with RD=%o7 is performed, the chip pushes
> the PC+8 onto the top of the RAS.  The next "jmpl %o7 + 8" or "return
> %i7 + 8" the chip sees will cause it to pop the top entry off the RAS
> and begin fetching down that path.  If there is a mis-match the entire
> pipeline is flushed and the chip restarts fetching down the correct
> path.
> 
> Therefore, the above discouraged sequence will cause all of the RAS
> entries to mismatch and there will therefore be a full pipeline flush
> on every subsequent function return.

Well, last time I looked into this I could establish the following. call .+8
was actually used by the vendor compiler [maybe not anymore, I don't know,
but in the Sun days it was used extensively]. The SPARC V manual is explicit
about call .+8 *not* affecting the RAS. Purify was also discussed in this
context, and it actually recognizes the construct and treats it
specially. In other words, it was considered widely adopted practice and
it was found to be backed by at least one hardware design. The penalty
was measured to be minimal on UltraSPARC: two additional cycles (compared
to 20 cycles for save and restore alone). But of course today the
situation might be different, and T-SPARCs might suffer from it more...
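
To make the disagreement concrete, here is a toy C model of the RAS
behaviour described in the quoted text. It is only a sketch: RAS_DEPTH,
the addresses and all function names are made up for illustration and
are not taken from any manual. The `unpaired` flag models a chip that
does push an entry for call .+8; a design that special-cases the
construct corresponds to unpaired = 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAS_DEPTH 4              /* hypothetical depth, for illustration only */

struct ras {
    uint64_t entry[RAS_DEPTH];   /* predicted return addresses */
    size_t   top;                /* number of valid entries */
    unsigned mispredicts;        /* returns whose prediction was wrong */
};

/* call (or jmpl with rd=%o7) at address pc: push pc+8. */
static void ras_call(struct ras *r, uint64_t pc)
{
    if (r->top == RAS_DEPTH) {   /* stack full: drop the oldest entry */
        for (size_t i = 1; i < RAS_DEPTH; i++)
            r->entry[i - 1] = r->entry[i];
        r->top--;
    }
    r->entry[r->top++] = pc + 8;
}

/* Return to `target`: pop the prediction and compare with the real target. */
static void ras_return(struct ras *r, uint64_t target)
{
    uint64_t predicted = r->top ? r->entry[--r->top] : 0;
    if (predicted != target)
        r->mispredicts++;        /* in the quoted text: full pipeline flush */
}

/* Nest `depth` ordinary calls, optionally execute one unpaired call .+8
 * in the innermost function, then return from every level.
 * Returns the number of mispredicted returns. */
static unsigned simulate(unsigned depth, int unpaired)
{
    struct ras r = {0};
    assert(depth < RAS_DEPTH);   /* leave room for the unpaired entry */

    for (unsigned i = 0; i < depth; i++)
        ras_call(&r, 0x1000 + 0x100 * i);
    if (unpaired)
        ras_call(&r, 0x9000);    /* call .+8: pushed, never popped by its own return */
    for (unsigned i = depth; i-- > 0; )
        ras_return(&r, 0x1000 + 0x100 * i + 8);
    return r.mispredicts;
}
```

In this model simulate(3, 0) reports no mispredicted returns, while
simulate(3, 1) mispredicts all three. Whether real hardware behaves like
the second case is exactly the open question.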

I'll handle this, but differently. Specifically, I won't go through the GOT,
but directly to the variable, something like this:

.Lretl:
        retl
        nop
...
        sethi   %hi(var-.Lpic),%reg
.Lpic:  call    .Lretl
        add     %o7,%lo(var-.Lpic),%reg

This works with both Solaris and Linux toolchains and in both 32- and
64-bit mode (it was hell to get des_enc to work everywhere). In 64-bit mode
it implies that the shared library itself is limited to 2GB, but that's
considered a reasonable limitation. Avoiding the GOT allows hiding
OPENSSL_sparcv9cap_P with __attribute__((visibility("hidden"))). Now
it's static.
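
For anyone who wants to check the arithmetic behind the pc-relative
approach, here is a small C sketch. It assumes the usual SPARC split of a
32-bit value into %hi (upper 22 bits, as loaded by sethi) and %lo (lower
10 bits). The function names are hypothetical, and the sign extension of
the 32-bit displacement is my assumption about where a 2GB limit would
come from, not a statement about the final code.

```c
#include <assert.h>
#include <stdint.h>

/* %hi(x): upper 22 bits of a 32-bit value, the part sethi loads. */
static uint32_t sparc_hi(uint32_t x) { return x >> 10; }

/* %lo(x): lower 10 bits of a 32-bit value. */
static uint32_t sparc_lo(uint32_t x) { return x & 0x3ff; }

/* Rebuild the run-time address of `var` from the address of the call
 * instruction (what ends up in %o7) and the link-time constant var - pc.
 * Both parameters are plain numbers here; nothing is dereferenced. */
static uint64_t pic_address(uint64_t pc, uint64_t var)
{
    int64_t delta = (int64_t)(var - pc);

    /* The displacement has to fit in a signed 32-bit value (assumed limit). */
    assert(delta >= INT32_MIN && delta <= INT32_MAX);

    uint32_t d32     = (uint32_t)delta;
    uint32_t rebuilt = (sparc_hi(d32) << 10) | sparc_lo(d32);

    /* Sign-extend the 32-bit displacement and add it to the pc. */
    return pc + (uint64_t)(int64_t)(int32_t)rebuilt;
}
```

For example, pic_address(0x10000, 0x12345) gives back 0x12345, and a
variable placed below the call site (negative displacement) is recovered
as well, as long as the distance stays within the signed 32-bit range.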

Once again, don't think about it any more, it will be taken care of.

As for OPENSSL_sparcv9cap_P itself: I'd rather extend it to int
OPENSSL_sparcv9cap_P[2] and save %cfr as is to OPENSSL_sparcv9cap_P[1].
Any objections?
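
Roughly what I have in mind on the C side, as a sketch only. The two
helper functions and their names are hypothetical and just show how the
second word would be used. I use unsigned int here since the words are
bit masks; that choice is an assumption too. The visibility attribute is
the GCC/Clang spelling.

```c
#include <assert.h>

/* Hidden visibility keeps the symbol out of the library's exported
 * interface. Word 0 holds the capability flags, word 1 the raw %cfr. */
__attribute__((visibility("hidden")))
unsigned int OPENSSL_sparcv9cap_P[2];

/* Hypothetical helper: record the flags and the %cfr value as is. */
static void sparcv9cap_store(unsigned int flags, unsigned int cfr)
{
    OPENSSL_sparcv9cap_P[0] = flags;
    OPENSSL_sparcv9cap_P[1] = cfr;
}

/* Hypothetical helper: test one or more bits of the saved %cfr value. */
static int sparcv9cap_cfr_test(unsigned int mask)
{
    assert(mask != 0);           /* an empty mask would always report 0 */
    return (OPENSSL_sparcv9cap_P[1] & mask) != 0;
}
```

Assembler modules would then load OPENSSL_sparcv9cap_P[1] and test the
bits they care about directly, without any translation in between.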
