> Provide these so that the assembler users can be oblivious about
> whether this is PIC or non-PIC, 64-bit or 32-bit, etc.
>
> It is important to use a real call and return to implement the
> obtaining of the %pc as part of the PIC sequence. Sequences
> such as:
>
>	call	. + 8
>	mov	%o7, %PIC_REG
>
> are to be avoided at all costs on UltraSPARC cpus. This is because
> such a sequence flushes the Return Address Stack (RAS) because the
> call is not paired with a return.
>
> Every time a call or jmpl with RD=%o7 is performed, the chip pushes
> the PC+8 onto the top of the RAS. The next "jmpl %o7 + 8" or "return
> %i7 + 8" the chip sees will cause it to pop the top entry off the RAS
> and begin fetching down that path. If there is a mis-match the entire
> pipeline is flushed and the chip restarts fetching down the correct
> path.
>
> Therefore, the above discouraged sequence will cause all of the RAS
> entries to mismatch and there will therefore be a full pipeline flush
> on every subsequent function return.

Well, the last time I looked into this I could establish the following.
call .+8 was actually used by the vendor compiler [maybe not anymore, I
don't know, but in the Sun days it was used extensively]. The SPARC V
manual is explicit about call .+8 *not* affecting the RAS. Purify was
also discussed in this context, and it actually recognizes the construct
and treats it specially. In other words, it was considered a widely
adopted practice and it was found to be backed by at least one hardware
design. The penalty was measured to be minimal on UltraSPARC: two
additional cycles (compared with 20 cycles for save and restore alone).
But of course today the situation might be different, and T-SPARCs may
suffer from it more...

I'll handle this, but differently. Specifically, I won't go through the
GOT, but directly to the variable, something like this:

.Lretl:	retl
	nop
	...
	sethi	%hi(var-.Lpic),%reg
.Lpic:	call	.Lretl
	add	%o7,%lo(var-.Lpic),%reg

This works with both the Solaris and Linux toolchains and in both 32-
and 64-bit mode (it was hell to get des_enc to work everywhere). In
64-bit mode it implies that the shared library itself is limited to 2GB,
but that is considered a reasonable limitation. Avoiding the GOT makes it
possible to hide OPENSSL_sparcv9cap_P with
__attribute__((visibility("hidden")));. Now it's static. Once again,
don't worry about it any more, it will be taken care of.

As for OPENSSL_sparcv9cap_P itself: I'd rather extend it to
int OPENSSL_sparcv9cap_P[2] and save %cfr as is to
OPENSSL_sparcv9cap_P[1]. Any objections?
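
To make the proposed layout concrete, here is a minimal C sketch, not the
actual patch. It assumes a GCC-compatible compiler for the visibility
attribute. The names HIDDEN, EXAMPLE_CFR_BIT, example_store_cfr and
example_has_feature are hypothetical and exist only for illustration, the
bit value is not taken from any manual, and the array is declared unsigned
here only to keep the bit tests clean. Reading %cfr itself needs assembler
and is left out; the value is simply passed in.

```c
/* Hidden visibility keeps the symbol out of the dynamic symbol table,
 * so PIC code can address it PC-relatively instead of via the GOT. */
#if defined(__GNUC__)
# define HIDDEN __attribute__((visibility("hidden")))
#else
# define HIDDEN /* assumption: no equivalent attribute available */
#endif

/* [0]: the existing capability flags; [1]: %cfr stored as is. */
HIDDEN unsigned int OPENSSL_sparcv9cap_P[2];

/* Hypothetical feature bit, for illustration only. */
#define EXAMPLE_CFR_BIT 0x1u

/* Hypothetical helper: record a %cfr value obtained elsewhere
 * (for example by an assembler probe). Only the low 32 bits are
 * kept, which is an assumption of this sketch. */
void example_store_cfr(unsigned long cfr)
{
    OPENSSL_sparcv9cap_P[1] = (unsigned int)cfr;
}

/* Hypothetical helper: test one bit of the saved %cfr word. */
int example_has_feature(unsigned int bit)
{
    return (OPENSSL_sparcv9cap_P[1] & bit) != 0;
}
```

A caller would run the probe once at start-up, call example_store_cfr()
with the result, and then select a code path with
example_has_feature(EXAMPLE_CFR_BIT). Word [0] is untouched, so existing
users of the flags keep working.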