About the cost of a RAS miss: every Sun-produced UltraSPARC chip
pushes unconditionally onto the RAS and does not special-case the
call .+8 pattern.
Thinking about this logically, a RAS miss can (at best) perform like a
full branch misprediction, which on UltraSPARC results in a full
pipeline flush: the mispredicted, already fetched instructions need to
be cancelled and cleared out of the pipeline before execution can begin
down the correct path.
This can be huge, depending upon the contents of the improperly
fetched path of instructions. In the worst possible case, up to 18
instructions may need to be cancelled (UltraSPARC-I programmers
manual, section 16.2.9, page 270).
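
To make the mechanism concrete, here is a toy model of a return address
stack that pushes on every call, including call .+8. It is only a sketch:
the stack depth and all addresses are made-up assumptions, not values
taken from any UltraSPARC manual, and the model counts mispredicted
returns rather than cycles.

```python
from collections import deque


class ReturnAddressStack:
    """Toy RAS: every call pushes, every return pops and predicts."""

    def __init__(self, depth=4):  # depth is an assumption, not a datasheet value
        self.entries = deque(maxlen=depth)  # oldest entry is dropped on overflow
        self.misses = 0

    def call(self, pc):
        # Unconditional push of the predicted return target (pc + 8 on SPARC),
        # with no special case for "call .+8".
        self.entries.append(pc + 8)

    def ret(self, actual_target):
        predicted = self.entries.pop() if self.entries else None
        if predicted != actual_target:
            self.misses += 1


def run(use_call_dot_plus_8):
    ras = ReturnAddressStack()
    # Hypothetical addresses: main calls outer from 0x1000, outer calls a
    # helper from 0x2010.
    ras.call(0x1000)          # main -> outer
    if use_call_dot_plus_8:
        ras.call(0x2000)      # call .+8: pushes 0x2008, but nothing returns there
    ras.call(0x2010)          # outer -> helper
    ras.ret(0x2018)           # helper returns: top of stack matches
    ras.ret(0x1008)           # outer returns: a stale entry, if any, is popped first
    return ras.misses
```

In this model run(False) reports no mispredicted returns, while run(True)
reports one: the helper's return is still predicted correctly, and it is
the enclosing function's return that pays for the stale entry.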
I have to say that 18 instructions is very optimistic: IPC that high is
rather rare (it takes the right mixture of integer, load and
floating-point operations). One should rather think in cycles, not in
number of instructions. And even if the number of cycles can appear
substantial, it still has to be taken in perspective. For example, the
AES code originally looked like this:
1:      call    _sparcv9_AES_encrypt
        add     %o7,AES_Te-1b,%o4
This is by all means the optimal form, because it doesn't consume an
extra entry in the already limited RAS. It didn't work with Purify, so
it became
1:      call    .+8
        add     %o7,AES_Te-1b,%o4
        call    _sparcv9_AES_encrypt
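
For readers less used to the idiom: in both variants the call leaves its
own address (label 1) in %o7, and the add in the delay slot adds the
assembly-time constant AES_Te-1b to it, so %o4 ends up holding the
run-time address of AES_Te wherever the code is loaded. A small sketch of
that arithmetic, with hypothetical offsets and load bases chosen only for
illustration:

```python
def table_address(load_base, label_offset, table_offset):
    """Model 'call .+8' followed by 'add %o7,AES_Te-1b,%o4'."""
    o7 = load_base + label_offset        # call writes its own address to %o7
    delta = table_offset - label_offset  # AES_Te-1b, fixed at assembly time
    return o7 + delta                    # value left in %o4


# Hypothetical offsets within the object: label 1 at 0x400, AES_Te at 0x1000.
for base in (0x10000, 0x7F000000):
    assert table_address(base, 0x400, 0x1000) == base + 0x1000
```

The result follows the load base because only the distance between the
label and the table is baked into the instruction.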
Performance difference? Less than 1% slower for speed aes-128-cbc, 2%
slower for speed -evp aes-128-ecb with small blocks, and 0% with large
blocks. On UltraSPARC. And as mentioned SPARC V [and beyond] handles
call .+8 explicitly. So unless it's shown that T-SPARCs suffer greatly,
I wouldn't have a bad conscience about using call .+8 under the
circumstances. I agree that it's not optimal, but at the same time
there is no real reason to feel bad about it.
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List openssl-dev@openssl.org
Automated List Manager majord...@openssl.org