> And the only difference would be to relax pattern recognition so that
> delay slot is examined for %o7-based arithmetic for all call instructions,
> not only call .+8 in particular. Is this correctly understood?

Yes, you understand correctly. But it's not as easy as that. I don't
need to get into Purify implementation details, but remember: if the
target gets pushed beyond the reach of a 13-bit signed immediate, we need
to turn the call/add into sethi/or/call/add or something like that. The
fact that this is in a delay slot, and that the %o7 value from the call is
a source register, complicates things even more. It would not be
impossible to handle this, but there is the ROI to consider.
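
To make the 13-bit limit concrete: the immediate field of a SPARC add is a
13-bit signed value (simm13), so it can only hold offsets from -4096 to
4095. Here is a minimal Python sketch of that range check. The function
names are made up for illustration and say nothing about how Purify
actually decides this.

```python
# Illustrative only: fits_simm13 and needs_long_form are hypothetical names.

SIMM13_MIN = -(1 << 12)      # -4096
SIMM13_MAX = (1 << 12) - 1   #  4095


def fits_simm13(offset):
    """True if offset fits the 13-bit signed immediate of a SPARC add."""
    return SIMM13_MIN <= offset <= SIMM13_MAX


def needs_long_form(call_addr, target_addr):
    """True if target - call is out of reach of a single add immediate."""
    return not fits_simm13(target_addr - call_addr)


print(needs_long_form(0x10000, 0x10ff8))   # False: 4088 bytes away
print(needs_long_form(0x10000, 0x11000))   # True: 4096 bytes away
```

Any inserted instructions between the call and its target can push an
offset that used to fit across that boundary.
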
You asked when, specifically, Purify stretches code. The short answer is:
anywhere we need to. Definitely at the top of a function, and at every
memory load or store instruction, and after function calls. Beyond that,
we might do insertion on any instruction at all, subject to our needs.

<sales_pitch>
The basic Purify insertion is on load and store instructions; everything
else is in support of that. Purify's whole value proposition is to
pinpoint memory errors like reading uninitialized memory, or touching
beyond the end of a block or the end of the current stack, or touching
memory you've already freed. In contrast, malloc-debug libraries only
report bad writes, and only after the fact. They spray patterns into freed
memory in the hopes that bad reads will cause visible misbehavior in the
program's future. Unlike those, Purify sees both reads and writes when
they happen, pinpointing the faulting instruction instead of telling you
"a bad thing happened sometime in the past."
</sales_pitch>

Best case (on SPARC) is that we insert two instructions before each load
or store. Worst case, we "unravel" instructions out of delay slots, add
more instructions to "shadow" certain types of register usage, and deal
with offsets that have grown too large by inserting additional math.

You asked how you can know that Purify will *not* do insertion or stretch
your code. That's a little tricky. If you have two non-global symbols that
identify data blocks, and there are no global symbols or code
(instructions) between them, there won't be any stretching from today's
Purify. But any instructions at all are subject to insertion, and in some
cases we insert dead space (a "red zone") before a global data symbol.

Now, back to libcrypto: while you and I have been talking, our resident
genius instrumentation engine guy has actually coded some modifications to
support the .PIC.me.up pattern as it appears in 0.9.8j. This supports our
current customers who use past, released versions of libcrypto on SPARC. I
expect this change to appear in an upcoming release of PurifyPlus, but I
can't commit to it or give a date because I'm not authorized to commit to
future product features or support in a public forum.

The new pattern recognizer is pretty specific, intended to support
existing customers with libcrypto binaries. It's not a general-purpose
recognizer for optimized interprocedural PIC sequences. It recognizes
patterns that stay very close to this:

            call    target
            mov     offset,%o0
            ...
    target:
            add     %o0,%o7,%o0

The new code recognizes this when "offset" is the distance from the call
instruction to "target," and the "add" really is the very first
instruction at the call target. We'll even patch the offset if the
distance from the caller to the target grows past 13 bits.

The developer also coded changes to recognize and patch the self-relative
offset in data from .PIC.DES_SPtrans to DES_SPtrans. I don't know the
details and restrictions on this one. Like I said, it's really meant for
customers with current libcrypto binaries.

Regardless of any new recognizers which might appear in the future, there
are two Purify-safe ways to do PIC stuff on SPARC:

Short form:

    L1: call8
        add     %o7,(target-L1),regZ

Long form:

        sethi   %hi(target-L2),regX
        or      regX,%lo(target-L2),regY
    L2: call8
        add     regY,%o7,regZ

Here "call8" is shorthand for "call .+8". The short form will work even if
Purify stretches the distance farther than 13 bits will reach. Purify is
flexible about the details. regX, regY, and regZ can be different or they
can overlap. The call8 can happen any time before the add. You can move
the %o7 result of the call8 to another register and use that instead; it
doesn't have to stay in %o7. You can use the same call8-derived base
register for multiple PIC computations, but you can't use one computed
address (like regZ) as the base for another.

These two patterns work for both 32-bit and 64-bit programs.
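
For readers less familiar with the long form, here is a rough Python model
of the arithmetic. It relies on two standard SPARC facts: %hi() is the
upper 22 bits and %lo() the lower 10 bits of a 32-bit value, and a call
leaves its own address in %o7. The model is 32-bit only, and every name in
it is hypothetical.

```python
# Illustrative 32-bit model; all names here are hypothetical.
MASK32 = 0xFFFFFFFF


def hi(value):
    """%hi(value): the upper 22 bits of a 32-bit value."""
    return (value & MASK32) >> 10


def lo(value):
    """%lo(value): the lower 10 bits."""
    return value & 0x3FF


def long_form(l2_addr, target_addr):
    """Model sethi/or/call8/add and return what lands in regZ."""
    offset = target_addr - l2_addr
    reg_x = (hi(offset) << 10) & MASK32   # sethi %hi(target-L2),regX
    reg_y = reg_x | lo(offset)            # or    regX,%lo(target-L2),regY
    o7 = l2_addr                          # call8 leaves its address in %o7
    return (reg_y + o7) & MASK32          # add   regY,%o7,regZ


print(hex(long_form(0x20000, 0x54321)))   # 0x54321
print(hex(long_form(0x54320, 0x20000)))   # 0x20000 (negative offset wraps)
```

Because the offset is rebuilt from two pieces, its size is not limited by
the 13-bit immediate of a single add.
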
Regarding the patch you referred to
(http://cvs.openssl.org/chngview?cn=17898): I'm sorry to say Purify is not
as flexible as you might want. In the short form we recognize "add" using
%o7 after call8, but not "sub." So the patched aes_sparcv9 module is *not*
Purify-friendly yet. To fix this, change "sub" to "add" and reverse the
subtraction that computes the offset:

BAD:

    1:  call    .+8
        sub     %o7,1b-AES_Te,%o4

GOOD:

    1:  call    .+8
        add     %o7,AES_Te-1b,%o4
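
To illustrate why the two spellings differ to a pattern matcher even
though they compute the same address, here is a hypothetical Python sketch
of an add-only check. It is not Purify's recognizer. The field layout is
the standard SPARC format-3 encoding; the helper names and the 0x100
offset are invented for the example.

```python
# Hypothetical sketch, not Purify's recognizer.
# SPARC format 3: op[31:30] rd[29:25] op3[24:19] rs1[18:14] i[13] simm13[12:0]
O4, O7 = 12, 15               # %o4 is r12, %o7 is r15
OP3_ADD, OP3_SUB = 0x00, 0x04
CALL_PLUS_8 = 0x40000002      # "call .+8": op=01, disp30 = 2 words


def encode_arith_imm(op3, rs1, imm, rd):
    """Encode 'op rs1,imm,rd' with a 13-bit signed immediate."""
    return ((2 << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14)
            | (1 << 13) | (imm & 0x1FFF))


def is_call8_add_pair(call_word, delay_word):
    """True for call .+8 with 'add %o7,simm13,rd' in its delay slot."""
    if call_word != CALL_PLUS_8:
        return False
    op = delay_word >> 30
    op3 = (delay_word >> 19) & 0x3F
    rs1 = (delay_word >> 14) & 0x1F
    uses_imm = (delay_word >> 13) & 1
    return op == 2 and op3 == OP3_ADD and rs1 == O7 and uses_imm == 1


bad = encode_arith_imm(OP3_SUB, O7, 0x100, O4)     # sub %o7,0x100,%o4
good = encode_arith_imm(OP3_ADD, O7, -0x100, O4)   # add %o7,-0x100,%o4
print(is_call8_add_pair(CALL_PLUS_8, bad))    # False
print(is_call8_add_pair(CALL_PLUS_8, good))   # True
```

Both instructions leave the same value in %o4; only the opcode and the
sign of the immediate differ.
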
Thanks for working with us on this. Let me know if you have more thoughts
or questions.
-- Allan Pratt, [email protected]
Rational software division of IBM
______________________________________________________________________
OpenSSL Project http://www.openssl.org
Development Mailing List [email protected]
Automated List Manager [email protected]