We recently debugged, and found a workaround for, a GCC [###version]
code-generation error when compiling OpenSSL 3.0.8 for 32-bit on Intel x86.
This error resulted in the use of a misaligned memory operand with a
packed-quadword instruction, producing a SIGSEGV on RedHat 8. (I'm a bit
surprised Linux doesn't raise SIGBUS for this particular trap, but whatever.) I
wanted to document this here in case other people run into it.
Aside: This does raise the question: Why aren't other people running into it?
And why are we only seeing it now? Honestly, I don't know. It is sensitive to
stack layout, but in some of our tests we could reproduce it consistently. It's
possible you'll never see this in a program where the path into the sensitive
functions in gcm128.c, which appear to be CRYPTO_gcm128_aad,
CRYPTO_gcm128_encrypt, and CRYPTO_gcm128_decrypt, is made up completely of code
compiled with GCC. In our case we have non-GCC code along that path in some
cases, and that non-GCC code does not follow GCC's rather arbitrary stack-frame
alignment rules for x86, so GCC may be making an invalid assumption about
callers further up the stack and how they'll pad and align stack frames.
(It's known that with default build flags and optimization, GCC requires that
callers align *parameters* strictly, because it may generate SSE code for
operations on 64-bit and larger operands. But the problem here isn't a
parameter, as I'll show in a moment.)
Anyway, back to the issue.
The affected functions declare a 64-bit integer object with automatic storage
class:
    u64 alen = ctx->len.u[0];
and then operate on it:
    alen += len;
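For anyone trying to build a reduced test case, the pattern boils down to something like the sketch below. The names gcm_ctx_sketch and add_aad_len are hypothetical, and the struct is a cut-down stand-in for the real context in gcm128.c. This fragment is not guaranteed to trigger the bad code generation; it only shows the shape of the 64-bit automatic variable involved.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;   /* stand-in for OpenSSL's u64 typedef */

/* Hypothetical cut-down context: only the length counters are kept. */
struct gcm_ctx_sketch {
    union {
        u64 u[2];       /* u[0] is the running AAD length in this sketch */
    } len;
};

/* Hypothetical helper mirroring the pattern: load, add, store back. */
static int add_aad_len(struct gcm_ctx_sketch *ctx, size_t len)
{
    u64 alen = ctx->len.u[0];   /* 64-bit object with automatic storage */

    alen += len;                /* the 64-bit add discussed in the text */
    if (alen < len)             /* sum wrapped around: reject */
        return -1;
    ctx->len.u[0] = alen;
    return 0;
}
```

Whether the compiler keeps alen in registers or spills it to the stack depends on the optimization level and the surrounding code, so a reduced case may need more register pressure than this to show the problem.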
GCC, under appropriate conditions, generates code that performs a
packed-quadword operation (specifically a PADDQ) with alen as the destination.
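That requires alen to have 16-byte (128-bit) alignment, because the SSE form of
PADDQ faults on a memory operand that isn't 16-byte aligned. However, the
generated code puts alen on only an 8-byte boundary; examining its address
before the trap occurs confirms it ends with 0x8.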
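The alignment arithmetic is easy to check by hand. An address whose last hex digit is 8 is a multiple of 8 but not of 16, and 16 is what an SSE instruction with a 128-bit memory operand needs. A minimal sketch, where is_aligned is a hypothetical helper and the sample address is made up rather than taken from the actual debugging session:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr is a multiple of align.
 * align must be a power of two. */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}
```

With a made-up stack address such as 0xffffd0a8, is_aligned(addr, 8) holds while is_aligned(addr, 16) does not.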
The fix we're using is to add -mstackrealign to the build flags for OpenSSL on
GCC x86 platforms. That adds prologue code to each function which realigns the
stack at entry if necessary. This does carry some performance cost, which we
have not yet tried to measure.
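Conceptually, the realignment amounts to rounding the stack pointer down to the required boundary. The stack grows downward on x86, so rounding down only ever adds padding. A sketch of that arithmetic in C, with realign_down as a hypothetical name; the real work is done by the compiler-generated prologue, not by anything in the OpenSSL source:

```c
#include <stdint.h>

/* Hypothetical model of what a realigning prologue does to the stack
 * pointer: clear the low bits so the result is a multiple of align.
 * align must be a power of two. */
static uintptr_t realign_down(uintptr_t sp, uintptr_t align)
{
    return sp & ~(align - 1);
}
```

For example, a made-up incoming stack pointer of 0xffffd0a8 becomes 0xffffd0a0 for a 16-byte boundary, while one that is already aligned is left unchanged.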
After quite a bit of investigation, we're fairly confident we'd call this a GCC
bug. It looks like a consequence of the "fix" for GCC bug 65105, which was made
a couple of years ago, to use XMM registers in 32-bit generated code on x86.
GCC has an unfortunate history of assuming stronger stack-alignment rules on
this platform than are required by the ISA or enforced by other languages and
compilers, and some members of the GCC team are a bit notorious for their ...
enthusiasm ... in justifying this position.
We have not yet attempted to raise this as a GCC bug, because, well, I've read
those discussions in the GCC forums.
--
Michael Wojcik