We recently debugged, and found a workaround for, a GCC [###version] code-generation error when compiling OpenSSL 3.0.8 for 32-bit on Intel x86. This error resulted in the use of a misaligned memory operand with a packed-quadword instruction, producing a SIGSEGV on Red Hat 8. (I'm a bit surprised Linux doesn't raise SIGBUS for this particular trap, but whatever.) I wanted to document this here in case other people run into it.
Aside: This does raise the question: why aren't other people running into it, and why are we only seeing it now? Honestly, I don't know. It is sensitive to stack layout, but in some of our tests we could reproduce it consistently. You may never see this in a program where the path into the sensitive functions in gcm128.c (which appear to be CRYPTO_gcm128_aad, CRYPTO_gcm128_encrypt, and CRYPTO_gcm128_decrypt) consists entirely of code compiled with GCC. In our case some of those paths include non-GCC code, which does not follow GCC's rather arbitrary stack-frame alignment rules for x86, so GCC may be making an invalid assumption about how callers further up the stack pad and align their stack frames. (It's known that with default build flags and optimization, GCC requires that callers align *parameters* strictly, because it may generate SSE code for operations on 64-bit and larger operands. But the problem here isn't a parameter, as I'll show in a moment.)

Anyway, back to the issue. The affected functions declare a 64-bit integer object with automatic storage class:

    u64 alen = ctx->len.u[0];

and then operate on it:

    alen += len;

GCC, under appropriate conditions, generates code that performs a packed-quadword operation (specifically a PADDQ) with alen as a memory operand. The SSE form of PADDQ requires its memory operand to be aligned on a 16-byte (128-bit) boundary. However, the generated code puts alen on a boundary that is only 8-byte (64-bit) aligned; examining its address before the trap occurs confirms it ends with 0x8.

The fix we're using is to add -mstackrealign to the build flags for OpenSSL on GCC x86 platforms. That adds prologue code to each function which realigns the stack at run time if necessary. Unfortunately this carries some performance cost, which we have not yet tried to measure.

After quite a bit of investigation, we're fairly confident we'd call this a GCC bug.
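
For anyone who wants to check their own build, here is a minimal sketch of the pattern described above, plus a helper that reports how far an address sits past a given boundary. It is not a reproducer and not OpenSSL code: struct demo_ctx, misalignment, and demo_add_len are hypothetical names made up for illustration, and only the two alen statements are taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical stand-in for the one member of the GCM context that the
 * post mentions; the real OpenSSL structure has more members. */
struct demo_ctx {
    union { u64 u[2]; } len;
};

/* Distance of p past the previous multiple of boundary (0 means aligned).
 * boundary must be a power of two. */
static unsigned misalignment(const void *p, unsigned boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (unsigned)((uintptr_t)p & (uintptr_t)(boundary - 1));
}

/* Same shape as the affected code: copy the running length into a 64-bit
 * automatic variable, add to it, and store it back. Returns how far the
 * local sits past the requested boundary. Taking the address of alen
 * forces it into memory, so the compiler may not emit the same
 * instructions here as it does for the real functions. */
static unsigned demo_add_len(struct demo_ctx *ctx, size_t len,
                             unsigned boundary)
{
    u64 alen = ctx->len.u[0];
    unsigned off;

    alen += len;
    off = misalignment(&alen, boundary);
    ctx->len.u[0] = alen;
    return off;
}
```

For an address ending in 0x8, misalignment returns 0 for an 8-byte boundary and 8 for a 16-byte boundary. Building something like this with gcc -m32, with and without -mstackrealign, is one way to see where a 64-bit local lands, though the result depends on the caller's stack layout and may not match what happens in gcm128.c.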
The problem looks like a consequence of the "fix" for GCC bug 65105, which was made a couple of years ago, to use XMM registers in 32-bit generated code on x86. GCC has an unfortunate history of assuming stronger stack-alignment rules on this platform than are required by the ISA or enforced by other languages and compilers, and some members of the GCC team are a bit notorious for their ... enthusiasm ... in justifying this position. We have not yet attempted to raise this as a GCC bug, because, well, I've read those discussions in the GCC forums.

-- Michael Wojcik