https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
            Bug ID: 105930
           Summary: Excessive stack spill generation on 32-bit x86
           Product: gcc
           Version: 12.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: torva...@linux-foundation.org
  Target Milestone: ---

Created attachment 53121
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53121&action=edit
Test-case extracted from the generic blake2b kernel code

Gcc-12 seems to generate a huge number of stack spills on this blake2b test-case, to the point where it overflows the allowable kernel stack on 32-bit x86.

This crypto thing has two 128-byte buffers, so a stack frame a bit larger than 256 bytes is expected when the dataset doesn't fit in the register set.

Just as an example, on this code, clang-14.0.0 generates a stack frame that is 296 bytes.

In contrast, gcc-12.1.1 generates a stack frame that is almost an order of magnitude(!) larger, at 2620 bytes.

The trivial Makefile I used for this test-case is

    # The kernel cannot just randomly use FP/MMX/AVX
    CFLAGS := -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx
    CFLAGS += -m32
    CFLAGS += -O2

    test:
            gcc $(CFLAGS) -Wall -S blake2b.c
            grep "sub.*%[er]sp" blake2b.s

to easily test different flags and the end result, but as can be seen from above, it really doesn't need any special flags except the ones that disable MMX/AVX code generation.

And the generated code looks perfectly regular, except for the fact that it uses almost 3kB of stack space.

Note that "-m32" is required to trigger this - the 64-bit case does much better, presumably because it has more registers and thus needs fewer spills.

It gets worse with some added debug flags we use in the kernel, but not that kind of "order of magnitude" worse.

Using -O1 or -Os makes no real difference.

This is presumably due to some newly triggered optimization in gcc-12, but I can't even begin to guess at what we'd need to disable (or enable) to avoid this horrendous stack growth.
Some very aggressive instruction scheduling thing that spreads out all the calculations and always wants to spill-and-reload the subexpressions that it CSE'd? I dunno. Please advise.

The excessive stack literally causes build failures due to us using -Werror -Wframe-larger-than= to make sure stack use remains sanely bounded. The kernel stack is a rather limited resource.
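
For reference, the arithmetic behind the "a bit larger than 256 bytes" expectation can be sketched in C. This is only an illustration, not the attached test-case: the struct and helper names (compress_locals, baseline_frame_bytes, overhead_bytes) are made up here, and the mixing function follows the published BLAKE2b G function (RFC 7693) rather than whatever exact form the kernel code uses. It assumes the two 128-byte buffers are the 16-word working state and the 16-word message block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the two 128-byte buffers mentioned in the report. */
struct compress_locals {
    uint64_t v[16]; /* working state: 16 * 8 = 128 bytes */
    uint64_t m[16]; /* message block: 16 * 8 = 128 bytes */
};

/* Compile-time check of the 256-byte baseline. */
static_assert(sizeof(struct compress_locals) == 256, "two 128-byte buffers");

/* Rotate right; only called with 0 < n < 64. */
static inline uint64_t ror64(uint64_t x, unsigned n)
{
    return (x >> n) | (x << (64 - n));
}

/*
 * BLAKE2b G mixing step as specified in RFC 7693. Every add, xor and
 * rotate works on 64-bit values, so on 32-bit x86 each one occupies a
 * register pair, which is where the register pressure comes from.
 */
static void blake2b_g(uint64_t v[16], int a, int b, int c, int d,
                      uint64_t x, uint64_t y)
{
    v[a] = v[a] + v[b] + x;
    v[d] = ror64(v[d] ^ v[a], 32);
    v[c] = v[c] + v[d];
    v[b] = ror64(v[b] ^ v[c], 24);
    v[a] = v[a] + v[b] + y;
    v[d] = ror64(v[d] ^ v[a], 16);
    v[c] = v[c] + v[d];
    v[b] = ror64(v[b] ^ v[c], 63);
}

/* Bytes the frame needs for the buffers alone. */
static size_t baseline_frame_bytes(void)
{
    return sizeof(struct compress_locals);
}

/* Bytes of a reported frame that are not accounted for by the buffers. */
static size_t overhead_bytes(size_t reported_frame)
{
    return reported_frame - baseline_frame_bytes();
}
```

Feeding the two reported frame sizes to overhead_bytes() gives 40 bytes beyond the buffers for the 296-byte clang frame and 2364 bytes for the 2620-byte gcc frame, which is the part that looks like spill slots.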