https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930
--- Comment #21 from Linus Torvalds <torva...@linux-foundation.org> ---
(In reply to CVS Commits from comment #20)
>
> One might think
> that splitting early gives the register allocator more freedom to
> use available registers, but in practice the constraint that double
> word values occupy consecutive registers (when ultimately used as a
> DImode value) is the greater constraint.

Whee.

Why does gcc have that constraint, btw? I tried to look at the clang code generation once more, and I don't *think* clang has the same constraint, and maybe that is why it does so much better?

Yes, x86 itself inherently has a couple of forced register pairings (notably %edx:%eax for 64-bit multiplication and division), and obviously the whole calling convention requires well-defined pairings, but in the general case it seems to be a mistake to keep DImode values as DImode values and force them to be consecutive registers when used.

Maybe I misunderstand. But now that this comes up I have this dim memory of actually having had a discussion like this before on bugzilla, where gcc generated horrible DImode code.

> GCC 11 [use %ecx to address memory, require a 24-byte stack frame]
>         sub     esp, 24
>         mov     ecx, DWORD PTR [esp+40]
>
> GCC 12 [use %eax to address memory, require a 44-byte stack frame]
>         sub     esp, 44
>         mov     eax, DWORD PTR [esp+64]

I just checked the current git -tip, and this does seem to fix the original case too, with the old horrid 2620 bytes of stack frame now being a *much* improved 404 bytes!

So your patch - or other changes - does fix it for me, unless I did something wrong in my testing (which is possible).

Thanks.

I'm not sure what the gcc policy on closing the bug is (and I don't even know if I am allowed), so I'm not marking this closed, but it seems to be fixed as far as I am concerned, and I hope it gets released as a dot-release for the gcc-12 series.
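To make "splitting" a double-word value concrete, here is a minimal C sketch of what a 64-bit (DImode) addition looks like once it has been split into two 32-bit (SImode) halves, as on 32-bit x86. It is an illustration only, not GCC's implementation; the type `dword_pair` and the helpers `split_u64`, `join_u64` and `add_pair` are hypothetical names. The point is that after the split, the low and high halves are independent 32-bit values that, in principle, could live in any two registers; only the carry links them, which x86 expresses as `add` on the low words followed by `adc` on the high words.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: a 64-bit value held as two independent
   32-bit halves, the way a double-word value looks after being split
   into SImode pieces on 32-bit x86. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} dword_pair;

/* Split a 64-bit value into its low and high 32-bit words. */
dword_pair split_u64(uint64_t v) {
    dword_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

/* Reassemble the two halves into a single 64-bit value. */
uint64_t join_u64(dword_pair p) {
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit addition done on the halves: add the low words, then add the
   high words plus the carry out of the low add (x86 "add" then "adc"). */
dword_pair add_pair(dword_pair a, dword_pair b) {
    dword_pair r;
    r.lo = a.lo + b.lo;
    uint32_t carry = r.lo < a.lo;   /* unsigned wraparound means a carry */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

For instance, adding 1 to 0x1FFFFFFFF wraps the low word to 0, carries into the high word, and `join_u64` yields 0x200000000, the same result as a native 64-bit add.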