https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88770
--- Comment #2 from Peter Cordes <peter at cordes dot ca> ---
Note that mov r64, imm64 is a 10-byte instruction, and can be slow to read from the uop cache on Sandybridge-family.

The crap involving OR is clearly sub-optimal, but *if* you already have two spare call-preserved registers across this call, the following is actually smaller in code size:

    movabs  rdi, 21474836483
    mov     rbp, rdi
    movabs  rsi, 39743127552
    mov     rbx, rsi
    call    test
    mov     rdi, rbp
    mov     rsi, rbx
    call    test

This is more total uops for the back-end, though (movabs is still single-uop, but takes 2 entries in the uop cache on Sandybridge-family; https://agner.org/optimize/). So saving x86 machine-code size this way does limit how far ahead out-of-order exec can see, if the front-end isn't the bottleneck.

And it's highly unlikely to be worth saving/restoring two regs to enable this. (Or to push rdi / push rsi before the call, then pop after!)

Setting up the wrong value and then fixing it twice with OR is obviously terrible and never has any advantage, but the general idea of CSEing large constants isn't totally crazy. (But it's profitable in such limited cases that it might not be worth looking for, especially if it's only helpful at -Os.)
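To put numbers on the code-size vs. uop trade-off, here's a rough tally in Python. It's only a sketch, not anything GCC does. It assumes the standard x86-64 encodings: 10 bytes for movabs r64, imm64 (REX.W + B8+r + imm64), 3 bytes for mov r64, r64 (REX.W + 89 /r), and 5 bytes for call rel32. It also counts every non-call instruction as one fused-domain uop and leaves the two calls out, since they're identical in both versions. The "naive" sequence (a hypothetical name) just rematerializes both constants with movabs before each call. The "cse" sequence is the one above.

```python
# Rough code-size / uop tally for two ways of passing the same two 64-bit
# constants to two back-to-back calls. Sizes are standard x86-64 encodings.
SIZES = {
    "movabs": 10,  # REX.W + B8+r + imm64
    "mov": 3,      # REX.W + 89 /r (reg-reg)
    "call": 5,     # E8 + rel32
}

# Hypothetical baseline: rematerialize both constants before each call.
naive = ["movabs", "movabs", "call"] * 2

# CSE version: keep copies in call-preserved rbp/rbx (assumed already free),
# then copy them back into rdi/rsi for the second call.
cse = ["movabs", "mov", "movabs", "mov", "call", "mov", "mov", "call"]


def code_bytes(seq):
    """Total machine-code bytes for the instruction sequence."""
    return sum(SIZES[insn] for insn in seq)


def non_call_uops(seq):
    """Assumes each non-call instruction is a single fused-domain uop.
    Calls are the same in both sequences, so they are not counted."""
    return sum(1 for insn in seq if insn != "call")


for name, seq in (("naive", naive), ("cse", cse)):
    print(f"{name}: {code_bytes(seq)} bytes, {non_call_uops(seq)} non-call uops")
```

Running it prints 50 vs. 42 bytes and 4 vs. 6 non-call uops: 8 bytes saved for 2 extra uops, before counting any push/pop needed to free up rbp and rbx in the first place.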