Hello!

The problem, exposed by the testcase in the PR, was the generation of unwanted MMX registers. Instructions that touch %mm registers switch the x87 register stack to MMX mode and thereby clobber all x87 registers in the register stack.
The core of the problem was the totally bogus cost values for MMX and SSE moves, which tricked the register allocator into allocating an unnecessary %mm register. The patch changes these values to the values of the pentiumpro processor.

BTW: I gave up on constructing a testcase, because even a small perturbation of the source caused the %mm registers to disappear.

2016-02-02  Uros Bizjak  <ubiz...@gmail.com>

	PR target/67032
	* config/i386/i386.c (geode_cost): Increase cost of MMX and SSE moves.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Also, I have checked that no MMX registers were generated for the original (preprocessed) testcase.

Patch was committed to mainline SVN and will be committed to all release branches.

Uros.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b500233..121e802 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -595,17 +595,17 @@ struct processor_costs geode_cost = {
   {4, 6, 6},				/* cost of storing fp registers
 					   in SFmode, DFmode and XFmode */
 
-  1,					/* cost of moving MMX register */
-  {1, 1},				/* cost of loading MMX registers
+  2,					/* cost of moving MMX register */
+  {2, 2},				/* cost of loading MMX registers
 					   in SImode and DImode */
-  {1, 1},				/* cost of storing MMX registers
+  {2, 2},				/* cost of storing MMX registers
 					   in SImode and DImode */
-  1,					/* cost of moving SSE register */
-  {1, 1, 1},				/* cost of loading SSE registers
+  2,					/* cost of moving SSE register */
+  {2, 2, 8},				/* cost of loading SSE registers
 					   in SImode, DImode and TImode */
-  {1, 1, 1},				/* cost of storing SSE registers
+  {2, 2, 8},				/* cost of storing SSE registers
 					   in SImode, DImode and TImode */
-  1,					/* MMX or SSE register to integer */
+  3,					/* MMX or SSE register to integer */
   64,					/* size of l1 cache. */
   128,					/* size of l2 cache. */
   32,					/* size of prefetch block */
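
For readers unfamiliar with how such cost entries influence allocation, the following is a toy illustration only, not GCC code. It assumes a drastically simplified model in which the allocator just picks the register class with the lowest move cost. The names toy_costs, toy_class and toy_cheapest_class are hypothetical, and the general-register cost used in the usage note is an assumed placeholder, not a value from geode_cost. The real allocator weighs many more factors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a few processor_costs fields.  */
struct toy_costs
{
  int gpr_move;	/* ASSUMED placeholder; not a value from the patch.  */
  int mmx_move;	/* Models "cost of moving MMX register".  */
  int sse_move;	/* Models "cost of moving SSE register".  */
};

enum toy_class { TOY_GPR, TOY_MMX, TOY_SSE };

/* Return the class with the strictly lowest move cost.  Ties keep the
   earlier candidate, so general registers win when costs are equal.  */
static enum toy_class
toy_cheapest_class (const struct toy_costs *c)
{
  enum toy_class best = TOY_GPR;
  int best_cost;

  assert (c != NULL);
  best_cost = c->gpr_move;

  if (c->mmx_move < best_cost)
    {
      best = TOY_MMX;
      best_cost = c->mmx_move;
    }
  if (c->sse_move < best_cost)
    {
      best = TOY_SSE;
      best_cost = c->sse_move;
    }
  return best;
}
```

With an assumed general-register cost of 2, the old MMX/SSE move cost of 1 makes this toy model pick TOY_MMX, while the new cost of 2 leaves it at TOY_GPR. That is the kind of shift the patch aims for, although the actual decision in the compiler is far more involved.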