http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55752
Bug #: 55752
Summary: __builtin_ia32_ldmxcsr / __builtin_ia32_stmxcsr are not scheduling barriers
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: rgue...@gcc.gnu.org
Target: x86_64-*-*

float foo (float x, float f32)
{
  unsigned int mxscr_stat;
  mxscr_stat = __builtin_ia32_stmxcsr ();
  __builtin_ia32_ldmxcsr (mxscr_stat | 0x00000800);
  f32 = (x + f32) - f32;
  mxscr_stat = mxscr_stat & 0xffffffc0;
  __builtin_ia32_ldmxcsr (mxscr_stat);
  return f32;
}

Compiled at -O2, this yields:

foo:
.LFB0:
        .cfi_startproc
        stmxcsr -4(%rsp)
        movl    -4(%rsp), %eax
        movl    %eax, %edx
        orb     $8, %dh
        movl    %edx, -4(%rsp)
        ldmxcsr -4(%rsp)
        addss   %xmm1, %xmm0
        andl    $-64, %eax
        movl    %eax, -4(%rsp)
        ldmxcsr -4(%rsp)
        subss   %xmm1, %xmm0
        ret

Note how the subss is scheduled after the second ldmxcsr, so the
subtraction executes with the restored MXCSR value rather than the
modified one. It's OK (by pure luck, of course) at the GIMPLE level.
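Until the builtins act as barriers, one possible source-level workaround is to tie the floating-point values to empty volatile asm statements placed between the two MXCSR writes. The following is only a sketch under assumptions, not a fix verified against the affected compiler: PIN_SSE and foo_pinned are hypothetical names, and it assumes the compiler keeps volatile asm statements in program order relative to the ldmxcsr instructions. It uses _mm_getcsr / _mm_setcsr from <xmmintrin.h>, which GCC implements with the builtins from the testcase.

```c
#include <assert.h>
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */

/* Hypothetical helper: an empty volatile asm that takes V as an in/out
   SSE register operand.  The compiler must treat V as modified here, so
   arithmetic that produces or consumes V cannot be moved across it.  */
#define PIN_SSE(v) __asm__ __volatile__ ("" : "+x" (v))

float foo_pinned (float x, float f32)
{
  unsigned int mxcsr_stat = _mm_getcsr ();

  _mm_setcsr (mxcsr_stat | 0x00000800);
  /* Make the inputs depend on a point after the first ldmxcsr.  */
  PIN_SSE (x);
  PIN_SSE (f32);

  f32 = (x + f32) - f32;

  /* Force the result to be computed before the second ldmxcsr.  */
  PIN_SSE (f32);
  _mm_setcsr (mxcsr_stat & 0xffffffc0);
  return f32;
}

/* Minimal usage check: both operations are exact for these inputs.  */
void foo_pinned_demo (void)
{
  assert (foo_pinned (1.0f, 2.0f) == 1.0f);
}
```

The data dependencies on the asm operands only pin the arithmetic relative to the asm statements themselves. Whether those statements stay between the two ldmxcsr instructions still depends on the compiler's handling of volatile instructions, so the generated assembly should be inspected.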