http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56876
Bug #: 56876
Summary: Combine does not invent new moves
Classification: Unclassified
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: gli...@gcc.gnu.org
Target: x86_64-linux-gnu

Hello,

I am looking at this testcase:

    typedef unsigned long long vec __attribute__((vector_size(16)));
    vec g;
    vec f1(vec a, vec b){ return ~a&b; }
    vec f2(vec a, vec b){ return ~g&b; }

which compiles to:

    f1:
        pandn   %xmm1, %xmm0
    f2:
        pcmpeqd %xmm0, %xmm0
        pxor    g(%rip), %xmm0
        pand    %xmm1, %xmm0

whereas I would like to get what I get with the _mm_andnot_si128 builtin:

        movdqa  g(%rip), %xmm0
        pandn   %xmm1, %xmm0

It seems that combine cannot match the pandn pattern because its first (complemented) operand is a memory load rather than a register. In this case, it would be better to emit a move that puts the value in a register so the pattern can match, instead of giving up. I don't know if there is a good way to characterize the situations where such an extra move is worth it.
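To make the comparison concrete, here is a minimal sketch placing the testcase's f2 next to an equivalent written with the SSE2 intrinsic the report mentions. The names f3 and vec_same are hypothetical, added only for illustration. It relies on _mm_andnot_si128(x, y) computing ~x & y, and on GCC allowing casts between vector types of the same size (vec and __m128i are both 16 bytes). Both functions compute the same value, so the difference described above is purely in code generation; the exact assembly emitted for each depends on the GCC version and flags and is not claimed here.

```c
#include <emmintrin.h> /* SSE2 intrinsics, including _mm_andnot_si128 */
#include <string.h>    /* memcmp */

typedef unsigned long long vec __attribute__((vector_size(16)));

vec g;

/* f2 from the testcase: complement a global vector, then AND with b. */
vec f2(vec a, vec b)
{
    (void)a; /* unused, as in the original testcase */
    return ~g & b;
}

/* Hypothetical f3: the same computation through the intrinsic.
   _mm_andnot_si128(x, y) returns ~x & y, so g is the complemented operand. */
vec f3(vec a, vec b)
{
    (void)a;
    return (vec)_mm_andnot_si128((__m128i)g, (__m128i)b);
}

/* Hypothetical helper: bitwise equality of two vectors. */
static int vec_same(vec x, vec y)
{
    return memcmp(&x, &y, sizeof x) == 0;
}
```

Compiling this with something like gcc -O2 -S and reading the output for f2 and f3 side by side is one way to check whether a given compiler still produces the pcmpeqd/pxor/pand sequence for f2.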