http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56175
Bug #: 56175
Summary: Issue with combine phase on x86.
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ysrum...@gmail.com

While analyzing the performance of an important benchmark on x86 Atom in 32-bit mode, we found that the code produced for the attached testcase is not optimal: the inner loop contains 18 instructions instead of 12. The problem is that combine does not perform the desired substitution for the following statement:

  t = (u8)((x & 1) ^ ((u8)y & 1));

It is not able to convert it to the more optimal form:

  t = (u8)((x ^ (u8)y) & 1);

The issue can be explained using the following simpler testcase:

int foo(unsigned char x, unsigned short y)
{
  unsigned char z;
  if (x == 0 || y == 0)
    return 0;
  x >>= 1;
  y >>= 1;
  z = (unsigned char)((x & 1) ^ ((unsigned char)y & 1));
  if (z == 1)
    return 1;
  return 0;
}

For this case combine performs the needed transformation, and we get optimal assembly:

...
  xorl %edx, %eax
  andl $1, %eax
  ret

For this case combine tries to perform the following substitution:

Trying 22, 20 -> 23:
Failed to match this instruction:
(parallel [
        (set (reg:QI 83 [ D.1758 ])
            (and:QI (xor:QI (reg:QI 79 [ x ])
                    (subreg:QI (reg:HI 81 [ y ]) 0))
                (const_int 1 [0x1])))
        (clobber (reg:CC 17 flags))
    ])
Failed to match this instruction:
(set (reg:QI 83 [ D.1758 ])
    (and:QI (xor:QI (reg:QI 79 [ x ])
            (subreg:QI (reg:HI 81 [ y ]) 0))
        (const_int 1 [0x1])))
Successfully matched this instruction:
(set (reg:QI 82 [ D.1760 ])
    (xor:QI (reg:QI 79 [ x ])
        (subreg:QI (reg:HI 81 [ y ]) 0)))
Successfully matched this instruction:
(set (reg:QI 83 [ D.1758 ])
    (and:QI (reg:QI 82 [ D.1760 ])
        (const_int 1 [0x1])))

where

(insn 20 19 21 4 (parallel [
            (set (reg:QI 80 [ D.1759 ])
                (and:QI (reg:QI 79 [ x ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) t.c:8 405 {*andqi_1}
     (expr_list:REG_DEAD (reg:QI 79 [ x ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

(insn 22 21 23 4 (parallel [
            (set (reg:HI 82 [ D.1760 ])
                (and:HI (reg:HI 81 [ y ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) t.c:8 404 {*andhi_1}
     (expr_list:REG_DEAD (reg:HI 81 [ y ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

(insn 23 22 24 4 (parallel [
            (set (reg:QI 83 [ D.1758 ])
                (xor:QI (reg:QI 80 [ D.1759 ])
                    (subreg:QI (reg:HI 82 [ D.1760 ]) 0)))
            (clobber (reg:CC 17 flags))
        ]) t.c:8 426 {*xorqi_1}
     (expr_list:REG_DEAD (reg:HI 82 [ D.1760 ])
        (expr_list:REG_DEAD (reg:QI 80 [ D.1759 ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))

But for the more complicated attached test, combine tries to do the same substitution with the operands in the reverse order, and it fails:

Trying 14, 13 -> 15:
Failed to match this instruction:
(parallel [
        (set (reg:QI 63 [ D.1770 ])
            (xor:QI (and:QI (reg/v:QI 72 [ x ])
                    (const_int 1 [0x1]))
                (and:QI (subreg:QI (reg/v:HI 74 [ y ]) 0)
                    (const_int 1 [0x1]))))
        (clobber (reg:CC 17 flags))
    ])
Failed to match this instruction:
(set (reg:QI 63 [ D.1770 ])
    (xor:QI (and:QI (reg/v:QI 72 [ x ])
            (const_int 1 [0x1]))
        (and:QI (subreg:QI (reg/v:HI 74 [ y ]) 0)
            (const_int 1 [0x1]))))
Successfully matched this instruction:
(set (reg:QI 77 [ D.1771 ])
    (and:QI (subreg:QI (reg/v:HI 74 [ y ]) 0)
        (const_int 1 [0x1])))
Failed to match this instruction:
(set (reg:QI 63 [ D.1770 ])
    (xor:QI (and:QI (reg/v:QI 72 [ x ])
            (const_int 1 [0x1]))
        (reg:QI 77 [ D.1771 ])))

where

(insn 13 12 14 3 (parallel [
            (set (reg:HI 76 [ D.1772 ])
                (and:HI (reg/v:HI 74 [ y ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) t1.c:9 404 {*andhi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

(insn 14 13 15 3 (parallel [
            (set (reg:QI 77 [ D.1771 ])
                (and:QI (reg/v:QI 72 [ x ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) t1.c:9 405 {*andqi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

(insn 15 14 16 3 (parallel [
            (set (reg:QI 63 [ D.1770 ])
                (xor:QI (reg:QI 77 [ D.1771 ])
                    (subreg:QI (reg:HI 76 [ D.1772 ]) 0)))
            (clobber (reg:CC 17 flags))
        ]) t1.c:9 426 {*xorqi_1}
     (expr_list:REG_DEAD (reg:QI 77 [ D.1771 ])
        (expr_list:REG_DEAD (reg:HI 76 [ D.1772 ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))

It seems that if we tried to combine 13, 14 -> 15 instead, we would succeed. Note also that the order of the instructions after expand differs between the two cases.