https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87104
--- Comment #12 from pipcet at gmail dot com ---
(In reply to pipcet from comment #11)
> (insn 7 6 8 2 (set (reg:CCZ 17 flags)
>         (compare:CCZ (and:DI (not:DI (reg/v:DI 86 [ i ]))
>                 (const_int 12 [0xc]))
>             (const_int 0 [0]))) "h17.c":4 15 {*cmpdi_1}
>      (expr_list:REG_DEAD (reg:DI 88)
>
> Surely we should be dealing with a canonical form instead? Who's
> generating this non-canonical expression, and why?

simplify-rtx.c, it turns out, because it "canonicalizes" (x & y) = y to
(~x & y) = 0. I think that's strange, but we can work around it.

I'm testing these three approaches head-to-head:

1. canonicalize to (x-y) & z = 0
2. don't canonicalize, but add a define_insn_and_split
3. original gcc

I'm compiling trunk Emacs with Paul's patch reverted, then running
"perf stat ./src/temacs --batch" in a loop and producing a histogram of
the cycles needed. It seems (1) and (2) beat (3) quite significantly
(1.1%), while (1) very narrowly beats (2) (< 0.1%). Both figures are
medians, but it looks like the curves are simply shifted a little, so
I'm prepared to say it's a consistent effect.

The code looks good, and the slight difference between (1) and (2)
makes sense, because (2) generates:

        leal    -5(%rdi), %esi
        movq    %rdi, %rax
        andl    $7, %esi
        je      .L129
        ret
        .p2align 4,,10
        .p2align 3
.L129:
        movslq  suspicious_object_index(%rip), %rsi
        movl    $0, %ecx

while (1) realizes %rsi is zero at this point and skips the movl.
(Looking at this code, I do not understand why movl is used rather than
the standard xorl, though, so maybe this is another optimization
opportunity.)

So I think the performance difference is really significant for Emacs;
my plan is to test all three versions on other programs, make sure the
code works for C bitfields, and then submit it for inclusion.

Is that okay?
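
For reference, here is a minimal C sketch of the two rewrites discussed above. It is not GCC code. The function names are hypothetical, and it assumes that approach (1) rewrites a tag test of the form (x & z) == y, which the comment does not spell out. The subtract-and-mask form is only equivalent when y has no bits outside z. The constants 7 and 5 are taken from the andl/leal operands in the assembly.

```c
#include <assert.h>
#include <stdint.h>

/* (x & y) == y: every bit set in y is also set in x. */
static int all_bits_set(uint64_t x, uint64_t y)
{
    return (x & y) == y;
}

/* The form simplify-rtx.c is said to produce: (~x & y) == 0. */
static int all_bits_set_not(uint64_t x, uint64_t y)
{
    return (~x & y) == 0;
}

/* Tag test as it might appear in source: (x & mask) == tag. */
static int tag_match(uint64_t x, uint64_t mask, uint64_t tag)
{
    return (x & mask) == tag;
}

/* Subtract-and-mask form, ((x - tag) & mask) == 0.
   Assumption: (tag & ~mask) == 0, otherwise the two forms differ. */
static int tag_match_sub(uint64_t x, uint64_t mask, uint64_t tag)
{
    return ((x - tag) & mask) == 0;
}

/* Hypothetical helper: count inputs in [0, 1024) where a pair of
   forms disagrees. */
static int count_mismatches(uint64_t mask, uint64_t tag)
{
    int bad = 0;
    for (uint64_t x = 0; x < 1024; x++) {
        if (all_bits_set(x, mask) != all_bits_set_not(x, mask))
            bad++;
        if (tag_match(x, mask, tag) != tag_match_sub(x, mask, tag))
            bad++;
    }
    return bad;
}

/* Spot checks using the constants from the assembly listing. */
static void check_examples(void)
{
    assert(count_mismatches(7, 5) == 0);
    assert(tag_match_sub(13, 7, 5));   /* 13 & 7 == 5 */
    assert(!tag_match_sub(12, 7, 5));  /* 12 & 7 == 4 */
}
```

Calling check_examples() from a test program exercises both equivalences over a small range. It says nothing about which form generates better code on a given target.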