https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87104

--- Comment #12 from pipcet at gmail dot com ---
(In reply to pipcet from comment #11)
>  (insn 7 6 8 2 (set (reg:CCZ 17 flags)
>          (compare:CCZ (and:DI (not:DI (reg/v:DI 86 [ i ]))
>                  (const_int 12 [0xc]))
>              (const_int 0 [0]))) "h17.c":4 15 {*cmpdi_1}
>       (expr_list:REG_DEAD (reg:DI 88)
> 
> Surely we should be dealing with a canonical form instead?  Who's
> generating this non-canonical expression, and why?

simplify-rtx.c, it turns out, because it "canonicalizes" (x & y) = y to (~x &
y) = 0. I think that's strange, but we can work around it.
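
For reference, the two forms are equivalent for any x and y: both say that no bit set in y is clear in x. A minimal C sketch of that equivalence follows; the function names are made up for illustration and do not appear in simplify-rtx.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the form as written in the source,
   "every bit set in y is also set in x". */
bool all_bits_set(uint64_t x, uint64_t y)
{
    return (x & y) == y;
}

/* Hypothetical helper: the rewritten form,
   "no bit of y is missing from x". */
bool no_bits_missing(uint64_t x, uint64_t y)
{
    return (~x & y) == 0;
}
```

With y = 12 (0xc), as in the quoted insn, both helpers return true for x = 0xf and false for x = 0x4.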

I'm testing these three approaches:
1. canonicalize to (x-y) & z = 0
2. don't canonicalize, but add a define_insn_and_split
3. original gcc

I'm comparing them head-to-head: I compile trunk Emacs with Paul's patch
reverted, then run "perf stat ./src/temacs --batch" in a loop and produce a
histogram of the cycles needed. It seems (1) and (2) beat (3) quite
significantly (1.1%), while (1) very narrowly beats (2) (< 0.1%). Both figures
compare median values, but it looks like the curves are simply shifted a
little, so I'm prepared to say it's a consistent effect.
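
A rough sketch of how such a comparison of medians could be computed, assuming the cycle counts from each run have already been collected into an array. The helper names are hypothetical, and this is not the script actually used for the measurements.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sorts the samples in place and returns their
   median. n must be greater than zero. */
double median(double *samples, size_t n)
{
    qsort(samples, n, sizeof *samples, cmp_double);
    return n % 2 ? samples[n / 2]
                 : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}

/* Hypothetical helper: how much lower fast_median is than slow_median,
   as a percentage of slow_median. */
double percent_faster(double slow_median, double fast_median)
{
    return 100.0 * (slow_median - fast_median) / slow_median;
}
```

The median is less sensitive than the mean to the occasional run disturbed by other system activity, which is one reason to prefer it for this kind of loop.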

The code looks good, and the slight difference between (1) and (2) makes sense,
because (2) generates:

        leal    -5(%rdi), %esi
        movq    %rdi, %rax
        andl    $7, %esi
        je      .L129
        ret
        .p2align 4,,10
        .p2align 3
.L129:
        movslq  suspicious_object_index(%rip), %rsi
        movl    $0, %ecx

while (1) realizes %rsi is zero at this point and skips the movl. (Looking at
this code, I do not understand why movl is used rather than the standard xorl,
though, so maybe this is another optimization opportunity.)
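
For readers following along, the leal/andl/je sequence above subtracts 5 and then tests the low three bits for zero. A C sketch of that test next to the plain mask-and-compare form follows. The constants are read off the assembly; the function names, and the assumption that the source-level check is a mask-and-compare of this shape, are mine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the "andl $7" / "leal -5" pair in the listing. */
enum { TAG_MASK = 7, TAG_VALUE = 5 };

/* Hypothetical helper: mask, then compare against the tag. */
bool has_tag_compare(uint64_t word)
{
    return (word & TAG_MASK) == TAG_VALUE;
}

/* Hypothetical helper: subtract the tag, then test the masked bits
   for zero, i.e. the (x-y) & z = 0 shape. Unsigned wraparound keeps
   this well defined for word < TAG_VALUE. */
bool has_tag_subtract(uint64_t word)
{
    return ((word - TAG_VALUE) & TAG_MASK) == 0;
}
```

The two agree whenever the mask is of the form 2^k - 1 and the tag fits within it, because both ask whether word is congruent to the tag modulo 2^k.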

So I think the performance difference is really significant for Emacs; my plan
is to test all three versions on other programs, make sure the code works for C
bitfields, and then submit it for inclusion. Is that okay?
