On 03/09/15 17:18, Segher Boessenkool wrote:
On Thu, Sep 03, 2015 at 03:59:00PM +0100, Wilco Dijkstra wrote:
However there are 2 issues with this, one is the spurious subreg,
Combine didn't make that up out of thin air; something already used
DImode here.  It could simplify it to SImode in this case, that is
true, don't know why it doesn't; it isn't necessarily faster code to
do so, it can be slower, it might not match, etc.
The relevant RTL instructions on AArch64 are:
[ You never gave a full  test case, or I missed it, or cannot find it
   anymore -- but I can reproduce this now:

void g(void);
void f(int *x) { if (*x & 2) g(); }

]

(insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
         (and:SI (reg/v:SI 76 [ xD.2641 ])
             (const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
      (nil))
  (insn 26 25 27 2 (set (reg:CC 66 cc)
         (compare:CC (reg:SI 77 [ D.2705 ])
             (const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
      (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
         (nil)))

I don't see anything using DI...
Yeah, I spoke too soon, sorry.  It looks like make_compound_operation came
up with it.

It's only a problem for AND-and-compare, no?
Yes, so it looks like some other backends match the odd pattern and then have 
another
pattern change it back into the canonical AND/TST form during the split phase 
(maybe
the subreg confuses register allocation or block other optimizations).
A subreg of a pseudo is not anything special, don't worry about it,
register_operand and similar treat it just like any other register.

This all seems
a lot of unnecessary complexity for a few special immediates when there is a 
much
simpler solution...
Feel free to post a patch!  I would love to have this all simplified.

But there are more efficient ways to emit single bit and masks tests that apply
to most CPUs rather than doing something specific that works for just one target
only. For example single bit test is a simple shift into carry flag or into the
sign bit, and for mask tests, you shift out all the non-mask bits.
Most of those are quite target-specific.  Some others are already done,
and/or done by other passes.
But what combine does here is even more target-specific.
Combine puts everything (well, most things) through
make_compound_operation, on all targets.

Combine converts the merged instructions to what it thinks is the
canonical or cheapest form, and uses that.  It does not try multiple
options (the zero_ext* -> and+shift rewriting is not changing the
semantics of the pattern at all).
But the change from AND to zero_extract is already changing semantics...
Oh?  It is not supposed to!

Or would it be better to let each target decide
on how to canonicalize bit tests and only try that alternative?
The question is how to write the pattern to be most convenient for all
targets.
The obvious choice is to try the 2 original instructions merged.
... without any simplification.  Yes, I've wanted combine to fall back
to that if the "simplified" version does not work out.  Not so easy to
do though.

Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...
Combine does not try multiple options.
I'm not following - combine tries zero_extract and shift+AND - that's 2 options.
If that is feasible then adding a 3rd option should be possible.
The shift+and is *exactly the same* as the zero_extract, just written
differently.

We certainly need a lot more target hooks in general so GCC can do the right 
thing
(rather than using costs inconsistently all over the place). But that's a 
different
discussion...
This isn't about costs though.  That is a big other can of worms, indeed!


Anyway.  In that testcase I made, everything is simplified just fine on
aarch64, using *tbeqdi1; what am I missing?

A testcase I was looking at is:
int
foo (int a)
{
  return (a & 7) != 0;
}

For me this generates:
        and     w0, w0, 7
        cmp     w0, wzr
        cset    w0, ne
        ret

when it could be:
        tst      w0, 7
        cset     w0, ne
        ret

Kyrill



Segher


Reply via email to