Re: [AArch64][SVE2] Support for EOR3 and variants of BSL

Segher Boessenkool Thu, 17 Oct 2019 05:42:06 -0700

On Wed, Oct 16, 2019 at 11:44:37PM +0100, Richard Sandiford wrote:
> Segher Boessenkool <seg...@kernel.crashing.org> writes:
> >> If someone wants to add a new canonical form then the ports should of
> >> course adapt, but until then I think the patch is doing the right thing.
> >
> > We used to generate this, until GCC 5.  There aren't many ports that have
> > adapted yet.
> 
> The patch has testcases, so this won't be a silent failure for SVE2
> if things change again in future.


Sure.  But I am saying the current behaviour should *not* be canonical
(and it never was); it would make more sense to have the more sensible
behaviour we had for the many years (or decades?) before as canonical.

> >> > If the mask is not a constant, we really shouldn't generate a totally
> >> > different form.  The xor-and-xor form is very hard to handle, too.
> >> >
> >> > Expand currently generates this, because gimple thinks this is simpler.
> >> > I think this should be fixed.
> >> 
> >> But the constant form is effectively folding away the NOT.
> >> Without it the equivalent rtl uses 4 operations rather than 3:
> >> 
> >>   (ior (and A C) (and B (not C)))
> >
> > RTL canonicalisation rules are not based around number of ops.
> 
> Not to the exclusion of all else, sure.

No, not *at all*.  As a side-effect this is sometimes the case, of course,
perhaps more often than not; but it is not a rule.

> But my point was that there
> are reasons why forcing the (ior ...) form for non-constants might not
> be a strict improvement.

Sure.  But we either have canonical forms, which might be a bit awkward
sometimes, or we have to handle two (or three, or sometimes many more)
different forms everywhere.  This hurts especially in the MDs.  Like,
aarch64 now has this xor-and-xor pattern, but it is not canonical; some
parts in GCC might generate the and/ior thing for example, or instead
use some zero_extract thing (which in combine will then also try the
and/ior thing).
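
As a rough illustration of that cost, here is a toy Python sketch (not GCC
code: the tuple encoding and the names is_op, match_bsl and canonicalize
are hypothetical, and commutativity is ignored).  A matcher written
against one form misses the other spelling unless something rewrites it
first:

```python
# Toy model only: expressions are nested tuples such as ("and", "A", "C").
# Every name here is hypothetical; nothing below is a GCC interface.

def is_op(expr, op):
    """True if expr is a binary node with operator op."""
    return isinstance(expr, tuple) and len(expr) == 3 and expr[0] == op

def match_bsl(expr):
    """Recognise (ior (and A C) (and B (not C))); return (A, B, C) or None."""
    if not (is_op(expr, "ior") and is_op(expr[1], "and") and is_op(expr[2], "and")):
        return None
    a, c = expr[1][1], expr[1][2]
    b, not_c = expr[2][1], expr[2][2]
    return (a, b, c) if not_c == ("not", c) else None

def canonicalize(expr):
    """Rewrite (xor (and (xor A B) C) B) to the and/ior form; else return expr."""
    if (is_op(expr, "xor") and is_op(expr[1], "and")
            and is_op(expr[1][1], "xor") and expr[1][1][2] == expr[2]):
        a, b, c = expr[1][1][1], expr[2], expr[1][2]
        return ("ior", ("and", a, c), ("and", b, ("not", c)))
    return expr

xor_form = ("xor", ("and", ("xor", "A", "B"), "C"), "B")
print(match_bsl(xor_form))                # the and/ior matcher does not fire
print(match_bsl(canonicalize(xor_form)))  # fires once the form is rewritten
```

Without a single agreed form, every consumer needs either a second
matcher or its own rewrite step.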

> > For example, we do (and (not A) (not B)) rather than (not (ior A B)).
> 
> Right, hence my complaint about this the other day on IRC. :-)
> I hadn't noticed until then that gimple had a different rule.

I think I missed that, sorry.
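
For reference, the De Morgan example quoted above can be sketched as a
local rewrite.  This is a toy Python model (the tuple encoding and the
names push_not_inward and count_ops are made up for illustration, not GCC
code).  It shows that the canonical form has more operations than the
form it replaces, so operation count cannot be what drives the rule:

```python
# Toy model only: expressions are nested tuples such as ("ior", "A", "B").
# The function names are hypothetical.

def push_not_inward(expr):
    """Apply one De Morgan step to (not (ior X Y)) or (not (and X Y))."""
    if isinstance(expr, tuple) and expr[0] == "not":
        inner = expr[1]
        if isinstance(inner, tuple) and inner[0] in ("ior", "and"):
            dual = "and" if inner[0] == "ior" else "ior"
            return (dual, ("not", inner[1]), ("not", inner[2]))
    return expr

def count_ops(expr):
    """Count operator nodes; leaves (plain strings) count as zero."""
    if not isinstance(expr, tuple):
        return 0
    return 1 + sum(count_ops(e) for e in expr[1:])

before = ("not", ("ior", "A", "B"))
after = push_not_inward(before)
print(after, count_ops(before), count_ops(after))
```

Here the input has two operations and the rewritten form has three.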

> > Instead, there are other rules (like here: push "not"s inward,
> > which can be applied locally with the wanted result).
> 
> Sure.  But I think it's common ground that there's no existing
> rtl rule that applies naturally to (xor (and (xor A B) C) B),
> where there's no (not ...) to push down.

Yes.  The documentation says
  @cindex @code{xor}, canonicalization of
  @item
  The only possible RTL expressions involving both bitwise exclusive-or
  and bitwise negation are @code{(xor:@var{m} @var{x} @var{y})}
  and @code{(not:@var{m} (xor:@var{m} @var{x} @var{y}))}.
and that is all it says about xor (and it is under-defined, anyway; surely
  (set (reg)
       (xor (mult (not (reg))
                  (reg))
            (reg)))
(it's hard to come up with a less silly example) is valid RTL as well!)

Because in the
  ((A^B) & C) ^ B
we have B twice it can be expressed in quite different ways.  I prefer
  (A&C) | (B&~C)
(which is the disjunctive normal form for this) (using ior instead of xor
is common, but that is not a documented canonicalisation either); this is
nice because C has a different role (different than A and B) here, and it
is the same shape as you get for C a constant, and the expression is
nicely symmetrical too, and the expression tree is less deep (ignoring
the inversion ;-) )
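
The equivalence of these forms is easy to check by brute force.  A
minimal Python sketch (the 4-bit width and the function names are
arbitrary choices for illustration) compares the xor-and-xor form with
the and/ior form, and also with the variant that joins the two terms with
xor; ior and xor agree because the two terms never have a bit in common:

```python
from itertools import product

WIDTH = 4                      # arbitrary; the operations are bitwise
MASK = (1 << WIDTH) - 1

def xor_and_xor(a, b, c):
    """((A^B) & C) ^ B"""
    return ((a ^ b) & c) ^ b

def and_ior(a, b, c):
    """(A&C) | (B&~C)"""
    return (a & c) | (b & ~c & MASK)

def and_xor(a, b, c):
    """(A&C) ^ (B&~C), same value since the two terms are disjoint."""
    return (a & c) ^ (b & ~c & MASK)

def all_forms_agree():
    """Exhaustively compare the three forms over all WIDTH-bit inputs."""
    return all(
        xor_and_xor(a, b, c) == and_ior(a, b, c) == and_xor(a, b, c)
        for a, b, c in product(range(MASK + 1), repeat=3)
    )

print(all_forms_agree())
```

In every form C acts as the selector: bits of A are taken where C is set
and bits of B where it is clear.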

In general, the RTL code does not handle xor very well, even aside from
all of the canonicalisation issues.  So maybe we should just not use xor
much?

> >> As you say, it's no accident that we get this form, it's something
> >> that match.pd specifically chose.  And I think there should be a
> >> strong justification for having an RTL canonical form that reverses
> >> a gimple decision.  RTL isn't as powerful as gimple and so isn't going
> >> to be able to undo the gimple transforms in all cases.
> >
> > Canonical RTL is different in many ways, already.
> 
> Sure, wasn't claiming otherwise.  But most of the rtl canonicalisation
> rules predate gimple by some distance, so while the individual choices
> are certainly deliberate, the differences weren't necessarily planned
> as differences.

The RTL and Gimple rules have very different goals.

> Whereas here we're talking about adding a new rtl rule
> with the full knowledge that it's the opposite of the equivalent gimple
> rule.

And also with the knowledge that it is what existing target code still
expects!  (The aarch64 code is the first using the "gimple" form for this
as far as I know).

> If we're going to move in one direction, it seems better to move
> towards making the rules more consistent rather than towards deliberately
> making them (even) less consistent.

I don't think this would matter at all.

> > "Not as powerful", I have no idea what you mean, btw.  RTL is much closer
> > to the real machine, so is a lot *more* powerful than Gimple for modelling
> > machine instructions (where Gimple is much nicer for higher-level
> > optimisations).  We need both.
> 
> I meant rtl passes aren't generally as powerful as gimple passes
> (which wasn't what I said :-)).  E.g. match.pd sees potential
> combinations on gimple stmts that combine wouldn't see for the
> corresponding rtl insns.

Yes, because gimple deals with expressions, and RTL deals with machine
instructions.

RTL is better for what RTL does, and gimple is better for what gimple does.
Forcing RTL to do things just like gimple does them is not useful IMO.


Segher
