[Bug target/105197] [12 Regression] SVE: wrong code with -O -ftree-vectorize

tnfchris at gcc dot gnu.org via Gcc-bugs Sun, 10 Apr 2022 23:59:20 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105197


Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #3 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Looks like this started with

commit d846f225c25c5885250c303c8d118caa08c447ab
Author: Richard Biener <rguent...@suse.de>
Date:   Tue May 4 15:51:20 2021 +0200

    tree-optimization/79333 - fold stmts following SSA edges in VN

    This makes sure to follow SSA edges when folding eliminated stmts.
    This reaps the same benefit as forwprop folding all stmts, not
    waiting for one to produce copysign in the new testcase.

    2021-05-04  Richard Biener  <rguent...@suse.de>

            PR tree-optimization/79333
            * tree-ssa-sccvn.c (eliminate_dom_walker::eliminate_stmt):
            Fold stmt following SSA edges.

            * gcc.dg/tree-ssa/ssa-fre-94.c: New testcase.
            * gcc.dg/graphite/fuse-1.c: Adjust.
            * gcc.dg/pr43864-4.c: Likewise.

and what's happening is that the vectorizer relies on a mask A and its inverse
~A being represented as a negation of A. Ifcvt used to enforce this, but with
the change it now pushes the ~ into the mask operation where it can.

So previously we would generate

  _26 = ~_25;
  _43 = ~_44;

out of ifcvt, and now we generate

  _26 = _6 == 0;
  _43 = _4 == 0;

and this forces the creation of two extra masks, since it takes away the
vectorizer's ability to immediately see a mask invert.

We do, however, still detect that those two are the inverses of

  _25 = _6 != 0;
  _44 = _4 != 0;

and when generating the second VEC_COND for the operation, we somehow end up
flipping the arguments:

------>vectorizing statement: _ifc__41 = _43 ? 0 : _ifc__40;
created new init_stmt: vect_cst__136 = { 0, ... }
add new stmt: _137 = mask__43.26_135 & loop_mask_111
note:  add new stmt: vect__ifc__41.27_138 = VEC_COND_EXPR <_137, vect__ifc__40.25_133, vect_cst__136>;

so we've instead vectorized _ifc__41 = _43 ? _ifc__40 : 0; without negating
_137, which is where the contradiction gets introduced. I'll fix that bug, but
the question remains whether we now want this simplification to happen in ifcvt
for masks.
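To make the contradiction concrete, here is a minimal C sketch of the lane-wise semantics, assuming a hypothetical vector length of 4 and masks modelled as one bit per lane in a `uint32_t` (a stand-in for an SVE predicate, not GCC's internal representation). Apart from the SSA names quoted from the dump, every name here (`vec_cond`, `scalar_ref`, `vect_negated`, `vect_as_dumped`, `active_lanes_equal`) is hypothetical. `vect_as_dumped` keeps the arm order from the dump and uses `_43 & loop_mask` as-is; `vect_negated` keeps the same arm order but negates the mask, which is one way the select can still mean `_43 ? 0 : _ifc__40`.

```c
#include <stdbool.h>
#include <stdint.h>

#define NLANES 4  /* hypothetical vector length for this sketch */

typedef uint32_t mask_t;                    /* bit i set = lane i true */
typedef struct { int lane[NLANES]; } vec_t; /* one vector register */

/* Per-lane model of VEC_COND_EXPR <mask, a, b>:
   lane i takes a.lane[i] if bit i of mask is set, else b.lane[i]. */
static vec_t vec_cond(mask_t mask, vec_t a, vec_t b)
{
    vec_t out;
    for (int i = 0; i < NLANES; i++)
        out.lane[i] = ((mask >> i) & 1u) ? a.lane[i] : b.lane[i];
    return out;
}

/* Scalar reference for _ifc__41 = _43 ? 0 : _ifc__40, lane by lane. */
static vec_t scalar_ref(mask_t m43, vec_t ifc40)
{
    vec_t out;
    for (int i = 0; i < NLANES; i++)
        out.lane[i] = ((m43 >> i) & 1u) ? 0 : ifc40.lane[i];
    return out;
}

/* Arms in (_ifc__40, 0) order, with the combined mask negated
   so that active lanes still compute _43 ? 0 : _ifc__40. */
static vec_t vect_negated(mask_t m43, mask_t loop_mask, vec_t ifc40)
{
    vec_t zero = {{0}};
    return vec_cond(~(m43 & loop_mask), ifc40, zero);
}

/* Mirrors the dump: _137 = _43 & loop_mask used without negation,
   i.e. _43 ? _ifc__40 : 0 on active lanes. */
static vec_t vect_as_dumped(mask_t m43, mask_t loop_mask, vec_t ifc40)
{
    vec_t zero = {{0}};
    return vec_cond(m43 & loop_mask, ifc40, zero);
}

/* Compare only the lanes enabled by loop_mask. */
static bool active_lanes_equal(vec_t a, vec_t b, mask_t loop_mask)
{
    for (int i = 0; i < NLANES; i++)
        if (((loop_mask >> i) & 1u) && a.lane[i] != b.lane[i])
            return false;
    return true;
}
```

With `_43` true in lanes 0 and 2 (mask `0x5`), all lanes active (`0xF`) and `_ifc__40 = {10, 20, 30, 40}`, `scalar_ref` gives `{0, 20, 0, 40}` and `vect_negated` matches it, while `vect_as_dumped` gives `{10, 0, 30, 0}`: exactly the swapped result described above.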

This makes the vectorizer generate a lot more intermediate masks, which are
cleaned up by the RPO pass I added at the end, but we also lose the fact that
they are simple inverses; i.e. at -O3 on these integer masks we could have just
generated a NOT.
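The cost difference can be sketched in C, again assuming masks modelled as one bit per lane in a `uint32_t` and a hypothetical vector length of 8; `mask_ne_zero`, `mask_eq_zero` and `mask_not` are illustrative helpers, not GCC functions. `_25 = _6 != 0` needs one comparison; the form ifcvt now emits, `_26 = _6 == 0`, needs a second comparison, whereas the old `_26 = ~_25` is a single NOT of a mask that already exists.

```c
#include <stdint.h>

#define NLANES 8  /* hypothetical vector length for this sketch */
#define ALL_LANES ((uint32_t)((1u << NLANES) - 1u))

typedef uint32_t mask_t;  /* bit i set = lane i true, like a predicate */

/* _25 = _6 != 0: one comparison produces the mask. */
static mask_t mask_ne_zero(const int v[NLANES])
{
    mask_t m = 0;
    for (int i = 0; i < NLANES; i++)
        if (v[i] != 0)
            m |= (mask_t)1 << i;
    return m;
}

/* _26 = _6 == 0: the form ifcvt now emits, a second comparison. */
static mask_t mask_eq_zero(const int v[NLANES])
{
    mask_t m = 0;
    for (int i = 0; i < NLANES; i++)
        if (v[i] == 0)
            m |= (mask_t)1 << i;
    return m;
}

/* _26 = ~_25: the form ifcvt used to emit, one NOT of an existing mask.
   Bits above NLANES are cleared so both forms compare equal. */
static mask_t mask_not(mask_t m)
{
    return ~m & ALL_LANES;
}
```

For the input `{0, 3, 0, -1, 7, 0, 0, 2}`, `mask_ne_zero` gives `0x9A` and both `mask_eq_zero` and `mask_not(0x9A)` give `0x65`, so the extra comparison buys nothing over the NOT.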

