https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105197
Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #3 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Looks like this started with

commit d846f225c25c5885250c303c8d118caa08c447ab
Author: Richard Biener <rguent...@suse.de>
Date:   Tue May 4 15:51:20 2021 +0200

    tree-optimization/79333 - fold stmts following SSA edges in VN

    This makes sure to follow SSA edges when folding eliminated stmts.
    This reaps the same benefit as forwprop folding all stmts, not
    waiting for one to produce copysign in the new testcase.

    2021-05-04  Richard Biener  <rguent...@suse.de>

            PR tree-optimization/79333
            * tree-ssa-sccvn.c (eliminate_dom_walker::eliminate_stmt):
            Fold stmt following SSA edges.

            * gcc.dg/tree-ssa/ssa-fre-94.c: New testcase.
            * gcc.dg/graphite/fuse-1.c: Adjust.
            * gcc.dg/pr43864-4.c: Likewise.

and what's happening is that the vectorizer relies on a mask A and its
inverse ~A being represented by a negation of the mask.  Ifcvt used to
enforce this, but with the change it now pushes the ~ into the mask
operation if it can.

So previously we would generate

  _26 = ~_25;
  _43 = ~_44;

out of ifcvt and now we generate

  _26 = _6 == 0;
  _43 = _4 == 0;

which forces the creation of two extra masks, as it de-optimizes the
vectorizer's ability to immediately see a mask invert.

We however still detect that those two are inverses of

  _25 = _6 != 0;
  _44 = _4 != 0;

and when generating the second VEC_COND for the operation we end up flipping
the arguments somehow

------>vectorizing statement: _ifc__41 = _43 ? 0 : _ifc__40;
created new init_stmt: vect_cst__136 = { 0, ... }
add new stmt: _137 = mask__43.26_135 & loop_mask_111
note:  add new stmt: vect__ifc__41.27_138 = VEC_COND_EXPR <_137, vect__ifc__40.25_133, vect_cst__136>;

so we've vectorized

  _ifc__41 = _43 ? _ifc__40 : 0;

instead, without negating _137, which is where the contradiction gets
introduced.
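
To make the contradiction concrete, here is a minimal lane-wise sketch in plain C. It is not GCC code: `LANES`, `select_lanes`, `scalar_reference` and `swapped_arms` are hypothetical names invented for illustration, and the loop mask is assumed to be all-true so it can be left out. It models VEC_COND_EXPR as a per-lane select and shows that swapping the two arms only matches the scalar statement if the mask is inverted as well.

```c
#include <stdbool.h>
#include <stddef.h>

#define LANES 4  /* hypothetical vector length, for illustration only */

/* Lane-wise model of VEC_COND_EXPR <mask, a, b>:
   take a[i] where mask[i] is true, otherwise b[i].  */
static void select_lanes(const bool *mask, const int *a, const int *b, int *out)
{
  for (size_t i = 0; i < LANES; i++)
    out[i] = mask[i] ? a[i] : b[i];
}

/* Scalar statement being vectorized:
   _ifc__41 = _43 ? 0 : _ifc__40, with _43 = (x == 0).  */
static void scalar_reference(const int *x, const int *prev, int *out)
{
  for (size_t i = 0; i < LANES; i++)
    out[i] = (x[i] == 0) ? 0 : prev[i];
}

/* Vector form with the arms swapped to <mask, prev, 0>.
   With invert_mask == false this mirrors the dump above (mask used as-is);
   with invert_mask == true the swap is compensated by negating the mask.
   The loop mask is assumed all-true and therefore omitted.  */
static void swapped_arms(const int *x, const int *prev, int *out, bool invert_mask)
{
  bool mask[LANES];
  int zero[LANES] = { 0 };

  for (size_t i = 0; i < LANES; i++)
    {
      bool m43 = (x[i] == 0);
      mask[i] = invert_mask ? !m43 : m43;
    }
  select_lanes(mask, prev, zero, out);
}
```

Calling `swapped_arms` with `invert_mask` set to false yields the opposite selection from `scalar_reference` in every lane, while setting it to true reproduces the reference result.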

I'll fix that bug, but the question remains whether we want this
simplification to now happen in ifcvt for masks.  It makes the vectorizer
generate a lot more intermediate masks, which are cleaned up by the RPO pass I
added at the end, but we also lose the fact that they are simple inverses,
i.e. at -O3 on these integer masks we could have just generated a NOT.
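
As a rough illustration of the "simple inverse" point, the sketch below assumes an integer mask encoding where a true lane is all-ones (-1) and a false lane is 0. The helper names `mask_ne_zero`, `mask_eq_zero` and `mask_invert` are hypothetical and do not correspond to anything in GCC. Under that assumption, the `== 0` mask is exactly the bitwise NOT of the `!= 0` mask, so a second comparison is not needed.

```c
#include <stdint.h>

/* Integer lane mask for (x != 0): all-ones when true, zero when false.
   This encoding is an assumption made for the example.  */
static int32_t mask_ne_zero(int32_t x)
{
  return -(int32_t) (x != 0);
}

/* Integer lane mask for (x == 0), computed with a second comparison.  */
static int32_t mask_eq_zero(int32_t x)
{
  return -(int32_t) (x == 0);
}

/* The same inverse obtained from an existing mask with a single NOT.  */
static int32_t mask_invert(int32_t m)
{
  return ~m;
}
```

For any input `x`, `mask_eq_zero(x)` and `mask_invert(mask_ne_zero(x))` produce the same lane value, which is the relationship that is no longer visible once the inverse is expressed as a separate comparison.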