[Bug tree-optimization/100089] [11 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3

rguenth at gcc dot gnu.org via Gcc-bugs Thu, 15 Apr 2021 00:17:28 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-04-15
             Status|UNCONFIRMED                 |NEW
            Summary|[11 Performance regression  |[11 Regression] 30%
                   |] 30% for                   |performance regression for
                   |denbench/mp2decoddata2 with |denbench/mp2decoddata2 with
                   |-O3                         |-O3
   Target Milestone|---                         |11.0

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Indeed, loop vectorization hands if-converted bodies to the BB vectorizer as a
last resort (because BB vectorization doesn't do if-conversion itself).  But
the BB vectorizer then uses the if-converted scalar code as the baseline to
cost against (costing against the non-if-converted loop body isn't really
possible).  To quote:

      /* If we applied if-conversion then try to vectorize the
         BB of innermost loops.
         ???  Ideally BB vectorization would learn to vectorize
         control flow by applying if-conversion on-the-fly, the
         following retains the if-converted loop body even when
         only non-if-converted parts took part in BB vectorization.  */
      if (flag_tree_slp_vectorize != 0
          && loop_vectorized_call
          && ! loop->inner)
        {

As a "hack" we could scalar-cost the always-executed part of the
non-if-converted loop body and apply the full difference between that cost
and the scalar cost of the if-converted body as a bias to the scalar cost
used for BB vectorization.  But that's really apples-to-oranges in the end
(as it is now).
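
A minimal sketch of one possible reading of that bias, in made-up cost
units.  The struct, its fields and both functions are hypothetical and are
not GCC internals; the sign and the point at which the bias is applied are
assumptions:

```c
/* Hypothetical cost units; not taken from any GCC cost model.  */
struct costs
{
  int scalar_orig_always; /* always-executed part of the original body */
  int scalar_ifcvt;       /* whole if-converted scalar body */
  int scalar_bb_part;     /* scalar cost of the stmts BB vectorization replaces */
  int vector_bb_part;     /* vector cost of those same stmts */
};

/* How much the if-converted body overstates the scalar work that is
   always executed in the original loop body.  */
static int
ifcvt_bias (const struct costs *c)
{
  return c->scalar_ifcvt - c->scalar_orig_always;
}

/* Nonzero if BB vectorization still looks profitable once the full bias
   has been subtracted from the scalar side of the comparison.  */
static int
profitable_with_bias (const struct costs *c)
{
  int biased_scalar = c->scalar_bb_part - ifcvt_bias (c);
  return c->vector_bb_part < biased_scalar;
}
```

With the illustrative values { 6, 14, 12, 8 } the unbiased comparison
(8 < 12) accepts the vectorization, while the biased one (8 < 12 - 8)
rejects it.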

Maybe we can cost the whole partly vectorized loop body in this mode
and compare it against the scalar cost of the original loop.  But even
the loop vectorizer costs the if-converted scalar loop, so it is off as well.
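
To illustrate why the if-converted body is a skewed baseline, here is a
hand-written approximation of what if-conversion does to a loop body.  The
function names are made up, and GCC's actual transformation works on GIMPLE
and may use masked loads/stores rather than the plain select shown here:

```c
/* Original body: the multiply and the store only execute when c[i] != 0.  */
void
body_orig (unsigned *a, const unsigned *b, const int *c, int n)
{
  for (int i = 0; i < n; i++)
    if (c[i])
      a[i] = b[i] * 3u;
}

/* Hand-written if-converted form: straight-line code in which the multiply
   and the store run on every iteration; a select keeps the old value of
   a[i] when the condition is false.  */
void
body_ifcvt (unsigned *a, const unsigned *b, const int *c, int n)
{
  for (int i = 0; i < n; i++)
    {
      unsigned t = b[i] * 3u;
      a[i] = c[i] ? t : a[i];
    }
}
```

Both functions compute the same result, but the second one performs the
multiply and the store unconditionally, so its scalar cost overstates the
work of the original loop whenever the condition is rarely true.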

Long-term, if-conversion needs to be integrated with vectorization so we
can at least keep track of which stmts were originally executed conditionally
and which were not.

Short-term, I'm not sure we can do much.  Doing SLP on the if-converted
body does help in quite a few cases.

[Bug tree-optimization/100089] [11 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
