On 2/16/23 02:55, Richard Biener wrote:
On Wed, Feb 15, 2023 at 6:07 PM Andrew MacLeod via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
This patch implements the suggestion that we have an alternative
ssa-cache which does not zero memory, and instead uses a bitmap to track
whether a value is currently set or not.  It roughly mimics what
path_range_query was doing internally.

For sparsely used cases, expecially in large programs, this is more
efficient.  I changed path_range_query to use this, and removed it old
bitmap (and a hack or two around PHI calculations), and also utilized
this is the assume_query class.

Performance wise, the patch doesn't affect VRP (since that still uses
the original version).  Switching to the lazy version caused a slowdown
of 2.5% across VRP.

There was a noticeable improvement elsewhere.,  across 230 GCC source
files, threading ran over 12% faster!.  Overall compilation improved by
0.3%  Not sure it makes much difference in compiler.i, but it shouldn't
hurt.

bootstraps on x86_64-pc-linux-gnu with no regressions.   OK for trunk?
or do you want to wait for the next release...
I see

@@ -365,16 +335,8 @@ path_range_query::compute_ranges_in_phis (basic_block bb)

        Value_Range r (TREE_TYPE (name));
        if (range_defined_in_block (r, name, bb))
-       {
-         unsigned v = SSA_NAME_VERSION (name);
-         set_cache (r, name);
-         bitmap_set_bit (phi_set, v);
-         // Pretend we don't have a cache entry for this name until
-         // we're done with all PHIs.
-         bitmap_clear_bit (m_has_cache_entry, v);
-       }
+       m_cache.set_global_range (name, r);
      }
-  bitmap_ior_into (m_has_cache_entry, phi_set);
  }

  // Return TRUE if relations may be invalidated after crossing edge E.

which I think is not correct - if we have

  # _1 = PHI <..., _2>
  # _2 = PHI <..., _1>

then their effects are supposed to be executed in parallel, that is,
both PHI argument _2 and _1 are supposed to see the "old" version.
The previous code tried to make sure the range of the new _1 doesn't
get seen when processing the argument _1 in the definition of _2.

The new version drops this, possibly resulting in wrong-code.

This is dropped because it is actually handled properly in range_defined_in_block now.  (which I think Aldy was describing).

It didnt make sense to me why it was handled here like this, so I traced through the call chain to find out if it was still actually needed and discussed it with Aldy.  I think it was mostly a leftover wart.


While I think it's appropriate to sort out compile-time issues like this
during stage4 at least the above makes me think it should be defered
to next stage1.

I am happy to defer it since its a marginal increase anyway.

Andrew


Reply via email to