> > int test (int a)
> > {
> > return a>0 ? CST1:  CST2;
> > }
> > 
> > gets same hash value no matter what CST1/CST2 is.  I added hasher and I
> > am re-running stats.
> 
> The hash should be commutative here at least.

It needs to match what the comparator is doing later, and sadly it does not
try to find the right order of basic blocks.  I added a TODO and will fix
it next stage1.

With the attached patch there are no comparisons with mismatched VRP
ranges with a non-LTO bootstrap, and there are 4 instances of mismatched
jump functions with an LTO bootstrap.

I checked them, and these are functions that are still different and 
would not be unified anyway.  In the testsuite we have
 gcc.c-torture/execute/builtin-prefetch-4.c
   this matches
   [irange] long unsigned int [0, 2147483647][18446744071562067968, +INF]
   [irange] long unsigned int [0, 2147483646][18446744071562067968, +INF]
   in postdec_glob_idx/22new vs predec_glob_idx/20

   The functions are different, but have precisely the same statements;
   they differ only by SSA names, which we do not hash.
 c-c++-common/builtins.c
   [irange] long unsigned int [2, 2147483647] MASK 0x7ffffffe VALUE 0x0
   [irange] long unsigned int [8, 2147483647] MASK 0x7ffffff8 VALUE 0x0
   in bar.part.0/10new vs foo.part.0/9

   This is a real mismatch
 23_containers/vector/cons/2.cc
   [irange] long unsigned int [1, 9223372036854775807] MASK 0x7fffffffffffffff VALUE 0x0
   [irange] long unsigned int [1, 9223372036854775807] MASK 0xfffffffffffffff8 VALUE 0x0
   static std::vector<_Tp, _Alloc>::pointer std::vector<_Tp, _Alloc>::_S_do_relocate(pointer, pointer, pointer, _Tp_alloc_type&, std::true_type) [with _Tp = A<B>; _Alloc = std::allocator<A<B> >]/3482
   static std::vector<_Tp, _Alloc>::pointer std::vector<_Tp, _Alloc>::_S_do_relocate(pointer, pointer, pointer, _Tp_alloc_type&, std::true_type) [with _Tp = double; _Alloc = std::allocator<double>]/3372

   Here the functions are different initially but are optimized to the same
   body, while we keep info about ranges derived from the optimized-out code.

   I tried to make an easy testcase, but

   int a;
   a = (char)a;

   is not optimized away; I wonder why, when for all valid values this
   is a noop?

 25_algorithms/lexicographical_compare/uchar.cc
   [irange] long unsigned int [8, +INF] MASK 0xfffffffffffffff8 VALUE 0x0
   [irange] long unsigned int [4, +INF] MASK 0xfffffffffffffffc VALUE 0x0
   static constexpr bool std::__equal<true>::equal(const _Tp*, const _Tp*, const _Tp*) [with _Tp = long int]/13669
   static constexpr bool std::__equal<true>::equal(const _Tp*, const _Tp*, const _Tp*) [with _Tp = int]/13663
 std/ranges/conv/1.cc
   [irange] long unsigned int [8, +INF] MASK 0xfffffffffffffff8 VALUE 0x0
   [irange] long unsigned int [4, +INF] MASK 0xfffffffffffffffc VALUE 0x0
   static constexpr bool std::__equal<true>::equal(const _Tp*, const _Tp*, const _Tp*) [with _Tp = long int]/13757
   static constexpr bool std::__equal<true>::equal(const _Tp*, const _Tp*, const _Tp*) [with _Tp = int]/13751

   Those two also happen in LLVM.  We try to merge two functions
   which take pointer parameters of different types.

        boolD.2783 std::__equal<true>::equal<int>D.426577 (const intD.9 * __first1D.433380, const intD.9 * __last1D.433381, const intD.9 * __first2D.433382)
        { 
          const intD.9 * __first1_4(D) = __first1D.433380;
          const intD.9 * __last1_3(D) = __last1D.433381;
          const intD.9 * __first2_6(D) = __first2D.433382;
          long intD.12 _1;
          boolD.2783 _2;
          long unsigned intD.16 _7;
          boolD.2783 _8;
          intD.9 _9;

          _1 = __last1_3(D) - __first1_4(D);
          if (_1 != 0)
            goto <bb 3>; [50.00%]
          else
            goto <bb 4>; [50.00%]

          # RANGE [irange] long unsigned int [4, +INF] MASK 0xfffffffffffffffc VALUE 0x0
          _7 = (long unsigned intD.16) _1;

  Compared to

        boolD.2783 std::__equal<true>::equal<long int>D.426685 (const long intD.12 * __first1D.433472, const long intD.12 * __last1D.433473, const long intD.12 * __first2D.433474)
        {
          const long intD.12 * __first1_4(D) = __first1D.433472;
          const long intD.12 * __last1_3(D) = __last1D.433473;
          const long intD.12 * __first2_6(D) = __first2D.433474;
          long intD.12 _1;
          boolD.2783 _2;
          long unsigned intD.16 _7;
          boolD.2783 _8;
          intD.9 _9;

          _1 = __last1_3(D) - __first1_4(D);
          if (_1 != 0)
            goto <bb 3>; [50.00%]
          else
            goto <bb 4>; [50.00%]

          # RANGE [irange] long unsigned int [8, +INF] MASK 0xfffffffffffffff8 VALUE 0x0
          _7 = (long unsigned intD.16) _1;

   This looks like a potentially dangerous situation: if we can derive
   a range from the pointed-to type, then ICF needs to match those types too.

   However with
   #include <stdlib.h>
   size_t diff (long *a, long *b)
   {
     long int d = (b - a) * sizeof (long);
     if (!d)
       abort ();
     return d;
   }
   size_t diff2 (int *a, int *b)
   {
     long int d = (b - a) * sizeof (int);
     if (!d)
       abort ();
     return d;
   }

   I get range [1, +INF]

   What seems to be happening in the testcase is that we inline
     intD.9 std::__memcmp<long int, long int>D.433477 (const long intD.12 * __first1D.438292, const long intD.12 * __first2D.438293, size_tD.2817 __numD.438294)
    { 
      # PT = nonlocal null
      const long intD.12 * __first1_1(D) = __first1D.438292;
      # PT = nonlocal null
      const long intD.12 * __first2_2(D) = __first2D.438293;
      size_tD.2817 __num_3(D) = __numD.438294;
      long unsigned intD.16 _5;
      intD.9 _6;

     ;;   basic block 2, loop depth 0, count 1073741824 (estimated locally, freq 1.0000), maybe hot
     ;;    prev block 0, next block 1, flags: (NEW, VISITED)
     ;;    pred:       ENTRY [always]  count:1073741824 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE)
      # RANGE [irange] long unsigned int [0, 0][8, +INF] MASK 0xfffffffffffffff8 VALUE 0x0
      _5 = __num_3(D) * 8;
   So the range here is determined from the multiplication by 8, and while
   this multiplication is later optimized out after inlining, the range
   remains.  So VRP would be unable to determine the range again after ICF.

> 
> The obvious thing would be range info making it possible to prove
> a stmt cannot trap, say for an array index or for arithmetic with
> -ftrapv.  But I'm not sure that makes a differnce for pure/const-ness.

I am looking into this.
> 
> Making something looping vs. non-looping should be easy though,
> just have range info for a loop with an unsigned IV that evolves
> like { 0, +, increment } and with a != exit condition where for some 
> 'increment' we know we eventually reach equality but with some
> other we know we never do.

Yep, but here we seem to be safe, since the finite_p flag of every loop is
compared.
> 
> But that just means pure/const state needs to be recomputed after
> merging?  Or compared.

Recomputing summaries is also possible (by triggering a node removal and
node insertion pair), but I would rather stay with the original summaries if
possible.

This is the patch for hashing PHI nodes.  It cannot be abused to trigger
quadratic behaviour, so I plan to commit it.

Bootstrapped/regtested x86_64-linux.

gcc/ChangeLog:

        * ipa-icf.cc (sem_function::init): Hash PHI operands.
        (sem_function::compare_phi_node): Add comment.

diff --git a/gcc/ipa-icf.cc b/gcc/ipa-icf.cc
index 5d5a42f9c6c..a0266755992 100644
--- a/gcc/ipa-icf.cc
+++ b/gcc/ipa-icf.cc
@@ -1387,6 +1387,23 @@ sem_function::init (ipa_icf_gimple::func_checker *checker)
          cfg_checksum = iterative_hash_host_wide_int (e->flags,
                         cfg_checksum);
 
+       /* TODO: We should be able to match PHIs with different order of
+          parameters.  This needs to be also updated in
+          sem_function::compare_phi_node.  */
+       gphi_iterator si;
+       for (si = gsi_start_nonvirtual_phis (bb); !gsi_end_p (si);
+            gsi_next_nonvirtual_phi (&si))
+         {
+           hstate.add_int (GIMPLE_PHI);
+           gphi *phi = si.phi ();
+           m_checker->hash_operand (gimple_phi_result (phi), hstate, 0,
+                                    func_checker::OP_NORMAL);
+           hstate.add_int (gimple_phi_num_args (phi));
+           for (unsigned int i = 0; i < gimple_phi_num_args (phi); i++)
+             m_checker->hash_operand (gimple_phi_arg_def (phi, i),
+                                      hstate, 0, func_checker::OP_NORMAL);
+         }
+
        for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
             gsi_next (&gsi))
          {
@@ -1579,6 +1596,8 @@ sem_function::compare_phi_node (basic_block bb1, basic_block bb2)
       if (size1 != size2)
        return return_false ();
 
+      /* TODO: We should be able to match PHIs with different order of
+        parameters.  This needs to be also updated in sem_function::init.  */
       for (i = 0; i < size1; ++i)
        {
          t1 = gimple_phi_arg (phi1, i)->def;
