> > int test (int a) > > { > > return a>0 ? CST1: CST2; > > } > > > > gets same hash value no matter what CST1/CST2 is. I added hasher and I > > am re-running stats. > > The hash should be commutative here at least.
It needs to match what comparator is doing later, and sadly it does not try to find the right order of basic blocks. I added TODO and will fix it next stage1. With the attached patch we have no comparsion with mismatching VRP ranges with non-LTO bootstrap and there are 4 instances of mismatched jump function with LTO bootstrap. I checked them and these are functions that are still different and would not be unified anyway. In testsuite we have gcc.c-torture/execute/builtin-prefetch-4.c this matches [irange] long unsigned int [0, 2147483647][18446744071562067968, +INF] [irange] long unsigned int [0, 2147483646][18446744071562067968, +INF] in postdec_glob_idx/22new vs predec_glob_idx/20 The functions are different, but have precisely same statements, just differ by SSA names that we do not hash. c-c++-common/builtins.c [irange] long unsigned int [2, 2147483647] MASK 0x7ffffffe VALUE 0x0 [irange] long unsigned int [8, 2147483647] MASK 0x7ffffff8 VALUE 0x0 in bar.part.0/10new foo.part.0/9 This is a real mismatch 23_containers/vector/cons/2.cc [irange] long unsigned int [1, 9223372036854775807] MASK 0x7fffffffffffffff VALUE 0x0 [irange] long unsigned int [1, 9223372036854775807] MASK 0xfffffffffffffff8 VALUE 0x0 static std::vector<_Tp, _Alloc>::pointer std::vector<_Tp, _Alloc>::_S_do_relocate(pointer, pointer, pointer, _Tp_alloc_type&, std::true_type) [with _Tp = A<B>; _Alloc = std::allocator<A<B> >]/3482 static std::vector<_Tp, _Alloc>::pointer std::vector<_Tp, _Alloc>::_S_do_relocate(pointer, pointer, pointer, _Tp_alloc_type&, std::true_type) [with _Tp = double; _Alloc = std::allocator<double>]/3372 Here funtions are different initially but are optimized to same body while we keep info about ranges from optimized out code. I tried to make easy testcase, but int a; a = (char)a; is not optimized away, I wonder why, when for all valid values this is noop? 25_algorithms/lexicographical_compare/uchar.cc [irange] long unsigned int [8, +INF] MASK 0xfffffffffffffff8 VALUE 0x0 [irange] long unsigned int [4, +INF] MASK 0xfffffffffffffffc VALUE 0x0 static constexpr bool std::__equal<true>::equal(const _Tp*, const _Tp*, const _Tp*) [with _Tp = long int]/13669 static constexpr bool std::__equal<true>::equal(const _Tp*, const _Tp*, const _Tp*) [with _Tp = int]/13663 std/ranges/conv/1.cc [irange] long unsigned int [8, +INF] MASK 0xfffffffffffffff8 VALUE 0x0 [irange] long unsigned int [4, +INF] MASK 0xfffffffffffffffc VALUE 0x0 static constexpr bool std::__equal<true>::equal(const _Tp*, const _Tp*, const _Tp*) [with _Tp = long int]/13757 static constexpr bool std::__equal<true>::equal(const _Tp*, const _Tp*, const _Tp*) [with _Tp = int]/13751 Those two are also happening in llvm. We try to merge two functions which take pointer parameters of different type. boolD.2783 std::__equal<true>::equal<int>D.426577 (const intD.9 * __first1D.433380, const intD.9 * __last1D.433381, const intD.9 * __first2D.433382) { const intD.9 * __first1_4(D) = __first1D.433380; const intD.9 * __last1_3(D) = __last1D.433381; const intD.9 * __first2_6(D) = __first2D.433382; long intD.12 _1; boolD.2783 _2; long unsigned intD.16 _7; boolD.2783 _8; intD.9 _9; _1 = __last1_3(D) - __first1_4(D); if (_1 != 0) goto <bb 3>; [50.00%] else goto <bb 4>; [50.00%] # RANGE [irange] long unsigned int [4, +INF] MASK 0xfffffffffffffffc VALUE 0x0 _7 = (long unsigned intD.16) _1; Compared to boolD.2783 std::__equal<true>::equal<long int>D.426685 (const long intD.12 * __first1D.433472, const long intD.12 * __last1D.433473, const long intD.12 * __first2D.433474) { const long intD.12 * __first1_4(D) = __first1D.433472; const long intD.12 * __last1_3(D) = __last1D.433473; const long intD.12 * __first2_6(D) = __first2D.433474; long intD.12 _1; boolD.2783 _2; long unsigned intD.16 _7; boolD.2783 _8; intD.9 _9; _1 = __last1_3(D) - __first1_4(D); if (_1 != 0) goto <bb 3>; [50.00%] else goto <bb 4>; [50.00%] # RANGE [irange] long unsigned int [8, +INF] MASK 0xfffffffffffffff8 VALUE 0x0 _7 = (long unsigned intD.16) _1; This looks like potentially dangerous situation, since if we can derive range from pointed-to type, then ICF needs to match them too. However with #include <stdlib.h> size_t diff (long *a, long *b) { long int d = (b - a) * sizeof (long); if (!d) abort (); return d; } size_t diff2 (int *a, int *b) { long int d = (b - a) * sizeof (int); if (!d) abort (); return d; } I get range [1, +INF] What seems to be happening in the testcase is that we inline intD.9 std::__memcmp<long int, long int>D.433477 (const long intD.12 * __first1D.438292, const long intD.12 * __first2D.438293, size_tD.2817 __numD.438294) { # PT = nonlocal null const long intD.12 * __first1_1(D) = __first1D.438292; # PT = nonlocal null const long intD.12 * __first2_2(D) = __first2D.438293; size_tD.2817 __num_3(D) = __numD.438294; long unsigned intD.16 _5; intD.9 _6; ;; basic block 2, loop depth 0, count 1073741824 (estimated locally, freq 1.0000), maybe hot ;; prev block 0, next block 1, flags: (NEW, VISITED) ;; pred: ENTRY [always] count:1073741824 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE) # RANGE [irange] long unsigned int [0, 0][8, +INF] MASK 0xfffffffffffffff8 VALUE 0x0 _5 = __num_3(D) * 8; So the range here is determined from multiplication by 8 and while this multiplication is later optimized out after inlining range remains. So VRP should be unable to determine the range again after ICF. > > The obvious thing would be range info making it possible to prove > a stmt cannot trap, say for an array index or for arithmetic with > -ftrapv. But I'm not sure that makes a differnce for pure/const-ness. I am looking into this. > > Making something looping vs. non-looping should be easy though, > just have range info for a loop with an unsigned IV that evolves > like { 0, +, increment } and with a != exit condition where for some > 'increment' we know we eventually reach equality but with some > other we know we never do. Yep, but here we seem to be safe since finite_p flag of every loop is compared. > > But that just means pure/const state needs to be recomputed after > merging? Or compared. Recomputing summaries is also possible (by triggering node removal and node insertion pair), but I would rather stay with oriignal summaries if possible. This is the patch for hashing PHI nodes. It can be abused to trigger quadratic behaviour, so I plan to commit it. Bootstrapped/regtested x86_64-linux. gcc/ChangeLog: * ipa-icf.cc (sem_function::init): Hash PHI operands. (sem_function::compare_phi_node): Add comment. diff --git a/gcc/ipa-icf.cc b/gcc/ipa-icf.cc index 5d5a42f9c6c..a0266755992 100644 --- a/gcc/ipa-icf.cc +++ b/gcc/ipa-icf.cc @@ -1387,6 +1387,23 @@ sem_function::init (ipa_icf_gimple::func_checker *checker) cfg_checksum = iterative_hash_host_wide_int (e->flags, cfg_checksum); + /* TODO: We should be able to match PHIs with different order of + parameters. This needs to be also updated in + sem_function::compare_phi_node. */ + gphi_iterator si; + for (si = gsi_start_nonvirtual_phis (bb); !gsi_end_p (si); + gsi_next_nonvirtual_phi (&si)) + { + hstate.add_int (GIMPLE_PHI); + gphi *phi = si.phi (); + m_checker->hash_operand (gimple_phi_result (phi), hstate, 0, + func_checker::OP_NORMAL); + hstate.add_int (gimple_phi_num_args (phi)); + for (unsigned int i = 0; i < gimple_phi_num_args (phi); i++) + m_checker->hash_operand (gimple_phi_arg_def (phi, i), + hstate, 0, func_checker::OP_NORMAL); + } + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) { @@ -1579,6 +1596,8 @@ sem_function::compare_phi_node (basic_block bb1, basic_block bb2) if (size1 != size2) return return_false (); + /* TODO: We should be able to match PHIs with different order of + parameters. This needs to be also updated in sem_function::init. */ for (i = 0; i < size1; ++i) { t1 = gimple_phi_arg (phi1, i)->def;