[Bug target/104686] [12 Regression] Huge compile-time regression building SPEC 2017 538.imagick_r with -march=skylake

vmakarov at gcc dot gnu.org via Gcc-bugs Tue, 01 Mar 2022 13:59:37 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104686


--- Comment #19 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #16)
> it doesn't make a difference for this testcase but profiling shows that
> allocnos_conflict_p is quite expensive so it's best to do it after the other
> continue checks like the following.  I also notice that the comment of
> allocnos_conflict_p says
> 
> /* Return TRUE if allocnos A1 and A2 conflicts. Here we are
>    interesting only in conflicts of allocnos with intersected allocno
>    classes. */
> 
> so doing it after the ira_reg_classes_intersect_p check makes even more
> sense(?)
> 
> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> index 8b6db1bb417..a5fd79484eb 100644
> --- a/gcc/ira-color.cc
> +++ b/gcc/ira-color.cc
> @@ -1572,15 +1572,14 @@ update_conflict_hard_regno_costs (int *costs, enum reg_class aclass,
>         else
>           gcc_unreachable ();
>  
> +       another_aclass = ALLOCNO_CLASS (another_allocno);
>         if (another_allocno == from
> +           || ALLOCNO_ASSIGNED_P (another_allocno)
> +           || ALLOCNO_COLOR_DATA (another_allocno)->may_be_spilled_p
> +           || ! ira_reg_classes_intersect_p[aclass][another_aclass]
>             || allocnos_conflict_p (another_allocno, start))
>           continue;
>  
> -       another_aclass = ALLOCNO_CLASS (another_allocno);
> -       if (! ira_reg_classes_intersect_p[aclass][another_aclass]
> -           || ALLOCNO_ASSIGNED_P (another_allocno)
> -           || ALLOCNO_COLOR_DATA (another_allocno)->may_be_spilled_p)
> -         continue;
>         class_size = ira_class_hard_regs_num[another_aclass];
>         ira_allocate_and_copy_costs
>           (&ALLOCNO_UPDATED_CONFLICT_HARD_REG_COSTS (another_allocno),
> 
> 

If allocnos_conflict_p takes significant time, this change definitely makes
sense.  By my estimate it will reduce the number of allocnos_conflict_p calls
by about a factor of 4 (assuming fp and int reg classes and half of the
allocnos already assigned).

In any case, the above change is profitable as allocnos_conflict_p practically
always takes more time than the condition tests moved up.
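
To illustrate where the factor of 4 comes from, here is a minimal standalone
sketch.  It is not GCC code: toy_allocno, toy_conflict_p and count_candidates
are hypothetical names, and the toy only counts how often the expensive
predicate is reached when the cheap tests run first versus last.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an allocno; not GCC's ira_allocno_t.  */
struct toy_allocno
{
  int reg_class;    /* 0 = int, 1 = fp (assumed two classes).  */
  bool assigned_p;  /* Already has a hard register.  */
};

/* Number of times the expensive predicate was invoked.  */
static unsigned long expensive_calls;

/* Stand-in for allocnos_conflict_p: it only counts invocations and
   always reports "no conflict".  */
static bool
toy_conflict_p (const struct toy_allocno *a, const struct toy_allocno *b)
{
  (void) a;
  (void) b;
  expensive_calls++;
  return false;
}

/* Count the allocnos in V[0..N-1] that survive all filters relative to
   START.  With CHEAP_FIRST the class and assignment tests short-circuit
   the expensive call; otherwise the expensive call runs for every
   element.  The result is the same either way.  */
static size_t
count_candidates (const struct toy_allocno *v, size_t n,
                  const struct toy_allocno *start, bool cheap_first)
{
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    {
      bool cheap_reject = (v[i].assigned_p
                           || v[i].reg_class != start->reg_class);
      if (cheap_first)
        {
          if (cheap_reject || toy_conflict_p (&v[i], start))
            continue;
        }
      else
        {
          if (toy_conflict_p (&v[i], start) || cheap_reject)
            continue;
        }
      kept++;
    }
  return kept;
}
```

With half of the allocnos in the other class and half of the remainder already
assigned, only a quarter of the iterations reach toy_conflict_p when the cheap
tests come first.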

> Now, what's more odd is that we sometimes have a nice bitmap representation
> for the conflicts but we always iterate.  So it _seems_ we should be able
> to do sth like
> 
> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> index 8b6db1bb417..682d1ef7562 100644
> --- a/gcc/ira-color.cc
> +++ b/gcc/ira-color.cc
> @@ -1352,9 +1352,23 @@ allocnos_conflict_p (ira_allocno_t a1, ira_allocno_t a2)
>      {
>        obj = ALLOCNO_OBJECT (a1, word);
>        /* Take preferences of conflicting allocnos into account.  */
> -      FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci)
> -       if (OBJECT_ALLOCNO (conflict_obj) == a2)
> -         return true;
> +      if  (!OBJECT_CONFLICT_VEC_P (obj))
> +       {
> +         for (int w2 = 0; w2 < ALLOCNO_NUM_OBJECTS (a2); w2++)
> +           {
> +             ira_object_t obj2 = ALLOCNO_OBJECT (a2, w2);
> +             if (OBJECT_CONFLICT_ID (obj2) >= OBJECT_MIN (obj)
> +                 && OBJECT_CONFLICT_ID (obj2) <= OBJECT_MAX (obj)
> +                 && TEST_MINMAX_SET_BIT (OBJECT_CONFLICT_BITVEC (obj),
> +                                         OBJECT_CONFLICT_ID (obj2),
> +                                         OBJECT_MIN (obj), OBJECT_MAX (obj)))
> +               return true;
> +           }
> +       }
> +      else
> +       FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci)
> +         if (OBJECT_ALLOCNO (conflict_obj) == a2)
> +           return true;
>      }
>    return false;
>  }  
> 
> which reduces compile-time from 10s to 1s for me ... the above should
> be split out so we can "optimally" use the bit test for
> object vs. allocno when possible.
> 
> Vlad - any thoughts about the above two things?  Shall I try to polish and
> optimize the bit test or would you be willing to pick those two speedups up?

This change also makes sense.  Usually for big functions conflict sets are very
sparse and bit vectors are not used.  But it seems this is not the case for
this PR.
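
For readers unfamiliar with the idea, a rough standalone sketch of a
range-limited bit vector membership test follows.  It is only an illustration
of the technique, not the IRA implementation: toy_conflicts, toy_set and
toy_test are hypothetical names, and the fixed 256-bit capacity is an
assumption made to keep the example short.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical conflict set covering ids in [min, max]; bit (id - min)
   is set when id conflicts.  Assumes max - min < 256.  */
struct toy_conflicts
{
  int min, max;
  uint64_t bits[4];
};

/* Record ID as conflicting.  The caller must pass min <= ID <= max.  */
static void
toy_set (struct toy_conflicts *c, int id)
{
  unsigned off = (unsigned) (id - c->min);
  c->bits[off / 64] |= (uint64_t) 1 << (off % 64);
}

/* Return true if ID is in the set.  This is a constant-time range check
   plus one bit test, instead of a walk over every conflict.  */
static bool
toy_test (const struct toy_conflicts *c, int id)
{
  if (id < c->min || id > c->max)
    return false;
  unsigned off = (unsigned) (id - c->min);
  return ((c->bits[off / 64] >> (off % 64)) & 1) != 0;
}
```

The point of the sketch is the cost model: the lookup does not depend on how
many conflicts the object has, whereas iterating the conflicts to find one
particular allocno is linear in the size of the set.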

Please polish and optimize the change as you proposed, and I will approve the
final version promptly.

Thank you for working on this PR, Richard.
