On 11/12/23 07:08, Lehua Ding wrote:
gcc/ChangeLog:

        * hard-reg-set.h (struct HARD_REG_SET): New shift operator.
        * ira-build.cc (ira_create_object): Adjust.
        (find_object): New.
        (find_object_anyway): New.
        (ira_create_allocno): Adjust.
        (get_range): New.
        (ira_copy_allocno_objects): New.
        (merge_hard_reg_conflicts): Adjust copy.
        (create_cap_allocno): Adjust.
        (find_subreg_p): New.
        (add_subregs): New.
        (create_insn_allocnos): Collect subreg.
        (create_bb_allocnos): Ditto.
        (move_allocno_live_ranges): Adjust.
        (copy_allocno_live_ranges): Adjust.
        (setup_min_max_allocno_live_range_point): Adjust.
        * ira-color.cc (INCLUDE_MAP): include map.
        (setup_left_conflict_sizes_p): Adjust conflict size.
        (setup_profitable_hard_regs): Adjust.
        (get_conflict_and_start_profitable_regs): Adjust.
        (check_hard_reg_p): Adjust conflict check.
        (assign_hard_reg): Adjust.
        (push_allocno_to_stack): Adjust conflict size.
        (improve_allocation): Adjust.
        * ira-conflicts.cc (record_object_conflict): Simplify.
        (build_object_conflicts): Adjust.
        (build_conflicts): Adjust.
        (print_allocno_conflicts): Adjust.
        * ira-emit.cc (modify_move_list): Adjust.
        * ira-int.h (struct ira_object): Adjust struct.
        (struct ira_allocno): Adjust struct.
        (ALLOCNO_NUM_OBJECTS): New accessor.
        (ALLOCNO_UNIT_SIZE): Ditto.
        (ALLOCNO_TRACK_SUBREG_P): Ditto.
        (ALLOCNO_NREGS): Ditto.
        (OBJECT_SUBWORD): Ditto.
        (OBJECT_INDEX): Ditto.
        (OBJECT_START): Ditto.
        (OBJECT_NREGS): Ditto.
        (find_object): Exported.
        (find_object_anyway): Ditto.
        (ira_copy_allocno_objects): Ditto.
        (has_subreg_object_p): Ditto.
        (get_full_object): Ditto.
        * ira-lives.cc (INCLUDE_VECTOR): Include vector.
        (add_onflict_hard_regs): New.
        (add_onflict_hard_reg): New.
        (make_hard_regno_dead): Adjust.
        (make_object_live): Adjust.
        (update_allocno_pressure_excess_length): Adjust.
        (make_object_dead): Adjust.
        (mark_pseudo_regno_live): Adjust.
        (add_subreg_point): New.
        (mark_pseudo_object_live): Adjust.
        (mark_pseudo_regno_subword_live): Adjust.
        (mark_pseudo_regno_subreg_live): Adjust.
        (mark_pseudo_regno_subregs_live): Adjust.
        (mark_pseudo_reg_live): Adjust.
        (mark_pseudo_regno_dead): Adjust.
        (mark_pseudo_object_dead): Adjust.
        (mark_pseudo_regno_subword_dead): Adjust.
        (mark_pseudo_regno_subreg_dead): Adjust.
        (mark_pseudo_reg_dead): Adjust.
        (process_single_reg_class_operands): Adjust.
        (process_out_of_region_eh_regs): Adjust.
        (add_conflict_from_region_landing_pads): Adjust.
        (process_bb_node_lives): Adjust.
        (class subreg_live_item): New class.
        (create_subregs_live_ranges): New function.
        (ira_create_allocno_live_ranges): Adjust.
        * ira.cc (check_allocation): Adjust.

Again changeLog is too ambiguous.  Adjust to what?  For example, you should write what members are added to a structure instead of just "adjust structure".

General issues to your patch which I wrote in my 1st email are applicable here too.  Especially I'd mention typos in function names add_onflict_hard_reg[s].

There are too many changes should be done. So I'd like to see a new version of the patch.

---
  gcc/hard-reg-set.h   |  33 +++
  gcc/ira-build.cc     | 235 +++++++++++++++++---
  gcc/ira-color.cc     | 302 +++++++++++++++++---------
  gcc/ira-conflicts.cc |  48 ++---
  gcc/ira-emit.cc      |   2 +-
  gcc/ira-int.h        |  57 ++++-
  gcc/ira-lives.cc     | 500 ++++++++++++++++++++++++++++++++-----------
  gcc/ira.cc           |  52 ++---
  8 files changed, 907 insertions(+), 322 deletions(-)

--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
  <http://www.gnu.org/licenses/>.  */
#include "config.h"
+#define INCLUDE_MAP
  #include "system.h"
  #include "coretypes.h"
  #include "backend.h"
@@ -852,18 +853,17 @@ setup_left_conflict_sizes_p (ira_allocno_t a)
    node_preorder_num = node->preorder_num;
    node_set = node->hard_regs->set;
    node_check_tick++;
+  /* Collect conflict objects.  */
+  std::map<int, bitmap> allocno_conflict_regs;
    for (k = 0; k < nobj; k++)
      {
        ira_object_t obj = ALLOCNO_OBJECT (a, k);
        ira_object_t conflict_obj;
        ira_object_conflict_iterator oci;
-
+
        FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci)
        {
-         int size;
-         ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
-         allocno_hard_regs_node_t conflict_node, temp_node;
-         HARD_REG_SET conflict_node_set;
+         ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
          allocno_color_data_t conflict_data;
conflict_data = ALLOCNO_COLOR_DATA (conflict_a);
@@ -872,6 +872,24 @@ setup_left_conflict_sizes_p (ira_allocno_t a)
                                             conflict_data
                                             ->profitable_hard_regs))
            continue;
+         int num = ALLOCNO_NUM (conflict_a);
+         if (allocno_conflict_regs.count (num) == 0)
+           allocno_conflict_regs.insert ({num, ira_allocate_bitmap ()});
+         bitmap_head temp;
+         bitmap_initialize (&temp, &reg_obstack);
+         bitmap_set_range (&temp, OBJECT_START (conflict_obj),
+                           OBJECT_NREGS (conflict_obj));
+         bitmap_and_compl_into (&temp, allocno_conflict_regs.at (num));
+         int size = bitmap_count_bits (&temp);
+         bitmap_clear (&temp);
+         if (size == 0)
+           continue;
+
+         bitmap_set_range (allocno_conflict_regs.at (num),
+                           OBJECT_START (conflict_obj),
+                           OBJECT_NREGS (conflict_obj));
+         allocno_hard_regs_node_t conflict_node, temp_node;
+         HARD_REG_SET conflict_node_set;
          conflict_node = conflict_data->hard_regs_node;
          conflict_node_set = conflict_node->hard_regs->set;
          if (hard_reg_set_subset_p (node_set, conflict_node_set))
@@ -886,14 +904,13 @@ setup_left_conflict_sizes_p (ira_allocno_t a)
              temp_node->check = node_check_tick;
              temp_node->conflict_size = 0;
            }
-         size = (ira_reg_class_max_nregs
-                 [ALLOCNO_CLASS (conflict_a)][ALLOCNO_MODE (conflict_a)]);
-         if (ALLOCNO_NUM_OBJECTS (conflict_a) > 1)
-           /* We will deal with the subwords individually.  */
-           size = 1;
          temp_node->conflict_size += size;
        }
      }
+  /* Setup conflict nregs of ALLOCNO.  */
+  for (auto &kv : allocno_conflict_regs)
+    ira_free_bitmap (kv.second);
+
    for (i = 0; i < data->hard_regs_subnodes_num; i++)
      {
        allocno_hard_regs_node_t temp_node;
These changes are terrible.  Creating and
destroying 2 bitmaps (allocating and freeing them) for most allocnos
are too much.  First of all you can use hard reg set instead of
bitmaps.  Using maps is also too bad (maps you introduced in your patches are
responsible for > 10% compilation time slowdown).  First of all it is
better to use an array where allocno number is an index.  But even
better you can avoid any data structure if you make the hard reg set
is a part of the allocno.


@@ -2667,21 +2763,16 @@ push_allocno_to_stack (ira_allocno_t a)
  {
    enum reg_class aclass;
    allocno_color_data_t data, conflict_data;
-  int size, i, n = ALLOCNO_NUM_OBJECTS (a);
-
+  int i, n = ALLOCNO_NUM_OBJECTS (a);
+
    data = ALLOCNO_COLOR_DATA (a);
    data->in_graph_p = false;
    allocno_stack_vec.safe_push (a);
    aclass = ALLOCNO_CLASS (a);
    if (aclass == NO_REGS)
      return;
-  size = ira_reg_class_max_nregs[aclass][ALLOCNO_MODE (a)];
-  if (n > 1)
-    {
-      /* We will deal with the subwords individually.  */
-      gcc_assert (size == ALLOCNO_NUM_OBJECTS (a));
-      size = 1;
-    }
+  /* Already collect conflict objects.  */
+  std::map<int, bitmap> allocno_conflict_regs;
    for (i = 0; i < n; i++)
      {
        ira_object_t obj = ALLOCNO_OBJECT (a, i);
@@ -2706,6 +2797,21 @@ push_allocno_to_stack (ira_allocno_t a)
            continue;
          ira_assert (bitmap_bit_p (coloring_allocno_bitmap,
                                    ALLOCNO_NUM (conflict_a)));
+
+         int num = ALLOCNO_NUM (conflict_a);
+         if (allocno_conflict_regs.count (num) == 0)
+           allocno_conflict_regs.insert ({num, ira_allocate_bitmap ()});
+         bitmap_head temp;
+         bitmap_initialize (&temp, &reg_obstack);
+         bitmap_set_range (&temp, OBJECT_START (obj), OBJECT_NREGS (obj));
+         bitmap_and_compl_into (&temp, allocno_conflict_regs.at (num));
+         int size = bitmap_count_bits (&temp);
+         bitmap_clear (&temp);
+         if (size == 0)
+           continue;
+
+         bitmap_set_range (allocno_conflict_regs.at (num), OBJECT_START (obj),
+                           OBJECT_NREGS (obj));
          if (update_left_conflict_sizes_p (conflict_a, a, size))
            {
              delete_allocno_from_bucket
@@ -2721,6 +2827,9 @@ push_allocno_to_stack (ira_allocno_t a)
        
        }
      }
+
+  for (auto &kv : allocno_conflict_regs)
+    ira_free_bitmap (kv.second);
  }

The same as above.  The code can be and should be done more performant.

diff --git a/gcc/ira-conflicts.cc b/gcc/ira-conflicts.cc
index a4d93c8d734..0585ad10043 100644
--- a/gcc/ira-conflicts.cc
+++ b/gcc/ira-conflicts.cc
diff --git a/gcc/ira-int.h b/gcc/ira-int.h
index 0685e1f4e8d..9095a8227f7 100644
--- a/gcc/ira-int.h
+++ b/gcc/ira-int.h
@@ -222,11 +224,13 @@ extern int ira_max_point;
  extern live_range_t *ira_start_point_ranges, *ira_finish_point_ranges;
/* A structure representing conflict information for an allocno
-   (or one of its subwords).  */
+   (or one of its subregs).  */
  struct ira_object
  {
    /* The allocno associated with this record.  */
    ira_allocno_t allocno;
+  /* Index in allocno->objects array */
+  unsigned int index;
    /* Vector of accumulated conflicting conflict_redords with NULL end
       marker (if OBJECT_CONFLICT_VEC_P is true) or conflict bit vector
       otherwise.  */
@@ -387,8 +393,11 @@ struct ira_allocno
    /* An array of structures describing conflict information and live
       ranges for each object associated with the allocno.  There may be
       more than one such object in cases where the allocno represents a
-     multi-word register.  */
-  ira_object_t objects[2];
+     multi-hardreg pesudo.  */
+  std::vector<ira_object_t> objects;
+  /* An array of structures decribing the subreg mode start and subreg end for
+     this allocno.  */
+  std::vector<subreg_range> subregs;

I like Richard Sandiford's idea to use different representation for 1
object allocno.  Vector is too much for the most cases.  Even 1 object
vector is the same as array `objects[4]` when taking malloc internal data, plus
it is one redirection and worse data locality as it is allocated at
the different time of allocno one. Initial capacity of vector as I
understand is not defined by the standard although it is most probably
zero, so for subregs is less critical for most common case (no
subregs).  I think for such critical data, it is even better don't use
std::vect as its behaviour is not defined fully.  All of this can be
fixed later.

    /* Registers clobbered by intersected calls.  */
     HARD_REG_SET crossed_calls_clobbered_regs;
    /* Array of usage costs (accumulated and the one updated during
diff --git a/gcc/ira-lives.cc b/gcc/ira-lives.cc
index 05e2be12a26..9ca9e5548da 100644
--- a/gcc/ira-lives.cc
+++ b/gcc/ira-lives.cc
@@ -98,6 +103,33 @@ make_hard_regno_live (int regno)
    SET_HARD_REG_BIT (hard_regs_live, regno);
  }
+/* Update conflict hard regs of ALLOCNO a for current live part. */
+static void
+add_onflict_hard_regs (ira_allocno_t a, HARD_REG_SET regs)
+{
+  gcc_assert (has_subreg_object_p (a));
+
+  if (subreg_live_points->subreg_live_ranges.count (ALLOCNO_NUM (a)) == 0)
+    return;
+
+  for (const subreg_range &r :
+       subreg_live_points->subreg_live_ranges.at (ALLOCNO_NUM (a)).ranges)
+    {
+      ira_object_t obj = find_object_anyway (a, r.start, r.end - r.start);
+      OBJECT_CONFLICT_HARD_REGS (obj) |= regs;
+      OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= regs;
+    }
+}
+

Using C++ std::set subreg_ranges is bad with the performance point of view.  Building live range even before the patch took a lot of time.  Using std::set makes it much worse.  But it is not my call but the 1st patch (subreg-live-ranges) reviewer. In general after analysis of new RA version performance, I got an impression that using C++ stl for critical data should be avoided in GCC.

So I'd like to see a new version of the patch after fixing issues I mentioned here before its approval. Thank you.


Reply via email to