Returning to this old thread...

Richard Sandiford <rdsandif...@googlemail.com> writes:
> Tejas Belagod <tbela...@arm.com> writes:
>> When I relaxed CANNOT_CHANGE_MODE_CLASS to undefined for AArch64, 
>> gcc.c-torture/execute/copysign1.c generates incorrect code because LRA 
>> cannot 
>> seem to handle subregs like
>>
>>   (subreg:DI (reg:TF hard_reg) 8)
>>
>> on hard registers where the subreg byte offset is unaligned to a hard 
>> register 
>> boundary(16 for AArch64). It seems to quietly ignore the 8 and resolves this 
>> to 
>> incorrect an hard register during reload.
>>
>> When I compile this test with -O3,
>>
>> long double
>> cl (long double x, long double y)
>> {
>>    return __builtin_copysignl (x, y);
>> }
>>
>> cs.c.213r.ira:
>>
>> (insn 26 10 33 2 (set (reg:DI 87 [ y+8 ])
>>          (subreg:DI (reg:TF 33 v1 [ y ]) 8)) cs.c:4 34 {*movdi_aarch64}
>>       (expr_list:REG_DEAD (reg:TF 33 v1 [ y ])
>>          (nil)))
>> (insn 33 26 35 2 (set (reg:TF 93)
>>          (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>>       (expr_list:REG_DEAD (reg:TF 32 v0 [ x ])
>>          (nil)))
>> (insn 35 33 34 2 (set (reg:DI 92 [ x+8 ])
>>          (subreg:DI (reg:TF 93) 8)) cs.c:4 34 {*movdi_aarch64}
>>       (nil))
>> (insn 34 35 23 2 (set (reg:DI 91 [ x ])
>>          (subreg:DI (reg:TF 93) 0)) cs.c:4 34 {*movdi_aarch64}
>>       (expr_list:REG_DEAD (reg:TF 93)
>>          (nil)))
>> ....
>>
>> cs.c.214r.reload
>>
>> (insn 26 10 33 2 (set (reg:DI 2 x2 [orig:87 y+8 ] [87])
>>          (reg:DI 33 v1 [ y+8 ])) cs.c:4 34 {*movdi_aarch64}
>>       (nil))
>> (insn 33 26 35 2 (set (reg:TF 0 x0 [93])
>>          (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>>       (nil))
>> (insn 35 33 34 2 (set (reg:DI 1 x1 [orig:92 x+8 ] [92])
>>          (reg:DI 1 x1 [+8 ])) cs.c:4 34 {*movdi_aarch64}
>>       (nil))
>> (insn 34 35 8 2 (set (reg:DI 0 x0 [orig:91 x ] [91])
>>          (reg:DI 0 x0 [93])) cs.c:4 34 {*movdi_aarch64}
>>       (nil))
>> .....
>>
>> You can see the changes to insn 26 before and after reload - the SUBREG_BYTE 
>> offset of 8 seems to have been translated to v0 instead of v0.d[1] by 
>> get_hard_regno ().
>>
>> What's interesting here is that the SUBREG_BYTE that is generated for
>>
>> (subreg:DI (reg:TF 33 v1 [ y ]) 8)
>>
>> isn't aligned to a hard register boundary on SIMD regs where UNITS_PER_VREG 
>> for 
>> AArch64 is 16. Therefore when this subreg is resolved, it resolves to v1 
>> instead 
>> of v1.d[1]. Is this something going wrong in LRA or is this a more 
>> fundamental 
>> problem with generating subregs of hard regs with unaligned subreg byte 
>> offsets? 
>> The same subreg on a pseudo works OK because in insn 33, the TF mode is 
>> allocated integer registers and all is well.
>
> I think this is the same problem that was being discussed for x86
> after your no-op vec-select patch:
>
>    http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00801.html
>
> and long following thread.
>
> I'd still like to solve this in a target-independent way rather than add
> an offset to CANNOT_CHANGE_MODE_CLASS, but I haven't had time to look at
> it...

FWIW, here's one possible approach.  The main part is to make the
invalid_mode_change code calculate a set of registers that are either
(a) invalid for the pseudo mode to begin with or (b) do not allow one
of the subregs to be taken (as calculated by simplify_subreg_regno,
which includes the original CANNOT_CHANGE_MODE_CLASS check).

One concern might be about compilation speed when collecting this info.
OTOH, the query is now genuinely constant time, whereas the old bitmap
test was O(num-pseudos) in the worst case.  It might also be possible
to speed things up by walking the subregs using the DF information,
if it's up-to-date at this point (haven't checked).  It would also be
possible to give an ID to each (inner mode, outer mode, byte) combination
and lazily cache the invalid register set for each one.

I went through the other uses of CANNOT_CHANGE_MODE_CLASS.  Most of them
were checking for lowpart mode changes so look safe.  The exception was
combine.c:subst.

This is really four patches squashed into one, but it's not ready to be
submitted yet.  Was just wondering whether this solved your problem.

Thanks,
Richard



*** /tmp/OCSP7f_combine.c       2014-03-11 07:34:37.928138693 +0000
--- gcc/combine.c       2014-03-10 21:39:09.428718086 +0000
*************** subst (rtx x, rtx from, rtx to, int in_d
*** 5082,5096 ****
                      )
                    return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
  
- #ifdef CANNOT_CHANGE_MODE_CLASS
                  if (code == SUBREG
                      && REG_P (to)
                      && REGNO (to) < FIRST_PSEUDO_REGISTER
!                     && REG_CANNOT_CHANGE_MODE_P (REGNO (to),
!                                                  GET_MODE (to),
!                                                  GET_MODE (x)))
                    return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
- #endif
  
                  new_rtx = (unique_copy && n_occurrences ? copy_rtx (to) : to);
                  n_occurrences++;
--- 5082,5094 ----
                      )
                    return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
  
                  if (code == SUBREG
                      && REG_P (to)
                      && REGNO (to) < FIRST_PSEUDO_REGISTER
!                     && simplify_subreg_regno (REGNO (to), GET_MODE (to),
!                                               SUBREG_BYTE (x),
!                                               GET_MODE (x)) < 0)
                    return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
  
                  new_rtx = (unique_copy && n_occurrences ? copy_rtx (to) : to);
                  n_occurrences++;
*** /tmp/Cs4Cnh_i386.c  2014-03-11 07:34:37.953154637 +0000
--- gcc/config/i386/i386.c      2014-03-10 21:39:11.167734347 +0000
*************** ix86_cannot_change_mode_class (enum mach
*** 37439,37451 ****
         the vec_dupv4hi pattern.  */
        if (GET_MODE_SIZE (from) < 4)
        return true;
- 
-       /* Vector registers do not support subreg with nonzero offsets, which
-        are otherwise valid for integer registers.  Since we can't see
-        whether we have a nonzero offset from here, prohibit all
-          nonparadoxical subregs changing size.  */
-       if (GET_MODE_SIZE (to) < GET_MODE_SIZE (from))
-       return true;
      }
  
    return false;
--- 37439,37444 ----
*** /tmp/uUmuMg_ira-costs.c     2014-03-11 07:34:37.978170581 +0000
--- gcc/ira-costs.c     2014-03-10 21:39:10.292726165 +0000
*************** print_allocno_costs (FILE *f)
*** 1416,1425 ****
        {
          rclass = cost_classes[k];
          if (contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (regno)]
! #ifdef CANNOT_CHANGE_MODE_CLASS
!             && ! invalid_mode_change_p (regno, (enum reg_class) rclass)
! #endif
!             )
            {
              fprintf (f, " %s:%d", reg_class_names[rclass],
                       COSTS (costs, i)->cost[k]);
--- 1416,1422 ----
        {
          rclass = cost_classes[k];
          if (contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (regno)]
!             && ! invalid_mode_change_p (regno, (enum reg_class) rclass))
            {
              fprintf (f, " %s:%d", reg_class_names[rclass],
                       COSTS (costs, i)->cost[k]);
*************** print_pseudo_costs (FILE *f)
*** 1458,1467 ****
        {
          rclass = cost_classes[k];
          if (contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (regno)]
! #ifdef CANNOT_CHANGE_MODE_CLASS
!             && ! invalid_mode_change_p (regno, (enum reg_class) rclass)
! #endif
!             )
            fprintf (f, " %s:%d", reg_class_names[rclass],
                     COSTS (costs, regno)->cost[k]);
        }
--- 1455,1461 ----
        {
          rclass = cost_classes[k];
          if (contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (regno)]
!             && ! invalid_mode_change_p (regno, (enum reg_class) rclass))
            fprintf (f, " %s:%d", reg_class_names[rclass],
                     COSTS (costs, regno)->cost[k]);
        }
*************** find_costs_and_classes (FILE *dump_file)
*** 1703,1712 ****
              /* Ignore classes that are too small or invalid for this
                 operand.  */
              if (! contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (i)]
! #ifdef CANNOT_CHANGE_MODE_CLASS
!                 || invalid_mode_change_p (i, (enum reg_class) rclass)
! #endif
!                 )
                continue;
              if (i_costs[k] < best_cost)
                {
--- 1697,1703 ----
              /* Ignore classes that are too small or invalid for this
                 operand.  */
              if (! contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (i)]
!                 || invalid_mode_change_p (i, (enum reg_class) rclass))
                continue;
              if (i_costs[k] < best_cost)
                {
*************** find_costs_and_classes (FILE *dump_file)
*** 1786,1795 ****
                      /* Ignore classes that are too small or invalid
                         for this operand.  */
                      if (! contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (i)]
! #ifdef CANNOT_CHANGE_MODE_CLASS
!                         || invalid_mode_change_p (i, (enum reg_class) rclass)
! #endif
!                         )
                        ;
                      else if (total_a_costs[k] < best_cost)
                        {
--- 1777,1783 ----
                      /* Ignore classes that are too small or invalid
                         for this operand.  */
                      if (! contains_reg_of_mode[rclass][PSEUDO_REGNO_MODE (i)]
!                         || invalid_mode_change_p (i, (enum reg_class) rclass))
                        ;
                      else if (total_a_costs[k] < best_cost)
                        {
*** /tmp/MB8Pfk_reginfo.c       2014-03-11 07:34:37.995181423 +0000
--- gcc/reginfo.c       2014-03-10 21:39:10.292726165 +0000
*************** reg_classes_intersect_p (reg_class_t c1,
*** 1191,1303 ****
  }
  
  
  
! /* Passes for keeping and updating info about modes of registers
!    inside subregisters.  */
  
! #ifdef CANNOT_CHANGE_MODE_CLASS
  
! static bitmap invalid_mode_changes;
  
! static void
! record_subregs_of_mode (rtx subreg, bitmap subregs_of_mode)
  {
!   enum machine_mode mode;
!   unsigned int regno;
  
    if (!REG_P (SUBREG_REG (subreg)))
!     return;
! 
!   regno = REGNO (SUBREG_REG (subreg));
!   mode = GET_MODE (subreg);
  
    if (regno < FIRST_PSEUDO_REGISTER)
!     return;
  
!   if (bitmap_set_bit (subregs_of_mode,
!                     regno * NUM_MACHINE_MODES + (unsigned int) mode))
      {
!       unsigned int rclass;
!       for (rclass = 0; rclass < N_REG_CLASSES; rclass++)
!       if (!bitmap_bit_p (invalid_mode_changes,
!                          regno * N_REG_CLASSES + rclass)
!           && CANNOT_CHANGE_MODE_CLASS (PSEUDO_REGNO_MODE (regno),
!                                        mode, (enum reg_class) rclass))
!         bitmap_set_bit (invalid_mode_changes,
!                         regno * N_REG_CLASSES + rclass);
      }
- }
  
! /* Call record_subregs_of_mode for all the subregs in X.  */
! static void
! find_subregs_of_mode (rtx x, bitmap subregs_of_mode)
! {
!   enum rtx_code code = GET_CODE (x);
!   const char * const fmt = GET_RTX_FORMAT (code);
!   int i;
  
!   if (code == SUBREG)
!     record_subregs_of_mode (x, subregs_of_mode);
  
!   /* Time for some deep diving.  */
!   for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
!     {
!       if (fmt[i] == 'e')
!       find_subregs_of_mode (XEXP (x, i), subregs_of_mode);
!       else if (fmt[i] == 'E')
!       {
!         int j;
!         for (j = XVECLEN (x, i) - 1; j >= 0; j--)
!           find_subregs_of_mode (XVECEXP (x, i, j), subregs_of_mode);
!       }
!     }
  }
  
  void
  init_subregs_of_mode (void)
  {
    basic_block bb;
    rtx insn;
-   bitmap_obstack srom_obstack;
-   bitmap subregs_of_mode;
  
!   gcc_assert (invalid_mode_changes == NULL);
!   invalid_mode_changes = BITMAP_ALLOC (NULL);
!   bitmap_obstack_initialize (&srom_obstack);
!   subregs_of_mode = BITMAP_ALLOC (&srom_obstack);
  
    FOR_EACH_BB_FN (bb, cfun)
      FOR_BB_INSNS (bb, insn)
        if (NONDEBUG_INSN_P (insn))
!         find_subregs_of_mode (PATTERN (insn), subregs_of_mode);
! 
!   BITMAP_FREE (subregs_of_mode);
!   bitmap_obstack_release (&srom_obstack);
  }
  
! /* Return 1 if REGNO has had an invalid mode change in CLASS from FROM
!    mode.  */
  bool
  invalid_mode_change_p (unsigned int regno,
                       enum reg_class rclass)
  {
!   return bitmap_bit_p (invalid_mode_changes,
!                      regno * N_REG_CLASSES + (unsigned) rclass);
  }
  
  void
  finish_subregs_of_mode (void)
  {
!   BITMAP_FREE (invalid_mode_changes);
! }
! #else
! void
! init_subregs_of_mode (void)
! {
! }
! void
! finish_subregs_of_mode (void)
! {
  }
- 
- #endif /* CANNOT_CHANGE_MODE_CLASS */
--- 1191,1337 ----
  }
  
  
+ /* Information about one subreg of a pseudo register.  */
+ struct pseudo_subreg {
+   /* The next subreg seen for this pseudo register.  */
+   struct pseudo_subreg *next;
  
!   /* The mode of the subreg.  */
!   enum machine_mode mode;
  
!   /* The byte offset of the subreg from the start of the pseudo register.  */
!   unsigned int offset;
! };
  
! /* Information about all subregs of a pseudo register.  */
! struct pseudo_subreg_list {
!   /* The pseudo register that this structure describes.  */
!   unsigned int regno;
  
!   /* A list of the individual subregs seen.  */
!   struct pseudo_subreg *subregs;
! 
!   /* The set of all hard registers that are invalid because of subregs
!      listed in SUBREGS.  */
!   HARD_REG_SET invalid_regs;
! };
! 
! /* Maps pseudo register numbers to the associated entry of 
pseudo_subreg_lists.
!    Entries are initially undefined rather than zeroed; entry X is valid if:
! 
!       pseudo_subreg_index[X] < pseudo_subreg_lists.length ()
!       && pseudo_subreg_lists[pseudo_subreg_index[X]]->regno == X.  */
! static unsigned int *pseudo_subreg_index;
! 
! /* A list of all pseudo registers that have been seen in a subreg.  */
! static vec<struct pseudo_subreg_list *> pseudo_subreg_lists;
! 
! /* An obstack used to allocate the above.  */
! static struct obstack pseudo_subreg_obstack;
! 
! /* A for_each_rtx callback.  Record all subregs of pseudo registers.  */
! 
! static int
! record_subregs_of_mode (rtx *loc, void *)
  {
!   rtx subreg = *loc;
!   if (GET_CODE (subreg) != SUBREG)
!     return 0;
  
    if (!REG_P (SUBREG_REG (subreg)))
!     return -1;
  
+   unsigned int regno = REGNO (SUBREG_REG (subreg));
    if (regno < FIRST_PSEUDO_REGISTER)
!     return -1;
  
!   /* If we haven't seem this pseudo before, create an entry for it.  */
!   unsigned int index = pseudo_subreg_index[regno];
!   if (index >= pseudo_subreg_lists.length ()
!       || pseudo_subreg_lists[index]->regno != regno)
      {
!       struct pseudo_subreg_list *list = XOBNEW (&pseudo_subreg_obstack,
!                                               struct pseudo_subreg_list);
!       list->subregs = 0;
!       CLEAR_HARD_REG_SET (list->invalid_regs);
!       list->regno = regno;
! 
!       index = pseudo_subreg_lists.length ();
!       pseudo_subreg_lists.safe_push (list);
!       pseudo_subreg_index[regno] = index;
      }
  
!   /* Quick return if we've already seen a subreg with the same mode
!      and offset.  In general the number of distinct subregs of a particular
!      pseudo register should be small.  */
!   struct pseudo_subreg_list *list = pseudo_subreg_lists[index];
!   enum machine_mode outer_mode = GET_MODE (subreg);
!   unsigned int offset = SUBREG_BYTE (subreg);
!   for (struct pseudo_subreg *ps = list->subregs; ps; ps = ps->next)
!     if (ps->mode == outer_mode && ps->offset == offset)
!       return -1;
  
!   /* Record this mode and offset.  */
!   struct pseudo_subreg *ps = XOBNEW (&pseudo_subreg_obstack,
!                                    struct pseudo_subreg);
!   ps->mode = outer_mode;
!   ps->offset = offset;
!   ps->next = list->subregs;
!   list->subregs = ps;
  
!   /* Record a hard register as invalid if it wouldn't be possible for
!      it to form the subreg.  */
!   enum machine_mode inner_mode = GET_MODE (SUBREG_REG (subreg));
!   for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
!     if (!TEST_HARD_REG_BIT (list->invalid_regs, i)
!       && (!HARD_REGNO_MODE_OK (i, inner_mode)
!           || simplify_subreg_regno (i, inner_mode, offset, outer_mode) < 0))
!       SET_HARD_REG_BIT (list->invalid_regs, i);
! 
!   return -1;
  }
  
+ /* Scan the function and record any information needed by
+    invalid_mode_change_p.  */
+ 
  void
  init_subregs_of_mode (void)
  {
    basic_block bb;
    rtx insn;
  
!   gcc_assert (!pseudo_subreg_index);
!   unsigned int num_regs = max_reg_num ();
!   pseudo_subreg_index = XNEWVEC (unsigned int, num_regs);
! 
!   gcc_obstack_init (&pseudo_subreg_obstack);
  
    FOR_EACH_BB_FN (bb, cfun)
      FOR_BB_INSNS (bb, insn)
        if (NONDEBUG_INSN_P (insn))
!       for_each_rtx (&PATTERN (insn), record_subregs_of_mode, 0);
  }
  
! /* Return true if REGNO is involved in a subreg that prevents it from using
!    class RCLASS.  */
! 
  bool
  invalid_mode_change_p (unsigned int regno,
                       enum reg_class rclass)
  {
!   unsigned int index = pseudo_subreg_index[regno];
!   return (index < pseudo_subreg_lists.length ()
!         && pseudo_subreg_lists[index]->regno == regno
!         && hard_reg_set_subset_p (reg_class_contents[rclass],
!                                   pseudo_subreg_lists[index]->invalid_regs));
  }
  
+ /* Release the structures created by init_subregs_of_mode.  */
+ 
  void
  finish_subregs_of_mode (void)
  {
!   free (pseudo_subreg_index);
!   pseudo_subreg_index = 0;
!   pseudo_subreg_lists.release ();
  }
*** /tmp/3iptsc_rtl.h   2014-03-11 07:34:38.011189919 +0000
--- gcc/rtl.h   2014-03-09 10:32:38.412143704 +0000
*************** struct subreg_info
*** 2128,2137 ****
  {
    /* Offset of first hard register involved in the subreg.  */
    int offset;
!   /* Number of hard registers involved in the subreg.  */
    int nregs;
    /* Whether this subreg can be represented as a hard reg with the new
!      mode.  */
    bool representable_p;
  };
  
--- 2128,2140 ----
  {
    /* Offset of first hard register involved in the subreg.  */
    int offset;
!   /* Number of hard registers involved in the subreg.  In the case of
!      a paradoxical subreg, this is the number of registers that would
!      be modified by writing to the subreg; some of them may be don't-care
!      when reading from the subreg.  */
    int nregs;
    /* Whether this subreg can be represented as a hard reg with the new
!      mode (by adding OFFSET to the original hard register).  */
    bool representable_p;
  };
  
*** /tmp/jHXMdb_rtlanal.c       2014-03-11 07:34:38.026197136 +0000
--- gcc/rtlanal.c       2014-03-09 10:33:39.645693024 +0000
*************** subreg_lsb (const_rtx x)
*** 3321,3327 ****
     xmode  - The mode of xregno.
     offset - The byte offset.
     ymode  - The mode of a top level SUBREG (or what may become one).
!    info   - Pointer to structure to fill in.  */
  void
  subreg_get_info (unsigned int xregno, enum machine_mode xmode,
                 unsigned int offset, enum machine_mode ymode,
--- 3321,3340 ----
     xmode  - The mode of xregno.
     offset - The byte offset.
     ymode  - The mode of a top level SUBREG (or what may become one).
!    info   - Pointer to structure to fill in.
! 
!    Rather than considering one particular inner register (and thus one
!    particular "outer" register) in isolation, this function really uses
!    XREGNO as a model for a sequence of isomorphic hard registers.  Thus the
!    function does not check whether adding INFO->offset to XREGNO gives
!    a valid hard register; even if INFO->offset + XREGNO is out of range,
!    there might be another register of the same type that is in range.
!    Likewise it doesn't check whether HARD_REGNO_MODE_OK accepts the new
!    register, since that can depend on things like whether the final
!    register number is even or odd.  Callers that want to check whether
!    this particular subreg can be replaced by a simple (reg ...) should
!    use simplify_subreg_regno.  */
! 
  void
  subreg_get_info (unsigned int xregno, enum machine_mode xmode,
                 unsigned int offset, enum machine_mode ymode,

Reply via email to