Re: [ARC PATCH] Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs.

Jeff Law Mon, 30 Oct 2023 07:55:29 -0700



On 10/29/23 03:16, Roger Sayle wrote:


This patch overhauls the ARC backend's insn_cost target hook, and makes
some related improvements to rtx_costs, BRANCH_COST, etc.  The primary
goal is to allow the backend to indicate that shifts and rotates are
slow (discouraged) when the CPU doesn't have a barrel shifter. I should
also acknowledge Richard Sandiford for inspiring the use of set_cost
in this rewrite of arc_insn_cost; this implementation borrows heavily
from the target hooks for AArch64 and ARM.

The motivating example is derived from PR rtl-optimization/110717.

struct S { int a : 5; };
unsigned int foo (struct S *p) {
   return p->a;
}

With a barrel shifter, GCC -O2 generates the reasonable:

foo:    ldb_s   r0,[r0]
         asl_s   r0,r0,27
         j_s.d   [blink]
         asr_s   r0,r0,27

What's interesting is that during combine, the middle-end actually
has two shifts by three bits, and a sign-extension from QI to SI.

Trying 8, 9 -> 11:
     8: r158:SI=r157:QI#0<<0x3
       REG_DEAD r157:QI
     9: r159:SI=sign_extend(r158:SI#0)
       REG_DEAD r158:SI
    11: r155:SI=r159:SI>>0x3
       REG_DEAD r159:SI

Whilst it's reasonable to simplify this to two shifts by 27 bits when
the CPU has a barrel shifter, it's actually a significant pessimization
when these shifts are implemented by loops.  This combination can be
prevented if the backend provides accurate-ish estimates for insn_cost.
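
To make the trade-off concrete, here is a minimal C sketch (not part of the patch) that models both forms of the bit-field extraction and checks that they agree. The function names and the "one loop iteration per bit shifted" cost model are assumptions for illustration only; they are not the costs arc_insn_cost actually returns.

```c
#include <assert.h>
#include <stdint.h>

/* Form seen before combine: shift left by 3, sign-extend QI->SI,
   then shift right by 3.  Narrowing to a signed type and >> on a
   negative value are implementation-defined in ISO C; GCC wraps
   and shifts arithmetically, which is what this sketch relies on.  */
static int32_t extract_short_shifts(uint8_t byte)
{
    uint8_t shifted = (uint8_t)(byte << 3);   /* keep the low byte only */
    int32_t extended = (int8_t)shifted;       /* sign_extend QI -> SI */
    return extended >> 3;                     /* arithmetic shift right */
}

/* Form produced by combine: two SImode shifts by 27 bits.  */
static int32_t extract_long_shifts(uint8_t byte)
{
    uint32_t shifted = (uint32_t)byte << 27;
    return (int32_t)shifted >> 27;            /* arithmetic shift right */
}

/* Hypothetical cost model for a CPU without a barrel shifter:
   one loop iteration per bit shifted.  */
static int loop_iterations(int first_shift, int second_shift)
{
    return first_shift + second_shift;
}

/* Both forms must yield the same signed 5-bit field for every byte.  */
static void check_all_bytes(void)
{
    for (int b = 0; b < 256; b++)
        assert(extract_short_shifts((uint8_t)b)
               == extract_long_shifts((uint8_t)b));
}
```

Under this assumed model, loop_iterations(3, 3) is 6 while loop_iterations(27, 27) is 54, which is why the combined form only pays off when a barrel shifter is available.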

Same scenario on the H8, though we already had the cost issues under control: a byte load (effectively a shift by 24), 3-bit shifts, and an extension.


Jeff
