Re: [PATCH v3 01/10] Initial TI PRU GCC port

Dimitar Dimitrov Wed, 22 Aug 2018 22:09:46 -0700

On Monday, 8/20/2018 11:12:23 EEST Jeff Law wrote:
> On 08/15/2018 10:49 PM, Dimitar Dimitrov wrote:
> > ChangeLog:
> > 
> > 2018-07-27  Dimitar Dimitrov  <dimi...@dinux.eu>
> > 
> >     * configure: Regenerate.
> >     * configure.ac: Add PRU target.
> > 
> > gcc/ChangeLog:
> > 
> > 2018-07-27  Dimitar Dimitrov  <dimi...@dinux.eu>
> > 
> >     * common/config/pru/pru-common.c: New file.
> >     * config.gcc: Add PRU target.
> >     * config/pru/alu-zext.md: New file.
> >     * config/pru/constraints.md: New file.
> >     * config/pru/predicates.md: New file.
> >     * config/pru/pru-opts.h: New file.
> >     * config/pru/pru-passes.c: New file.
> >     * config/pru/pru-pragma.c: New file.
> >     * config/pru/pru-protos.h: New file.
> >     * config/pru/pru.c: New file.
> >     * config/pru/pru.h: New file.
> >     * config/pru/pru.md: New file.
> >     * config/pru/pru.opt: New file.
> >     * config/pru/t-pru: New file.
> >     * doc/extend.texi: Document PRU pragmas.
> >     * doc/invoke.texi: Document PRU-specific options.
> >     * doc/md.texi: Document PRU asm constraints.
> > 
> > Signed-off-by: Dimitar Dimitrov <dimi...@dinux.eu>
> > 
> > diff --git a/gcc/config/pru/alu-zext.md b/gcc/config/pru/alu-zext.md
> > new file mode 100644
> > index 00000000000..2112b08d3f4
> > --- /dev/null
> > +++ b/gcc/config/pru/alu-zext.md
> 
> [ ... ]
> I wonder if the zero-extensions in alu-zext.md are really needed.  Can
> we define WORD_REGISTER_OPERATIONS for PRU?  Or do you really want to
> expose sub-word operations?  I simply don't know enough about the
> hardware to provide guidance here.  But it might allow for some
> simplification of the port.  You wouldn't want to define sub-word
> patterns such as addqi/addhi anymore either.
The QI/HI ALU patterns are needed for efficient code generation and ABI 
compliance.


In the PRU GCC port, UNITS_PER_WORD=1. Hence addqi is a full word size 
pattern, and WORD_REGISTER_OPERATIONS should not matter.

Note that in PRU architecture, ALU operands can be either 8, 16 or 32 bits. 
All inputs are zero-extended to 32 bits, and outputs are truncated as needed. 
Here is a relevant discussion: http://gcc.gnu.org/ml/gcc/2017-01/msg00217.html

For example, the following C snippet:
  uint16_t test(uint8_t a, uint16_t b)
  {
    return a + b;
  }


Produces the following RTL and efficient assembler output:

  (insn 13 8 14 2 (set (reg/i:HI 56 r14.b0)
      (plus:HI (zero_extend:HI (reg:QI 56 r14.b0 [ a ]))
         (reg:HI 57 r14.b1 [ b ]))) "test.c":407 76 )


  test:
        # Note: Function arguments are passed in sub-hw-registers
        add     r14.w0, r14.b0, r14.w1
        ret

> 
> > diff --git a/gcc/config/pru/pru.c b/gcc/config/pru/pru.c
> > new file mode 100644
> > index 00000000000..de72d22278e
> > --- /dev/null
> > +++ b/gcc/config/pru/pru.c
> > @@ -0,0 +1,3001 @@
> > 
> > +/* Emit efficient RTL equivalent of ADD3 with the given const_int for
> > +   frame-related registers.
> > +     op0     - Destination register.
> > +     op1     - First addendum operand (a register).
> > +     addendum     - Second addendum operand (a constant).
> > +     kind    - Note kind.  REG_NOTE_MAX if no note must be added.
> > +     reg_note_rtx - Reg note RTX.  NULL if it should be computed
> > automatically. + */
> > +static rtx
> > +pru_add3_frame_adjust (rtx op0, rtx op1, int addendum,
> > +                  const enum reg_note kind, rtx reg_note_rtx)
> > +{
> > +  rtx insn;
> > +
> > +  rtx op0_adjust = gen_rtx_SET (op0, plus_constant (Pmode, op1,
> > addendum)); +
> > +  if (UBYTE_INT (addendum) || UBYTE_INT (-addendum))
> > +    insn = emit_insn (op0_adjust);
> > +  else
> > +    {
> > +      /* Help the compiler to cope with an arbitrary integer constant.
> > +    Reload has finished so we can't expect the compiler to
> > +    auto-allocate a temporary register.  But we know that call-saved
> > +    registers are not live yet, so we utilize them.  */
> 
> Note I think this can run afoul of IPA-RA.  THough it looks like you're
> exposing this in RTL, so you're probably safe.
Grepping for PROLOGUE_TEMP_REG, it appears that MIPS and Risc-V use the same 
technique to get a temporary register for prologue/epilougue.

> 
> > +/* Emit function epilogue.  */
> > +void
> > +pru_expand_epilogue (bool sibcall_p)
> > +{
> > +  rtx cfa_adj;
> > +  int total_frame_size;
> > +  int sp_adjust, save_offset;
> > +  int regno_start;
> > +
> > +  if (!sibcall_p && pru_can_use_return_insn ())
> > +    {
> > +      emit_jump_insn (gen_return ());
> > +      return;
> > +    }
> > +
> > +  emit_insn (gen_blockage ());
> > +
> > +  total_frame_size = pru_compute_frame_layout ();
> > +  if (frame_pointer_needed)
> > +    {
> > +      /* Recover the stack pointer.  */
> > +      cfa_adj = plus_constant (Pmode, stack_pointer_rtx,
> > +                          (total_frame_size
> > +                           - cfun->machine->save_regs_offset));
> > +      pru_add3_frame_adjust (stack_pointer_rtx,
> > +                        hard_frame_pointer_rtx,
> > +                        -cfun->machine->fp_save_offset,
> > +                        REG_CFA_DEF_CFA,
> > +                        cfa_adj);
> > +
> > +      save_offset = 0;
> > +      sp_adjust = total_frame_size - cfun->machine->save_regs_offset;
> > +    }
> > +  else if (!UBYTE_INT (total_frame_size))
> > +    {
> > +      pru_add_to_sp (cfun->machine->save_regs_offset,
> > +                       REG_CFA_ADJUST_CFA);
> > +      save_offset = 0;
> > +      sp_adjust = total_frame_size - cfun->machine->save_regs_offset;
> > +    }
> > +  else
> > +    {
> > +      save_offset = cfun->machine->save_regs_offset;
> > +      sp_adjust = total_frame_size;
> > +    }
> > +
> > +  regno_start = 0;
> > +  do {
> > +      regno_start = xbbo_next_reg_cluster (regno_start, &save_offset,
> > false); +  } while (regno_start >= 0);
> > +
> > +  if (sp_adjust)
> > +      pru_add_to_sp (sp_adjust, REG_CFA_ADJUST_CFA);
> > +
> > +  if (!sibcall_p)
> > +    emit_jump_insn (gen_simple_return ());
> 
> Note you probably need a blockage before the restore of the stack
> pointer to avoid some really nasty problems with the scheduler
> reordering things such that the current frame gets deallocated before
> register restores happen.  It usually doesn't cause a problem, but if an
> interrupt occurs between the de-allocation and register restore, then
> all bets are off.

Thank you. I will fix it.

> 
> > diff --git a/gcc/config/pru/pru.h b/gcc/config/pru/pru.h
> > new file mode 100644
> > index 00000000000..de1b9100209
> > --- /dev/null
> > +++ b/gcc/config/pru/pru.h
> 
> [ ... ]
> 
> > +
> > +#ifdef IN_LIBGCC2
> > +/* This is to get correct SI and DI modes in libgcc2.c (32 and 64 bits). 
> > */ +#define UNITS_PER_WORD 4
> > +#else
> > +/* Width of a word, in units (bytes).  */
> > +#define UNITS_PER_WORD 1
> > +#endif
> 
> This looks odd. Can you explain a bit more what you're doing here?  I'm
> guessing you took it from AVR which is an 8 bit port.  I'm not sure it's
> the right thing to do for AVR either :-)
You are correct - I took it from avr. It is needed to support SItype and 
DItype in libgcc. See libgcc/libgcc2.h:126, considering that UNITS_PER_WORD=1:

  #if MIN_UNITS_PER_WORD > 1
  /* These typedefs are usually forbidden on dsp's with UNITS_PER_WORD 1.  */
  typedef        int SItype     __attribute__ ((mode (SI)));
  typedef                int DItype     __attribute__ ((mode (DI)));

Without that snippet, I get build errors:
  ../../../gcc/libgcc/libgcc2.h:411:8: error: unknown type name 'DItype'
  411 | extern DItype __bswapdi2 (DItype);


> 
> > diff --git a/gcc/config/pru/pru.md b/gcc/config/pru/pru.md
> > new file mode 100644
> > index 00000000000..4be1958e6c9
> > --- /dev/null
> > +++ b/gcc/config/pru/pru.md
> > @@ -0,0 +1,956 @@
> > +;; Machine Description for TI PRU.
> > +;; Copyright (C) 2014-2018 Free Software Foundation, Inc.
> > +;; Contributed by Dimitar Dimitrov <dimi...@dinux.eu>
> > +;; Based on the NIOS2 GCC port.
> > +;;
> > +;; This file is part of GCC.
> > +;;
> > +;; GCC is free software; you can redistribute it and/or modify
> > +;; it under the terms of the GNU General Public License as published by
> > +;; the Free Software Foundation; either version 3, or (at your option)
> > +;; any later version.
> > +;;
> > +;; GCC is distributed in the hope that it will be useful,
> > +;; but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +;; GNU General Public License for more details.
> > +;;
> > +;; You should have received a copy of the GNU General Public License
> > +;; along with GCC; see the file COPYING3.  If not see
> > +;; <http://www.gnu.org/licenses/>.
> > +
> > +
> > +
> > +;; Split loading of integer constants into a distinct pattern, in order
> > to
> > +;; prevent CSE from considering "SET (MEM, CONST_INT)" as a valid insn
> > +;; selection.  This fixes an abnormally long compile time exposed by
> > +;; gcc.dg/pr48141.c
> 
> Note that in general both reload and LRA require that the movXX patterns 
> be unified for any given mode.  Reload and LRA can change the operands
> without re-recognizing the pattern.  You might be getting away with not
> adhering to that requirement because you're just dealing with constant
> source operands (most of the problems I've run into through the years
> involve memory operands).  But be aware you may need to twiddle this and
> fold those patterns back into your movXX patterns.
Noted. I will try the folding after acceptance.

> 
> > +
> > +; I cannot think of any reason for the core to pass a 64-bit symbolic
> > +; constants.  Hence simplify the rule and handle only numeric constants.
> 
> That's fine.  THough you'll generally get better code generated if you
> tighten up the operand predicates rather than handling things in
> contraints.  So it may be worth investigating if you can tighten up the
> predicate for your movXX patterns.
> 
> And that advice applies in general.  It's usually better to have a
> predicate that is tight and use constraints to drive instruction
> selection.  A wide predicate and tight constraints means reload/LRA has
> to work harder and they're not particularly good at optimizing away
> redundancies they create.  Additionally on a target which defines
> instruction scheduling, you generally get a more accurate first schedule
> if the predicates are tight.
I don't think I can make the movdi predicate any more specific without 
splitting the pattern.

For example, "nonimmediate_operand" fits exacly the following constraints: 
"=m,r,r,r,r,r" .
  gensupport.c:  {"nonimmediate_operand", false, false, {SUBREG, REG, MEM}},

> 
> > +
> > +;; Sign extension patterns.  We have to emulate them due to lack of
> > +;; signed operations in PRU's ALU.
> > +
> > +(define_insn "extend<EQS0:mode><EQD:mode>2"
> > +  [(set (match_operand:EQD 0 "register_operand"                      "=r")
> > +   (sign_extend:EQD (match_operand:EQS0 1 "register_operand"  "r")))]
> > +  ""
> > +  {
> > +    return pru_output_sign_extend (operands);
> > +  }
> > +  [(set_attr "type" "complex")
> > +   (set_attr "length" "12")])
> 
> You might be better off expanding sign extension as a pair of shifts.
> Those can be optimized and combined with other nearby operations.  ISTM
> that's something you could experiment with once the port is accepted.
Unfortunately PRU has no native ashiftrt instruction. The currently 
implemented "compare->branch->bit-fill" is much faster than an emulated 
ashiftrt.

> 
> > +;; Arithmetic Operations
> > +
> > +(define_expand "add<mode>3"
> > +  [(set (match_operand:QISI 0 "register_operand"             "=r,r,r")
> > +   (plus:QISI (match_operand:QISI 1 "register_operand"   "%r,r,r")
> > +            (match_operand:QISI 2 "nonmemory_operand"  "r,I,M")))]
> > +  ""
> > +  ""
> > +  [(set_attr "type" "alu")])
> > +
> > +(define_insn "adddi3"
> > +  [(set (match_operand:DI 0 "register_operand"                 "=&r,&r,&r")
> > +   (plus:DI (match_operand:DI 1 "register_operand"     "%r,r,r")
> > +            (match_operand:DI 2 "reg_or_ubyte_operand" "r,I,M")))]
> > +  ""
> > +  "@
> > +   add\\t%F0, %F1, %F2\;adc\\t%N0, %N1, %N2
> > +   add\\t%F0, %F1, %2\;adc\\t%N0, %N1, 0
> > +   sub\\t%F0, %F1, %n2\;suc\\t%N0, %N1, 0"
> > +  [(set_attr "type" "alu,alu,alu")
> > +   (set_attr "length" "8,8,8")])
> 
> Note that if you expose the C bit and define an add/sub with carry insn
> the compiler will handle this kind of stuff for you automatically.
> That's something you can experiment with after acceptance as well.
I will explore this. Thanks for the pointer.

> 
> Overall it looks pretty good.  I think the blockage in the epilogue
> issue needs to be addressed.  The UNITS_PER_WORD concern needs to be
> answered.  The rest are things you might want to investigate, but aren't
> things that need to be changed to get approval.
> 
> I think Joseph had some concerns about diagnostics emitted by the PRU
> target code.  Whatever those where need to be addressed if they haven't
> already.
I have already addressed the diagnostics comments in this patch version.

Regards,
Dimitar

Re: [PATCH v3 01/10] Initial TI PRU GCC port

Reply via email to