Re: [PATCH v3 01/10] Initial TI PRU GCC port

2018-08-29 Thread Dimitar Dimitrov
On Thursday, 08/23/2018. 8:09:05 EEST Dimitar Dimitrov wrote:
> > > +/* Emit function epilogue.  */
> > > +void
> > > +pru_expand_epilogue (bool sibcall_p)
> > > +{
> > > +  rtx cfa_adj;
> > > +  int total_frame_size;
> > > +  int sp_adjust, save_offset;
> > > +  int regno_start;
> > > +
> > > +  if (!sibcall_p && pru_can_use_return_insn ())
> > > +{
> > > +  emit_jump_insn (gen_return ());
> > > +  return;
> > > +}
> > > +
> > > +  emit_insn (gen_blockage ());
> > > +
> > > +  total_frame_size = pru_compute_frame_layout ();
> > > +  if (frame_pointer_needed)
> > > +{
> > > +  /* Recover the stack pointer.  */
> > > +  cfa_adj = plus_constant (Pmode, stack_pointer_rtx,
> > > +  (total_frame_size
> > > +   - cfun->machine->save_regs_offset));
> > > +  pru_add3_frame_adjust (stack_pointer_rtx,
> > > +hard_frame_pointer_rtx,
> > > +-cfun->machine->fp_save_offset,
> > > +REG_CFA_DEF_CFA,
> > > +cfa_adj);
> > > +
> > > +  save_offset = 0;
> > > +  sp_adjust = total_frame_size - cfun->machine->save_regs_offset;
> > > +}
> > > +  else if (!UBYTE_INT (total_frame_size))
> > > +{
> > > +  pru_add_to_sp (cfun->machine->save_regs_offset,
> > > +   REG_CFA_ADJUST_CFA);
> > > +  save_offset = 0;
> > > +  sp_adjust = total_frame_size - cfun->machine->save_regs_offset;
> > > +}
> > > +  else
> > > +{
> > > +  save_offset = cfun->machine->save_regs_offset;
> > > +  sp_adjust = total_frame_size;
> > > +}
> > > +
> > > +  regno_start = 0;
> > > +  do {
> > > +  regno_start = xbbo_next_reg_cluster (regno_start, _offset,
> > > false); +  } while (regno_start >= 0);
> > > +
> > > +  if (sp_adjust)
> > > +  pru_add_to_sp (sp_adjust, REG_CFA_ADJUST_CFA);
> > > +
> > > +  if (!sibcall_p)
> > > +emit_jump_insn (gen_simple_return ());
> > 
> > Note you probably need a blockage before the restore of the stack
> > pointer to avoid some really nasty problems with the scheduler
> > reordering things such that the current frame gets deallocated before
> > register restores happen.  It usually doesn't cause a problem, but if an
> > interrupt occurs between the de-allocation and register restore, then
> > all bets are off.
> 
> Thank you. I will fix it.

I have attached a fixup patch for this particular blockage issue, in case you 
want to apply it on top of the v3 patch series. If that is not convenient, I 
will wait a few more days for comments, and then I will send v4.

Thanks,
Dimitar
>From 3feed8e7d19db555af0665dec24233a97d9a634c Mon Sep 17 00:00:00 2001
From: Dimitar Dimitrov 
Date: Thu, 23 Aug 2018 08:14:13 +0300
Subject: [PATCH] Add blockage in epilogue before SP restore.

Signed-off-by: Dimitar Dimitrov 
---
 gcc/config/pru/pru.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/pru/pru.c b/gcc/config/pru/pru.c
index de72d22278e..f1ce8bfeddd 100644
--- a/gcc/config/pru/pru.c
+++ b/gcc/config/pru/pru.c
@@ -389,6 +389,13 @@ pru_expand_epilogue (bool sibcall_p)
   regno_start = xbbo_next_reg_cluster (regno_start, _offset, false);
   } while (regno_start >= 0);
 
+  /* Emit a blockage insn here to keep these insns from being moved to
+ an earlier spot in the epilogue.
+
+ This is necessary as we must not cut the stack back before all the
+ restores are finished.  */
+  emit_insn (gen_blockage ());
+
   if (sp_adjust)
   pru_add_to_sp (sp_adjust, REG_CFA_ADJUST_CFA);
 
-- 
2.11.0



Re: [PATCH v3 01/10] Initial TI PRU GCC port

2018-08-22 Thread Dimitar Dimitrov
On Monday, 8/20/2018 11:12:23 EEST Jeff Law wrote:
> On 08/15/2018 10:49 PM, Dimitar Dimitrov wrote:
> > ChangeLog:
> > 
> > 2018-07-27  Dimitar Dimitrov  
> > 
> > * configure: Regenerate.
> > * configure.ac: Add PRU target.
> > 
> > gcc/ChangeLog:
> > 
> > 2018-07-27  Dimitar Dimitrov  
> > 
> > * common/config/pru/pru-common.c: New file.
> > * config.gcc: Add PRU target.
> > * config/pru/alu-zext.md: New file.
> > * config/pru/constraints.md: New file.
> > * config/pru/predicates.md: New file.
> > * config/pru/pru-opts.h: New file.
> > * config/pru/pru-passes.c: New file.
> > * config/pru/pru-pragma.c: New file.
> > * config/pru/pru-protos.h: New file.
> > * config/pru/pru.c: New file.
> > * config/pru/pru.h: New file.
> > * config/pru/pru.md: New file.
> > * config/pru/pru.opt: New file.
> > * config/pru/t-pru: New file.
> > * doc/extend.texi: Document PRU pragmas.
> > * doc/invoke.texi: Document PRU-specific options.
> > * doc/md.texi: Document PRU asm constraints.
> > 
> > Signed-off-by: Dimitar Dimitrov 
> > 
> > diff --git a/gcc/config/pru/alu-zext.md b/gcc/config/pru/alu-zext.md
> > new file mode 100644
> > index 000..2112b08d3f4
> > --- /dev/null
> > +++ b/gcc/config/pru/alu-zext.md
> 
> [ ... ]
> I wonder if the zero-extensions in alu-zext.md are really needed.  Can
> we define WORD_REGISTER_OPERATIONS for PRU?  Or do you really want to
> expose sub-word operations?  I simply don't know enough about the
> hardware to provide guidance here.  But it might allow for some
> simplification of the port.  You wouldn't want to define sub-word
> patterns such as addqi/addhi anymore either.
The QI/HI ALU patterns are needed for efficient code generation and ABI 
compliance.

In the PRU GCC port, UNITS_PER_WORD=1. Hence addqi is a full word size 
pattern, and WORD_REGISTER_OPERATIONS should not matter.

Note that in PRU architecture, ALU operands can be either 8, 16 or 32 bits. 
All inputs are zero-extended to 32 bits, and outputs are truncated as needed. 
Here is a relevant discussion: http://gcc.gnu.org/ml/gcc/2017-01/msg00217.html

For example, the following C snippet:
  uint16_t test(uint8_t a, uint16_t b)
  {
return a + b;
  }


Produces the following RTL and efficient assembler output:

  (insn 13 8 14 2 (set (reg/i:HI 56 r14.b0)
  (plus:HI (zero_extend:HI (reg:QI 56 r14.b0 [ a ]))
 (reg:HI 57 r14.b1 [ b ]))) "test.c":407 76 )


  test:
# Note: Function arguments are passed in sub-hw-registers
add r14.w0, r14.b0, r14.w1
ret

> 
> > diff --git a/gcc/config/pru/pru.c b/gcc/config/pru/pru.c
> > new file mode 100644
> > index 000..de72d22278e
> > --- /dev/null
> > +++ b/gcc/config/pru/pru.c
> > @@ -0,0 +1,3001 @@
> > 
> > +/* Emit efficient RTL equivalent of ADD3 with the given const_int for
> > +   frame-related registers.
> > + op0 - Destination register.
> > + op1 - First addendum operand (a register).
> > + addendum - Second addendum operand (a constant).
> > + kind- Note kind.  REG_NOTE_MAX if no note must be added.
> > + reg_note_rtx - Reg note RTX.  NULL if it should be computed
> > automatically. + */
> > +static rtx
> > +pru_add3_frame_adjust (rtx op0, rtx op1, int addendum,
> > +  const enum reg_note kind, rtx reg_note_rtx)
> > +{
> > +  rtx insn;
> > +
> > +  rtx op0_adjust = gen_rtx_SET (op0, plus_constant (Pmode, op1,
> > addendum)); +
> > +  if (UBYTE_INT (addendum) || UBYTE_INT (-addendum))
> > +insn = emit_insn (op0_adjust);
> > +  else
> > +{
> > +  /* Help the compiler to cope with an arbitrary integer constant.
> > +Reload has finished so we can't expect the compiler to
> > +auto-allocate a temporary register.  But we know that call-saved
> > +registers are not live yet, so we utilize them.  */
> 
> Note I think this can run afoul of IPA-RA.  THough it looks like you're
> exposing this in RTL, so you're probably safe.
Grepping for PROLOGUE_TEMP_REG, it appears that MIPS and Risc-V use the same 
technique to get a temporary register for prologue/epilougue.

> 
> > +/* Emit function epilogue.  */
> > +void
> > +pru_expand_epilogue (bool sibcall_p)
> > +{
> > +  rtx cfa_adj;
> > +  int total_frame_size;
> > +  int sp_adjust, save_offset;
> > +  int regno_start;
> > +
> > +  if (!sibcall_p && pru_can_use_return_insn ())
> > +{
> > +  emit_jump_insn (gen_return ());
> > +  return;
> > +}
> > +
> > +  emit_insn (gen_blockage ());
> > +
> > +  total_frame_size = pru_compute_frame_layout ();
> > +  if (frame_pointer_needed)
> > +{
> > +  /* Recover the stack pointer.  */
> > +  cfa_adj = plus_constant (Pmode, stack_pointer_rtx,
> > +  (total_frame_size
> > +   - cfun->machine->save_regs_offset));
> > +  pru_add3_frame_adjust (stack_pointer_rtx,
> > +hard_frame_pointer_rtx,
> > + 

Re: [PATCH v3 01/10] Initial TI PRU GCC port

2018-08-20 Thread Jeff Law
On 08/15/2018 10:49 PM, Dimitar Dimitrov wrote:
> ChangeLog:
> 
> 2018-07-27  Dimitar Dimitrov  
> 
>   * configure: Regenerate.
>   * configure.ac: Add PRU target.
> 
> gcc/ChangeLog:
> 
> 2018-07-27  Dimitar Dimitrov  
> 
>   * common/config/pru/pru-common.c: New file.
>   * config.gcc: Add PRU target.
>   * config/pru/alu-zext.md: New file.
>   * config/pru/constraints.md: New file.
>   * config/pru/predicates.md: New file.
>   * config/pru/pru-opts.h: New file.
>   * config/pru/pru-passes.c: New file.
>   * config/pru/pru-pragma.c: New file.
>   * config/pru/pru-protos.h: New file.
>   * config/pru/pru.c: New file.
>   * config/pru/pru.h: New file.
>   * config/pru/pru.md: New file.
>   * config/pru/pru.opt: New file.
>   * config/pru/t-pru: New file.
>   * doc/extend.texi: Document PRU pragmas.
>   * doc/invoke.texi: Document PRU-specific options.
>   * doc/md.texi: Document PRU asm constraints.

> 
> Signed-off-by: Dimitar Dimitrov 
> 
> diff --git a/gcc/config/pru/alu-zext.md b/gcc/config/pru/alu-zext.md
> new file mode 100644
> index 000..2112b08d3f4
> --- /dev/null
> +++ b/gcc/config/pru/alu-zext.md
[ ... ]
I wonder if the zero-extensions in alu-zext.md are really needed.  Can
we define WORD_REGISTER_OPERATIONS for PRU?  Or do you really want to
expose sub-word operations?  I simply don't know enough about the
hardware to provide guidance here.  But it might allow for some
simplification of the port.  You wouldn't want to define sub-word
patterns such as addqi/addhi anymore either.








> diff --git a/gcc/config/pru/pru.c b/gcc/config/pru/pru.c
> new file mode 100644
> index 000..de72d22278e
> --- /dev/null
> +++ b/gcc/config/pru/pru.c
> @@ -0,0 +1,3001 @@

> +/* Emit efficient RTL equivalent of ADD3 with the given const_int for
> +   frame-related registers.
> + op0   - Destination register.
> + op1   - First addendum operand (a register).
> + addendum - Second addendum operand (a constant).
> + kind  - Note kind.  REG_NOTE_MAX if no note must be added.
> + reg_note_rtx - Reg note RTX.  NULL if it should be computed 
> automatically.
> + */
> +static rtx
> +pru_add3_frame_adjust (rtx op0, rtx op1, int addendum,
> +const enum reg_note kind, rtx reg_note_rtx)
> +{
> +  rtx insn;
> +
> +  rtx op0_adjust = gen_rtx_SET (op0, plus_constant (Pmode, op1, addendum));
> +
> +  if (UBYTE_INT (addendum) || UBYTE_INT (-addendum))
> +insn = emit_insn (op0_adjust);
> +  else
> +{
> +  /* Help the compiler to cope with an arbitrary integer constant.
> +  Reload has finished so we can't expect the compiler to
> +  auto-allocate a temporary register.  But we know that call-saved
> +  registers are not live yet, so we utilize them.  */
Note I think this can run afoul of IPA-RA.  THough it looks like you're
exposing this in RTL, so you're probably safe.

> +/* Emit function epilogue.  */
> +void
> +pru_expand_epilogue (bool sibcall_p)
> +{
> +  rtx cfa_adj;
> +  int total_frame_size;
> +  int sp_adjust, save_offset;
> +  int regno_start;
> +
> +  if (!sibcall_p && pru_can_use_return_insn ())
> +{
> +  emit_jump_insn (gen_return ());
> +  return;
> +}
> +
> +  emit_insn (gen_blockage ());
> +
> +  total_frame_size = pru_compute_frame_layout ();
> +  if (frame_pointer_needed)
> +{
> +  /* Recover the stack pointer.  */
> +  cfa_adj = plus_constant (Pmode, stack_pointer_rtx,
> +(total_frame_size
> + - cfun->machine->save_regs_offset));
> +  pru_add3_frame_adjust (stack_pointer_rtx,
> +  hard_frame_pointer_rtx,
> +  -cfun->machine->fp_save_offset,
> +  REG_CFA_DEF_CFA,
> +  cfa_adj);
> +
> +  save_offset = 0;
> +  sp_adjust = total_frame_size - cfun->machine->save_regs_offset;
> +}
> +  else if (!UBYTE_INT (total_frame_size))
> +{
> +  pru_add_to_sp (cfun->machine->save_regs_offset,
> + REG_CFA_ADJUST_CFA);
> +  save_offset = 0;
> +  sp_adjust = total_frame_size - cfun->machine->save_regs_offset;
> +}
> +  else
> +{
> +  save_offset = cfun->machine->save_regs_offset;
> +  sp_adjust = total_frame_size;
> +}
> +
> +  regno_start = 0;
> +  do {
> +  regno_start = xbbo_next_reg_cluster (regno_start, _offset, false);
> +  } while (regno_start >= 0);
> +
> +  if (sp_adjust)
> +  pru_add_to_sp (sp_adjust, REG_CFA_ADJUST_CFA);
> +
> +  if (!sibcall_p)
> +emit_jump_insn (gen_simple_return ());
Note you probably need a blockage before the restore of the stack
pointer to avoid some really nasty problems with the scheduler
reordering things such that the current frame gets deallocated before
register restores happen.  It usually doesn't cause a problem, but if an
interrupt occurs between the