Re: [PATCH v3 01/10] Initial TI PRU GCC port
On Thursday, 08/23/2018. 8:09:05 EEST Dimitar Dimitrov wrote: > > > +/* Emit function epilogue. */ > > > +void > > > +pru_expand_epilogue (bool sibcall_p) > > > +{ > > > + rtx cfa_adj; > > > + int total_frame_size; > > > + int sp_adjust, save_offset; > > > + int regno_start; > > > + > > > + if (!sibcall_p && pru_can_use_return_insn ()) > > > +{ > > > + emit_jump_insn (gen_return ()); > > > + return; > > > +} > > > + > > > + emit_insn (gen_blockage ()); > > > + > > > + total_frame_size = pru_compute_frame_layout (); > > > + if (frame_pointer_needed) > > > +{ > > > + /* Recover the stack pointer. */ > > > + cfa_adj = plus_constant (Pmode, stack_pointer_rtx, > > > + (total_frame_size > > > + - cfun->machine->save_regs_offset)); > > > + pru_add3_frame_adjust (stack_pointer_rtx, > > > +hard_frame_pointer_rtx, > > > +-cfun->machine->fp_save_offset, > > > +REG_CFA_DEF_CFA, > > > +cfa_adj); > > > + > > > + save_offset = 0; > > > + sp_adjust = total_frame_size - cfun->machine->save_regs_offset; > > > +} > > > + else if (!UBYTE_INT (total_frame_size)) > > > +{ > > > + pru_add_to_sp (cfun->machine->save_regs_offset, > > > + REG_CFA_ADJUST_CFA); > > > + save_offset = 0; > > > + sp_adjust = total_frame_size - cfun->machine->save_regs_offset; > > > +} > > > + else > > > +{ > > > + save_offset = cfun->machine->save_regs_offset; > > > + sp_adjust = total_frame_size; > > > +} > > > + > > > + regno_start = 0; > > > + do { > > > + regno_start = xbbo_next_reg_cluster (regno_start, _offset, > > > false); + } while (regno_start >= 0); > > > + > > > + if (sp_adjust) > > > + pru_add_to_sp (sp_adjust, REG_CFA_ADJUST_CFA); > > > + > > > + if (!sibcall_p) > > > +emit_jump_insn (gen_simple_return ()); > > > > Note you probably need a blockage before the restore of the stack > > pointer to avoid some really nasty problems with the scheduler > > reordering things such that the current frame gets deallocated before > > register restores happen. It usually doesn't cause a problem, but if an > > interrupt occurs between the de-allocation and register restore, then > > all bets are off. > > Thank you. I will fix it. I have attached a fixup patch for this particular blockage issue, in case you want to apply it on top of the v3 patch series. If that is not convenient, I will wait a few more days for comments, and then I will send v4. Thanks, Dimitar >From 3feed8e7d19db555af0665dec24233a97d9a634c Mon Sep 17 00:00:00 2001 From: Dimitar Dimitrov Date: Thu, 23 Aug 2018 08:14:13 +0300 Subject: [PATCH] Add blockage in epilogue before SP restore. Signed-off-by: Dimitar Dimitrov --- gcc/config/pru/pru.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/gcc/config/pru/pru.c b/gcc/config/pru/pru.c index de72d22278e..f1ce8bfeddd 100644 --- a/gcc/config/pru/pru.c +++ b/gcc/config/pru/pru.c @@ -389,6 +389,13 @@ pru_expand_epilogue (bool sibcall_p) regno_start = xbbo_next_reg_cluster (regno_start, _offset, false); } while (regno_start >= 0); + /* Emit a blockage insn here to keep these insns from being moved to + an earlier spot in the epilogue. + + This is necessary as we must not cut the stack back before all the + restores are finished. */ + emit_insn (gen_blockage ()); + if (sp_adjust) pru_add_to_sp (sp_adjust, REG_CFA_ADJUST_CFA); -- 2.11.0
Re: [PATCH v3 01/10] Initial TI PRU GCC port
On Monday, 8/20/2018 11:12:23 EEST Jeff Law wrote: > On 08/15/2018 10:49 PM, Dimitar Dimitrov wrote: > > ChangeLog: > > > > 2018-07-27 Dimitar Dimitrov > > > > * configure: Regenerate. > > * configure.ac: Add PRU target. > > > > gcc/ChangeLog: > > > > 2018-07-27 Dimitar Dimitrov > > > > * common/config/pru/pru-common.c: New file. > > * config.gcc: Add PRU target. > > * config/pru/alu-zext.md: New file. > > * config/pru/constraints.md: New file. > > * config/pru/predicates.md: New file. > > * config/pru/pru-opts.h: New file. > > * config/pru/pru-passes.c: New file. > > * config/pru/pru-pragma.c: New file. > > * config/pru/pru-protos.h: New file. > > * config/pru/pru.c: New file. > > * config/pru/pru.h: New file. > > * config/pru/pru.md: New file. > > * config/pru/pru.opt: New file. > > * config/pru/t-pru: New file. > > * doc/extend.texi: Document PRU pragmas. > > * doc/invoke.texi: Document PRU-specific options. > > * doc/md.texi: Document PRU asm constraints. > > > > Signed-off-by: Dimitar Dimitrov > > > > diff --git a/gcc/config/pru/alu-zext.md b/gcc/config/pru/alu-zext.md > > new file mode 100644 > > index 000..2112b08d3f4 > > --- /dev/null > > +++ b/gcc/config/pru/alu-zext.md > > [ ... ] > I wonder if the zero-extensions in alu-zext.md are really needed. Can > we define WORD_REGISTER_OPERATIONS for PRU? Or do you really want to > expose sub-word operations? I simply don't know enough about the > hardware to provide guidance here. But it might allow for some > simplification of the port. You wouldn't want to define sub-word > patterns such as addqi/addhi anymore either. The QI/HI ALU patterns are needed for efficient code generation and ABI compliance. In the PRU GCC port, UNITS_PER_WORD=1. Hence addqi is a full word size pattern, and WORD_REGISTER_OPERATIONS should not matter. Note that in PRU architecture, ALU operands can be either 8, 16 or 32 bits. All inputs are zero-extended to 32 bits, and outputs are truncated as needed. Here is a relevant discussion: http://gcc.gnu.org/ml/gcc/2017-01/msg00217.html For example, the following C snippet: uint16_t test(uint8_t a, uint16_t b) { return a + b; } Produces the following RTL and efficient assembler output: (insn 13 8 14 2 (set (reg/i:HI 56 r14.b0) (plus:HI (zero_extend:HI (reg:QI 56 r14.b0 [ a ])) (reg:HI 57 r14.b1 [ b ]))) "test.c":407 76 ) test: # Note: Function arguments are passed in sub-hw-registers add r14.w0, r14.b0, r14.w1 ret > > > diff --git a/gcc/config/pru/pru.c b/gcc/config/pru/pru.c > > new file mode 100644 > > index 000..de72d22278e > > --- /dev/null > > +++ b/gcc/config/pru/pru.c > > @@ -0,0 +1,3001 @@ > > > > +/* Emit efficient RTL equivalent of ADD3 with the given const_int for > > + frame-related registers. > > + op0 - Destination register. > > + op1 - First addendum operand (a register). > > + addendum - Second addendum operand (a constant). > > + kind- Note kind. REG_NOTE_MAX if no note must be added. > > + reg_note_rtx - Reg note RTX. NULL if it should be computed > > automatically. + */ > > +static rtx > > +pru_add3_frame_adjust (rtx op0, rtx op1, int addendum, > > + const enum reg_note kind, rtx reg_note_rtx) > > +{ > > + rtx insn; > > + > > + rtx op0_adjust = gen_rtx_SET (op0, plus_constant (Pmode, op1, > > addendum)); + > > + if (UBYTE_INT (addendum) || UBYTE_INT (-addendum)) > > +insn = emit_insn (op0_adjust); > > + else > > +{ > > + /* Help the compiler to cope with an arbitrary integer constant. > > +Reload has finished so we can't expect the compiler to > > +auto-allocate a temporary register. But we know that call-saved > > +registers are not live yet, so we utilize them. */ > > Note I think this can run afoul of IPA-RA. THough it looks like you're > exposing this in RTL, so you're probably safe. Grepping for PROLOGUE_TEMP_REG, it appears that MIPS and Risc-V use the same technique to get a temporary register for prologue/epilougue. > > > +/* Emit function epilogue. */ > > +void > > +pru_expand_epilogue (bool sibcall_p) > > +{ > > + rtx cfa_adj; > > + int total_frame_size; > > + int sp_adjust, save_offset; > > + int regno_start; > > + > > + if (!sibcall_p && pru_can_use_return_insn ()) > > +{ > > + emit_jump_insn (gen_return ()); > > + return; > > +} > > + > > + emit_insn (gen_blockage ()); > > + > > + total_frame_size = pru_compute_frame_layout (); > > + if (frame_pointer_needed) > > +{ > > + /* Recover the stack pointer. */ > > + cfa_adj = plus_constant (Pmode, stack_pointer_rtx, > > + (total_frame_size > > + - cfun->machine->save_regs_offset)); > > + pru_add3_frame_adjust (stack_pointer_rtx, > > +hard_frame_pointer_rtx, > > +
Re: [PATCH v3 01/10] Initial TI PRU GCC port
On 08/15/2018 10:49 PM, Dimitar Dimitrov wrote: > ChangeLog: > > 2018-07-27 Dimitar Dimitrov > > * configure: Regenerate. > * configure.ac: Add PRU target. > > gcc/ChangeLog: > > 2018-07-27 Dimitar Dimitrov > > * common/config/pru/pru-common.c: New file. > * config.gcc: Add PRU target. > * config/pru/alu-zext.md: New file. > * config/pru/constraints.md: New file. > * config/pru/predicates.md: New file. > * config/pru/pru-opts.h: New file. > * config/pru/pru-passes.c: New file. > * config/pru/pru-pragma.c: New file. > * config/pru/pru-protos.h: New file. > * config/pru/pru.c: New file. > * config/pru/pru.h: New file. > * config/pru/pru.md: New file. > * config/pru/pru.opt: New file. > * config/pru/t-pru: New file. > * doc/extend.texi: Document PRU pragmas. > * doc/invoke.texi: Document PRU-specific options. > * doc/md.texi: Document PRU asm constraints. > > Signed-off-by: Dimitar Dimitrov > > diff --git a/gcc/config/pru/alu-zext.md b/gcc/config/pru/alu-zext.md > new file mode 100644 > index 000..2112b08d3f4 > --- /dev/null > +++ b/gcc/config/pru/alu-zext.md [ ... ] I wonder if the zero-extensions in alu-zext.md are really needed. Can we define WORD_REGISTER_OPERATIONS for PRU? Or do you really want to expose sub-word operations? I simply don't know enough about the hardware to provide guidance here. But it might allow for some simplification of the port. You wouldn't want to define sub-word patterns such as addqi/addhi anymore either. > diff --git a/gcc/config/pru/pru.c b/gcc/config/pru/pru.c > new file mode 100644 > index 000..de72d22278e > --- /dev/null > +++ b/gcc/config/pru/pru.c > @@ -0,0 +1,3001 @@ > +/* Emit efficient RTL equivalent of ADD3 with the given const_int for > + frame-related registers. > + op0 - Destination register. > + op1 - First addendum operand (a register). > + addendum - Second addendum operand (a constant). > + kind - Note kind. REG_NOTE_MAX if no note must be added. > + reg_note_rtx - Reg note RTX. NULL if it should be computed > automatically. > + */ > +static rtx > +pru_add3_frame_adjust (rtx op0, rtx op1, int addendum, > +const enum reg_note kind, rtx reg_note_rtx) > +{ > + rtx insn; > + > + rtx op0_adjust = gen_rtx_SET (op0, plus_constant (Pmode, op1, addendum)); > + > + if (UBYTE_INT (addendum) || UBYTE_INT (-addendum)) > +insn = emit_insn (op0_adjust); > + else > +{ > + /* Help the compiler to cope with an arbitrary integer constant. > + Reload has finished so we can't expect the compiler to > + auto-allocate a temporary register. But we know that call-saved > + registers are not live yet, so we utilize them. */ Note I think this can run afoul of IPA-RA. THough it looks like you're exposing this in RTL, so you're probably safe. > +/* Emit function epilogue. */ > +void > +pru_expand_epilogue (bool sibcall_p) > +{ > + rtx cfa_adj; > + int total_frame_size; > + int sp_adjust, save_offset; > + int regno_start; > + > + if (!sibcall_p && pru_can_use_return_insn ()) > +{ > + emit_jump_insn (gen_return ()); > + return; > +} > + > + emit_insn (gen_blockage ()); > + > + total_frame_size = pru_compute_frame_layout (); > + if (frame_pointer_needed) > +{ > + /* Recover the stack pointer. */ > + cfa_adj = plus_constant (Pmode, stack_pointer_rtx, > +(total_frame_size > + - cfun->machine->save_regs_offset)); > + pru_add3_frame_adjust (stack_pointer_rtx, > + hard_frame_pointer_rtx, > + -cfun->machine->fp_save_offset, > + REG_CFA_DEF_CFA, > + cfa_adj); > + > + save_offset = 0; > + sp_adjust = total_frame_size - cfun->machine->save_regs_offset; > +} > + else if (!UBYTE_INT (total_frame_size)) > +{ > + pru_add_to_sp (cfun->machine->save_regs_offset, > + REG_CFA_ADJUST_CFA); > + save_offset = 0; > + sp_adjust = total_frame_size - cfun->machine->save_regs_offset; > +} > + else > +{ > + save_offset = cfun->machine->save_regs_offset; > + sp_adjust = total_frame_size; > +} > + > + regno_start = 0; > + do { > + regno_start = xbbo_next_reg_cluster (regno_start, _offset, false); > + } while (regno_start >= 0); > + > + if (sp_adjust) > + pru_add_to_sp (sp_adjust, REG_CFA_ADJUST_CFA); > + > + if (!sibcall_p) > +emit_jump_insn (gen_simple_return ()); Note you probably need a blockage before the restore of the stack pointer to avoid some really nasty problems with the scheduler reordering things such that the current frame gets deallocated before register restores happen. It usually doesn't cause a problem, but if an interrupt occurs between the