On Monday, 8/20/2018 11:12:23 EEST Jeff Law wrote: > On 08/15/2018 10:49 PM, Dimitar Dimitrov wrote: > > ChangeLog: > > > > 2018-07-27 Dimitar Dimitrov <dimi...@dinux.eu> > > > > * configure: Regenerate. > > * configure.ac: Add PRU target. > > > > gcc/ChangeLog: > > > > 2018-07-27 Dimitar Dimitrov <dimi...@dinux.eu> > > > > * common/config/pru/pru-common.c: New file. > > * config.gcc: Add PRU target. > > * config/pru/alu-zext.md: New file. > > * config/pru/constraints.md: New file. > > * config/pru/predicates.md: New file. > > * config/pru/pru-opts.h: New file. > > * config/pru/pru-passes.c: New file. > > * config/pru/pru-pragma.c: New file. > > * config/pru/pru-protos.h: New file. > > * config/pru/pru.c: New file. > > * config/pru/pru.h: New file. > > * config/pru/pru.md: New file. > > * config/pru/pru.opt: New file. > > * config/pru/t-pru: New file. > > * doc/extend.texi: Document PRU pragmas. > > * doc/invoke.texi: Document PRU-specific options. > > * doc/md.texi: Document PRU asm constraints. > > > > Signed-off-by: Dimitar Dimitrov <dimi...@dinux.eu> > > > > diff --git a/gcc/config/pru/alu-zext.md b/gcc/config/pru/alu-zext.md > > new file mode 100644 > > index 00000000000..2112b08d3f4 > > --- /dev/null > > +++ b/gcc/config/pru/alu-zext.md > > [ ... ] > I wonder if the zero-extensions in alu-zext.md are really needed. Can > we define WORD_REGISTER_OPERATIONS for PRU? Or do you really want to > expose sub-word operations? I simply don't know enough about the > hardware to provide guidance here. But it might allow for some > simplification of the port. You wouldn't want to define sub-word > patterns such as addqi/addhi anymore either. The QI/HI ALU patterns are needed for efficient code generation and ABI compliance.
In the PRU GCC port, UNITS_PER_WORD=1. Hence addqi is a full word size pattern, and WORD_REGISTER_OPERATIONS should not matter. Note that in PRU architecture, ALU operands can be either 8, 16 or 32 bits. All inputs are zero-extended to 32 bits, and outputs are truncated as needed. Here is a relevant discussion: http://gcc.gnu.org/ml/gcc/2017-01/msg00217.html For example, the following C snippet: uint16_t test(uint8_t a, uint16_t b) { return a + b; } Produces the following RTL and efficient assembler output: (insn 13 8 14 2 (set (reg/i:HI 56 r14.b0) (plus:HI (zero_extend:HI (reg:QI 56 r14.b0 [ a ])) (reg:HI 57 r14.b1 [ b ]))) "test.c":407 76 ) test: # Note: Function arguments are passed in sub-hw-registers add r14.w0, r14.b0, r14.w1 ret > > > diff --git a/gcc/config/pru/pru.c b/gcc/config/pru/pru.c > > new file mode 100644 > > index 00000000000..de72d22278e > > --- /dev/null > > +++ b/gcc/config/pru/pru.c > > @@ -0,0 +1,3001 @@ > > > > +/* Emit efficient RTL equivalent of ADD3 with the given const_int for > > + frame-related registers. > > + op0 - Destination register. > > + op1 - First addendum operand (a register). > > + addendum - Second addendum operand (a constant). > > + kind - Note kind. REG_NOTE_MAX if no note must be added. > > + reg_note_rtx - Reg note RTX. NULL if it should be computed > > automatically. + */ > > +static rtx > > +pru_add3_frame_adjust (rtx op0, rtx op1, int addendum, > > + const enum reg_note kind, rtx reg_note_rtx) > > +{ > > + rtx insn; > > + > > + rtx op0_adjust = gen_rtx_SET (op0, plus_constant (Pmode, op1, > > addendum)); + > > + if (UBYTE_INT (addendum) || UBYTE_INT (-addendum)) > > + insn = emit_insn (op0_adjust); > > + else > > + { > > + /* Help the compiler to cope with an arbitrary integer constant. > > + Reload has finished so we can't expect the compiler to > > + auto-allocate a temporary register. But we know that call-saved > > + registers are not live yet, so we utilize them. */ > > Note I think this can run afoul of IPA-RA. THough it looks like you're > exposing this in RTL, so you're probably safe. Grepping for PROLOGUE_TEMP_REG, it appears that MIPS and Risc-V use the same technique to get a temporary register for prologue/epilougue. > > > +/* Emit function epilogue. */ > > +void > > +pru_expand_epilogue (bool sibcall_p) > > +{ > > + rtx cfa_adj; > > + int total_frame_size; > > + int sp_adjust, save_offset; > > + int regno_start; > > + > > + if (!sibcall_p && pru_can_use_return_insn ()) > > + { > > + emit_jump_insn (gen_return ()); > > + return; > > + } > > + > > + emit_insn (gen_blockage ()); > > + > > + total_frame_size = pru_compute_frame_layout (); > > + if (frame_pointer_needed) > > + { > > + /* Recover the stack pointer. */ > > + cfa_adj = plus_constant (Pmode, stack_pointer_rtx, > > + (total_frame_size > > + - cfun->machine->save_regs_offset)); > > + pru_add3_frame_adjust (stack_pointer_rtx, > > + hard_frame_pointer_rtx, > > + -cfun->machine->fp_save_offset, > > + REG_CFA_DEF_CFA, > > + cfa_adj); > > + > > + save_offset = 0; > > + sp_adjust = total_frame_size - cfun->machine->save_regs_offset; > > + } > > + else if (!UBYTE_INT (total_frame_size)) > > + { > > + pru_add_to_sp (cfun->machine->save_regs_offset, > > + REG_CFA_ADJUST_CFA); > > + save_offset = 0; > > + sp_adjust = total_frame_size - cfun->machine->save_regs_offset; > > + } > > + else > > + { > > + save_offset = cfun->machine->save_regs_offset; > > + sp_adjust = total_frame_size; > > + } > > + > > + regno_start = 0; > > + do { > > + regno_start = xbbo_next_reg_cluster (regno_start, &save_offset, > > false); + } while (regno_start >= 0); > > + > > + if (sp_adjust) > > + pru_add_to_sp (sp_adjust, REG_CFA_ADJUST_CFA); > > + > > + if (!sibcall_p) > > + emit_jump_insn (gen_simple_return ()); > > Note you probably need a blockage before the restore of the stack > pointer to avoid some really nasty problems with the scheduler > reordering things such that the current frame gets deallocated before > register restores happen. It usually doesn't cause a problem, but if an > interrupt occurs between the de-allocation and register restore, then > all bets are off. Thank you. I will fix it. > > > diff --git a/gcc/config/pru/pru.h b/gcc/config/pru/pru.h > > new file mode 100644 > > index 00000000000..de1b9100209 > > --- /dev/null > > +++ b/gcc/config/pru/pru.h > > [ ... ] > > > + > > +#ifdef IN_LIBGCC2 > > +/* This is to get correct SI and DI modes in libgcc2.c (32 and 64 bits). > > */ +#define UNITS_PER_WORD 4 > > +#else > > +/* Width of a word, in units (bytes). */ > > +#define UNITS_PER_WORD 1 > > +#endif > > This looks odd. Can you explain a bit more what you're doing here? I'm > guessing you took it from AVR which is an 8 bit port. I'm not sure it's > the right thing to do for AVR either :-) You are correct - I took it from avr. It is needed to support SItype and DItype in libgcc. See libgcc/libgcc2.h:126, considering that UNITS_PER_WORD=1: #if MIN_UNITS_PER_WORD > 1 /* These typedefs are usually forbidden on dsp's with UNITS_PER_WORD 1. */ typedef int SItype __attribute__ ((mode (SI))); typedef int DItype __attribute__ ((mode (DI))); Without that snippet, I get build errors: ../../../gcc/libgcc/libgcc2.h:411:8: error: unknown type name 'DItype' 411 | extern DItype __bswapdi2 (DItype); > > > diff --git a/gcc/config/pru/pru.md b/gcc/config/pru/pru.md > > new file mode 100644 > > index 00000000000..4be1958e6c9 > > --- /dev/null > > +++ b/gcc/config/pru/pru.md > > @@ -0,0 +1,956 @@ > > +;; Machine Description for TI PRU. > > +;; Copyright (C) 2014-2018 Free Software Foundation, Inc. > > +;; Contributed by Dimitar Dimitrov <dimi...@dinux.eu> > > +;; Based on the NIOS2 GCC port. > > +;; > > +;; This file is part of GCC. > > +;; > > +;; GCC is free software; you can redistribute it and/or modify > > +;; it under the terms of the GNU General Public License as published by > > +;; the Free Software Foundation; either version 3, or (at your option) > > +;; any later version. > > +;; > > +;; GCC is distributed in the hope that it will be useful, > > +;; but WITHOUT ANY WARRANTY; without even the implied warranty of > > +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > > +;; GNU General Public License for more details. > > +;; > > +;; You should have received a copy of the GNU General Public License > > +;; along with GCC; see the file COPYING3. If not see > > +;; <http://www.gnu.org/licenses/>. > > + > > + > > + > > +;; Split loading of integer constants into a distinct pattern, in order > > to > > +;; prevent CSE from considering "SET (MEM, CONST_INT)" as a valid insn > > +;; selection. This fixes an abnormally long compile time exposed by > > +;; gcc.dg/pr48141.c > > Note that in general both reload and LRA require that the movXX patterns > be unified for any given mode. Reload and LRA can change the operands > without re-recognizing the pattern. You might be getting away with not > adhering to that requirement because you're just dealing with constant > source operands (most of the problems I've run into through the years > involve memory operands). But be aware you may need to twiddle this and > fold those patterns back into your movXX patterns. Noted. I will try the folding after acceptance. > > > + > > +; I cannot think of any reason for the core to pass a 64-bit symbolic > > +; constants. Hence simplify the rule and handle only numeric constants. > > That's fine. THough you'll generally get better code generated if you > tighten up the operand predicates rather than handling things in > contraints. So it may be worth investigating if you can tighten up the > predicate for your movXX patterns. > > And that advice applies in general. It's usually better to have a > predicate that is tight and use constraints to drive instruction > selection. A wide predicate and tight constraints means reload/LRA has > to work harder and they're not particularly good at optimizing away > redundancies they create. Additionally on a target which defines > instruction scheduling, you generally get a more accurate first schedule > if the predicates are tight. I don't think I can make the movdi predicate any more specific without splitting the pattern. For example, "nonimmediate_operand" fits exacly the following constraints: "=m,r,r,r,r,r" . gensupport.c: {"nonimmediate_operand", false, false, {SUBREG, REG, MEM}}, > > > + > > +;; Sign extension patterns. We have to emulate them due to lack of > > +;; signed operations in PRU's ALU. > > + > > +(define_insn "extend<EQS0:mode><EQD:mode>2" > > + [(set (match_operand:EQD 0 "register_operand" "=r") > > + (sign_extend:EQD (match_operand:EQS0 1 "register_operand" "r")))] > > + "" > > + { > > + return pru_output_sign_extend (operands); > > + } > > + [(set_attr "type" "complex") > > + (set_attr "length" "12")]) > > You might be better off expanding sign extension as a pair of shifts. > Those can be optimized and combined with other nearby operations. ISTM > that's something you could experiment with once the port is accepted. Unfortunately PRU has no native ashiftrt instruction. The currently implemented "compare->branch->bit-fill" is much faster than an emulated ashiftrt. > > > +;; Arithmetic Operations > > + > > +(define_expand "add<mode>3" > > + [(set (match_operand:QISI 0 "register_operand" "=r,r,r") > > + (plus:QISI (match_operand:QISI 1 "register_operand" "%r,r,r") > > + (match_operand:QISI 2 "nonmemory_operand" "r,I,M")))] > > + "" > > + "" > > + [(set_attr "type" "alu")]) > > + > > +(define_insn "adddi3" > > + [(set (match_operand:DI 0 "register_operand" "=&r,&r,&r") > > + (plus:DI (match_operand:DI 1 "register_operand" "%r,r,r") > > + (match_operand:DI 2 "reg_or_ubyte_operand" "r,I,M")))] > > + "" > > + "@ > > + add\\t%F0, %F1, %F2\;adc\\t%N0, %N1, %N2 > > + add\\t%F0, %F1, %2\;adc\\t%N0, %N1, 0 > > + sub\\t%F0, %F1, %n2\;suc\\t%N0, %N1, 0" > > + [(set_attr "type" "alu,alu,alu") > > + (set_attr "length" "8,8,8")]) > > Note that if you expose the C bit and define an add/sub with carry insn > the compiler will handle this kind of stuff for you automatically. > That's something you can experiment with after acceptance as well. I will explore this. Thanks for the pointer. > > Overall it looks pretty good. I think the blockage in the epilogue > issue needs to be addressed. The UNITS_PER_WORD concern needs to be > answered. The rest are things you might want to investigate, but aren't > things that need to be changed to get approval. > > I think Joseph had some concerns about diagnostics emitted by the PRU > target code. Whatever those where need to be addressed if they haven't > already. I have already addressed the diagnostics comments in this patch version. Regards, Dimitar