On Monday, 9/24/2018 11:38:23 EEST Richard Sandiford wrote: > Dimitar Dimitrov <dimi...@dinux.eu> writes: > > On Thursday, 9/13/2018 13:02:21 EEST Richard Sandiford wrote: > >> Dimitar Dimitrov <dimi...@dinux.eu> writes: > >> > +/* Callback for walk_gimple_seq that checks TP tree for TI ABI > >> > compliance. */ +static tree > >> > +check_op_callback (tree *tp, int *walk_subtrees, void *data) > >> > +{ > >> > + struct walk_stmt_info *wi = (struct walk_stmt_info *) data; > >> > + > >> > + if (RECORD_OR_UNION_TYPE_P (*tp) || TREE_CODE (*tp) == > >> > ENUMERAL_TYPE) > >> > + { > >> > + /* Forward declarations have NULL tree type. Skip them. */ > >> > + if (TREE_TYPE (*tp) == NULL) > >> > + return NULL; > >> > + } > >> > + > >> > + /* TODO - why C++ leaves INTEGER_TYPE forward declarations around? > >> > */ > >> > + if (TREE_TYPE (*tp) == NULL) > >> > + return NULL; > >> > + > >> > + const tree type = TREE_TYPE (*tp); > >> > + > >> > + /* Direct function calls are allowed, obviously. */ > >> > + if (TREE_CODE (*tp) == ADDR_EXPR && TREE_CODE (type) == > >> > POINTER_TYPE) > >> > + { > >> > + const tree ptype = TREE_TYPE (type); > >> > + if (TREE_CODE (ptype) == FUNCTION_TYPE) > >> > + return NULL; > >> > + } > >> > >> This seems like a bit of a dangerous heuristic. Couldn't it also cause > >> > >> us to skip things like: > >> (void *) func > >> > >> ? (OK, that's dubious C, but we do support it.) > > > > The cast yields a "data pointer", which is 32 bits for both types of ABI. > > Hence it is safe to skip "(void *) func". > > > > The TI ABI's 16 bit function pointers become a problem when they change > > the > > layout of structures and function argument registers. > > OK. The reason this stood out is that the code doesn't obviously match > the comment. If the code is just trying to skip direct function calls, > I think the gcall sequence I mentioned would be more obvious, and would > match the existing comment. If anything that takes the address of a > function is OK then it might be worth changing the comment to include that. I will use your suggested gcall snippet since it is safe and obvious. The comment matches my original intent.
> >> > +/* Implement TARGET_PREFERRED_RELOAD_CLASS. */ > >> > +static reg_class_t > >> > +pru_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t > >> > regclass) > >> > +{ > >> > + return regclass == NO_REGS ? GENERAL_REGS : regclass; > >> > +} > >> > >> This looks odd: PREFERRED_RELOAD_CLASS should return a subset of the > >> original class rather than add new registers. Can you remember why > >> it was needed? > > > > I'm not sure what target code is supposed to return when NO_REGS is passed > > here. I saw how other ports handle NO_REGS (c-sky, m32c, nios2, rl78). So > > I > > followed suite. > > > > Here is a backtrace of LRA when NO_REGS is passed: > > 0xf629ae pru_preferred_reload_class > > > > ../../gcc/gcc/config/pru/pru.c:788 > > > > 0xa3d6e8 process_alt_operands > > > > ../../gcc/gcc/lra-constraints.c:2600 > > > > 0xa3ef7d curr_insn_transform > > > > ../../gcc/gcc/lra-constraints.c:3889 > > > > 0xa4301e lra_constraints(bool) > > > > ../../gcc/gcc/lra-constraints.c:4906 > > > > 0xa2726c lra(_IO_FILE*) > > > > ../../gcc/gcc/lra.c:2446 > > > > 0x9c97a9 do_reload > > > > ../../gcc/gcc/ira.c:5469 > > > > 0x9c97a9 execute > > > > ../../gcc/gcc/ira.c:5653 > > I think it should just pass NO_REGS through unmodified (which means > spill to memory). In some ways it would be good if it didn't have to > handle this case, but again that's going to be work to fix. > > The RA will pass ALL_REGS if it can handle any register, and wants to > know what the target would prefer. But for any input, the hook needs > to stick within the registers it was given. Thanks for the clarification. I will remove the PRU hook and will rely on the default implementation (i.e. return the given rclass). > >> > +; LRA cannot cope with clobbered op2, hence the scratch register. > >> > +(define_insn "ashr<mode>3" > >> > + [(set (match_operand:QISI 0 "register_operand" "=&r,r") > >> > + (ashiftrt:QISI > >> > + (match_operand:QISI 1 "register_operand" "0,r") > >> > + (match_operand:QISI 2 "reg_or_const_1_operand" "r,P"))) > >> > + (clobber (match_scratch:QISI 3 "=r,X"))] > >> > + "" > >> > + "@ > >> > + mov %3, %2\;ASHRLP%=:\;qbeq ASHREND%=, %3, 0\; sub %3, %3, 1\; > >> > lsr\\t%0, %0, 1\; qbbc ASHRLP%=, %0, (%S0 * 8) - 2\; set %0, %0, (%S0 * > >> > 8) - 1\; jmp ASHRLP%=\;ASHREND%=: + lsr\\t%0, %1, 1\;qbbc LSIGN%=, > >> > %0, > >> > (%S0 * 8) - 2\;set %0, %0, (%S0 * 8) - 1\;LSIGN%=:" + [(set_attr > >> > "type" > >> > "complex,alu") > >> > + (set_attr "length" "28,4")]) > >> > >> What do you mean by LRA not coping? What did you try originally and > >> what went wrong? > > > > Better assembly could be generated if the shift count register was used > > for a> > > loop counter. Pseudo code: > > while (op2--) > > > > op0 >>= 1; > > > > I originally tried to clobber operand 2: > > (define_insn "ashr<mode>3" > > > > [(set (match_operand:QISI 0 "register_operand" "=&r,r") > > > > (ashiftrt:QISI > > > > (match_operand:QISI 1 "register_operand" "0,r") > > (match_operand:QISI 2 "reg_or_const_1_operand" "+r,P")) > > > > )] > > > > But with the above pattern operand 2 was not clobbered. Its value was > > deemed untouched (i.e. live across the pattern). So I came up with > > clobbering a separate register to fix this, at the expense of slightly > > bigger generated code. > > Ah, yeah, "+" has to be on an output operand (the destination of a set > or clobber). > > In this kind of situation you might be better off expanding the loop > as individual RTL instructions, which is what several targets do for > things like memmove. But if you'd prefer to keep a single pattern, > you could use: > > [(set (match_operand:QISI 0 "register_operand" "=r") > (ashiftrt:QISI > (match_operand:QISI 2 "register_operand" "0") > (match_operand:QISI 3 "register_operand" "1"))) > (set (match_operand:QISI 1 "register_operand" "=r") > (const_int -1))] > > (assuming the optimised loop does leave operand 2 containing all-1s). > > The shift by 1 would then need to be a separate pattern, with the > define_expand choosing between them. > Expanding the loop works like a charm. Thanks for the tip. I will include the rework in the next patch revision. > >> Despite the long reply (sorry), this looks in really good shape to me > >> FWIW. > >> Thanks for the submission. > > > > I take this sentence as an indication that the PRU port is welcome for > > inclusion into GCC. I'll work on fixing the comments and will send a new > > patch version. > > Well, process-wise, it's up to the steering committee to decide whether > to accept the port. But one of the main requirements is of course > reviewing the patches. > > Which of the other patches in the series still need review? I think Jeff > approved some of them earlier. Rest of patches are approved: https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01448.html https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01152.html Thanks, Dimitar