Re: Add a load_extend_op wrapper
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Thanks,
> Richard
>
> gcc/
> 	* rtlanal.c (load_extend_op): Move to...
> 	* rtl.h: ...here and make inline.

OK, thanks.

-- 
Eric Botcazou
Re: Add a load_extend_op wrapper
Eric Botcazou writes:
>> 2016-11-15  Richard Sandiford
>> 	    Alan Hayward
>> 	    David Sherwood
>>
>> 	* rtl.h (load_extend_op): Declare.
>> 	* rtlanal.c (load_extend_op): New function.
>
> I'd make it an inline function.

Sorry, I'd committed it before I got this message.  Tested on
aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
	* rtlanal.c (load_extend_op): Move to...
	* rtl.h: ...here and make inline.

diff --git a/gcc/rtl.h b/gcc/rtl.h
index df5172b..b21514f 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2954,7 +2954,6 @@ extern void set_insn_deleted (rtx);
 
 /* Functions in rtlanal.c */
 
-extern rtx_code load_extend_op (machine_mode);
 extern rtx single_set_2 (const rtx_insn *, const_rtx);
 extern bool contains_symbol_ref_p (const_rtx);
 extern bool contains_symbolic_reference_p (const_rtx);
@@ -3771,5 +3770,17 @@ struct GTY(()) cgraph_rtl_info {
   unsigned function_used_regs_valid: 1;
 };
 
+/* If loads from memories of mode MODE always sign or zero extend,
+   return SIGN_EXTEND or ZERO_EXTEND as appropriate.  Return UNKNOWN
+   otherwise.  */
+
+inline rtx_code
+load_extend_op (machine_mode mode)
+{
+  if (SCALAR_INT_MODE_P (mode)
+      && GET_MODE_PRECISION (mode) < BITS_PER_WORD)
+    return LOAD_EXTEND_OP (mode);
+  return UNKNOWN;
+}
+
 #endif /* ! GCC_RTL_H */
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index f07a77a..55a9d2c 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3863,19 +3863,6 @@ subreg_nregs_with_regno (unsigned int regno, const_rtx x)
   return info.nregs;
 }
 
-/* If loads from memories of mode MODE always sign or zero extend,
-   return SIGN_EXTEND or ZERO_EXTEND as appropriate.  Return UNKNOWN
-   otherwise.  */
-
-rtx_code
-load_extend_op (machine_mode mode)
-{
-  if (SCALAR_INT_MODE_P (mode)
-      && GET_MODE_PRECISION (mode) < BITS_PER_WORD)
-    return LOAD_EXTEND_OP (mode);
-  return UNKNOWN;
-}
-
 struct parms_set_data
 {
   int nregs;
Re: Add a load_extend_op wrapper
On 11/15/2016 11:56 AM, Jeff Law wrote:
> On 11/15/2016 11:12 AM, Richard Sandiford wrote:
>> Jeff Law writes:
>>> On 11/15/2016 05:42 AM, Richard Sandiford wrote:
>>>> LOAD_EXTEND_OP only applies to scalar integer modes that are narrower
>>>> than a word.  However, callers weren't consistent about which of these
>>>> checks they made beforehand, and also weren't consistent about whether
>>>> "smaller" was based on (bit)size or precision (IMO it's the latter).
>>>> This patch adds a wrapper to try to make the macro easier to use.
>>> It's unclear to me how GET_MODE_PRECISION is different from
>>> GET_MODE_SIZE or GET_MODE_BITSIZE.  But I haven't really thought about
>>> it, particularly in the context of vector modes and such.  I'm
>>> certainly willing to trust your judgment on this.
>> In this case it's really more about scalar integer modes than vector
>> modes.  I think using size and precision are equivalent for MODE_INT
>> but they can be different for MODE_PARTIAL_INT.  Using precision allows
>> LOAD_EXTEND_OP to apply to (say) PSImode extensions to SImode, whereas
>> using the size wouldn't, since PSImode and SImode have the same
>> (memory) size.
> Ah, partial modes.  No idea what the right thing to do would be.  So
> again, I'll trust your judgment.
>
> The only target where I had to deal with partial modes was the mn102.
> On that target partial modes (PSI/24 bits) are larger than the machine's
> natural word size (16 bits) so LOAD_EXTEND_OP didn't apply.
More correctly, didn't apply to partial modes.  We certainly used it to
avoid unnecessary extensions when loading 8 bit values.

jeff
Re: Add a load_extend_op wrapper
On 11/15/2016 11:12 AM, Richard Sandiford wrote:
> Jeff Law writes:
>> On 11/15/2016 05:42 AM, Richard Sandiford wrote:
>>> LOAD_EXTEND_OP only applies to scalar integer modes that are narrower
>>> than a word.  However, callers weren't consistent about which of these
>>> checks they made beforehand, and also weren't consistent about whether
>>> "smaller" was based on (bit)size or precision (IMO it's the latter).
>>> This patch adds a wrapper to try to make the macro easier to use.
>> It's unclear to me how GET_MODE_PRECISION is different from
>> GET_MODE_SIZE or GET_MODE_BITSIZE.  But I haven't really thought about
>> it, particularly in the context of vector modes and such.  I'm
>> certainly willing to trust your judgment on this.
> In this case it's really more about scalar integer modes than vector
> modes.  I think using size and precision are equivalent for MODE_INT
> but they can be different for MODE_PARTIAL_INT.  Using precision allows
> LOAD_EXTEND_OP to apply to (say) PSImode extensions to SImode, whereas
> using the size wouldn't, since PSImode and SImode have the same
> (memory) size.
Ah, partial modes.  No idea what the right thing to do would be.  So
again, I'll trust your judgment.

The only target where I had to deal with partial modes was the mn102.
On that target partial modes (PSI/24 bits) are larger than the machine's
natural word size (16 bits) so LOAD_EXTEND_OP didn't apply.

jeff
Re: Add a load_extend_op wrapper
> 2016-11-15  Richard Sandiford
> 	    Alan Hayward
> 	    David Sherwood
>
> 	* rtl.h (load_extend_op): Declare.
> 	* rtlanal.c (load_extend_op): New function.

I'd make it an inline function.

-- 
Eric Botcazou
Re: Add a load_extend_op wrapper
Jeff Law writes:
> On 11/15/2016 05:42 AM, Richard Sandiford wrote:
>> LOAD_EXTEND_OP only applies to scalar integer modes that are narrower
>> than a word.  However, callers weren't consistent about which of these
>> checks they made beforehand, and also weren't consistent about whether
>> "smaller" was based on (bit)size or precision (IMO it's the latter).
>> This patch adds a wrapper to try to make the macro easier to use.
> It's unclear to me how GET_MODE_PRECISION is different from
> GET_MODE_SIZE or GET_MODE_BITSIZE.  But I haven't really thought about
> it, particularly in the context of vector modes and such.  I'm certainly
> willing to trust your judgment on this.

In this case it's really more about scalar integer modes than vector
modes.  I think using size and precision are equivalent for MODE_INT
but they can be different for MODE_PARTIAL_INT.  Using precision allows
LOAD_EXTEND_OP to apply to (say) PSImode extensions to SImode, whereas
using the size wouldn't, since PSImode and SImode have the same
(memory) size.

>> The patch doesn't change reload, since different checks could have
>> unforeseen consequences.
> I think the same concepts apply in reload, but I understand the
> hesitation to twiddle that code and deal with possible fallout.

Yeah :-)  I know it's a bit of a cop-out, but given the scale of the
SVE changes as a whole, we didn't want to go looking for unnecessary
trouble.

>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>
>> Thanks,
>> Richard
>>
>>
>> [ This patch is part of the SVE series posted here:
>>   https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]
>>
>> gcc/
>> 2016-11-15  Richard Sandiford
>> 	    Alan Hayward
>> 	    David Sherwood
>>
>> 	* rtl.h (load_extend_op): Declare.
>> 	* rtlanal.c (load_extend_op): New function.
>> 	(nonzero_bits1): Use it.
>> 	(num_sign_bit_copies1): Likewise.
>> 	* cse.c (cse_insn): Likewise.
>> 	* fold-const.c (fold_single_bit_test): Likewise.
>> 	(fold_unary_loc): Likewise.
>> 	* fwprop.c (free_load_extend): Likewise.
>> 	* postreload.c (reload_cse_simplify_set): Likewise.
>> 	(reload_cse_simplify_operands): Likewise.
>> 	* combine.c (try_combine): Likewise.
>> 	(simplify_set): Likewise.  Remove redundant SUBREG_BYTE and
>> 	subreg_lowpart_p checks.
> OK.
> jeff

Thanks,
Richard
Re: Add a load_extend_op wrapper
On 11/15/2016 05:42 AM, Richard Sandiford wrote:
> LOAD_EXTEND_OP only applies to scalar integer modes that are narrower
> than a word.  However, callers weren't consistent about which of these
> checks they made beforehand, and also weren't consistent about whether
> "smaller" was based on (bit)size or precision (IMO it's the latter).
> This patch adds a wrapper to try to make the macro easier to use.
It's unclear to me how GET_MODE_PRECISION is different from
GET_MODE_SIZE or GET_MODE_BITSIZE.  But I haven't really thought about
it, particularly in the context of vector modes and such.  I'm certainly
willing to trust your judgment on this.

> LOAD_EXTEND_OP is often used to disable transformations that aren't
> beneficial when extends from memory are free, so being stricter about
> the check accidentally exposed more optimisation opportunities.
Right.

> "SUBREG_BYTE (...) == 0" and subreg_lowpart_p are implied by
> paradoxical_subreg_p, so the patch also removes some redundant tests.
Always helpful.

> The patch doesn't change reload, since different checks could have
> unforeseen consequences.
I think the same concepts apply in reload, but I understand the
hesitation to twiddle that code and deal with possible fallout.

> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Thanks,
> Richard
>
>
> [ This patch is part of the SVE series posted here:
>   https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]
>
> gcc/
> 2016-11-15  Richard Sandiford
> 	    Alan Hayward
> 	    David Sherwood
>
> 	* rtl.h (load_extend_op): Declare.
> 	* rtlanal.c (load_extend_op): New function.
> 	(nonzero_bits1): Use it.
> 	(num_sign_bit_copies1): Likewise.
> 	* cse.c (cse_insn): Likewise.
> 	* fold-const.c (fold_single_bit_test): Likewise.
> 	(fold_unary_loc): Likewise.
> 	* fwprop.c (free_load_extend): Likewise.
> 	* postreload.c (reload_cse_simplify_set): Likewise.
> 	(reload_cse_simplify_operands): Likewise.
> 	* combine.c (try_combine): Likewise.
> 	(simplify_set): Likewise.  Remove redundant SUBREG_BYTE and
> 	subreg_lowpart_p checks.
OK.
jeff
Add a load_extend_op wrapper
LOAD_EXTEND_OP only applies to scalar integer modes that are narrower
than a word.  However, callers weren't consistent about which of these
checks they made beforehand, and also weren't consistent about whether
"smaller" was based on (bit)size or precision (IMO it's the latter).
This patch adds a wrapper to try to make the macro easier to use.

LOAD_EXTEND_OP is often used to disable transformations that aren't
beneficial when extends from memory are free, so being stricter about
the check accidentally exposed more optimisation opportunities.

"SUBREG_BYTE (...) == 0" and subreg_lowpart_p are implied by
paradoxical_subreg_p, so the patch also removes some redundant tests.

The patch doesn't change reload, since different checks could have
unforeseen consequences.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


[ This patch is part of the SVE series posted here:
  https://gcc.gnu.org/ml/gcc/2016-11/msg00030.html ]

gcc/
2016-11-15  Richard Sandiford
	    Alan Hayward
	    David Sherwood

	* rtl.h (load_extend_op): Declare.
	* rtlanal.c (load_extend_op): New function.
	(nonzero_bits1): Use it.
	(num_sign_bit_copies1): Likewise.
	* cse.c (cse_insn): Likewise.
	* fold-const.c (fold_single_bit_test): Likewise.
	(fold_unary_loc): Likewise.
	* fwprop.c (free_load_extend): Likewise.
	* postreload.c (reload_cse_simplify_set): Likewise.
	(reload_cse_simplify_operands): Likewise.
	* combine.c (try_combine): Likewise.
	(simplify_set): Likewise.  Remove redundant SUBREG_BYTE and
	subreg_lowpart_p checks.

diff --git a/gcc/combine.c b/gcc/combine.c
index 0665f38..d685f44 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -3738,7 +3738,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
 	    {
 	      /* Or as a SIGN_EXTEND if LOAD_EXTEND_OP says that that's
 		 what it really is.  */
-	      if (LOAD_EXTEND_OP (GET_MODE (SUBREG_REG (*split)))
+	      if (load_extend_op (GET_MODE (SUBREG_REG (*split)))
 		  == SIGN_EXTEND)
 		SUBST (*split, gen_rtx_SIGN_EXTEND (split_mode,
 						    SUBREG_REG (*split)));
@@ -6794,16 +6794,13 @@ simplify_set (rtx x)
      would require a paradoxical subreg.  Replace the subreg with a
      zero_extend to avoid the reload that would otherwise be required.  */
 
-  if (GET_CODE (src) == SUBREG && subreg_lowpart_p (src)
-      && INTEGRAL_MODE_P (GET_MODE (SUBREG_REG (src)))
-      && LOAD_EXTEND_OP (GET_MODE (SUBREG_REG (src))) != UNKNOWN
-      && SUBREG_BYTE (src) == 0
-      && paradoxical_subreg_p (src)
-      && MEM_P (SUBREG_REG (src)))
+  enum rtx_code extend_op;
+  if (paradoxical_subreg_p (src)
+      && MEM_P (SUBREG_REG (src))
+      && (extend_op = load_extend_op (GET_MODE (SUBREG_REG (src)))) != UNKNOWN)
     {
       SUBST (SET_SRC (x),
-	     gen_rtx_fmt_e (LOAD_EXTEND_OP (GET_MODE (SUBREG_REG (src))),
-			    GET_MODE (src), SUBREG_REG (src)));
+	     gen_rtx_fmt_e (extend_op, GET_MODE (src), SUBREG_REG (src)));
 
       src = SET_SRC (x);
     }
diff --git a/gcc/cse.c b/gcc/cse.c
index 11b8fbe..72f1c4f 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -4915,11 +4915,10 @@ cse_insn (rtx_insn *insn)
      also have such operations, but this is only likely to be beneficial
      on these machines.  */
 
+  rtx_code extend_op;
   if (flag_expensive_optimizations && src_related == 0
-      && (GET_MODE_SIZE (mode) < UNITS_PER_WORD)
-      && GET_MODE_CLASS (mode) == MODE_INT
       && MEM_P (src) && ! do_not_record
-      && LOAD_EXTEND_OP (mode) != UNKNOWN)
+      && (extend_op = load_extend_op (mode)) != UNKNOWN)
     {
       struct rtx_def memory_extend_buf;
       rtx memory_extend_rtx = &memory_extend_buf;
@@ -4928,7 +4927,7 @@ cse_insn (rtx_insn *insn)
       /* Set what we are trying to extend and the operation it might
 	 have been extended with.  */
       memset (memory_extend_rtx, 0, sizeof (*memory_extend_rtx));
-      PUT_CODE (memory_extend_rtx, LOAD_EXTEND_OP (mode));
+      PUT_CODE (memory_extend_rtx, extend_op);
       XEXP (memory_extend_rtx, 0) = src;
 
       for (tmode = GET_MODE_WIDER_MODE (mode);
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index e14471e..c597414 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -6725,7 +6725,7 @@ fold_single_bit_test (location_t loc, enum tree_code code,
       /* If we are going to be able to omit the AND below, we must do our
	 operations as unsigned.  If we must use the AND, we have a choice.
	 Normally unsigned is faster, but for some machines signed is.  */
-      ops_unsigned = (LOAD_EXTEND_OP (operand_mode) == SIGN_EXTEND
+      ops_unsigned = (load_extend_op