Re: [PATCH] simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763]
Maxim Kuvyrkov writes: > Hi Richard, > > You patch below increases code-size on aarch64-linux-gnu with -Os on SPEC2k6 > 400.perlbench and 453.povray -- by 1% and 2% respectively. > > 400.perlbench,perlbench_base.default, 101,939261,951221 > 453.povray,povray_base.default, 102,707807,721399 > > Would you please check whether these can be avoided? [Let me know if you > need help reproducing this problem.] I reverted the patch on Wednesday, see: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01962.html Thanks, Richard > > Thank you, > > -- > Maxim Kuvyrkov > https://www.linaro.org > >> On Jan 27, 2020, at 7:41 PM, Richard Sandiford >> wrote: >> >> In the gcc.target/aarch64/lsl_asr_sbfiz.c part of this PR, we have: >> >> Failed to match this instruction: >> (set (reg:SI 95) >>(ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 97) 0) >>(const_int 3 [0x3]) >>(const_int 0 [0])) 0) >>(const_int 19 [0x13]))) >> >> If we perform the natural simplification to: >> >> (set (reg:SI 95) >>(ashift:SI (sign_extract:SI (reg:SI 97) >>(const_int 3 [0x3]) >>(const_int 0 [0])) 0) >>(const_int 19 [0x13]))) >> >> then the pattern matches. And it turns out that we do have a >> simplification like that already, but it would only kick in for >> extractions from a reg, not a subreg. E.g.: >> >> (set (reg:SI 95) >>(ashift:SI (subreg:SI (sign_extract:DI (reg:DI X) >>(const_int 3 [0x3]) >>(const_int 0 [0])) 0) >>(const_int 19 [0x13]))) >> >> would simplify to: >> >> (set (reg:SI 95) >>(ashift:SI (sign_extract:SI (subreg:SI (reg:DI X) 0) >>(const_int 3 [0x3]) >>(const_int 0 [0])) 0) >>(const_int 19 [0x13]))) >> >> IMO the subreg case is even more obviously a simplification >> than the bare reg case, since the net effect is to remove >> either one or two subregs, rather than simply change the >> position of a subreg/truncation. >> >> However, doing that regressed gcc.dg/tree-ssa/pr64910-2.c >> for -m32 on x86_64-linux-gnu, because we could then simplify >> a :HI zero_extract to a :QI one. The associated *testqi_ext_3 >> pattern did already seem to want to handle QImode extractions: >> >> "ix86_match_ccmode (insn, CCNOmode) >> && ((TARGET_64BIT && GET_MODE (operands[2]) == DImode) >> || GET_MODE (operands[2]) == SImode >> || GET_MODE (operands[2]) == HImode >> || GET_MODE (operands[2]) == QImode) >> >> but I'm not sure how often the QI case would trigger in practice, >> since the zero_extract mode was restricted to HI and above. I checked >> the other x86 patterns and couldn't see any other instances of this. >> >> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu, >> OK to install? >> >> Richard >> >> >> 2020-01-27 Richard Sandiford >> >> gcc/ >> PR rtl-optimization/87763 >> * simplify-rtx.c (simplify_truncation): Extend sign/zero_extract >> simplification to handle subregs as well as bare regs. >> * config/i386/i386.md (*testqi_ext_3): Match QI extracts too. >> --- >> gcc/config/i386/i386.md | 2 +- >> gcc/simplify-rtx.c | 4 +++- >> 2 files changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md >> index 6e9c9bd2fb6..a125ab350bb 100644 >> --- a/gcc/config/i386/i386.md >> +++ b/gcc/config/i386/i386.md >> @@ -8927,7 +8927,7 @@ (define_insn "*testqi_ext_2" >> (define_insn_and_split "*testqi_ext_3" >> [(set (match_operand 0 "flags_reg_operand") >> (match_operator 1 "compare_operator" >> - [(zero_extract:SWI248 >> + [(zero_extract:SWI >> (match_operand 2 "nonimmediate_operand" "rm") >> (match_operand 3 "const_int_operand" "n") >> (match_operand 4 "const_int_operand" "n")) >> diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c >> index eff1d07a253..db4f9339c15 100644 >> --- a/gcc/simplify-rtx.c >> +++ b/gcc/simplify-rtx.c >> @@ -736,7 +736,9 @@ simplify_truncation (machine_mode mode, rtx op, >> (*_extract:M1 (truncate:M1 (reg:M2)) (len) (pos')) if possible without >> changing len. */ >> if ((GET_CODE (op) == ZERO_EXTRACT || GET_CODE (op) == SIGN_EXTRACT) >> - && REG_P (XEXP (op, 0)) >> + && (REG_P (XEXP (op, 0)) >> + || (SUBREG_P (XEXP (op, 0)) >> + && REG_P (SUBREG_REG (XEXP (op, 0) >> && GET_MODE (XEXP (op, 0)) == GET_MODE (op) >> && CONST_INT_P (XEXP (op, 1)) >> && CONST_INT_P (XEXP (op, 2))) >> -- >> 2.17.1 >>
Re: [PATCH] simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763]
Hi Richard, You patch below increases code-size on aarch64-linux-gnu with -Os on SPEC2k6 400.perlbench and 453.povray -- by 1% and 2% respectively. 400.perlbench,perlbench_base.default, 101,939261,951221 453.povray,povray_base.default, 102,707807,721399 Would you please check whether these can be avoided? [Let me know if you need help reproducing this problem.] Thank you, -- Maxim Kuvyrkov https://www.linaro.org > On Jan 27, 2020, at 7:41 PM, Richard Sandiford > wrote: > > In the gcc.target/aarch64/lsl_asr_sbfiz.c part of this PR, we have: > > Failed to match this instruction: > (set (reg:SI 95) >(ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 97) 0) >(const_int 3 [0x3]) >(const_int 0 [0])) 0) >(const_int 19 [0x13]))) > > If we perform the natural simplification to: > > (set (reg:SI 95) >(ashift:SI (sign_extract:SI (reg:SI 97) >(const_int 3 [0x3]) >(const_int 0 [0])) 0) >(const_int 19 [0x13]))) > > then the pattern matches. And it turns out that we do have a > simplification like that already, but it would only kick in for > extractions from a reg, not a subreg. E.g.: > > (set (reg:SI 95) >(ashift:SI (subreg:SI (sign_extract:DI (reg:DI X) >(const_int 3 [0x3]) >(const_int 0 [0])) 0) >(const_int 19 [0x13]))) > > would simplify to: > > (set (reg:SI 95) >(ashift:SI (sign_extract:SI (subreg:SI (reg:DI X) 0) >(const_int 3 [0x3]) >(const_int 0 [0])) 0) >(const_int 19 [0x13]))) > > IMO the subreg case is even more obviously a simplification > than the bare reg case, since the net effect is to remove > either one or two subregs, rather than simply change the > position of a subreg/truncation. > > However, doing that regressed gcc.dg/tree-ssa/pr64910-2.c > for -m32 on x86_64-linux-gnu, because we could then simplify > a :HI zero_extract to a :QI one. The associated *testqi_ext_3 > pattern did already seem to want to handle QImode extractions: > > "ix86_match_ccmode (insn, CCNOmode) > && ((TARGET_64BIT && GET_MODE (operands[2]) == DImode) > || GET_MODE (operands[2]) == SImode > || GET_MODE (operands[2]) == HImode > || GET_MODE (operands[2]) == QImode) > > but I'm not sure how often the QI case would trigger in practice, > since the zero_extract mode was restricted to HI and above. I checked > the other x86 patterns and couldn't see any other instances of this. > > Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu, > OK to install? > > Richard > > > 2020-01-27 Richard Sandiford > > gcc/ > PR rtl-optimization/87763 > * simplify-rtx.c (simplify_truncation): Extend sign/zero_extract > simplification to handle subregs as well as bare regs. > * config/i386/i386.md (*testqi_ext_3): Match QI extracts too. > --- > gcc/config/i386/i386.md | 2 +- > gcc/simplify-rtx.c | 4 +++- > 2 files changed, 4 insertions(+), 2 deletions(-) > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > index 6e9c9bd2fb6..a125ab350bb 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -8927,7 +8927,7 @@ (define_insn "*testqi_ext_2" > (define_insn_and_split "*testqi_ext_3" > [(set (match_operand 0 "flags_reg_operand") > (match_operator 1 "compare_operator" > - [(zero_extract:SWI248 > + [(zero_extract:SWI >(match_operand 2 "nonimmediate_operand" "rm") >(match_operand 3 "const_int_operand" "n") >(match_operand 4 "const_int_operand" "n")) > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > index eff1d07a253..db4f9339c15 100644 > --- a/gcc/simplify-rtx.c > +++ b/gcc/simplify-rtx.c > @@ -736,7 +736,9 @@ simplify_truncation (machine_mode mode, rtx op, > (*_extract:M1 (truncate:M1 (reg:M2)) (len) (pos')) if possible without > changing len. */ > if ((GET_CODE (op) == ZERO_EXTRACT || GET_CODE (op) == SIGN_EXTRACT) > - && REG_P (XEXP (op, 0)) > + && (REG_P (XEXP (op, 0)) > + || (SUBREG_P (XEXP (op, 0)) > + && REG_P (SUBREG_REG (XEXP (op, 0) > && GET_MODE (XEXP (op, 0)) == GET_MODE (op) > && CONST_INT_P (XEXP (op, 1)) > && CONST_INT_P (XEXP (op, 2))) > -- > 2.17.1 >
Re: [PATCH] simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763]
On Thu, 2020-01-30 at 17:54 +, Richard Sandiford wrote: > Jeff Law writes: > > On Wed, 2020-01-29 at 19:18 +, Richard Sandiford wrote: > > > Andreas Schwab writes: > > > > On Jan 27 2020, Richard Sandiford wrote: > > > > > > > > > * simplify-rtx.c (simplify_truncation): Extend sign/zero_extract > > > > > simplification to handle subregs as well as bare regs. > > > > > > > > That breaks gcc.target/m68k/pr39726.c > > > > > > Gah. Jeff pointed out off-list that it also broke > > > gcc.target/sh/pr64345-1.c on sh3-linux-gnu. It didn't look like either > > > of them would be fixable in an acceptably non-invasive and unhacky way, > > > so I've reverted the patch "for now". > > I would have considered letting those two targets regress those tests > > to move forward on 87763. aarch64 is (IMHO) more important than the sh > > and m68k combined ;-) It also seems to me that your patch generates > > better RTL and that we could claim that a port that regresses on code > > quality needs its port maintainer to step in and fix the port. > > > > WRT the m68k issue I'd think it could be fixed by twiddling > > cbranchsi4_btst_reg_insn_1 to accept a mode iterator on the > > zero_extract and making some minor adjustments in its output code. > > Something like the attached. I haven't tested it in any real way and > > haven't really thought about whether or not it does the right thing for > > a MEM operand. > > It just feels like this is breaking the contract with extv and extzv, > where all *_extracts are supposed to be in the mode that those patterns > accept. My i386 patch was doing that too TBH, I was just being > optimistic given that QImode was already handled by that pattern. :-) > > So IMO my patch has too many knock-on effects for it to be suitable for > stage 4. While we have make_extraction, we probably have to leave this > kind of expression untouched. > > For AArch64 I was planning on adding a new pattern to match the > (subreg:SI (zero_extract:DI ...)) -- as one of the comments in the > PR suggested -- but with the subreg matched via subreg_lowpart_operator, > to make it endian-safe. This is similar in spirit to the new i686 > popcount patterns. Fair enough on the simplify-rtx change :-) One might argue that we should be loosening the requirements, but memory operands are particularly troublesome. I think HP and I had a light discussion on that a year or so ago. WRT adding patterns to aarch64. Yea, I went that route last year, but never was able to go from "hey, this seems to work pretty well" to "it's OK for the trunk". Jeff
Re: [PATCH] simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763]
Jeff Law writes: > On Wed, 2020-01-29 at 19:18 +, Richard Sandiford wrote: >> Andreas Schwab writes: >> > On Jan 27 2020, Richard Sandiford wrote: >> > >> > > * simplify-rtx.c (simplify_truncation): Extend sign/zero_extract >> > > simplification to handle subregs as well as bare regs. >> > >> > That breaks gcc.target/m68k/pr39726.c >> >> Gah. Jeff pointed out off-list that it also broke >> gcc.target/sh/pr64345-1.c on sh3-linux-gnu. It didn't look like either >> of them would be fixable in an acceptably non-invasive and unhacky way, >> so I've reverted the patch "for now". > I would have considered letting those two targets regress those tests > to move forward on 87763. aarch64 is (IMHO) more important than the sh > and m68k combined ;-) It also seems to me that your patch generates > better RTL and that we could claim that a port that regresses on code > quality needs its port maintainer to step in and fix the port. > > WRT the m68k issue I'd think it could be fixed by twiddling > cbranchsi4_btst_reg_insn_1 to accept a mode iterator on the > zero_extract and making some minor adjustments in its output code. > Something like the attached. I haven't tested it in any real way and > haven't really thought about whether or not it does the right thing for > a MEM operand. It just feels like this is breaking the contract with extv and extzv, where all *_extracts are supposed to be in the mode that those patterns accept. My i386 patch was doing that too TBH, I was just being optimistic given that QImode was already handled by that pattern. :-) So IMO my patch has too many knock-on effects for it to be suitable for stage 4. While we have make_extraction, we probably have to leave this kind of expression untouched. For AArch64 I was planning on adding a new pattern to match the (subreg:SI (zero_extract:DI ...)) -- as one of the comments in the PR suggested -- but with the subreg matched via subreg_lowpart_operator, to make it endian-safe. This is similar in spirit to the new i686 popcount patterns. Thanks, Richard > > I'd be surprised if the SH fix wasn't similar, but I know almost > nothing about the SH implementation. > > Jeff
Re: [PATCH] simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763]
On Thu, 2020-01-30 at 18:27 +0100, Jakub Jelinek wrote: > On Thu, Jan 30, 2020 at 10:23:35AM -0700, Jeff Law wrote: > > diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md > > index 8e35357ea23..78c4cbe4753 100644 > > --- a/gcc/config/m68k/m68k.md > > +++ b/gcc/config/m68k/m68k.md > > @@ -644,12 +644,12 @@ > >return m68k_output_branch_integer (code); > > }) > > > > -(define_insn "cbranchsi4_btst_reg_insn_1" > > +(define_insn "cbranch4_btst_reg_insn_1" > >[(set (pc) > > (if_then_else (match_operator 0 "equality_comparison_operator" > > - [(zero_extract:SI (match_operand:SI 1 > > "nonimmediate_operand" "do,dQ") > > -(const_int 1) > > -(match_operand:SI 2 > > "const_int_operand" "n,n")) > > + [(zero_extract:I (match_operand:I 1 > > "nonimmediate_operand" "do,dQ") > > + (const_int 1) > > + (match_operand:I 2 "const_int_operand" > > "n,n")) > > (const_int 0)]) > > (label_ref (match_operand 3 "")) > > (pc)))] > > @@ -665,8 +665,9 @@ > > } > >else > > { > > - operands[2] = GEN_INT (31 - INTVAL (operands[2])); > > - code = m68k_output_btst (operands[2], operands[1], code, 31); > > + operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1])) > > +- INTVAL (operands[2]) - 1); > > + code = m68k_output_btst (operands[2], operands[1], code, > > GET_MODE_BITSIZE (GET_MODE (operands[1])) - 1); > > s/GET_MODE (operands[1])/mode/g ? > Also, the last line is too long. It's not meant for inclusion as-is, but to show how we might fix this and allow us to move forward on 87763. Jeff
Re: [PATCH] simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763]
On Thu, Jan 30, 2020 at 10:23:35AM -0700, Jeff Law wrote: > diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md > index 8e35357ea23..78c4cbe4753 100644 > --- a/gcc/config/m68k/m68k.md > +++ b/gcc/config/m68k/m68k.md > @@ -644,12 +644,12 @@ >return m68k_output_branch_integer (code); > }) > > -(define_insn "cbranchsi4_btst_reg_insn_1" > +(define_insn "cbranch4_btst_reg_insn_1" >[(set (pc) > (if_then_else (match_operator 0 "equality_comparison_operator" > -[(zero_extract:SI (match_operand:SI 1 > "nonimmediate_operand" "do,dQ") > - (const_int 1) > - (match_operand:SI 2 > "const_int_operand" "n,n")) > +[(zero_extract:I (match_operand:I 1 > "nonimmediate_operand" "do,dQ") > + (const_int 1) > + (match_operand:I 2 "const_int_operand" > "n,n")) > (const_int 0)]) > (label_ref (match_operand 3 "")) > (pc)))] > @@ -665,8 +665,9 @@ > } >else > { > - operands[2] = GEN_INT (31 - INTVAL (operands[2])); > - code = m68k_output_btst (operands[2], operands[1], code, 31); > + operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1])) > + - INTVAL (operands[2]) - 1); > + code = m68k_output_btst (operands[2], operands[1], code, > GET_MODE_BITSIZE (GET_MODE (operands[1])) - 1); s/GET_MODE (operands[1])/mode/g ? Also, the last line is too long. Jakub
Re: [PATCH] simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763]
On Wed, 2020-01-29 at 19:18 +, Richard Sandiford wrote: > Andreas Schwab writes: > > On Jan 27 2020, Richard Sandiford wrote: > > > > > * simplify-rtx.c (simplify_truncation): Extend sign/zero_extract > > > simplification to handle subregs as well as bare regs. > > > > That breaks gcc.target/m68k/pr39726.c > > Gah. Jeff pointed out off-list that it also broke > gcc.target/sh/pr64345-1.c on sh3-linux-gnu. It didn't look like either > of them would be fixable in an acceptably non-invasive and unhacky way, > so I've reverted the patch "for now". I would have considered letting those two targets regress those tests to move forward on 87763. aarch64 is (IMHO) more important than the sh and m68k combined ;-) It also seems to me that your patch generates better RTL and that we could claim that a port that regresses on code quality needs its port maintainer to step in and fix the port. WRT the m68k issue I'd think it could be fixed by twiddling cbranchsi4_btst_reg_insn_1 to accept a mode iterator on the zero_extract and making some minor adjustments in its output code. Something like the attached. I haven't tested it in any real way and haven't really thought about whether or not it does the right thing for a MEM operand. I'd be surprised if the SH fix wasn't similar, but I know almost nothing about the SH implementation. Jeff diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md index 8e35357ea23..78c4cbe4753 100644 --- a/gcc/config/m68k/m68k.md +++ b/gcc/config/m68k/m68k.md @@ -644,12 +644,12 @@ return m68k_output_branch_integer (code); }) -(define_insn "cbranchsi4_btst_reg_insn_1" +(define_insn "cbranch4_btst_reg_insn_1" [(set (pc) (if_then_else (match_operator 0 "equality_comparison_operator" - [(zero_extract:SI (match_operand:SI 1 "nonimmediate_operand" "do,dQ") - (const_int 1) - (match_operand:SI 2 "const_int_operand" "n,n")) + [(zero_extract:I (match_operand:I 1 "nonimmediate_operand" "do,dQ") + (const_int 1) + (match_operand:I 2 "const_int_operand" "n,n")) (const_int 0)]) (label_ref (match_operand 3 "")) (pc)))] @@ -665,8 +665,9 @@ } else { - operands[2] = GEN_INT (31 - INTVAL (operands[2])); - code = m68k_output_btst (operands[2], operands[1], code, 31); + operands[2] = GEN_INT (GET_MODE_BITSIZE (GET_MODE (operands[1])) + - INTVAL (operands[2]) - 1); + code = m68k_output_btst (operands[2], operands[1], code, GET_MODE_BITSIZE (GET_MODE (operands[1])) - 1); } return m68k_output_branch_integer (code); }
Re: [PATCH] simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763]
Andreas Schwab writes: > On Jan 27 2020, Richard Sandiford wrote: > >> * simplify-rtx.c (simplify_truncation): Extend sign/zero_extract >> simplification to handle subregs as well as bare regs. > > That breaks gcc.target/m68k/pr39726.c Gah. Jeff pointed out off-list that it also broke gcc.target/sh/pr64345-1.c on sh3-linux-gnu. It didn't look like either of them would be fixable in an acceptably non-invasive and unhacky way, so I've reverted the patch "for now". Thanks, Richard
Re: [PATCH] simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763]
On Jan 27 2020, Richard Sandiford wrote: > * simplify-rtx.c (simplify_truncation): Extend sign/zero_extract > simplification to handle subregs as well as bare regs. That breaks gcc.target/m68k/pr39726.c @@ -67,39 +67,39 @@ Disassembly of section .text: 0088 : 88: 302f 0006 movew %sp@(6),%d0 8c: 906f 000a subw %sp@(10),%d0 - 90: 4a00tstb %d0 - 92: 6a08bpls 9c - 94: 13fc 0001 moveb #1,0 - 9a: - 98: R_68K_32v - 9c: 4e75rts + 90: 0240 0080 andiw #128,%d0 + 94: 6708beqs 9e + 96: 13fc 0001 moveb #1,0 + 9c: + 9a: R_68K_32v + 9e: 4e75rts -009e : - 9e: 302f 0006 movew %sp@(6),%d0 - a2: d06f 000a addw %sp@(10),%d0 - a6: 4a00tstb %d0 - a8: 6a08bpls b2 - aa: 13fc 0001 moveb #1,0 - b0: - ae: R_68K_32v - b2: 4e75rts +00a0 : + a0: 302f 0006 movew %sp@(6),%d0 + a4: d06f 000a addw %sp@(10),%d0 + a8: 0240 0080 andiw #128,%d0 + ac: 6708beqs b6 + ae: 13fc 0001 moveb #1,0 + b4: + b2: R_68K_32v + b6: 4e75rts -00b4 : - b4: 302f 0006 movew %sp@(6),%d0 - b8: 906f 000a subw %sp@(10),%d0 - bc: 0240 8421 andiw #-31711,%d0 - c0: 6708beqs ca - c2: 13fc 0001 moveb #1,0 - c8: - c6: R_68K_32v - ca: 4e75rts +00b8 : + b8: 302f 0006 movew %sp@(6),%d0 + bc: 906f 000a subw %sp@(10),%d0 + c0: 0240 8421 andiw #-31711,%d0 + c4: 6708beqs ce + c6: 13fc 0001 moveb #1,0 + cc: + ca: R_68K_32v + ce: 4e75rts -00cc : - cc: 302f 0006 movew %sp@(6),%d0 - d0: d06f 000a addw %sp@(10),%d0 - d4: 0240 8421 andiw #-31711,%d0 - d8: 6708beqs e2 - da: 13fc 0001 moveb #1,0 - e0: - de: R_68K_32v - e2: 4e75rts +00d0 : + d0: 302f 0006 movew %sp@(6),%d0 + d4: d06f 000a addw %sp@(10),%d0 + d8: 0240 8421 andiw #-31711,%d0 + dc: 6708beqs e6 + de: 13fc 0001 moveb #1,0 + e4: + e2: R_68K_32v + e6: 4e75rts Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
Re: [PATCH] simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763]
Jeff Law writes: > On Mon, 2020-01-27 at 16:41 +, Richard Sandiford wrote: >> In the gcc.target/aarch64/lsl_asr_sbfiz.c part of this PR, we have: >> >> Failed to match this instruction: >> (set (reg:SI 95) >> (ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 97) 0) >> (const_int 3 [0x3]) >> (const_int 0 [0])) 0) >> (const_int 19 [0x13]))) >> >> If we perform the natural simplification to: >> >> (set (reg:SI 95) >> (ashift:SI (sign_extract:SI (reg:SI 97) >> (const_int 3 [0x3]) >> (const_int 0 [0])) 0) >> (const_int 19 [0x13]))) >> >> then the pattern matches. And it turns out that we do have a >> simplification like that already, but it would only kick in for >> extractions from a reg, not a subreg. E.g.: > Yea. I ran into similar problems with the extract/extend bits in > combine. And we know it's a fairly general problem that we don't > handle SUBREGs anywhere near as consistently as REGs. > > >> >> (set (reg:SI 95) >> (ashift:SI (subreg:SI (sign_extract:DI (reg:DI X) >> (const_int 3 [0x3]) >> (const_int 0 [0])) 0) >> (const_int 19 [0x13]))) >> >> would simplify to: >> >> (set (reg:SI 95) >> (ashift:SI (sign_extract:SI (subreg:SI (reg:DI X) 0) >> (const_int 3 [0x3]) >> (const_int 0 [0])) 0) >> (const_int 19 [0x13]))) >> >> IMO the subreg case is even more obviously a simplification >> than the bare reg case, since the net effect is to remove >> either one or two subregs, rather than simply change the >> position of a subreg/truncation. >> >> However, doing that regressed gcc.dg/tree-ssa/pr64910-2.c >> for -m32 on x86_64-linux-gnu, because we could then simplify >> a :HI zero_extract to a :QI one. The associated *testqi_ext_3 >> pattern did already seem to want to handle QImode extractions: >> >> "ix86_match_ccmode (insn, CCNOmode) >>&& ((TARGET_64BIT && GET_MODE (operands[2]) == DImode) >>|| GET_MODE (operands[2]) == SImode >>|| GET_MODE (operands[2]) == HImode >>|| GET_MODE (operands[2]) == QImode) >> >> but I'm not sure how often the QI case would trigger in practice, >> since the zero_extract mode was restricted to HI and above. I checked >> the other x86 patterns and couldn't see any other instances of this. >> >> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu, >> OK to install? >> >> Richard >> >> >> 2020-01-27 Richard Sandiford >> >> gcc/ >> PR rtl-optimization/87763 >> * simplify-rtx.c (simplify_truncation): Extend sign/zero_extract >> simplification to handle subregs as well as bare regs. >> * config/i386/i386.md (*testqi_ext_3): Match QI extracts too. > Do you need to check for and reject paradoxicals here? If not, OK as- > is. If you need to check, then that's pre-approved as well. Thanks, pushed. On the paradoxical thing: the outer subreg is always non-paradoxical for simplify_truncation, so we don't need to check that specifically. For the inner subreg we need to handle the paradoxical case for the PR. I wondered at one point about punting for non-paradoxical inner subregs, but there didn't seem to be a good reason to keep an expression like (truncate (op (truncate...))) over (op (truncate ...)). If it turns out there is a good reason though, changing SUBREG_P to paradoxical_subreg_p would be fine in terms of this PR. Richard
Re: [PATCH] simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763]
On Mon, 2020-01-27 at 16:41 +, Richard Sandiford wrote: > In the gcc.target/aarch64/lsl_asr_sbfiz.c part of this PR, we have: > > Failed to match this instruction: > (set (reg:SI 95) > (ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 97) 0) > (const_int 3 [0x3]) > (const_int 0 [0])) 0) > (const_int 19 [0x13]))) > > If we perform the natural simplification to: > > (set (reg:SI 95) > (ashift:SI (sign_extract:SI (reg:SI 97) > (const_int 3 [0x3]) > (const_int 0 [0])) 0) > (const_int 19 [0x13]))) > > then the pattern matches. And it turns out that we do have a > simplification like that already, but it would only kick in for > extractions from a reg, not a subreg. E.g.: Yea. I ran into similar problems with the extract/extend bits in combine. And we know it's a fairly general problem that we don't handle SUBREGs anywhere near as consistently as REGs. > > (set (reg:SI 95) > (ashift:SI (subreg:SI (sign_extract:DI (reg:DI X) > (const_int 3 [0x3]) > (const_int 0 [0])) 0) > (const_int 19 [0x13]))) > > would simplify to: > > (set (reg:SI 95) > (ashift:SI (sign_extract:SI (subreg:SI (reg:DI X) 0) > (const_int 3 [0x3]) > (const_int 0 [0])) 0) > (const_int 19 [0x13]))) > > IMO the subreg case is even more obviously a simplification > than the bare reg case, since the net effect is to remove > either one or two subregs, rather than simply change the > position of a subreg/truncation. > > However, doing that regressed gcc.dg/tree-ssa/pr64910-2.c > for -m32 on x86_64-linux-gnu, because we could then simplify > a :HI zero_extract to a :QI one. The associated *testqi_ext_3 > pattern did already seem to want to handle QImode extractions: > > "ix86_match_ccmode (insn, CCNOmode) >&& ((TARGET_64BIT && GET_MODE (operands[2]) == DImode) >|| GET_MODE (operands[2]) == SImode >|| GET_MODE (operands[2]) == HImode >|| GET_MODE (operands[2]) == QImode) > > but I'm not sure how often the QI case would trigger in practice, > since the zero_extract mode was restricted to HI and above. I checked > the other x86 patterns and couldn't see any other instances of this. > > Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu, > OK to install? > > Richard > > > 2020-01-27 Richard Sandiford > > gcc/ > PR rtl-optimization/87763 > * simplify-rtx.c (simplify_truncation): Extend sign/zero_extract > simplification to handle subregs as well as bare regs. > * config/i386/i386.md (*testqi_ext_3): Match QI extracts too. Do you need to check for and reject paradoxicals here? If not, OK as- is. If you need to check, then that's pre-approved as well. jeff >